Scientific Data

Journal Information
ISSN / EISSN : 2052-4463 / 2052-4463
Total articles ≅ 1,761

Latest articles in this journal

Huijuan Xiao, Weichen Zhao, ,
Published: 13 July 2021
Scientific Data, Volume 8, pp 1-10; doi:10.1038/s41597-021-00966-z

Constituent entities which make up Russia have wide-ranging powers and are considered important policymakers and implementers of climate change mitigation. Formulation of CO2 emission inventories for Russia’s constituent entities is the priority step in achieving emission reduction. Russia is the world’s largest exporter of oil and gas combined and the fourth biggest CO2 emitter, so its efforts in mitigating CO2 emissions are globally significant in curbing climate change. However, the existing emission inventories only present national CO2 emissions; the subnational emission details are missing. In addition, the emission factors are not country-specific and energy activity data by fossil energy types and sectors are not sufficiently detailed. In this study, the CO2 emission inventories of Russia and its 82 constituent entities from 2005 to 2019 are constructed. The emission inventories include energy-related emissions with 89 socio-economic sectors and 17 energy types and process-related emissions. The uniformly formatted emission inventories can be a reference for in-depth analysis of emission characteristics and emission-related studies of Russia.
Dijuan Liang, Xi Lu, Minghao Zhuang, Guang Shi, Chengyu Hu, , Jiming Hao
Published: 13 July 2021
Scientific Data, Volume 8, pp 1-10; doi:10.1038/s41597-021-00960-5

China has committed to reaching carbon neutrality by 2060, which will require a drastic cut in greenhouse gas (GHG) emissions from all sectors, including those from agricultural activities. A comprehensive, long-term, and spatially precise profile of agricultural GHG emissions can help to accurately understand drivers of historical emissions and their implications for future mitigation. This study constructs province-level agricultural GHG emissions in China from 1978 to 2016. It considers primary and secondary emissions from a full range of agricultural activities related to crop farming, including crop residue open burning, rice cultivation, cropland change, cropland emissions, machinery use, nitrogen fertilizer production, and pesticide production. Annual or interpolated activity data from official sources and the latest emission factors available for China were adopted in this study. The data can be used in spatial and temporal analysis of emissions from cropping systems as well as in the design of mitigation strategies in China.
Sungmin O.,
Published: 12 July 2021
Scientific Data, Volume 8, pp 1-14; doi:10.1038/s41597-021-00964-1

While soil moisture information is essential for a wide range of hydrologic and climate applications, spatially continuous soil moisture data are only available from satellite observations or model simulations. Here we present a global, long-term dataset of soil moisture derived through machine learning trained with in-situ measurements. We train a Long Short-Term Memory (LSTM) model to extrapolate daily soil moisture dynamics in space and in time, based on in-situ data collected from more than 1,000 stations across the globe. The dataset provides multi-layer soil moisture data (0–10 cm, 10–30 cm, and 30–50 cm) at 0.25° spatial and daily temporal resolution over the period 2000–2019. The performance of the resulting dataset is evaluated through cross validation and inter-comparison with existing soil moisture datasets. It performs especially well in terms of temporal dynamics, making it particularly useful for applications requiring time-varying soil moisture, such as anomaly detection and memory analyses. Given its distinct derivation, the dataset complements the existing suite of modelled and satellite-based datasets and supports large-scale hydrological, meteorological, and ecological analyses.
Holly M. Mortensen, Jonathan Senn, Trevor Levey, Phillip Langley, Antony J. Williams
Published: 12 July 2021
Scientific Data, Volume 8, pp 1-9; doi:10.1038/s41597-021-00962-3

The EPA developed the Adverse Outcome Pathway Database (AOP-DB) to better characterize adverse outcomes of toxicological interest that are relevant to human health and the environment. Here we present the most recent version of the EPA Adverse Outcome Pathway Database (AOP-DB), version 2. AOP-DB v.2 introduces several substantial updates, which include automated data pulls from the AOP-Wiki 2.0, the integration of tissue-gene network data, and human AOP-gene data by population, semantic mapping and SPARQL endpoint creation, in addition to the presentation of the first publicly available AOP-DB web user interface. Potential users of the data may, for example, investigate specific molecular targets of an AOP, the relation of those gene/protein targets to other AOPs, cross-species, pathway, or disease-AOP relationships, or frequencies of AOP-related functional variants in particular populations. Version updates described herein help inform new testable hypotheses about the etiology and mechanisms underlying adverse outcomes of environmental and toxicological concern.
, M. Cade Lawson, Camila Z. Apablaza
Published: 7 July 2021
Scientific Data, Volume 8, pp 1-7; doi:10.1038/s41597-021-00956-1

Problems of poor network interoperability in electric vehicle (EV) infrastructure, where data about real-time usage or consumption is not easily shared across service providers, have plagued the widespread analysis of energy used for transportation. In this article, we present a high-resolution dataset of real-time EV charging transactions resolved to the nearest second over a one-year period at a multi-site corporate campus. This includes 105 charging stations across 25 different facilities operated by a single firm in the U.S. Department of Energy Workplace Charging Challenge. The high-resolution data has 3,395 real-time transactions and 85 users with both paid and free sessions. The data has been expanded for re-use such as identifying charging behaviour and segmenting user groups by frequency of usage, stage of adoption, and employee type. Potential applications include but are not limited to simulating and parameterizing energy demand models; investigating flexible charge scheduling and optimal power flow problems; characterizing transportation emissions and electric mobility patterns at high temporal resolution; and evaluating characteristics of early adopters and lead user innovation.
, Priscille de Dumast, Hamza Kebiri, Ivan Ezhov, Johannes C. Paetzold, Suprosanna Shit, Asim Iqbal, Romesa Khan, Raimund Kottke, Patrice Grehten, et al.
Published: 6 July 2021
Scientific Data, Volume 8, pp 1-14; doi:10.1038/s41597-021-00946-3

It is critical to quantitatively analyse the developing human fetal brain in order to fully understand neurodevelopment in both normal fetuses and those with congenital disorders. To facilitate this analysis, automatic multi-tissue fetal brain segmentation algorithms are needed, which in turn require open datasets of segmented fetal brains. Here we introduce a publicly available dataset of 50 manually segmented pathological and non-pathological fetal magnetic resonance brain volume reconstructions across a range of gestational ages (20 to 33 weeks) into 7 different tissue categories (external cerebrospinal fluid, grey matter, white matter, ventricles, cerebellum, deep grey matter, brainstem/spinal cord). In addition, we quantitatively evaluate the accuracy of several automatic multi-tissue segmentation algorithms of the developing human fetal brain. Four research groups participated, submitting a total of 10 algorithms, demonstrating the benefits of the dataset for the development of automatic algorithms.
Victor D. Martinez, , , Brenda C. Minatel, Michelle E. Pewarchuk, , E. Magda Price, Wendy P. Robinson,
Published: 2 July 2021
Scientific Data, Volume 8, pp 1-8; doi:10.1038/s41597-021-00948-1

Proper functioning of the human placenta is critical for maternal and fetal health. While microRNAs (miRNAs) are known to impact placental gene expression, the effects of other small non-coding RNAs (sncRNAs) on the placental transcriptome are not well-established, and are emerging topics in the study of environmental influence on fetal development and reproductive health. Here, we assembled a cohort of 30 placental chorionic villi samples of varying gestational ages (M ± SD = 23.7 ± 11.3 weeks) to delineate the human placental sncRNA transcriptome through small RNA sequence analysis. We observed expression of 1544 sncRNAs, which include 48 miRNAs previously unannotated in humans. Additionally, 18,003 miRNA variants (isomiRs) were identified from the 654 observed miRNA species. This characterization of the term and pre-term placental sncRNA transcriptomes provides data fundamental to future investigations of their regulatory functions in the human placenta, and the baseline expression pattern needed for identifying changes in response to environmental factors, or under disease conditions.
Published: 2 July 2021
Scientific Data, Volume 8, pp 1-12; doi:10.1038/s41597-021-00954-3

Owing to the popularization of electric vehicles worldwide and the development of renewable energy supply, Li-ion batteries are widely used, from small-scale personal mobile products to large-scale energy storage systems. Recently, the number of retired power batteries has increased substantially, threatening the environment and wasting resources. Since most retired power batteries still possess about 80% of their initial capacity, their second use is a possible route to solving this emerging problem. Safety and performance are important when using these second-use repurposed batteries. Underwriters Laboratories (UL), a global safety certification company, published the standard for evaluating the safety and performance of repurposed batteries, i.e., UL 1974. In this work, the test procedures are designed according to UL 1974, and the charge and discharge profile datasets of the LiFePO4 repurposed batteries are provided. Researchers and engineers can use the characteristic curves to evaluate the quality of the repurposed batteries. Furthermore, the profile datasets can be applied in the model-based engineering of repurposed batteries, e.g., fitting the variables of an empirical model or validating the results of a theoretical model.
Jeffrey Shih-Chieh Chu, Bo Peng, Kuanqiang Tang, Xingxing Yi, Huangkai Zhou, Huan Wang, Guang Li, Jiantian Leng, ,
Published: 1 July 2021
Scientific Data, Volume 8, pp 1-8; doi:10.1038/s41597-021-00947-2

Comparative analysis of multiple reference genomes representing diverse genetic backgrounds is critical for understanding the role of key alleles important in domestication and genetic breeding of important crops such as soybean. To enrich the genetic resources for soybean, we describe the generation, technical assessment, and preliminary genomic variation analysis of eight de novo reference-grade soybean genome assemblies from wild and cultivated accessions. These resources represent soybeans cultured at different latitudes and exhibiting different agronomical traits. Of these eight soybeans, five are from new accessions that have not been sequenced before. We demonstrate the use of these genomes to identify small and large genomic variations affecting known genes, and to screen for genic presence/absence variation (PAV) regions to identify candidates for further functional studies.
, Jouni Pulliainen, Matias Takala, Juha Lemmetyinen, Colleen Mortimer, , , Mikko Moisander, , Tuomo Smolander, et al.
Published: 1 July 2021
Scientific Data, Volume 8, pp 1-16; doi:10.1038/s41597-021-00939-2

We describe the Northern Hemisphere terrestrial snow water equivalent (SWE) time series covering 1979–2018, containing daily, monthly and monthly bias-corrected SWE estimates. The GlobSnow v3.0 SWE dataset combines satellite-based passive microwave radiometer data (Nimbus-7 SMMR, DMSP SSM/I and DMSP SSMIS) with ground-based synoptic snow depth observations using Bayesian data assimilation, incorporating the HUT Snow Emission model. The original GlobSnow SWE retrieval methodology has been further developed and is presented in its current form in this publication. The described GlobSnow v3.0 monthly bias-corrected dataset was applied to provide continental-scale estimates of the annual maximum snow mass and its trend during the period 1980 to 2018.