Scientific Data

Journal Information
ISSN / EISSN : 2052-4463 / 2052-4463
Current Publisher: Springer Science and Business Media LLC (10.1038)
Total articles ≅ 1,729

Latest articles in this journal

Tashina Petersson, , , Marta Antonelli, Katarzyna Dembska, , Alessandra Varotto,
Scientific Data, Volume 8, pp 1-12; doi:10.1038/s41597-021-00909-8

Informing and engaging citizens to adopt sustainable diets is a key strategy for reducing the global environmental impacts of the agricultural and food sectors. In this respect, the first requisite for supporting citizens and actors in the food sector is to provide them with a publicly available, reliable and ready-to-use synthesis of the environmental pressures associated with food commodities. Here we introduce the SU-EATABLE LIFE database, a multilevel database of carbon footprint (CF) and water footprint (WF) values of food commodities. It is based on a standardized methodology for extracting information from peer-reviewed articles and grey literature and for assigning optimal footprint values and uncertainties to food items. The database and its innovative methodological framework for uncertainty treatment and data quality assurance provide a solid basis for evaluating the impact of dietary shifts on global environmental policies, including climate mitigation through greenhouse gas emission reductions. The database supports repeatability and further expansion, providing a reliable science-based tool for managers and researchers in the food sector.
, , Guido Lemoine, , , Ian McCallum, Hadi, Florian Kraxner, Frédéric Achard, Steffen Fritz
Scientific Data, Volume 8, pp 1-1; doi:10.1038/s41597-021-00917-8

Scientific Data, Volume 8, pp 1-8; doi:10.1038/s41597-021-00905-y

Here, we describe a dataset of monogenic rare diseases with a known genetic background, supplemented with manually extracted provenance for both the disease itself and the discovery of its underlying genetic cause. We assembled a collection of 4166 rare monogenic diseases and linked them to 3163 causative genes, annotated with OMIM and Ensembl identifiers and HGNC symbols. The PubMed identifiers of the publications that first described each rare disease, and of the publications that identified the causative genes, were added using information from OMIM, PubMed, Wikipedia, and Google Scholar. The data are available under a CC0 license as a spreadsheet and as RDF in a semantic model modified from DisGeNET, and have been added to Wikidata. Although this dataset relies on publicly available data and publications with a PubMed identifier, our effort to make the data interoperable and linked now allows us to analyse them. Our analysis reveals the timeline of rare disease and causative gene discovery and links it to developments in methods.
Michael Getachew Tadesse, Thomas Wahl
Scientific Data, Volume 8, pp 1-10; doi:10.1038/s41597-021-00906-x

Storm surges are among the deadliest coastal hazards, and understanding how climate change and variability have affected them in the past is crucial for preparing for the future. However, tide gauge records are often too short to assess trends or perform robust statistical analyses. Here we use a data-driven modeling framework to simulate daily maximum surge values at 882 tide gauge locations across the globe. We use five different atmospheric reanalysis products for the storm surge reconstruction, the longest of which goes back to 1836. The data we generate can be used, for example, for long-term trend analyses of the storm surge climate and for identifying regions where the intensity and/or frequency of storm surges have changed in the past. They also provide a better basis for robust extreme value analysis, especially at tide gauges with short observational records. The data are publicly available through an interactive web map as well as a public data repository.
, , , Marwan Cheikh Albassatneh, Juan Arroyo, Gianluigi Bacchetta, Francesca Bagnoli, Zoltán Barina, Manuel Cartereau, , et al.
Scientific Data, Volume 8, pp 1-1; doi:10.1038/s41597-021-00911-0

, Milan Kilibarda, Dragutin Protić,
Scientific Data, Volume 8, pp 1-12; doi:10.1038/s41597-021-00901-2

We produced the first daily gridded meteorological dataset at 1-km spatial resolution across Serbia for 2000–2019, named MeteoSerbia1km. The dataset consists of five daily variables: maximum, minimum and mean temperature, mean sea-level pressure, and total precipitation. In addition to daily summaries, we produced monthly and annual summaries, as well as daily, monthly and annual long-term means. Daily gridded data were interpolated using the Random Forest Spatial Interpolation (RFSI) methodology, which uses the nearest observations and the distances to them as spatial covariates, together with environmental covariates, to build a random forest model. The accuracy of the MeteoSerbia1km daily dataset was assessed using nested 5-fold leave-location-out cross-validation. All temperature variables and sea-level pressure showed high accuracy, whereas accuracy was lower for total precipitation because of the discontinuity in its spatial distribution. MeteoSerbia1km was also compared with the coarser-resolution E-OBS dataset: both showed similar coarse-scale patterns for all daily meteorological variables except total precipitation. Owing to its high resolution, MeteoSerbia1km is suitable for further environmental analyses.
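The abstract describes RFSI and leave-location-out cross-validation only in outline. The Python sketch below illustrates the two ideas under stated assumptions: the spatial covariates are taken to be the values of the n nearest station observations and their Euclidean distances to the target point (projected coordinates), and stations are grouped so that every record from one station falls into the same validation fold. The helper names rfsi_covariates and leave_location_out_folds, the n_obs parameter, the exclusion of the target station, and the round-robin fold assignment are all hypothetical choices for illustration, not the authors' implementation. Environmental covariates, the random forest fit itself, and the inner (tuning) loop of the nested cross-validation are omitted.

```python
import math
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

# Hypothetical record layout: (station_id, x, y, observed_value),
# with x/y assumed to be projected coordinates (e.g. metres).
Observation = Tuple[Hashable, float, float, float]


def rfsi_covariates(
    target_xy: Tuple[float, float],
    observations: Sequence[Observation],
    n_obs: int = 3,
    exclude_id: Optional[Hashable] = None,
) -> List[float]:
    """Return [v1, d1, v2, d2, ...] for the n_obs nearest observations.

    v_i is the value observed at the i-th nearest station and d_i is its
    Euclidean distance to the target. exclude_id (an assumption of this
    sketch) drops the station located at the target, so a training point
    never uses its own value as a covariate.
    """
    tx, ty = target_xy
    candidates = []
    for station_id, x, y, value in observations:
        if station_id == exclude_id:
            continue
        candidates.append((math.hypot(x - tx, y - ty), value))
    if len(candidates) < n_obs:
        raise ValueError("not enough observations for the requested n_obs")
    candidates.sort(key=lambda c: c[0])  # nearest first
    features: List[float] = []
    for dist, value in candidates[:n_obs]:
        features.extend([value, dist])
    return features


def leave_location_out_folds(
    station_ids: Sequence[Hashable], k: int = 5
) -> Dict[Hashable, int]:
    """Assign each distinct station to one of k folds (round-robin over sorted ids).

    All records from a station share a fold, so a model evaluated on fold j
    was never trained on data from the stations in that fold.
    """
    unique_ids = sorted(set(station_ids), key=str)
    return {sid: i % k for i, sid in enumerate(unique_ids)}


if __name__ == "__main__":
    obs = [
        ("A", 0.0, 0.0, 10.0),
        ("B", 3.0, 4.0, 12.0),
        ("C", 6.0, 8.0, 15.0),
        ("D", 1.0, 0.0, 11.0),
    ]
    # Covariates for station A's location, excluding A itself.
    print(rfsi_covariates((0.0, 0.0), obs, n_obs=2, exclude_id="A"))  # [11.0, 1.0, 12.0, 5.0]
    print(leave_location_out_folds(["A", "B", "A", "C", "D"], k=2))  # {'A': 0, 'B': 1, 'C': 0, 'D': 1}
```

Grouping by station rather than by individual daily record is what makes the validation "leave-location-out": the score reflects performance at unseen locations rather than at days missing from already-seen stations.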
, Zijing Dong, , Congyu Liao, Qiuyun Fan, W. Scott Hoge, Boris Keil, , Lawrence L. Wald, , et al.
Scientific Data, Volume 8, pp 1-12; doi:10.1038/s41597-021-00904-z

We present a whole-brain in vivo diffusion MRI (dMRI) dataset acquired at 760 μm isotropic resolution and sampled at 1260 q-space points across nine two-hour sessions on a single healthy participant. Creating this benchmark dataset was made possible by the synergistic use of advanced acquisition hardware and software, including the high-gradient-strength Connectom scanner, a custom-built 64-channel phased-array coil, a personalized motion-robust head stabilizer, a recently developed SNR-efficient dMRI acquisition method, and parallel imaging reconstruction with an advanced ghost-reduction algorithm. Given its unprecedented resolution, SNR and image quality, we envision that this dataset will have a broad range of investigational, educational and clinical applications that will advance the understanding of human brain structure and connectivity. This comprehensive dataset can also serve as a test bed for new modeling and sub-sampling strategies and for new denoising and processing algorithms, potentially providing a common testing platform for the further development of in vivo high-resolution dMRI techniques. Whole-brain anatomical T1-weighted and T2-weighted images at submillimeter resolution, along with field maps, are also made available.
Parnian Afshar, Shahin Heidarian, Nastaran Enshaei, Farnoosh Naderkhani, Moezedin Javad Rafiee, , Faranak Babaki Fard, Kaveh Samimi, Konstantinos N. Plataniotis,
Scientific Data, Volume 8, pp 1-8; doi:10.1038/s41597-021-00900-3

Novel Coronavirus (COVID-19) has drastically overwhelmed more than 200 countries, affecting millions and claiming almost 2 million lives since its emergence in late 2019. This highly contagious disease spreads easily and, if not controlled in a timely fashion, can rapidly incapacitate healthcare systems. The current standard diagnostic method, Reverse Transcription Polymerase Chain Reaction (RT-PCR), is time-consuming and has low sensitivity. Chest Radiograph (CXR), the first imaging modality to be used, is readily available and gives immediate results. However, it has notoriously lower sensitivity than Computed Tomography (CT), which can be used efficiently to complement other diagnostic methods. This paper introduces a new COVID-19 CT scan dataset, referred to as COVID-CT-MD, which contains not only COVID-19 cases but also healthy participants and participants infected with Community Acquired Pneumonia (CAP). The COVID-CT-MD dataset, which is accompanied by lobe-level, slice-level and patient-level labels, has the potential to facilitate COVID-19 research; in particular, it can assist in the development of advanced Machine Learning (ML) and Deep Neural Network (DNN) based solutions.
Dheeraj Rathee, , Sujit Roy,
Scientific Data, Volume 8, pp 1-10; doi:10.1038/s41597-021-00899-7

Recent advancements in magnetoencephalography (MEG)-based brain-computer interfaces (BCIs) have shown great potential. However, the performance of current MEG-BCI systems is still inadequate, and one of the main reasons for this is the lack of open-source MEG-BCI datasets. MEG systems are expensive, so MEG datasets are not readily available to researchers developing effective and efficient BCI-related signal processing algorithms. In this work, we release a 306-channel MEG-BCI dataset recorded at a 1 kHz sampling frequency during four mental imagery tasks (hand imagery, feet imagery, subtraction imagery and word generation imagery). The dataset contains two sessions of MEG recordings, performed on separate days, from 17 healthy participants using a typical BCI imagery paradigm. To our knowledge, this is the only publicly available MEG imagery BCI dataset. The scientific community can use the dataset to develop novel pattern recognition and machine learning methods for detecting brain activity related to motor imagery and cognitive imagery tasks from MEG signals.