A resource to explore the discovery of rare diseases and their causative genes
Open Access
- 4 May 2021
- journal article
- research article
- Published by Springer Science and Business Media LLC in Scientific Data
- Vol. 8 (1), 1-8
- https://doi.org/10.1038/s41597-021-00905-y
Abstract
Here, we describe a dataset of monogenic rare diseases with a known genetic background, supplemented with manually extracted provenance for both the disease itself and the discovery of its underlying genetic cause. We assembled a collection of 4166 rare monogenic diseases and linked them to 3163 causative genes, annotated with OMIM and Ensembl identifiers and HGNC symbols. The PubMed identifiers of the publications that first described each rare disease, and of the publications that identified the causative genes, were added using information from OMIM, PubMed, Wikipedia, whonamedit.com, and Google Scholar. The data are available under a CC0 license as a spreadsheet and as RDF in a semantic model modified from DisGeNET, and were added to Wikidata. This dataset relies on publicly available data and publications with a PubMed identifier, but our effort to make the data interoperable and linked now allows these data to be analysed. Our analysis revealed the timeline of rare disease and causative gene discovery and linked it to developments in methods.
This publication has 27 references indexed in Scilit:
- The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. Journal of Biomedical Semantics, 2014
- CyTargetLinker: A Cytoscape App to Integrate Regulatory Interactions in Network Analysis. PLOS ONE, 2013
- Libraries, languages of description, and linked data: a Dublin Core perspective. Library Hi Tech, 2012
- Disease gene identification strategies for exome sequencing. European Journal of Human Genetics, 2012
- Identifiers.org and MIRIAM Registry: community resources to provide persistent identification. Nucleic Acids Research, 2011
- Mendelian Inheritance in Man and Its Online Version, OMIM. American Journal of Human Genetics, 2007
- NCI Thesaurus: A semantic model integrating cancer-related clinical and molecular information. Journal of Biomedical Informatics, 2007
- Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Research, 2003
- Enzyme Defect Associated with a Sex-Linked Human Neurological Disorder and Excessive Purine Synthesis. Science, 1967
- A familial disorder of uric acid metabolism and central nervous system function. The American Journal of Medicine, 1964