AMELIE speeds Mendelian diagnosis by matching patient phenotype and genotype to primary literature
- 20 May 2020
- journal article
- research article
- Published by American Association for the Advancement of Science (AAAS) in Science Translational Medicine
- Vol. 12 (544)
- https://doi.org/10.1126/scitranslmed.aau9113
Abstract
The diagnosis of Mendelian disorders requires labor-intensive literature research. Trained clinicians can spend hours looking for the right publication(s) supporting a single gene that best explains a patient’s disease. AMELIE (Automatic Mendelian Literature Evaluation) greatly accelerates this process. AMELIE parses all 29 million PubMed abstracts and downloads and further parses hundreds of thousands of full-text articles in search of information supporting the causality and associated phenotypes of most published genetic variants. AMELIE then prioritizes patient candidate variants for their likelihood of explaining any patient’s given set of phenotypes. Diagnosis of singleton patients (without relatives’ exomes) is the most time-consuming scenario, and AMELIE ranked the causative gene at the very top for 66% of 215 diagnosed singleton Mendelian patients from the Deciphering Developmental Disorders project. Evaluating only the top 11 AMELIE-scored genes of 127 (median) candidate genes per patient resulted in a rapid diagnosis in more than 90% of cases. AMELIE-based evaluation of all cases was 3 to 19 times more efficient than hand-curated database–based approaches. We replicated these results on a retrospective cohort of clinical cases from Stanford Children’s Health and the Manton Center for Orphan Disease Research. An analysis web portal with our most recent update, programmatic interface, and code is available at AMELIE.stanford.edu.
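The headline figures in the abstract (causative gene ranked first in 66% of patients, and within the top 11 in more than 90%) are top-k rates over per-patient gene rankings. The following is a minimal sketch of how such a rate could be computed. It is not AMELIE's code: the function name `top_k_rate` and the ranks in `example_ranks` are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def top_k_rate(causal_ranks: Sequence[int], k: int) -> float:
    """Return the fraction of patients whose causative gene is ranked in the top k.

    causal_ranks holds one 1-based rank per diagnosed patient: the position of
    the known causative gene in that patient's ranked candidate gene list.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not causal_ranks:
        raise ValueError("causal_ranks must not be empty")
    hits = sum(1 for rank in causal_ranks if rank <= k)
    return hits / len(causal_ranks)


# Hypothetical ranks for ten patients (illustrative only, not study data).
example_ranks = [1, 1, 3, 12, 1, 7, 2, 40, 1, 5]

top1 = top_k_rate(example_ranks, 1)    # share of patients solved by the first gene
top11 = top_k_rate(example_ranks, 11)  # share solved within the first 11 genes
print(f"top-1 rate: {top1:.0%}, top-11 rate: {top11:.0%}")
```

Under this reading, a higher top-k rate at small k means a clinician needs to review fewer candidate genes before reaching the diagnosis.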
Funding Information
- National Human Genome Research Institute (NHGRI U41HG002371-15)
- Stanford University
- Microsoft Research
- Defense Sciences Office, DARPA