AMELIE speeds Mendelian diagnosis by matching patient phenotype and genotype to primary literature

20 May 2020

journal article
research article
Published by American Association for the Advancement of Science (AAAS) in Science Translational Medicine

Vol. 12 (544)
https://doi.org/10.1126/scitranslmed.aau9113

Abstract

The diagnosis of Mendelian disorders requires labor-intensive literature research. Trained clinicians can spend hours looking for the right publication(s) supporting a single gene that best explains a patient’s disease. AMELIE (Automatic Mendelian Literature Evaluation) greatly accelerates this process. AMELIE parses all 29 million PubMed abstracts and downloads and further parses hundreds of thousands of full-text articles in search of information supporting the causality and associated phenotypes of most published genetic variants. AMELIE then prioritizes patient candidate variants for their likelihood of explaining any patient’s given set of phenotypes. Diagnosis of singleton patients (without relatives’ exomes) is the most time-consuming scenario, and AMELIE ranked the causative gene at the very top for 66% of 215 diagnosed singleton Mendelian patients from the Deciphering Developmental Disorders project. Evaluating only the top 11 AMELIE-scored genes of 127 (median) candidate genes per patient resulted in a rapid diagnosis in more than 90% of cases. AMELIE-based evaluation of all cases was 3 to 19 times more efficient than hand-curated database–based approaches. We replicated these results on a retrospective cohort of clinical cases from Stanford Children’s Health and the Manton Center for Orphan Disease Research. An analysis web portal with our most recent update, programmatic interface, and code is available at AMELIE.stanford.edu.
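The headline figures in the abstract (causative gene ranked first for 66% of patients, and within the top 11 for more than 90%) are top-k rates over per-patient gene rankings. The sketch below shows how such a rate could be computed from scored candidate gene lists. It is an illustration only: the function names, gene labels, and scores are hypothetical and made up for this example. They are not AMELIE's code, programmatic interface, or data.

```python
from typing import Dict, List


def rank_of_causative(scores: Dict[str, float], causative: str) -> int:
    """Return the 1-based rank of the causative gene, sorting genes by descending score.

    Ties are broken alphabetically by gene name so the ranking is deterministic.
    """
    ordered = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return ordered.index(causative) + 1


def top_k_rate(ranks: List[int], k: int) -> float:
    """Return the fraction of patients whose causative gene is ranked within the top k."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical toy cohort: (candidate gene scores, known causative gene) per patient.
patients = [
    ({"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.1}, "GENE_A"),
    ({"GENE_A": 0.2, "GENE_B": 0.7, "GENE_C": 0.5}, "GENE_C"),
    ({"GENE_A": 0.3, "GENE_B": 0.6, "GENE_C": 0.8}, "GENE_A"),
    ({"GENE_A": 0.1, "GENE_B": 0.95, "GENE_C": 0.2}, "GENE_B"),
]

ranks = [rank_of_causative(scores, causative) for scores, causative in patients]
top1 = top_k_rate(ranks, 1)
top2 = top_k_rate(ranks, 2)
print(ranks, top1, top2)
```

In a real evaluation each patient would have on the order of 127 candidate genes (the median reported above) rather than three, and k would be set to the number of genes a clinician is willing to review.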

Funding Information

National Human Genome Research Institute (NHGRI U41HG002371-15)
Stanford University
Microsoft Research
Defense Sciences Office, DARPA

