Automated extraction of information in molecular biology
Open Access
- 26 June 2000
- journal article
- review article
- Published by Wiley in FEBS Letters
- Vol. 476 (1-2), 12-17
- https://doi.org/10.1016/s0014-5793(00)01661-6
Abstract
We review data mining techniques in molecular biology, specifically those that extract information from the scientific literature itself. As more of the biological literature is published electronically, there is an opportunity, and even a need, to automatically summarize the literature in a customized way, for example by associating keywords with a topic. These keywords can be extracted from relevant publications. The process of keyword extraction can be automated and optimized to keep literature pointers automatically up‐to‐date or to filter relevant information from the literature. To illustrate these points, OMIM (Online Mendelian Inheritance in Man), a database of human inherited diseases, was linked to the literature, and keywords were derived that covered distinct aspects such as genetic information on the one hand and disease‐specific protein and phenotypic information on the other. These keywords were used to extract information that is helpful for keeping entries about disease up‐to‐date.
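The idea of deriving keywords for a topic from its relevant publications can be illustrated with a minimal sketch. It is not the method used in the review: the scoring rule (the fraction of topic documents containing a word minus the fraction of background documents containing it), the function names, and the toy abstracts are all assumptions made here for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def document_frequency(docs):
    """Count, for each word, the number of documents that contain it."""
    counts = Counter()
    for doc in docs:
        counts.update(set(tokenize(doc)))
    return counts

def topic_keywords(topic_docs, background_docs, top_n=5):
    """Rank words by how much more often they occur in topic documents
    than in background documents (difference of document fractions).
    Assumed scoring rule, chosen only to keep the sketch short."""
    topic_df = document_frequency(topic_docs)
    background_df = document_frequency(background_docs)
    scores = {}
    for word, count in topic_df.items():
        topic_fraction = count / len(topic_docs)
        background_fraction = background_df.get(word, 0) / len(background_docs)
        scores[word] = topic_fraction - background_fraction
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]

# Invented toy abstracts, for illustration only (not taken from OMIM).
topic_docs = [
    "Mutations in the dystrophin gene cause muscular dystrophy",
    "Dystrophin deficiency leads to progressive muscle weakness",
]
background_docs = [
    "The gene is expressed in many tissues",
    "Mutations in the promoter alter expression",
]
keywords = topic_keywords(topic_docs, background_docs, top_n=3)
print(keywords)
```

In this toy setting, words shared with the background set, such as "the" or "gene", score at or below zero, while words specific to the topic documents rise to the top. A practical system would likely also need stemming, stop-word handling, and a much larger background corpus.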