Feature engineering for MEDLINE citation categorization with MeSH
Open Access
- 8 April 2015
- journal article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 16 (1), 113
- https://doi.org/10.1186/s12859-015-0539-7
Abstract
Research in biomedical text categorization has mostly used the bag-of-words representation. Other more sophisticated representations of text based on syntactic, semantic and argumentative properties have been less studied. In this paper, we evaluate the impact of different text representations of biomedical texts as features for reproducing the MeSH annotations of some of the most frequent MeSH headings. In addition to unigrams and bigrams, these features include noun phrases, citation meta-data, citation structure, and semantic annotation of the citations.