Mixed-membership models of scientific publications
- 6 April 2004
- journal article
- Published by Proceedings of the National Academy of Sciences in Proceedings of the National Academy of Sciences of the United States of America
- Vol. 101 (suppl_1), 5220-5227
- https://doi.org/10.1073/pnas.0307760101
Abstract
PNAS is one of the world's most cited multidisciplinary scientific journals. The PNAS official classification structure of subjects is reflected in topic labels submitted by the authors of articles, largely related to traditionally established disciplines. These include broad field classifications into physical sciences, biological sciences, social sciences, and further subtopic classifications within the fields. Focusing on biological sciences, we explore an internal soft-classification structure of articles based only on semantic decompositions of abstracts and bibliographies and compare it with the formal discipline classifications. Our model assumes that there is a fixed number of internal categories, each characterized by multinomial distributions over words (in abstracts) and references (in bibliographies). Soft classification for each article is based on proportions of the article's content coming from each category. We discuss the appropriateness of the model for the PNAS database as well as other features of the data relevant to soft classification.
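The model described in the abstract can be illustrated with a small generative sketch. This is only an illustration under stated assumptions, not the authors' implementation: the Dirichlet draw for the membership proportions, the function names, and the toy vocabularies and reference labels are all hypothetical choices made here. The sketch covers the generative direction only (how an article's words and references could arise from category proportions) and does not attempt the estimation of those proportions from data.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw proportions from a Dirichlet by normalizing gamma draws.

    Assumption: the abstract does not name a distribution for the
    membership proportions; a Dirichlet is used here for illustration.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_article(alpha, word_dists, ref_dists, n_words, n_refs, rng):
    """Generate one article: membership proportions, words, references."""
    membership = sample_dirichlet(alpha, rng)
    categories = list(range(len(alpha)))

    def draw(dists):
        # Pick a category according to the article's membership
        # proportions, then pick an item from that category's
        # multinomial distribution.
        k = rng.choices(categories, weights=membership)[0]
        items = list(dists[k])
        weights = [dists[k][item] for item in items]
        return rng.choices(items, weights=weights)[0]

    words = [draw(word_dists) for _ in range(n_words)]
    refs = [draw(ref_dists) for _ in range(n_refs)]
    return membership, words, refs


# Hypothetical toy setup: two internal categories, each with its own
# distribution over words and over cited references.
WORD_DISTS = [
    {"protein": 0.6, "binding": 0.3, "neuron": 0.1},
    {"neuron": 0.7, "synapse": 0.2, "protein": 0.1},
]
REF_DISTS = [
    {"ref_A": 0.8, "ref_B": 0.2},
    {"ref_B": 0.3, "ref_C": 0.7},
]

if __name__ == "__main__":
    rng = random.Random(0)
    membership, words, refs = sample_article(
        alpha=[0.5, 0.5],
        word_dists=WORD_DISTS,
        ref_dists=REF_DISTS,
        n_words=10,
        n_refs=4,
        rng=rng,
    )
    print("membership proportions:", membership)
    print("words:", words)
    print("references:", refs)
```

In this sketch the `membership` vector plays the role of the soft classification: instead of assigning the article to one category, it records the share of the article's content attributed to each category.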
This publication has 8 references indexed in Scilit:
- Finding scientific topics. Proceedings of the National Academy of Sciences of the United States of America, 2004
- Genetic Structure of Human Populations. Science, 2002
- Unsupervised Learning by Probabilistic Latent Semantic Analysis. Machine Learning, 2001
- Dirichlet Generalizations of Latent-Class Models. Journal of Classification, 2000
- Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation. Proceedings of the National Academy of Sciences of the United States of America, 1999
- Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America, 1998
- The PNAS way back then. Proceedings of the National Academy of Sciences of the United States of America, 1997
- Mathematical typology: A grade of membership technique for obtaining disease definition. Computers and Biomedical Research, 1978