Hybrid self-optimized clustering model based on citation links and textual features to detect research topics
Open Access
- 27 October 2017
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 12 (10), e0187164
- https://doi.org/10.1371/journal.pone.0187164
Abstract
The challenge of detecting research topics in a specific research field has attracted attention from researchers in the bibliometrics community. In this study, to address two problems in clustering papers, namely the influence of different distributions of citation links and of the textual features involved on similarity computation, the authors propose a hybrid self-optimized clustering model to detect research topics by extending the hybrid clustering model used to identify “core documents”. First, the Amsler network, consisting of bibliographic coupling and co-citation links, is created to calculate the citation-based similarity from the cosine angle between papers. Second, cosine similarity is also used to compute the text-based similarity, which consists of textual statistical and topological features. Then, the cosine angle of the linear combination of citation- and text-based similarity is taken as the hybrid similarity. Finally, the Louvain method is applied to cluster papers, and terms selected by term frequency are used to label clusters. To test the performance of the proposed model, a dataset related to the data envelopment analysis field is used for comparison and analysis of clustering results. Based on the benchmark built, clustering methods with different citation links or textual features are compared according to evaluation measures. The results show that the proposed model can obtain reasonable and effective clustering results, and the research topics of the data envelopment analysis field are also analyzed with the proposed model. Because the proposed model considers different features from previous hybrid clustering models, it can provide inspiration for further studies on topic identification by other researchers.
Funding Information
- National Natural Science Foundation of China (51375429)
- National Natural Science Foundation of China (51475410)
- Natural Science Foundation of Zhejiang Province (LY17E050010)
- Natural Science Foundation of Zhejiang Province (LY17G010007)
- Zhejiang Science & Technology Plan of China (2015C33024)
This publication has 48 references indexed in Scilit:
- Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches. PLOS ONE, 2011
- Weighted hybrid clustering by combining text mining and bibliometrics on a large‐scale journal database. Journal of the American Society for Information Science and Technology, 2010
- Enhanced soft subspace clustering integrating within-cluster and between-cluster information. Pattern Recognition, 2010
- Document–document similarity approaches and science mapping: Experimental comparison of five approaches. Journal of Informetrics, 2009
- Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008
- Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America, 2006
- Link‐based similarity measures for the classification of Web documents. Journal of the American Society for Information Science and Technology, 2005
- Fast algorithm for detecting community structure in networks. Physical Review E, 2004
- Co‐citation in the scientific literature: A new measure of the relationship between two documents. Journal of the American Society for Information Science, 1973
- Objective Criteria for the Evaluation of Clustering Methods. Journal of the American Statistical Association, 1971
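Illustrative sketch
The similarity steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the Amsler links of a paper are taken to be the union of the papers it cites and the papers citing it; the text features are reduced to plain term frequencies (the topological features are not modeled); the "cosine angle of the linear combination" is read as the cosine of a weighted sum of the two arccosine angles; and the weight `lam` is fixed by hand rather than self-optimized. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def amsler_vectors(cites):
    """Binary link vectors over all papers.

    `cites` maps a paper to the set of papers it cites. A paper's links are
    assumed to be its references plus the papers that cite it, so overlapping
    links capture both bibliographic coupling and co-citation.
    """
    papers = sorted(set(cites) | {q for refs in cites.values() for q in refs})
    neighbours = {p: set(cites.get(p, ())) for p in papers}
    for p, refs in cites.items():
        for q in refs:
            neighbours[q].add(p)
    return {p: [1 if q in neighbours[p] else 0 for q in papers] for p in papers}


def tf_vectors(texts):
    """Term-frequency vectors over a shared vocabulary (whitespace tokens)."""
    counts = {p: Counter(t.lower().split()) for p, t in texts.items()}
    vocab = sorted({w for c in counts.values() for w in c})
    return {p: [c[w] for w in vocab] for p, c in counts.items()}


def hybrid_similarity(cit_sim, txt_sim, lam=0.5):
    """Cosine of the weighted sum of the two similarity angles (assumed reading)."""
    def angle(s):
        # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
        return math.acos(max(-1.0, min(1.0, s)))
    return math.cos(lam * angle(cit_sim) + (1.0 - lam) * angle(txt_sim))


def pairwise_hybrid(cites, texts, lam=0.5):
    """Hybrid similarity for every pair of papers that has text.

    Every key of `texts` must also appear in `cites` (as a key or as a cited paper).
    """
    links = amsler_vectors(cites)
    terms = tf_vectors(texts)
    return {
        (p, q): hybrid_similarity(
            cosine(links[p], links[q]), cosine(terms[p], terms[q]), lam
        )
        for p, q in combinations(sorted(texts), 2)
    }


# Hypothetical toy data: A and B share references and vocabulary, C does not.
toy_cites = {"A": {"X", "Y"}, "B": {"X", "Y"}, "C": {"Z"}}
toy_texts = {
    "A": "dea efficiency model",
    "B": "dea efficiency analysis",
    "C": "network clustering topic",
}

if __name__ == "__main__":
    for pair, score in pairwise_hybrid(toy_cites, toy_texts).items():
        print(pair, round(score, 3))
```

The resulting pairwise scores could serve as edge weights of a paper graph, to which a Louvain implementation from a graph library would then be applied for the clustering step; that step is left out here to keep the sketch dependency-free.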