A Text Classification Model via Multi-Level Semantic Features
Open Access
- 17 September 2022
- Vol. 14 (9), 1938
- https://doi.org/10.3390/sym14091938
Abstract
Text classification is a major task in natural language processing (NLP) and has received sustained attention for years. News classification, a branch of text classification, is characterized by complex structure, a large amount of information and long texts, all of which reduce classification accuracy. To improve the classification accuracy of Chinese news texts, we present a text classification model based on multi-level semantic features. First, we add a category correlation coefficient to TF-IDF (Term Frequency-Inverse Document Frequency) and a frequency concentration coefficient to CHI (Chi-Square), and use the improved algorithms to extract keyword semantic features. Then, we extract local semantic features with a symmetric-channel TextCNN and global semantic features with an attention-based BiLSTM. Finally, we fuse the three kinds of semantic features to predict text categories. Experiments on THUCNews, LTNews and MCNews show that the proposed method is highly accurate, reaching 98.01%, 90.95% and 94.24% accuracy, respectively. With two orders of magnitude fewer parameters than BERT, it improves on the BERT+FC baseline by 1.27%, 1.2% and 2.81%, respectively.
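The abstract builds its keyword features on classic TF-IDF and CHI but does not define the category correlation coefficient or the frequency concentration coefficient. The sketch below therefore shows only the unmodified scores under their common textbook definitions, as a reference point for what the paper modifies. The function names (`tf_idf`, `chi_square`, `top_chi_terms`) and the toy corpus are hypothetical, and documents are assumed to be already segmented into token lists (Chinese news would first need a word segmenter).

```python
import math
from collections import Counter

# Hypothetical helpers illustrating the *unmodified* TF-IDF and CHI scores.
# The paper's category correlation and frequency concentration coefficients
# are not defined in the abstract and are NOT implemented here.

def tf_idf(docs):
    """Return one {term: tf * log(N / df)} dict per tokenized document."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({t: (c / total) * math.log(n_docs / df[t])
                       for t, c in counts.items()})
    return scores

def chi_square(docs, labels, term, category):
    """CHI = N(AD - BC)^2 / ((A+C)(B+D)(A+B)(C+D)) from a 2x2 contingency table."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term = term in doc
        in_cat = label == category
        if has_term and in_cat:
            a += 1  # A: in category, contains term
        elif has_term:
            b += 1  # B: outside category, contains term
        elif in_cat:
            c += 1  # C: in category, lacks term
        else:
            d += 1  # D: outside category, lacks term
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def top_chi_terms(docs, labels, category, k):
    """Rank the vocabulary by CHI for one category and keep the top k terms."""
    vocab = sorted({t for doc in docs for t in doc})
    # sorted() is stable even with reverse=True, so ties keep alphabetical order
    ranked = sorted(vocab, key=lambda t: chi_square(docs, labels, t, category),
                    reverse=True)
    return ranked[:k]

# Toy corpus (hypothetical): two finance and two sports "news" documents.
docs = [["stocks", "rise"], ["stocks", "fall"],
        ["team", "wins"], ["team", "match"]]
labels = ["finance", "finance", "sports", "sports"]
print(chi_square(docs, labels, "stocks", "finance"))
print(top_chi_terms(docs, labels, "finance", 2))
```

Running it prints `4.0` and `['stocks', 'team']`. Note that plain CHI ranks "team" as highly as "stocks" for the finance category even though "team" never appears in finance documents: the squared term rewards strong negative association as much as positive association. In TF-IDF, "rise" outscores "stocks" in the first document because it occurs in fewer documents.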
Funding Information
- Basic Public Welfare Research Project of Zhejiang Province (LGG22F020014)
- National Natural Science Foundation of China (62072410)