A Text Classification Model via Multi-Level Semantic Features

Open Access

17 September 2022

journal article
research article
Published by MDPI AG in Symmetry

Vol. 14 (9), 1938
https://doi.org/10.3390/sym14091938

Abstract

Text classification is a core task in natural language processing (NLP) and has received sustained attention for years. News classification, a branch of text classification, is characterized by complex structure, large amounts of information, and long texts, all of which reduce classification accuracy. To improve the classification accuracy of Chinese news texts, we present a text classification model based on multi-level semantic features. First, we add a category correlation coefficient to TF-IDF (Term Frequency-Inverse Document Frequency) and a frequency concentration coefficient to CHI (Chi-Square), and extract keyword semantic features with the improved algorithm. Then, we extract local semantic features with a symmetric-channel TextCNN and global semantic features with a BiLSTM with attention. Finally, we fuse the three kinds of semantic features to predict the text category. Experiments on THUCNews, LTNews, and MCNews show that the proposed method achieves accuracies of 98.01%, 90.95%, and 94.24%, respectively. With two orders of magnitude fewer parameters than BERT, the model improves on the BERT+FC baseline by 1.27%, 1.2%, and 2.81%, respectively.
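
For readers unfamiliar with the two keyword statistics named in the abstract, the following is a minimal sketch of the standard, unmodified TF-IDF and chi-square scores on a toy corpus. The abstract does not define the category correlation coefficient or the frequency concentration coefficient, so neither is implemented here; the function names `tf_idf` and `chi_square` and the toy data are illustrative assumptions, not the authors' code.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf-idf} dict per tokenized document (standard form).

    This is the textbook formulation, without the paper's category
    correlation coefficient (its definition is not given in the abstract).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def chi_square(docs, labels, term, category):
    """Standard chi-square statistic for a (term, category) pair.

    Built from the 2x2 contingency table of term presence vs. category
    membership; the paper's frequency concentration coefficient is omitted.
    """
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term = term in doc
        in_cat = label == category
        if has_term and in_cat:
            a += 1          # term present, document in category
        elif has_term:
            b += 1          # term present, document in another category
        elif in_cat:
            c += 1          # term absent, document in category
        else:
            d += 1          # term absent, document in another category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Hypothetical toy corpus: two "finance" and two "sports" documents.
docs = [
    ["stock", "market", "rises"],
    ["market", "falls", "today"],
    ["team", "wins", "match"],
    ["team", "loses", "match"],
]
labels = ["finance", "finance", "sports", "sports"]

tfidf_scores = tf_idf(docs)
market_chi = chi_square(docs, labels, "market", "finance")
```

In this toy corpus, "market" appears in every finance document and in no sports document, so it receives the highest chi-square score for the finance category, while its TF-IDF weight within a single document is lower than that of a rarer term such as "stock". That complementary behavior is one plausible reason to combine the two statistics for keyword extraction.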

Funding Information

Basic Public Welfare Research Project of Zhejiang Province (LGG22F020014)
National Natural Science Foundation of China (62072410)
