An Effective Text Classification Model Based on Ensemble Strategy

Hong Zhu, Jin Wenzhen, Yang Guocai

Open Access

Publication year - 2019

Publication title - Journal of Physics: Conference Series

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.21

H-Index - 85

eISSN - 1742-6596

pISSN - 1742-6588

DOI - 10.1088/1742-6596/1229/1/012058

Subject(s) - word2vec , computer science , artificial intelligence , feature (linguistics) , feature vector , support vector machine , representation (politics) , pattern recognition (psychology) , tf–idf , feature learning , convolutional neural network , bag of words model , classifier (uml) , machine learning , philosophy , linguistics , physics , embedding , quantum mechanics , politics , term (time) , political science , law

Automatic text classification is a classic topic in natural language processing. Research on text classification focuses mainly on the feature representation of text documents or on the design of efficient machine learning models. Although various approaches have been proposed to address these problems, they are still far from solved. In this paper, we propose a novel method called LAC_DNN that performs text classification by combining diverse feature representation approaches and classifiers. More specifically, LAC_DNN first introduces a novel feature representation approach called LATW to extract feature information from the documents. LATW integrates the feature information extracted by four models: the LSI model, the TF-IDF weighted vector space model (TF-IDF_VSM), TF-IDF weighted word2vec (TF-IDF_word2vec), and average word2vec (Avg_word2vec). Second, it trains different classifiers, including support vector machine, k-nearest neighbor, logistic regression, and convolutional neural networks, on the features encoded by LATW. Finally, LAC_DNN integrates these classifiers into an ensemble predictor that leverages the complementary information of the feature representation methods and classifiers to predict the topic of text documents. LAC_DNN achieves accuracies of 97.44% and 97.43% on the Fudan and Netease news text datasets, respectively. Extensive experiments show that LAC_DNN is effective for text classification.
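The abstract does not give implementation details, so the following is only a minimal sketch of two of the ideas it names: building a document vector by plain averaging versus TF-IDF weighted averaging of word vectors, and combining several classifiers into one prediction. Everything concrete here is an assumption made for illustration: the toy two-dimensional vectors stand in for trained word2vec embeddings, the smoothed IDF formula is one common variant, concatenation is assumed as the fusion rule, and soft voting (averaging class probabilities) is assumed as the ensemble rule. The LSI and TF-IDF_VSM views and the actual classifiers are omitted.

```python
import math
from collections import Counter

# Toy 2-dimensional "embeddings"; illustrative only, not trained word2vec vectors.
TOY_VECTORS = {
    "goal": [1.0, 0.0],
    "match": [0.8, 0.2],
    "stock": [0.0, 1.0],
    "market": [0.2, 0.8],
}


def idf_table(docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def avg_vector(doc, vectors):
    """Plain average of the word vectors in a document (Avg_word2vec-style)."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[t] for t in doc if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def tfidf_weighted_vector(doc, vectors, idf):
    """Average of word vectors weighted by tf * idf (TF-IDF_word2vec-style)."""
    dim = len(next(iter(vectors.values())))
    tf = Counter(t for t in doc if t in vectors)
    total = [0.0] * dim
    weight_sum = 0.0
    for term, count in tf.items():
        weight = count * idf.get(term, 1.0)
        weight_sum += weight
        for i in range(dim):
            total[i] += weight * vectors[term][i]
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


def fuse(doc, vectors, idf):
    """Concatenate both views into one feature vector (assumed fusion rule)."""
    return avg_vector(doc, vectors) + tfidf_weighted_vector(doc, vectors, idf)


def soft_vote(probabilities):
    """Average per-class probabilities across classifiers, then take the argmax."""
    n_classes = len(probabilities[0])
    mean = [sum(p[c] for p in probabilities) / len(probabilities) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c]), mean


# Tokenized toy corpus (hypothetical documents).
docs = [["goal", "match", "goal"], ["stock", "market"], ["match", "market"]]
idf = idf_table(docs)
features = [fuse(d, TOY_VECTORS, idf) for d in docs]

# Hypothetical class-probability outputs of three classifiers for one document.
label, mean_probs = soft_vote([[0.7, 0.3], [0.4, 0.6], [0.8, 0.2]])
```

In this sketch each fused vector has twice the embedding dimension, and the ensemble picks the class with the highest mean probability even though one of the three hypothetical classifiers disagrees. Weighted voting or a stacked meta-classifier would be equally plausible readings of "ensemble predictor".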
