Research on improved text classification method based on combined weighted model

Wang Yongchang; Zhu Ligu



Author(s) - Wang Yongchang, Zhu Ligu

Publication year - 2019

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.5140

Subject(s) - word2vec , tf–idf , computer science , word (group theory) , data mining , artificial intelligence , data pre processing , bag of words model , preprocessor , document classification , statistical classification , pattern recognition (psychology) , information retrieval , machine learning , mathematics , physics , geometry , embedding , quantum mechanics , term (time)

Summary Text classification is an important task in information retrieval, but traditional text classification models suffer from problems such as the curse of dimensionality and a lack of semantic features. To address these problems, this paper proposes an improved TFIDF model combined with the Word2vec model for weighting word vectors. Because the Word2vec model cannot distinguish the importance of words within a text, TFIDF is introduced to weight the Word2vec word vectors, yielding a weighted Word2vec classification model. For data preprocessing, we optimized the traditional StringToWordVector algorithm; the main improvement is the introduction of a new stemming algorithm. The paper first briefly describes the basic steps and algorithms of traditional text classification and then presents the ideas and steps of the improved StringToWordVector algorithm. Finally, the improved algorithm is evaluated on four data sets (WEBO_SINA and three standard UCI data sets). The experimental results show that the improved StringToWordVector algorithm combined with the combined weighted model achieves higher classification accuracy, recall, and F1 values than traditional text classification models that use only Word2vec or only TFIDF.
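The general idea of weighting word vectors by TFIDF can be illustrated with a minimal Python sketch. It is not the paper's method: the summary does not give the improved TFIDF formula, so a common smoothed variant is assumed here, and the 2-D `toy_embeddings` are hand-made stand-ins for trained Word2vec vectors. The function names `tfidf_weights` and `weighted_doc_vector` are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: tf-idf} dict per tokenized document.

    Assumption: tf = count / document length, and a smoothed
    idf = ln((1 + N) / (1 + df)) + 1. The paper's improved TFIDF may differ.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return weights


def weighted_doc_vector(doc_weights, embeddings, dim):
    """TF-IDF-weighted average of the word vectors of one document."""
    total = [0.0] * dim
    weight_sum = 0.0
    for term, weight in doc_weights.items():
        vec = embeddings.get(term)
        if vec is None:  # skip terms that have no embedding
            continue
        for i in range(dim):
            total[i] += weight * vec[i]
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # all-zero vector for a document with no known terms
    return [x / weight_sum for x in total]


# Hypothetical 2-D vectors standing in for trained Word2vec embeddings.
toy_embeddings = {
    "goal": [1.0, 0.0],
    "match": [0.9, 0.1],
    "stock": [0.0, 1.0],
    "market": [0.1, 0.9],
    "the": [0.5, 0.5],
}
docs = [
    ["the", "goal", "the", "match"],
    ["the", "stock", "the", "market"],
]
weights = tfidf_weights(docs)
doc_vectors = [weighted_doc_vector(w, toy_embeddings, dim=2) for w in weights]
```

In this toy corpus "the" occurs in both documents, so it receives a lower weight per occurrence than "goal" or "stock", and each document vector is pulled toward its topic-specific words rather than toward the uninformative one. The resulting vectors could then be passed to any standard classifier.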
