
A Short Text Classification Method Based on N-Gram and CNN
Author(s) - Wang Haitao, He Jie, Zhang Xiaohong, Liu Shufen
Publication year - 2020
Publication title - Chinese Journal of Electronics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.267
H-Index - 25
eISSN - 2075-5597
pISSN - 1022-4653
DOI - 10.1049/cje.2020.01.001
Subject(s) - computer science, artificial intelligence, feature (linguistics), sentence, convolutional neural network, pooling, task (project management), filter (signal processing), word (group theory), natural language processing, representation (politics), key (lock), n-gram, pattern recognition (psychology), language model, mathematics, linguistics, philosophy, geometry, management, computer security, politics, political science, law, economics, computer vision
Text classification is a fundamental task in natural language processing (NLP) applications. Most existing research relies on either explicit or implicit text representations to address this kind of problem. These techniques work well for sentences but cannot be applied directly to short text because of its brevity and sparseness. Because the traditional multi-size-filter convolutional neural network (CNN) obtains only simple word-vector features and overlooks important features during text classification, we propose a CNN-based short text classification model. The model obtains rich text features by adopting a non-linear sliding method and an N-gram language model, selects the key features with a concentration mechanism, and uses a pooling operation to preserve the text features as far as possible. Experiments show that, compared with traditional machine learning algorithms and conventional convolutional neural networks, the proposed method markedly improves short text classification results.
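
The abstract names three ingredients: N-gram windows over the text, a mechanism that weights the key features, and a pooling step. As a rough illustration only, the Python sketch below shows what each step could look like in its simplest form. It is not the authors' implementation: the abstract does not specify the non-linear sliding method or how the concentration mechanism computes its weights, so the sketch uses plain contiguous windows and softmax weighting as stand-in assumptions. All function names (`word_ngrams`, `softmax`, `weighted_pool`, `max_pool`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def word_ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous word n-grams (a plain linear sliding window)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_pool(features: Sequence[Sequence[float]],
                  scores: Sequence[float]) -> List[float]:
    """Softmax-weighted sum of feature vectors.

    Assumption: this is one possible reading of the 'concentration
    mechanism'; the abstract does not define the actual weighting.
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features))
            for d in range(dim)]


def max_pool(features: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise maximum over all feature vectors."""
    dim = len(features[0])
    return [max(f[d] for f in features) for d in range(dim)]


if __name__ == "__main__":
    tokens = ["cheap", "flights", "to", "paris"]
    print(word_ngrams(tokens, 2))

    # Toy feature vectors, one per bigram (made-up numbers for illustration).
    feats = [[1.0, 5.0], [3.0, 2.0], [0.5, 4.0]]
    print(weighted_pool(feats, scores=[2.0, 0.5, 1.0]))
    print(max_pool(feats))
```

In a real model the feature vectors would come from convolution filters applied to word embeddings and the scores would be learned; here both are supplied by hand so the three steps can be inspected in isolation.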