Data Mining Technology Application in False Text Information Recognition
Author(s) - Jie Wan, Xue Feng Cao, Kun Yao, Donghui Yang, E Peng, Yong Cao
Publication year - 2021
Publication title - Mobile Information Systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.346
H-Index - 34
eISSN - 1875-905X
pISSN - 1574-017X
DOI - 10.1155/2021/4206424
Subject(s) - computer science, support vector machine, feature selection, harm, classifier (uml), artificial intelligence, data mining, the internet, feature (linguistics), information retrieval, feature vector, machine learning, world wide web, philosophy, linguistics, political science, law
False information on the Internet is widely regarded as a serious social harm. To recognize false text information, this paper proposes an effective method for mining text features, applied to the field of false drug advertisements. First, false and real drug advertisements were collected from official websites to build a database of both kinds of advertisement. Second, features were extracted from the advertisement text to build a feature matrix from the effective features, and each feature vector in the matrix was labeled positive or negative according to whether the advertisement was false. Third, several classifiers were trained and tested, the classification model that performed best at identifying false drug advertisements was selected, and the key features that determine the classification were identified. Finally, the best-performing model was used to predict new false drug advertisements collected from Sina Weibo. For identifying false drug advertisements, the support vector machine (SVM) classifier built on the feature set after feature selection was the most effective. The findings offer the government an effective method for identifying and combating false advertisements, and the study serves as a reference for using text data mining to identify and detect information fraud.
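The abstract does not specify the features, the selection criterion, or the SVM variant, so the following is only a minimal sketch of the general shape of such a pipeline, not the authors' implementation. It assumes bag-of-words term counts as features, chi-square scoring for feature selection, and a linear SVM trained by batch subgradient descent on the hinge loss. The six advertisement texts are invented for illustration, and all function names (`select_features`, `train_linear_svm`, `predict`) are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for real text preprocessing)."""
    return text.lower().split()


def chi2_scores(docs, labels):
    """Chi-square score of each term's presence against a +1/-1 label."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == 1)
    doc_terms = [set(tokenize(d)) for d in docs]
    scores = {}
    for term in set().union(*doc_terms):
        a = sum(1 for s, y in zip(doc_terms, labels) if term in s and y == 1)
        b = sum(1 for s, y in zip(doc_terms, labels) if term in s and y != 1)
        c, d = n_pos - a, (n - n_pos) - b
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_features(docs, labels, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    scores = chi2_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def vectorize(text, features):
    """Map a text to its term-count vector over the selected features."""
    counts = Counter(tokenize(text))
    return [float(counts[f]) for f in features]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100):
    """Batch subgradient descent on the L2-regularized hinge loss."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the subgradient
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(text, features, w, b):
    """Return +1 (false advertisement) or -1 (real advertisement)."""
    x = vectorize(text, features)
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score > 0 else -1


# Invented toy data: +1 marks a false advertisement, -1 a real one.
ads = [
    "miracle cure guaranteed no side effects",
    "guaranteed cure for diabetes in three days",
    "miracle herbal cure guaranteed results",
    "take as directed and consult your physician",
    "read the label and consult your physician",
    "consult a physician if symptoms persist",
]
labels = [1, 1, 1, -1, -1, -1]

features = select_features(ads, labels, k=4)
X = [vectorize(ad, features) for ad in ads]
w, b = train_linear_svm(X, labels)
```

In this sketch the selected features double as the "key characteristics": the terms with the highest chi-square scores are the ones whose presence best separates the two classes, and the sign of each learned weight shows which class a term points toward. A realistic study would use a much larger labeled corpus, held-out test data, and a comparison across several classifiers, as the abstract describes.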