Open Access
Chinese spam filtering based on Stacked Denoising Autoencoders
Author(s) - Liuyan Zhang, Yiming Nie, Shengyue Duan
Publication year - 2019
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1237/3/032012
Subject(s) - softmax function, artificial intelligence, pattern recognition (psychology), computer science, autoencoder, noise reduction, feature selection, robustness (evolution), feature vector, deep learning, biochemistry, chemistry, gene
To address the loss of filtering accuracy caused by the way traditional feature selection methods extract feature items in Chinese spam filtering, this paper proposes a Chinese spam filtering method based on the Stacked Denoising Autoencoder (SDA). First, the Word2vec tool is trained with the Continuous Bag-of-Words (CBOW) model on the preprocessed corpus to transform word segments into vectors, and these word vectors serve as the inputs. The SDA is then applied to extract text features through unsupervised learning. Finally, an improved softmax classifier is used for classification. Tests on the TREC06C dataset show that the method reaches an accuracy, precision, recall, and F1 score of 93.5%, 94.8%, 92%, and 93.2%, respectively, and that, compared with a Bayesian model, the KNN classification algorithm, and the traditional Stacked Denoising Autoencoder, it achieves better binary classification performance and robustness in application.
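
The unsupervised feature-extraction stage described in the abstract can be illustrated with a minimal sketch of greedy layer-wise pretraining for a stacked denoising autoencoder. This is not the authors' implementation: the tied weights, masking noise, linear decoder, layer sizes, learning rate, and the random matrix `X` (a stand-in for word-vector features of emails) are all assumptions made for illustration, and the improved softmax classifier is not reproduced because the abstract does not specify it.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class DenoisingAutoencoder:
    """One tied-weight layer: sigmoid encoder, linear decoder, masking noise."""

    def __init__(self, n_in, n_hidden, corruption=0.3, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.1, size=(n_in, n_hidden))
        self.b = np.zeros(n_hidden)  # encoder bias
        self.c = np.zeros(n_in)      # decoder bias
        self.corruption = corruption
        self.lr = lr

    def encode(self, x):
        return sigmoid(x @ self.W + self.b)

    def decode(self, h):
        return h @ self.W.T + self.c

    def reconstruction_error(self, x):
        """Mean squared error when reconstructing uncorrupted input."""
        return float(np.mean((self.decode(self.encode(x)) - x) ** 2))

    def train_step(self, x):
        # Corrupt the input by zeroing a random subset of entries.
        keep = self.rng.random(x.shape) >= self.corruption
        x_noisy = x * keep
        h = self.encode(x_noisy)
        x_hat = self.decode(h)
        # The target is the CLEAN input; this is what makes it "denoising".
        err = (x_hat - x) / x.shape[0]
        d_pre = (err @ self.W) * h * (1.0 - h)
        # Tied weights: W receives gradient from both encoder and decoder.
        grad_W = x_noisy.T @ d_pre + err.T @ h
        self.W -= self.lr * grad_W
        self.b -= self.lr * d_pre.sum(axis=0)
        self.c -= self.lr * err.sum(axis=0)


def pretrain_stack(x, layer_sizes, epochs=200, seed=0):
    """Train each layer on the codes produced by the layer below it."""
    layers, errors = [], []
    inputs = x
    for i, n_hidden in enumerate(layer_sizes):
        dae = DenoisingAutoencoder(inputs.shape[1], n_hidden, seed=seed + i)
        before = dae.reconstruction_error(inputs)
        for _ in range(epochs):
            dae.train_step(inputs)
        errors.append((before, dae.reconstruction_error(inputs)))
        layers.append(dae)
        inputs = dae.encode(inputs)  # clean codes feed the next layer
    return layers, errors, inputs


# Synthetic stand-in data: 200 "emails", each a 20-dimensional feature vector.
rng = np.random.default_rng(42)
X = rng.random((200, 20))
layers, errors, codes = pretrain_stack(X, layer_sizes=[10, 5])
for i, (before, after) in enumerate(errors, start=1):
    print(f"layer {i}: reconstruction error {before:.4f} -> {after:.4f}")
```

In a full pipeline, the final `codes` would be passed to a classifier such as softmax regression and the whole stack fine-tuned with labels; that supervised step is omitted here.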
