
Spam text classification using LSTM Recurrent Neural Network
Publication year - 2021
Publication title - International Journal of Emerging Trends in Engineering Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.218
H-Index - 14
ISSN - 2347-3983
DOI - 10.30534/ijeter/2021/11992021
Subject(s) - artificial intelligence, computer science, recurrent neural network, support vector machine, naive bayes classifier, machine learning, class (philosophy), convolutional neural network, f1 score, false positive paradox, set (abstract data type), artificial neural network, pattern recognition (psychology), natural language processing, programming language
Sequence classification is one of the in-demand research areas in the field of Natural Language Processing (NLP). Classifying a set of images or texts into an appropriate category or class is a complex task that many Machine Learning (ML) models fail to accomplish accurately, often under-fitting the given dataset. Algorithms used for text classification include KNN, Naïve Bayes, Support Vector Machines, Convolutional Neural Networks (CNNs), Recursive CNNs, Recurrent Neural Networks (RNNs), and Long Short-Term Memory (LSTM) networks, among others. For this experimental study, LSTM and a few other algorithms were chosen for a comparative study. The dataset used is the SMS Spam Collection Dataset from Kaggle, augmented with 150 additional entries gathered from other sources. The two possible class labels for the data points are spam and ham. Each entry consists of the class label and a few sentences of text, followed by a few extraneous fields that are discarded. After the text is converted to the required format, the models are trained and then evaluated using various metrics. In the experiments, the LSTM achieves much better classification accuracy than the other machine learning models, reaching F1-scores in the high nineties. The other models showed much lower F1-scores and cosine similarities, indicating that they underperformed on the dataset. Another notable observation is that the LSTM produced fewer false positives and false negatives than any other model.
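A minimal sketch of the pipeline the abstract describes: tokenize the SMS messages, pad them to fixed-length sequences, train an Embedding → LSTM → sigmoid classifier, and report the F1-score. The file name "spam.csv", the column layout (v1 = label, v2 = text, trailing unused columns), and all hyperparameters (vocabulary size 10,000, sequence length 100, 64 LSTM units, 5 epochs) are illustrative assumptions, not values confirmed by the paper.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

# Assumed Kaggle file layout: v1 = class label, v2 = message text;
# the remaining unnamed columns are the extraneous fields the paper discards.
df = pd.read_csv("spam.csv", encoding="latin-1")[["v1", "v2"]]
df.columns = ["label", "text"]
y = (df["label"] == "spam").astype(int).values  # ham -> 0, spam -> 1

# Convert raw text into fixed-length integer sequences.
tokenizer = Tokenizer(num_words=10000, oov_token="<OOV>")
tokenizer.fit_on_texts(df["text"])
X = pad_sequences(tokenizer.texts_to_sequences(df["text"]), maxlen=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Embedding -> LSTM -> sigmoid output for the binary spam/ham decision.
model = Sequential([
    Embedding(input_dim=10000, output_dim=64),
    LSTM(64),
    Dropout(0.5),
    Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=5, batch_size=32, validation_split=0.1)

# F1-score, the headline metric in the abstract.
y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()
print("F1-score:", f1_score(y_test, y_pred))
```

The sigmoid output with binary cross-entropy loss matches the two-class (spam/ham) setup; a threshold of 0.5 converts the predicted probabilities into hard labels before computing the F1-score.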