
Evading obscure communication from spam emails
Author(s) -
Khan Farhan Rafat,
AUTHOR_ID,
Qin Xin,
Abdul Rehman Javed,
Zunera Jalil,
Rana Zeeshan Ahmad,
AUTHOR_ID,
AUTHOR_ID
Publication year - 2021
Publication title -
mathematical biosciences and engineering
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.451
H-Index - 45
eISSN - 1551-0018
pISSN - 1547-1063
DOI - 10.3934/mbe.2022091
Subject(s) - computer science , artificial intelligence , recall , machine learning , offensive , bag of words model , speech recognition , natural language processing , philosophy , linguistics , management , economics
Spam is any form of annoying and unsought digital communication sent in bulk and may contain offensive content feasting viruses and cyber-attacks. The voluminous increase in spam has necessitated developing more reliable and vigorous artificial intelligence-based anti-spam filters. Besides text, an email sometimes contains multimedia content such as audio, video, and images. However, text-centric email spam filtering employing text classification techniques remains today's preferred choice. In this paper, we show that text pre-processing techniques nullify the detection of malicious contents in an obscure communication framework. We use Spamassassin corpus with and without text pre-processing and examined it using machine learning (ML) and deep learning (DL) algorithms to classify these as ham or spam emails. The proposed DL-based approach consistently outperforms ML models. In the first stage, using pre-processing techniques, the long-short-term memory (LSTM) model achieves the highest results of 93.46% precision, 96.81% recall, and 95% F1-score. In the second stage, without using pre-processing techniques, LSTM achieves the best results of 95.26% precision, 97.18% recall, and 96% F1-score. Results show the supremacy of DL algorithms over the standard ones in filtering spam. However, the effects are unsatisfactory for detecting encrypted communication for both forms of ML algorithms.