Towards A new Spam Filter Based on PV-DM (Paragraph Vector-Distributed Memory Approach)
Author(s) -
Samira Douzi,
Meryem Amar,
Bouabid El Ouahidi,
Hicham Laanaya
Publication year - 2017
Publication title -
Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2017.06.130
Subject(s) - computer science , paragraph , bag of words model , filter (signal processing) , representation (politics) , word (group theory) , order (exchange) , presentation (obstetrics) , artificial intelligence , forum spam , natural language processing , spamming , world wide web , spambot , the internet , computer vision , medicine , finance , politics , political science , law , economics , radiology , linguistics , philosophy
The increasing volume of email has been accompanied by a growing problem of unsolicited messages, commonly referred to as spam. One of the most commonly used representations in spam filters is the Bag-of-Words (BoW) model. However, this approach has several weaknesses. Word order is lost, so different emails that use the same words receive the same representation, and the relationships between words are ignored; both can lead to poor performance. This paper proposes a new spam filter based on PV-DM (Paragraph Vector-Distributed Memory) to overcome the limitations of the BoW representation.
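The word-order limitation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the helper name `bag_of_words`, and the two example messages are assumptions chosen only for illustration.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase whitespace-separated tokens, discarding their order."""
    return Counter(text.lower().split())


# Two hypothetical messages that use the same words in a different order.
email_a = "you won the prize not a scam"
email_b = "not a prize you won the scam"

bow_a = bag_of_words(email_a)
bow_b = bag_of_words(email_b)

# The texts differ, yet their BoW representations are identical,
# so a classifier that sees only these counts cannot tell them apart.
same_representation = bow_a == bow_b
print(same_representation)
```

PV-DM, by contrast, learns a dense vector for each message jointly with word vectors by predicting a word from its surrounding context, so the local context in which words appear can influence the representation.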