Mining e-mail content for author identification forensics
Author(s) - Olivier De Vel, Alison Anderson, Malcolm Corney, George Mohay
Publication year - 2001
Publication title - ACM SIGMOD Record
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.372
H-Index - 142
eISSN - 1943-5835
pISSN - 0163-5808
DOI - 10.1145/604264.604272
Subject(s) - computer science , identification (biology) , set (abstract data type) , focus (optics) , information retrieval , authorship attribution , electronic mail , data mining , world wide web , natural language processing , botany , physics , optics , biology , programming language
We describe an investigation into e-mail content mining for author identification, or authorship attribution, for the purpose of forensic investigation. We focus our discussion on the ability to discriminate between authors both when e-mail topics are aggregated and across different e-mail topics. An extended set of e-mail document features, including structural characteristics and linguistic patterns, was derived and, together with a Support Vector Machine learning algorithm, was used for mining the e-mail content. Experiments using a number of e-mail documents generated by different authors on a set of topics gave promising results for both aggregated and multi-topic author categorisation.
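As a rough illustration of what "structural characteristics and linguistic patterns" could look like in practice, the following minimal sketch turns an e-mail body into a small numeric feature dictionary. The feature names, the function-word list, and the `extract_features` helper are all assumptions made for illustration; the abstract does not enumerate the paper's actual feature set.

```python
import re
from collections import Counter

# Hypothetical function-word list; the paper's actual list is not given here.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "that", "it", "is")


def extract_features(body: str) -> dict:
    """Map an e-mail body to a few illustrative stylometric features."""
    lines = body.splitlines()
    words = re.findall(r"[A-Za-z']+", body.lower())
    sentences = [s for s in re.split(r"[.!?]+", body) if s.strip()]
    n_words = len(words) or 1  # avoid division by zero on empty bodies
    counts = Counter(words)

    features = {
        # Structural characteristics (assumed examples)
        "n_lines": len(lines),
        "blank_line_ratio": sum(1 for ln in lines if not ln.strip()) / (len(lines) or 1),
        "has_greeting": float(bool(re.match(r"\s*(hi|hello|dear)\b", body, re.IGNORECASE))),
        # Linguistic patterns (assumed examples)
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "avg_sentence_length": len(words) / (len(sentences) or 1),
    }
    # Relative frequency of each function word
    for fw in FUNCTION_WORDS:
        features[f"fw_{fw}"] = counts[fw] / n_words
    return features
```

Feature vectors of this kind, one per e-mail and labelled by author, could then be passed to any Support Vector Machine implementation to train a classifier; that step is omitted here because the abstract gives no kernel or parameter details.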