Open Access
On the influence of categorical features in ranking anomalies using mixed data
Author(s) - Mathieu Garchery, Michael Granitzer
Publication year - 2018
Publication title - Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2018.07.211
Subject(s) - categorical variable, computer science, anomaly detection, benchmarking, ranking (information retrieval), data mining, artificial intelligence, entropy (arrow of time), probabilistic logic, machine learning, pattern recognition (psychology), physics, marketing, quantum mechanics, business
Most unsupervised anomaly ranking approaches are compatible with numeric data only, so categorical features are often ignored in practice. Although some methods address this issue, few support mixed data, and the influence of excluding or including categorical attributes has not yet been studied well. In this paper, we take a first step towards considering categorical and numeric attributes jointly for unsupervised anomaly ranking by benchmarking selected methods. We introduce three new approaches: two entropy-based methods built on individual and collective entropy contribution, and an extension of Isolation Forest that supports mixed data. We benchmark them against SPAD, a state-of-the-art probabilistic anomaly ranker. We observe that our entropy methods detect very similar anomalies in practice, and that these anomalies are mostly globally isolated observations. Both entropy methods are also closely related to SPAD. Our empirical study additionally shows that categorical features can have a high impact on anomaly ranking performance and thus should not be blindly ignored.
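
To make the idea of ranking anomalies over mixed data concrete, the following is a minimal sketch of a generic frequency-based score. It is not the paper's entropy methods, its Isolation Forest extension, or SPAD's exact formulation, none of which are specified in this abstract. It assumes numeric columns are discretized into equal-width bins, treats every feature as independent, and scores each record by the sum of the surprisal, -log2 p(value), of its feature values. The function names `surprisal_scores` and `rank_anomalies` and the parameter `n_bins` are hypothetical.

```python
import math
from collections import Counter


def surprisal_scores(rows, numeric_cols=(), n_bins=5):
    """Return one score per row: the summed surprisal of its feature values.

    Assumption (not from the paper): numeric columns are discretized into
    n_bins equal-width bins so they can be counted like categories.
    """
    n = len(rows)
    n_features = len(rows[0])
    # Transpose the rows into columns so each feature is handled on its own.
    columns = [[row[j] for row in rows] for j in range(n_features)]

    for j in numeric_cols:
        lo, hi = min(columns[j]), max(columns[j])
        # Fall back to a width of 1.0 when the column is constant.
        width = (hi - lo) / n_bins or 1.0
        columns[j] = [min(int((v - lo) / width), n_bins - 1) for v in columns[j]]

    scores = [0.0] * n
    for col in columns:
        counts = Counter(col)
        for i, value in enumerate(col):
            # Rare values have low relative frequency, hence high surprisal.
            scores[i] += -math.log2(counts[value] / n)
    return scores


def rank_anomalies(rows, numeric_cols=(), n_bins=5):
    """Return row indices ordered from most to least anomalous."""
    scores = surprisal_scores(rows, numeric_cols, n_bins)
    return sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)


# Toy mixed data: column 0 is categorical, column 1 is numeric.
rows = [("a", 1.0), ("a", 1.1), ("a", 0.9), ("a", 1.0), ("b", 9.0)]
ranking = rank_anomalies(rows, numeric_cols=(1,))
```

In this toy data set the last record is rare in both its categorical and its numeric feature, so it is ranked first. Dropping column 0 from the rows would remove half of that record's score, which is one simple way to see how excluding a categorical feature can change a ranking.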
