Open Access
Processing Intrusion Data with Machine Learning and MapReduce
Author(s) - Csaba Brunner
Publication year - 2017
Publication title - Academic and Applied Research in Military and Public Management Science
Language(s) - English
Resource type - Journals
eISSN - 2786-0744
pISSN - 2498-5392
DOI - 10.32565/aarms.2017.1.4
Subject(s) - computer science, machine learning, operand, artificial intelligence, intrusion detection system, key (lock), decision tree, data mining, algorithm, operating system
In recent years, cyber-attacks have become a daily issue for enterprises. A possible defence against this kind of threat is intrusion detection. One of the key challenges is extracting information from the large amount of logged data. My paper aims to identify cyber-attack types as patterns in log files using an advanced parallel computing approach and machine learning techniques. The MapReduce programming model is applied for parallel computing, while decision tree algorithms are used from machine learning.

I discuss two research questions in this paper. First, despite parallelization, are machine learning algorithms still able to provide results with acceptable accuracy, as measured by traditional data mining figures (accuracy, precision, recall, area under the receiver operating characteristic [ROC] curve [AUC])? Second, is it possible to achieve a significant performance improvement, as measured by the algorithm's runtime at several measurement points?

I showed that the machine learning model with two categories in the target variable is preferred to the one with five categories. On average, the whole algorithm ran 4–5 times faster than a single-core solution. I achieved most of this improvement during the data transfer phase.
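As an illustration of how the evaluation figures named in the first research question can be computed over partitioned data, the following is a minimal sketch in the map/reduce style. It is not the paper's implementation: the function names (`map_partition`, `reduce_counts`, `metrics`), the toy data, and the encoding of the two-category target (1 = attack, 0 = normal) are all assumptions made for this example.

```python
from collections import Counter
from functools import reduce


def map_partition(pairs):
    """Map step: count confusion-matrix outcomes in one data partition.

    Each pair is (actual, predicted), with 1 = attack and 0 = normal
    (a hypothetical encoding chosen for this sketch).
    """
    counts = Counter()
    for actual, predicted in pairs:
        if actual == 1 and predicted == 1:
            counts["tp"] += 1
        elif actual == 0 and predicted == 1:
            counts["fp"] += 1
        elif actual == 1 and predicted == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge the counts of two partitions."""
    return left + right


def metrics(counts):
    """Derive accuracy, precision and recall from the merged counts."""
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy data standing in for two partitions of labelled log records.
partitions = [
    [(1, 1), (0, 0), (1, 0), (0, 0)],
    [(1, 1), (0, 1), (0, 0), (1, 1)],
]

total_counts = reduce(reduce_counts, map(map_partition, partitions), Counter())
result = metrics(total_counts)
print(result)
```

Because the confusion-matrix counts are simple sums, they can be computed per partition and merged afterwards, so these three figures do not change with the way the data is split. AUC is left out of the sketch because it depends on ranking the prediction scores rather than on counts alone.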
