
Towards Optimization of Malware Detection using Chi-square Feature Selection on Ensemble Classifiers
Author(s) -
Fadare Oluwaseun Gbenga,
Adetunmbi Adebayo Olusola Prof.,
Oyinloye Oghenerukevwe Eloho,
Mogaji Stephen Alaba
Publication year - 2021
Publication title -
International Journal of Engineering and Advanced Technology
Language(s) - English
Resource type - Journals
ISSN - 2249-8958
DOI - 10.35940/ijeat.d2359.0410421
Subject(s) - feature selection, random forest, computer science, naive bayes classifier, ensemble learning, artificial intelligence, decision tree, boosting (machine learning), malware, pattern recognition (psychology), machine learning, support vector machine, k nearest neighbors algorithm, information gain ratio, classifier (uml), gradient boosting, data mining, operating system
The proliferation of malware variants is probably the greatest problem in PC security, and protecting information in the form of source code against unauthorized access is a central issue in computer security. In recent times, machine learning has been extensively researched for malware detection, and ensemble techniques have been established to be highly effective in terms of detection accuracy. This paper proposes a framework that combines Chi-square as the feature selection method with eight ensemble learning classifiers on five base learners: K-Nearest Neighbors, Naïve Bayes, Support Vector Machine, Decision Trees, and Logistic Regression. Among the base learners, K-Nearest Neighbors returns the highest accuracy: 95.37% with Chi-square feature selection and 87.89% without feature selection. Among the ensembles, the Extreme Gradient Boosting Classifier returns the highest accuracy: 97.407% with Chi-square feature selection and 91.72% without feature selection. Extreme Gradient Boosting Classifier and Random Forest lead on the seven evaluation measures with Chi-square feature selection and without feature selection, respectively. The study results show that the tree-based ensemble model is compelling for malware classification.
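As a rough illustration of the Chi-square feature selection step named in the abstract, the sketch below scores each feature with Pearson's chi-square statistic against the class label and keeps the top k. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not describe the feature encoding, so categorical (here binary) features are assumed, and the helper names `chi_square_score` and `select_top_k` and the toy data are hypothetical.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between one categorical feature and the labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))  # contingency table counts
    feature_totals = Counter(feature)         # row totals
    label_totals = Counter(labels)            # column totals
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            # Expected count if the feature and the label were independent.
            expected = f_count * l_count / n
            diff = observed.get((f_value, l_value), 0) - expected
            score += diff * diff / expected
    return score


def select_top_k(rows, labels, k):
    """Return the indices of the k highest-scoring features and all scores."""
    n_features = len(rows[0])
    scores = [
        chi_square_score([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Hypothetical toy data: 8 samples, 3 binary features, label 1 = malware.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
rows = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

selected, scores = select_top_k(rows, labels, k=2)
print(selected, scores)
```

In this toy data the first feature matches the label exactly and the second is independent of it, so the second is the one dropped. The reduced feature set would then be passed to the base learners and ensemble classifiers; library implementations of chi-square scoring may define the statistic differently, for example over non-negative feature sums rather than a contingency table.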