
Software Defect Prediction Using AWEIG+ADACOST Bayesian Algorithm for Handling High Dimensional Data and Class Imbalance Problem
Author(s) -
Joko Suntoro,
Febrian Wahyu Christanto,
Henny Indriyawati
Publication year - 2018
Publication title -
International Journal of Information Technology and Business
Language(s) - English
Resource type - Journals
eISSN - 2655-9293
pISSN - 2655-495X
DOI - 10.24246/ijiteb.112018.36-41
Subject(s) - computer science, software, naive bayes classifier, data mining, algorithm, machine learning, bayesian probability, artificial intelligence, cluster analysis, software bug, bayes' theorem, support vector machine, programming language
Software defect prediction is among the most important parts of software engineering. It is the process of predicting which parts of a software system are likely to contain errors, faults, or failures. Researchers use machine learning methods to predict software defects, including estimation, association, classification, clustering, and dataset analysis. The NASA Metrics Data Program (NASA MDP) datasets are one of the software metrics datasets that researchers use for this purpose. NASA MDP datasets contain imbalanced classes and high-dimensional data, both of which lower classification performance. In this research, class imbalance is handled with the AdaCost method and high dimensionality is handled with the Average Weight Information Gain (AWEIG) method, while Naïve Bayes is used as the classification algorithm. The proposed method is named AWEIG + AdaCost Bayesian. In the experiment, the AWEIG + AdaCost Bayesian algorithm is compared with the plain Naïve Bayes algorithm. The results show that AWEIG + AdaCost Bayesian achieves a higher mean Area Under the Curve (AUC) than Naïve Bayes alone, with mean AUC values of 0.752 and 0.696, respectively.
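The abstract does not spell out the AWEIG selection rule. As a minimal sketch, assuming that AWEIG keeps the features whose information gain exceeds the average gain over all features, the feature selection step might look like the Python below. The function names, feature names, and toy data are hypothetical and used only for illustration; the AdaCost and Naïve Bayes stages are not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Information gain of one discrete feature with respect to the labels."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels after splitting on the feature value.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_above_average(columns, labels):
    """Keep features whose gain exceeds the mean gain.

    Assumption: this thresholding rule is our reading of "Average Weight
    Information Gain"; it is not confirmed by the abstract.
    """
    gains = {name: information_gain(values, labels)
             for name, values in columns.items()}
    threshold = sum(gains.values()) / len(gains)
    selected = [name for name, gain in gains.items() if gain > threshold]
    return selected, gains


# Made-up toy data: 1 = defective module, 0 = non-defective module.
labels = [1, 1, 0, 0, 0, 0]
columns = {
    "high_complexity": [1, 1, 0, 0, 0, 0],  # separates the classes perfectly
    "many_comments":   [1, 0, 1, 0, 1, 0],  # carries no class information
    "large_module":    [1, 0, 0, 0, 0, 0],  # weakly informative
}

selected, gains = select_above_average(columns, labels)
print(selected)
```

On this toy data only `high_complexity` has a gain above the average, so it is the only feature kept. Real software metrics are mostly continuous, so they would need to be discretized before this discrete information gain could be computed.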