Metrics Based Feature Selection for Software Defect Prediction
Author(s) - Radityo Adi Nugroho, Friska Abadi, Muhammad Faisal, Rudy Herteno, Rahmat Ramadhani
Publication year - 2020
Publication title - Jurnal Komputasi
Language(s) - English
Resource type - Journals
eISSN - 2541-0350
pISSN - 2541-0296
DOI - 10.23960/komputasi.v8i2.2670
Subject(s) - random forest, naive bayes classifier, feature selection, computer science, machine learning, software metric, artificial intelligence, software, classifier (uml), metric (unit), data mining, support vector machine, software quality, software bug, feature (linguistics), software development, engineering, linguistics, operations management, philosophy, programming language
Software now plays an influential role in many sectors of life, serving both business and personal needs. Producing high-quality software requires testing to avoid software defects. Many researchers are currently studying software defect prediction with machine learning, and one important step in this approach is feature selection. In this study, the researchers performed feature selection based on software metric category to determine its effect on the accuracy of software defect prediction, using 13 (thirteen) datasets from NASA MDP: CM1, JM1, KC1, KC3, KC4, MC1, MC2, MW1, PC1, PC2, PC3, PC4, and PC5. For classification, the researchers used 5 (five) classifiers: Naive Bayes, Decision Tree, Random Forest, K-Nearest Neighbor, and Support Vector Machine. The results show that the attributes in each software metric category have an effect on each dataset. The Naive Bayes and Random Forest algorithms perform better than the other algorithms in classifying software defects with metrics-based feature selection. On the other hand, the best metric category for each classifier algorithm is Misc. From the average AUC values, it can be concluded that the metric category giving the best performance is LoC, followed by Misc. Both categories achieved their highest AUC values with the Random Forest classifier.
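The evaluation described in the abstract (keep only the attributes of one metric category, train a classifier, and compare categories by AUC) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the column names, the assignment of columns to the LoC and Misc categories, and the tiny dataset are all invented for the example, and the small Gaussian Naive Bayes written here only stands in for one of the five classifiers named above.

```python
import math

# Hypothetical mapping from metric category to column names; the paper's
# actual grouping of NASA MDP attributes is not given in the abstract.
CATEGORIES = {
    "LoC": ["loc_total", "loc_comments"],
    "Misc": ["branch_count"],
}

def select_columns(rows, header, category):
    """Keep only the columns that belong to one metric category."""
    idx = [header.index(name) for name in CATEGORIES[category]]
    return [[row[i] for i in idx] for row in rows]

def auc(scores, labels):
    """AUC as the fraction of (defective, clean) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def gaussian_nb_scores(train_x, train_y, test_x):
    """Minimal Gaussian Naive Bayes; returns the log-odds of 'defective'."""
    stats = {}
    for cls in (0, 1):
        rows = [x for x, y in zip(train_x, train_y) if y == cls]
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # Small constant keeps the variance strictly positive.
        varis = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                 for c, m in zip(cols, means)]
        stats[cls] = (math.log(len(rows) / len(train_y)), means, varis)

    def log_like(x, cls):
        prior, means, varis = stats[cls]
        return prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, varis))

    return [log_like(x, 1) - log_like(x, 0) for x in test_x]

# Invented toy data: one row per module, label 1 = defective.
HEADER = ["loc_total", "loc_comments", "branch_count"]
TRAIN = [[10, 2, 1], [12, 3, 2], [15, 1, 1],
         [80, 10, 9], [95, 12, 11], [90, 8, 10]]
TRAIN_Y = [0, 0, 0, 1, 1, 1]
TEST = [[11, 2, 2], [14, 3, 1], [85, 9, 10], [100, 11, 12]]
TEST_Y = [0, 0, 1, 1]

# Score each metric category separately, as in category-based selection.
results = {}
for category in CATEGORIES:
    train_x = select_columns(TRAIN, HEADER, category)
    test_x = select_columns(TEST, HEADER, category)
    results[category] = auc(gaussian_nb_scores(train_x, TRAIN_Y, test_x), TEST_Y)

for category, value in results.items():
    print(f"{category}: AUC = {value:.3f}")
```

In a real experiment the same loop would run over every dataset and every classifier, with cross-validation instead of a single split, and the per-category AUC values would then be averaged for comparison.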