Open Access
Impact of imbalanced data on the performance of software defect prediction classifiers
Author(s) - Lichao Wang, Wei Wang, Bingyou Liu, Shuqiao Geng
Publication year - 2019
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1345/2/022026
Subject(s) - computer science, software, software quality, data mining, software metric, machine learning, measure (data warehouse), stability (learning theory), variation (astronomy), artificial intelligence, software development, programming language, physics, astrophysics
Software defect prediction plays an important role in analysing software quality and balancing software cost. However, project managers and software engineers have little guidance on selecting classifiers. This paper first proposes a method for building data sets with imbalanced class distributions. The Matthews correlation coefficient is then used to measure the performance of different classifiers, and the coefficient of variation is utilised to evaluate their stability on imbalanced data. Finally, an experiment is conducted on 8 common classifiers and 12 publicly available, widely used data sets. The results show that NaiveBayes performs stably when the imbalance rate of the data sets changes significantly. These results provide a basis for project managers and software engineers to select classifiers.
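
The two measures named in the abstract can be illustrated with a short sketch. The abstract does not give the authors' exact formulas or implementation, so this assumes the standard definitions: the Matthews correlation coefficient computed from confusion-matrix counts, and the coefficient of variation taken as the sample standard deviation divided by the mean. The function names and the score values in the usage section are hypothetical and are not results from the paper.

```python
import math
import statistics


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Standard definition; ranges from -1 (total disagreement) to +1
    (perfect prediction), with 0 meaning no better than chance.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention (an assumption here): return 0 when a row
        # or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean.

    Assumption: the paper may use a population standard deviation or
    report a percentage instead; the abstract does not say.
    """
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for mean 0")
    return statistics.stdev(values) / abs(mean)


if __name__ == "__main__":
    # Made-up MCC scores for one hypothetical classifier evaluated at
    # several imbalance rates; NOT data from the paper.
    scores = [0.42, 0.40, 0.41, 0.39]
    print("MCC example:", mcc(tp=30, tn=50, fp=10, fn=10))
    print("CV of scores:", coefficient_of_variation(scores))
```

Under these definitions, a lower coefficient of variation across imbalance rates indicates that a classifier's MCC changes less as the class distribution shifts, which is the sense of stability the abstract describes.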
