Random Forest and Novel Under-Sampling Strategy for Data Imbalance in Software Defect Prediction
Author(s) - Utomo Pujianto
Publication year - 2018
Publication title - International Journal of Engineering and Technology
Language(s) - English
Resource type - Journals
ISSN - 2227-524X
DOI - 10.14419/ijet.v7i4.15.21368
Subject(s) - random forest, sampling (signal processing), software, centroid, data mining, statistics, computer science, measure (data warehouse), systematic sampling, mathematics, artificial intelligence, computer vision, filter (signal processing), programming language
Data imbalance is one of the characteristics of software quality data sets that can negatively affect the performance of software defect prediction models. This study proposed an alternative to the random under-sampling strategy: it uses only the subset of non-defective data with the largest distance to the centroid of the defective data. Combined with random forest classification, the proposed method outperformed both random under-sampling and no sampling on accuracy, AUC, F-measure, and true positive rate.
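
The under-sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. The abstract does not state the distance metric or how many non-defective samples are kept, so the sketch assumes Euclidean distance and, by default, keeps as many non-defective samples as there are defective ones. The function names `centroid` and `farthest_from_centroid_undersample`, and the parameters `defective_label` and `n_keep`, are hypothetical.

```python
import math


def centroid(rows):
    """Return the mean vector of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def farthest_from_centroid_undersample(X, y, defective_label=1, n_keep=None):
    """Keep all defective samples plus the non-defective samples that lie
    farthest from the centroid of the defective class.

    Assumptions (not stated in the abstract): Euclidean distance, and
    n_keep defaulting to the number of defective samples.
    """
    defective = [i for i, label in enumerate(y) if label == defective_label]
    non_defective = [i for i, label in enumerate(y) if label != defective_label]
    if not defective:
        raise ValueError("at least one defective sample is required")

    center = centroid([X[i] for i in defective])
    if n_keep is None:
        n_keep = len(defective)

    # Rank non-defective samples by distance to the defective centroid,
    # largest distance first, and keep only the top n_keep of them.
    ranked = sorted(
        non_defective,
        key=lambda i: math.dist(X[i], center),
        reverse=True,
    )
    kept = sorted(defective + ranked[:n_keep])  # preserve original row order
    return [X[i] for i in kept], [y[i] for i in kept]


# Toy data: two defective modules (label 1) and four non-defective ones (label 0).
X = [[0, 0], [0, 2], [1, 1], [5, 5], [10, 10], [-8, -8]]
y = [1, 1, 0, 0, 0, 0]
X_bal, y_bal = farthest_from_centroid_undersample(X, y)
```

The balanced `X_bal` and `y_bal` could then be used to train a random forest, for example scikit-learn's `RandomForestClassifier`; whether the study used that library is not stated in the abstract.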
