TACKLING IMBALANCED CLASS IN SOFTWARE DEFECT PREDICTION USING TWO-STEP CLUSTER BASED RANDOM UNDERSAMPLING AND STACKING TECHNIQUE

Open Access

Author(s) - Adi Wijaya, Romi Satria Wahono

Publication year - 2017

Publication title - Jurnal Teknologi

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.191

H-Index - 22

eISSN - 2180-3722

pISSN - 0127-9696

DOI - 10.11113/jt.v79.11874

Subject(s) - undersampling, computer science, naive bayes classifier, data mining, stacking, software, decision tree, random forest, software bug, cluster (spacecraft), machine learning, artificial intelligence, position (finance), support vector machine, physics, nuclear magnetic resonance, finance, economics, programming language

The cost of finding and correcting software defects is high and increases exponentially over the course of software development. Software defect prediction (SDP) can be used in the early phases to reduce testing and maintenance time, cost, and effort, thereby improving the quality of the software. SDP performance suffers from class imbalance in the datasets, where defective modules are a minority compared to defect-free ones. In this study, we propose combining random undersampling based on two-step cluster with a stacking technique to improve the accuracy of SDP. In the stacking technique, Decision Tree, Logistic Regression, and k-Nearest Neighbor are used as base learners, while Naive Bayes is used as the stacking model learner. The proposed method is evaluated on nine datasets from the NASA Metrics Data Program repository, with area under the curve (AUC) as the main evaluation metric. The results indicate that the proposed method yields excellent performance (AUC > 0.9) on 5 of the 9 datasets. Compared with prior research, the proposed method ranks first on 3 datasets, second on 5 datasets, and third on only 1 dataset in terms of AUC. It can therefore be concluded that the proposed method achieves promising prediction performance on most datasets relative to prior research.
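The pipeline described in the abstract can be illustrated with a minimal Python sketch using scikit-learn. This is not the authors' implementation and rests on several assumptions: scikit-learn has no two-step cluster algorithm, so KMeans is swapped in as a stand-in; the per-cluster sampling rule (each cluster contributes in proportion to its size), the helper name `cluster_undersample`, the number of clusters, and all hyperparameters are hypothetical; and the data is synthetic rather than drawn from the NASA MDP repository.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def cluster_undersample(X, y, n_clusters=5, seed=0):
    """Shrink the majority class (label 0) to roughly the minority size
    by sampling at random from each cluster of majority rows."""
    rng = np.random.default_rng(seed)
    maj_idx = np.flatnonzero(y == 0)
    min_idx = np.flatnonzero(y == 1)
    # Assumption: KMeans stands in for the paper's two-step cluster.
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X[maj_idx])
    keep = []
    for c in range(n_clusters):
        members = maj_idx[labels == c]
        # Assumed rule: each cluster contributes in proportion to its size.
        quota = int(round(len(min_idx) * len(members) / len(maj_idx)))
        quota = min(quota, len(members))
        keep.extend(rng.choice(members, size=quota, replace=False))
    idx = np.concatenate([np.asarray(keep, dtype=int), min_idx])
    return X[idx], y[idx]


# Synthetic imbalanced data: about 10% "defective" (label 1) modules.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Balance only the training split; the test split keeps its imbalance.
X_bal, y_bal = cluster_undersample(X_tr, y_tr)

# Base learners and meta-learner as named in the abstract.
stack = StackingClassifier(
    estimators=[("dt", DecisionTreeClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000)),
                ("knn", KNeighborsClassifier())],
    final_estimator=GaussianNB(),
    cv=5,
)
stack.fit(X_bal, y_bal)

auc = roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1])
print(f"AUC on held-out data: {auc:.3f}")
```

Undersampling is applied to the training split only, so the reported AUC reflects the original class distribution. The AUC printed here describes the synthetic data and says nothing about the results reported in the paper.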
