Open Access

Random Forest and Novel Under-Sampling Strategy for Data Imbalance in Software Defect Prediction

Author(s) - Utomo Pujianto

Publication year - 2018

Publication title - International Journal of Engineering and Technology

Language(s) - English

Resource type - Journals

ISSN - 2227-524X

DOI - 10.14419/ijet.v7i4.15.21368

Subject(s) - random forest, sampling (signal processing), software, centroid, data mining, statistics, computer science, measure (data warehouse), systematic sampling, mathematics, artificial intelligence, computer vision, filter (signal processing), programming language

Data imbalance is one of the characteristics of software quality data sets that can degrade the performance of software defect prediction models. This study proposed an alternative to the random under-sampling strategy: keep only the subset of non-defective instances that have the largest distance to the centroid of the defective instances. Combined with random forest classification, the proposed method outperformed both random under-sampling and no sampling on accuracy, AUC, F-measure, and true positive rate.
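The under-sampling idea described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the author's implementation: the abstract does not state the distance metric, the number of non-defective instances retained, or the label encoding. The sketch assumes Euclidean distance, a retained subset equal in size to the defective class, and labels of 1 for defective and 0 for non-defective. The function names `centroid` and `farthest_from_centroid_undersample` are hypothetical.

```python
import math


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def farthest_from_centroid_undersample(X, y, n_keep=None):
    """Keep every defective row plus the n_keep non-defective rows that lie
    farthest from the centroid of the defective rows.

    Assumptions (not stated in the abstract): labels are 1 = defective and
    0 = non-defective, distance is Euclidean, and n_keep defaults to the
    number of defective rows so that the result is balanced.
    """
    defective = [i for i, label in enumerate(y) if label == 1]
    non_defective = [i for i, label in enumerate(y) if label == 0]
    if not defective:
        raise ValueError("need at least one defective sample")
    if n_keep is None:
        n_keep = len(defective)

    c = centroid([X[i] for i in defective])
    # Rank non-defective indices by distance to the defective centroid,
    # farthest first, and retain only the top n_keep of them.
    ranked = sorted(non_defective, key=lambda i: math.dist(X[i], c), reverse=True)
    kept = sorted(defective + ranked[:n_keep])
    return [X[i] for i in kept], [y[i] for i in kept]


# Toy data: two defective modules and four non-defective ones.
X = [[1.0, 1.0], [1.0, 3.0], [2.0, 2.0], [5.0, 5.0], [9.0, 9.0], [0.0, 2.5]]
y = [1, 1, 0, 0, 0, 0]
X_bal, y_bal = farthest_from_centroid_undersample(X, y)
print(X_bal)  # [[1.0, 1.0], [1.0, 3.0], [5.0, 5.0], [9.0, 9.0]]
print(y_bal)  # [1, 1, 0, 0]
```

In this toy example the defective centroid is (1.0, 2.0), so the two non-defective points closest to it are dropped. The balanced set could then be used to train a random forest classifier, for example scikit-learn's `RandomForestClassifier`, although the abstract does not say which implementation the study used.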
