Software defect prediction with imbalanced distribution by radius‐synthetic minority over‐sampling technique
Author(s) - Guo Shikai, Dong Jian, Li Hui, Wang Jiahui
Publication year - 2021
Publication title - Journal of Software: Evolution and Process
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.371
H-Index - 29
eISSN - 2047-7481
pISSN - 2047-7473
DOI - 10.1002/smr.2362
Subject(s) - computer science, software, sampling (signal processing), software quality, task (project management), software bug, software metric, class (philosophy), data mining, machine learning, scope (computer science), artificial intelligence, reliability engineering, software development, engineering, systems engineering, computer vision, filter (signal processing), programming language
Software defect prediction, which identifies defect‐prone modules, is an effective technique for ensuring the quality of software products. Because of its importance in software maintenance, many learning‐based software defect prediction models have been presented in recent years. In practice, defects usually occupy a very small proportion of software source code; thus, the imbalanced distribution between defect‐prone modules and non‐defect‐prone modules increases the learning difficulty of the classification task. To address this issue, we present a random over‐sampling mechanism that generates minority‐class samples from a high‐dimensional sampling space to deal with the imbalanced distributions in software defect prediction. Two constraints are applied to make the generation of new synthetic samples robust: scaling the random over‐sampling scope to a reasonable area, and distinguishing the majority‐class samples in a critical region. Based on nine open datasets of software projects, we experimentally verify that the presented method is effective at predicting defect‐prone modules and that it outperforms traditional methods for handling imbalanced data.
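The abstract does not specify how the two constraints are computed, so the following is only a minimal sketch of one possible reading, not the authors' algorithm. It assumes that the "reasonable area" is a ball around a minority-class sample whose radius is a fraction of the distance to the nearest majority-class sample, and that the "critical region" check rejects any candidate that lies closer to a majority-class sample than to a minority-class one. The function name `radius_oversample` and the parameters `scale` and `max_tries` are hypothetical.

```python
import math
import random


def radius_oversample(minority, majority, n_new, scale=0.5, seed=0, max_tries=10000):
    """Illustrative radius-constrained random over-sampling (assumed form)."""
    rng = random.Random(seed)
    dim = len(minority[0])
    synthetic = []
    for _ in range(max_tries):
        if len(synthetic) >= n_new:
            break
        # Pick a minority-class (defect-prone) sample as the centre.
        base = rng.choice(minority)
        # Assumed constraint 1: limit the sampling scope to a ball whose radius
        # is a fraction of the distance to the nearest majority-class sample.
        radius = scale * min(math.dist(base, m) for m in majority)
        # Draw a point uniformly inside that ball: random direction, then a
        # length scaled by u ** (1 / dim).
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in direction)) or 1.0
        length = radius * rng.random() ** (1.0 / dim)
        candidate = [b + length * c / norm for b, c in zip(base, direction)]
        # Assumed constraint 2: keep the candidate only if it is at least as
        # close to a minority-class sample as to any majority-class sample.
        d_min = min(math.dist(candidate, m) for m in minority)
        d_maj = min(math.dist(candidate, m) for m in majority)
        if d_min <= d_maj:
            synthetic.append(candidate)
    return synthetic


# Toy data: two defect-prone modules and four non-defect-prone modules,
# each described by two hypothetical software metrics.
minority = [[0.0, 0.0], [1.0, 0.0]]
majority = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0]]
new_samples = radius_oversample(minority, majority, n_new=4)
```

With `scale` at or below 0.5 the rejection step never triggers, because every candidate is then at least as close to its centre as to any majority-class sample; larger values make the second check matter. The synthetic samples would be appended to the minority class before training a classifier.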