Premium
Non‐negative sparse‐based SemiBoost for software defect prediction
Author(s) -
Wang Tiejian,
Zhang Zhiwu,
Jing Xiaoyuan,
Liu Yanli
Publication year - 2016
Publication title -
software testing, verification and reliability
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.216
H-Index - 49
eISSN - 1099-1689
pISSN - 0960-0833
DOI - 10.1002/stvr.1610
Subject(s) - computer science , machine learning , boosting (machine learning) , artificial intelligence , software , exploit , software bug , data mining , ensemble learning , class (philosophy) , similarity (geometry) , cluster analysis , pattern recognition (psychology) , computer security , image (mathematics) , programming language
Summary Software defect prediction is an important decision support activity in software quality assurance. The limitation of the labelled modules usually makes the prediction difficult, and the class‐imbalance characteristic of software defect data leads to negative influence on decision of classifiers. Semi‐supervised learning can build high‐performance classifiers by using large amount of unlabelled modules together with the labelled modules. Ensemble learning achieves a better prediction capability for class‐imbalance data by using a series of weak classifiers to reduce the bias generated by the majority class. In this paper, we propose a new semi‐supervised software defect prediction approach, non‐negative sparse‐based SemiBoost learning. The approach is capable of exploiting both labelled and unlabelled data and is formulated in a boosting framework. In order to enhance the prediction ability, we design a flexible non‐negative sparse similarity matrix, which can fully exploit the similarity of historical data by incorporating the non‐negativity constraint into sparse learning for better learning the latent clustering relationship among software modules. The widely used datasets from NASA projects are employed as test data to evaluate the performance of all compared methods. Experimental results show that non‐negative sparse‐based SemiBoost learning outperforms several representative state‐of‐the‐art semi‐supervised software defect prediction methods. Copyright © 2016 John Wiley & Sons, Ltd.