Defect prediction as a multiobjective optimization problem
Author(s) -
Canfora Gerardo,
Lucia Andrea De,
Penta Massimiliano Di,
Oliveto Rocco,
Panichella Annibale,
Panichella Sebastiano
Publication year - 2015
Publication title -
Software Testing, Verification and Reliability
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.216
H-Index - 49
eISSN - 1099-1689
pISSN - 0960-0833
DOI - 10.1002/stvr.1570
Subject(s) - computer science, proxy (statistics), ranking (information retrieval), machine learning, multi objective optimization, genetic algorithm, genetic programming, code (set theory), artificial intelligence, data mining, mathematical optimization, mathematics, set (abstract data type), programming language
Summary - In this paper, we formalize the defect‐prediction problem as a multiobjective optimization problem. Specifically, we propose an approach, called multiobjective defect predictor (MODEP), based on multiobjective forms of machine learning techniques (specifically, logistic regression and decision trees) trained using a genetic algorithm. The multiobjective approach allows software engineers to choose predictors that achieve a specific compromise between effectiveness (the number of likely defect‐prone classes, or the number of defects, that the analysis would likely discover) and cost (the lines of code to be analysed/tested, which can be considered a proxy for the cost of code inspection). Results of an empirical evaluation on 10 datasets from the PROMISE repository indicate the quantitative superiority of MODEP with respect to single‐objective predictors and with respect to trivial baselines that rank classes by size in ascending or descending order. Also, MODEP outperforms an alternative approach for cross‐project prediction based on local prediction upon clusters of similar classes. Copyright © 2015 John Wiley & Sons, Ltd.
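The cost/effectiveness trade-off described in the summary can be illustrated with a minimal Python sketch. It is not the MODEP implementation: it assumes a plain logistic model, a toy dataset (TOY_CLASSES), and a simplified evolutionary loop with Pareto filtering standing in for the genetic algorithm used in the paper. All function names, parameter values, and data here are hypothetical.

```python
import math
import random

# Hypothetical toy data: (metric values, lines of code, is the class defective?)
TOY_CLASSES = [
    ((0.9, 0.8), 500, True),
    ((0.2, 0.1), 120, False),
    ((0.7, 0.3), 300, True),
    ((0.1, 0.4), 80, False),
    ((0.5, 0.9), 900, True),
    ((0.3, 0.2), 60, True),
]


def predict(weights, metrics):
    """Logistic model: True if the class is predicted defect-prone."""
    z = weights[0] + sum(w * m for w, m in zip(weights[1:], metrics))
    z = max(-60.0, min(60.0, z))  # clamp to keep exp() from overflowing
    return 1.0 / (1.0 + math.exp(-z)) > 0.5


def objectives(weights, classes):
    """Return (cost, effectiveness) for one candidate predictor.

    cost: total LOC of the classes flagged for inspection (to minimize)
    effectiveness: number of defective classes among them (to maximize)
    """
    cost, effectiveness = 0, 0
    for metrics, loc, defective in classes:
        if predict(weights, metrics):
            cost += loc
            effectiveness += int(defective)
    return cost, effectiveness


def dominates(a, b):
    """True if a is no worse than b on both objectives and better on one."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])


def pareto_front(scored):
    """Keep the (weights, objectives) pairs that no other pair dominates."""
    return [s for s in scored if not any(dominates(o[1], s[1]) for o in scored)]


def evolve(classes, n_weights, pop_size=20, generations=30, seed=0):
    """Simplified evolutionary search (assumption: not the paper's algorithm)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Mutation only: each child is a Gaussian perturbation of a parent.
        children = [[w + rng.gauss(0.0, 0.3) for w in rng.choice(pop)]
                    for _ in range(pop_size)]
        scored = [(w, objectives(w, classes)) for w in pop + children]
        front = pareto_front(scored)
        rng.shuffle(front)
        pop = [w for w, _ in front[:pop_size]]
        while len(pop) < pop_size:  # top up with random candidates
            pop.append(rng.choice(scored)[0])
    # Report one weight vector per distinct non-dominated trade-off.
    unique = {}
    for w, obj in pareto_front([(w, objectives(w, classes)) for w in pop]):
        unique.setdefault(obj, w)
    return sorted(unique.items())


if __name__ == "__main__":
    for (cost, effectiveness), _ in evolve(TOY_CLASSES, n_weights=3):
        print(f"inspect {cost} LOC -> {effectiveness} defective classes found")
```

Each entry of the returned front is one predictor that no other candidate beats on both objectives, so an engineer with a fixed inspection budget could pick the entry with the highest effectiveness whose cost fits that budget.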