Low-Cost Machine Learning for Effective and Efficient Bad Smells Detection
Author(s) -
José Solenir Lima Figuerêdo,
V. T. Sarinho,
Rodrigo Tripodi Calumby
Publication year - 2021
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5753/kdmile.2021.17468
Subject(s) - computer science, machine learning, artificial intelligence, feature selection, preprocessor, code smell, random forest, naive bayes classifier, source code, selection (genetic algorithm), model selection, set (abstract data type), code (set theory), software, software quality, support vector machine, software development, operating system, programming language
Bad smells are characteristics of software that indicate a code or design problem and can make an information system hard to understand, evolve, and maintain. To address this problem, different manual and automated approaches have been proposed over the years, including, more recently, machine learning alternatives. However, despite the advances achieved, some machine learning techniques, such as feature selection, have not yet been effectively explored. Moreover, it is not clear to what extent the use of numerous source-code features is necessary for reasonably successful bad smell detection. Therefore, in this work we propose an approach using low-cost machine learning for effective and efficient detection of bad smells through explicit feature selection. Our results showed that feature selection yielded statistically significant improvements in the effectiveness of the models. In some cases, the models achieved statistical equivalence while relying on a highly reduced set of features. Indeed, with explicit feature selection, simpler models such as Naive Bayes became statistically equivalent to robust models such as Random Forest. Feature selection therefore maintained competitive or even superior effectiveness while also improving the efficiency of the models, demanding fewer computational resources for source-code preprocessing, model training, and bad smell detection.
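The idea of explicit feature selection feeding a simple classifier can be illustrated with a minimal sketch. The abstract does not name the selection method, the feature set, or the dataset, so everything below is an assumption made for illustration: a correlation-based filter stands in for the selection step, the data are synthetic (two informative columns and eight noise columns playing the role of source-code metrics), and the helper names `select_top_k`, `make_data`, and `accuracy` are hypothetical. A small Gaussian Naive Bayes is written with the Python standard library only, so the example runs without extra packages.

```python
import math
import random
from statistics import mean, pstdev


def pearson_abs(xs, ys):
    """Absolute Pearson correlation between one feature column and the 0/1 labels."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return abs(cov / (sx * sy))


def select_top_k(X, y, k):
    """Filter-style selection (assumed here): keep the k columns most correlated with y."""
    n_features = len(X[0])
    scores = [pearson_abs([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


class GaussianNB:
    """Minimal Gaussian Naive Bayes: per-class prior, feature means and variances."""

    def fit(self, X, y):
        self.stats = {}
        for c in set(y):
            rows = [r for r, label in zip(X, y) if label == c]
            cols = list(zip(*rows))
            self.stats[c] = (
                math.log(len(rows) / len(X)),
                [mean(col) for col in cols],
                [pstdev(col) ** 2 + 1e-9 for col in cols],  # smoothing avoids zero variance
            )
        return self

    def _log_posterior(self, c, row):
        prior, mus, variances = self.stats[c]
        return prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, mus, variances)
        )

    def predict(self, X):
        return [max(self.stats, key=lambda c: self._log_posterior(c, row)) for row in X]


def make_data(n, rng, n_noise=8):
    """Synthetic stand-in for code metrics: columns 0 and 1 depend on the label, the rest are noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)  # 1 = smelly, 0 = clean (hypothetical labels)
        informative = [rng.gauss(3.0 * label, 1.0), rng.gauss(-3.0 * label, 1.0)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y


def accuracy(model, X, y):
    predictions = model.predict(X)
    return sum(p == t for p, t in zip(predictions, y)) / len(y)


rng = random.Random(0)
X_train, y_train = make_data(300, rng)
X_test, y_test = make_data(200, rng)

# Select features on the training split only, then project both splits.
selected = select_top_k(X_train, y_train, k=2)


def project(X):
    return [[row[j] for j in selected] for row in X]


full_acc = accuracy(GaussianNB().fit(X_train, y_train), X_test, y_test)
reduced_acc = accuracy(GaussianNB().fit(project(X_train), y_train), project(X_test), y_test)
print("selected columns:", selected)
print("accuracy with all 10 features:", full_acc)
print("accuracy with 2 selected features:", reduced_acc)
```

Selection is fitted on the training split only, so the test split stays unseen when the two accuracies are compared. The reduced model processes a fifth of the columns, which is the kind of cost saving in preprocessing, training, and detection that the abstract describes. This toy comparison says nothing about the statistical tests or the results reported in the paper.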