Premium
Selecting best predictors from large software repositories for highly accurate software effort estimation
Author(s) -
Tariq Sidra,
Usman Muhammad,
Fong Alvis C.M.
Publication year - 2020
Publication title -
journal of software: evolution and process
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.371
H-Index - 29
eISSN - 2047-7481
pISSN - 2047-7473
DOI - 10.1002/smr.2271
Subject(s) - computer science , feature selection , machine learning , software , preprocessor , data pre processing , data mining , artificial intelligence , software development , task (project management) , scheduling (production processes) , data science , systems engineering , engineering , operations management , economics , programming language
Accurate prediction of software effort is important for planning, scheduling, and allocating resources. However, software effort estimation has been a challenging task. Although numerous estimation models have been proposed, few achieve anything close to accurate prediction of software development effort. To achieve optimal results, machine learning techniques have recently been employed for predicting software development effort using relatively large software repositories. However, some issues remain unresolved, and this paper aims to address the following issues. First, feature selection methods often neglected the information rich variables present in the dataset. Second, selection of important features was done through statistical methods, which lack domain knowledge. Third, missing values in the data that significantly influence the prediction outcome was not efficiently handled. Fourth, majority of the literature neglected advanced evaluation measures, which thoroughly evaluate the ability of learning models to produce accurate results. To address the above issues, a machine learning‐based model has been proposed in this paper, which not only allows effective preprocessing of data but also provides highly accurate prediction results with minimum error rate. The purpose is to best identify attributes (predictors) from large software repositories that are most influential in the estimation of effort. In addition, we apply MMRE for better performance analysis.