An empirical comparison of validation methods for software prediction models
Author(s) -
Asad Ali,
Carmine Gravino
Publication year - 2021
Publication title -
Journal of Software: Evolution and Process
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.371
H-Index - 29
eISSN - 2047-7481
pISSN - 2047-7473
DOI - 10.1002/smr.2367
Subject(s) - computer science , cross validation , data mining , model validation , software , data validation , predictive modelling , variance (accounting) , machine learning , programming language , data science , accounting , database , business
Model validation methods (e.g., k-fold cross-validation) use historical data to predict how well an estimation technique (e.g., random forest) performs on current (or future) data. Studies in the contexts of software development effort estimation (SDEE) and software fault prediction (SFP) have used and investigated different model validation methods. However, there are no conclusive indications of which model validation method has the greatest impact on the prediction accuracy and stability of estimation techniques. Some studies have investigated model validation methods using data from either SDEE or SFP, but to the best of our knowledge, no study in the literature has employed different validation methods with both SDEE and SFP data. The aim of this paper is to consider 10 different methods from the families of cross-validation (CV) and bootstrap validation to identify which one yields better prediction accuracy for both types of data. We also evaluate which model validation methods allow the estimation techniques to provide stable performances (i.e., with lower variance). To this aim, we present an empirical study involving six datasets from the domain of SDEE and six datasets from the SFP domain. The results reveal that repeated 10-fold CV with SDEE data and the optimistic bootstrap with SFP data are the model validation methods that provide better prediction accuracy in a greater number of experiments than the other model validation methods. Furthermore, a model validation method can improve the prediction accuracy by up to 60% with SDEE data and by up to 36% with SFP data. The analysis also reveals that repeated 5-fold CV produces more stable performances when the experiments are repeated on the same data.
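To make the two families of validation methods compared in the abstract concrete, the sketch below implements repeated k-fold CV and out-of-bag bootstrap validation over a toy dataset, then reports the mean and variance of the error across splits (the variance being a simple proxy for the "stability" the study measures). This is a minimal stdlib-only illustration, not the paper's experimental setup: the trivial mean-predictor, the synthetic effort values, and all parameter choices (k=10, repetition counts, seeds) are assumptions for demonstration only.

```python
import random
import statistics

def repeated_kfold_indices(n, k=10, repeats=3, seed=0):
    """Yield (train, test) index lists for repeated k-fold CV:
    each repeat reshuffles the data and partitions it into k folds."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k disjoint folds
        for i in range(k):
            test = folds[i]
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            yield train, test

def bootstrap_indices(n, repeats=30, seed=0):
    """Yield (train, test) index lists for bootstrap validation:
    train on a resample drawn with replacement, test on the
    out-of-bag points that were never drawn."""
    rng = random.Random(seed)
    for _ in range(repeats):
        train = [rng.randrange(n) for _ in range(n)]
        oob = [i for i in range(n) if i not in set(train)]
        if oob:  # skip the rare resample that covers every point
            yield train, oob

def mae_of_mean_predictor(y, splits):
    """Evaluate a trivial mean-predictor under a validation scheme;
    return (mean MAE, variance of MAE) across splits."""
    errors = []
    for train, test in splits:
        pred = statistics.mean(y[i] for i in train)
        errors.append(statistics.mean(abs(y[i] - pred) for i in test))
    return statistics.mean(errors), statistics.pvariance(errors)

# Hypothetical effort values (person-hours) standing in for an SDEE dataset.
rng = random.Random(1)
y = [rng.gauss(100, 20) for _ in range(60)]

cv_mean, cv_var = mae_of_mean_predictor(y, repeated_kfold_indices(len(y)))
bs_mean, bs_var = mae_of_mean_predictor(y, bootstrap_indices(len(y)))
print(f"repeated 10-fold CV: MAE={cv_mean:.2f}, var={cv_var:.3f}")
print(f"bootstrap (OOB):     MAE={bs_mean:.2f}, var={bs_var:.3f}")
```

Comparing the two variance figures on the same data mirrors, in miniature, the stability analysis the paper carries out across its 12 datasets.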
