The use of decision trees for cost‐sensitive classification: an empirical study in software quality prediction
Author(s) -
Seliya, Naeem,
Khoshgoftaar, Taghi M.
Publication year - 2011
Publication title -
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.506
H-Index - 47
eISSN - 1942-4795
pISSN - 1942-4787
DOI - 10.1002/widm.38
Subject(s) - computer science , machine learning , decision tree , undersampling , artificial intelligence , random forest , weighting , software quality , decision tree learning , data mining , quality assurance , software , software development
This empirical study investigates two commonly used decision tree classification algorithms in the context of cost‐sensitive learning. A review of the literature shows that the cost‐based performance of a software quality prediction model is usually determined only after model training has been completed. In contrast, we incorporate cost‐sensitive learning during the model‐training process. The C4.5 and Random Forest decision tree algorithms are used to build defect predictors with and without a cost‐sensitive learning technique. The paper investigates six cost‐sensitive learning techniques: AdaCost, Adc2, Csb2, MetaCost, Weighting, and Random Undersampling (RUS). The case study data comprise 15 software measurement datasets obtained from several high‐assurance systems. In addition to providing unique insight into the cost‐based performance of defect prediction models, this study is one of the first to use misclassification cost as a parameter during the model‐training process. The practical appeal of this research is that it gives a software quality practitioner a clear process for considering (during model training) and analyzing (during model evaluation) the cost‐based performance of a defect prediction model. RUS is ranked as the best cost‐sensitive technique among those considered in this study. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 448–459 DOI: 10.1002/widm.38
This article is categorized under:
Algorithmic Development > Hierarchies and Trees
Technologies > Classification
Technologies > Prediction
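
To make the abstract's central distinction concrete, the sketch below pairs the two simplest ideas it names: Random Undersampling (RUS) of the majority class before training, and a misclassification-cost weight passed into the learner itself. This is a minimal illustration only, assuming scikit-learn, a synthetic stand-in for a software measurement dataset, and an arbitrary 10:1 cost ratio; the paper's wrapper algorithms (AdaCost, Adc2, Csb2, MetaCost) are not reproduced here.

# Minimal sketch: cost-sensitive learning applied during training, not
# after it. Dataset, learner, and the 10:1 cost ratio are illustrative
# assumptions, not the paper's actual experimental setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in: few defective modules (minority, label 1) among
# many non-defective ones (label 0), as in high-assurance systems.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Random Undersampling (RUS): drop majority-class examples until the
# classes are balanced, then train an ordinary learner.
maj = np.flatnonzero(y_train == 0)
mino = np.flatnonzero(y_train == 1)
keep = rng.choice(maj, size=mino.size, replace=False)
idx = np.concatenate([keep, mino])
rus_model = RandomForestClassifier(random_state=0).fit(X_train[idx], y_train[idx])

# Cost-weighted training: make a false negative (missed defect) cost
# 10x a false positive by weighting the minority class in the learner.
cost_model = RandomForestClassifier(class_weight={0: 1, 1: 10},
                                    random_state=0).fit(X_train, y_train)

for name, model in [("RUS", rus_model), ("cost-weighted", cost_model)]:
    # Accuracy restricted to defective modules, i.e. recall on class 1.
    print(name, "recall on defective class:",
          model.score(X_test[y_test == 1], y_test[y_test == 1]))

Both variants bake the cost information into training itself, which is the paper's point of contrast with the common practice of assessing cost-based performance only after the model has been built.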