Linear models for cost‐sensitive classification
Author(s) - Pendharkar Parag C.
Publication year - 2015
Publication title - Expert Systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.365
H-Index - 38
eISSN - 1468-0394
pISSN - 0266-4720
DOI - 10.1111/exsy.12114
Subject(s) - computer science, support vector machine, heuristic, linear discriminant analysis, machine learning, artificial intelligence, integer programming, linear programming, data mining, discriminant, pattern recognition (psychology), algorithm
Abstract - In this paper, we investigate the performance of statistical, mathematical programming and heuristic linear models for cost‐sensitive classification. Specifically, we use five cost‐sensitive techniques: Fisher's discriminant analysis (DA), asymmetric misclassification cost mixed integer programming (AMC‐MIP), cost‐sensitive support vector machine (CS‐SVM), a hybrid support vector machine and mixed integer programming (SVMIP) and a heuristic cost‐sensitive genetic algorithm (CGA). Using simulated datasets of varying group overlaps, data distributions and class biases, and real‐world datasets from financial and medical domains, we compare the performance of the five techniques based on overall holdout sample misclassification cost. The results of our experiments on simulated datasets indicate that when group overlap is low and the data distribution is exponential, DA appears to provide superior performance. For all other situations with simulated datasets, CS‐SVM provides superior performance. In the case of real‐world datasets from the financial domain, CGA and AMC‐MIP hold a slight edge over the two SVM‐based classifiers. However, for medical domains with mixed continuous and discrete attributes, the SVM classifiers perform better than the heuristic (CGA) and AMC‐MIP classifiers. The SVMIP model is the most computationally inefficient of the five models and also performs poorly.
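The comparison criterion named in the abstract, overall holdout sample misclassification cost, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `misclassification_cost`, the 0/1 class labels, and the cost values (1.0 for a false positive, 5.0 for a false negative).

```python
from typing import Sequence


def misclassification_cost(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    cost_fp: float = 1.0,  # assumed cost of predicting 1 when the truth is 0
    cost_fn: float = 5.0,  # assumed cost of predicting 0 when the truth is 1
) -> float:
    """Sum asymmetric misclassification costs over a holdout sample."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == 0 and pred == 1:
            total += cost_fp  # false positive
        elif truth == 1 and pred == 0:
            total += cost_fn  # false negative
        # correct predictions add no cost
    return total


# Hypothetical holdout labels and predictions (illustrative data only).
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]
print(misclassification_cost(y_true, y_pred))  # one FP + one FN -> 6.0
```

Under unequal costs, a classifier with more total errors can still have a lower overall cost than one with fewer errors, which is why a comparison of this kind ranks models by cost and not by accuracy.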