Measurement of data complexity for classification problems with unbalanced data
Author(s) - Anwar Nafees, Jones Geoff, Ganesh Siva
Publication year - 2014
Publication title - Statistical Analysis and Data Mining: The ASA Data Science Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.381
H-Index - 33
eISSN - 1932-1872
pISSN - 1932-1864
DOI - 10.1002/sam.11228
We introduce a complexity measure for classification problems that takes account of deterioration in classifier performance as a result of class imbalance. The measure is based on k-nearest neighbors. We explore the choices of k and the distance metric through a simulation study, and illustrate the use of our measure, and related data visualization techniques, with real datasets from the literature.
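The abstract does not give the formula, so the following is only an illustrative sketch of how a k-nearest-neighbor complexity score can be made sensitive to class imbalance. It is not the authors' measure. It assumes a simple per-point statistic (the share of a point's k nearest neighbors that carry a different label), averages that statistic within each class, and then averages the classes with equal weight. The names `knn_class_complexity`, `euclidean` and `manhattan` are hypothetical. `k` and `dist` are exposed as parameters because the abstract names them as the choices to be explored.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean (L2) distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Point, b: Point) -> float:
    """Manhattan (L1) distance, an alternative metric to compare against."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_class_complexity(
    X: Sequence[Point],
    y: Sequence[str],
    k: int = 5,
    dist: Callable[[Point, Point], float] = euclidean,
) -> Tuple[float, Dict[str, float]]:
    """Hypothetical kNN-based complexity score, NOT the paper's exact measure.

    Returns (overall score, per-class scores), each in [0, 1]; higher means
    the classes overlap more in the neighborhoods defined by k and dist.
    """
    n = len(X)
    if len(y) != n:
        raise ValueError("X and y must have the same length")
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(X)")

    per_class: Dict[str, List[float]] = defaultdict(list)
    for i in range(n):
        # Distances from point i to every other point, nearest first;
        # ties are broken by index because tuples compare element-wise.
        ranked = sorted((dist(X[i], X[j]), j) for j in range(n) if j != i)
        neighbors = [j for _, j in ranked[:k]]
        # Share of the k neighbors whose label differs from point i's label.
        per_class[y[i]].append(sum(y[j] != y[i] for j in neighbors) / k)

    # Average within each class, then give every class equal weight so a
    # small minority class counts as much as a large majority class.
    class_scores = {c: sum(v) / len(v) for c, v in per_class.items()}
    overall = sum(class_scores.values()) / len(class_scores)
    return overall, class_scores


if __name__ == "__main__":
    # Four majority points on a unit square, one minority point at its center.
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
    y = ["a", "a", "a", "a", "b"]
    overall, scores = knn_class_complexity(X, y, k=2)
    print(scores)   # {'a': 0.5, 'b': 1.0}
    print(overall)  # 0.75
```

Averaging over all five points instead would give (4 × 0.5 + 1.0) / 5 = 0.6; the per-class average of 0.75 lets the single, fully overlapped minority point pull the score up, which is one simple way to reflect imbalance. With `dist=manhattan` the same call returns 0.5, because the center point is no longer strictly closer to each corner than the corners' own neighbors (ties are broken by index), a small illustration of why the choice of metric matters.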