Weighted k-nearest neighbor based data complexity metrics for imbalanced datasets
Author(s) - Singh Deepika, Gosain Anjana, Saha Anju
Publication year - 2020
Publication title - Statistical Analysis and Data Mining: The ASA Data Science Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.381
H-Index - 33
eISSN - 1932-1872
pISSN - 1932-1864
DOI - 10.1002/sam.11463
Subject(s) - computer science, classifier (uml), k nearest neighbors algorithm, data mining, artificial intelligence, machine learning, pattern recognition (psychology)
The empirical behavior of a classifier depends strongly on the characteristics of the underlying imbalanced dataset; an analysis of intrinsic data complexity therefore appears vital for choosing classifiers suited to particular problems. Data complexity metrics (CMs), a fairly recent proposal, identify dataset features that make the classification task difficult and relate those features to classifier accuracy. In this paper, we introduce two CMs for imbalanced datasets that help explain the factors responsible for the deterioration in classifier performance. These metrics are based on the weighted k-nearest neighbors approach. The experiments are performed in MATLAB using 48 simulated datasets and 22 real-world datasets, with the neighborhood size k set to 3, 5, 7, 9, and 11. The results illustrate the usefulness of the proposed metrics.
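
The abstract does not define the two proposed metrics, so the following is only a minimal sketch of the general idea of a weighted k-nearest-neighbor neighborhood score, not the paper's CMs. It assumes inverse-distance weights, Euclidean distance, and a hypothetical helper name `weighted_knn_same_class_score`; the synthetic two-class data is likewise invented for illustration. It is written in Python rather than the MATLAB used in the experiments.

```python
import numpy as np

def weighted_knn_same_class_score(X, y, k):
    """For each sample, return the inverse-distance-weighted share of its
    k nearest neighbors (excluding itself) that carry the same label.
    Illustrative only; not the metric definitions from the paper."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances; the diagonal is set to inf so that a
    # point is never selected as its own neighbor.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)
    idx = np.argsort(dist, axis=1)[:, :k]        # indices of the k nearest
    d = np.take_along_axis(dist, idx, axis=1)    # their distances
    w = 1.0 / (d + 1e-12)                        # assumed weighting scheme
    same = (y[idx] == y[:, None]).astype(float)  # 1 where labels agree
    return (w * same).sum(axis=1) / w.sum(axis=1)

# Hypothetical imbalanced toy data: 90 majority vs. 10 minority samples.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(90, 2)),
               rng.normal(1.5, 1.0, size=(10, 2))])
y = np.array([0] * 90 + [1] * 10)

# Neighborhood sizes taken from the abstract.
for k in (3, 5, 7, 9, 11):
    scores = weighted_knn_same_class_score(X, y, k)
    print(k, float(scores[y == 1].mean()))
```

A low mean score on the minority class suggests that minority samples sit mostly among majority neighbors, which is one plausible reading of the "difficulty" that such metrics aim to capture.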
