Weighted k-nearest neighbor based data complexity metrics for imbalanced datasets

Author(s) - Singh Deepika, Gosain Anjana, Saha Anju

Publication year - 2020

Publication title - Statistical Analysis and Data Mining: The ASA Data Science Journal

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.381

H-Index - 33

eISSN - 1932-1872

pISSN - 1932-1864

DOI - 10.1002/sam.11463

Subject(s) - computer science, classifier (uml), k nearest neighbors algorithm, data mining, artificial intelligence, machine learning, pattern recognition (psychology)

The empirical behavior of a classifier depends strongly on the characteristics of the underlying imbalanced dataset, so analyzing intrinsic data complexity is important for choosing classifiers suited to a particular problem. Data complexity metrics (CMs), a fairly recent proposal, identify dataset features that make the classification task difficult and relate those features to classifier accuracy. In this paper, we introduce two CMs for imbalanced datasets that help explain the factors responsible for the deterioration in classifier performance. These metrics are based on the weighted k-nearest neighbors approach. The experiments are performed in MATLAB using 48 simulated datasets and 22 real-world datasets, with neighborhood sizes k = 3, 5, 7, 9, and 11. The results illustrate the usefulness of the proposed metrics.
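For readers unfamiliar with the underlying technique, the following is a minimal Python sketch of a generic inverse-distance-weighted k-nearest neighbor vote. It is not the authors' complexity metrics, whose definitions are not given in this abstract. The helper name `weighted_knn_class_share`, the inverse-distance weighting scheme, and the toy data are all assumptions made purely for illustration.

```python
import math


def weighted_knn_class_share(points, labels, query, k, target_label, eps=1e-12):
    """Inverse-distance-weighted share of `target_label` among the k nearest
    neighbors of `query`. Hypothetical helper, not taken from the paper."""
    # Euclidean distance from the query to every labeled point
    dists = [(math.dist(p, query), lab) for p, lab in zip(points, labels)]
    # Keep the k closest points
    nearest = sorted(dists, key=lambda t: t[0])[:k]
    # Closer neighbors receive larger weights; eps guards against division by zero
    weights = [(1.0 / (d + eps), lab) for d, lab in nearest]
    total = sum(w for w, _ in weights)
    return sum(w for w, lab in weights if lab == target_label) / total


# Toy imbalanced data (assumed): three majority points, two minority points
points = [(0, 0), (1, 0), (0, 1), (5, 5), (5, 6)]
labels = ["maj", "maj", "maj", "min", "min"]
query = (4, 4)

# Weighted minority share around the query for two neighborhood sizes
shares = {k: weighted_knn_class_share(points, labels, query, k, "min") for k in (3, 5)}
for k, share in shares.items():
    print(f"k={k}: weighted minority share = {share:.3f}")
```

Because nearer neighbors count for more, the weighted share can differ noticeably from a plain majority count, which is one reason a weighted neighborhood may be informative when one class is rare.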
