z-logo
Premium
Weighted ReliefF with threshold constraints of feature selection for imbalanced data classification
Author(s) -
Song Yan,
Si Weiyun,
Dai Feifan,
Yang Guisong
Publication year - 2020
Publication title -
concurrency and computation: practice and experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.5691
Subject(s) - feature selection , computer science , redundancy (engineering) , feature (linguistics) , data mining , artificial intelligence , class (philosophy) , minimum redundancy feature selection , selection (genetic algorithm) , pattern recognition (psychology) , machine learning , philosophy , linguistics , operating system
Summary Feature selection is a useful method for fulfilling the data classification since the inherent heterogeneity of data and the redundancy of features are often encountered in the current data exploding era. Some commonly used feature selection algorithms, which include but are not limited to Pearson, maximal information coefficient, and ReliefF, are well‐posed under the assumption that instances are distributed homogenously in datasets. However, such an assumption might be not true in the practice. As such, in the presence of data imbalance, these traditional feature selection algorithms might be invalid due to their prejudices to the minority class, which includes few samples. The purpose of the addressed problem in this article is to develop an effective feature selection algorithm for imbalanced judicial datasets, which is capable of extracting essential features while deleting negligible ones according to the practical feature requirements. To achieve this goal, the number and the distribution of samples in each class are fully taken into consideration for the correlation analysis. Compared with the traditional feature selection algorithms, the proposed improved ReliefF algorithm is equipped with: (i) different weights of features according to the characteristics of heterogeneous samples in different classes; (ii) justice for imbalanced datasets; and (iii) threshold constraints resulting from the practical feature requirements. Finally, experiments on a judicial dataset and six public datasets well illustrate the effectiveness and the superiority of the proposed feature selection algorithm in improving the classification accuracy for imbalanced datasets.

This content is not available in your region!

Continue researching here.

Having issues? You can contact us here