A new unsupervised feature selection algorithm using similarity‐based feature clustering
Author(s) - Zhu Xiaoyan, Wang Yu, Li Yingbin, Tan Yonghui, Wang Guangtao, Song Qinbao
Publication year - 2019
Publication title - Computational Intelligence
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.353
H-Index - 52
eISSN - 1467-8640
pISSN - 0824-7935
DOI - 10.1111/coin.12192
Subject(s) - cluster analysis, pattern recognition (psychology), feature (linguistics), feature selection, artificial intelligence, computer science, similarity (geometry), minimum redundancy feature selection, flame clustering, correlation clustering, data mining, single linkage clustering, feature vector, cure data clustering algorithm, philosophy, linguistics, image (mathematics)
Unsupervised feature selection is an important problem, especially for high‐dimensional data. However, it has so far been scarcely studied, and the existing algorithms do not provide satisfactory performance. In this paper, we therefore propose a new unsupervised feature selection algorithm using similarity‐based feature clustering, Feature Selection‐based Feature Clustering (FSFC). FSFC removes redundant features according to the results of feature clustering based on feature similarity. First, it clusters the features according to their similarity, using a newly proposed feature clustering algorithm that overcomes the shortcomings of K‐means. Second, it selects from each cluster a representative feature, which contains the most interesting information of the features in that cluster. The efficiency and effectiveness of FSFC are tested on real‐world data sets and compared with two representative unsupervised feature selection algorithms, Feature Selection Using Similarity (FSUS) and Multi‐Cluster‐based Feature Selection (MCFS), in terms of runtime, feature compression ratio, and the clustering results of K‐means. The results show that FSFC not only reduces the feature space in less time but also significantly improves the clustering performance of K‐means.
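The two-step idea in the abstract (cluster features by similarity, then keep one representative per cluster) can be illustrated with a minimal sketch. This is not the FSFC algorithm, whose clustering procedure and similarity measure are not given in the abstract. The sketch assumes absolute Pearson correlation as the similarity, a simple greedy threshold-based grouping, and "highest mean similarity within the cluster" as the rule for picking a representative. The function name `cluster_and_select` and the `threshold` parameter are hypothetical.

```python
import numpy as np


def cluster_and_select(X, threshold=0.8):
    """Group similar feature columns of X, then keep one feature per group.

    Illustrative only: the similarity measure, the greedy grouping, and the
    `threshold` parameter are assumptions, not the FSFC procedure itself.
    Assumes X has at least two non-constant columns.
    """
    # Assumed similarity: absolute Pearson correlation between columns.
    sim = np.atleast_2d(np.abs(np.corrcoef(X, rowvar=False)))

    # Step 1: greedy clustering. The first unassigned feature seeds a cluster
    # and absorbs every remaining feature whose similarity to it reaches the
    # threshold. This sketch does not fix the number of clusters in advance.
    unassigned = list(range(sim.shape[0]))
    clusters = []
    while unassigned:
        seed, rest = unassigned[0], unassigned[1:]
        members = [seed] + [j for j in rest if sim[seed, j] >= threshold]
        clusters.append(members)
        unassigned = [j for j in rest if j not in members]

    # Step 2: per cluster, keep the feature with the highest mean similarity
    # to the cluster's members (an assumed notion of "representative").
    representatives = []
    for members in clusters:
        block = sim[np.ix_(members, members)]
        representatives.append(members[int(np.argmax(block.mean(axis=1)))])
    return representatives, clusters
```

Calling `cluster_and_select(X)` on an array of shape `(n_samples, n_features)` returns the indices of the retained features and the clusters they were drawn from. In this sketch, a higher `threshold` yields more, smaller clusters and therefore retains more features.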