A Kernel-Based Multivariate Feature Selection Method for Microarray Data Classification

Shiquan Sun; Qinke Peng; Adnan Shakoor

Open Access

Author(s) - Shiquan Sun, Qinke Peng, Adnan Shakoor

Publication year - 2014

Publication title - PLOS ONE

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.99

H-Index - 332

ISSN - 1932-6203

DOI - 10.1371/journal.pone.0102541

Subject(s) - computer science, overfitting, artificial intelligence, feature selection, pattern recognition (psychology), multivariate statistics, support vector machine, kernel method, kernel (algebra), linear discriminant analysis, machine learning, curse of dimensionality, data mining, mathematics, artificial neural network, combinatorics

High dimensionality and small sample sizes, with their inherent risk of overfitting, pose great challenges for constructing efficient classifiers in microarray data classification. A feature selection step should therefore be carried out before classification to improve prediction performance. In general, filter methods can serve as the principal or an auxiliary selection mechanism because of their simplicity, scalability, and low computational complexity. However, a series of trivial examples shows that filter methods yield less accurate results because they ignore dependencies among features. Although a few publications have used multivariate methods to reveal relationships among features, these methods describe those relationships only linearly, and simple linear combinations limit the achievable improvement in performance. In this paper, we use a kernel method to discover inherent nonlinear correlations among features as well as between features and the target. Moreover, the number of orthogonal components is determined by kernel Fisher's linear discriminant analysis (FLDA) in a self-adaptive manner rather than by manual parameter settings. To demonstrate the effectiveness of our method, we performed several experiments and compared it with other competitive multivariate feature selectors. In the comparison, we used two classifiers (support vector machine and k-nearest neighbor) on two groups of datasets, namely two-class and multi-class datasets. Experimental results demonstrate that our method performs better than the others, especially on three hard-to-classify datasets, namely Wang's Breast Cancer, Gordon's Lung Adenocarcinoma, and Pomeroy's Medulloblastoma.
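The abstract's point about filter methods can be illustrated with a toy example. The sketch below is not the authors' selection procedure. It uses a hypothetical XOR-style grid of two features (neither is informative alone, but together they determine the class), a univariate correlation filter, and a standard two-class kernel Fisher discriminant with an RBF kernel. The data, the function names, and the values of `gamma` and `reg` are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical toy data: a symmetric 4x4 grid; the class is 1 when the
# two features share a sign (XOR-like), so only their interaction matters.
vals = np.array([-1.0, -0.5, 0.5, 1.0])
X = np.array([(a, b) for a in vals for b in vals])
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Univariate filter: absolute Pearson correlation of each feature with the label.
filter_scores = np.array(
    [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
)

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix exp(-gamma * ||xi - xj||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_fisher_projection(X, y, gamma=5.0, reg=1e-3):
    """Project samples onto a two-class kernel Fisher discriminant direction.

    gamma and reg are illustrative values, not taken from the paper.
    """
    K = rbf_kernel(X, gamma)
    n = len(y)
    N = np.zeros((n, n))          # within-class scatter in kernel space
    means = []
    for c in (0, 1):
        Kc = K[:, y == c]         # kernel columns for class c
        n_c = Kc.shape[1]
        means.append(Kc.mean(axis=1))
        center = np.eye(n_c) - np.full((n_c, n_c), 1.0 / n_c)
        N += Kc @ center @ Kc.T
    # Regularized solution of the Fisher criterion.
    alpha = np.linalg.solve(N + reg * np.eye(n), means[1] - means[0])
    return K @ alpha

z = kernel_fisher_projection(X, y)
# Classify by thresholding at the midpoint of the projected class means.
threshold = 0.5 * (z[y == 0].mean() + z[y == 1].mean())
accuracy = np.mean((z > threshold).astype(int) == y)

print("filter scores:", filter_scores)
print("kernel Fisher training accuracy:", accuracy)
```

On this grid, each feature's correlation with the label vanishes by symmetry, so a univariate filter would rank both features as uninformative, while the kernel projection of the pair separates the two classes. This is training-set behavior on synthetic data only and says nothing about the microarray results reported in the paper.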
