A Hybrid Data Mining Technique for Improving the Classification Accuracy of Microarray Data Set
Author(s) - S. Dash, Bichitrananda Patra, B. K. Tripathy
Publication year - 2012
Publication title - International Journal of Information Engineering and Electronic Business
Language(s) - English
Resource type - Journals
eISSN - 2074-9023
pISSN - 2074-9031
DOI - 10.5815/ijieeb.2012.02.07
Subject(s) - support vector machine, computer science, artificial intelligence, feature selection, radial basis function, pattern recognition (psychology), multilayer perceptron, dimensionality reduction, classifier (uml), perceptron, radial basis function kernel, polynomial kernel, data mining, machine learning, kernel method, artificial neural network
Abstract - A major challenge in biomedical studies in recent years has been the classification of gene expression profiles into categories, such as cases and controls. This is done by first training a classifier on a set of labeled samples from the two populations, and then using that classifier to predict the labels of new samples. Such predictions have recently been shown to improve diagnosis and treatment selection for several diseases. This procedure is complicated, however, by the high dimensionality of the data. While microarrays can measure the expression levels of thousands of genes per sample, case-control microarray studies usually involve no more than several dozen samples. Standard classifiers do not work well in these situations, where the number of features (the gene expression levels measured by the microarrays) far exceeds the number of samples. Selecting only the features that are most relevant for discriminating between the two categories can help construct better classifiers, in terms of both accuracy and efficiency. This paper compares a dimension reduction technique, namely the Partial Least Squares (PLS) method, with a hybrid feature selection scheme, and evaluates the relative performance of four supervised classification procedures that incorporate those methods: Radial Basis Function Network (RBFN), Multilayer Perceptron Network (MLP), Support Vector Machine with a polynomial kernel function (Polynomial-SVM), and Support Vector Machine with an RBF kernel function (RBF-SVM). Experimental results show that the Partial Least Squares (PLS) regression method is an appropriate feature selection method, and that combining different classification and feature selection approaches makes it possible to construct high-performance classification models for microarray data.
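To make the dimension reduction step concrete, the following is a minimal sketch of how PLS can compress many gene features into a few components before classification. It assumes a single binary label and the NIPALS-style PLS1 procedure with deflation; the abstract does not say which PLS variant the authors used. The function names (`pls1_scores`, `center_columns`, `dot`) and the toy expression matrix are hypothetical, invented for illustration, and no classifier is included.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def center_columns(X):
    """Subtract each column's mean so every feature has mean zero."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]


def pls1_scores(X, y, n_components):
    """Return per-sample PLS1 scores (one list of component scores per sample).

    Assumption: PLS1 computed NIPALS-style, deflating X and y after each
    component. This is an illustrative sketch, not the paper's code.
    """
    Xc = center_columns(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    n, p = len(Xc), len(Xc[0])
    scores = [[] for _ in range(n)]
    for _ in range(n_components):
        # Weight vector: direction in feature space with maximal
        # covariance between the projected samples and the label.
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break  # no remaining covariance with the label
        w = [v / norm for v in w]
        # Scores: each sample projected onto the weight vector.
        t = [dot(row, w) for row in Xc]
        tt = dot(t, t)
        if tt == 0.0:
            break
        # Loadings for X and regression coefficient for y on the scores.
        load = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = dot(yc, t) / tt
        # Deflate: remove what this component already explains.
        Xc = [[Xc[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        for i in range(n):
            scores[i].append(t[i])
    return scores


# Hypothetical toy data: 4 samples x 6 "genes", more features than samples.
X = [
    [5.1, 0.2, 3.3, 1.0, 2.2, 0.9],
    [4.8, 0.1, 3.0, 1.2, 2.0, 1.1],
    [1.0, 0.3, 3.1, 4.9, 2.1, 1.0],
    [1.3, 0.2, 3.4, 5.2, 2.3, 0.8],
]
y = [0, 0, 1, 1]  # 0 = control, 1 = case

scores = pls1_scores(X, y, n_components=2)
```

Each sample is now described by two component scores instead of six expression values, and those scores would be the inputs to a classifier such as an SVM or MLP. Unlike an unsupervised projection, the weight vectors here are chosen using the labels, which is why PLS is attractive when features vastly outnumber samples.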