
Cancer Classification using Ensemble Feature Selection and Random Forest Classifier
Author(s) -
Nimrita Koul,
Sunilkumar S. Manvi
Publication year - 2021
Publication title -
IOP Conference Series: Materials Science and Engineering
Language(s) - English
Resource type - Journals
eISSN - 1757-899X
pISSN - 1757-8981
DOI - 10.1088/1757-899x/1074/1/012004
Subject(s) - random forest, feature selection, classifier (machine learning), gene selection, computer science, ensemble learning, microarray analysis techniques, artificial intelligence, data mining, pattern recognition, random subspace method, selection (genetic algorithm), gene, computational biology, gene expression, biology, genetics
High volumes of genomic data produced by high-throughput technologies such as next-generation sequencing and gene expression microarrays have made it possible to build computational models that analyse this data and infer meaningful insights, such as the presence of a disease, the nature of the disease and, in cancers, the location of the tumour. Gene expression data is very high dimensional (each gene is one dimension) yet has very few observations, so it is imperative to apply feature selection before using the data for a classification task. In this paper, we propose a method for classifying human cancer types by analysing microarray gene expression data. We use an ensemble feature selection algorithm to select subsets of 5, 10, 20 and 30 genes, and we apply random forest classifiers to obtain the classification accuracy and other performance measures for comparison with existing solutions. With our algorithm, we obtain 100% classification accuracy using just 5 genes on the colon cancer data set.
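The abstract does not say which rankers make up the ensemble, so the following is only a minimal sketch of one common form of ensemble feature selection (mean-rank aggregation), not the authors' algorithm. It assumes a two-class problem such as tumour versus normal tissue, and three filter scores chosen here purely for illustration: signal-to-noise ratio, absolute Welch t-statistic, and |AUC - 0.5|. Every function name (`ensemble_select`, `snr_score`, and so on) is hypothetical.

```python
import math
import statistics
from typing import Callable, List, Sequence, Tuple

# Assumed layout for this sketch: rows are samples, columns are genes.
Matrix = Sequence[Sequence[float]]
Scorer = Callable[[List[float], List[float]], float]


def split_by_class(column: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Split one gene's expression values into class-0 and class-1 groups."""
    a = [x for x, lab in zip(column, labels) if lab == 0]
    b = [x for x, lab in zip(column, labels) if lab == 1]
    return a, b


def snr_score(a: List[float], b: List[float]) -> float:
    """Signal-to-noise ratio |mean_a - mean_b| / (sd_a + sd_b)."""
    denom = statistics.stdev(a) + statistics.stdev(b)
    return abs(statistics.mean(a) - statistics.mean(b)) / (denom or 1e-12)


def welch_t_score(a: List[float], b: List[float]) -> float:
    """Absolute Welch t-statistic (unequal variances)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return abs(statistics.mean(a) - statistics.mean(b)) / (se or 1e-12)


def auc_score(a: List[float], b: List[float]) -> float:
    """|AUC - 0.5| from Mann-Whitney pair counts; ties count as half a win."""
    wins = sum(1.0 if x > z else 0.5 if x == z else 0.0 for x in b for z in a)
    return abs(wins / (len(a) * len(b)) - 0.5)


# Illustrative ensemble members (an assumption, not taken from the paper).
SCORERS: List[Scorer] = [snr_score, welch_t_score, auc_score]


def ranks_from_scores(scores: Sequence[float]) -> List[int]:
    """Rank genes so that 0 is the highest score; ties keep gene order."""
    order = sorted(range(len(scores)), key=lambda g: -scores[g])
    ranks = [0] * len(scores)
    for r, g in enumerate(order):
        ranks[g] = r
    return ranks


def ensemble_select(X: Matrix, y: Sequence[int], k: int) -> List[int]:
    """Return indices of the k genes with the best mean rank across SCORERS."""
    n_genes = len(X[0])
    columns = [[row[g] for row in X] for g in range(n_genes)]
    mean_rank = [0.0] * n_genes
    for scorer in SCORERS:
        scores = [scorer(*split_by_class(col, y)) for col in columns]
        for g, r in enumerate(ranks_from_scores(scores)):
            mean_rank[g] += r / len(SCORERS)
    # Lower mean rank means the gene is consistently top-ranked; ties go to the lower index.
    return sorted(range(n_genes), key=lambda g: (mean_rank[g], g))[:k]


if __name__ == "__main__":
    # Toy data: 6 samples x 4 genes; gene 1 separates the two classes most clearly.
    X = [
        [1.0, 2.0, 3.0, 0.5],
        [1.2, 2.1, 1.0, 0.7],
        [0.9, 1.9, 2.0, 0.6],
        [1.1, 5.0, 2.5, 1.5],
        [1.0, 5.2, 1.5, 1.2],
        [1.3, 4.8, 2.0, 1.4],
    ]
    y = [0, 0, 0, 1, 1, 1]
    print(ensemble_select(X, y, 2))  # -> [1, 3]
```

Each returned index list (for k = 5, 10, 20 or 30, as in the abstract) would then be used to keep only those columns before fitting a random forest, for example scikit-learn's `RandomForestClassifier`. On small microarray sets, the selection step is usually repeated inside each cross-validation fold rather than run once on all samples, so that the reported accuracy is not inflated by selection bias.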