
Optimal Classification of Lung Cancer Related Genes using Enhanced reliefF Algorithm and Multiclass Support Vector Machine
Author(s) -
Ashok Kumar Patil,
Siddanagoda S Patil,
M Prabhakar
Publication year - 2019
Publication title -
International Journal of Innovative Technology and Exploring Engineering
Language(s) - English
Resource type - Journals
ISSN - 2278-3075
DOI - 10.35940/ijitee.j8901.0881019
Subject(s) - support vector machine , computer science , classifier , artificial intelligence , curse of dimensionality , gene selection , feature selection , multiclass classification , word error rate , machine learning , firefly algorithm , pattern recognition , algorithm , data mining , gene , gene expression , biology , genetics , microarray analysis techniques , particle swarm optimization
Automatic lung cancer classification remains challenging for researchers because gene expression data are noisy and high-dimensional, and sample sizes are small. To address these problems, an enhanced gene selection algorithm and a multiclass classifier are developed. In this research, lung cancer-related gene expression datasets (GEO IDs: GSE10245, GSE19804, GSE7670, GSE10072, and GSE6044) were collected from the Gene Expression Omnibus (GEO) database. The optimal genes were then selected with an enhanced ReliefF algorithm, in which the earth mover's distance measure and a firefly optimizer replace the Manhattan distance measure for identifying the nearest-hit and nearest-miss instances; this significantly reduces the "curse of dimensionality" problem. The selected genes were given as input to a Multiclass Support Vector Machine (MSVM) classifier to classify the subclasses of lung cancer. The experiments showed that the proposed system improved classification accuracy by 3–10% over existing systems, as measured by accuracy, False Positive Rate (FPR), error rate, and True Positive Rate (TPR).
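The gene selection step can be illustrated with a small ReliefF sketch in which the distance used to find nearest hits and misses is swapped from Manhattan distance to earth mover's distance (EMD). This is a sketch under stated assumptions, not the authors' implementation. The abstract does not say how EMD is applied to expression profiles, so here each sample's profile is assumed to be nonnegative, is normalized to unit mass, and is treated as a histogram over gene positions; under that assumption the 1-D EMD is the sum of absolute differences between cumulative sums. The firefly optimizer is omitted entirely, and the names `emd_1d`, `relieff_weights`, and the toy data are hypothetical.

```python
from collections import Counter


def manhattan(a, b):
    """Baseline distance used by standard ReliefF."""
    return sum(abs(x - y) for x, y in zip(a, b))


def emd_1d(a, b):
    """EMD between two nonnegative profiles treated as histograms over the
    same ordered bins (assumption: gene index = bin position). Profiles are
    normalized to unit mass; with unit bin spacing, the 1-D EMD equals the
    L1 distance between the two cumulative distributions."""
    sa, sb = sum(a), sum(b)
    ca = cb = total = 0.0
    for x, y in zip(a, b):
        ca += x / sa
        cb += y / sb
        total += abs(ca - cb)
    return total


def relieff_weights(X, y, k=1, dist=emd_1d):
    """Standard ReliefF weight update, visiting every sample once.
    `dist` only decides which samples count as nearest hits/misses."""
    n, n_genes = len(X), len(X[0])
    # Per-gene value range, used to scale diff() into [0, 1].
    lo = [min(row[g] for row in X) for g in range(n_genes)]
    hi = [max(row[g] for row in X) for g in range(n_genes)]
    span = [(h - l) or 1.0 for h, l in zip(hi, lo)]
    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    w = [0.0] * n_genes

    def diff(g, i, j):
        return abs(X[i][g] - X[j][g]) / span[g]

    for i in range(n):
        # All other samples, closest first under the chosen distance.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(X[i], X[j]))
        hits = [j for j in others if y[j] == y[i]][:k]
        for g in range(n_genes):
            w[g] -= sum(diff(g, i, h) for h in hits) / (n * k)
        for c in prior:
            if c == y[i]:
                continue
            misses = [j for j in others if y[j] == c][:k]
            # Misses from each other class are weighted by that class's prior.
            scale = prior[c] / (1.0 - prior[y[i]])
            for g in range(n_genes):
                w[g] += scale * sum(diff(g, i, m) for m in misses) / (n * k)
    return w


# Toy data: gene 0 separates the classes, gene 1 is noise, gene 2 is constant.
X = [[5.0, 1.0, 2.0], [5.2, 2.0, 2.0], [1.0, 1.1, 2.0], [1.1, 2.1, 2.0]]
y = ["A", "A", "B", "B"]
w = relieff_weights(X, y, k=1)
ranking = sorted(range(len(w)), key=lambda g: -w[g])  # [0, 2, 1]
```

Genes with the highest weights would be kept and passed on to the classifier; passing `dist=manhattan` gives the unmodified baseline for comparison.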
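The evaluation measures named above can be computed from true and predicted subclass labels. The abstract does not state how TPR and FPR are aggregated over several subclasses, so this sketch assumes one common convention: TPR = TP/(TP+FN) and FPR = FP/(FP+TN) are computed per class in a one-vs-rest fashion and then macro-averaged, and error rate is 1 minus accuracy. The function names and the labels "A", "B", "C" are hypothetical.

```python
def per_class_rates(y_true, y_pred):
    """One-vs-rest (TPR, FPR) for every class label (assumed definition)."""
    labels = sorted(set(y_true) | set(y_pred))
    rates = {}
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = len(pairs) - tp - fn - fp
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        rates[c] = (tpr, fpr)
    return rates


def summary(y_true, y_pred):
    """Accuracy, error rate, and macro-averaged TPR/FPR."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    rates = per_class_rates(y_true, y_pred)
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "macro_tpr": sum(r[0] for r in rates.values()) / len(rates),
        "macro_fpr": sum(r[1] for r in rates.values()) / len(rates),
    }


y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
s = summary(y_true, y_pred)
# accuracy 4/6, error rate 2/6, macro TPR 2/3, macro FPR 1/6
```

Macro-averaging gives each subclass equal weight regardless of its size; a micro-averaged or sample-weighted variant would give different numbers on imbalanced data.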