Graph coloring for extracting discriminative genes in cancer data
Author(s) - Mahfouz Mohamed A., Nepomuceno Juan A.
Publication year - 2019
Publication title - Annals of Human Genetics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.537
H-Index - 77
eISSN - 1469-1809
pISSN - 0003-4800
DOI - 10.1111/ahg.12297
Subject(s) - gene , dimensionality reduction , graph , discriminative model , correlation , microarray analysis techniques , computational biology , computer science , biology , data mining , genetics , mathematics , artificial intelligence , gene expression , theoretical computer science , geometry
Background and objective: The major difficulty in analyzing gene expression data for microarray‐based automated cancer diagnosis is the large number of genes (high dimensionality), many of them irrelevant (noise), compared with the very small number of samples. This study tackles the dimensionality reduction challenge in this area.

Methods: This study introduces a dimension‐reduction technique, termed the graph coloring approach (GCA), for microarray data‐based cancer classification. It analyzes the absolute correlation between gene–gene pairs and partitions genes into several hubs using graph coloring. GCA starts with a gene‐selection step in which the top relevant genes are selected using a biserial correlation. At each step, a gene from the ordered list of top relevant genes is selected as the hub gene (representative), and redundant genes are added to its group; the process is repeated recursively for the remaining genes. A gene is considered redundant if its absolute correlation with the hub gene is greater than a controlling threshold. A suitable range for the threshold is estimated by computing a percentage graph for the absolute correlation between gene–gene pairs. Each value in the estimated range for the threshold can efficiently produce a new feature subset.

Results: GCA achieved significant improvement over several existing techniques, with higher accuracy and a smaller number of features. In addition, the genes selected by this method are relevant genes according to the information stored in scientific repositories.

Conclusions: The proposed dimension‐reduction technique can help biologists accurately predict cancer in several areas of the body.
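The hub-grouping step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation and works only from the abstract's description. Several details are assumptions: relevance is scored with the point-biserial correlation (computed as the Pearson correlation against binary class labels), the function names `point_biserial_relevance` and `group_genes_by_hub` are hypothetical, as are the parameters `top_k` and `threshold`, and the grouping is a simple greedy loop rather than a full graph-coloring routine.

```python
import numpy as np


def point_biserial_relevance(X, y):
    """Absolute correlation of each gene (column of X) with binary labels y.

    Assumption: the point-biserial coefficient is used, which equals the
    Pearson correlation between a continuous variable and a 0/1 variable.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant genes (or labels) get relevance 0
    return np.abs(Xc.T @ yc) / denom


def group_genes_by_hub(X, y, top_k, threshold):
    """Greedy hub grouping over the top_k most relevant genes.

    Returns a dict mapping each hub gene index to the list of genes
    judged redundant with it (absolute correlation > threshold).
    """
    X = np.asarray(X, dtype=float)
    relevance = point_biserial_relevance(X, y)
    # Ordered list of top relevant genes, most relevant first.
    order = np.argsort(-relevance, kind="stable")[:top_k]
    # Absolute gene-gene correlation among the selected genes only.
    corr = np.abs(np.atleast_2d(np.corrcoef(X[:, order], rowvar=False)))

    remaining = list(range(len(order)))
    groups = {}
    while remaining:
        hub = remaining[0]  # most relevant gene not yet assigned
        redundant = [g for g in remaining[1:] if corr[hub, g] > threshold]
        groups[int(order[hub])] = [int(order[g]) for g in redundant]
        assigned = set(redundant) | {hub}
        remaining = [g for g in remaining if g not in assigned]
    return groups
```

Under these assumptions, the keys of the returned dictionary (the hub genes) form the reduced feature subset, and rerunning with a different `threshold` yields a different subset, which mirrors the abstract's remark that each threshold value in the estimated range produces a new feature subset.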
