A Hybridized Feature Selection and Extraction Approach for Enhancing Cancer Prediction Based on DNA Methylation
Author(s) - Abeer A. Raweh, Mohammed Nassef, Amr Badr
Publication year - 2018
Publication title - IEEE Access
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.587
H-Index - 127
ISSN - 2169-3536
DOI - 10.1109/access.2018.2812734
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Because aberrant DNA methylation plays a vital role in the development of diseases such as cancer, understanding its mechanism has become essential in recent years for early detection and diagnosis. Despite the advent of high-throughput technologies, classification using DNA methylation data still faces several challenges. The high dimensionality and high noise of DNA methylation data may degrade prediction accuracy. It is therefore increasingly important to employ robust computational tools, such as feature selection and extraction methods, to extract the informative features from among thousands and thereby improve cancer prediction. Using the DNA methylation degree in promoter and probe regions, this paper aims to predict cancer with a hybridized approach based on feature selection and feature extraction techniques. The suggested approach uses a filter feature selection method, the F-score, to overcome the high-dimensionality problem of DNA methylation data, and proposes an extraction model that employs three novel feature extraction methods: the peaks of the mean methylation density, the fast Fourier transform algorithm, and the symmetry between the methylation density of a sample and the mean methylation density of both sample types (normal and cancer). The goal is accurate cancer classification and reduced training time. To evaluate the reliability of our approach, the naïve Bayes, random forest, and support vector machine algorithms are used to predict different cancer types (breast, colon, head, kidney, lung, thyroid, and uterine) with and without the hybridized approach. The results show that classification accuracy improves in almost all cases, which also indirectly supports the reliability of the approach.
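The F-score filter step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so this assumes the commonly used two-class F-score (squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances). The function names and the toy methylation values are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def f_score(cancer, normal):
    """Two-class F-score of one feature (assumed definition, see lead-in).

    Each argument is a list of that feature's values in one class and
    must hold at least two values so a sample variance exists.
    """
    overall = mean(list(cancer) + list(normal))
    between = (mean(cancer) - overall) ** 2 + (mean(normal) - overall) ** 2
    within = variance(cancer) + variance(normal)  # sample variances (n - 1)
    if within == 0:
        # No spread inside either class: perfect separation or no signal.
        return float("inf") if between > 0 else 0.0
    return between / within


def top_k_features(cancer_rows, normal_rows, k):
    """Return (indices of the k best features, all scores)."""
    n_features = len(cancer_rows[0])
    scores = [
        f_score([row[j] for row in cancer_rows],
                [row[j] for row in normal_rows])
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Hypothetical methylation degrees: rows are samples, columns are features.
cancer_rows = [[0.90, 0.5, 0.20], [0.80, 0.4, 0.30], [0.85, 0.6, 0.25]]
normal_rows = [[0.10, 0.5, 0.20], [0.20, 0.6, 0.30], [0.15, 0.4, 0.25]]

selected, scores = top_k_features(cancer_rows, normal_rows, k=1)
```

In this toy data only the first feature differs between the two classes, so it receives the highest score and is the one kept. A filter of this kind scores each feature independently of any classifier, which is why it is cheap enough to apply to thousands of methylation features before the extraction step.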