Neighbor Embedding Feature Selected Light Gradient Boosting Classification for Breast Cancer Detection with Gene Expression Data

Open Access

Author(s) - Sanguthevar Rajasekaran, S Sathyabama

Publication year - 2019

Publication title - International Journal of Innovative Technology and Exploring Engineering

Language(s) - English

Resource type - Journals

ISSN - 2278-3075

DOI - 10.35940/ijitee.k1108.09811s19

Subject(s) - boosting (machine learning), pattern recognition (psychology), computer science, breast cancer, artificial intelligence, feature selection, random forest, gradient boosting, data mining, machine learning, cancer, biology, genetics

Breast cancer is one of the most frequently diagnosed cancers among women worldwide. Accurate detection of breast cancer is essential for providing better treatment and minimizing risk to patients. Recently, biological data such as gene expression, protein sequences, and DNA sequences have been used, thanks to improvements in accessible data mining techniques, to diagnose the disease at an earlier stage. Current state-of-the-art methods are reported to have certain limitations in their diagnostic capability. To improve breast cancer classification, an efficient technique called Gaussian Kernelized Neighbor Embedding based Light Gradient Boost Classification (GKNE-LGBC) is introduced. The GKNE-LGBC technique takes a benchmark microarray dataset and performs two processes, feature selection and classification, to detect breast cancer from gene expression data. The genes and their expression data are collected from the microarray dataset. The Gaussian kernelized stochastic neighbor embedding algorithm is then applied to select the relevant features (i.e., genes) and remove the irrelevant ones based on distance similarity. Next, the gene expression data is classified with a steepest descent light gradient boosting algorithm. The boosting algorithm first constructs a number of weak learners, i.e., bivariate regression trees, that classify the input expression data as normal or cancerous using the selected features. The weak classifiers are then combined into a strong classifier by minimizing the training error. This helps to improve breast cancer detection accuracy and minimize the false positive rate. The experimental evaluation is carried out on a gene microarray dataset using metrics such as breast cancer detection accuracy, false positive rate, and breast cancer detection time with respect to the number of genes. The experimental results confirm that the proposed GKNE-LGBC technique identifies breast cancer with higher accuracy, lower time complexity, and a lower false positive rate than state-of-the-art methods.
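
The two-stage pipeline described in the abstract (kernel-based gene selection followed by boosted regression trees) can be illustrated with a minimal sketch. This is not the authors' GKNE-LGBC implementation: the abstract does not give the scoring rule or the tree construction, so the sketch assumes that genes are ranked by a Gaussian kernel of the distance between each gene's expression profile and the class labels (with expression values scaled to [0, 1]), and it uses plain gradient boosting on squared loss with one-split regression trees as a stand-in for the light gradient boosting step. All function names, parameters, and the toy data are hypothetical.

```python
import math

def gaussian_similarity(u, v, sigma=1.0):
    """Gaussian kernel of the squared Euclidean distance between two vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def select_genes(X, y, k, sigma=1.0):
    """Keep the k genes whose profiles are most similar to the labels.
    Assumption: this scoring rule is illustrative, not taken from the paper."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((gaussian_similarity(column, y, sigma), j))
    top = sorted(scores, reverse=True)[:k]
    return sorted(j for _, j in top)

def fit_stump(X, residuals):
    """Find the single-split regression tree with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:] if best else None

def stump_predict(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value

def fit_boosting(X, y, n_rounds=20, lr=0.5):
    """Gradient boosting on squared loss: each weak learner fits the residuals,
    which are the negative gradient (steepest descent direction)."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        stumps.append(stump)
        preds = [p + lr * stump_predict(stump, row) for p, row in zip(preds, X)]
    return base, lr, stumps

def predict(model, row):
    """Combine the weak learners; 1 = cancerous, 0 = normal."""
    base, lr, stumps = model
    score = base + lr * sum(stump_predict(s, row) for s in stumps)
    return 1 if score >= 0.5 else 0

# Toy data (made up): 6 samples x 4 genes, values scaled to [0, 1].
X = [
    [0.90, 0.2, 0.7, 0.4],
    [0.80, 0.7, 0.6, 0.1],
    [0.95, 0.4, 0.8, 0.9],
    [0.10, 0.3, 0.3, 0.5],
    [0.20, 0.8, 0.4, 0.2],
    [0.05, 0.5, 0.2, 0.8],
]
y = [1, 1, 1, 0, 0, 0]

selected = select_genes(X, y, k=2)
X_reduced = [[row[j] for j in selected] for row in X]
model = fit_boosting(X_reduced, y)
train_predictions = [predict(model, row) for row in X_reduced]
```

On this toy data the selection step keeps the two genes that track the labels and discards the two noisy ones, and the boosted stumps then separate the classes using only the retained genes. In practice a library implementation of gradient boosting would replace the hand-written loop; the stdlib version is used here only to keep the sketch self-contained.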
