Predicting Characteristics Associated with Breast Cancer Survival Using Multiple Machine Learning Approaches
Author(s) -
Mohammad Nazmul Haque,
Tahia Tazin,
Mohammad Monirujjaman Khan,
Shahla Faisal,
Sobhee Md. Ibraheem,
Haneen Algethami,
Faris A. Almalki
Publication year - 2022
Publication title -
Computational and Mathematical Methods in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.462
H-Index - 48
eISSN - 1748-6718
pISSN - 1748-670X
DOI - 10.1155/2022/1249692
Subject(s) - random forest , breast cancer , logistic regression , machine learning , decision tree , artificial intelligence , medicine , support vector machine , adaboost , population , oncology , cancer , computer science , environmental health
Breast cancer is one of the most commonly diagnosed cancers in women worldwide. Numerous studies have sought to predict survival markers, but most of these analyses relied on simple statistical techniques. This research instead employed machine learning approaches to develop models for identifying and visualizing relevant prognostic indicators of breast cancer survival. A comprehensive hospital-based breast cancer dataset was collected from the November 2017 update of the National Cancer Institute's SEER Program, which offers population-based cancer statistics. The dataset included female patients diagnosed between 2006 and 2010 with infiltrating duct and lobular carcinoma breast cancer (SEER primary sites recode NOS histology codes 8522/3). It comprised nine predictor variables and one outcome variable, the patient's survival status (alive or dead). To identify important prognostic markers associated with breast cancer survival, prediction models were constructed using K-nearest neighbor (K-NN), decision tree (DT), gradient boosting (GB), random forest (RF), AdaBoost, logistic regression (LR), voting classifier, and support vector machine (SVM). All methods yielded close results in terms of model accuracy and calibration measures, with the lowest accuracy obtained from logistic regression (80.57%) and the highest from random forest (94.64%). Notably, the machine learning algorithms used in this research achieved high accuracy, suggesting that these approaches might serve as alternative prognostic tools in breast cancer survival studies, especially in Asia.
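The model comparison described in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the abstract does not state the library, hyperparameters, preprocessing, or train/test split, so scikit-learn, its default settings, a 75/25 stratified split, and the members of the voting ensemble are all assumptions. A synthetic nine-feature dataset stands in for the SEER data, so the accuracies it prints will not match the reported figures.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=0):
    """Return the eight classifier families named in the abstract.

    All hyperparameters are scikit-learn defaults (an assumption).
    Distance- and margin-based models get a scaling step.
    """
    lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    rf = RandomForestClassifier(random_state=seed)
    gb = GradientBoostingClassifier(random_state=seed)
    return {
        "K-NN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "DT": DecisionTreeClassifier(random_state=seed),
        "GB": gb,
        "RF": rf,
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        "LR": lr,
        "SVM": make_pipeline(StandardScaler(), SVC(random_state=seed)),
        # Hypothetical ensemble membership; the abstract does not list it.
        "Voting": VotingClassifier(
            estimators=[("lr", lr), ("rf", rf), ("gb", gb)], voting="hard"
        ),
    }


def evaluate_models(seed=0):
    """Fit each model on synthetic data and return held-out accuracy."""
    # Synthetic stand-in: nine predictors, binary outcome (alive/dead).
    X, y = make_classification(
        n_samples=600, n_features=9, n_informative=5, random_state=seed
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    scores = {}
    for name, model in build_models(seed).items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    for name, acc in sorted(evaluate_models().items(), key=lambda kv: kv[1]):
        print(f"{name:10s} accuracy = {acc:.4f}")
```

Running the script prints one accuracy per model, sorted from lowest to highest, which mirrors how the abstract reports the range between logistic regression and random forest. Replacing the synthetic data with the actual SEER extract would require loading and encoding the nine predictors, which the abstract does not enumerate.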