
Machine Learning Application on Prediction of Male Breast Cancer with PLCO Dataset
Author(s) -
Juntao Li,
Ganesh Mani
Publication year - 2021
Publication title -
Journal of Student Research
Language(s) - English
Resource type - Journals
ISSN - 2167-1907
DOI - 10.47611/jsrhs.v10i3.2199
Subject(s) - decision tree , logistic regression , machine learning , breast cancer , random forest , support vector machine , artificial intelligence , computer science , receiver operating characteristic , prostate cancer , cancer , medicine , oncology
The objective of this paper is to examine the applicability of machine learning models to predicting Male Breast Cancer (MBC) using the PLCO dataset. Because MBC is rare, men are often unaware that they are at risk and are unlikely to seek screening in advance. This research therefore uses the PLCO (Prostate, Lung, Colorectal and Ovarian) Cancer Screening Trial dataset from the National Cancer Institute, which includes features such as age, prostate status, and marital status, for detection. The main purpose of using PLCO data is to identify the potential risk of MBC as early as possible from information that is inexpensive and easy to collect. To find the most suitable models for detecting MBC from this non-traditional dataset, several existing models, including decision tree, random forest, DBSCAN, and One-Class SVM, among others, were fitted to the data. Because the dataset is extremely imbalanced, the models were evaluated using a combination of standard accuracy and the Area Under the Receiver Operating Characteristic curve (AUROC). K-means and Logistic Regression performed best, with AUC scores of 0.62 and 0.67, respectively. The results suggest that further study is needed, either on more efficient approaches to male breast cancer diagnosis or on more advanced models and algorithms.
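To illustrate why accuracy alone can mislead on an extremely imbalanced dataset and why AUROC is reported alongside it, the following is a minimal sketch in plain Python. It is not the authors' code: the labels are synthetic (10 positives in 1,000 samples, an assumed ratio chosen only for illustration, not taken from PLCO), and the function names `accuracy` and `auroc` are hypothetical helpers. AUROC is computed here from its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels that were predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auroc(y_true, scores):
    """AUROC via pairwise comparison (Mann-Whitney U statistic).

    Counts how often a positive case is scored above a negative case,
    with ties counted as one half, divided by the number of pairs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Synthetic, illustrative labels only (assumption): 10 positives, 990 negatives.
y_true = [1] * 10 + [0] * 990

# A trivial "model" that always predicts the majority (negative) class.
majority_pred = [0] * len(y_true)
majority_scores = [0.0] * len(y_true)

print("accuracy:", accuracy(y_true, majority_pred))   # 0.99
print("AUROC:   ", auroc(y_true, majority_scores))    # 0.5
```

The majority-class predictor reaches 99% accuracy without identifying a single positive case, while its AUROC of 0.5 shows it ranks cases no better than chance. Read against that baseline, the reported AUC scores of 0.62 and 0.67 indicate only modest discriminative ability.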