
Prediction Breast Cancer as Benign or Malignant in Apache Spark Framework
Author(s) - Wafaa S. Albaldawi, Rafah M. Almuttairi
Publication year - 2020
Publication title - IOP Conference Series: Materials Science and Engineering
Language(s) - English
Resource type - Journals
eISSN - 1757-899X
pISSN - 1757-8981
DOI - 10.1088/1757-899x/928/3/032046
Subject(s) - random forest , support vector machine , logistic regression , computer science , breast cancer , classifier (uml) , spark (programming language) , machine learning , data mining , artificial intelligence , cancer , medicine , programming language
A number of diseases increase the number of deaths worldwide, and breast cancer can be considered the most common of them. Therefore, there is a need to use classification and other data mining methods to study health datasets in order to support diagnosis and decision making. In this paper, a Support Vector Classifier model, the Logistic Regression algorithm, and the Random Forest algorithm are applied to the publicly available Wisconsin Breast Cancer dataset. The experiments are executed in a Scala environment on both single-node and multi-node Spark clusters. The results show high accuracy for the Support Vector Classifier model and a low error rate with less time consumed when compared with other studies. Authentication in Spark is applied in the application using the shared secret method.
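
The workflow summarized in the abstract can be sketched in Scala with Spark ML roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the file path data/breast_cancer.csv, the column names "id" and "diagnosis" (with values "M" and "B"), the 70/30 split, the hyperparameters, and the environment variables SPARK_MASTER and SPARK_SECRET are all hypothetical. It also assumes a CSV with a header row, numeric feature columns, and no missing values. The configuration keys spark.authenticate and spark.authenticate.secret are Spark's standard shared-secret settings.

```scala
import org.apache.spark.ml.classification.{LinearSVC, LogisticRegression, RandomForestClassifier}
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object BreastCancerSketch {

  // Error rate as the complement of accuracy.
  def errorRate(accuracy: Double): Double = 1.0 - accuracy

  // Runs `body` once and returns its result with the elapsed wall-clock seconds.
  def timed[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e9)
  }

  def main(args: Array[String]): Unit = {
    // SPARK_MASTER and SPARK_SECRET are hypothetical environment variables.
    // "local[*]" runs on a single node; a URL such as spark://host:7077
    // would target a multi-node standalone cluster.
    val spark = SparkSession.builder()
      .appName("BreastCancerSketch")
      .master(sys.env.getOrElse("SPARK_MASTER", "local[*]"))
      .config("spark.authenticate", "true") // enable shared-secret authentication
      .config("spark.authenticate.secret", sys.env.getOrElse("SPARK_SECRET", "change-me"))
      .getOrCreate()

    // Assumed layout: header row, an "id" column, a "diagnosis" column (M/B),
    // and numeric feature columns. The path is a placeholder.
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("data/breast_cancer.csv")

    val featureCols = raw.columns.filterNot(c => c == "id" || c == "diagnosis")

    // Encode malignant as 1.0 and benign as 0.0.
    val labeled = raw.withColumn("label", when(col("diagnosis") === "M", 1.0).otherwise(0.0))

    val data = new VectorAssembler()
      .setInputCols(featureCols)
      .setOutputCol("features")
      .transform(labeled)

    val Array(train, test) = data.randomSplit(Array(0.7, 0.3), seed = 42L)

    val evaluator = new MulticlassClassificationEvaluator()
      .setLabelCol("label")
      .setPredictionCol("prediction")
      .setMetricName("accuracy")

    // Times training, prediction, and evaluation together for one model.
    def report(name: String)(predict: => DataFrame): Unit = {
      val (accuracy, seconds) = timed(evaluator.evaluate(predict))
      println(f"$name%-20s accuracy=$accuracy%.4f error=${errorRate(accuracy)}%.4f time=$seconds%.2fs")
    }

    report("LinearSVC")(new LinearSVC().setMaxIter(100).setRegParam(0.1).fit(train).transform(test))
    report("LogisticRegression")(new LogisticRegression().setMaxIter(100).fit(train).transform(test))
    report("RandomForest")(new RandomForestClassifier().setNumTrees(50).fit(train).transform(test))

    spark.stop()
  }
}
```

On a standalone cluster, the same secret would need to be configured on the master and worker daemons as well, otherwise the application cannot connect. Reading the secret from the environment rather than hard-coding it keeps it out of source control.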