
Data Mining and Principal Component Analysis on Coimbra Breast Cancer Dataset
Author(s) - Anupam Sen
Publication year - 2021
Publication title - AIJR Proceedings
Language(s) - English
Resource type - Conference proceedings
ISSN - 2582-3922
DOI - 10.21467/proceedings.115.5
Subject(s) - boosting (machine learning) , principal component analysis , artificial intelligence , computer science , mean squared error , statistic , machine learning , feature selection , feature extraction , cohen's kappa , gradient boosting , cross validation , pattern recognition (psychology) , data mining , mathematics , statistics , random forest
Machine Learning (ML) techniques play an important role in the medical field, where early diagnosis is required to improve the treatment of carcinoma. In this paper, the Breast Cancer Coimbra Dataset (BCCD) from the UCI repository, with ten predictors, is analyzed to classify carcinoma. A feature selection method and machine learning algorithms are applied to the dataset, and Principal Component Analysis (PCA) is used for feature extraction. The WEKA (“Waikato Environment for Knowledge Analysis”) tool is used to run the machine learning techniques. The classification algorithms applied through WEKA are Glmnet, Gbm, ada Boosting, Adabag Boosting, C50, Cforest, DcSVM, fnn, Ksvm, and Node Harvest. They are compared on accuracy as well as on the Kappa statistic, Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). The 10-fold cross-validation method is used for training, testing, and validation.
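The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's WEKA workflow: the function names (`k_fold_indices`, `cohen_kappa`, `mean_absolute_error`, `root_mean_square_error`) and the toy labels are hypothetical, and computing MAE and RMSE on the predicted probability of the positive class is an assumption about how the errors are defined.

```python
import math
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cohen_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: product of the marginal class frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: a single class on both sides
    return (observed - expected) / (1.0 - expected)


def mean_absolute_error(y_true, y_prob):
    """Average absolute gap between 0/1 labels and predicted probabilities."""
    return sum(abs(t - p) for t, p in zip(y_true, y_prob)) / len(y_true)


def root_mean_square_error(y_true, y_prob):
    """Square root of the average squared gap; penalizes large errors more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true))


if __name__ == "__main__":
    # Toy values for illustration only; they are not results from the paper.
    labels = [1, 1, 0, 0]
    predicted = [1, 0, 0, 0]
    probabilities = [0.9, 0.4, 0.2, 0.1]
    print("kappa:", cohen_kappa(labels, predicted))
    print("MAE:", mean_absolute_error(labels, probabilities))
    print("RMSE:", root_mean_square_error(labels, probabilities))
    print("folds:", sum(1 for _ in k_fold_indices(25, k=10)))
```

In a full 10-fold run, each classifier would be trained on the `train` indices and scored on the `test` indices of every fold, with the measures then averaged or pooled across folds. The folds here are contiguous for simplicity; shuffling or stratifying by class before splitting is a common refinement.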