Duo Bundling Algorithms for Data Preprocessing: Case Study of Breast Cancer Data Prediction
Author(s) -
Janjira Jojan,
Agnart Srivihok
Publication year - 2014
Publication title -
Lecture Notes on Software Engineering
Language(s) - English
Resource type - Journals
ISSN - 2301-3559
DOI - 10.7763/lnse.2014.v2.153
Subject(s) - breast cancer , data pre processing , computer science , preprocessor , algorithm , data mining , artificial intelligence , cancer , medicine
Abstract - Classification of imbalanced datasets is one of the most popular and challenging problems for researchers today. This paper proposes a two-step approach to improve the quality of class prediction on an imbalanced breast cancer dataset. The approach consists of two main techniques: 1) feature selection, to filter out unimportant features from the dataset; and 2) over-sampling, to adjust the size of the minority class to be similar to that of the majority class. Three classification algorithms were applied: an artificial neural network (MLP), a decision tree (C4.5), and Naive Bayes. The classification results indicated that C4.5 was the most suitable for this dataset, giving the highest accuracy of 83.80%.
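The two-step preprocessing described in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the specific feature selection or over-sampling methods used, so everything below is an assumption made for illustration: the mean-difference filter score, plain random over-sampling with replacement, the function names `select_features` and `random_oversample`, and the toy data are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def select_features(rows, labels, k):
    """Step 1 (assumed filter): keep the k features whose per-class means differ most."""
    classes = sorted(set(labels))
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        means = []
        for c in classes:
            vals = [r[j] for r, y in zip(rows, labels) if y == c]
            means.append(sum(vals) / len(vals))
        scores.append(max(means) - min(means))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])  # preserve original column order
    return keep, [[r[j] for j in keep] for r in rows]


def random_oversample(rows, labels, seed=0):
    """Step 2 (assumed method): duplicate minority rows at random until classes match in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = [list(r) for r in rows], list(labels)
    for c, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == c]
        for _ in range(target - n):
            out_rows.append(list(rng.choice(pool)))
            out_labels.append(c)
    return out_rows, out_labels


# Hypothetical toy data: 4 majority-class rows (label 0), 2 minority-class rows (label 1).
rows = [
    [1.0, 5.0, 0.2],
    [1.1, 5.1, 0.9],
    [0.9, 4.9, 0.1],
    [1.0, 5.2, 0.8],
    [3.0, 5.0, 0.9],
    [3.2, 5.1, 1.1],
]
labels = [0, 0, 0, 0, 1, 1]

keep, reduced = select_features(rows, labels, k=2)
balanced_rows, balanced_labels = random_oversample(reduced, labels)
print("kept feature indices:", keep)
print("class counts after over-sampling:", dict(Counter(balanced_labels)))
```

The balanced, reduced dataset would then be passed to a classifier such as a decision tree. In practice, over-sampling should be applied only to the training split so that duplicated minority rows do not leak into the evaluation data.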