Open Access
Ovarian Cancer Prediction Using PCA, K-PCA, ICA and Random Forest
Author(s) -
Asiye Şahin,
Nermin Özcan,
Gökhan Nur
Publication year - 2021
Publication title -
Akıllı Sistemler ve Uygulamaları Dergisi (Journal of Intelligent Systems with Applications)
Language(s) - English
Resource type - Journals
ISSN - 2667-6893
DOI - 10.54856/jiswa.202112168
Subject(s) - random forest, ovarian cancer, cancer, training set, principal component analysis, data set, ovary, computer science, artificial intelligence, oncology, medicine, pattern recognition, statistics, mathematics
Ovarian cancer, one of the most common gynecological cancers in women, occurs mostly after menopause and develops through the uncontrolled proliferation of cells in the ovaries and the formation of tumors. Early diagnosis is very difficult, and in most cases the disease is already at an advanced stage when it is first diagnosed. While it can often be treated successfully in the early stages, when it is confined to the ovary, it is harder to treat in the advanced stages and is often fatal. For this reason, research has focused on predicting whether a patient has ovarian cancer. In this study, we designed a random forest (RF)-based ovarian cancer prediction model using a data set of 349 real patients with 49 features, including routine blood tests, general chemistry tests and tumor marker data. Because high-dimensional data increase the time and resources required, we reduced the dimensionality of the data with principal component analysis (PCA), kernel PCA (K-PCA) and independent component analysis (ICA) and examined the effect on prediction performance and training time. The best result, an F1 score of 0.895, was obtained by the RF model trained on the data reduced by PCA from 49 to 6 dimensions; training this model took 18.191 seconds. This result both scored higher and took less training time than the prediction made on the full 49-feature data without any dimensionality reduction. The study shows that, for machine learning predictions over large-scale medical data, dimensionality reduction can save time and resources while also improving prediction results.
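The comparison described in the abstract (RF on the raw 49 features versus RF on 6 components from PCA, K-PCA or ICA, scored by F1 and timed during training) can be sketched with scikit-learn. This is a minimal illustration, not the authors' code. The abstract does not state their preprocessing, kernel, train/test split or forest hyperparameters, so standardization, an RBF kernel for K-PCA, a stratified 70/30 split and default `RandomForestClassifier` settings are assumptions. A synthetic data set of the same shape (349 × 49) from `make_classification` stands in for the patient data, which is not reproduced here. The helper names `build_models` and `evaluate` are hypothetical.

```python
import time

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, FastICA, KernelPCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

N_COMPONENTS = 6  # the abstract reports PCA reducing 49 features to 6


def build_models(seed=0):
    """Return one pipeline per feature representation (hypothetical helper).

    Assumed: features are standardized first, K-PCA uses an RBF kernel,
    and the forest keeps scikit-learn defaults apart from a fixed seed.
    """
    reducers = {
        "none (49 features)": None,
        "PCA": PCA(n_components=N_COMPONENTS, random_state=seed),
        "K-PCA (RBF, assumed)": KernelPCA(n_components=N_COMPONENTS, kernel="rbf"),
        "ICA": FastICA(n_components=N_COMPONENTS, random_state=seed, max_iter=1000),
    }
    models = {}
    for name, reducer in reducers.items():
        steps = [StandardScaler()]
        if reducer is not None:
            steps.append(reducer)
        steps.append(RandomForestClassifier(random_state=seed))
        models[name] = make_pipeline(*steps)
    return models


def evaluate(X, y, seed=0):
    """Fit every pipeline on a stratified 70/30 split (assumed split).

    Returns {name: (F1 on the held-out split, training time in seconds)}.
    The timed fit covers scaling, dimensionality reduction and the forest.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    results = {}
    for name, model in build_models(seed).items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        elapsed = time.perf_counter() - start
        results[name] = (f1_score(y_test, model.predict(X_test)), elapsed)
    return results


if __name__ == "__main__":
    # Synthetic placeholder with the study's shape: 349 samples, 49 features.
    X, y = make_classification(
        n_samples=349, n_features=49, n_informative=8, random_state=0
    )
    for name, (f1, secs) in evaluate(X, y).items():
        print(f"{name:22s} F1={f1:.3f}  fit time={secs:.3f}s")
```

Because the data are synthetic, the printed scores and times will not reproduce the 0.895 F1 or 18.191 s reported in the abstract. The sketch only shows the design choice of placing the reducer inside the same pipeline as the classifier, so that scaling and PCA, K-PCA or ICA are fitted on the training split alone and the timing covers the whole training step.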