
Novel Approach to Choosing Principal Components Number in Logistic Regression
Author(s) - Borislava Vrigazova
Publication year - 2021
Publication title - Proceedings of the ENTRENOVA - Enterprise Research Innovation Conference
Language(s) - English
Resource type - Journals
eISSN - 2706-4735
pISSN - 1849-7950
DOI - 10.54820/pucr5250
Subject(s) - principal component analysis , selection (genetic algorithm) , variance (accounting) , logistic regression , computer science , artificial intelligence , principal (computer security) , statistics , data mining , machine learning , mathematics , accounting , business , operating system
The established approach to choosing the number of principal components for prediction models involves examining how much each principal component contributes to the total variance of the target variable. A combination of potentially important principal components can then be chosen to explain a large share of the variance in the target. Sometimes several combinations of principal components must be explored to achieve the highest classification accuracy. This research proposes a novel automatic way of deciding how many principal components should be retained to improve classification accuracy. We do so by combining principal components with ANOVA selection. To improve the accuracy resulting from our automatic approach, we use the bootstrap procedure for model selection. We call this procedure Bootstrapped-ANOVA PCA selection. Our results suggest that this procedure can automate the selection of principal components and improve the accuracy of classification models, in our example logistic regression.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
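The abstract does not give implementation details, so the following is only a minimal sketch of how a bootstrapped ANOVA-based selection of principal components might look with scikit-learn. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `bootstrapped_anova_pca`, the significance threshold `alpha`, standardizing features before PCA, scoring each bootstrap run on its out-of-bag rows, and the choice of the bundled breast cancer dataset.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def bootstrapped_anova_pca(X, y, n_boot=20, alpha=0.05, seed=0):
    """Hypothetical sketch: ANOVA-filtered principal components scored by bootstrap."""
    rng = np.random.default_rng(seed)
    n = len(y)
    runs = []
    for _ in range(n_boot):
        # Bootstrap sample (with replacement); unsampled rows form the test set.
        train = rng.integers(0, n, size=n)
        test = np.setdiff1d(np.arange(n), train)
        if test.size == 0 or np.unique(y[train]).size < 2:
            continue

        # Fit scaling and PCA on the bootstrap sample only (assumed design choice).
        scaler = StandardScaler().fit(X[train])
        pca = PCA().fit(scaler.transform(X[train]))
        Z_train = pca.transform(scaler.transform(X[train]))
        Z_test = pca.transform(scaler.transform(X[test]))

        # One-way ANOVA F-test of each principal component against the class label.
        _, p_values = f_classif(Z_train, y[train])
        keep = np.flatnonzero(p_values < alpha)
        if keep.size == 0:
            # Fall back to the single most significant component.
            keep = np.array([int(np.argmin(p_values))])

        # Logistic regression on the retained components, scored out of bag.
        clf = LogisticRegression(max_iter=1000).fit(Z_train[:, keep], y[train])
        runs.append((clf.score(Z_test[:, keep], y[test]), keep))

    best_acc, best_keep = max(runs, key=lambda r: r[0])
    return {
        "mean_accuracy": float(np.mean([acc for acc, _ in runs])),
        "best_accuracy": float(best_acc),
        "best_components": best_keep.tolist(),
    }


X, y = load_breast_cancer(return_X_y=True)
result = bootstrapped_anova_pca(X, y)
print(result)
```

In this sketch the number of retained components is not fixed in advance: it falls out of the ANOVA filter in each bootstrap run, and the run with the highest out-of-bag accuracy supplies the reported component set. The paper's actual selection rule may differ.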