
High dimensional Selection with Interactions for Binary Outcome (HDSI-BO) Algorithm in Classifying Height Indicators Through Social-life and Well-being Factors
Author(s) - Ziqian Zhuang, Wei Xu, Richa Jain
Publication year - 2021
Publication title - University of Toronto Journal of Public Health
Language(s) - English
Resource type - Journals
ISSN - 2563-1454
DOI - 10.33137/utjph.v2i2.36764
Subject(s) - LASSO, feature selection, hyperparameter, confidence interval, logistic regression, elastic net regularization, genetic algorithm, binary classification, cross-validation, computer science, artificial intelligence, machine learning, algorithm, statistics, mathematics, data mining
The High-Dimensional Selection with Interactions for Binary Outcome (HDSI-BO) algorithm can incorporate interaction terms and can be combined with existing feature selection techniques. Simulation studies have validated the ability of HDSI-BO to select true features and, consequently, to improve prediction accuracy compared with standard algorithms. Our goal is to assess how well HDSI-BO combines with different techniques and to measure its predictive performance in a real-data study that predicts height indicators from social-life and well-being factors.
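The abstract does not describe HDSI-BO's internal selection steps, so the sketch below only illustrates the candidate feature space that interaction terms create. It assumes that "two-way interaction terms" means every pairwise product of the original features; the helper names `add_two_way_interactions` and `n_candidate_terms` are hypothetical. With p main effects there are p + p(p-1)/2 candidate terms, so the search space grows quadratically with p.

```python
from itertools import combinations


def add_two_way_interactions(rows, names):
    """Extend each sample with every pairwise product x_i * x_j (i < j).

    Hypothetical helper: rows is a list of samples (lists of numbers) and
    names gives the feature names in column order.
    """
    pairs = list(combinations(range(len(names)), 2))
    new_names = names + [f"{names[i]}:{names[j]}" for i, j in pairs]
    new_rows = [row + [row[i] * row[j] for i, j in pairs] for row in rows]
    return new_rows, new_names


def n_candidate_terms(p):
    """Number of main effects plus two-way interactions: p + p(p-1)/2."""
    return p + p * (p - 1) // 2


if __name__ == "__main__":
    X = [[1.0, 2.0, 0.0], [0.5, 1.0, 3.0]]
    X_int, cols = add_two_way_interactions(X, ["a", "b", "c"])
    print(cols)                    # ['a', 'b', 'c', 'a:b', 'a:c', 'b:c']
    print(X_int[0])                # [1.0, 2.0, 0.0, 2.0, 0.0, 0.0]
    print(n_candidate_terms(100))  # 5050
```

For example, 100 original factors already yield 5,050 candidate terms before any selection is applied.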
Methods: HDSI-BO was combined with logistic regression, ridge regression, LASSO, adaptive LASSO, and elastic net, and two-way interaction terms were considered. The hyperparameters of HDSI-BO were optimized with genetic algorithms using five-fold cross-validation. To evaluate feature selection, we fitted a final logistic regression model on each set of selected features and used that model's AUC as the performance measure. The procedure was repeated over 30 trials to obtain a range for the number of selected features and a 95% confidence interval for the AUC.
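The abstract states that hyperparameters were tuned by genetic algorithms with five-fold cross-validation, but it gives no encoding, operators, or scoring rule. The sketch below is a generic real-valued genetic algorithm, not the authors' implementation. As a stand-in for HDSI-BO's hyperparameters it tunes the single penalty λ of a ridge (L2-penalized) logistic regression, encodes each candidate as log10 λ, and scores it by mean held-out log-likelihood over five folds. All function names, the toy data, and the GA settings (size-2 tournament selection, blend crossover, Gaussian mutation, elitism) are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_ridge_logistic(X, y, lam, lr=0.1, epochs=100):
    """L2-penalized logistic regression fitted by batch gradient descent.

    Minimizes mean log loss + (lam / 2) * ||w||^2; the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0  # gradient of the penalty term
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                gw[j] += err * xi[j] / n
            gb += err / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def k_fold_indices(n, k=5, seed=0):
    # Shuffle once, then deal indices round-robin into k disjoint folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_score(X, y, lam, k=5, seed=0):
    """Mean held-out log-likelihood over k folds (higher is better)."""
    total = 0.0
    for test in k_fold_indices(len(X), k, seed):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        w, b = fit_ridge_logistic([X[i] for i in train], [y[i] for i in train], lam)
        ll = 0.0
        for i in test:
            prob = sigmoid(b + sum(wj * xj for wj, xj in zip(w, X[i])))
            prob = min(max(prob, 1e-12), 1 - 1e-12)  # avoid log(0)
            ll += y[i] * math.log(prob) + (1 - y[i]) * math.log(1 - prob)
        total += ll / len(test)
    return total / k


def genetic_search(fitness, low, high, pop_size=8, generations=5, mut_sd=0.3, seed=0):
    """Maximize fitness(g) over a scalar gene g in [low, high]; returns (score, gene)."""
    rng = random.Random(seed)
    genes = [rng.uniform(low, high) for _ in range(pop_size)]
    scored = [(fitness(g), g) for g in genes]
    for _ in range(generations):
        scored.sort(reverse=True)
        children = [scored[0][1]]  # elitism: carry the best gene over unchanged
        while len(children) < pop_size:
            pa = max(rng.sample(scored, 2))[1]  # size-2 tournament selection
            pb = max(rng.sample(scored, 2))[1]
            # Blend crossover between the parents, then Gaussian mutation.
            child = rng.uniform(min(pa, pb), max(pa, pb)) + rng.gauss(0.0, mut_sd)
            children.append(min(max(child, low), high))  # clip to bounds
        scored = [(fitness(g), g) for g in children]
    return max(scored)


def make_toy_data(n=50, seed=1):
    # Hypothetical data: two features plus their two-way interaction term.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append([x1, x2, x1 * x2])
        y.append(1 if x1 + 0.5 * x2 + rng.gauss(0, 1) > 0 else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    # Gene = log10(lambda), searched over lambda in [1e-4, 10] (assumed range).
    score, log_lam = genetic_search(lambda g: cv_score(X, y, 10 ** g), -4.0, 1.0)
    print(f"best lambda = {10 ** log_lam:.4g}, CV log-likelihood = {score:.4f}")
```

The fold assignment is held fixed (`seed=0`) across candidates, so every λ is compared on the same splits and score differences reflect the hyperparameter rather than the split. Because the elite gene is carried over and the fitness is deterministic, the best score never decreases from one generation to the next.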
Results: Combined with each of the above methods, HDSI-BO achieved higher final AUC values in terms of both the mean and the confidence interval. In addition, HDSI-BO narrowed down the sets of selected features and interaction terms more effectively than the standard methods.
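The final AUC is summarized by its mean and 95% confidence interval over the 30 trials, but the abstract does not say how the interval was formed. A minimal sketch, assuming AUC is computed with the rank-based (Mann-Whitney) formula from each final logistic model's predicted probabilities, and that the interval is a normal approximation across trials. A t quantile (about 2.045 for 29 degrees of freedom) or the empirical 2.5th and 97.5th percentiles of the trial AUCs would be equally reasonable choices. The function names and all numbers below are illustrative, not values from the study.

```python
import math
import statistics


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
    positive case scores higher than a randomly chosen negative case, with
    ties counted as 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def mean_ci95(values):
    """Mean and normal-approximation 95% CI: mean +/- 1.96 * SD / sqrt(n)."""
    n = len(values)
    m = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(n)
    return m, m - half_width, m + half_width


if __name__ == "__main__":
    # Predicted probabilities from a final logistic model on one test split (toy values).
    print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
    # One AUC per repeated trial; these are hypothetical, not results from the study.
    trial_aucs = [0.80, 0.82, 0.84]
    print(mean_ci95(trial_aucs))
```

In the toy call, three of the four positive/negative pairs are ranked correctly, giving an AUC of 3/4.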
Conclusion: The HDSI-BO algorithm combines well with multiple standard methods and achieves comparable or better predictive performance than the standard methods alone. Its computational and time complexity are higher but still acceptable. AUC alone cannot comprehensively measure feature selection performance, so more effective performance metrics should be explored in future work.