High dimensional Selection with Interactions for Binary Outcome (HDSI-BO) Algorithm in Classifying Height Indicators Through Social-life and Well-being Factors
Author(s) - Ziqian Zhuang, Wei Xu, Richa Jain
Publication year - 2021
Publication title - University of Toronto Journal of Public Health
Language(s) - English
Resource type - Journals
ISSN - 2563-1454
DOI - 10.33137/utjph.v2i2.36764
Subject(s) - lasso (programming language) , feature selection , hyperparameter , confidence interval , logistic regression , elastic net regularization , selection (genetic algorithm) , binary number , computer science , feature (linguistics) , artificial intelligence , measure (data warehouse) , binary classification , cross validation , machine learning , algorithm , statistics , mathematics , data mining , support vector machine , linguistics , philosophy , arithmetic , world wide web
The High dimensional Selection with Interactions for Binary Outcome (HDSI-BO) algorithm can incorporate interaction terms and can be combined with existing feature selection techniques. Simulation studies have validated the ability of HDSI-BO to select true features and, consequently, to improve prediction accuracy compared with standard algorithms. Our goal is to assess how well HDSI-BO combines with different techniques and to measure its predictive performance in a real data study that predicts height indicators from social-life and well-being factors.

Methods: HDSI-BO was combined with logistic regression, ridge regression, LASSO, adaptive LASSO, and elastic net. Two-way interaction terms were considered. The hyperparameters used in HDSI-BO were optimized through genetic algorithms with five-fold cross-validation. To measure feature selection performance, we fitted final logistic regression models on each set of selected features and used the model’s AUC as the measure. The procedure was repeated for 30 trials to obtain the range of the number of selected features and a 95% confidence interval for AUC.

Results: When combined with each of the above methods, HDSI-BO achieved higher final AUC values than the standard methods, in terms of both the mean and the confidence interval. In addition, HDSI-BO effectively narrowed down the sets of selected features and interaction terms compared with the standard methods.

Conclusion: The HDSI-BO algorithm combines well with multiple standard methods and has comparable or better predictive performance than those methods. HDSI-BO has higher computational and time complexity, but this remains acceptable. AUC alone cannot comprehensively measure feature selection performance, and more effective performance metrics should be explored in future work.
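The evaluation step described under Methods (an AUC per trial, summarized over 30 trials as a mean with a 95% confidence interval) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the abstract does not say how the interval was computed, so a t-interval on the per-trial AUCs is assumed here, and the function names `auc` and `mean_ci95` are hypothetical.

```python
import math
import statistics


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_ci95(values, t_crit=2.045):
    """Mean and 95% t-interval for the mean.

    Assumption: t_crit=2.045 is the two-sided 95% critical value for
    29 degrees of freedom, i.e. it only fits exactly 30 trials.
    """
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, m - t_crit * se, m + t_crit * se


# Toy labels and predicted probabilities, purely illustrative.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs ordered correctly
```

In practice each trial would refit the final logistic regression on that trial's selected features and pass the 30 resulting AUC values to `mean_ci95`.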
