Integrating Biological Knowledge Into Case–Control Analysis Through Iterated Conditional Modes/Medians Algorithm
Author(s) - Vitara Pungpapong, Min Zhang, Dabao Zhang
Publication year - 2020
Publication title - Journal of Computational Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.585
H-Index - 95
eISSN - 1557-8666
pISSN - 1066-5277
DOI - 10.1089/cmb.2019.0319
Subject(s) - lasso (programming language), computer science, markov chain monte carlo, logistic regression, feature selection, algorithm, benchmark (surveying), machine learning, artificial intelligence, data mining, statistics, mathematics, bayesian probability, geodesy, world wide web, geography
Logistic regression is an effective tool in case-control analysis. With advances in high-throughput technology, fast and efficient methods for fitting high-dimensional logistic regression have attracted much interest. This article considers an empirical Bayes model for logistic regression. A spike-and-slab prior is used for variable selection, which plays a vital role in building an effective predictive model while keeping the model interpretable. To increase the power of variable selection, we incorporate biological knowledge through an Ising prior. We propose the iterated conditional modes/medians (ICM/M) algorithm to fit the logistic model; it has a computational advantage over Markov chain Monte Carlo (MCMC) algorithms. An implementation of the ICM/M algorithm for both linear and logistic models is provided in the R package icmm, which is freely available on the Comprehensive R Archive Network (CRAN). Simulation studies were carried out to assess the performance of our method, with lasso and adaptive lasso as benchmarks. Overall, the simulation studies show that ICM/M outperforms the others in terms of the number of false positives and has competitive predictive ability. An application to a real data set from a Parkinson's disease study was also carried out for illustration. To identify important variables, our approach provides the flexibility to select variables based on local posterior probabilities while controlling the false discovery rate at a desired level, rather than relying only on regression coefficients.
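
The last point of the abstract, selecting variables from local posterior probabilities while controlling the false discovery rate, can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the icmm package. It assumes each variable already has a posterior probability of being nonzero, treats one minus that probability as its local false discovery rate, and keeps the largest set of variables whose average local false discovery rate stays at or below a target level. The function name select_by_local_fdr, the parameter alpha, and the example probabilities are all hypothetical.

```python
def select_by_local_fdr(post_prob, alpha=0.05):
    """Pick variables so the estimated false discovery rate is <= alpha.

    post_prob: posterior probabilities that each coefficient is nonzero
               (assumed to come from some fitted spike-and-slab model).
    alpha:     desired false discovery rate level (hypothetical default).
    Returns the sorted indices of the selected variables.
    """
    # Local false discovery rate: probability that the variable is null.
    lfdr = [1.0 - p for p in post_prob]

    # Visit variables from most to least likely to be nonzero.
    order = sorted(range(len(lfdr)), key=lambda j: lfdr[j])

    selected = []
    running_sum = 0.0
    for k, j in enumerate(order, start=1):
        running_sum += lfdr[j]
        # The mean local fdr of the top-k set estimates its FDR. Because
        # lfdr is sorted ascending, this mean never decreases, so we can
        # stop at the first k that exceeds alpha.
        if running_sum / k <= alpha:
            selected = order[:k]
        else:
            break
    return sorted(selected)


if __name__ == "__main__":
    # Made-up posterior probabilities for five variables.
    probs = [0.99, 0.2, 0.95, 0.6, 0.98]
    print(select_by_local_fdr(probs, alpha=0.05))
```

With this rule a variable is judged by how likely it is to be nonzero, not by the size of its estimated coefficient, so a small but well-supported effect can be kept while a large but uncertain one is dropped.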
