Integrating Biological Knowledge Into Case–Control Analysis Through Iterated Conditional Modes/Medians Algorithm


Open Access


Author(s) - Vitara Pungpapong, Min Zhang, Dabao Zhang

Publication year - 2019

Publication title - Journal of Computational Biology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.585

H-Index - 95

eISSN - 1557-8666

pISSN - 1066-5277

DOI - 10.1089/cmb.2019.0319

Subject(s) - lasso (programming language), computer science, markov chain monte carlo, logistic regression, feature selection, algorithm, benchmark (surveying), machine learning, artificial intelligence, data mining, statistics, mathematics, bayesian probability, geodesy, world wide web, geography

Logistic regression is an effective tool in case-control analysis. With advances in high-throughput technology, fast and efficient methods for fitting high-dimensional logistic regression have attracted much interest. This article considers an empirical Bayes model for logistic regression. A spike-and-slab prior is used for variable selection, which plays a vital role in building an effective predictive model while keeping the model interpretable. To increase the power of variable selection, we incorporate biological knowledge through an Ising prior. We develop the iterated conditional modes/medians (ICM/M) algorithm to fit the logistic model; it has a computational advantage over Markov chain Monte Carlo (MCMC) algorithms. An implementation of the ICM/M algorithm for both linear and logistic models is available in the R package icmm, which is freely available on the Comprehensive R Archive Network (CRAN). Simulation studies were carried out to assess the performance of our method, with lasso and adaptive lasso as benchmarks. Overall, the simulation studies show that ICM/M outperforms both benchmarks in terms of the number of false positives and has competitive predictive ability. An application to a real data set from a Parkinson's disease study was also carried out for illustration. To identify important variables, our approach provides the flexibility to select variables based on local posterior probabilities while controlling the false discovery rate at a desired level, rather than relying only on regression coefficients.
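
The last point of the abstract, selecting variables from local posterior probabilities while controlling the false discovery rate, can be illustrated with a small sketch. The code below is not taken from the article or from the icmm package. It assumes a common Bayesian FDR rule: rank variables by their local false discovery rate (one minus the posterior probability of a nonzero coefficient) and keep the largest set whose average local false discovery rate stays at or below the target level. The function name `select_by_bayesian_fdr` and the probabilities are hypothetical and used only for illustration.

```python
import numpy as np


def select_by_bayesian_fdr(post_prob, alpha=0.05):
    """Return sorted indices of variables selected at Bayesian FDR level alpha.

    post_prob[j] is assumed to be the local posterior probability that
    variable j has a nonzero coefficient (hypothetical input).
    """
    post_prob = np.asarray(post_prob, dtype=float)
    # Local false discovery rate: probability that variable j is null.
    local_fdr = 1.0 - post_prob
    # Rank variables from most to least likely to be a true signal.
    order = np.argsort(local_fdr, kind="stable")
    # The mean local fdr of the top-k variables estimates the FDR of
    # selecting exactly those k variables.
    running_fdr = np.cumsum(local_fdr[order]) / np.arange(1, len(order) + 1)
    passing = np.nonzero(running_fdr <= alpha)[0]
    if passing.size == 0:
        return np.array([], dtype=int)
    # Largest selection whose estimated FDR stays within alpha.
    k = passing[-1] + 1
    return np.sort(order[:k])


# Made-up posterior probabilities, for illustration only.
probs = [0.99, 0.10, 0.97, 0.60, 0.02, 0.93]
selected = select_by_bayesian_fdr(probs, alpha=0.05)
print(selected)
```

Under this assumed rule, a variable is kept or dropped according to how likely it is to be a true signal, not according to the size of its estimated coefficient, which is the distinction the abstract draws.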
