Open Access

The Spike-and-Slab Lasso Generalized Linear Models for Prediction and Associated Genes Detection

Author(s) - Zaixiang Tang, Yueping Shen, Xinyan Zhang, Nengjun Yi

Publication year - 2016

Publication title - Genetics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 2.792

H-Index - 246

eISSN - 1943-2631

pISSN - 0016-6731

DOI - 10.1534/genetics.116.192195

Subject(s) - lasso (programming language), bayesian probability, computer science, data set, feature selection, algorithm, scale (ratio), set (abstract data type), data mining, biology, computational biology, pattern recognition (psychology), artificial intelligence, physics, quantum mechanics, world wide web, programming language

Large-scale "omics" data are increasingly used as a resource for prognostic prediction of diseases and detection of associated genes. However, analyzing high-dimensional molecular data poses considerable challenges, including the large number of potential molecular predictors, the limited number of samples, and the small effect of each predictor. We propose new Bayesian hierarchical generalized linear models (GLMs), called spike-and-slab lasso GLMs, for prognostic prediction and detection of associated genes using large-scale molecular data. The proposed model employs a spike-and-slab mixture double-exponential prior for coefficients that can induce weak shrinkage on large coefficients and strong shrinkage on irrelevant coefficients. We have developed a fast and stable algorithm to fit large-scale hierarchical GLMs by incorporating expectation-maximization (EM) steps into the fast cyclic coordinate descent algorithm. The proposed approach combines the strengths of two popular methods: penalized lasso and Bayesian spike-and-slab variable selection. The performance of the proposed method is assessed via extensive simulation studies. The results show that the proposed approach can provide not only more accurate estimates of the parameters, but also better prediction. We demonstrate the proposed procedure on two cancer data sets: a well-known breast cancer data set consisting of 295 tumors with expression data for 4919 genes, and the ovarian cancer data set from The Cancer Genome Atlas (TCGA) with 362 tumors and expression data for 5336 genes. Our analyses show that the proposed procedure can generate powerful models for predicting outcomes and detecting associated genes. The methods have been implemented in a freely available R package, BhGLM (http://www.ssg.uab.edu/bhglm/).
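The abstract does not spell out the prior, so the following is only a minimal sketch of how a two-component double-exponential mixture can shrink large and small coefficients differently. It assumes a narrow "spike" scale `s0`, a wide "slab" scale `s1`, and a prior slab probability `theta`; these names, the example values, and the functions below are illustrative assumptions and are not taken from the BhGLM package. In an EM-style fit, an E-step of this kind would produce a per-coefficient L1 penalty weight that a coordinate descent step could then use.

```python
import math

def inclusion_probability(beta, s0, s1, theta):
    """Posterior probability that a coefficient comes from the slab.

    Assumes two zero-mean double-exponential components with scales
    s0 (spike) and s1 (slab), s1 > s0 > 0, and prior slab weight theta.
    Computed via log-odds to avoid underflow in the densities.
    """
    log_odds = (
        math.log(theta / (1.0 - theta))
        + math.log(s0 / s1)
        + abs(beta) * (1.0 / s0 - 1.0 / s1)
    )
    return 1.0 / (1.0 + math.exp(-log_odds))

def expected_penalty(beta, s0, s1, theta):
    """Expected inverse scale, used as the L1 penalty weight on |beta|."""
    p = inclusion_probability(beta, s0, s1, theta)
    return (1.0 - p) / s0 + p / s1

if __name__ == "__main__":
    # Hypothetical settings chosen only for illustration.
    s0, s1, theta = 0.04, 1.0, 0.5
    for beta in (0.01, 0.1, 0.5, 2.0):
        p = inclusion_probability(beta, s0, s1, theta)
        w = expected_penalty(beta, s0, s1, theta)
        print(f"beta={beta:5.2f}  slab prob={p:.4f}  penalty weight={w:.3f}")
```

Under these assumed settings, a coefficient near zero is assigned mostly to the spike and receives a penalty weight close to `1/s0`, while a large coefficient is assigned to the slab and receives a weight close to `1/s1`. This matches the weak-versus-strong shrinkage behavior the abstract describes.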
