A split‐and‐merge Bayesian variable selection approach for ultrahigh dimensional regression
Author(s) - Song Qifan, Liang Faming
Publication year - 2015
Publication title - Journal of the Royal Statistical Society: Series B (Statistical Methodology)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 6.523
H-Index - 137
eISSN - 1467-9868
pISSN - 1369-7412
DOI - 10.1111/rssb.12095
Subject(s) - feature selection , lasso (programming language) , elastic net regularization , merge (version control) , computer science , bayesian probability , independence (probability theory) , regression , linear regression , mathematics , algorithm , data mining , artificial intelligence , machine learning , statistics , world wide web , information retrieval
Summary We propose a Bayesian variable selection approach for ultrahigh dimensional linear regression based on the strategy of split and merge. The approach proposed consists of two stages: split the ultrahigh dimensional data set into a number of lower dimensional subsets and select relevant variables from each of the subsets, and aggregate the variables selected from each subset and then select relevant variables from the aggregated data set. Since the approach proposed has an embarrassingly parallel structure, it can be easily implemented in a parallel architecture and applied to big data problems with millions or more of explanatory variables. Under mild conditions, we show that the approach proposed is consistent, i.e. the true explanatory variables can be correctly identified by the approach as the sample size becomes large. Extensive comparisons of the approach proposed have been made with penalized likelihood approaches, such as the lasso, elastic net, sure independence screening and iterative sure independence screening. The numerical results show that the approach proposed generally outperforms penalized likelihood approaches: the models selected by the approach tend to be more sparse and closer to the true model.
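
The two-stage structure described in the summary can be illustrated with a minimal sketch. This is not the authors' method: the paper uses Bayesian variable selection within each subset, whereas the sketch below swaps in a plain ordinary-least-squares t-statistic filter as a stand-in selector, purely to show the split, aggregate and reselect flow. The function names `select_by_t_stat` and `split_and_merge`, the `threshold` and `n_subsets` parameters, and the random partition of columns are all assumptions made for illustration. The stand-in selector also assumes each subset (and the merged set) has fewer columns than there are observations.

```python
import numpy as np


def select_by_t_stat(X, y, threshold=4.0):
    """Stand-in selector (NOT the paper's Bayesian procedure).

    Fits OLS without an intercept on the columns of X and returns the
    positions of columns whose |t-statistic| exceeds `threshold`.
    Assumes X has fewer columns than rows.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)            # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)       # covariance of the OLS estimate
    t_stats = beta / np.sqrt(np.diag(cov))
    return np.flatnonzero(np.abs(t_stats) > threshold)


def split_and_merge(X, y, n_subsets, select=select_by_t_stat, seed=0):
    """Two-stage selection: select within column subsets, then reselect
    from the union of the survivors. Returns sorted column indices of X."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[1])
    subsets = np.array_split(perm, n_subsets)   # hypothetical random partition

    # Stage 1 (split): each subset is handled independently, so this
    # loop could be distributed across workers.
    stage1 = [cols[select(X[:, cols], y)] for cols in subsets]
    merged = np.sort(np.concatenate(stage1))
    if merged.size == 0:
        return merged

    # Stage 2 (merge): reselect from the aggregated candidate set.
    return merged[select(X[:, merged], y)]
```

Calling `split_and_merge(X, y, n_subsets=12)` on an `n x p` design matrix returns the indices that survive both stages. Because stage 1 touches each subset independently, that loop is the part that maps onto the embarrassingly parallel structure mentioned in the summary.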
