Empirical Bayes methods in variable selection
Author(s) - Bar Haim, Liu Kangyan
Publication year - 2018
Publication title - Wiley Interdisciplinary Reviews: Computational Statistics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.693
H-Index - 38
eISSN - 1939-0068
pISSN - 1939-5108
DOI - 10.1002/wics.1455
Subject(s) - bayes' theorem , computer science , feature selection , bayes factor , machine learning , regression analysis , bayesian probability , bayesian linear regression , data mining , big data , naive bayes classifier , model selection , variable (mathematics) , false discovery rate , artificial intelligence , linear regression , bayesian inference , mathematics , support vector machine , mathematical analysis , biochemistry , chemistry , gene
The emergence of technologies that produce massive amounts of data, such as gene sequencing and distributed sensor systems, has created an urgent need to develop statistical methodologies and computational tools to deal with Big Data. Linear regression, which is arguably among the most commonly used inferential tools in statistics, was not designed with Big Data applications in mind. Modern applications, in which the number of predictors is often in the thousands and may even be in the millions, render the traditional solution to regression analysis useless. Thus, before any regression analysis can be done, it is necessary to begin with “variable selection”. The goal of any good variable selection algorithm is to select only the variables that are useful in predicting the outcome. At the same time, such algorithms have to be computationally efficient. The empirical Bayes approach provides a sound statistical framework for developing variable selection methods. Empirical Bayes allows one to “borrow strength” across predictors, thus increasing the power to detect significant ones while controlling the false discovery rate. While the modeling approach is Bayesian, the crucial step in the empirical Bayes approach replaces the potentially cumbersome and slow integration via Monte Carlo simulations with a simple approximation to the posterior distribution of the regression coefficients. This article is categorized under: Statistical Models > Bayesian Models
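
To make the idea of "borrowing strength" concrete, the following is a minimal sketch and not the method developed in the article. It assumes a simple two-groups model on marginal z-scores, z_j ~ (1 - pi1) N(0, 1) + pi1 N(0, 1 + tau2), estimates pi1 and tau2 from all predictors jointly by EM (the empirical Bayes step, with no Monte Carlo integration), and selects predictors so that the average estimated local false discovery rate among those selected stays below a target q. The function name `eb_select`, the Fisher z-transform of marginal correlations, and the starting values are all illustrative assumptions.

```python
import numpy as np


def _normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return np.exp(-0.5 * x**2 / var) / np.sqrt(2.0 * np.pi * var)


def eb_select(X, y, q=0.05, n_iter=500):
    """Hypothetical helper: two-groups empirical Bayes screening.

    Returns (selected, lfdr): indices of selected predictors and the
    estimated local false discovery rate of every predictor.
    """
    n, p = X.shape

    # Marginal correlation of each predictor with the outcome.
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = (y - y.mean()) / y.std()
    r = np.clip(Xc.T @ yc / n, -0.999999, 0.999999)

    # Fisher z-transform: approximately N(0, 1) for null predictors.
    z = np.sqrt(n - 3) * np.arctanh(r)

    # EM for the mixture weight pi1 and slab variance tau2 (assumed
    # starting values). Both are shared by all predictors, which is
    # where strength is borrowed across them.
    pi1, tau2 = 0.1, 4.0
    for _ in range(n_iter):
        f0 = _normal_pdf(z, 1.0)
        f1 = _normal_pdf(z, 1.0 + tau2)
        w = pi1 * f1 / (pi1 * f1 + (1.0 - pi1) * f0)  # P(non-null | z)
        pi1 = float(np.clip(w.mean(), 1e-6, 1.0 - 1e-6))
        tau2 = max(float(np.sum(w * z**2) / np.sum(w)) - 1.0, 1e-6)

    # Posterior null probabilities under the fitted prior.
    f0 = _normal_pdf(z, 1.0)
    f1 = _normal_pdf(z, 1.0 + tau2)
    lfdr = 1.0 - pi1 * f1 / (pi1 * f1 + (1.0 - pi1) * f0)

    # Keep the largest set whose mean local fdr is at most q. The running
    # mean of sorted values is non-decreasing, so counting suffices.
    order = np.argsort(lfdr)
    running_mean = np.cumsum(lfdr[order]) / np.arange(1, p + 1)
    k = int(np.sum(running_mean <= q))
    return np.sort(order[:k]), lfdr
```

A call such as `selected, lfdr = eb_select(X, y, q=0.05)` on an n-by-p design matrix returns the screened column indices. Because it relies on marginal correlations only, this sketch ignores dependence among predictors, which a full regression-based approach would need to address.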
