Modeling considerations for using expression data from multiple species
Author(s) - Siewert Elizabeth, Kechris Katerina J.
Publication year - 2013
Publication title - Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.5850
Subject(s) - bayesian probability, multivariate statistics, regression, linear regression, expression (computer science), computer science, linear model, term (time), bayesian multivariate linear regression, regression analysis, statistics, bayesian inference, bayesian hierarchical modeling, bayesian linear regression, mathematics, artificial intelligence, physics, quantum mechanics, programming language
Although genome‐wide expression data sets from multiple species are now more commonly generated, there have been few studies on how to best integrate this type of correlated data into models. Starting with a single‐species, linear regression model that predicts transcription factor binding sites as a case study, we investigated how best to take into account the correlated expression data when extending this model to multiple species. Using a multivariate regression model, we accounted for the phylogenetic relationships among the species in two ways: (i) a repeated‐measures model, where the error term is constrained; and (ii) a Bayesian hierarchical model, where the prior distributions of the regression coefficients are constrained. We show that both multiple‐species models improve predictive performance over the single‐species model. When compared with each other, the repeated‐measures model outperformed the Bayesian model. We suggest a possible explanation for the better performance of the model with the constrained error term. Copyright © 2013 John Wiley & Sons, Ltd.
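The idea of constraining the error term in a repeated-measures model can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes regression coefficients shared across species, a known between-species error correlation matrix `C` (the values below are hypothetical, standing in for a phylogeny-derived matrix), and simulated data. The function name `fit_repeated_measures_gls` and all array shapes are illustrative choices. Under those assumptions the coefficients can be estimated by generalized least squares (GLS):

```python
import numpy as np


def fit_repeated_measures_gls(X, Y, C):
    """GLS estimate of coefficients shared across species.

    X: (G, S, p) predictors for G genes, S species, p features.
    Y: (G, S) expression values.
    C: (S, S) assumed between-species error correlation matrix.
    """
    C_inv = np.linalg.inv(C)
    # Normal equations: sum over genes of X_g^T C^-1 X_g and X_g^T C^-1 y_g.
    A = np.einsum("gsp,st,gtq->pq", X, C_inv, X)
    b = np.einsum("gsp,st,gt->p", X, C_inv, Y)
    return np.linalg.solve(A, b)


# Simulated data (hypothetical sizes and values, for illustration only).
rng = np.random.default_rng(0)
G, S, p = 200, 3, 2
C = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.3],
              [0.3, 0.3, 1.0]])  # closer species share more error variance
beta_true = np.array([1.5, -0.5])
X = rng.normal(size=(G, S, p))
noise = rng.multivariate_normal(np.zeros(S), C, size=G)  # correlated errors
Y = X @ beta_true + noise

beta_hat = fit_repeated_measures_gls(X, Y, C)
print(beta_hat)
```

Setting `C` to the identity matrix reduces this estimator to ordinary least squares on the pooled observations, which corresponds to ignoring the correlation among species.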