Bayesian non‐parametric hidden Markov models with applications in genomics
Author(s) - Yau C., Papaspiliopoulos O., Roberts G. O., Holmes C.
Publication year - 2011
Publication title - Journal of the Royal Statistical Society: Series B (Statistical Methodology)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 6.523
H-Index - 137
eISSN - 1467-9868
pISSN - 1369-7412
DOI - 10.1111/j.1467-9868.2010.00756.x
Subject(s) - markov chain monte carlo , dirichlet process , computer science , inference , bayesian probability , robustness (evolution) , markov chain , variable order bayesian network , parametric statistics , gibbs sampling , hierarchical dirichlet process , hidden markov model , bayesian inference , machine learning , artificial intelligence , mathematics , statistics , topic model , latent dirichlet allocation , biochemistry , chemistry , gene
Summary. We propose a flexible non‐parametric specification of the emission distribution in hidden Markov models and we introduce a novel methodology for carrying out the computations. Whereas current approaches use a finite mixture model, we argue in favour of an infinite mixture model given by a mixture of Dirichlet processes. The computational framework is based on auxiliary variable representations of the Dirichlet process and consists of a forward–backward Gibbs sampling algorithm of similar complexity to that used in the analysis of parametric hidden Markov models. The algorithm involves analytic marginalizations of latent variables to improve the mixing, facilitated by exchangeability properties of the Dirichlet process that we uncover in the paper. A by‐product of this work is an efficient Gibbs sampler for learning Dirichlet process hierarchical models. We test the proposed Monte Carlo algorithm against a wide variety of alternatives and find significant advantages. We also investigate by simulations the sensitivity of the proposed model to prior specification and data‐generating mechanisms. We apply our methodology to the analysis of genomic copy number variation. Analysing various real data sets, we find significantly more accurate inference compared with state‐of‐the‐art hidden Markov models which use finite mixture emission distributions.
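The forward–backward sampling step that the summary refers to can be illustrated with a minimal sketch of the standard forward-filtering backward-sampling recursion for a finite-state hidden Markov model. This is not the authors' algorithm: it assumes the emission likelihoods are already available as an array (in the paper they would come from the Dirichlet process mixture), and the function name `ffbs` and its arguments are hypothetical, chosen only for illustration.

```python
import numpy as np


def ffbs(log_lik, trans, init, rng):
    """Draw one hidden state path from p(states | data) for a finite-state HMM.

    log_lik: (T, K) array of log p(y_t | state k), assumed precomputed
    trans:   (K, K) row-stochastic transition matrix
    init:    (K,) initial state distribution
    rng:     numpy.random.Generator
    """
    T, K = log_lik.shape
    # Subtract the row-wise max before exponentiating for numerical stability;
    # the per-row constant cancels when each filter is normalised.
    lik = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))

    # Forward pass: filtered probabilities p(s_t | y_1, ..., y_t).
    alpha = np.empty((T, K))
    a = init * lik[0]
    alpha[0] = a / a.sum()
    for t in range(1, T):
        a = (alpha[t - 1] @ trans) * lik[t]
        alpha[t] = a / a.sum()

    # Backward pass: sample s_T, then each s_t given the sampled s_{t+1}.
    states = np.empty(T, dtype=int)
    states[-1] = rng.choice(K, p=alpha[-1])
    for t in range(T - 2, -1, -1):
        w = alpha[t] * trans[:, states[t + 1]]
        states[t] = rng.choice(K, p=w / w.sum())
    return states


# Toy usage: two states with Gaussian emissions (illustrative values only).
rng = np.random.default_rng(1)
y = np.array([0.1, -0.2, 2.9, 3.2, 0.0])
means = np.array([0.0, 3.0])
log_lik = -0.5 * (y[:, None] - means[None, :]) ** 2
trans = np.array([[0.9, 0.1], [0.2, 0.8]])
print(ffbs(log_lik, trans, np.array([0.5, 0.5]), rng))
```

Within a Gibbs sampler, a draw like this would alternate with updates of the emission and transition parameters conditional on the sampled path; the cost of each pass is linear in the sequence length and quadratic in the number of states.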