Unsupervised learning for MRFs on bipartite graphs
Author(s) -
Boris Flach,
Tomáš Sixta
Publication year - 2013
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5244/c.27.72
Subject(s) - bipartite graph , computer science , unsupervised learning , artificial intelligence , machine learning , natural language processing , theoretical computer science , graph
This paper considers unsupervised (parameter) learning for general MRFs on bipartite graphs. That means that we assume training samples which consist of i.i.d. realisations of the field of visible variables only. The variables indexed by the vertices of the other graph part are assumed to be latent. The corresponding learning task is non-trivial because the (log) likelihood is a non-concave function of the model parameters, and, what is worse, its gradient is not tractable. A common approach is to calculate an approximation of the gradient by applying a stochastic gradient method known as “Persistent Contrastive Divergence” [5]. Another option, discussed in [3], is to marginalise over the latent variables (what can be done up to the unknown partition sum) and to maximise the pseudolikelihood for the resulting higher order model directly. Notice however, that the resulting objective function is then no longer concave. The main contribution of this paper is to introduce an alternative learning approach – a modified EM-algorithm with pseudo-likelihood estimator in the M-step, which is tractable on account of the bipartiteness of the model graph. In principle, such a modified EM-algorithm can be applied for parameter learning of arbitrary MRFs [6]. The resulting algorithm will however remain to be intractable, because so is the computation of the posterior pairwise marginal probabilities in the E-step. It is the bipartiteness of the graph, which ensures that the E-step and the M-step of the EM-algorithm are both tractable, if the maximum likelihood estimator in the M-step is replaced by the pseudo-likelihood estimator. MRFs on bipartite graphs can be described as follows. Let (V,E) be an undirected bipartite graph and V1, V2 denote its parts. Let X be a collection of K1-valued random variables indexed by vertices of V1. That is, X = {Xi | i ∈ V1}, where each Xi is a K1-valued random variable. Similarly, Y denotes a collection of K2-valued random variables indexed by vertices of the second part V2. Both co-domains K1 and K2 are assumed finite. We denote realisations of the random field (X ,Y ) by (x,y), i.e. x : V1→ K1 and y : V2→ K2. The joint p.d. of an MRF on (V,E) can be written as an exponential family (assuming strictly positive probability mass)
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom