\begin{abstract}
  This paper studies the problem of parameter learning in probabilistic
  graphical models with latent variables, where the standard approach
  is the expectation-maximization (EM) algorithm, which alternates
  expectation (E) and maximization (M) steps. However, both the {E} and
  {M} steps are computationally intractable for high-dimensional data,
  and replacing either step with a faster surrogate to combat this
  intractability often causes the algorithm to fail to converge. We
  propose a new learning algorithm that is computationally efficient
  and, by the theory of multi-time-scale stochastic approximation,
  provably converges to a correct optimum. Its key idea is to run only
  a few cycles of Markov chains (MC) in both the {E} and {M} steps. In
  the literature, this idea of running `incomplete' MC has been well
  studied only for the {M} step, under the name {\em Contrastive
  Divergence} (CD) learning. Whereas known CD-based schemes find
  approximate solutions via the mean-field approach in the {E} step,
  our algorithm finds exact ones via MC in both steps. Consequently,
  CD-based schemes maximize an approximation (or lower bound) of the
  log-likelihood, whereas our algorithm maximizes the log-likelihood
  itself. Despite these theoretical guarantees, the proposed scheme
  may suffer from slow mixing of the MC in the {E} step. To address
  this issue, we also propose a hybrid approach that combines
  mean-field and MC approximations in the {E} step; it outperforms the
  pure mean-field CD schemes in our experiments on real-world datasets.
\end{abstract}
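To make the idea of running only a few MC cycles in both steps concrete, the following is a minimal sketch under assumptions of our own, not the authors' implementation. We assume a small general Boltzmann machine with binary visible and hidden units (hidden units are mutually connected, so the posterior over them does not factorize). The {E} step is $k$ Gibbs sweeps of persistent chains with the visible units clamped to each training example, and the {M} step is $k$ Gibbs sweeps of free-running persistent chains. The names \texttt{gibbs\_sweep}, \texttt{pair\_stats}, \texttt{train\_bm} and the parameters \texttt{k}, \texttt{lr}, \texttt{n\_model\_chains} are hypothetical. A single constant step size stands in for the multi-time-scale step-size schedules that the convergence analysis requires.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gibbs_sweep(state, W, b, free_units, rng):
    """One Gibbs sweep over binary {0,1} units: resample each unit in
    free_units from p(s_i = 1 | all other units); other units stay clamped."""
    n = len(state)
    for i in free_units:
        field = b[i] + sum(W[i][j] * state[j] for j in range(n) if j != i)
        state[i] = 1 if rng.random() < sigmoid(field) else 0


def pair_stats(states):
    """Empirical E[s_i s_j] over a list of states; the diagonal is E[s_i]."""
    n, m = len(states[0]), len(states)
    return [[sum(s[i] * s[j] for s in states) / m for j in range(n)]
            for i in range(n)]


def train_bm(data, n_hidden, k=1, n_model_chains=10, lr=0.05,
             epochs=200, seed=0):
    """Hypothetical sketch: stochastic-gradient learning of a general
    Boltzmann machine where both expectations come from k Gibbs sweeps
    of persistent chains (no mean-field step anywhere)."""
    rng = random.Random(seed)
    n_vis = len(data[0])
    n = n_vis + n_hidden
    hidden = list(range(n_vis, n))
    all_units = list(range(n))
    W = [[0.0] * n for _ in range(n)]  # symmetric, zero diagonal
    b = [0.0] * n
    # E-step chains: one per example, visible units clamped to the data.
    pos = [list(v) + [rng.randint(0, 1) for _ in hidden] for v in data]
    # M-step chains: free-running "fantasy" particles over all units.
    neg = [[rng.randint(0, 1) for _ in all_units]
           for _ in range(n_model_chains)]
    for _ in range(epochs):
        for _ in range(k):  # only a few MC cycles per parameter update
            for s in pos:
                gibbs_sweep(s, W, b, hidden, rng)     # E step: samples p(h | v)
            for s in neg:
                gibbs_sweep(s, W, b, all_units, rng)  # M step: samples p(v, h)
        P, Q = pair_stats(pos), pair_stats(neg)
        # Stochastic log-likelihood gradient: clamped minus free statistics.
        for i in range(n):
            b[i] += lr * (P[i][i] - Q[i][i])
            for j in range(n):
                if i != j:
                    W[i][j] += lr * (P[i][j] - Q[i][j])
    return W, b
```

For example, \texttt{train\_bm([[1, 1, 0], [0, 1, 1]], n\_hidden=2, k=3)} returns the learned weight matrix and biases. Replacing the clamped Gibbs sweeps with a mean-field fixed-point update would roughly correspond to the mean-field CD-style schemes contrasted above.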