\begin{abstract}
This paper studies the problem of parameter learning in probabilistic
graphical models with latent variables, where the standard approach
is the expectation-maximization algorithm, which alternates expectation
(E) and maximization (M) steps. However, both the {E} and {M} steps are
computationally intractable for high-dimensional data, and
substituting a faster surrogate for one of the steps to combat
intractability often causes convergence to fail. We propose a
new learning algorithm that is computationally efficient and provably
converges to a correct optimum, as we show using multi-time-scale
stochastic approximation theory. Its key idea is to run only a few
cycles of Markov chains (MC) in both the {E} and {M} steps. The idea
of running `incomplete' MC has been well studied only for the {M} step
in the literature, where it is called {\em Contrastive Divergence} (CD)
learning. While known CD-based schemes find approximate
solutions via the mean-field approach in the {E} step, our proposed
algorithm finds exact ones via MC algorithms in both
steps. Consequently, the former maximize an approximation (or lower
bound) of the log-likelihood, while the latter maximizes the
log-likelihood itself. Despite these theoretical guarantees, the proposed
scheme may suffer from slow mixing of MC in the {E} step. To address
this issue, we also propose a hybrid approach that combines mean-field and
MC approximations in the {E} step; it outperforms the bare mean-field
CD schemes in our experiments on real-world datasets.
\end{abstract}