abstract:73e4c4572953f295.tex

1: \begin{abstract}

2:   This paper studies the problem of parameter learning in probabilistic

3:   graphical models having latent variables, where the standard approach

4:   is the expectation-maximization (EM) algorithm, which alternates expectation (E)

5:   and maximization (M) steps. However, both {E} and {M} steps are

6:   computationally intractable for high-dimensional data, while

7:   substituting one step with a faster surrogate to combat

8:   intractability can often cause convergence to fail. We propose a

9:   new learning algorithm that is computationally efficient and whose

10:   convergence to a correct optimum is provably ensured by multi-time-scale

11:   stochastic approximation theory. Its key idea is to run only a few

12:   cycles of Markov chains (MC) in both the {E} and {M} steps. The idea

13:   of running `incomplete' MC has been well studied in the literature only

14:   for the {M} step, where it is known as {\em Contrastive Divergence} (CD)

15:   learning. While known CD-based schemes find approximate

16:   solutions via the mean-field approach in the {E} step, our proposed

17:   algorithm finds exact ones via MC algorithms in both

18:   steps. Consequently, the former maximize an approximation (or lower

19:   bound) of the log-likelihood, while the latter maximizes the actual

20:   one. Despite these theoretical guarantees, the proposed scheme

21:   might suffer from slow mixing of the MC in the {E} step. To address this

22:   issue, we also propose a hybrid approach adopting both mean-field and

23:   MC approximations in the {E} step, and it outperforms the bare mean-field

24:   CD schemes in our experiments on real-world datasets.

25: \end{abstract}