abstract:c213b1bb77a0bd2f.tex

\begin{abstract}

%Communication cost is the performance bottleneck for parallel learning of the latent Dirichlet allocation (LDA) model.
%In this paper, we propose a novel communication-efficient parallel belief propagation algorithm (CE-PBP) based on Zipf's law to reduce communication cost.
%Following Zipf's law, we assume that the communication frequency of each word in the vocabulary differs from word to word and is naturally proportional to its number of occurrences in the whole data set.
%Experimental results show that in this way communication cost is reduced to about 15\% of that of general parallel learning algorithms, resulting in a high ratio of computation expense to communication expense with a performance decrease within 1\% as measured by perplexity. CE-PBP takes about 1/6 of the time needed by traditional parallel Gibbs sampling to converge to the same perplexity. Since many types of data studied in the physical and social sciences approximately follow Zipf's law, the proposed algorithm is expected to be applicable in more general parallel computation settings.

This paper presents a novel communication-efficient parallel belief propagation (CE-PBP) algorithm for training latent Dirichlet allocation (LDA).
Building on the synchronous belief propagation (BP) algorithm, we first develop a parallel belief propagation (PBP) algorithm for a parallel architecture.
Because communication delay often limits the efficiency of parallel topic modeling, we further exploit Zipf's law to reduce the total communication cost of PBP.
Extensive experiments on different data sets demonstrate that, compared with the state-of-the-art parallel Gibbs sampling (PGS) algorithm, CE-PBP achieves higher topic modeling accuracy and reduces communication cost by more than $80\%$.

\end{abstract}