
1: \begin{abstract}

2: In probabilistic topic models, the quantity of interest, a low-rank matrix consisting of topic vectors, is hidden in the text corpus matrix and masked by noise, and Singular Value Decomposition (SVD) is a potentially useful tool for learning such a low-rank matrix. However, the connection between this low-rank matrix and the singular vectors of the text corpus matrix is usually complicated and hard to spell out, so using SVD to learn topic models remains challenging.

3:

4: We overcome this challenge by revealing a surprising

5: insight: there is a low-dimensional {\it simplex} structure that can be viewed as a bridge between the low-rank matrix of interest and

6: the SVD of the text corpus matrix, and that allows us

7: to conveniently reconstruct the former from the latter.

8: This insight motivates a new SVD-based approach to learning topic models.

9:

10: For asymptotic analysis, we show that under the popular probabilistic model \citep{hofmann1999},

11: the convergence rate of the $\ell^1$-error of our method matches the minimax lower bound, up to

12: a multi-logarithmic term. In deriving these results, we establish new element-wise bounds on the singular vectors and several large-deviation bounds for weakly dependent multinomial data. Our results on the convergence rate and

13: asymptotic minimaxity are new.

14:

15:

16: We have applied our method to two data sets, Associated Press (AP) and Statistics Literature Abstract (SLA), with encouraging results. In particular, there is a clear simplex structure associated with the SVD of the data matrices, which largely validates our discovery.

17:

18:

19:

20: \end{abstract}

21: