
\begin{abstract}

In probabilistic topic models, the quantity of interest, a low-rank matrix consisting of topic vectors, is hidden in the text corpus matrix and masked by noise, and the Singular Value Decomposition (SVD) is a potentially useful tool for learning such a low-rank matrix. However, the connection between this low-rank matrix and the singular vectors of the text corpus matrix is usually complicated and hard to spell out, so using SVD to learn topic models is challenging. In this paper, we overcome this challenge by revealing a surprising insight: there is a low-dimensional {\it simplex} structure that can be viewed as a bridge between the low-rank matrix of interest and the SVD of the text corpus matrix, and that allows us to conveniently reconstruct the former using the latter. This insight motivates a new SVD-based approach to learning topic models, which we analyze using delicate random matrix theory to derive its rate of convergence. We support our methods and theory numerically, using both simulated and real data.

\end{abstract}
