\begin{abstract}
In probabilistic topic models, the quantity of interest, a low-rank matrix consisting of topic vectors, is hidden in the text corpus matrix and masked by noise. Singular Value Decomposition (SVD) is a potentially useful tool for learning such a low-rank matrix. However, the connection between this low-rank matrix and the singular vectors of the text corpus matrix is usually complicated and hard to spell out, so it has been unclear how to use SVD to learn topic models.

We overcome this challenge by revealing a surprising
insight: there is a low-dimensional {\it simplex} structure that serves as a bridge between the low-rank matrix of interest and
the SVD of the text corpus matrix, and that allows us
to conveniently reconstruct the former from the latter.
This insight motivates a new SVD-based approach to learning topic models.

For asymptotic analysis, we show that under the popular probabilistic model \citep{hofmann1999},
the convergence rate of the $\ell^1$-error of our method matches the minimax lower bound, up to
a multi-logarithmic term. In showing these results, we derive new element-wise bounds on the singular vectors and several large-deviation bounds for weakly dependent multinomial data. Our results on the convergence rate and
asymptotic minimaxity are new.

We have applied our method to two data sets, Associated Press (AP) and Statistics Literature Abstract (SLA), with encouraging results. In particular, there is a clear simplex structure associated with the SVD of the data matrices, which largely validates our discovery.

\end{abstract}