\begin{abstract}
In probabilistic topic models, the quantity of interest, a low-rank matrix consisting of topic vectors, is hidden in the text corpus matrix and masked by noise. Singular Value Decomposition (SVD) is a potentially useful tool for learning such a low-rank matrix. However, the connection between this low-rank matrix and the singular vectors of the text corpus matrix is usually complicated and hard to spell out, so using SVD to learn topic models is challenging.
We overcome this challenge by revealing a surprising insight: there is a low-dimensional {\it simplex} structure that serves as a bridge between the low-rank matrix of interest and the SVD of the text corpus matrix, and that allows us to conveniently reconstruct the former from the latter. This insight motivates a new SVD-based approach to learning topic models.
In our asymptotic analysis, we show that under the popular probabilistic latent semantic indexing (pLSI) model \citep{hofmann1999}, the convergence rate of the $\ell^1$-error of our method matches that of the minimax lower bound, up to a multi-logarithmic factor. To establish these results, we derive new element-wise bounds on the singular vectors and several large-deviation bounds for weakly dependent multinomial data. Our results on the convergence rate and asymptotic minimaxity are new.
We apply our method to two data sets, Associated Press (AP) and Statistics Literature Abstract (SLA), with encouraging results. In particular, the SVD of each data matrix exhibits a clear simplex structure, which largely validates our discovery.
\end{abstract}