1: \begin{proof}
2: Here we highlight the key steps of the proof, and defer the detailed proof to
3: Section~\ref{sec:proofs-phase-i-1}.
4:
5: % Note that given \pk{$W_k>\epsilon_0 e^{-k}$} and conditional on the high probability even that
6: % $\ol W_k \le 2e^{-e^k d_0}$, we have $Nd_k^{max} = O(N \log M / M)$
7:
In Figure~\ref{fig:block-decomp-Bk}, the rows and columns of $B_{\wh{\mc I}_k, \wh{\mc I}_k}$
are sorted in ascending order of the exact marginal probabilities of the words, and the
rows and columns set to $0$ by regularization are shaded.
11: %
12: Consider the block decomposition according to the good words $\wh{\mc L}_k$ and the
13: spillover words $\wh{\mc J}_k$.
14: %
We bound the spectral distance of each of the four blocks ($A_1, A_2, A_3, A_4$) separately. The bound for
the entire matrix $\wt B_k$ then follows immediately from the triangle inequality.
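To spell out this step, here is a sketch in notation that is assumed for illustration only: write $M_k$ as a placeholder for the matrix that $\wt B_k$ is compared against, and let $E_i$ denote the difference $\wt B_k - M_k$ restricted to block $A_i$ and zero-padded to the full size. Zero-padding does not change the spectral norm, so
\[
  \bigl\| \wt B_k - M_k \bigr\|
  = \Bigl\| \sum_{i=1}^{4} E_i \Bigr\|
  \le \sum_{i=1}^{4} \| E_i \| ,
\]
and it suffices to control each $\| E_i \|$ separately.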
17:
For block $A_1$, whose rows and columns all correspond to the ``good words'' with roughly
uniform marginals, we show concentration by applying the result of
\cite{le2015concentration}.
21: %
For blocks $A_2$ and $A_3$, we show that after regularization the spectral norms of these two blocks
are small. Intuitively, the expected row sums of block $A_2$ are bounded by $2d_k^{\max}$ and the
expected column sums are bounded by $2d_k^{\max}\frac{\ol W_k}{W_k}= O(1/N)$, as a result of the
bound on $\ol W_k$ in Lemma~\ref{lem:small-spillover}. Thus the spectral norm of block $A_2$
is likely to be bounded by $O(\sqrt{d_k^{\max}/N})$. We make this rigorous with high-probability
arguments.
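One standard way to see this heuristic, given here as a sketch and not necessarily the route taken in the detailed proof, is the inequality $\|M\| \le \sqrt{\|M\|_1 \, \|M\|_\infty}$, valid for any matrix $M$, where $\|M\|_1$ and $\|M\|_\infty$ denote the maximum absolute column sum and the maximum absolute row sum. Applied to the entrywise expectation of block $A_2$, written $\mathbb{E} A_2$ (notation assumed here), it gives
\[
  \| \mathbb{E} A_2 \|
  \le \sqrt{ 2d_k^{\max}\frac{\ol W_k}{W_k} \cdot 2d_k^{\max} }
  = \sqrt{ O(1/N) \cdot 2d_k^{\max} }
  = O\bigl(\sqrt{d_k^{\max}/N}\bigr).
\]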
28: %
Lastly, for block $A_4$, whose rows and columns all correspond to the spillover words, we show that
the spectral norm is very small, as a result of the small spillover marginal
$\ol W_k$.
32: \end{proof}
33: