1: \begin{abstract}
2: We consider solving the low-rank matrix sensing problem with the Factorized Gradient Descent (FGD) method when the true rank is unknown and over-specified, a setting we refer to as over-parameterized matrix sensing.
3: If the ground truth signal $\bX^* \in \mathbb{R}^{d \times d}$ is of rank $r$, but we try to recover it using $\fitMat \fitMat^\top$ where $\fitMat \in \mathbb{R}^{d \times k}$ and $k>r$, the existing statistical analysis falls short, due to a flat local curvature of the loss function around the global minima.
4: % \jzcomment{two phase convergence of \fitMat_t \fitMat_t - \trueMat is not correct. Maybe we can say we recover the top r signal firs geometrically fast?}
5: % We show that under the over-parameterized matrix sensing setting, there are two phases of convergence with the FGD. In the first phase, the FGD converges geometrically fast to a radius of convergence $\mathcal{O} \parenth{{k d \log d} \sigma^2/n}$ around the true matrix $\bX^{*}$ where $\sigma^2$ is the variance of the observation noise and $n$ is the number of sample. Then, in the second phase, it converges sub-linearly to a statistical error of $\mathcal{O} \parenth{{d \log d} \sigma^2/n}$ under squared Frobenius norm.
6: By decomposing the factorized matrix $\fitMat$ into separate column spaces to capture the effect of extra ranks, we show that $\vecnorm{\fitMat_t \fitMat_t^\top - \trueMat}{F}^2$ converges to a statistical error of $\tilde{\mathcal{O}} \parenth{k d \sigma^2/n}$ after $\tilde{\mathcal{O}}(\frac{\sigma_{r}}{\sigma}\sqrt{\frac{n}{d}})$ iterations, where $\fitMat_t$ is the output of FGD after $t$ iterations, $\sigma^2$ is the variance of the observation noise, $\sigma_{r}$ is the $r$-th largest eigenvalue of $\trueMat$, and $n$ is the number of samples.
7: % \jzcomment{end-refine}
8: Our results, therefore, offer a comprehensive picture of the statistical and computational complexity of FGD for the over-parameterized matrix sensing problem. 
9: \end{abstract}
10: