\begin{abstract}

We propose $\textsf{ScaledGD($\lambda$)}$, a preconditioned gradient descent method for the low-rank matrix sensing problem when the true rank is unknown and the matrix is possibly ill-conditioned. Using an overparameterized factor representation, $\textsf{ScaledGD($\lambda$)}$ starts from a small random initialization and proceeds by gradient descent with a specific form of {\em damped} preconditioning to combat the bad curvature induced by overparameterization and ill-conditioning. Compared to vanilla gradient descent ($\textsf{GD}$), $\textsf{ScaledGD($\lambda$)}$ is remarkably robust to ill-conditioning, even with overparameterization. Specifically, we show that, under the restricted isometry property (RIP) of the sensing operator, $\textsf{ScaledGD($\lambda$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only {\em logarithmically} with the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$, which suffers from a polynomial dependence on the condition number.
Furthermore, we show that in the presence of measurement noise, $\textsf{ScaledGD($\lambda$)}$ converges, at the same rate as in the noiseless setting, to an error that is minimax optimal up to a multiplicative factor of the condition number, making it the first nearly minimax-optimal overparameterized gradient method for low-rank matrix sensing. Our results also extend to the setting where the matrix is only approximately low-rank under the Gaussian design.
Our work provides evidence for the power of preconditioning in accelerating convergence without hurting generalization in overparameterized learning.
% preconditioning helps even with overparameterization.

\end{abstract}
