\begin{abstract}
We propose $\textsf{ScaledGD($\lambda$)}$, a preconditioned gradient descent method for the low-rank matrix sensing problem when the true rank is unknown and the matrix is possibly ill-conditioned. Using an overparameterized factor representation, $\textsf{ScaledGD($\lambda$)}$ starts from a small random initialization and proceeds by gradient descent with a specific form of {\em damped} preconditioning to combat the bad curvature induced by overparameterization and ill-conditioning. $\textsf{ScaledGD($\lambda$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\textsf{GD}$), even with overparameterization. Specifically, we show that, under the restricted isometry property (RIP) of the sensing operator, $\textsf{ScaledGD($\lambda$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only {\em logarithmically} with the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$, which suffers from a polynomial dependence on the condition number.
Furthermore, we show that in the presence of measurement noise, $\textsf{ScaledGD($\lambda$)}$ converges, at the same rate as in the noiseless setting, to the minimax-optimal error up to a multiplicative factor of the condition number, making it the first nearly minimax-optimal overparameterized gradient method for low-rank matrix sensing. Our results also extend, under the Gaussian design, to the setting where the matrix is only approximately low-rank.
Our work provides evidence of the power of preconditioning in accelerating convergence without hurting generalization in overparameterized learning.
% preconditioning helps even with overparameterization.
\end{abstract}
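As a rough illustration of the damped preconditioned iteration summarized in the abstract, the following NumPy sketch runs $X_{t+1} = X_t - \eta \nabla f(X_t)(X_t^\top X_t + \lambda I)^{-1}$ on a synthetic, noiseless, ill-conditioned instance with overparameterized rank $k > r$. It is a sketch under stated assumptions, not the authors' implementation: the preconditioner form, the objective $f(X) = \frac{1}{4}\|\mathcal{A}(XX^\top) - y\|_2^2$, the symmetric Gaussian design, the constants $\eta = 0.3$, $\lambda = 0.1\,\sigma_r(M_\star)$, $\alpha = 10^{-4}$, and the helper names \texttt{make\_problem} and \texttt{scaled\_gd\_lambda} are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical helpers; all constants below are illustrative assumptions, not the paper's.


def make_problem(n=10, r=2, m=3000, cond=10.0, seed=0):
    """Toy instance: PSD rank-r M* with condition number `cond`,
    observed through m symmetric Gaussian measurements (assumed design)."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal basis for col(M*)
    eigs = np.geomspace(1.0, 1.0 / cond, r)             # eigenvalues from 1 down to 1/cond
    M_star = U @ np.diag(eigs) @ U.T
    G = rng.standard_normal((m, n, n))
    A = (G + G.transpose(0, 2, 1)) / 2                  # symmetric sensing matrices A_i
    A_flat = A.reshape(m, n * n) / np.sqrt(m)           # row i is vec(A_i) / sqrt(m)
    y = A_flat @ M_star.ravel()                         # noiseless measurements A(M*)
    return A_flat, y, M_star


def scaled_gd_lambda(A_flat, y, n, k, lam, eta=0.3, alpha=1e-4, iters=800, seed=1):
    """Iterates X <- X - eta * grad f(X) (X^T X + lam I)^{-1}, where
    f(X) = 1/4 ||A(X X^T) - y||^2, starting from a small random X_0."""
    rng = np.random.default_rng(seed)
    X = alpha * rng.standard_normal((n, k)) / np.sqrt(n)  # small random init; k may exceed r
    I_k = np.eye(k)
    for _ in range(iters):
        residual = A_flat @ (X @ X.T).ravel() - y           # A(X X^T) - y
        grad = (A_flat.T @ residual).reshape(n, n) @ X      # A^*(residual) X
        # Damped preconditioner: right-multiply by (X^T X + lam I)^{-1} via a linear solve.
        step = np.linalg.solve(X.T @ X + lam * I_k, grad.T).T
        X = X - eta * step
    return X


A_flat, y, M_star = make_problem()
n, r, k = M_star.shape[0], 2, 4                         # k = 4 > r = 2: overparameterized
lam = 0.1 * np.linalg.eigvalsh(M_star)[-r]              # assumption: lam = 0.1 * sigma_r(M*)
X = scaled_gd_lambda(A_flat, y, n=n, k=k, lam=lam)
rel_err = np.linalg.norm(X @ X.T - M_star) / np.linalg.norm(M_star)
print(f"relative error: {rel_err:.2e}")
```

Because $X^\top X + \lambda I$ is symmetric positive definite, solving it against $\nabla f(X)^\top$ and transposing yields $\nabla f(X)(X^\top X + \lambda I)^{-1}$ without forming an explicit inverse; the damping $\lambda I$ also keeps this $k \times k$ system well posed while $X$ is still tiny or rank-deficient.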
