\begin{abstract}%
We consider using gradient descent to minimize the nonconvex function
$f(X)=\phi(XX^{T})$ over an $n\times r$ factor matrix $X$, in which
$\phi$ is an underlying smooth convex cost function defined over
$n\times n$ matrices. While only a second-order stationary point
$X$ can be provably found in reasonable time, if $X$ is additionally
\emph{rank deficient}, then its rank deficiency certifies it as being
globally optimal. This way of certifying global optimality necessarily
requires the search rank $r$ of the current iterate $X$ to be \emph{overparameterized}
with respect to the rank $r^{\star}$ of the global minimizer $X^{\star}$.
Unfortunately, overparameterization significantly slows down the convergence
of gradient descent, from a linear rate with $r=r^{\star}$ to a sublinear
rate when $r>r^{\star}$, even when $\phi$ is strongly convex. In
this paper, we propose an inexpensive preconditioner that restores
the convergence rate of gradient descent back to linear in the
overparameterized case, while also making it agnostic to possible
ill-conditioning in the global minimizer $X^{\star}$.
\end{abstract}