\begin{abstract}%
We consider using gradient descent to minimize the nonconvex function
$f(X)=\phi(XX^{T})$ over an $n\times r$ factor matrix $X$, in which
$\phi$ is an underlying smooth convex cost function defined over
$n\times n$ matrices. While only a second-order stationary point
$X$ can be provably found in reasonable time, if $X$ is additionally
\emph{rank deficient}, then its rank deficiency certifies it as being
globally optimal. This way of certifying global optimality necessarily
requires the search rank $r$ of the current iterate $X$ to be \emph{overparameterized}
with respect to the rank $r^{\star}$ of the global minimizer $X^{\star}$.
Unfortunately, overparameterization significantly slows down the convergence
of gradient descent, from a linear rate when $r=r^{\star}$ to a sublinear
rate when $r>r^{\star}$, even when $\phi$ is strongly convex. In
this paper, we propose an inexpensive preconditioner that restores
the linear convergence rate of gradient descent in the
overparameterized case, while also making it agnostic to possible
ill-conditioning in the global minimizer $X^{\star}$.
\end{abstract}