\begin{abstract}
The matrix completion problem seeks to recover a $d\times d$ ground
truth matrix of low rank $r\ll d$ from observations of its individual
elements. Real-world matrix completion is often a huge-scale optimization
problem, with $d$ so large that even the simplest full-dimension
vector operations with $O(d)$ time complexity become prohibitively
expensive. Stochastic gradient descent (SGD) is one of the few algorithms
capable of solving matrix completion on a huge scale, and can also
naturally handle streaming data over an evolving ground truth. Unfortunately,
SGD slows down dramatically when the underlying ground truth
is ill-conditioned: it requires on the order of $\kappa\log(1/\epsilon)$
iterations to come within $\epsilon$ of a ground truth matrix with condition
number $\kappa$. In this paper, we propose a preconditioned version
of SGD that preserves all the favorable practical qualities of SGD
for huge-scale online optimization while also making it agnostic to
$\kappa$. For a symmetric ground truth and the Root Mean Square Error
(RMSE) loss, we prove that the preconditioned SGD converges to $\epsilon$-accuracy
in $O(\log(1/\epsilon))$ iterations, with a rapid linear convergence
rate as if the ground truth were perfectly conditioned with $\kappa=1$.
In our experiments, we observe a similar acceleration for item-item
collaborative filtering on the MovieLens25M dataset with a pairwise ranking loss,
using 100 million training pairs and 10 million test pairs.
{[}See supporting code at \url{https://github.com/Hong-Ming/ScaledSGD}.{]}
%ill-conditioned matrix completion under the root mean square error (RMSE) loss,
%Euclidean distance matrix (EDM) completion under pairwise square
%loss.
\end{abstract}
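
As a rough illustration of where the $\kappa$ dependence comes from, consider the following model. Purely as a simplifying assumption, not a result stated in the abstract, suppose each iteration shrinks the error $e_k$ by a factor of at most $1-c/\kappa$ for some constant $c\in(0,1]$. Then, using $1-x\le e^{-x}$,
\[
\|e_k\| \le \Bigl(1-\frac{c}{\kappa}\Bigr)^{k}\|e_0\| \le \exp\Bigl(-\frac{ck}{\kappa}\Bigr)\|e_0\|,
\quad\mbox{so}\quad
\|e_k\|\le\epsilon\,\|e_0\| \quad\mbox{whenever}\quad k\ge\frac{\kappa}{c}\log\frac{1}{\epsilon}.
\]
Under this model, the iteration count grows linearly in $\kappa$. A method that behaves as if $\kappa=1$ would need only about $\frac{1}{c}\log(1/\epsilon)$ iterations, regardless of the conditioning of the ground truth.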