\begin{abstract}
%

We present a matrix factorization algorithm that scales to input
matrices that are large in both dimensions (e.g., that contain more
than 1\,TB of data). The algorithm streams the matrix columns while
subsampling them, resulting in a low complexity per iteration and a
reasonable memory footprint. In contrast to previous online matrix
factorization methods, our approach relies on low-dimensional
statistics from past iterates to control the extra variance introduced
by subsampling. We present a convergence analysis that guarantees
convergence to a stationary point of the problem. Large speed-ups can be
obtained compared to previous online algorithms that do not perform
subsampling, thanks to the feature redundancy that often exists in
high-dimensional settings.
%
\end{abstract}