1: \begin{abstract}
2: We consider the problem of learning a high-dimensional but low-rank matrix from a large-scale dataset distributed over several machines, where low-rankness is enforced by a convex trace norm constraint.
3: We propose \dfw, a distributed Frank-Wolfe algorithm that leverages the low-rank structure of its updates to achieve efficiency in time, memory and communication usage. The step at the heart of \dfw is solved approximately using a distributed version of the power method. We provide a theoretical analysis of the convergence of \dfw, showing that sublinear convergence in expectation to an optimal solution can be ensured with few power iterations per epoch. We implement \dfw in the Apache Spark distributed programming framework and validate the usefulness of our approach on synthetic and real data, including the ImageNet dataset with high-dimensional features extracted from a deep neural network.
4: %\keywords{Frank-Wolfe algorithm \and Low-rank learning \and Trace norm \and Distributed optimization \and Multi-task learning \and Multinomial logistic regression}
7: \end{abstract}
8: