\begin{abstract}
The stochastic Frank-Wolfe method has recently attracted much interest in the context of optimization for statistical and machine learning due to its ability to work with a more general feasible region. However, there has been a complexity gap \textcolor{black}{in the dependence on the optimality tolerance $\varepsilon$} between the guaranteed convergence rate of stochastic Frank-Wolfe and that of its deterministic counterpart. In this work, we present a new generalized stochastic Frank-Wolfe method that closes this gap for the class of structured optimization problems encountered in statistical and machine learning characterized by empirical loss minimization with a certain type of ``linear prediction'' property (formally defined in the paper), which is typically present in loss minimization problems in practice. Our method also introduces the notion of a ``substitute gradient'' that is a not-necessarily-unbiased sample of the gradient. We show that our new method is equivalent to a particular randomized coordinate mirror descent algorithm applied to the dual problem, which in turn provides a new interpretation of randomized dual coordinate descent in the primal space. Also, in the special case of a strongly convex regularizer, our generalized stochastic Frank-Wolfe method (as well as the randomized dual coordinate descent method) exhibits linear convergence. Furthermore, we present computational experiments indicating that our method outperforms other stochastic Frank-Wolfe methods \textcolor{black}{for a sufficiently small optimality tolerance}, consistent with the theory developed herein.
\end{abstract}