\begin{abstract}
The Frank-Wolfe algorithm has regained much interest
for its use
%since it has been successfully used
in structurally constrained machine learning applications. However, one major limitation of the Frank-Wolfe algorithm is its slow local convergence, which is due to zig-zagging behavior.
We observe that this zig-zagging phenomenon can be viewed as an artifact of discretization: when the method is viewed as an Euler discretization of a continuous-time flow, that flow does not zig-zag.
%In contrast to previous methods that directly break the behavior, we figure out the intuition behind this behavior, which is an artifact of truncation discretization error.
For this reason, we propose multistep Frank-Wolfe variants based on discretizations of the same flow whose truncation errors decay as $O(\Delta^p)$, where $p$ is the method's order.
This strategy ``stabilizes'' the method and allows tools such as line search and momentum to provide more benefit. However, in terms of convergence rate, our result is ultimately negative, suggesting that no Runge-Kutta-type discretization scheme can achieve a better convergence rate than the vanilla Frank-Wolfe method.
We believe that this analysis adds to the growing knowledge of flow analysis for optimization methods and serves as a cautionary tale on the ultimate usefulness of multistep methods.
\end{abstract}