\begin{abstract}
Structural support vector machines (SSVMs) are among the
best-performing models for structured computer vision tasks,
such as semantic image segmentation or human pose estimation.
%
Training SSVMs, however, is computationally costly, because it
requires repeated calls to a structured prediction subroutine
(called the \emph{max-oracle}), which itself has to solve an
optimization problem, \eg a graph cut.

In this work, we introduce a new algorithm for SSVM training that
is more efficient than earlier techniques when the max-oracle is
computationally expensive, as is frequently the case in computer
vision tasks. The main idea is to (i) combine the recent stochastic
Block-Coordinate Frank-Wolfe algorithm with efficient hyperplane
caching, and (ii) use an automatic selection rule for deciding whether
to call the exact max-oracle or to rely on an approximate one based
on the cached hyperplanes.

We show experimentally that this strategy leads to faster convergence
to the optimum with respect to the number of required oracle calls,
and that this translates into faster convergence with respect to the
total runtime when the max-oracle is slow compared to the other steps
of the algorithm.

A publicly available C++ implementation is provided.
\end{abstract}
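
The two ingredients named in the abstract, hyperplane caching inside Block-Coordinate Frank-Wolfe (BCFW) and a rule for choosing between the cached and the exact max-oracle, can be illustrated with a small C++ sketch. It is a hypothetical, simplified illustration, not the released implementation: the names \texttt{Plane}, \texttt{Block}, \texttt{cachedOracle}, \texttt{blockGap}, \texttt{choosePlane}, \texttt{bcfwStep} and the threshold \texttt{tau} are assumptions, and the selection rule shown (use the best cached hyperplane if its block duality gap exceeds \texttt{tau}, otherwise call the exact max-oracle and cache its answer) is a stand-in, not necessarily the automatic rule used in this work. The block update follows the standard BCFW step with closed-form line search, assuming a regularization weight $\lambda$ and $n$ training examples.

```cpp
// Hypothetical sketch (not the released implementation): one Block-Coordinate
// Frank-Wolfe (BCFW) step for an SSVM, with a per-block hyperplane cache and
// an ASSUMED threshold rule for choosing the cached or the exact max-oracle.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

using Vec = std::vector<double>;

// A hyperplane for example i and label y:
// psi = phi(x_i, y_i) - phi(x_i, y), loss = L_i(y).
struct Plane {
  Vec psi;
  double loss;
};

// Per-example dual block: its contribution (w_i, l_i) and its plane cache.
struct Block {
  Vec wi;                    // contribution w_i to w = sum_i w_i
  double li = 0.0;           // contribution l_i to l = sum_i l_i
  std::vector<Plane> cache;  // planes returned by earlier exact oracle calls
};

double dot(const Vec& a, const Vec& b) {
  assert(a.size() == b.size());
  double s = 0.0;
  for (std::size_t k = 0; k < a.size(); ++k) s += a[k] * b[k];
  return s;
}

// Approximate oracle: the cached plane maximizing L_i(y) - <w, psi_i(y)>.
// Returns nullptr if the cache is empty.
const Plane* cachedOracle(const Block& b, const Vec& w) {
  const Plane* best = nullptr;
  double bestVal = -std::numeric_limits<double>::infinity();
  for (const Plane& p : b.cache) {
    const double v = p.loss - dot(w, p.psi);
    if (v > bestVal) { bestVal = v; best = &p; }
  }
  return best;
}

// Block gap lambda * <w_i - w_s, w> - l_i + l_s for the corner
// w_s = psi / (lambda * n), l_s = loss / n.
double blockGap(const Block& b, const Vec& w, const Plane& p, double lambda,
                std::size_t n) {
  double g = p.loss / n - b.li;
  for (std::size_t k = 0; k < w.size(); ++k)
    g += lambda * (b.wi[k] - p.psi[k] / (lambda * n)) * w[k];
  return g;
}

// Assumed selection rule: use the best cached plane if its gap exceeds tau,
// otherwise call the exact max-oracle and add its answer to the cache.
Plane choosePlane(Block& b, const Vec& w, double lambda, std::size_t n,
                  double tau, const std::function<Plane(const Vec&)>& exactOracle,
                  int& exactCalls) {
  if (const Plane* c = cachedOracle(b, w)) {
    if (blockGap(b, w, *c, lambda, n) > tau) return *c;
  }
  ++exactCalls;
  Plane p = exactOracle(w);
  b.cache.push_back(p);
  return p;
}

// BCFW update of block b towards the corner given by p, using the
// closed-form line search clipped to [0, 1]; keeps w and l in sync.
void bcfwStep(Block& b, Vec& w, double& l, const Plane& p, double lambda,
              std::size_t n) {
  const std::size_t d = w.size();
  Vec ws(d), diff(d);
  for (std::size_t k = 0; k < d; ++k) {
    ws[k] = p.psi[k] / (lambda * n);
    diff[k] = b.wi[k] - ws[k];
  }
  const double denom = lambda * dot(diff, diff);
  const double gap = blockGap(b, w, p, lambda, n);
  const double step =
      denom > 0.0 ? std::min(1.0, std::max(0.0, gap / denom)) : 0.0;
  for (std::size_t k = 0; k < d; ++k) {
    const double wiNew = (1.0 - step) * b.wi[k] + step * ws[k];
    w[k] += wiNew - b.wi[k];  // w stays equal to the sum of all w_i
    b.wi[k] = wiNew;
  }
  const double liNew = (1.0 - step) * b.li + step * p.loss / n;
  l += liNew - b.li;
  b.li = liNew;
}
```

Because every exact oracle answer is stored in its block's cache, later passes can often be served by the cheap \texttt{cachedOracle}; in this sketch the exact oracle is only called once the cached hyperplanes no longer yield a gap above \texttt{tau}.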