\begin{abstract}
A number of recent results have demonstrated the benefits
of incorporating various constraints when training
deep architectures in vision and machine learning.
The advantages range from guarantees for statistical
generalization to better accuracy to compression.
However, support for general constraints within widely
used libraries remains scarce, and their broader
deployment in the many applications that could benefit
from them remains under-explored. Part of the reason is
that stochastic gradient descent (SGD), the workhorse for
training deep neural networks, does not natively handle constraints
with global scope very well. In this paper, we revisit a classical first-order
scheme from numerical optimization, Conditional Gradients (CG), that has thus far had limited applicability
in training deep models. We show via rigorous
analysis how various constraints can be naturally handled by modifications
of this algorithm. We provide convergence guarantees and show a suite of
immediate benefits that are possible: training ResNets with fewer layers but better
accuracy simply by substituting in our version of CG, faster training of GANs with 50\% fewer
epochs in image inpainting applications, and provably better generalization guarantees using
efficiently implementable forms of recently proposed regularizers.

\textbf{Keywords:} Constrained Deep Learning, Conditional Gradient Algorithms, Path Norm
\end{abstract}