\begin{abstract}
  A number of recent results have demonstrated the benefits
  of incorporating various constraints when training
  deep architectures in vision and machine learning.
  The advantages range from guarantees for statistical
  generalization to better accuracy to compression.
  But support for general constraints within widely
  used libraries remains scarce, and their broader
  deployment in the many applications that can benefit
  from them remains under-explored. Part of the reason is
  that stochastic gradient descent (SGD), the workhorse for
  training deep neural networks, does not natively handle constraints
  with global scope very well. In this paper, we revisit a classical first-order
  scheme from numerical optimization, Conditional Gradients (CG), that has thus far had limited applicability
  in training deep models. We show via rigorous
  analysis how various constraints can be naturally handled by modifications
  of this algorithm. We provide convergence guarantees and show a suite of
  immediate benefits that are possible: training ResNets with fewer layers but better
  accuracy simply by substituting in our version of CG, faster training of GANs with 50\% fewer
  epochs in image inpainting applications, and provably better generalization guarantees using
  efficiently implementable forms of recently proposed regularizers.

  \textbf{Keywords:} Constrained Deep Learning, Conditional Gradient Algorithms, Path Norm
\end{abstract}