\begin{abstract}
Connections between Deep Neural Network (DNN) training and optimal control theory have attracted considerable attention as a principled tool for algorithmic design.
The Differential Dynamic Programming (DDP) neural optimizer \cite{liu2020differential} is a recently proposed method along this line.
Despite its empirical success,
its applicability has been limited to feedforward networks,
and whether such a trajectory-optimization-inspired framework can be extended to modern architectures remains unclear.
In this work, we derive a generalized
DDP optimizer that accommodates both residual connections and convolution layers.
The resulting optimal control representation admits a \emph{game-theoretic} perspective, in which
training residual networks
can be interpreted as \emph{cooperative trajectory optimization on state-augmented dynamical systems}.
This Game Theoretic DDP (GT-DDP) optimizer
enjoys the same theoretical connection as in previous work,
yet generates a much more complex update rule that better leverages the information available during network propagation.
Evaluation on image classification datasets (e.g.,~MNIST and CIFAR100) shows improved training convergence and reduced variance
compared with existing methods.
Our approach highlights the benefits of architecture-aware optimization.
\end{abstract}