1: \begin{abstract}
2: Connections between the training of Deep Neural Networks (DNNs) and optimal control theory have attracted considerable attention as a principled tool for algorithmic design.
3: The Differential Dynamic Programming (DDP) neural optimizer \cite{liu2020differential} is a recently proposed method along this line.
4: Despite its empirical success,
5: its applicability has been limited to feedforward networks,
6: and whether such a trajectory-optimization-inspired framework can be extended to modern architectures remains unclear.
7: In this work, we derive a generalized
8: DDP optimizer that accepts both residual connections and convolution layers.
9: The resulting optimal control representation admits a {game-theoretic} perspective, in which
10: training residual networks
11: can be interpreted as {cooperative trajectory optimization on state-augmented dynamical systems}.
12: This Game Theoretic DDP (GT-DDP) optimizer
13: retains the theoretical connection established in previous work,
14: yet generates a much more complex update rule that better leverages the information available during network propagation.
15: Evaluation on image classification datasets (e.g., MNIST and CIFAR100) shows improved training convergence and reduced variance
16: over existing methods.
17: Our approach highlights the benefit gained from architecture-aware optimization.
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29: \end{abstract}
30: