
\begin{abstract}

The connection between the training of Deep Neural Networks (DNNs) and optimal control theory has attracted considerable attention as a principled tool for algorithmic design.
The Differential Dynamic Programming (DDP) neural optimizer \cite{liu2020differential} is a recently proposed method along this line.
Despite its empirical success, its applicability has been limited to feedforward networks, and whether such a trajectory-optimization-inspired framework can be extended to modern architectures remains unclear.
In this work, we derive a generalized DDP optimizer that accepts both residual connections and convolution layers.
The resulting optimal control representation admits a {game theoretic} perspective, in which training residual networks can be interpreted as {cooperative trajectory optimization on state-augmented dynamical systems}.
This Game Theoretic DDP (GT-DDP) optimizer retains the theoretical connection established in previous work, yet generates a considerably more complex update rule that better leverages the information available during network propagation.
Evaluation on image classification datasets (e.g., MNIST and CIFAR100) shows improved training convergence and reduced variance compared with existing methods.
Our approach highlights the benefit gained from architecture-aware optimization.


\end{abstract}
