\begin{abstract}
The training of deep residual neural networks (ResNets) with backpropagation has a memory cost that grows linearly with the depth of the network.
One way to circumvent this issue is to use reversible architectures, whose activations can be recomputed during the backward pass instead of being stored.
%
In this paper, we propose to change the forward rule of a ResNet by adding a momentum term. The resulting networks, momentum residual neural networks (Momentum ResNets), are invertible.
%
Unlike previous invertible architectures, they can be used as a drop-in replacement for any existing ResNet block.
%
We show that, in the infinitesimal step size regime, Momentum ResNets can be interpreted as second-order ordinary differential equations (ODEs), and we exactly characterize how adding momentum progressively increases their representation capabilities: they can learn any linear mapping up to a multiplicative factor, while ResNets cannot.
%
In a learning-to-optimize setting, where convergence to a fixed point is required, we show theoretically and empirically that our method succeeds while existing invertible architectures fail.
%
We show on CIFAR and ImageNet that Momentum ResNets match the accuracy of ResNets while having a much smaller memory footprint, and that pre-trained Momentum ResNets are promising for fine-tuning.
\end{abstract}
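
% Illustrative sketch only: the notation below is assumed, not taken from the abstract.
As a minimal sketch of what adding a momentum term to the forward rule could look like, assume a residual block of the form $x_{n+1} = x_n + f(x_n, \theta_n)$, and introduce a velocity $v_n$ and a momentum coefficient $\gamma \in (0, 1)$. The symbols $x_n$, $v_n$, $f$, $\theta_n$ and $\gamma$ are assumptions made here for illustration, since the abstract does not fix any notation. One forward rule of this kind is
\[
  v_{n+1} = \gamma v_n + (1 - \gamma) f(x_n, \theta_n),
  \qquad
  x_{n+1} = x_n + v_{n+1}.
\]
Under these assumptions, the step can be inverted in closed form without inverting $f$:
\[
  x_n = x_{n+1} - v_{n+1},
  \qquad
  v_n = \frac{1}{\gamma} \bigl( v_{n+1} - (1 - \gamma) f(x_n, \theta_n) \bigr),
\]
so activations can be recomputed during the backward pass rather than stored. Formally setting $\gamma = 0$ gives back the plain residual update, for which this inversion is no longer available.

The second-order structure can be seen in the same hedged spirit. Consider the first-order system $\dot{x} = v$, $\varepsilon \dot{v} = f(x, \theta) - v$ with $\varepsilon > 0$. Differentiating $\dot{x} = v$ and substituting gives
\[
  \varepsilon \ddot{x} + \dot{x} = f(x, \theta).
\]
Discretizing this system with unit step size, updating $v$ first and then $x$, yields the forward rule above with $\gamma = 1 - 1/\varepsilon$.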
