\begin{abstract}
The training of deep residual neural networks (ResNets) with backpropagation has a memory cost that increases linearly with respect to the depth of the network.
A way to circumvent this issue is to use reversible architectures.
%
In this paper, we propose to change the forward rule of a ResNet by adding a momentum term. The resulting networks, momentum residual neural networks (Momentum ResNets),
are invertible.
%
Unlike previous invertible architectures, they can be used as a drop-in replacement for any existing ResNet block.
%
We show that Momentum ResNets can be interpreted in the infinitesimal step size regime as second-order ordinary differential equations (ODEs) and exactly characterize how adding momentum progressively increases the representation capabilities of Momentum ResNets: they can learn any linear mapping up to a multiplicative factor, while ResNets cannot.
%
In a learning-to-optimize setting,
where convergence to a fixed point is required, we show theoretically and empirically that our method succeeds while existing invertible architectures fail.
%
We show on CIFAR and ImageNet that Momentum ResNets have the same accuracy as ResNets, while having a much smaller memory footprint, and show that pre-trained Momentum ResNets are promising for fine-tuning models.
\end{abstract}
