1: \begin{abstract}
2: We propose multirate training of neural networks: partitioning neural network parameters into ``fast'' and ``slow'' parts which are trained on different time scales, where slow parts are updated less frequently.
3: By choosing appropriate partitionings
4: we can obtain substantial computational speed-up for transfer learning tasks. We show for applications in vision and NLP that we can fine-tune deep neural networks in almost half the time, without reducing the generalization performance of the resulting models. We analyze the convergence properties of our multirate scheme and draw a comparison with vanilla SGD. We also discuss splitting choices for the neural network parameters which could enhance generalization performance when neural networks are trained from scratch. A multirate approach can be used to learn different features present in the data and as a form of regularization. Our paper unlocks the potential of using multirate techniques for neural network training and provides several starting points for future work in this area.
5: \end{abstract}
6: