\begin{abstract}
Modern neural networks are over-parametrized. In particular, each rectified linear hidden unit can be rescaled by a positive multiplicative factor by adjusting its input and output weights, without changing the rest of the network.
Inspired by the Sinkhorn-Knopp algorithm, we introduce a fast iterative method for minimizing the $\ell_2$ norm of the weights, or equivalently the weight decay regularizer. It provably converges to a unique solution. Interleaving our algorithm with SGD during training improves the test accuracy.
For small batches, our approach offers an alternative to batch- and group-normalization on CIFAR-10 and ImageNet with a ResNet-18.

%Modern neural networks are over-parametrized. In particular, the function implemented by a neural network employing rectified linear units admits an infinite number of equivalent weight parameterizations.
%%
%In this paper, we introduce a fast iterative algorithm that converges to a canonical parametrization among the class of rescaled networks which compute the same function.
%It is inspired by Sinkhorn-Knopp's algorithm and provably converges to an equivalent network that minimizes the $\ell_2$ norm of its weights, \ie, the weight decay regularizer.
%%
%Interleaving our algorithm with SGD at train time improves the prediction performance. For small batch sizes, our approach offers an alternative to batch- and group-normalization on CIFAR-10 and ImageNet with a ResNet-18 architecture.
\end{abstract}
