
\begin{abstract}

Modern neural networks are over-parametrized. In particular, each rectified linear hidden unit can be rescaled by a positive multiplicative factor by adjusting its input and output weights, without changing the rest of the network.

Inspired by the Sinkhorn-Knopp algorithm, we introduce a fast iterative method for minimizing the $\ell_2$ norm of the weights, or equivalently the weight decay regularizer. It provably converges to a unique solution. Interleaving our algorithm with SGD during training improves the test accuracy.

For small batches, our approach offers an alternative to batch- and group-normalization on CIFAR-10 and ImageNet with a ResNet-18.


%Modern neural networks are over-parametrized. In particular, the function implemented by a neural network employing rectified linear units admits an infinite number of equivalent weight parameterizations.
%%
%In this paper, we introduce a fast iterative algorithm that converges to a canonical parametrization among the class of rescaled networks which compute the same function.
%It is inspired by Sinkhorn-Knopp's algorithm and provably converges to an equivalent network that minimizes the $\ell_2$ norm of its weights, \ie, the weight decay regularizer.
%%
%Interleaving our algorithm with SGD at train time improves the prediction performance. For small batch sizes, our approach offers an alternative to batch- and group-normalization on CIFAR-10 and ImageNet with a ResNet-18 architecture.

\end{abstract}
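
As a sketch of the rescaling invariance mentioned in the abstract, consider a single hidden unit in isolation, with incoming weights $w_{\mathrm{in}}$, outgoing weights $w_{\mathrm{out}}$, and no bias. The notation and the single-unit setting are assumptions made here for illustration; this is not a statement of the full iterative method. For any $\lambda > 0$, positive homogeneity of the rectifier gives
\[
\frac{w_{\mathrm{out}}}{\lambda}\,\max\bigl(0,\,(\lambda w_{\mathrm{in}})^{\top} x\bigr)
= w_{\mathrm{out}}\,\max\bigl(0,\,w_{\mathrm{in}}^{\top} x\bigr),
\]
so the unit's contribution to the next layer is unchanged, while its contribution to the squared $\ell_2$ norm of the weights becomes
\[
\lambda^{2}\,\|w_{\mathrm{in}}\|_2^{2} + \lambda^{-2}\,\|w_{\mathrm{out}}\|_2^{2}
\;\geq\; 2\,\|w_{\mathrm{in}}\|_2\,\|w_{\mathrm{out}}\|_2 ,
\]
with equality when $\lambda^{2} = \|w_{\mathrm{out}}\|_2 / \|w_{\mathrm{in}}\|_2$ (assuming both norms are nonzero), that is, when the rescaled incoming and outgoing weights have equal norm.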
