abstract:19a3a8354a96a43e.tex

1: \begin{abstract}

2: As an indispensable component, Batch Normalization (BN) has successfully improved the training of deep neural networks (DNNs) with mini-batches, by normalizing the distribution of the internal representation for each hidden layer. However, the effectiveness of BN would diminish with scenario of micro-batch (\eg~less than 10 samples in a mini-batch),

3: %(e.g., less than $50$ training examples in the batch),

4: since the estimated statistics in a mini-batch are not reliable with insufficient samples.

5: %

6: In this paper, we present a novel normalization method, called Batch Kalman Normalization (BKN), for improving and accelerating the training of DNNs, particularly under the context of micro-batches.

7: %

8: %BKN has the merits that are analogous to .

9: %

10: %This method is analogous to Kalman Filtering process, thus called  (BKN).

11: Specifically, unlike the existing solutions treating each hidden layer as an isolated system, BKN treats all the layers in a network as a whole system, and estimates the statistics of a certain layer by considering the distributions of all its preceding layers, mimicking the merits of Kalman Filtering.

12: %

13: BKN has two appealing properties. First, it enables more stable training and faster convergence compared to previous works.

14: %training.

15: Second, training DNNs using BKN performs substantially better than those using BN and its variants, especially when very small mini-batches are presented.

16: %

17: {On the image classification benchmark of ImageNet, using BKN powered networks we improve upon the best-published model-zoo results: reaching 74.0\% top-1 \emph{val} accuracy for InceptionV2. More importantly, using BKN achieves the comparable accuracy with extremely smaller batch size, such as 64 times smaller on CIFAR-10/100 and 8 times smaller on ImageNet.}

18:

19:

20: \end{abstract}

21: