19a3a8354a96a43e.tex
1: \begin{abstract}
2: As an indispensable component, Batch Normalization (BN) has successfully improved the training of deep neural networks (DNNs) with mini-batches, by normalizing the distribution of the internal representation for each hidden layer. However, the effectiveness of BN would diminish with scenario of micro-batch (\eg~less than 10 samples in a mini-batch),
3: %(e.g., less than $50$ training examples in the batch),
4: since the estimated statistics in a mini-batch are not reliable with insufficient samples.
5: %
6: In this paper, we present a novel normalization method, called Batch Kalman Normalization (BKN), for improving and accelerating the training of DNNs, particularly under the context of micro-batches.
7: %
8: %BKN has the merits that are analogous to .
9: %
10: %This method is analogous to Kalman Filtering process, thus called  (BKN).
11: Specifically, unlike the existing solutions treating each hidden layer as an isolated system, BKN treats all the layers in a network as a whole system, and estimates the statistics of a certain layer by considering the distributions of all its preceding layers, mimicking the merits of Kalman Filtering.
12: %
13: BKN has two appealing properties. First, it enables more stable training and faster convergence compared to previous works.
14: %training.
15: Second, training DNNs using BKN performs substantially better than those using BN and its variants, especially when very small mini-batches are presented.
16: %
17: {On the image classification benchmark of ImageNet, using BKN powered networks we improve upon the best-published model-zoo results: reaching 74.0\% top-1 \emph{val} accuracy for InceptionV2. More importantly, using BKN achieves the comparable accuracy with extremely smaller batch size, such as 64 times smaller on CIFAR-10/100 and 8 times smaller on ImageNet.}
18: 
19: 
20: \end{abstract}
21: