e08e78fd8f333469.tex
1: \begin{abstract}
2: \vspace{-5pt}
3: Batch Normalization (BN)
4: %makes output of hidden neuron had zero mean and unit variance,
5: improves both convergence and generalization in training neural networks.
6: % by regularizing
7: This work studies these phenomena theoretically.
8: %
9: We analyze BN using a basic block of neural networks that consists of a kernel layer, a BN layer, and a nonlinear activation function.
10: %
11: This basic network helps us understand the impacts of BN
12: %, where the results are generalized to deep models in numerical studies.
13: %
14: %We explore BN
15: in three aspects.
16: %
17: First, viewing BN as an implicit regularizer, we show that it can be decomposed into population normalization (PN) and gamma decay, the latter acting as an explicit regularizer.
18: %an analytical form of its regularization is derived.
19: %
20: Second, the learning dynamics of BN and of this regularization show that training converges with large maximum and effective learning rates.
21: %
22: Third, the generalization of BN is explored using statistical mechanics.
23: % is reformulated as statistical
24: Experiments demonstrate that BN in convolutional neural networks shares the same regularization traits found in the above analyses.
25: % Finally, the characteristics of BN are studied by .
26: % support our analyses.
27: %
28: \vspace{-5pt}
29: \end{abstract}
30: