abstract:e08e78fd8f333469.tex

1: \begin{abstract}

2: \vspace{-5pt}

3: Batch Normalization (BN)

4: %makes output of hidden neuron had zero mean and unit variance,

5: improves both convergence and generalization in training neural networks.

6: % by regularizing

7: This work investigates these phenomena theoretically.

8: %

9: We analyze BN using a basic block of neural networks that consists of a kernel layer, a BN layer, and a nonlinear activation function.

10: %

11: This basic network helps us understand the impacts of BN

12: %, where the results are generalized to deep models in numerical studies.

13: %

14: %We explore BN

15: in three aspects.

16: %

17: First, when viewed as an implicit regularizer, BN can be decomposed into population normalization (PN) and gamma decay, an explicit regularizer.

18: %an analytical form of its regularization is derived.

19: %

20: Second, the learning dynamics of BN and of this regularization show that training converges with large maximum and effective learning rates.

21: %

22: Third, the generalization of BN is explored using statistical mechanics.

23: % is reformulated as statistical

24: Experiments demonstrate that BN in convolutional neural networks shares the same traits of regularization as those identified in the above analyses.

25: % Finally, the characteristics of BN are studied by .

26: % support our analyses.

27: %

28: \vspace{-5pt}

29: \end{abstract}

30: