1: \begin{abstract}
2: \ac{BN} is a popular technique for training \acp{DNN}. \ac{BN} normalizes the activations of each mini-batch by shifting and scaling, which accelerates convergence and improves generalization. The recently proposed \ac{ITERNORM} method improves these properties by whitening the activations iteratively using Newton's method. However, because Newton's method initializes the whitening matrix independently at each training step, no information is shared between consecutive steps. In this work, instead of computing the whitening matrix exactly at each training step, we estimate it gradually during training in an online fashion, using our proposed \ac{SWBN} algorithm. We show that \ac{SWBN} improves the convergence rate and generalization of \acp{DNN} while incurring less computational overhead than \ac{ITERNORM}. Owing to its efficiency, \ac{SWBN} can be easily employed in most \ac{DNN} architectures with a large number of layers. We provide comprehensive experiments and comparisons between \ac{BN}, \ac{ITERNORM}, and \ac{SWBN} layers to demonstrate the effectiveness of the proposed technique in conventional (many-shot) image classification and few-shot classification tasks.
3: \footnote[0]{* Equal contribution. \# Jiayi is now with Kwai Inc.}
4: \end{abstract}
5: