1: \begin{abstract}
2: % In this paper we study an empirical problem of whether and how a deep neural network can be trained well by itself without any explicit regularization technique such as dropout, weight decay, and batch normalization. We observe empirically that existing pretrained models tend to produce near orthogonal filters in each hidden layer, regardless of network architectures, data sets, and applications. Motivated by it,
3:
4:
5: In this paper, we investigate the empirical impact of orthogonality regularization (OR) in deep learning, used either alone or in combination with other regularizers. Recent works on OR have reported promising accuracy gains. In our ablation study, however, we do not observe comparably significant improvements from existing OR techniques over conventional training with weight decay, dropout, and batch normalization. To identify the real gain from OR, and inspired by locality-sensitive hashing (LSH) for angle estimation, we propose to introduce an implicit {\em self-regularization} into OR that pushes the mean and variance of the filter angles in a network towards $\ang{90}$ and $\ang{0}$, respectively, to achieve (near) orthogonality among the filters without any other explicit regularization. Our regularization can be implemented as an architectural plug-in and integrated with an arbitrary network. We find that OR helps {\em stabilize} the training process and leads to {\em faster convergence} and {\em better generalization}.
6: \end{abstract}
7: