1: \begin{abstract}
2: 	This paper proposes exit-ensemble distillation, a novel knowledge distillation-based learning method that improves the classification performance of convolutional neural networks (CNNs) without a pre-trained teacher network. 
3: 	Our method exploits the multi-exit architecture that adds auxiliary classifiers (called exits) in the middle of a conventional CNN, through which early inference results can be obtained. 
4: 	The idea of our method is to train the network using the ensemble of the exits as the distillation target, which greatly improves the classification performance of the overall network. 
5: 	Our method suggests a new paradigm of knowledge distillation; unlike the conventional notion of distillation where teachers only teach students, we show that students can also help other students and even the teacher to learn better. 
6: 	Experimental results demonstrate that our method significantly improves classification performance across various popular CNN architectures (VGG, ResNet, ResNeXt, WideResNet, etc.). 
7: 	Furthermore, the proposed method can expedite the convergence of learning with improved stability.
8: 	Our code will be available on GitHub.
9: 	%Code is available at https://github.com/hjdw2/Exit-Ensemble-Distillation.
10: \end{abstract}
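
% Illustrative sketch only; the notation and the specific loss form below are assumptions, not taken from the abstract.
The idea summarized in the abstract, training every exit against the ensemble of the exits, can be written down roughly as follows. This is only a sketch under stated assumptions: the symbols $N$, $z_i$, $T$, and $\alpha$, the use of a plain average as the ensemble, and the choice of a temperature-scaled Kullback--Leibler term are introduced here for illustration and may differ from the formulation used in the rest of the paper. Suppose the network has $N$ exits, with the final classifier counted as exit $N$, and that exit $i$ produces logits $z_i$ for an input with label $y$. One plausible distillation target is the average of the exit logits,
\[
z_{\mathrm{ens}} = \frac{1}{N} \sum_{i=1}^{N} z_i ,
\]
and one plausible training objective combines, for each exit, a cross-entropy term with a distillation term toward that target,
\[
\mathcal{L} = \sum_{i=1}^{N} \Bigl[ (1-\alpha)\, \mathrm{CE}\bigl(\sigma(z_i),\, y\bigr) + \alpha\, T^{2}\, \mathrm{KL}\bigl(\sigma(z_{\mathrm{ens}}/T) \,\|\, \sigma(z_i/T)\bigr) \Bigr],
\]
where $\sigma$ denotes the softmax function, $T$ is a hypothetical temperature, and $\alpha \in [0,1]$ is a hypothetical weighting factor. Under this reading, the final classifier (the conventional ``teacher'') is itself pulled toward a target that the earlier exits (the ``students'') help to form, which is the sense in which students can help the teacher learn.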
11: