\begin{abstract}

	This paper proposes exit-ensemble distillation, a novel knowledge distillation-based learning method that improves the classification performance of convolutional neural networks (CNNs) without requiring a pre-trained teacher network.

	Our method exploits the multi-exit architecture, which adds auxiliary classifiers (called exits) at intermediate layers of a conventional CNN so that early inference results can be obtained.

	The key idea of our method is to train the network using the ensemble of all exits as the distillation target, which substantially improves the classification performance of the overall network.

	Our method suggests a new paradigm of knowledge distillation: unlike the conventional notion of distillation, in which teachers only teach students, we show that students can also help other students, and even the teacher, to learn better.

	Experimental results demonstrate that our method significantly improves classification performance on various popular CNN architectures (VGG, ResNet, ResNeXt, WideResNet, etc.).

	Furthermore, the proposed method can accelerate training convergence while also improving training stability.

	Our code will be available on GitHub.

	%Code is available at https://github.com/hjdw2/Exit-Ensemble-Distillation.

\end{abstract}
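The following short sketch illustrates the training target summarized in the abstract. It is a minimal NumPy illustration and not the authors' implementation. It makes several assumptions. The ensemble target is taken to be the plain average of all exits' logits, including the final classifier. Each exit is then trained with a weighted sum of cross-entropy on the labels and temperature-scaled KL divergence toward the softened ensemble. The names `temperature`, `alpha`, and all function names are hypothetical choices, not values or identifiers from the paper.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits / temperature, shifted for numerical stability."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(logits, labels):
    """Batch-mean cross-entropy between softmax(logits) and integer class labels."""
    probs = softmax(logits)
    rows = np.arange(logits.shape[0])
    return float(-np.mean(np.log(probs[rows, labels])))


def kl_divergence(p, q):
    """Batch-mean KL(p || q) for arrays whose rows are probability distributions."""
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=1)))


def exit_ensemble_distillation_loss(exit_logits, labels, temperature=3.0, alpha=0.5):
    """Hypothetical exit-ensemble distillation loss (illustrative only).

    exit_logits: list of (batch, num_classes) arrays, one per exit; the last
    entry is the final classifier. Every exit, including the final one, both
    contributes to the ensemble target and learns from it.
    """
    # Assumed ensemble: the plain average of all exits' logits.
    ensemble_logits = np.mean(np.stack(exit_logits), axis=0)
    soft_target = softmax(ensemble_logits, temperature)

    total = 0.0
    for logits in exit_logits:
        hard_loss = cross_entropy(logits, labels)
        soft_loss = kl_divergence(soft_target, softmax(logits, temperature))
        # Scaling by temperature**2 is the usual distillation convention for
        # keeping the soft-target term comparable across temperatures.
        total += (1.0 - alpha) * hard_loss + alpha * temperature**2 * soft_loss
    return total


# Usage: three exits, a batch of 4 samples, 10 classes (random toy logits).
rng = np.random.default_rng(0)
exits = [rng.normal(size=(4, 10)) for _ in range(3)]
labels = np.array([0, 3, 7, 9])
print(exit_ensemble_distillation_loss(exits, labels))
```

In this sketch every exit, including the final classifier, is pulled toward the same ensemble. The final classifier therefore also receives a signal from the earlier exits, which is one way to read the claim that students can help the teacher. In an autodiff framework, the ensemble target would typically be detached from the gradient graph. That choice is likewise an assumption here.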
