\begin{abstract}
This paper proposes exit-ensemble distillation, a novel knowledge-distillation-based learning method that improves the classification performance of convolutional neural networks (CNNs) without a pre-trained teacher network.
Our method exploits the multi-exit architecture, which adds auxiliary classifiers (called exits) at intermediate layers of a conventional CNN so that early inference results can be obtained.
The key idea is to train the network using the ensemble of the exits as the distillation target, which greatly improves the classification performance of the overall network.
Our method suggests a new paradigm of knowledge distillation: unlike the conventional notion of distillation, in which teachers only teach students, we show that students can also help other students, and even the teacher, to learn better.
Experimental results demonstrate that our method significantly improves classification performance on various popular CNN architectures (VGG, ResNet, ResNeXt, WideResNet, etc.).
Furthermore, the proposed method can expedite convergence and improve the stability of learning.
Our code will be made available on GitHub.
%Code is available at https://github.com/hjdw2/Exit-Ensemble-Distillation.
\end{abstract}