6e51f91dd4a4f005.tex
1: \begin{abstract}
2:  Pre-training has exhibited notable benefits to downstream tasks by boosting accuracy and speeding up convergence, but the exact reasons for these benefits remain unclear.
3: To this end, we propose to quantitatively and explicitly explain the effects of pre-training on the downstream task from a novel game-theoretic view, which also sheds new light on the learning behavior of deep neural networks (DNNs).
4: Specifically, we extract and quantify the knowledge encoded by the pre-trained model, and further track the changes of such knowledge during the fine-tuning process.
5: Interestingly, we discover that only a small amount of the pre-trained model's knowledge is preserved for the inference of downstream tasks.
6: However, such preserved knowledge is very challenging for a model trained from scratch to learn.
7: Thus, with the help of this exclusively learned and useful knowledge, the model fine-tuned from pre-training usually achieves better performance than the model trained from scratch.
8: In addition, we discover that pre-training can guide the fine-tuned model to learn target knowledge for the downstream task more directly and quickly, which accounts for the faster convergence of the fine-tuned model.
9: %\textit{The code will be released when the paper is accepted}.
10: 
11: \end{abstract}
12: