abstract:6e51f91dd4a4f005.tex

1: \begin{abstract}

2: Pre-training benefits downstream tasks by boosting accuracy and speeding up convergence, but the exact reasons for these benefits remain unclear.

3: To this end, we propose to quantitatively and explicitly explain the effects of pre-training on the downstream task from a novel game-theoretic view, which also sheds new light on the learning behavior of deep neural networks (DNNs).

4: Specifically, we extract and quantify the knowledge encoded by the pre-trained model, and then track how this knowledge changes during fine-tuning.

5: Interestingly, we discover that only a small amount of the pre-trained model's knowledge is preserved for inference on downstream tasks.

6: However, such preserved knowledge is very challenging for a model trained from scratch to learn.

7: Thus, with the help of this exclusively learned and useful knowledge, a model fine-tuned from pre-training usually achieves better performance than a model trained from scratch.

8: In addition, we discover that pre-training can guide the fine-tuned model to learn target knowledge for the downstream task more directly and quickly, which accounts for the faster convergence of the fine-tuned model.

9: %\textit{The code will be released when the paper is accepted}.

10:

11: \end{abstract}

12: