\begin{abstract}

\vspace{-0.2cm}

Recent work on the Lottery Ticket Hypothesis (LTH) shows that there exist ``\textit{winning tickets}'' in large neural networks.

These tickets are ``sparse'' versions of the full model that can be trained independently to reach accuracy comparable to that of the full model.

However, finding winning tickets requires one to \emph{pretrain} the large model for at least a number of epochs, which becomes increasingly burdensome as the original neural network grows.


In this paper, we explore how one can efficiently identify the emergence of such winning tickets, and use this observation to design efficient pretraining algorithms.

For clarity of exposition, our focus is on convolutional neural networks (CNNs). %, which are more complex than simple multi-layer perceptrons, but simple enough to exposure our ideas.

To identify good filters, we propose a novel filter distance metric that represents model convergence well.

% I think here we need to know the true winning tickets in order to evaluate a ticket using this filter distance.

As our theory predicts, our filter analysis is consistent with recent findings on neural network learning dynamics.

Motivated by these observations, we present the \emph{LOttery ticket through Filter-wise Training} algorithm, dubbed \textsc{LoFT}.

\textsc{LoFT} is a model-parallel pretraining algorithm that partitions convolutional layers by filters and trains the partitions independently in a distributed setting, reducing memory and communication costs during pretraining.

Experiments show that \textsc{LoFT} $i)$ preserves and finds good lottery tickets, and $ii)$ achieves non-trivial computation and communication savings while maintaining accuracy comparable to, or even better than, that of other pretraining methods.

% This part is sort of repeating "preserves and finds good lottery tickets".

\end{abstract}
