\begin{abstract}
\vspace{-0.2cm}
Recent work on the Lottery Ticket Hypothesis (LTH) shows that there exist ``\textit{winning tickets}'' in large neural networks.
These tickets are ``sparse'' versions of the full model that can be trained independently to achieve accuracy comparable to that of the full model.
However, finding the winning tickets requires one to \emph{pretrain} the large model for at least several epochs, which can be burdensome, especially as the original neural network grows larger.

In this paper, we explore how one can efficiently identify the emergence of such winning tickets, and we use this observation to design efficient pretraining algorithms.
For clarity of exposition, our focus is on convolutional neural networks (CNNs). %, which are more complex than simple multi-layer perceptrons, but simple enough to expose our ideas.
To identify good filters, we propose a novel filter distance metric that represents model convergence well.
% I think here we need to know the true winning tickets in order to evaluate a ticket using this filter distance.
As our theory predicts, our filter analysis is consistent with recent findings on neural network learning dynamics.
Motivated by these observations, we present the \emph{LOttery ticket through Filter-wise Training} algorithm, dubbed \textsc{LoFT}.
\textsc{LoFT} is a model-parallel pretraining algorithm that partitions convolutional layers by filters and trains them independently in a distributed setting, reducing memory and communication costs during pretraining.
Experiments show that \textsc{LoFT} $i)$ preserves and finds good lottery tickets, and $ii)$ achieves non-trivial computation and communication savings while maintaining accuracy comparable to, or even better than, that of other pretraining methods.
% This part is sort of repeating "preserves and finds good lottery tickets".
\end{abstract}