\begin{abstract}
	Training deep learning models with gradient-based methods is currently expected to deliver: 1) transparency; 2) high convergence rates; 3) strong inductive biases. While state-of-the-art methods with adaptive learning rate schedules are fast, they still fail to meet the other two requirements. We suggest reconsidering neural network models in terms of single-species population dynamics, where adaptation arises naturally from open-ended processes of ``growth'' and ``harvesting''. We show that stochastic gradient descent (SGD) with two balanced, pre-defined values of the \emph{per capita} growth and harvesting rates outperforms the most common adaptive gradient methods on all three requirements.
  %Current research indicates that training models with more parameters and capacity than the number of training examples guarantees good generalization. While there are many global minima of the training objective, the optimization algorithm converges to the direction of the max-margin solution. However, overparametrized models set an optimization trap: the distance to the max-margin separator decreases much slower in comparison with the rate of convergence of a loss function itself. We show that ``harvesting'' overparameterized models with the optimal strategies borrowed from single-species population dynamics may greatly assist in accelerating the convergence rate. The ``harvesting'' approach allows for dynamic optimization that reduces the computational costs and brings forward a new interpretation of the optimization procedure.
\end{abstract}