
1: \begin{abstract}

2: The training process of ReLU neural networks often exhibits complicated nonlinear phenomena.

3: The nonlinearity of the models and the non-convexity of the loss pose significant challenges for theoretical analysis. Therefore, most previous theoretical works on the optimization dynamics of neural networks focus either on local analysis (such as the end of training) or on approximate linear models (such as the Neural Tangent Kernel).

4: In this work, we provide a complete theoretical characterization of the training process of a two-layer ReLU network trained by Gradient Flow on linearly separable data. In this specific setting, our analysis captures the whole optimization process, from random initialization to final convergence.

5: Despite the relatively simple model and data that we study, we reveal four different phases in the whole training process, showing a general simplifying-to-complicating learning trend.

6: Specific nonlinear behaviors can also be precisely identified and captured theoretically, such as

7: initial condensation, saddle-to-plateau dynamics, plateau escape, changes of activation patterns,

8: %random directional convergence,

9: learning with increasing complexity, etc.

10: % Beyond the lazy regime like Neural Tangent Kernel, ReLU neural networks can exhibit rich nonlinear behaviors during training. These include initial condensation, getting stuck in and escaping from saddle or plateau, saddle to saddle, rich changes of activation patterns by deactivation or reactivation, random directional convergence, learning with increasing complexity, and more. In this work, we aim to understand these behaviors theoretically. By meticulously studying a complete training process of ReLU networks trained by Gradient Flow starting from random initialization, we provide an in-depth analysis of these fascinating nonlinear phenomena during the four-phase optimization dynamics.

11: \end{abstract}

12: