abstract:e83d07e28e11ca40.tex

1: \begin{abstract}

2:     Understanding the implicit bias of gradient descent, and its role in the generalization capability of ReLU networks, has been an important topic in machine learning research.

3: 	Unfortunately, even for a {single ReLU neuron} trained with the square loss, it was recently shown that the implicit regularization cannot be characterized in terms of a norm of the model parameters \cite{vardi2021implicit}.

4:     To narrow the gap in understanding the intriguing generalization behavior of ReLU networks, we examine the gradient flow dynamics in parameter space when training single-neuron ReLU networks.

5:     Specifically, we identify an implicit bias expressed in terms of support vectors, which plays a key role in explaining why and how ReLU networks generalize well.

6:     Moreover, we analyze the gradient flow with respect to the norm of the initialization, and show that the norm of the learned weight strictly increases along the gradient flow.

7:     Lastly, we prove global convergence for a single ReLU neuron in the $d=2$ case.

8:     % , \add{which is experimentally shown to be the best result under our assumption.}

9: \end{abstract}

10: