e83d07e28e11ca40.tex
1: \begin{abstract}
2:     Understanding how the implicit bias of gradient descent shapes the generalization capability of ReLU networks has been an important topic in machine learning research.
3: 	Unfortunately, even for a {single ReLU neuron} trained with the square loss, it was recently shown to be impossible to characterize the implicit regularization in terms of a norm of the model parameters \cite{vardi2021implicit}.
4:     In order to close the gap toward understanding the intriguing generalization behavior of ReLU networks, here we examine the gradient flow dynamics in the parameter space when training single-neuron ReLU networks.
5:     Specifically, we discover an implicit bias in terms of support vectors, which plays a key role in why and how ReLU networks generalize well.
6:     Moreover, we analyze gradient flows with respect to the magnitude of the initialization norm, and show that the norm of the learned weight strictly increases along the gradient flow.
7:     Lastly, we prove the global convergence of a single ReLU neuron in the $d=2$ case.
8:     % , \add{which is experimentally shown to be the best result under our assumption.}
9: \end{abstract}
10: