1: \begin{abstract}
2: Understanding the implicit bias of gradient descent and its role in the generalization capability of ReLU networks has been an important topic in machine learning research.
3: Unfortunately, even for a {single ReLU neuron} trained with the square loss, it was recently shown to be impossible to characterize the implicit regularization in terms of a norm of the model parameters \cite{vardi2021implicit}.
4: To narrow the gap in understanding the intriguing generalization behavior of ReLU networks, we examine the gradient flow dynamics in parameter space when training single-neuron ReLU networks.
5: Specifically, we identify an implicit bias expressed in terms of support vectors, which plays a key role in explaining why and how ReLU networks generalize well.
6: Moreover, we analyze the gradient flow with respect to the magnitude of the initialization norm, and show that the norm of the learned weight strictly increases along the gradient flow.
7: Lastly, we prove global convergence for a single ReLU neuron in the case $d=2$.
8: % , \add{which is experimentally shown to be the best result under our assumption.}
9: \end{abstract}
10: