\begin{abstract}
We study the training of one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime, where the networks' biases are initialized to a nonzero constant rather than zero.
We prove that under such initialization, the network's activations remain sparse throughout the entire training process, which enables fast training procedures
via sophisticated computational methods. With such initialization, we show that the networks possess a different limiting kernel, which we call the \textit{bias-generalized NTK}, and we study various properties of the networks under this new kernel.
First, we characterize the gradient descent dynamics.
In particular, we show that the sparsely activated network can converge as fast as the dense network, in contrast to previous work suggesting that sparse networks converge more slowly.
In addition, our result improves the width previously required to ensure convergence.
Second, we study the networks' generalization: we establish a width-sparsity dependence, which yields a sparsity-dependent Rademacher complexity bound and generalization bound.
To our knowledge, this is the first sparsity-dependent generalization result via Rademacher complexity.
Finally, we study the smallest eigenvalue of this new kernel.
We identify a data-dependent region in which we can derive a much sharper lower bound on the NTK's smallest eigenvalue than the previously known worst-case bound. This can lead to an improved generalization bound.
\end{abstract}