
\begin{abstract}

	We study the training of regularized neural networks in which the regularizer may be non-smooth and non-convex. We propose
	a unified framework for stochastic proximal gradient descent, which we term \textsc{ProxGen}, that allows for arbitrary positive preconditioners and lower semi-continuous regularizers. Our framework encompasses, as special cases, the standard stochastic proximal gradient methods \emph{without} preconditioners, which have been extensively studied in various settings.

	In addition, as byproducts of our approach, we present two important update rules beyond the well-known standard methods: (i) the first closed-form proximal mappings of $\ell_q$ regularization $(0 \leq q \leq 1)$ for \emph{adaptive} stochastic gradient methods, and (ii) a revised version of \textsc{ProxQuant} \cite{bai2019proxquant} %[Bai et al, 2019]
	that fixes a caveat of the original approach for quantization-specific regularizers.

	We analyze the convergence of \textsc{ProxGen} and show that the entire \textsc{ProxGen} family enjoys the same convergence rate as stochastic proximal gradient descent without preconditioners.

	We also show empirically, via extensive experiments, the superiority of proximal methods over subgradient-based approaches. Interestingly, our results indicate that proximal methods with non-convex regularizers are more effective than those with convex regularizers.

\end{abstract}
