aa34ba6761693054.tex
1: \begin{abstract}
2: 	We study the training of regularized neural networks where the regularizer can be non-smooth and non-convex. We propose
3: 	a unified framework for stochastic proximal gradient descent, which we term \textsc{ProxGen}, that allows for arbitrary positive preconditioners and lower semi-continuous regularizers. Our framework encompasses standard stochastic proximal gradient methods \emph{without} preconditioners as special cases, which have been extensively studied in various settings.
4: 	In addition, as a byproduct of our approach, we present two important update rules that go beyond the well-known standard methods: (i) the first closed-form proximal mappings of $\ell_q$ regularization $(0 \leq q \leq 1)$ for \emph{adaptive} stochastic gradient methods, and (ii) a revised version of \textsc{ProxQuant} \cite{bai2019proxquant} %[Bai et al, 2019]
5: 	that fixes a caveat of the original approach for quantization-specific regularizers.
6: 	We analyze the convergence of \textsc{ProxGen} and show that the entire \textsc{ProxGen} family enjoys the same convergence rate as stochastic proximal gradient descent without preconditioners.
7: 	We also show empirically, through extensive experiments, that proximal methods outperform subgradient-based approaches. Interestingly, our results indicate that proximal methods with non-convex regularizers are more effective than those with convex regularizers.
8: \end{abstract}
9: