\begin{abstract}

Recently, several studies have considered stochastic optimization in a
heavy-tailed noise regime, i.e., the difference between the stochastic
gradient and the true gradient is assumed to have a finite $p$-th
moment (say, upper bounded by $\sigma^{p}$ for some $\sigma\geq0$)
where $p\in(1,2]$. This assumption generalizes the traditional
finite-variance assumption ($p=2$), and heavy-tailed noise of this
kind has been observed in practice on several different tasks. Under
this challenging assumption, much progress has been made for both
convex and nonconvex problems; however, most of these works only
consider smooth objectives.
In contrast, the problem is far less explored and understood when the
objective is nonsmooth. This paper aims to fill this crucial gap by
providing a comprehensive analysis of stochastic nonsmooth convex
optimization with heavy-tailed noise. We revisit a simple
clipping-based algorithm, which, however, has previously been proved
to converge only in expectation and only under an additional strong
convexity assumption.
Under appropriate choices of parameters, for both convex and strongly
convex functions, we not only establish the first high-probability
rates but also give in-expectation bounds that refine those of
existing works. Remarkably, all of our results are optimal (or nearly
optimal up to logarithmic factors) with respect to the time horizon
$T$, even when $T$ is unknown in advance. Additionally, we show how to
make the algorithm parameter-free with respect to $\sigma$; in other
words, the algorithm still guarantees convergence without any prior
knowledge of $\sigma$. Furthermore, when $\sigma$ is known, we provide
a convergence rate that adapts to the initial distance.
\end{abstract}
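For concreteness, the noise assumption and a generic clipped stochastic
subgradient step can be sketched as follows. This is an illustrative
form only, assuming a closed convex domain $\mathcal{X}$ with Euclidean
projection $\Pi_{\mathcal{X}}$, a stochastic subgradient oracle $g(x)$
that is unbiased for some $\nabla f(x)\in\partial f(x)$, step sizes
$\eta_t>0$, and clipping thresholds $\tau_t>0$; this notation is
introduced here for illustration, and the specific choices of $\eta_t$
and $\tau_t$ analyzed in the paper are not reproduced:
\[
\mathbb{E}\big[\|g(x)-\nabla f(x)\|^{p}\big]\leq\sigma^{p},
\qquad p\in(1,2],
\]
\[
x_{t+1}=\Pi_{\mathcal{X}}\big(x_t-\eta_t\,\mathrm{clip}_{\tau_t}(g(x_t))\big),
\qquad
\mathrm{clip}_{\tau}(g)=\min\Big\{1,\frac{\tau}{\|g\|}\Big\}\,g,
\quad \mathrm{clip}_{\tau}(0)=0.
\]
Clipping caps the norm of every step direction at $\tau_t$, so a single
heavy-tailed sample cannot move the iterate arbitrarily far, at the cost
of a bias that the choice of $\tau_t$ must balance.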
