\begin{abstract}
Several recent studies consider stochastic optimization in a heavy-tailed
noise regime, i.e., the difference between the stochastic gradient
and the true gradient is assumed to have a finite $p$-th moment (say,
upper bounded by $\sigma^{p}$ for some $\sigma\geq0$) where $p\in(1,2]$.
This assumption not only generalizes the traditional finite-variance
assumption ($p=2$) but has also been observed in practice for several
different tasks. Under this challenging assumption, substantial progress
has been made for both convex and nonconvex problems; however, most
of these works consider only smooth objectives.
In contrast, the nonsmooth case has not been fully explored or well
understood. This paper aims to fill this crucial gap by providing a
comprehensive analysis of stochastic nonsmooth convex optimization
with heavy-tailed noise. We revisit a simple clipping-based algorithm
that, in prior work, has only been proved to converge in expectation,
and only under an additional strong convexity assumption.
With appropriate choices of parameters, for both convex and strongly
convex functions, we not only establish the first high-probability
rates but also give refined in-expectation bounds compared with existing
works. Remarkably, all of our results are optimal (or nearly optimal
up to logarithmic factors) with respect to the time horizon $T$, even
when $T$ is unknown in advance. Additionally, we show how to make
the algorithm parameter-free with respect to $\sigma$; in other words,
the algorithm still guarantees convergence without any prior knowledge
of $\sigma$. Furthermore, if $\sigma$ is known, we provide a convergence
rate that adapts to the initial distance to the optimum.
\end{abstract}
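For concreteness, the setting summarized above can be sketched as
follows. The notation here ($F$ for the objective, $\mathcal{X}$ for
the convex domain, $g_{t}$ for the stochastic subgradient returned
at $x_{t}$, $\eta_{t}$ for step sizes, $\tau_{t}$ for clipping thresholds)
is assumed for illustration, and the displayed update is a generic
projected clipped subgradient step rather than necessarily the exact
algorithm analysed in this paper. Assuming $\mathbb{E}[g_{t}\mid x_{t}]=\nabla F(x_{t})$
for some subgradient $\nabla F(x_{t})\in\partial F(x_{t})$, the heavy-tailed
noise assumption reads
\[
\mathbb{E}\big[\|g_{t}-\nabla F(x_{t})\|^{p}\,\big|\,x_{t}\big]\leq\sigma^{p},\qquad p\in(1,2],
\]
and setting $p=2$ recovers the finite-variance assumption. A clipping-based
update of the kind referred to above then takes the form
\[
x_{t+1}=\Pi_{\mathcal{X}}\big(x_{t}-\eta_{t}\,\mathrm{clip}_{\tau_{t}}(g_{t})\big),\qquad\mathrm{clip}_{\tau}(g):=\min\Big\{1,\frac{\tau}{\|g\|}\Big\}\,g,
\]
with $\mathrm{clip}_{\tau}(0):=0$ and $\Pi_{\mathcal{X}}$ the Euclidean
projection onto $\mathcal{X}$. Clipping caps the norm of each pre-projection
step at $\eta_{t}\tau_{t}$, which is the usual rationale for clipping
when only the $p$-th moment of the noise is finite: rare, very large
noise realizations cannot move the iterate arbitrarily far.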