\begin{abstract}
\textbf{This paper is dedicated to the memory of Boris Teodorovich Polyak.}

In this paper, we study the convergence properties of the Stochastic
Gradient Descent (SGD) method for finding a stationary point
of a given objective function $J(\cdot)$.
The objective function is not required to be convex.
Rather, our results apply to a class of ``invex'' functions, which have the
property that every stationary point is also a global minimizer.
First, we assume that $J(\cdot)$ satisfies a property that
is slightly weaker than the Kurdyka--{\L}ojasiewicz (KL) condition,
denoted here as (KL').
Under this assumption, we show that the function values $J(\bth_t)$ at the
iterates converge almost surely to the global minimum of $J(\cdot)$.
Next, we strengthen the hypothesis on $J(\cdot)$ from (KL') to
the Polyak--{\L}ojasiewicz (PL) condition.
Under this stronger hypothesis, we derive estimates of the rate at which
$J(\bth_t)$ converges to its limit.
Using these results, we show that for functions satisfying the PL condition,
the convergence rates of both the objective function
and the norm of the gradient under SGD match the best possible rates for convex
functions.
Although some results along these lines have appeared previously,
our contributions offer two distinct improvements.
First, our assumptions on the stochastic gradient are more general
than those in the existing literature. Second, our convergence results hold
almost surely, rather than in expectation.
We also study SGD when only function evaluations are permitted.
In this setting, we determine the ``optimal'' choice of increments, that is,
the size of the perturbations.
Using the same set of ideas, we establish the global convergence
of the Stochastic Approximation (SA) algorithm under assumptions on the
measurement error that are more general than those in the existing literature.
We also derive bounds on the rate of convergence of the SA algorithm
under appropriate assumptions.

\end{abstract}
38: