\begin{abstract}

In this paper, we study the well-known ``Heavy Ball'' (HB)
method for convex and nonconvex optimization introduced by
Polyak in 1964, and establish its convergence in
a variety of settings.
Traditionally, most algorithms use ``full-coordinate updating,''
that is, at each step, \textit{every component} of the argument is updated.
However, when the dimension of the argument is very high,
it is more efficient to update \textit{some but not all} components of
the argument at each iteration.
We refer to this as ``batch updating'' in this paper.

When gradient-based algorithms are used together with batch updating,
in principle it is sufficient to compute only those components of the
gradient for which the argument is to be updated.
However, if a method such as backpropagation is used to compute these
components, computing only \textit{some components}
of the gradient does not offer much savings over computing
the entire gradient.
Therefore, to achieve a noticeable reduction in CPU usage at each step,
one can use \textit{first-order differences} to approximate the
gradient.
The resulting estimates are \textit{biased}, and also have
\textit{unbounded variance}.
Thus some delicate analysis is required to ensure that the HB
algorithm converges when batch updating is used instead of full-coordinate
updating, and/or approximate gradients are used instead of true gradients.
% Stating and proving such theorems is the objective of the present paper.
We establish the almost sure convergence of the iterations to
the stationary point(s) of the objective function under suitable
conditions, and we also derive upper
bounds on the \textit{rate of convergence}.
To the best of our knowledge, there is no other paper that combines
all of these features.

\textbf{This paper is dedicated to the memory of Boris Teodorovich Polyak.}

\end{abstract}
