\begin{abstract}

Nesterov's accelerated gradient
method for minimizing a smooth strongly convex function $f$ is
known to reduce $f(\x_k)-f(\x^*)$ by a factor
of $\eps\in(0,1)$ after $O(\sqrt{L/\ell}\log(1/\eps))$ iterations, where
$\ell$ and $L$ are the strong convexity and smoothness parameters of $f$.
Furthermore, it is known that this is the best possible complexity in the
function-gradient oracle model of computation. The method of linear
conjugate gradients (CG) satisfies the same complexity bound in the special
case of strongly convex quadratic functions, but in this special case it is
faster than the accelerated gradient method.


Despite similarities in the algorithms and their
asymptotic convergence rates, the conventional analyses of the two methods are
nearly disjoint. The purpose of this note is to provide a single
quantity that decreases on every step at the correct rate for both
algorithms. Our unified bound is based on a potential similar to the one
in Nesterov's original analysis.


As a side benefit of this analysis, we provide a direct proof that
conjugate gradient converges in $O(\sqrt{L/\ell}\log(1/\eps))$ iterations.
In contrast, the traditional indirect proof first establishes this
result for the Chebyshev algorithm and then relies on the optimality
of conjugate gradient to show that its iterates are at least as
good as the Chebyshev iterates. To the best of our knowledge, ours
is the first direct proof of the convergence rate of linear
conjugate gradient in the literature.

\end{abstract}