\begin{abstract}
Nesterov's accelerated gradient
method for minimizing a smooth strongly convex function $f$ is
known to reduce $f(\x_k)-f(\x^*)$ by a factor
of $\eps\in(0,1)$ after $k=O(\sqrt{L/\ell}\log(1/\eps))$ iterations, where
$\ell$ and $L$ are the strong convexity and smoothness parameters of $f$.  Furthermore,
it is known that this is the best possible complexity in the function-gradient oracle
model of computation.  The method of linear conjugate gradients (CG)
satisfies the same
complexity bound in the special case of strongly convex quadratic functions,
but in this special case it is faster than the accelerated
gradient method.

Despite similarities in the algorithms and their
asymptotic convergence rates, the conventional analyses of the two methods are
nearly disjoint.  The purpose of this note is to provide a single
quantity that decreases on every step at the correct rate for both
algorithms.
Our unified
bound is based on a potential similar to the one
in Nesterov's original analysis.

As a side benefit of this analysis, we provide a direct proof that
conjugate gradient converges in $O(\sqrt{L/\ell}\log(1/\eps))$ iterations.
In contrast, the traditional indirect proof first establishes this
result for the Chebyshev algorithm, and then relies on the optimality
of conjugate gradient to show that its iterates are at least as
good as Chebyshev iterates.  To the best of our knowledge, ours
is the first direct proof of the convergence rate of linear
conjugate gradient in the literature.
\end{abstract}
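
% Illustrative remark, not part of the original abstract.
% Assumption: the potential contracts by a factor of at most
% 1-\sqrt{\ell/L} per step, the standard form of the accelerated rate;
% the precise constants used in the analysis may differ.
To see where the iteration count $O(\sqrt{L/\ell}\log(1/\eps))$ comes from,
suppose, as an assumption made only for this sketch, that some nonnegative quantity
shrinks by a factor of at most $1-\sqrt{\ell/L}$ on every step.  Since
$1-t\le e^{-t}$ for all real $t$,
\[
\left(1-\sqrt{\ell/L}\right)^{k}
\;\le\; \exp\!\left(-k\sqrt{\ell/L}\right)
\;\le\; \eps
\quad\mbox{whenever}\quad
k\;\ge\;\sqrt{L/\ell}\,\ln(1/\eps).
\]
Thus a per-step contraction of this form yields a reduction by the factor
$\eps$ after at most $\lceil\sqrt{L/\ell}\,\ln(1/\eps)\rceil$ steps, which
matches the stated bound up to the constant hidden in the $O(\cdot)$ notation.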