\begin{abstract}
  %% Cyclic coordinate descent methods have enjoyed much empirical
%%   success in solving $\ell_1$ regularized smooth convex optimization
%%   problems.  Despite their widespread use, these methods lack finite
%%   time convergence guarantees.  We provide $O(1/k)$ (where $k$ is the
%%   iteration counter) rates for two variants of cyclic coordinate
%%   descent under certain isotonicity assumptions.  We obtain these
%%   rates by studying the relationship of the variants to each other,
%%   and to the much better understood gradient descent method.

Cyclic coordinate descent is a classic optimization method that has
witnessed a resurgence of interest in machine learning. Reasons for
this include its simplicity, speed, and stability, as well as its
competitive performance on $\ell_1$-regularized smooth optimization
problems. Surprisingly, very little is known about its finite-time
convergence behavior on these problems. Most existing results either
establish only convergence or provide only asymptotic rates. We fill
this gap in the literature by proving $O(1/k)$ convergence rates (where
$k$ is the iteration counter) for two variants of cyclic coordinate
descent under an isotonicity assumption. Our analysis proceeds by
comparing the objective values attained by the two variants with each
other, as well as with those attained by the gradient descent
algorithm. We show that the iterates generated by the cyclic coordinate
descent methods remain better than those of gradient descent uniformly
over time.
\end{abstract}