\begin{abstract}
%% Cyclic coordinate descent methods have enjoyed much empirical
%% success in solving $\ell_1$ regularized smooth convex optimization
%% problems. Despite their widespread use, these methods lack finite
%% time convergence guarantees. We provide $O(1/k)$ (where $k$ is the
%% iteration counter) rates for two variants of cyclic coordinate
%% descent under certain isotonicity assumptions. We obtain these
%% rates by studying the relationship of the variants to each other,
%% and to the much better understood gradient descent method.

Cyclic coordinate descent is a classic optimization method that has
witnessed a resurgence of interest in machine learning. Reasons for
this include its simplicity, speed, and stability, as well as its
competitive performance on $\ell_1$-regularized smooth optimization
problems. Surprisingly, very little is known about its finite-time
convergence behavior on these problems. Most existing results either
establish convergence without a rate or provide only asymptotic rates.
We fill this gap in the literature by proving $O(1/k)$ convergence
rates (where $k$ is the iteration counter) for two variants of cyclic
coordinate descent under an isotonicity assumption. Our analysis
proceeds by comparing the objective values attained by the two variants
with each other, as well as with those of the gradient descent
algorithm. We show that the iterates generated by the cyclic coordinate
descent methods remain better than those of gradient descent uniformly
over time.
\end{abstract}
