\begin{abstract}
  %% Cyclic coordinate descent methods have enjoyed much empirical
%%   success in solving $\ell_1$ regularized smooth convex optimization
%%   problems.  Despite their widespread use, these methods lack finite
%%   time convergence guarantees.  We provide $O(1/k)$ (where $k$ is the
%%   iteration counter) rates for two variants of cyclic coordinate
%%   descent under certain isotonicity assumptions.  We obtain these
%%   rates by studying the relationship of the variants to each other,
%%   and to the much better understood gradient descent method.

Cyclic coordinate descent is a classic optimization method that has
witnessed a resurgence of interest in machine learning. Reasons for
this include its simplicity, speed, and stability, as well as its
competitive performance on $\ell_1$-regularized smooth optimization
problems.  Surprisingly, very little is known about its finite-time
convergence behavior on these problems. Most existing results either
prove convergence without a rate or provide only asymptotic rates. We
fill this gap in the literature by proving $O(1/k)$ convergence rates
(where $k$ is the iteration counter) for two variants of cyclic
coordinate descent under an isotonicity assumption. Our analysis
proceeds by comparing the objective values attained by the two variants
with each other, as well as with those attained by the gradient descent
algorithm. We show that the iterates generated by the cyclic coordinate
descent methods remain better than those of gradient descent uniformly
over time.

\end{abstract}