1: \begin{abstract}
2: Coordinate descent methods usually minimize a cost function by updating one randomly chosen decision variable (corresponding to one coordinate) at a time.
3: %
4: Ideally, we would update the decision variable that yields the largest decrease in the cost function.
5: %
6: However, finding this coordinate would require checking all of them, which would effectively negate the improvement in computational tractability that coordinate descent is intended to afford.
7: %
8: To address this, we propose a new adaptive method for selecting a coordinate.
9: %
10: First, we derive a lower bound on the amount by which the cost function decreases when a coordinate is updated.
11: %
12: We then use a multi-armed bandit algorithm to learn which coordinates yield the largest lower bound, {interleaving} this learning with conventional coordinate descent updates, except that the coordinate is selected in proportion to the expected decrease.
13: %
14: We show that our approach improves the convergence of coordinate descent methods both theoretically and experimentally.
15: \end{abstract}
16: