\begin{abstract}
We consider the problem of computing optimal policies in average-reward Markov decision processes.
This classical problem can be formulated as a linear program directly amenable to saddle-point
optimization methods, albeit with a number of variables that is linear in the number of states. To
address this issue, recent work has considered a linearly relaxed version of the resulting
saddle-point problem. Our work aims at achieving a better understanding of this relaxed
optimization problem by characterizing the conditions necessary for convergence to the
optimal policy, and designing an optimization algorithm enjoying fast convergence rates that are
independent of the size of the state space. Notably, our characterization points out some potential
issues with previous work.
\end{abstract}
