\begin{abstract}
We consider the problem of computing optimal policies in average-reward Markov decision processes.
This classical problem can be formulated as a linear program directly amenable to saddle-point
optimization methods, albeit with a number of variables that is linear in the number of states. To
address this issue, recent work has considered a linearly relaxed version of the resulting
saddle-point problem. Our work aims at achieving a better understanding of this relaxed
optimization problem by characterizing the conditions necessary for convergence to the
optimal policy, and designing an optimization algorithm enjoying fast convergence rates that are
independent of the size of the state space. Notably, our characterization points out some potential
issues with previous work.
\end{abstract}