
1: \begin{abstract}

2: Kakade's natural policy gradient method has been studied extensively in recent years, and linear convergence has been established both with and without regularization.

3: We study another natural gradient method, which is based on the Fisher information matrix of the state-action distributions and has received little theoretical attention.

4: Here, the state-action distributions follow the Fisher-Rao gradient flow inside the state-action polytope with respect to a linear potential.

5: Therefore, we study Fisher-Rao gradient flows of linear programs more generally and show linear convergence with a rate that depends on the geometry of the linear program.

6: Equivalently, this yields an estimate of the error induced by entropic regularization of the linear program, which improves on existing results.
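
For orientation, consider the simplest special case, in which the polytope is the probability simplex $\Delta_n$ and the linear potential is $x \mapsto c^\top x$; the symbols $c$, $x(0)$ and $\Delta_n$ are notation assumed here for illustration only. In this case the Fisher-Rao gradient flow reduces to the replicator equation, which can be solved in closed form:
\[
  \dot{x}_i(t) = x_i(t)\bigl(c_i - c^\top x(t)\bigr),
  \qquad
  x_i(t) = \frac{x_i(0)\, e^{t c_i}}{\sum_{j=1}^{n} x_j(0)\, e^{t c_j}} .
\]
The same point $x(t)$ maximizes $c^\top x - t^{-1} D_{\mathrm{KL}}\bigl(x \,\|\, x(0)\bigr)$ over $\Delta_n$, which is the sense in which the flow at time $t$ corresponds to entropic regularization with strength $t^{-1}$ in this simplex example.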

7: We extend these results and show sublinear convergence for perturbed Fisher-Rao gradient flows and natural gradient flows up to an approximation error.

8: In particular, these general results cover the case of state-action natural policy gradients.

9: \\ \textbf{Keywords: }{

10: Fisher-Rao metric, linear program, entropic regularization, multi-player game, Markov decision process, natural policy gradient

11: }

12: \\ \textbf{MSC codes: }{

13: 65K05, 90C05, 90C08, 90C40, 90C53

14: }

15: \end{abstract}

16: