1: \begin{abstract}
2: Kakade's natural policy gradient method has been studied extensively in recent years, and linear convergence has been shown both with and without regularization.
3: We study another natural gradient method, which is based on the Fisher information matrix of the state-action distributions and has received little theoretical attention.
4: Here, the state-action distributions follow the Fisher-Rao gradient flow inside the state-action polytope with respect to a linear potential.
5: Therefore, we study Fisher-Rao gradient flows of linear programs more generally and show linear convergence with a rate that depends on the geometry of the linear program.
6: Equivalently, this yields an estimate of the error induced by entropic regularization of the linear program that improves on existing results.
7: We extend these results and show sublinear convergence for perturbed Fisher-Rao gradient flows and natural gradient flows up to an approximation error.
8: In particular, these general results cover the case of state-action natural policy gradients.
9: \\ \textbf{Keywords: }{
10: Fisher-Rao metric, linear program, entropic regularization, multi-player game, Markov decision process, natural policy gradient
11: }
12: \\ \textbf{MSC codes: }{
13: 65K05, 90C05, 90C08, 90C40, 90C53
14: }
15: \end{abstract}
16: