
\begin{abstract}
In this paper, a sparse Markov decision process (MDP) with novel causal sparse Tsallis entropy regularization is proposed.
The proposed policy regularization induces a sparse and multi-modal optimal policy distribution of a sparse MDP.
A full mathematical analysis of the proposed sparse MDP is provided.
We first analyze the optimality condition of a sparse MDP.
Then, we propose a sparse value iteration method that solves a sparse MDP, and prove its convergence and optimality using the Banach fixed point theorem.
The proposed sparse MDP is compared to soft MDPs, which utilize causal entropy regularization.
We show that the performance error of a sparse MDP, i.e., the error caused by the introduced regularization term, has a constant bound, while the error of a soft MDP increases logarithmically with respect to the number of actions.
In experiments, we apply sparse MDPs to reinforcement learning problems.
The proposed method outperforms existing methods in terms of convergence speed and performance.
\end{abstract}
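
As an informal illustration of why the two error bounds scale differently, consider a single state with action set $\mathcal{A}$ and policy $\pi$. The sketch below assumes that the sparse Tsallis entropy is the Tsallis entropy of index $2$ scaled by $1/2$, and that the soft MDP uses the Shannon entropy. The symbols $\pi$, $\mathcal{A}$, $\alpha$ (regularization coefficient), and $\gamma$ (discount factor) are notation introduced here for illustration and are not fixed by the abstract.
\[
\frac{1}{2}\sum_{a\in\mathcal{A}} \pi(a)\bigl(1-\pi(a)\bigr) \;\le\; \frac{1}{2}\Bigl(1-\frac{1}{|\mathcal{A}|}\Bigr) \;<\; \frac{1}{2},
\]
\[
-\sum_{a\in\mathcal{A}} \pi(a)\log\pi(a) \;\le\; \log|\mathcal{A}| .
\]
Both maxima are attained by the uniform policy. Under these assumptions, the discounted sum of the regularization term is at most $\alpha/\bigl(2(1-\gamma)\bigr)$ in the sparse case, independent of $|\mathcal{A}|$, and at most $\alpha\log|\mathcal{A}|/(1-\gamma)$ in the soft case, which is consistent with the constant and logarithmic behavior stated above.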
