\begin{abstract}
This paper proposes a sparse Markov decision process (MDP) with a novel causal sparse Tsallis entropy regularization.
The proposed regularization induces a sparse and multi-modal optimal policy distribution for the sparse MDP.
A full mathematical analysis of the proposed sparse MDP is provided.
We first analyze the optimality condition of a sparse MDP.
We then propose a sparse value iteration method that solves a sparse
MDP, and prove its convergence and optimality using the Banach
fixed-point theorem.
The proposed sparse MDP is compared to soft MDPs, which use causal
entropy regularization.
We show that the performance error of a sparse MDP, that is, the error
caused by the introduced regularization term, has a constant bound,
whereas the error of a soft MDP increases logarithmically with
the number of actions.
In experiments, we apply sparse MDPs to reinforcement learning problems.
The proposed method outperforms existing methods in terms of
convergence speed and performance.
\end{abstract}