\begin{abstract}
We present a new class of Markov decision processes (MDPs), called
Tsallis MDPs, which incorporate Tsallis entropy maximization and
generalize existing maximum entropy reinforcement learning (RL).
A Tsallis MDP provides a unified framework for the original RL problem
and RL with various types of entropy, including the well-known
standard Shannon-Gibbs (SG) entropy, using an additional real-valued
parameter, called an \textit{entropic index}.
Varying the entropic index yields different types of entropy,
including the SG entropy, and each type of entropy induces a
different class of optimal policies in Tsallis MDPs.
We also provide a full mathematical analysis of Tsallis MDPs, including
the optimality condition, performance error bounds, and convergence.
Our theoretical results enable the use of any positive entropic index
in RL.
To handle complex and large-scale problems, we propose a model-free
actor-critic RL method using Tsallis entropy maximization.
We evaluate the regularization effect of the Tsallis entropy across
various values of the entropic index and show that the entropic index
controls the exploration tendency of the proposed method.
We also find that different types of RL problems favor different
values of the entropic index.
The proposed method is evaluated using the MuJoCo simulator and
achieves state-of-the-art performance.
\end{abstract}