\begin{abstract}
In this paper, we present a new class of Markov decision processes
(MDPs), called Tsallis MDPs, based on Tsallis entropy maximization,
which generalizes existing maximum entropy reinforcement learning (RL).
A Tsallis MDP provides a unified framework for the original RL problem
and RL with various types of entropy, including the standard
Shannon-Gibbs (SG) entropy, through an additional real-valued
parameter called the \textit{entropic index}.
Varying the entropic index generates different types of entropy,
including the SG entropy, and each entropy induces a different class
of optimal policies in Tsallis MDPs.
We also provide a full mathematical analysis of Tsallis MDPs, including
the optimality condition, performance error bounds, and convergence.
Our theoretical results enable the use of any positive entropic index
in RL.
To handle complex and large-scale problems, we propose a model-free
actor-critic RL method using Tsallis entropy maximization.
We evaluate the regularization effect of the Tsallis entropy for
various values of the entropic index and show that the entropic index
controls the exploration tendency of the proposed method.
We find that different types of RL problems favor different values
of the entropic index.
The proposed method is evaluated using the MuJoCo simulator and
achieves state-of-the-art performance.
\end{abstract}
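
For orientation, the role of the entropic index can be illustrated with
the standard closed form of the Tsallis entropy of a discrete
distribution. This is only a sketch under assumptions that the abstract
does not state: the symbols $q$ (entropic index), $\pi(\cdot \mid s)$
(a policy over a finite action set $\mathcal{A}$), and $S_q$ are
introduced here for illustration, and the paper's own notation and
normalization may differ.
\[
  S_q\bigl(\pi(\cdot \mid s)\bigr)
  = \frac{1}{q-1}\Bigl(1 - \sum_{a \in \mathcal{A}} \pi(a \mid s)^{q}\Bigr),
  \quad q > 0,\ q \neq 1,
  \qquad
  \lim_{q \to 1} S_q\bigl(\pi(\cdot \mid s)\bigr)
  = -\sum_{a \in \mathcal{A}} \pi(a \mid s) \ln \pi(a \mid s).
\]
Under this form, $q \to 1$ recovers the SG entropy. Moreover, since
$0 \le S_q \le 1/(q-1)$ for $q > 1$, the entropy term vanishes as
$q \to \infty$, which is consistent with the unregularized RL problem
arising as a limiting case.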