\begin{abstract}
    % Distributional reinforcement learning aims to learn distribution of return under stochastic environments.
    % Since the learned distribution of return contains rich information about the stochasticity of the environment,
    % Previous research in distributional reinforcement learning has attempted to utilize estimated uncertainty, especially optimism in the face of uncertainty for exploration.
    Distributional reinforcement learning algorithms have attempted to use estimated uncertainty for exploration, for example through optimism in the face of uncertainty.
    However, using the estimated variance for optimistic exploration may bias data collection and hinder convergence or degrade performance.
    In this paper, we present a novel distributional reinforcement learning algorithm that selects actions by randomizing the risk criterion, thereby avoiding a one-sided tendency toward risk.
    % without losing the risk-neutral objective.
    We derive a perturbed distributional Bellman optimality operator by distorting the risk measure, and we prove the convergence and optimality of the proposed method under a weaker contraction property.
    Our theoretical results indicate that the proposed method avoids biased exploration and is guaranteed to converge to an optimal return.
    % distribution.
    Finally, we empirically show that our method outperforms existing distribution-based algorithms in various environments, including 55 Atari games.
    % \TB{Distributional reinforcement learning framework provides the return distribution of each state-action pair, which existing methods utilize as risk measure or uncertainty to scale up reinforcement learning. Unfortunately, it has been challenging to provide a risk-neutral policy improvement method because of the bias of behavior policies which cannot generate a diverse data collection.}
\end{abstract}