1: \begin{abstract}
2: % Distributional reinforcement learning aims to learn distribution of return under stochastic environments.
3: % Since the learned distribution of return contains rich information about the stochasticity of the environment,
4: % Previous research in distributional reinforcement learning has attempted to utilize estimated uncertainty, especially optimism in the face of uncertainty for exploration.
5: Distributional reinforcement learning algorithms have attempted to use estimated uncertainty for exploration, for example through optimism in the face of uncertainty.
6: However, using the estimated variance for optimistic exploration can bias data collection and hinder convergence or degrade performance.
7: In this paper, we present a novel distributional reinforcement learning algorithm that selects actions by randomizing the risk criterion, thereby avoiding a one-sided tendency toward risk.
8: % without losing the risk-neutral objective.
9: We provide a perturbed distributional Bellman optimality operator by distorting the risk measure, and we prove the convergence and optimality of the proposed method under a weaker contraction property.
10: Our theoretical results indicate that the proposed method does not fall into biased exploration and is guaranteed to converge to an optimal return.
11: % distribution.
12: Finally, we empirically show that our method outperforms existing distribution-based algorithms in various environments, including 55 Atari games.
13: % \TB{Distributional reinforcement learning framework provides the return distribution of each state-action pair, which existing methods utilize as risk measure or uncertainty to scale up reinforcement learning. Unfortunately, it has been challenging to provide a risk-neutral policy improvement method because of the bias of behavior policies which cannot generate a diverse data collection.}
14: \end{abstract}
15: