00c1ef9b591d6eda.tex
1: \begin{abstract}
2:     % Distributional reinforcement learning aims to learn distribution of return under stochastic environments.
3:     % Since the learned distribution of return contains rich information about the stochasticity of the environment, 
4:     % Previous research in distributional reinforcement learning has attempted to utilize estimated uncertainty, especially optimism in the face of uncertainty for exploration.
5:     Distributional reinforcement learning algorithms have attempted to use estimated uncertainty for exploration, for example through optimism in the face of uncertainty.
6:     However, using the estimated variance for optimistic exploration may cause biased data collection and hinder convergence or performance.
7:     In this paper, we present a novel distributional reinforcement learning algorithm that selects actions by randomizing the risk criterion, thereby avoiding a one-sided tendency toward risk.
8:     % without losing the risk-neutral objective.
9:     We provide a perturbed distributional Bellman optimality operator by distorting the risk measure and prove the convergence and optimality of the proposed method under a weaker contraction property.
10:     Our theoretical results show that the proposed method does not fall into biased exploration and is guaranteed to converge to an optimal return.
11:     % distribution.
12:     Finally, we empirically show that our method outperforms existing distribution-based algorithms in various environments, including 55 Atari games.
13:     % \TB{Distributional reinforcement learning framework provides the return distribution of each state-action pair, which existing methods utilize as risk measure or uncertainty to scale up reinforcement learning. Unfortunately, it has been challenging to provide a risk-neutral policy improvement method because of the bias of behavior policies which cannot generate a diverse data collection.}
14: \end{abstract}
15: