c12f35930ca6b5aa.tex
1: \begin{abstract}
2: In recent years, a variety of tasks have been accomplished by deep reinforcement learning (DRL). 
3: However, when applying DRL to tasks in a real-world environment, designing an appropriate reward is difficult. 
4: Rewards obtained via actual hardware sensors may include noise, misinterpretation, or failed observations. 
5: The learning instability caused by these unstable signals is a problem that remains to be solved in DRL. 
6: In this work, we propose an approach that extends existing DRL models by adding a subtask to directly estimate the variance contained in the reward signal. 
7: The model then takes the feature map learned by the subtask in a critic network and sends it to the actor network. 
8: This enables stable learning that is robust to the effects of potential noise. 
9: The results of experiments in the Atari game domain with unstable reward signals show that our method stabilizes training convergence. 
10: We also discuss the extensibility of the model by visualizing feature maps. This approach has the potential to make DRL more practical for use in noisy, real-world scenarios.
11: 
12: \keywords{Deep reinforcement learning \and Uncertainty \and Variance branch.}
13: \end{abstract}
14: