1: \begin{abstract}
2: In recent years, a variety of tasks have been accomplished by deep reinforcement learning (DRL).
3: However, when applying DRL to tasks in a real-world environment, designing an appropriate reward is difficult.
4: Rewards obtained via actual hardware sensors may include noise, misinterpretation, or failed observations.
5: The learning instability caused by these unstable signals is a problem that remains to be solved in DRL.
6: In this work, we propose an approach that extends existing DRL models by adding a subtask to directly estimate the variance contained in the reward signal.
7: The model then takes the feature map learned by the subtask in a critic network and sends it to the actor network.
8: This enables stable learning that is robust to the effects of potential noise.
9: The results of experiments in the Atari game domain with unstable reward signals show that our method stabilizes training convergence.
10: We also discuss the extensibility of the model by visualizing feature maps. This approach has the potential to make DRL more practical for use in noisy, real-world scenarios.
11:
12: \keywords{Deep reinforcement learning \and Uncertainty \and Variance branch.}
13: \end{abstract}
14: