\begin{abstract}

In recent years, deep reinforcement learning (DRL) has been applied successfully to a wide variety of tasks.
However, when DRL is applied to tasks in a real-world environment, designing an appropriate reward is difficult.
Rewards obtained via actual hardware sensors may include noise, misinterpretation, or failed observations.
The learning instability caused by these unreliable reward signals remains an open problem in DRL.
In this work, we propose an approach that extends existing DRL models by adding a subtask that directly estimates the variance of the reward signal.
The model then passes the feature map learned by this subtask in the critic network to the actor network.
This enables stable learning that is robust to potential reward noise.
Experiments in the Atari game domain with unstable reward signals show that our method stabilizes training convergence.
We also discuss the extensibility of the model by visualizing feature maps.
This approach has the potential to make DRL more practical in noisy, real-world scenarios.

\keywords{Deep reinforcement learning \and Uncertainty \and Variance branch.}

\end{abstract}