\begin{abstract}

Reinforcement learning (RL) typically defines a discount factor ($\gamma$) as part of the Markov Decision Process.

The discount factor devalues future rewards according to an exponential scheme, weighting a reward received $t$ steps ahead by $\gamma^t$, which underpins the theoretical convergence guarantees of the Bellman equation.

However, evidence from psychology, economics, and neuroscience suggests that humans and animals instead have \emph{hyperbolic} time preferences, discounting a reward delayed by $t$ by a factor of $\frac{1}{1 + kt}$ for some $k>0$.

In this work, we revisit the fundamentals of discounting in RL and bridge this disconnect by implementing an RL agent that acts via hyperbolic discounting.

We demonstrate that a simple approach approximates hyperbolic discount functions while still using familiar temporal-difference learning techniques in RL.
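For intuition, here is a sketch under stated assumptions, not necessarily the exact estimator used in this work. The hyperbolic discount $\frac{1}{1+kt}$ equals an integral of exponential discounts over $\gamma \in [0,1]$. The hyperbolic value can therefore be written as an integral of ordinary discounted values $Q_{\gamma}(s,a) = \mathbb{E}[\sum_{t} \gamma^t r_t \mid s_0 = s, a_0 = a]$, each of which can be learned with standard temporal-difference updates. The notation $Q^{\mathrm{hyp}}$, the grid $0 = \gamma_0 < \gamma_1 < \dots < \gamma_n = 1$, and the left Riemann sum are illustrative assumptions. We also assume $\mathbb{E}[\sum_t |r_t|/(1+kt)]$ is finite, so that the integral and the expectation can be exchanged:
\[
\int_0^1 \gamma^{kt}\, d\gamma = \left[\frac{\gamma^{kt+1}}{kt+1}\right]_0^1 = \frac{1}{1 + kt},
\]
\[
Q^{\mathrm{hyp}}(s,a) = \mathbb{E}\Big[\sum_{t=0}^{\infty} \frac{r_t}{1 + kt} \,\Big|\, s_0 = s, a_0 = a\Big]
= \int_0^1 Q_{\gamma^k}(s,a)\, d\gamma
\approx \sum_{i=0}^{n-1} (\gamma_{i+1} - \gamma_i)\, Q_{\gamma_i^k}(s,a).
\]
The left Riemann sum never evaluates $Q_{\gamma}$ at $\gamma = 1$, where an undiscounted value may diverge.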

Additionally, and independent of hyperbolic discounting, we make the surprising discovery that simultaneously learning value functions over multiple time horizons is an effective auxiliary task that often improves upon a strong value-based RL agent, Rainbow.

\end{abstract}
