\begin{abstract}
Reinforcement learning (RL) typically defines a discount factor ($\gamma$) as part of the Markov Decision Process.
The discount factor weights future rewards exponentially, a scheme that yields theoretical convergence guarantees for the Bellman equation.
However, evidence from psychology, economics, and neuroscience suggests that humans and animals instead have \emph{hyperbolic} time-preferences ($\frac{1}{1 + kt}$ for $k>0$).
In this work we revisit the fundamentals of discounting in RL and bridge this disconnect by implementing an RL agent that acts via hyperbolic discounting.
We demonstrate that a simple approach approximates hyperbolic discount functions while still using familiar temporal-difference learning techniques.
Additionally, and independently of hyperbolic discounting, we find, surprisingly, that simultaneously learning value functions over multiple time-horizons is an effective auxiliary task that often improves over a strong value-based RL agent, Rainbow.
\end{abstract}