1: \begin{abstract}
2: In statistical dialogue management, the dialogue manager learns a policy that maps a belief state to an action for the system to perform. Efficient exploration is key to successful policy optimisation. Current deep reinforcement learning methods are very promising but rely on $\varepsilon$-greedy exploration, thus subjecting the user to a random choice of action during learning.
3: Alternative approaches such as Gaussian Process SARSA (GPSARSA) estimate uncertainties and are sample efficient, leading to a better user experience, but at the expense of greater computational complexity.
4: This paper examines approaches to extract uncertainty estimates from deep Q-networks (DQN) in the context of dialogue management.
5: We perform an extensive benchmark of deep Bayesian methods for extracting uncertainty estimates, namely Bayes-By-Backprop, dropout and its concrete variant, bootstrapped ensembles, and $\alpha$-divergences,
6: combining each of them with the DQN algorithm.
7:
8: %, and show that dropout is a promising simple method worth further investigation.
9: %We find that BBQN achieves faster convergence to an optimal policy than any other method, and reaches performance comparable to the state of the art, but without the high computational complexity of GPSARSA. %We also implement $\alpha$-divergences, variational dropout, and minimizing the negative log likelihood as other means to extract uncertainty estimates from DQN, and compare performance to BBQN and DQN. This work is carried within in the Cambridge University Engineering Department dialogue systems toolkit, CUED-pydial.
10: \end{abstract}
11: