\begin{abstract}
We extend the provably convergent Full Gradient DQN algorithm of \cite{FGDQN} from discounted reward Markov decision processes to average reward problems. We experimentally compare the widely used RVI Q-Learning with the recently proposed Differential Q-Learning in the neural function approximation setting, using both Full Gradient DQN and DQN. We also extend the approach to learn Whittle indices for Markovian restless multi-armed bandits. We observe a better convergence rate for the proposed Full Gradient variant across different tasks.

 % We propose a provably convergent variant of DQN, viz. Full Gradient DQN from \cite{FGDQN} for the average reward criterion. We extend this to solve restless multi-armed bandit problems using Q-learning for complex problem domains. We experimentally compare widely Relative Value Iteration Q-learning and Differential Q-learning with function approximation based on Full Gradient DQN and DQN. We observe a significant boost in convergence rate for the full gradient variant for both algorithms across various problems.
 \blfootnote{Additional results, experimental details, and analysis can be found in the \href{https://drive.google.com/file/d/1YV9w0-IgEjEZOhmIxIFgewK__tLMazal/view?usp=sharing}{extended appendix}.}
\end{abstract}