\begin{abstract}
    %Enhancing \gls{rl} with variational quantum circuits as function approximators is a promising approach to utilizing the current noisy quantum hardware. One critical technical factor in both classical and quantum \gls{rl} is the sample complexity, as interaction with the environment is potentially costly.
    %i.e.\ the required interactions with the environment.
    %This is also closely linked to the trainability of the underlying model.
    %To address this for function approximation in policy space, 
    Reinforcement learning is a growing field in AI with considerable potential. Intelligent behavior is learned automatically through trial-and-error interaction with an environment. However, this learning process is often costly. Using \glsentrylongpl{vqc} as function approximators can potentially reduce this cost.
    To this end, we propose the \gls{qnpg} algorithm, a second-order gradient-based routine that exploits an efficient approximation of the quantum Fisher information matrix. We experimentally demonstrate that \gls{qnpg} outperforms first-order training on different contextual bandit environments in terms of convergence speed and stability, and that it reduces the sample complexity. Furthermore, we provide evidence for the practical feasibility of our approach by training on a $12$-qubit hardware device.
\end{abstract}