\begin{abstract}
Full-sampling (e.g., Q-learning) and pure-expectation (e.g., Expected Sarsa) algorithms are efficient and frequently used techniques in reinforcement learning.
Q$(\sigma,\lambda)$ is the first approach to unify them with eligibility traces through the sampling degree $\sigma$.
However, it is limited to the tabular case: for large-scale learning, Q$(\sigma,\lambda)$ is too expensive because it requires very large tables to store value functions accurately.
To address this problem, we propose GQ$(\sigma,\lambda)$,
which extends tabular Q$(\sigma,\lambda)$ with linear function approximation.
We prove the convergence of GQ$(\sigma,\lambda)$.
Empirical results on standard domains show that GQ$(\sigma,\lambda)$, which combines full sampling with pure expectation, performs better than either full-sampling or pure-expectation methods alone.
\end{abstract}