
\begin{abstract}
Full-sampling (e.g., Q-learning) and pure-expectation (e.g., Expected Sarsa) algorithms are efficient and frequently used techniques in reinforcement learning.
Q$(\sigma,\lambda)$ is the first approach that unifies them with eligibility traces through the sampling degree $\sigma$.
However, it is limited to the tabular case: for large-scale learning, Q$(\sigma,\lambda)$ is too expensive because it requires very large tables to store value functions accurately.
To address this problem, we propose GQ$(\sigma,\lambda)$, which extends tabular Q$(\sigma,\lambda)$ with linear function approximation.
We prove the convergence of GQ$(\sigma,\lambda)$.
Empirical results on some standard domains show that GQ$(\sigma,\lambda)$, with a combination of full sampling and pure expectation, achieves better performance than either full-sampling or pure-expectation methods alone.
\end{abstract}
