\begin{abstract}
    SARSA, a classical on-policy control algorithm for reinforcement learning,
    is known to \emph{chatter} when combined with linear function approximation:
    it does not diverge but oscillates within a bounded region.
    However,
    little is known about how fast SARSA converges to that region and how large the region is.
    In this paper,
    we make progress towards this open problem by establishing a convergence rate of projected SARSA to a bounded region.
    Importantly,
    this region
    is much smaller than the region that we project into,
    provided that the magnitude of the reward is not too large.
    % Our analysis applies to expected SARSA as well as SARSA($\lambda$).
    Existing work on the convergence of linear SARSA to a fixed point requires the Lipschitz constant of SARSA's policy improvement operator to be sufficiently small;
    our analysis instead applies to arbitrary Lipschitz constants
    and thus characterizes the behavior of linear SARSA in a new regime.
\end{abstract}
