\begin{abstract}
SARSA, a classical on-policy control algorithm for reinforcement learning,
is known to \emph{chatter} when combined with linear function approximation:
SARSA does not diverge but oscillates in a bounded region.
However,
little is known about how fast SARSA converges to that region and how large the region is.
In this paper,
we make progress towards this open problem by establishing a convergence rate of projected SARSA to a bounded region.
Importantly,
this region
is much smaller than the region that we project into,
provided that the magnitude of the reward is not too large.
% Our analysis applies to expected SARSA as well as SARSA($\lambda$).
Existing works regarding the convergence of linear SARSA to a fixed point all require the Lipschitz constant of SARSA's policy improvement operator to be sufficiently small;
our analysis instead applies to arbitrary Lipschitz constants
and thus characterizes the behavior of linear SARSA in a new regime.
\end{abstract}