\begin{abstract}
    SARSA, a classical on-policy control algorithm for reinforcement learning,
    is known to \emph{chatter} when combined with linear function approximation:
    SARSA does not diverge but oscillates in a bounded region.
    However,
    little is known about how fast SARSA converges to that region and how large the region is.
    In this paper,
    we make progress towards this open problem by establishing the rate at which projected SARSA converges to a bounded region.
    Importantly,
    this region
    is much smaller than the region onto which we project,
    provided that the magnitude of the reward is not too large.
    % Our analysis applies to expected SARSA as well as SARSA($\lambda$).
    Existing analyses of the convergence of linear SARSA to a fixed point all require the Lipschitz constant of SARSA's policy improvement operator to be sufficiently small;
    our analysis instead applies to arbitrary Lipschitz constants
    and thus characterizes the behavior of linear SARSA in a new regime.
\end{abstract}
