\begin{abstract}
We apply diffusion strategies to develop a fully distributed cooperative reinforcement learning algorithm
in which agents in a network communicate only with their immediate neighbors
to improve predictions about their environment.
The algorithm can also be applied to off-policy learning,
meaning that the agents can predict the response to a behavior
different from the actual policies they are following.
The proposed distributed strategy is efficient,
with linear complexity in both computation time and memory footprint.
We provide a mean-square-error performance analysis and
establish convergence under constant step-size updates,
which endow the network with continuous learning capabilities.
The results show a clear gain from cooperation:
when the individual agents can estimate the solution,
cooperation increases stability and reduces bias and variance of the prediction error;
but, more importantly,
the network is able to approach the optimal solution
even when none of the individual agents can
(e.g., when the individual behavior policies restrict each agent to sampling only a small portion of the state space).
\end{abstract}
