\begin{abstract}
We apply diffusion strategies to develop a fully distributed cooperative reinforcement learning algorithm
in which agents in a network communicate only with their immediate neighbors
to improve predictions about their environment.
The algorithm can also be applied to off-policy learning,
meaning that the agents can predict the response to a behavior
different from the actual policies they are following.
The proposed distributed strategy is efficient,
with linear complexity in both computation time and memory footprint.
We provide a mean-square-error performance analysis and
establish convergence under constant step-size updates,
which endow the network with continuous learning capabilities.
The results show a clear gain from cooperation:
when the individual agents can estimate the solution,
cooperation increases stability and reduces the bias and variance of the prediction error;
but, more importantly,
the network is able to approach the optimal solution
even when none of the individual agents can
(e.g., when the individual behavior policies restrict each agent to sampling only a small portion of the state space).
\end{abstract}
