\begin{abstract}
Asynchronous stochastic approximations (SAs) are an important class of model-free algorithms, tools, and techniques that are popular in multi-agent
and distributed control scenarios. To counter Bellman's curse of dimensionality, such algorithms are coupled with function approximations. Although the learning/control problem becomes more tractable, function approximations affect stability and convergence.
In this paper, we present verifiable sufficient conditions for the stability and convergence of asynchronous SAs with biased approximation errors. The theory developed herein is used to analyze
policy gradient methods
and noisy value iteration schemes.
Specifically, we analyze the asynchronous approximate counterparts of the policy gradient (A2PG)
and value iteration (A2VI) schemes.
It is shown that the stability of these algorithms is unaffected by
biased approximation errors, provided the errors are
asymptotically bounded. With respect to convergence (of A2VI and A2PG), a relationship between the limiting set and the approximation errors
is established. Finally, experimental results are presented that support the theory.
\end{abstract}