\begin{abstract}
Only a few learning algorithms are applicable to stochastic dynamic teams and
games, which generalize Markov decision processes to decentralized stochastic
control problems involving possibly self-interested decision makers. Learning
in games is generally difficult because of the non-stationary environment in
which each decision maker aims to learn its optimal decisions with minimal
information in the presence of the other decision makers, who are also
learning. In stochastic dynamic games, learning is more challenging because,
while learning, the decision makers alter the state of the system and hence
the future costs. In this paper, we present decentralized Q-learning
algorithms for stochastic games and study their convergence in the weakly
acyclic case, which includes team problems as an important special case.
These algorithms are decentralized in that each decision maker has access
only to its local information, the state information, and its local cost
realizations; furthermore, each decision maker is completely oblivious to the
presence of the other decision makers. We show that these algorithms converge
to equilibrium policies almost surely in large classes of stochastic games.
\end{abstract}
