\begin{abstract}
There are only a few learning algorithms applicable to stochastic dynamic
teams and games, which generalize Markov decision processes to decentralized
stochastic control problems involving possibly self-interested decision makers.
Learning in games is generally difficult because of the non-stationary
environment in which each decision maker aims to learn its optimal decisions
with minimal information in the presence of other decision makers who are
also learning. In stochastic dynamic games, learning is more challenging
because, while learning, the decision makers alter the state of the system
and hence the future cost. In this paper, we present decentralized Q-learning
algorithms for stochastic games and study their convergence for the weakly
acyclic case, which includes team problems as an important special case.
The algorithms are decentralized in that each decision maker has access only
to its local information, the state information, and its local cost
realizations; furthermore, each decision maker is completely oblivious to the
presence of the other decision makers.
We show that these algorithms converge to equilibrium policies almost surely
in large classes of stochastic games.
\end{abstract}