\begin{abstract}
In this paper, we consider Markov chain and linear quadratic models for deep structured teams with discounted and time-average cost functions under two non-classical information structures, namely, deep state sharing and no sharing. In deep structured teams, agents are coupled in dynamics and cost functions through the deep state, where the deep state refers to a set of orthogonal linear regressions of the states. We consider a homogeneous linear regression for Markov chain models (i.e., the empirical distribution of states) and a few orthonormal linear regressions for linear quadratic models (i.e., weighted averages of states). Planning algorithms are developed for the case when the model is known, and reinforcement learning algorithms are proposed for the case when the model is not completely known. The convergence of two model-free (reinforcement learning) algorithms, one for Markov chain models and one for linear quadratic models, is established. The results are then applied to a smart grid.
\end{abstract}