\begin{abstract}
We study model-based and model-free policy optimization in a class of nonzero-sum stochastic dynamic games called linear quadratic (LQ) deep structured games. In such games, players interact with each other through a set of weighted averages (linear regressions) of the states and actions. In this paper, we focus on homogeneous weights. However, for the special case of an infinite population, the results extend to asymptotically vanishing weights, in which case the players learn the sequential weighted mean-field equilibrium. Policy optimization is non-convex in policy space and does not, in general, converge in game settings. Despite this, we prove that the proposed model-based and model-free policy gradient descent and natural policy gradient descent algorithms converge globally to the subgame perfect Nash equilibrium. To the best of our knowledge, this is the first result that provides a global convergence proof of policy optimization in a nonzero-sum LQ game. A salient feature of the proposed algorithms is that their parameter space is independent of the number of players. Moreover, when the dimension of the state space is significantly larger than that of the action space, they provide a more efficient way of computation than algorithms that plan and learn in the action space. Finally, simulations are provided to numerically verify the theoretical results.
\end{abstract}