\begin{abstract}
We analyze best response dynamics for finding a Nash equilibrium of an infinite-horizon zero-sum stochastic linear-quadratic dynamic game with partial and asymmetric information. We derive explicit expressions for each player's best response within the class of pure linear dynamic output feedback control strategies whose internal state dimension is an integer multiple of the system state dimension. With each best response, the players form successively higher-order belief states, leading to an infinite regress in which the players' internal state dimensions grow without bound. However, we observe in extensive numerical experiments that the value of the game converges after a small number of best response iterations, which indicates that feedback strategies with a limited internal state dimension (corresponding to a low-order belief state) can closely approximate a Nash equilibrium.
% We demonstrate that the proposed IBR method converges to a Nash equilibrium under certain conditions. 
To help explain this convergence, we show that the eigenvalues of the controllability and observability Gramians, as well as the Hankel singular values, of the higher-order belief dynamics decay rapidly. Thus, it becomes increasingly difficult for each player to control and observe the higher-order belief dynamics, which are therefore closely approximated by low-order belief dynamics.
% These findings not only validate the effectiveness of our proposed IBR method but also explain the mechanism behind its convergence.
\end{abstract}
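For reference, the standard definitions behind the Gramian and Hankel singular value argument are recalled below. This is a sketch under stated assumptions: we assume (our notation, not necessarily the one used in the body of the paper) that a given block of higher-order belief dynamics admits a stable discrete-time realization $(A, B, C)$, with $A$ Schur stable, input matrix $B$ and output matrix $C$. The controllability Gramian $W_c$ and observability Gramian $W_o$ are then the unique solutions of the discrete Lyapunov equations
\begin{equation*}
W_c = \sum_{k=0}^{\infty} A^k B B^\top (A^\top)^k, \qquad A W_c A^\top - W_c + B B^\top = 0,
\end{equation*}
\begin{equation*}
W_o = \sum_{k=0}^{\infty} (A^\top)^k C^\top C A^k, \qquad A^\top W_o A - W_o + C^\top C = 0,
\end{equation*}
and the Hankel singular values are
\begin{equation*}
\sigma_i = \sqrt{\lambda_i\left(W_c W_o\right)}, \qquad \sigma_1 \geq \sigma_2 \geq \cdots \geq 0.
\end{equation*}
Small eigenvalues of $W_c$ correspond to state directions that require large input energy to reach, and small eigenvalues of $W_o$ to directions that produce little output energy. A small $\sigma_i$ marks a direction that is simultaneously hard to reach and hard to observe; for a stable minimal realization, the well-known balanced truncation bound $\|G - G_r\|_{\infty} \leq 2 \sum_{i > r} \sigma_i$, where $G$ is the transfer function of $(A, B, C)$ and $G_r$ that of its order-$r$ balanced truncation, quantifies how rapid decay of the $\sigma_i$ permits a close low-order approximation.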