
\begin{abstract}
	We motivate and propose a new model for non-cooperative Markov
	games that considers the interactions of risk-aware players. This model characterizes the time-consistent dynamic ``risk'' from \emph{both} stochastic state transitions (inherent to the game) and
	randomized mixed strategies (due to all other players). An appropriate risk-aware equilibrium concept is proposed, and the existence of such equilibria in stationary strategies is demonstrated by an application of Kakutani's fixed point theorem. We further propose a
	simulation-based $Q$-learning type algorithm for risk-aware equilibrium computation. This
	algorithm works with a special form of minimax risk measures that can naturally be written as saddle-point
	stochastic optimization problems and that covers many widely investigated risk measures.
	Finally, the almost sure convergence of this simulation-based algorithm to an equilibrium is demonstrated
	under some mild conditions. Our numerical experiments on a two-player queuing game validate the properties of our model and algorithm, and demonstrate their worth and applicability in real-life competitive
	decision-making.
	\\
	\\
	\emph{Keywords}: Markov games; time-consistent risk preferences; fixed
	point theorem; $Q$-learning
\end{abstract}
