cs0309016/McDWag.tex
1: \documentclass[10pt]{article}
2: \usepackage{latexsym}
3: \usepackage{epsfig}
4: \usepackage{amsmath}
5: \usepackage{graphics}
6: \usepackage{amssymb}
7: \usepackage{amsthm}
8: \usepackage{amsopn}
9: \usepackage{amscd}
10: \usepackage{fullpage}
11: %\usepackage{cec2003,multicol,times}
12: 
13: \newtheorem{thm}{Theorem}
14: \newtheorem{cor}[thm]{Corollary}
15: \newtheorem{lem}[thm]{Lemma}
16: \newtheorem{prop}[thm]{Proposition}
17: \newtheorem{defn}[thm]{Definition}
18: \newtheorem{rem}[thm]{Remark}
19: \numberwithin{equation}{section}
20: \numberwithin{figure}{section}
21: \numberwithin{thm}{section}
22: \newtheorem{exm}[thm]{Example}
23: 
24: \begin{document}
25: 
26: %\pagestyle{empty}
27: %\sloppy
28: 
29: %\twocolumn[
30: 
31: \title{Using Simulated Annealing to Calculate the Trembles of Trembling Hand
32: Perfection}
33: %\vspace{0.1in}
34: %\begin{multicols}{2}
35: \begin{center}
36: 
37: \textbf{Stuart McDonald}  \\
38: School of Economics \\
39: The University of Queensland \\
40: Queensland 4072, \\
41: Australia \\
42: s.mcdonald@mailbox.uq.edu.au\\
43: \end{center}
44: 
45: \begin{center}
46: \textbf{Liam Wagner} \\
47: Department of Mathematics and \\
48: St John's College, within \\
49: The University of Queensland \\
50: Queensland 4072, Australia \\
51: LDW@maths.uq.edu.au
52: \end{center}
53: %\end{multicols}
54: %\vspace{0.25in}
55: %]
56: 
57: 
58: \begin{abstract}
\noindent Within the literature on non-cooperative game theory, a number of
algorithms have been proposed for computing Nash equilibria.
This paper shows that the family of algorithms known as Markov chain Monte Carlo (MCMC) can be used to calculate Nash equilibria. MCMC is a type of Monte Carlo simulation that relies on Markov chains to ensure its regularity conditions. MCMC has been widely used
throughout the statistics and optimization literature, where variants of
this algorithm are known as simulated annealing. This paper shows that there
is an interesting connection between the trembles that underlie the functioning
of this algorithm and the type of Nash refinement known as trembling hand
perfection, and that it is possible to use simulated annealing to compute this refinement.
67: \end{abstract}
68: 
\noindent \textit{Keywords:} Trembling Hand Perfection, Equilibrium Selection and Computation, Simulated
Annealing, Markov Chain Monte Carlo
71: 
72: \section{Introduction}
This paper develops an algorithm to compute a desired type of Nash equilibrium.
Furthermore, we use this algorithm to show the existence and uniqueness of a sensible Nash equilibrium. Our approach to this
problem has been motivated by the number of existing algorithms. The general
approach of the literature has been to rely on the geometric properties of the equilibrium.
77: 
This paper is interested in computing Nash
equilibria that satisfy the type of Nash refinement referred to as ``trembling hand''
perfection \cite{Selt75} \cite{Selt78}. This paper shows that simulated annealing can be used to compute this refinement. Simulated annealing is a type of Monte Carlo sampling procedure that relies on Markov chains to ensure its regularity conditions.
Most applications of simulated annealing have concentrated on problems of
combinatorial optimization such as routing and packing problems, or problems
from statistical pattern recognition like image processing.
84: 
85: Another well known group of algorithms for calculating Perfect Nash Equilibria are the
86: trace algorithms of Harsanyi and Selten
87: \cite{H+S}, where an outcome for the game is selected by ``tracing'' a
88: feasible path through a family of auxiliary games. The solution progress
89: along the feasible path is intended to represent the way in which players
90: adjust their expectations and predictions about the play of the game.
91: 
92: 
A major limitation of the tracing procedure is that the logarithmic version of
this method does not always provide a path that traces to a perfect
equilibrium. Harsanyi \cite[p.69]{harsanyi} has argued that this problem can be
resolved by eliminating all dominated pure strategies before applying the tracing
procedure. However, van Damme \cite[p.77]{vDam91} constructs examples that do
not involve dominated pure strategies in which the tracing procedure yields a
non-perfect equilibrium. Furthermore, it was suggested by van Damme that the
inconsistency lies in the logarithmic control costs. Games with control
costs are normal form games in which players, in addition to choosing strategies,
incur costs depending on how well they choose to control their actions.
103: 
Another limitation of the tracing procedure is that it relies on the algebro-geometric
properties of the equilibrium. This approach has been commonly used throughout
the literature for computing the equilibrium of non-cooperative games. For
example, the focus of the Lemke and Howson \cite{L+H} algorithm for bimatrix
games and of the Wilson \cite{Wils71} and Scarf \cite{Scar73} algorithms for
$N$-person games has also been to utilise the fundamental geometry of games to
calculate equilibrium. In general these approaches to equilibrium calculation
are computationally expensive.
112: 
However, within game theory there is a history of Monte Carlo methods being
applied to solve non-cooperative games, starting with Ulam \cite{Ulam50}
in 1954. From the viewpoint of applying global optimization techniques to
infinite games, Monte Carlo simulation has been used by Georgobiani and
Torondzadze as a means of providing Nash equilibria for rectangular games
\cite{GT80}. This is the approach that we will be developing in this paper.
119: 
120: This paper is organised as follows.
121: The second section of this paper introduces the MCMC algorithm and provides
122: some discussion of its convergence properties in terms of Markov chain
123: theory. As a starting point for this discussion the connection between MCMC
124: sampling techniques and Monte Carlo sampling techniques is explored. The
125: MCMC algorithms include the Gibbs sampler and the Metropolis algorithm and
126: are often called simulated annealing. The third section of this paper will
127: provide a characterization of these algorithms in terms of the trembling
128: hand of trembling hand perfection. With this in mind, we provide an example
129: of the use of simulated annealing applied to calculating Nash equilibrium.
130: In this example the solution leads to equilibria that result from trembling
131: hand perfection.
132: 
133: \section{A Review of Simulated Annealing}
134: 
135: Monte Carlo simulation has been used extensively for solving complicated
136: problems that defy an analytic formulation. The main idea behind Monte Carlo
137: simulation is to either construct a stochastic model that is in agreement
138: with the actual problem analytically, or to simulate the problem directly.
One problem with Monte Carlo methods is that if the underlying probability
distribution is non-standard, then the convergence of the sampled stochastic
process cannot be assured by the strong law of large numbers (SLLN). One way
around this is to realize that a stochastic process can be generated from any
process that draws its samples from the support of the underlying distribution.
Markov chain Monte Carlo (MCMC) does this by constructing a Markov chain that
has the underlying distribution as its stationary distribution. This enables the
simulation of the stochastic process for non-standard distributions, while
ensuring that the SLLN will hold.
148: 
As an illustration of MCMC we will discuss the \emph{Metropolis algorithm} \cite
{MRRTT53}. In this algorithm, each iteration will comprise $h$ updating
steps. Let $X_{t,i}$ denote the state of $X_{i}$ at the end of the $t$th
iteration. For step $i$ of iteration $t+1$, $X_{i}$ is updated using the
Metropolis algorithm. The candidate $Y_{i}$ is generated from a \emph{%
proposal distribution} $q_{i}\left( Y_{i}|X_{t,i},X_{t,-i}\right) $, where $%
X_{t,-i}$ denotes the value of
156: \begin{equation*}
157: X_{-i}=\left\{ X_{1},...,X_{i-1},X_{i+1},...,X_{h}\right\}
158: \end{equation*}
159: after completing step $i-1$ of iteration $t+1$, i.e.
160: \begin{equation*}
X_{t,-i}=\left\{ X_{t+1,1},...,X_{t+1,i-1},X_{t,i+1},...,X_{t,h}\right\} ,
162: \end{equation*}
163: where the components $X_{t,i+1},...,X_{t,h}$ have yet to be updated and
164: components $X_{t+1,1},...,X_{t+1,i-1}$ have already been updated. Thus the
165: proposal distribution of the $i$th component $q_{i}\left( \cdot |\cdot
166: ,\cdot \right) $, generates a candidate for only the $i$th component of $X$.
167: The candidate is accepted with probability
168: \begin{equation*}
\alpha \left( X_{-i},X_{i},Y_{i}\right) =\min \left( 1,\frac{\pi \left(
Y_{i}|X_{-i}\right) q_{i}\left( X_{i}|Y_{i},X_{-i}\right) }{\pi \left(
X_{i}|X_{-i}\right) q_{i}\left( Y_{i}|X_{i},X_{-i}\right) }\right) ,
172: \end{equation*}
173: where
174: \begin{equation*}
\pi \left( X_{i}|X_{-i}\right) =\frac{\pi \left( X\right) }{\int \pi \left(
X\right) dX_{i}}
177: \end{equation*}
is the full conditional distribution for $X_{i}$ under $\pi \left( \cdot
\right) $. If $Y_{i}$ is accepted, then $X_{t+1,i}=Y_{i}$; otherwise $%
X_{t+1,i}=X_{t,i}$. The acceptance probability $\alpha \left(
X_{-i},X_{i},Y_{i}\right) $ is known as the \emph{Metropolis criterion}.
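
As a purely illustrative calculation (the numerical values are assumptions
chosen for this example and are not taken from any application), suppose the
proposal distribution is symmetric, so that the proposal terms cancel, and that
the full conditional takes the values $\pi \left( X_{i}|X_{-i}\right) =0.4$ at
the current state and $\pi \left( Y_{i}|X_{-i}\right) =0.1$ at the candidate.
Then
\begin{equation*}
\alpha \left( X_{-i},X_{i},Y_{i}\right) =\min \left( 1,\frac{0.1}{0.4}\right)
=0.25,
\end{equation*}
so the candidate is accepted one time in four on average. If the two values
were interchanged, the ratio would be $4$ and the candidate would be accepted
with probability one.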
182: 
One of the disadvantages of this algorithm is the complexity of the
Metropolis criterion $\alpha \left( X_{-i},X_{i},Y_{i}\right) $.
In practice $\alpha \left( X_{-i},X_{i},Y_{i}\right) $ often simplifies
considerably, particularly when $\pi \left( \cdot \right) $ derives from a
conditional independence model \cite{Gilks96} \cite{Rob96}. However, the
single component Metropolis algorithm has the advantage of employing the
full conditional distributions for $\pi \left( \cdot \right) $, and Besag
\cite{Besag74} has shown that $\pi \left( \cdot \right) $ will be uniquely
determined by its full conditional distributions. As a result $\alpha \left(
X_{-i},X_{i},Y_{i}\right) $ will generate samples from a unique target
distribution $\pi \left( \cdot \right) $.
194: 
An alternative approach for constructing a Markov chain with a stationary
distribution $\pi \left( \cdot \right) $, that provides a generalization of
the approach suggested by Metropolis et al. \cite{MRRTT53}, has been
suggested by Hastings \cite{Hast70}. At each point in time $t$, the next
state $X_{t+1}$ is chosen by first sampling a candidate point $Y$ from a
proposal distribution $q\left( \cdot |X_{t}\right) $. The candidate point $Y$
is then accepted in accordance with the criterion
\begin{equation*}
\alpha \left( X,Y\right) =\min \left( 1,\frac{\pi \left( Y\right) q\left(
X|Y\right) }{\pi \left( X\right) q\left( Y|X\right) }\right) .
\end{equation*}
Under this criterion, if the candidate point is accepted, then $X_{t+1}=Y$,
otherwise $X_{t+1}=X_{t}$. The main difference between this \emph{%
Metropolis-Hastings algorithm}, as it is named, and the one proposed by
Metropolis et al. \cite{MRRTT53} is that the latter assumes that the proposal
distributions are symmetric, i.e. $q\left( Y|X\right) =q\left( X|Y\right) $,
in which case the criterion reduces to $\alpha \left( X,Y\right) =\min \left(
1,\pi \left( Y\right) /\pi \left( X\right) \right) $. The symmetric form is of
limited use for higher dimensional problems, as these problems generally have
little symmetry. Its main advantage is that the proposal distribution does not
enter the decision criterion; in either form the chain has $\pi \left( \cdot
\right) $ as its stationary distribution.
217: 
218: To provide a fuller explanation, the transition kernel of the
219: Metropolis-Hastings algorithm is given by
220: \begin{equation}
221: \begin{split}
222: &P\left( X_{t+1}|X_{t}\right) =q\left( X_{t+1}|X_{t}\right) \alpha \left(
223: X_{t},X_{t+1}\right) \\
224: &+I\left( X_{t+1}=X_{t}\right) \left[ 1-\int q\left( Y|X_{t}\right) \alpha
225: \left( X_{t},Y\right) dY\right] ,
226: \end{split}
227: \end{equation}
228: 
229: 
230: where $I\left( \cdot \right) $ is the indicator function. From $\alpha
231: \left( X_{t},X_{t+1}\right) $, we can see that
232: \begin{equation*}
233: \begin{split}
234: &\pi \left( X_{t}\right) q\left( X_{t+1}|X_{t}\right) \alpha \left(
235: X_{t},X_{t+1}\right) =\\
236: &\pi \left( X_{t+1}\right) q\left( X_{t}|X_{t+1}\right)
237: \alpha \left( X_{t+1},X_{t}\right) .
238: \end{split}
239: \end{equation*}
240: This implies that
241: \begin{equation*}
242: \pi \left( X_{t}\right) P\left( X_{t+1}|X_{t}\right) =\pi \left(
243: X_{t+1}\right) P\left( X_{t}|X_{t+1}\right) .
244: \end{equation*}
245: Integrating both sides of this equation, we get
246: \begin{equation*}
247: \int \pi \left( X_{t}\right) P\left( X_{t+1}|X_{t}\right) dX_{t}=\pi \left(
248: X_{t+1}\right) .
249: \end{equation*}
250: This equation states that if $X_{t}$ is drawn from $\pi $, then so must $%
251: X_{t+1}$. In other words, once one sample value has been obtained from the
252: stationary distribution, then all subsequent samples must be drawn from the
253: same distribution.
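
A small two-state check may help to make this concrete. The following is a
sketch under assumed values that are not part of the original argument, with
the integral replaced by a sum over the two states. Take states $a$ and $b$
with $\pi \left( a\right) =2/3$, $\pi \left( b\right) =1/3$, and a symmetric
proposal that always proposes the other state, $q\left( b|a\right) =q\left(
a|b\right) =1$. The acceptance probabilities are $\alpha \left( a,b\right)
=\min \left( 1,\tfrac{1/3}{2/3}\right) =1/2$ and $\alpha \left( b,a\right)
=\min \left( 1,2\right) =1$, so that
\begin{equation*}
P\left( b|a\right) =\tfrac{1}{2},\quad P\left( a|a\right) =\tfrac{1}{2},\quad
P\left( a|b\right) =1,\quad P\left( b|b\right) =0.
\end{equation*}
Detailed balance holds, since $\pi \left( a\right) P\left( b|a\right) =\tfrac{2%
}{3}\cdot \tfrac{1}{2}=\tfrac{1}{3}=\pi \left( b\right) P\left( a|b\right) $,
and stationarity follows:
\begin{equation*}
\pi \left( a\right) P\left( a|a\right) +\pi \left( b\right) P\left( a|b\right)
=\tfrac{1}{3}+\tfrac{1}{3}=\tfrac{2}{3}=\pi \left( a\right) .
\end{equation*}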
254: 
This is only a partial justification of the Metropolis-Hastings algorithm. A
full proof requires that $P^{\left( t\right) }\left( X_{t}|X_{0}\right) $
converges to the stationary distribution. For a heuristic justification of
this result, it can be noted that this distribution will initially depend on the
starting value $X_{0}$; therefore the proof must show that the Markov chain
gradually forgets its starting point and converges to a unique stationary
distribution. Thus, after a sufficiently long \emph{burn-in} of $m$
iterations, points $\left\{ X_{t};t=m+1,\,...,n\right\} $ will be dependent
samples drawn approximately from the stationary distribution. Hence the \emph{%
burn-in sample} is usually discarded when calculating the ergodic mean for $%
f\left( X\right) $%
\begin{equation*}
\bar{f}=\frac{1}{n-m}\sum_{t=m+1}^{n}f\left( X_{t}\right) .
\end{equation*}
269: 
270: %The most widely used variant of the MCMC is the \emph{Gibbs sampler} \cite
271: %{G+G84}. The Gibbs sampler draws its name from the Gibbs distribution of
272: %statistical physics. The algorithm uses the Gibbs distribution as itsstationary distribution and combines stochastic %relaxation and annealing to
273: %compute estimates of posterior probabilities. Sampling occurs from a \emph{%
274: %local} conditional probability distribution. The local conditional
275: %distribution is dependent on the global control parameter $T$ (the
276: %``temperature''), that varies between $0$ and $\infty $, depending
277: %respectively on whether the algorithm is directed or undirected.
278: 
279: %The Gibbs sampler is similar to the single component Metropolis algorithm in
280: %its construction, and can be considered an extension of this algorithm. The
281: %algorithm exploits the equivalence between the Gibbs distribution and Markov
282: %random fields. In the Gibbs sampler, its $i$th component proposal
283: %distribution for $X_{t+1.i}$
284: %\begin{equation*}
285: %q_{i}\left( Y_{.i}|X_{.i},X_{.-i}\right) =\pi \left( Y_{.i}|X_{.-i}\right) ,
286: %\end{equation*}
287: %where $\pi \left( Y_{.i}|X_{.-i}\right) $ is the full conditional
288: %distribution. Thus the Gibbs sampler cuts through the intermediate step of
289: %satisfying the proposal distribution by sampling purely from the full
290: %conditional distribution -- a consequence of the Gibbs stationary
291: %distribution. The Gibbs sampler therefore has the advantage of the
292: %Metropolis-Hastings algorithm without requiring the symmetry of its Markov
293: %chain.
294: 
295: %The transition kernel of the Gibbs sampler is then expressed by the product
296: %of the conditional densities of the individual steps required for each
297: %iteration:
298: %\begin{equation*}
299: %K\left( X,Y\right) =\prod_{i=1}^{d}\pi \left( Y_{.i}|X_{.-i}\right) .
300: %\end{equation*}
301: %The transition probabilities can then be expressed as follows:
302: %\begin{equation*}
303: %P\left[ X\left( t\right) =\omega |X\left( 0\right) =\eta ,\right]
304: %=\int_{A}K\left( X,Y\right) dY.
305: %\end{equation*}
306: %Given that $\pi \left( \cdot \right) $ will also depend on the temperature
307: %parameter $T$ (i.e. $\pi _{T}\left( \cdot \right) $), it can be shown that
308: %for any decreasing sequence $T\left( t\right) $, if $T\left( t\right) \geq
309: %N\Delta /\log t$, for all $t\geq t_{0}$ for some $t_{0}>2$ (where $\Delta $
310: %is the step size and $N$ is the number of iterations taken), \noindent then
311: %\begin{equation*}
312: %\lim_{t\rightarrow \infty }P\left[ X\left( t\right) =\omega |X\left(
313: %0\right) =\eta \right] =\pi _{0}\left( \omega \right) ,
314: %\end{equation*}
315: %for any starting configuration $\eta \in \Omega $ \cite[p.731]{G+G84}. Geman
316: %and Geman \cite[p.732]{G+G84} show that for any random function $f\left(
317: %X\right) $, for fixed $T$
318: %\begin{equation*}
319: %\lim_{n\rightarrow \infty }\frac{1}{n}\sum_{t=1}^{n}f\,\left( X\left(
320: %t\right) \right) =\int_{\Omega }f\,\left( \omega \right) d\pi \left( \omega
321: %\right) .
322: %\end{equation*}
323: %If we assume that there exists a $\tau $ such that
324: %\begin{equation*}
325: %S\subset \left\{ n_{t+1},...,n_{t+\tau }\right\}
326: %\end{equation*}
327: %for all $t$, then the above relationship will hold with probability one.
328: %Gelfand and Smith \cite{Gel+S90} indicate that these results can be
329: %generalized to enable sampling from arbitrary distributions.
330: 
331: \section{Trembling Hand Algorithm}
332: 
333: \subsection{A MCMC Algorithm for Computing Perfect Equilibria in
334: Strategic Games}
335: 
In this sub-section we provide an algorithm for computing a perfect
equilibrium for a strategic game and show that this algorithm
provides a sequence of perturbed mixed strategies that will
eventually converge on perfection. The basic idea is to
select a Markov chain and then use this Markov chain to deliver a Nash
equilibrium via Markov chain approximation. The trick is to
nominate the Markov chain with the most suitable
convergence properties to deliver convergence of the sequence of
completely mixed Nash equilibria of perturbed games, or $\varepsilon
$-perfect equilibria, to a perfect equilibrium. This is the
objective that is undertaken in this section.
347: 
348: Consider an $n$-person game in strategic form $G=\left( N,\left(
349: S_{i}\right) _{i\in N},\left( u_{i}\right) _{i\in N}\right) $ in which $%
350: N=\left\{ 1,...,n\right\} $ is the player set, each player $i\in N$ has a
351: finite set of pure strategies $S_{i}=\left\{ s_{i1},...,s_{ik_{i}}\right\} $
352: and a pay-off function $u_{i}:\times _{i\in N}S_{i}\rightarrow \mathbb{R}$
353: mapping the set of pure strategy profiles $\times _{i\in N}S_{i}$ into the
354: real number line.
355: 
In the strategic game $G$, for each player $i\in N$ there is a set of
probability measures $\Delta _{i}$ that can be defined over the pure
strategy set $S_{i}$; this is player $i$'s mixed strategy set. The elements
of the set $\Delta _{i}$ are of the form $p_{i}:S_{i}\rightarrow \left[
0,1\right] $ where $\sum_{j=1}^{k_{i}}p_{ij}=1,$ with $p_{ij}=p_{i}\left(
s_{ij}\right) ,$ i.e. $\Delta _{i}$ is isomorphic to the unit simplex.
362: 
363: We denote the elements of the space of mixed strategy profiles $\times
364: _{i\in N}\Delta _{i}$ by $p=\left( p_{1},...,p_{n}\right) ,$ where $%
365: p_{i}=\left( p_{i1},...,p_{ik_{i}}\right) \in \Delta _{i}$. As is the
366: convention we use the following short-hand notation $p=\left(
367: p_{i},p_{-i}\right) $, where $p_{-i}$ denotes the other components of $p$.
368: 
For each player $i$, the pay-off function $u_{i}$ can be extended to the
domain of mixed strategy profiles, giving $u_{i}:\times _{i\in N}\Delta
_{i}\rightarrow \mathbb{R}$. The pay-off function for each player
$i\in N$ is defined as $u_{i}\left( p_{i},p_{-i}\right)
=\sum_{j=1}^{k_{i}}p_{ij}u_{i}\left( s_{ij},p_{-i}\right) $. A mixed
strategy profile $p\in \times _{i\in N}\Delta _{i}$ is a \textbf{Nash equilibrium}
of the strategic game $G$ if for all players $i\in N$ and all $%
p_{i}^{\prime }\in \Delta _{i}$
377: \begin{equation}
378: u_{i}\left( p_{i},p_{-i}\right) \geq u_{i}\left( p_{i}^{\prime
379: },p_{-i}\right) .
380: \end{equation}
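
For instance, under assumed pay-offs that are purely illustrative, suppose
player $i$ has two pure strategies and that, against a fixed $p_{-i}$,
$u_{i}\left( s_{i1},p_{-i}\right) =3$ and $u_{i}\left( s_{i2},p_{-i}\right)
=1$. Using $p_{i2}=1-p_{i1}$, the mixed extension gives
\begin{equation*}
u_{i}\left( p_{i},p_{-i}\right) =3p_{i1}+p_{i2}=1+2p_{i1},
\end{equation*}
which is maximized at $p_{i1}=1$. In any Nash equilibrium in which the other
players use this $p_{-i}$, player $i$ must therefore place all weight on
$s_{i1}$.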
381: 
Suppose that, as well as there being a positive probability $p_{ij}$ of a player
$i$ selecting a pure strategy $s_{ij}\in S_{i}$, there is a small
probability $\varepsilon _{ij}$ that the choice is made in error.
In the case of an error, the probability that player $i$ selects his $j$th
pure strategy $s_{ij}$ by mistake is given by $q_{ij}$. The total probability
of player $i$ selecting a pure strategy $s_{ij}\in S_{i}$ is then given by
389: \begin{equation}
390: \hat{p}_{ij}=\left( 1-\varepsilon _{ij}\right) p_{ij}+\varepsilon
391: _{ij}q_{ij}.
392: \end{equation}
393: 
It can be seen that in this case, the total probability of player $i$
selecting a pure strategy $s_{ij}\in S_{i}$ will be bounded below by
396: \begin{equation}
397: \hat{p}_{ij}\geq \varepsilon _{ij}q_{ij}.
398: \end{equation}
399: Equating $\eta _{ij}=\varepsilon _{ij}q_{ij}$ we can see that this condition
400: can be rewritten as
401: \begin{equation}
402: \hat{p}_{ij}\geq \eta _{ij}\quad \forall \,s_{ij}\in S_{i}\text{ and }i\in N,
403: \end{equation}
404: with
405: \begin{equation}
406: \sum_{j=1}^{k_{i}}\eta _{ij}<1\quad \forall \,i\in N.
407: \end{equation}
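
As a numerical illustration (the values are assumptions chosen for this
example only), let player $i$ have two pure strategies, with intended mixed
strategy $p_{i}=\left( 1,0\right) $, error probabilities $\varepsilon
_{i1}=\varepsilon _{i2}=0.1$ and mistake distribution $q_{i}=\left(
0.5,0.5\right) $. Then
\begin{equation*}
\hat{p}_{i1}=0.9\times 1+0.1\times 0.5=0.95,\qquad \hat{p}_{i2}=0.9\times
0+0.1\times 0.5=0.05,
\end{equation*}
and the lower bounds are $\eta _{i1}=\eta _{i2}=0.05$, with $\eta _{i1}+\eta
_{i2}=0.1<1$. Even though player $i$ intends to play $s_{i1}$ for certain,
the strategy $s_{i2}$ is still played with probability $0.05$.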
408: 
This leads to the definition of a perturbed game $\left( G,\eta \right) $ as
a finite strategic game derived from the strategic game $G$, in which each
player $i$'s mixed strategy set is the set of completely mixed strategies
for player $i$ constrained by the probability of making an error
\begin{equation}
\Delta _{i}\left( \eta _{i}\right) =\left\{ p_{i}=\left(
p_{i1},...,p_{ik_{i}}\right) \in \Delta _{i};\;p_{ij}\geq \eta _{ij}\;\forall
\,s_{ij}\in S_{i}\right\} ,\quad \text{where }\sum\nolimits_{j=1}^{k_{i}}\eta
_{ij}<1.
\end{equation}
A mixed strategy combination $p\in \times _{i\in N}\Delta _{i}\left( \eta
_{i}\right) $ is a Nash equilibrium of the perturbed game $\left( G,\eta
\right) $ iff the following condition is satisfied for every player $i\in N$:
\begin{equation}
\text{if }u_{i}\left( s_{ij},p_{-i}\right) <u_{i}\left( s_{il},p_{-i}\right) \text{
then }p_{ij}=\eta _{ij},\quad \forall \,s_{ij}\text{,\thinspace }s_{il}\in
S_{i}.
\end{equation}
426: 
427: A mixed strategy $p\in $ $\times _{i\in N}\Delta _{i}$ is a \textbf{perfect
428: equilibrium} in the strategic game $G$ if there exists a sequence of
429: completely mixed strategy profiles $\left\{ p^{k}\right\} _{k=1}^{\infty }$
430: where $\lim_{k\rightarrow \infty }p^{k}=p$, and for every player $i\in N$
431: and for every $p_{i}^{\prime }\in \Delta _{i}$%
432: \begin{equation}
433: u_{i}\left( p_{i},p_{-i}^{k}\right) \geq u_{i}\left( p_{i}^{\prime
434: },p_{-i}^{k}\right) \quad \forall \,k=1,2,....
435: \end{equation}
436: In terms of our definition of a perturbed game, a mixed strategy is a
437: perfect equilibrium iff there exist some sequences $\left\{ \eta ^{k}=\left(
438: \eta _{1}^{k},...\eta _{n}^{k}\right) \right\} _{k=1}^{\infty }$ and $%
439: \left\{ p^{k}=\left( p_{1}^{k},...p_{n}^{k}\right) \right\} _{k=1}^{\infty }$
440: such that
441: 
442: \begin{enumerate}
\item  each $\eta ^{k}>0$ and $\lim_{k\rightarrow \infty }\eta ^{k}=0$,

\item  each $p^{k}$ is a Nash equilibrium of the perturbed game $%
\left( G,\eta ^{k}\right) $, and
447: 
448: \item  $\lim_{k\rightarrow \infty }p^{k}=p$ where for every player $i\in N$
449: and for every $p_{i}^{\prime }\in \Delta _{i}$%
450: \begin{equation}
451: u_{i}\left( p_{i},p_{-i}^{k}\right) \geq u_{i}\left( p_{i}^{\prime
452: },p_{-i}^{k}\right) \quad \forall \,k=1,2,....
453: \end{equation}
454: \end{enumerate}
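
As a worked illustration (a standard textbook-style example introduced here
as an assumption, not taken from the analysis above), consider the two-player
game with $S_{1}=\left\{ U,D\right\} $, $S_{2}=\left\{ L,R\right\} $ and
pay-offs $u_{1}=u_{2}=1$ at $\left( U,L\right) $ and $u_{1}=u_{2}=0$ at every
other pure strategy profile. Both $\left( U,L\right) $ and $\left( D,R\right)
$ are Nash equilibria. Write $p_{1D}$ and $p_{2L}$ for the probabilities
placed on $D$ and $L$, and similarly for $\eta $. In any perturbed game
$\left( G,\eta \right) $ with $\eta >0$, player $2$ plays $L$ with probability
$p_{2L}\geq \eta _{2L}>0$, so that
\begin{equation*}
u_{1}\left( D,p_{2}\right) =0<p_{2L}=u_{1}\left( U,p_{2}\right) ,
\end{equation*}
and the equilibrium condition of the perturbed game forces $p_{1D}=\eta _{1D}$.
By the same argument $p_{2R}=\eta _{2R}$. The unique equilibrium of $\left(
G,\eta ^{k}\right) $ is therefore $p_{1}^{k}=\left( 1-\eta _{1D}^{k},\eta
_{1D}^{k}\right) $, $p_{2}^{k}=\left( 1-\eta _{2R}^{k},\eta _{2R}^{k}\right) $,
which converges to $\left( U,L\right) $ as $\eta ^{k}\rightarrow 0$. Hence
$\left( U,L\right) $ is a perfect equilibrium, whereas $\left( D,R\right) $ is
a Nash equilibrium that is not the limit of any such sequence and so is not
perfect.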
455: 
An alternative definition of perfection has been given by Myerson
\cite[pp 75--76]{Myers78} and is based on the idea that every pure strategy
in a player's set of pure strategies is played with a small positive
probability, but only strategies that are best
responses have associated probabilities greater than $\varepsilon >0$. More
formally, a mixed strategy profile $p\in \times _{i\in N}\Delta _{i}$ is
an $\varepsilon $\textbf{-perfect equilibrium} iff it is completely mixed
and, for every player $i\in N$,
\begin{equation}
\text{if }u_{i}\left( s_{ij},p_{-i}\right) <u_{i}\left( s_{il},p_{-i}\right) \text{
then }p_{ij}\leq \varepsilon ,\text{\quad }\forall \,s_{ij}\text{,\thinspace
}s_{il}\in S_{i}.
\end{equation}
Unlike Nash equilibria of perturbed games, the $\varepsilon $-perfect
equilibria of a game $G$ will not necessarily be Nash equilibria of $G$.
However, Myerson does show that $p=\left( p_{1},...,p_{n}\right) \in \times
_{i\in N}\Delta _{i}$ will be a perfect equilibrium iff there exist sequences
$\left\{ \varepsilon ^{k}\right\} _{k=1}^{\infty }$ and $\left\{ p^{k}\right\}
_{k=1}^{\infty }$ such that
473: 
474: \begin{enumerate}
475: \item  each $\varepsilon ^{k}>0$ and $\lim_{k\rightarrow \infty }\varepsilon
476: ^{k}=0$,
477: 
478: \item  each $p^{k}$ is an $\varepsilon ^{k}$-perfect equilibrium of the game
479: $G$, and
480: 
481: \item  $\lim_{k\rightarrow \infty }p_{i}^{k}=p_{i}$ for every player $i\in
482: N. $
483: \end{enumerate}
484: 
The starting basis for the MCMC algorithm for calculating
perfection will be to follow Myerson by constructing a sequence of
$\varepsilon $-perfect equilibria for the strategic game $G$. As
stated above, we know that for the
strategic game $G$, $p\in \times _{i\in N}\Delta _{i}$ is an $\varepsilon $%
-perfect equilibrium iff for each player $i\in N$, $p_{i}\in \Delta
_{i}$ is a completely mixed strategy and
\begin{equation}
\begin{split}
&\text{if }u_{i}\left( s_{ij},p_{-i}\right) <u_{i}\left( s_{il},p_{-i}\right)
\text{ then }p_{ij}\leq \varepsilon,\\
&\text{\quad }\forall
\,s_{ij}\text{,\thinspace }s_{il}\in S_{i}.
\end{split}
\end{equation}
500: 
501: Following Myerson \cite[p 79]{Myers78} we define the following set
502: of mixed strategies for each player $i\in N$
503: \begin{equation}
504: \Delta _{i}^{*}=\left\{ p_{i}\in \Delta _{i};p_{ij}\geq \delta
505: \;\,\forall \,s_{ij}\in S_{i}\right\} ,
506: \end{equation}
507: where
508: \begin{equation}
509: \delta =\frac{1}{m}\varepsilon ^{m},\quad 0<\varepsilon <1
510: \end{equation}
511: with $m=\max_{i\in N}\left| S_{i}\right| $. We then define a
512: point-to-set mapping $F_{i}:\times _{i\in N}\Delta
513: _{i}^{*}\rightarrow \Delta _{i}^{*}$ to be a family of completely
514: mixed distributions contained in $\Delta _{i}^{*}$
515: \begin{equation}
516: \begin{split}
&F_{i}\left( p_{1},...,p_{n}\right) =\left\{ p_{i}^{*}\in \Delta
_{i}^{*};\text{ if }u_{i}\left( s_{ij},p_{-i}\right) <u_{i}\left(
s_{il},p_{-i}\right)\right.\\
&\left.\text{ then }p_{ij}^{*}\leq \varepsilon ,\text{\quad }\forall \,s_{ij}\text{%
,\thinspace }s_{il}\in S_{i}\right\} .
522: \end{split}
523: \end{equation}
524: 
If we then define, for each player $i\in N$, a mixed strategy
\begin{equation}
p_{ij}^{*}=\frac{\varepsilon ^{\rho \left( s_{ij}\right)
}}{\sum_{l=1}^{k_{i}}\varepsilon ^{\rho \left( s_{il}\right) }},
\end{equation}
where
\begin{equation}
\rho \left( s_{ij}\right) =\left| \left\{ s_{il}\in
S_{i};u_{i}\left( s_{ij},p_{-i}\right) <u_{i}\left(
s_{il},p_{-i}\right) \text{ and }p\in \times _{i\in N}\Delta
_{i}^{*}\right\} \right| ,
\end{equation}
then it can be seen that $p_{i}^{*}\in F_{i}\left(
p_{1},...,p_{n}\right) $, so that $F_{i}\left( p_{1},...,p_{n}\right) $
is non-empty. As each $F_{i}\left(
p_{1},...,p_{n}\right) $ is defined by a finite collection of linear
inequalities, it is also a closed convex set. In addition,
by the continuity of
the pay-off function $u_{i}\left( s_{ij},\cdot \right) $, each $F_{i}$ will also
be upper semi-continuous.
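
To make these objects concrete, the following sketch uses assumed pay-offs
that are not part of the original argument. Suppose player $i$ has three pure
strategies and that, for the given $p_{-i}$, $u_{i}\left( s_{i1},p_{-i}\right)
=3$, $u_{i}\left( s_{i2},p_{-i}\right) =2$ and $u_{i}\left(
s_{i3},p_{-i}\right) =1$. Counting the strictly better pure strategies gives
\begin{equation*}
\rho \left( s_{i1}\right) =0,\qquad \rho \left( s_{i2}\right) =1,\qquad \rho
\left( s_{i3}\right) =2.
\end{equation*}
If $m=3$ and $\varepsilon =0.1$, then $\delta =\frac{1}{3}\left( 0.1\right)
^{3}\approx 3.3\times 10^{-4}$. Membership of $F_{i}\left(
p_{1},...,p_{n}\right) $ then requires every component of $p_{i}^{*}$ to be at
least $\delta $ and the components on $s_{i2}$ and $s_{i3}$ to be at most
$0.1$; for example, $p_{i}^{*}=\left( 0.9,0.09,0.01\right) $ satisfies both
requirements.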
544: 
As a consequence the mapping $F:\times _{i\in N}\Delta
_{i}^{*}\rightarrow \times _{i\in N}\Delta _{i}^{*}$ satisfies all
the conditions of the Kakutani Fixed Point Theorem. In other words
there exists some completely mixed strategy $p_{\varepsilon }\in
\times _{i\in N}\Delta _{i}^{*}$ such
that $p_{\varepsilon }$ is an $\varepsilon $-perfect equilibrium of $G$. As $%
\times _{i\in N}\Delta _{i}$ is compact, a sequence of $\varepsilon
$-perfect
equilibria has a convergent subsequence $p_{\varepsilon }\rightarrow p$ as $\varepsilon \rightarrow 0$%
, where $p$ is a perfect equilibrium of $G$.
555: 
An alternative route to the same result can be arrived at
using an argument based on the convergence properties of a
Markov chain.
559: 
560: \begin{thm}
561: For any normal form game $G=\left( N,\left( S_{i}\right) _{i\in
562: N},\left( u_{i}\right) _{i\in N}\right) $, it is possible to define
563: a MCMC algorithm such that its transition probabilities will
564: converge to a perfect equilibrium as long as the following
565: conditions hold:
566: 
567: \begin{enumerate}
\item  if $u_{i}\left( s_{ij},p_{-i}^{k}\right) -u_{i}\left(
s_{il},p_{-i}^{k}\right) \geq 0$ then accept, where $p_{-i}^{k}$ is
the tuple of mixed strategies selected on the $k$th iteration;

\item  otherwise, accept if the probability $\exp \left( \frac{u_{i}\left(
s_{ij},p_{-i}^{k}\right) -u_{i}\left( s_{il},p_{-i}^{k}\right)
}{T}\right)
>\varepsilon ,$ where $\varepsilon \sim U\left[ 0,1\right] ;$ and
576: 
\item  in addition it can be seen that for all $s_{ij}$ and $s_{il}\in S_{i}$
such that $u_{i}\left( s_{ij},p_{-i}^{k}\right) <u_{i}\left(
s_{il},p_{-i}^{k}\right) $, $\alpha _{jl}^{i}\left( T\right)
\rightarrow 0$ as $T\rightarrow 0$.
581: \end{enumerate}
582: \end{thm}
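
As a numerical illustration of the second condition (the values are
assumptions chosen for this example only), suppose the pay-off difference in
the exponent equals $-1$. The threshold $\exp \left( -1/T\right) $ then takes
the values
\begin{equation*}
T=2:\;e^{-0.5}\approx 0.607,\qquad T=1:\;e^{-1}\approx 0.368,\qquad
T=0.1:\;e^{-10}\approx 4.5\times 10^{-5}.
\end{equation*}
For a fixed negative pay-off difference, such a move is therefore accepted
less and less often as the control parameter $T$ is lowered.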
583: 
584: \noindent
585: %TCIMACRO{\TeXButton{Proof}{\proof}}
586: %BeginExpansion
587: \proof%
588: %EndExpansion
For each player $i\in N$, there will be a collection of subsets
590: \begin{equation}
591: N_{ij}=\left\{ s_{il}\in S_{i};u_{i}\left( s_{ij},p_{-i}\right)
592: <u_{i}\left( s_{il},p_{-i}\right) \text{ and }p\in \times _{i\in N}\Delta _{i}^{*}\right\}
593: \end{equation}
of $i$'s pure strategy space $S_{i}$. The collection of these sets
will be referred to as player $i$'s local neighborhood structure. What
we would like to do is, for any two pure strategies
$s_{ij}$,$\,s_{il}\in S_{i}$, define a path from $s_{ij}$ to
$s_{il}$ such that
599: \begin{equation}
600: s_{ij_{1}}\in N_{ij},s_{ij_{2}}\in N_{ij_{1}},...,s_{il}\in
601: N_{ij_{m}}.
602: \end{equation}
603: 
In order to do this, we observe that the point-to-set mapping defined
by the set
\begin{equation}
F_{i}\left( p_{1},...,p_{n}\right) =\left\{ p_{i}^{*}\in \Delta
_{i}^{*};\text{ if }u_{i}\left( s_{ij},p_{-i}\right) <u_{i}\left(
s_{il},p_{-i}\right)\text{ then }p_{ij}^{*}\leq \varepsilon ,\text{\quad }\forall \,s_{ij}\text{%
,\thinspace }s_{il}\in S_{i}\right\}
\end{equation}
is a collection of homogeneous transition probabilities on $S_{i}$
613: \begin{equation}
614: p_{jl}^{i}\left( k\right) =\Pr \left\{ s_{i}\left( k\right)
615: =s_{il}|s_{i}\left( k-1\right) =s_{ij}\right\} =\Pr \left\{
616: s_{il}|s_{ij}\right\} .
617: \end{equation}
Furthermore, these transition probabilities have
the Markov property, i.e.\ given the path from $s_{ij}$ to $s_{il}$
such that
\begin{equation}
s_{ij_{1}}\in N_{ij},\;s_{ij_{2}}\in N_{ij_{1}},\;\ldots,\;s_{il}\in
N_{ij_{m}},
\end{equation}
the probability of the path, conditional on the starting strategy
$s_{ij}$, factorizes as
\begin{equation}
\begin{split}
&\Pr \left\{s_{il},s_{ij_{1}},s_{ij_{2}},\ldots,s_{ij_{m}}|s_{ij}\right\} \\
&=\Pr
\left\{ s_{il}|s_{ij_{m}}\right\} \Pr \left\{
s_{ij_{m}}|s_{ij_{m-1}}\right\} \cdots \Pr \left\{
s_{ij_{2}}|s_{ij_{1}}\right\} \Pr \left\{
s_{ij_{1}}|s_{ij}\right\} .
\end{split}
\end{equation}
635: 
636: We define the following generating probability for the Markov chain
637: for each
638: player $i\in N$%
639: \begin{equation}
640: g_{jl}^{i}=\left\{
641: \begin{array}{l}
642: \frac{1}{\rho \left( s_{ij}\right) }\text{,\quad if }s_{il}\in
643: N_{ij} \\ 0,\quad \quad \;\;\text{otherwise},
644: \end{array}
645: \right.
646: \end{equation}
647: where
648: \begin{equation}
649: \rho \left( s_{ij}\right) =\left| \left\{ s_{il}\in
650: S_{i};u_{i}\left( s_{ij},p_{-i}\right) <u_{i}\left(
651: s_{il},p_{-i}\right)
652: \text{ and }p\in \times _{i\in N}\Delta
653: _{i}^{*}\right\} \right| .
654: \end{equation}
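
To illustrate the neighborhood structure and the generating probability, consider a hypothetical player $i$ with three pure strategies whose payoffs against a fixed $p_{-i}$ are assumed, purely for illustration, to be
\begin{equation*}
u_{i}\left( s_{i1},p_{-i}\right) =1,\quad u_{i}\left( s_{i2},p_{-i}\right) =2,\quad u_{i}\left( s_{i3},p_{-i}\right) =3.
\end{equation*}
Then
\begin{equation*}
N_{i1}=\left\{ s_{i2},s_{i3}\right\} ,\quad N_{i2}=\left\{ s_{i3}\right\} ,\quad N_{i3}=\emptyset ,
\end{equation*}
so that $\rho \left( s_{i1}\right) =2$, $\rho \left( s_{i2}\right) =1$ and $\rho \left( s_{i3}\right) =0$. The generating probabilities are therefore
\begin{equation*}
g_{12}^{i}=g_{13}^{i}=\tfrac{1}{2},\quad g_{23}^{i}=1,
\end{equation*}
with all other $g_{jl}^{i}$ equal to zero. One admissible path from $s_{i1}$ to $s_{i3}$ is $s_{i2}\in N_{i1}$, $s_{i3}\in N_{i2}$, and no further move is proposed from $s_{i3}$ because it has no neighbors.
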
We now introduce the following acceptance probability
\begin{equation}
\alpha _{jl}^{i}\left( T\right) =\min \left\{ 1,\exp \left(
\frac{u_{i}\left(
s_{ij},p_{-i}^{k-1}\right) -u_{i}\left( s_{il},p_{-i}^{k-1}\right) }{T}%
\right) \right\} ,\quad T>0,
\end{equation}
where $T$ is a control parameter. This last condition implies that
666: 
667: \begin{enumerate}
668: \item  if $u_{i}\left( s_{ij},p_{-i}^{k}\right) -u_{i}\left(
669: s_{il},p_{-i}^{k}\right) \geq 0$ then accept, where $p_{-i}^{k}$ is
the tuple of mixed strategies selected on the $k$th iteration;
671: 
\item  otherwise, accept if $\exp \left( \frac{u_{i}\left(
s_{ij},p_{-i}^{k}\right) -u_{i}\left( s_{il},p_{-i}^{k}\right)
}{T}\right)
>\varepsilon ,$ where $\varepsilon \sim U\left[ 0,1\right] ;$ and
676: 
\item  in addition, for all $s_{ij}$ and $s_{il}\in S_{i}$
such that $u_{i}\left( s_{ij},p_{-i}^{k}\right) <u_{i}\left(
s_{il},p_{-i}^{k}\right) $, $\alpha _{jl}^{i}\left( T\right)
\rightarrow 0$ as $T\rightarrow 0$.
681: \end{enumerate}
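
As a numerical illustration, read the acceptance probability in the usual Metropolis form, i.e.\ as the minimum of $1$ and the exponential term, and suppose (a hypothetical value) that $u_{i}\left( s_{ij},p_{-i}^{k}\right) -u_{i}\left( s_{il},p_{-i}^{k}\right) =-1$. Then $\alpha _{jl}^{i}\left( T\right) =e^{-1/T}$ and
\begin{equation*}
\alpha _{jl}^{i}\left( 10\right) \approx 0.905,\qquad \alpha _{jl}^{i}\left( 1\right) \approx 0.368,\qquad \alpha _{jl}^{i}\left( 0.1\right) \approx 4.5\times 10^{-5},
\end{equation*}
so such a move is accepted almost always at a high value of the control parameter and almost never at a low one.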
682: 
Given these three conditions we can now see that the following
684: will hold:
685: 
686: \begin{itemize}
\item  Under this acceptance criterion, the transition probability
matrix $p_{i}^{k}$ of the homogeneous
Markov chain generated by the game $G$ converges to a
stationary distribution $\pi \left( T\right) $ as $k\rightarrow
\infty $,
692: \begin{equation}
693: p_{i}^{k}\rightarrow \pi _{i}\left( T\right) =\frac{e^{-C\left( i\right) /T}%
694: }{\sum_{k\in E}e^{-C\left( k\right) /T}}
695: \end{equation}
and as $T\rightarrow 0$
697: \begin{equation}
698: \pi _{i}\left( T\right) =\left\{
699: \begin{array}{l}
700: \frac{1}{\left| N_{i}\right| }\quad \text{ if }i\in H \\ 0\quad
701: \quad \;\text{otherwise}
702: \end{array}
703: \right.
704: \end{equation}
705: where
706: \begin{equation}
707: N_{i}=\left\{ s_{il}\in S_{i};u_{i}\left( s_{ij},p_{-i}\right)
708: <u_{i}\left( s_{il},p_{-i}\right) ,p_{i}=0\right\} .
709: \end{equation}
(See van Laarhoven and Aarts \cite[pp.~22--25]{LA} for the proof of
this last statement.)
712: 
\item  The transition probability matrix $p_{i}^{k}$ satisfies Myerson's
definition of an $\varepsilon $-perfect equilibrium and, as Myerson
has shown, the limit point to which this sequence converges is also
a perfect
equilibrium.%
718: %TCIMACRO{\TeXButton{End Proof}{\endproof}}
719: %BeginExpansion
720: \endproof%
721: %EndExpansion
722: \end{itemize}
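
\noindent The Boltzmann form of the stationary distribution used in the proof can be illustrated with a hypothetical two-state chain, $E=\left\{ 1,2\right\} $, with assumed costs $C\left( 1\right) =0$ and $C\left( 2\right) =1$. Then
\begin{equation*}
\pi _{1}\left( T\right) =\frac{1}{1+e^{-1/T}},\qquad \pi _{2}\left( T\right) =\frac{e^{-1/T}}{1+e^{-1/T}},
\end{equation*}
which gives $\pi _{1}\left( 10\right) \approx 0.525$, $\pi _{1}\left( 1\right) \approx 0.731$ and $\pi _{1}\left( 0.1\right) \approx 0.99995$. The distribution is close to uniform when $T$ is large and concentrates on the minimum-cost state as $T$ is lowered.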
723: 
724: 
725: \section{An Application to Extensive Form Games}
726: 
727: 
There are problems with viewing the existence of Nash equilibria as an end
in itself. The most immediate problem is the possibly large
number of Nash equilibria that can be found for any game, together with the
likelihood that not all of these Nash equilibria will be reasonable in some
sense. One way around this is to view the decision process of each agent
participating in the game from a decision theoretic perspective. From this
viewpoint, only those equilibria that can be found by backwards induction
will be self-enforcing. This leads to a technique for strategy space
reduction by iteratively removing strategies that are
\emph{strongly dominated}. As shown by Kuhn \cite[Corollary 1]{Kuhn53},
under the assumption of perfect information, this leads to a recursion that
is equivalent to the Bellman equation of dynamic programming.
740: 
An alternative to this is to construct a recursion that iteratively
eliminates \emph{weakly dominated strategies}. However, the removal of
weakly dominated strategies can eliminate strategy profiles
that would have provided suitable outcomes had only strongly dominated
strategies been removed. From the viewpoint of this paper, these
recursive strategy space reduction techniques can be considered to be
algorithms that reduce the size of a game, making equilibrium selection
easier. However, these iterative reduction techniques become unwieldy once
the assumption of perfect information is relaxed and information sets
contain more than one node of the game tree.
751: 
This has led to a number of refinements to the definition of Nash
equilibrium. Among the first of these was the notion of \emph{subgame
perfection} \cite{Selt75}, which removes strategies that are not optimal for
every subgame of an extensive game's game tree. However, Selten \cite{Selt75}
has shown that subgame perfection can also prescribe non-optimizing
behaviour at information sets that are not reached when the equilibrium is
played. This is because the expected payoff for the player whose information
set is not reached will not depend on their own strategy. As a result every
strategy will maximize their payoff. As van Damme \cite[pp.~8--9]{vDam91}
states, this can be removed if the equilibrium prescribes a choice, at
each information set that is a singleton, that maximizes the expected payoff
after the information set. The problem is that not all subgame perfect
equilibria satisfying this criterion are sensible.
765: 
766: %\textbf{
767: %Nash refinement has also extended for example to the behaviour of animals in the
768: %wild. The development of the \emph{Evolutionary Stable Strategy} (ESS) by Maynard
769: %Smith and Price \cite{maysmith} introduced the reduction of the stratgey space
770: %into a new form. The idea of a ESS is to model situations in which an agents
771: %actions can be determined by the forces of evolution. However the main
772: %restriction on the ESS is that a strategy is stable if a whole population using
773: %this strategy can not be subject to invasion by a small group with a mutant
774: %genotype \cite{gin}. This condition in a nutshell means that the equilibrium is
775: %of the basic Nash form with a specific stability condition attached.
776: %}
777: %\\\\
778: %\textbf{
779: %Having such a condition tacted onto the end of the Nash equilibrium, means
780: %that the ESS is essentially static. The ESS like all equilibrium has its
781: %own limitations for both discrete and continuous systems
782: %\cite{gin}. If we were to apply a genetic algorithm approach to
783: %finding the ESS we could encounter a whole group of equilibrium which are
784: %inappropriate. Furthermore these equilibrium may not eliminate weakly dominated
785: %strategies and present a dis-equlibrium \cite{osb}.
786: %}
787: 
Another approach, suggested by Selten \cite{Selt75}, is to eliminate
``unreasonable'' subgame perfect equilibria by allowing for the possibility of
``mistakes'' or ``trembles'' on the part of decision makers. In this way,
isolated information sets are removed, as every information set can now be
reached with positive probability. The other advantage of trembling hand
perfection is that, unlike subgame perfection, it can be applied directly to
the normal form of any game.

As was shown by Selten \cite{Selt75}, and as van Damme \cite{vDam91} also
shows, the perfect equilibria of a game's
strategic and extensive forms need not coincide. However, an
equivalence relationship holds between the perfect equilibria of any
extensive game and those of its associated \emph{agent normal form}
\cite{Selt75}. This is because
the agent normal form treats each information set of the extensive form
as a separate player (an agent), and each agent has the same pay-off
function as the player to whom the information set belongs.
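
A standard two-player illustration shows how trembles discriminate between Nash equilibria. It is not taken from the sources cited above, and the payoffs are chosen purely for exposition. Let player 1 choose a row and player 2 a column in
\begin{equation*}
\begin{array}{c|cc}
 & \text{Left} & \text{Right} \\ \hline
\text{Top} & (1,1) & (0,0) \\
\text{Bottom} & (0,0) & (0,0)
\end{array}
\end{equation*}
Both (Top, Left) and (Bottom, Right) are Nash equilibria. If, however, player 2 plays Left with any probability $\varepsilon >0$, then Top yields player 1 an expected payoff of $\varepsilon $ while Bottom yields $0$, so Top is the unique best reply; the same argument applies to player 2. Hence only (Top, Left) survives as the trembles vanish, and (Bottom, Right) is not perfect.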
811: 
We let $\Gamma ^{e}$ denote an extensive game consisting of a set of $n$
players, a game tree $K=\left( T,R\right) $ consisting of a set of nodes $T$
and a binary relation $R$ which is a partial ordering on the set of nodes.
The nodes of the game tree are classified as either non-terminal or terminal
according to whether or not there are succeeding nodes in the game tree. The
partial ordering is used to define a path of successive nodes. The
non-terminal nodes of the game tree are partitioned into the sets $%
P_{0},P_{1},\ldots,P_{n}$ that specify the moves associated with each player,
with $P_{0}$ being the set of random moves that are not
associated with any player. Defined on the non-terminal nodes is the
information partition $U=\left( U_{1},\ldots,U_{n}\right) $, where each set
$U_{i}$ is
a partition of $P_{i}$ into information sets, such that all nodes within an
information set $u\in U_{i}$ have the same number of immediate successors
and every path intersects an information set at most once. Under the
assumption of
perfect information each information set $u\in U_{i}$ will be a singleton.
This paper will assume \emph{imperfect information}; this implies that if
the information set $u\in U_{i}$ contains a node $x\in P_{i}$, player $i$
will not be able to distinguish $x$ from the other nodes contained in this
information
set based on the information possessed when moving at $x$. Throughout this
paper
it will also be assumed that each player has \emph{perfect recall}, i.e.\
each player will remember everything from
earlier in the game, including their own moves.
834: 
Associated with each random move is a probability distribution $p$. The
payoffs associated with the set of terminal points $Z$ of the game tree are
denoted by the $n$-tuple $r=\left( r_{1},\ldots,r_{n}\right) $, where each
player's payoff $r_{i}\left( z\right) $ is a function of the terminal point
$z\in Z$. With the information partition $U$ a choice set $C=\left\{
C_{u}:u\in \cup _{i=1}^{n}U_{i}\right\} $ can be defined, where each $C_{u}$
is a partition of the union $\cup _{x\in u}S\left( x\right) $ of the sets of
successors $S\left( x\right) =\left\{
y;x\in P\left( y\right) \right\} $ of the nodes $x\in u$. The interpretation
is that if player $i$ takes the choice $c\in
C_{u}$ at information set $u\in U_{i}$, then if $i$ is at $x\in u$, the
next node reached is the element of $S\left( x\right) $ contained in $c$.
Under the assumption of imperfect information and perfect recall, a
probability distribution on $C_{u}$ is assigned to each information
set $u\in U_{i}$. The collection $b_{i}$ of these distributions is a
behavioural strategy for player $i$, with
the set of all such strategies for player $i$ denoted by $B_{i}$. The
profile of all players' behavioural strategies is denoted by $b\in B:=\times
_{i=1}^{n}B_{i}$, where $B$ is the set of all behavioural strategy
combinations. The probability that a particular terminal point $z\in Z$ of
the game $\Gamma ^{e}$ is realized under $b$ is denoted by
$\mathbb{P}_{b}\left( z\right) $.
854: 
The definition of perfect equilibrium we will use is based on Selten
\cite{Selt75} and Friedman \cite{Fried91}. Kuhn \cite{Kuhn53} has shown that,
under perfect recall, behavioural and mixed strategies are realization
equivalent.
Therefore, for an extensive form game $\Gamma ^{e}$ we let $\Gamma =\left(
S,R\right) $ denote its strategic form representation, with $S$ denoting the
set of all mixed strategy profiles. The payoff profile $R$ is an $n$-tuple,
where the $i$th element is defined as
\begin{equation*}
R_{i}=\sum_{z\in Z}\mathbb{P}_{b}\left( z\right) r_{i}\left( z\right) .
\end{equation*}
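For instance, with three terminal points and the purely illustrative values $\mathbb{P}_{b}\left( z_{1}\right) =0.5$, $\mathbb{P}_{b}\left( z_{2}\right) =0.3$, $\mathbb{P}_{b}\left( z_{3}\right) =0.2$ and $r_{i}\left( z_{1}\right) =1$, $r_{i}\left( z_{2}\right) =0$, $r_{i}\left( z_{3}\right) =4$, this gives
\begin{equation*}
R_{i}=0.5\cdot 1+0.3\cdot 0+0.2\cdot 4=1.3.
\end{equation*}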
865: A perturbed game of $\Gamma $ is defined by $\left( \Gamma ,\eta \right) $,
866: where $\eta $ is a mapping that assigns to every choice in $\Gamma $ a
867: positive number $\eta _{c}$ such that
868: \begin{equation*}
869: \sum_{c\in C_{u}}\eta _{c}<1
870: \end{equation*}
for every information set $u$. An equilibrium point $b$ of the strategic
game $\Gamma $ is a perfect equilibrium if $b$ is a limit point of a
sequence $\left\{ b\left( \eta \right) \right\} $ as $\eta \rightarrow 0$,
where each $b\left( \eta \right) $ is an equilibrium point of the
associated perturbed game $\left( \Gamma ,\eta \right) $.
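
As a small illustration with assumed numbers, take an information set $u$ of player $i$ with two choices, $C_{u}=\left\{ c_{1},c_{2}\right\} $, and $\eta _{c_{1}}=\eta _{c_{2}}=0.01$, so that $\sum_{c\in C_{u}}\eta _{c}=0.02<1$. Write $b_{i}\left( c\right) $ for the probability that player $i$'s behavioural strategy assigns to the choice $c$ (a notation introduced only for this example). In Selten's construction \cite{Selt75} the numbers $\eta _{c}$ act as minimum probabilities, so every behavioural strategy of the perturbed game must satisfy
\begin{equation*}
b_{i}\left( c_{1}\right) \geq 0.01,\qquad b_{i}\left( c_{2}\right) \geq 0.01,\qquad b_{i}\left( c_{1}\right) +b_{i}\left( c_{2}\right) =1,
\end{equation*}
that is, $0.01\leq b_{i}\left( c_{1}\right) \leq 0.99$. A choice that the player would prefer never to make is then still taken with probability $0.01$, and a perfect equilibrium is obtained by letting these minimum probabilities shrink to zero.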
876: 
The algorithm is based on the simulated annealing algorithm given in
van Laarhoven and Aarts \cite[p.~10]{LA}. The pseudo-code for this algorithm
is given below:
880: 
881: \begin{itemize}
882: \item[ ]  begin
883: 
\item[ ]  \textbf{Initialize};
885: 
886: \item[ ]  $M:=0$;
887: 
888: \item[ ]  repeat
889: 
890: \begin{itemize}
891: \item[ ]  repeat
892: 
893: \begin{itemize}
894: \item[ ]  \textbf{Perturb}(config. $i\rightarrow j$, $\Delta R_{ij}()$) for
895: player 1;
896: 
897: \item[ ]  if $\left( \Delta R_{ij}\geq 0\right) $ then accept
898: 
899: \begin{itemize}
\item[ ]  elseif $\left( \exp \left( \frac{\Delta R_{ij}}{c}\right)
>rand\left[ 0,1\right) \right) $ then accept;
902: \end{itemize}
903: 
904: \item[ ]  if accept then \textbf{Update}(config. $j$);
905: 
906: \item[ ]  \textbf{Perturb}(config. $i\rightarrow j$, $\Delta R_{ij}()$) for
907: player $n$;
908: 
909: \item[ ]  if $\left( \Delta R_{ij}\geq 0\right) $ then accept
910: 
911: \begin{itemize}
\item[ ]  elseif $\left( \exp \left( \frac{\Delta R_{ij}}{c}\right)
>rand\left[ 0,1\right) \right) $ then accept;
914: \end{itemize}
915: 
916: \item[ ]  if accept then \textbf{Update}(config. $j$);
917: \end{itemize}
918: 
919: \item[ ]  until \textbf{equilibrium is approached sufficiently closely};
920: 
921: \item[ ]  $c_{M+1}:=f\left( c_{M}\right) $;
922: 
923: \item[ ]  $M:=M+1;$
924: \end{itemize}
925: 
926: \item[ ]  until \textbf{stop criterion = true;}
927: 
928: \item[ ]  end
929: \end{itemize}
930: 
\noindent The energy function differential for this algorithm is defined as
follows:
\begin{equation*}
\Delta R_{ij}=R_{j}-R_{i},\quad i<j,
\end{equation*}
where the $R_{i}$ are the expected pay-off functions for each player
participating in the perturbed game. The temperature parameter $c$ controls
the trembles and is updated by the decrement rule
\begin{equation*}
c_{M+1}=\alpha \cdot c_{M},\quad 0<\alpha <1,\,M=0,1,2,\ldots\,\text{.}
\end{equation*}
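
To give a sense of scale, the decrement rule implies $c_{M}=\alpha ^{M}c_{0}$, where $c_{0}$ denotes the initial value of the temperature. With the illustrative choices $c_{0}=1$ and $\alpha =0.9$ (assumed values, not those used in the simulation reported here),
\begin{equation*}
c_{10}=0.9^{10}\approx 0.349,\qquad c_{50}=0.9^{50}\approx 0.0052,
\end{equation*}
and $c_{M}$ first falls below $0.01$ at $M=44$, since $\ln \left( 0.01\right) /\ln \left( 0.9\right) \approx 43.7$. Under the Metropolis rule, a move that lowers a player's expected pay-off by $0.1$ is then accepted with probability $e^{-0.1/c_{M}}$, which is approximately $0.905$ at $M=0$, approximately $0.75$ at $M=10$ and of order $10^{-9}$ at $M=50$. The trembles therefore die out quickly under a geometric schedule of this kind.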
942: 
We apply the algorithm to the following example taken from Friedman
\cite[p.~51]{Fried91}. This example is based on the three player extensive
form game
used by Selten \cite{Selt75} to illustrate the existence of perfect
equilibrium. The game tree is shown in Figure~\ref{tree}
\cite[p.~50]{Fried91}.
948: 
\begin{figure}[!ht]
\begin{center}
\scalebox{0.5}{\epsfig{file=tree1.eps
,angle=0,width=\linewidth}}
\caption{Selten's Horse Game Tree\label{tree}}
\end{center}
\end{figure}
956: 
957: 
This game possesses both a perfect equilibrium and ``nonsensical''
subgame perfect equilibria. The perfect equilibrium for this extensive form
game is defined via the perturbed pay-off functions:

\begin{eqnarray*}
R_{1} &=&\alpha _{1}(1-\varepsilon _{2}-3\varepsilon _{3}+4\varepsilon
_{2}\varepsilon _{3})+3\varepsilon _{3} \\
R_{2} &=&2\varepsilon _{3}(2-\varepsilon _{1})+\alpha _{2}(1-\varepsilon
_{1}-4\varepsilon _{3}+4\varepsilon _{1}\varepsilon _{3}) \\
R_{3} &=&1-\varepsilon _{1}+\alpha _{3}(2\varepsilon _{1}-\varepsilon
_{2}+\varepsilon _{1}\varepsilon _{2}),
\end{eqnarray*}
where the $\alpha _{i}$ are the mixed strategies and the $\varepsilon _{i}$
are the errors, defined for $i=1,2,3$. Letting the errors approach zero, it
can be
seen that the perfect equilibrium is defined by $\left( 1,1,0\right) $.
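
This limit can be checked directly. Each $R_{i}$ is linear in player $i$'s own $\alpha _{i}$, so the best reply is determined by the sign of the coefficient of $\alpha _{i}$. For small errors the coefficients of $\alpha _{1}$ and $\alpha _{2}$ are close to $1$, while the coefficient of $\alpha _{3}$ is negative whenever $\varepsilon _{2}>2\varepsilon _{1}/\left( 1-\varepsilon _{1}\right) $. For example, with the assumed values $\varepsilon _{1}=\varepsilon _{3}=0.01$ and $\varepsilon _{2}=0.05$,
\begin{align*}
1-\varepsilon _{2}-3\varepsilon _{3}+4\varepsilon _{2}\varepsilon _{3} &=0.922>0, \\
1-\varepsilon _{1}-4\varepsilon _{3}+4\varepsilon _{1}\varepsilon _{3} &=0.9504>0, \\
2\varepsilon _{1}-\varepsilon _{2}+\varepsilon _{1}\varepsilon _{2} &=-0.0295<0,
\end{align*}
so the best replies are $\alpha _{1}=1$, $\alpha _{2}=1$ and $\alpha _{3}=0$, in agreement with $\left( 1,1,0\right) $.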
973: 
974: The results of the simulation are shown below in Figure \ref{payoffs}
975: and indicate convergence to the trembling hand perfect
976: equilibrium.
977: \begin{figure}[!ht]
978: \begin{center}
979: \scalebox{0.5}{\epsfig{file=payoff1.eps
980: ,angle=0,width=\linewidth}}
981: \caption{Three-person game with imperfect competition and payoff solutions\label{payoffs}}
982: \end{center}
983: \end{figure}
984: 
985: \section{Conclusion}
986: 
This paper has concentrated on some of the underlying theoretical mechanics
of simulated annealing and how they relate to the trembling hand perfect
refinement of Nash equilibrium. It has been argued that the trembles that
underlie global optimization by simulated annealing are analogous to the
``mistakes'' of trembling hand perfection, in that they provide a means of
moving away from local equilibria. The main contribution of this paper has
been
to apply simulated annealing to solve a game that is known to possess both a
perfect equilibrium and ``nonsensical'' subgame perfect equilibria.
Preliminary results indicate convergence to the perfect equilibrium, with
mixed strategies being played by two of the three players.
997: 
998: 
999: 
1000: 
1001: 
1002: 
1003: \begin{thebibliography}{199}
1004: \bibitem{Besag74}  Besag, J. (1974) Spatial interaction and the statistical
1005: analysis of lattice systems (with discussion). \emph{Journal of the Royal
1006: Statistical Society Series B} 36, 192--236.
1007: 
1008: %\bibitem{Chan89}  Chan, K.S. (1989) A note on the geometric ergodicity of a
1009: %Markov chain. \emph{Advances in Applied Probability} 21, 702--704.
1010: 
1011: %\bibitem{Chan93}  Chan, K.S. (1993) Asymptotic behaviour of the Gibbs
1012: %sampler. \emph{Journal of the American Statistical Association} 88, 320--326.
1013: 
1014: \bibitem{Fried91}  Friedman, J.W. (1991) \emph{Game Theory with Applications
1015: to Economics}. Oxford University Press, Oxford.
1016: 
1017: %\bibitem{Fro}  Fr\"{o}berg, C. (1979) \emph{Introduction to Numerical
1018: %Analysis} (2nd ed.) Addison-Wesley, Reading, Ma.
1019: 
1020: %\bibitem{G+G84}  Geman, S. and Geman, D. (1984) Stochastic relaxation, Gibbs
1021: %distributions and Bayesian restoration of images. \emph{IEEE Transactions on
1022: %Pattern Recognition and Machine Intelligence} PAMI 6(6), 721--741.
1023: 
1024: %\bibitem{Gel+S90}  Gelfand and Smith (1990), Sampling based approaches to
1025: %calculating marginal densities. \emph{Journal of the American Statistical
1026: %Society} 85, 398--409.
1027: 
\bibitem{GT80}  Georgobiani, D.A. and Torondzadze, A.F. (1980) Solution of
rectangular games by the Monte Carlo method. \emph{Trudy Vychisl. Tsentra Akad. Nauk Gruzin. SSR}
20(2), 5--10.
1031: 
\bibitem{GRS96}  Gilks, W.R., Richardson, S. and Spiegelhalter, D.J. (1996)
Introducing Markov Chain Monte Carlo. In Gilks, W.R., Richardson, S. and
Spiegelhalter, D.J. (Eds.) \emph{Markov Chain Monte Carlo in Practice},
1--19. Chapman and Hall, London.
1036: 
\bibitem{Gilks96}  Gilks, W.R. (1996) Full conditional distributions. In
Gilks, W.R., Richardson, S. and Spiegelhalter, D.J. (Eds.) \emph{Markov Chain
Monte Carlo in Practice}, 75--88. Chapman and Hall, London.
1040: 
1041: 
1042: %\bibitem{gin} Gintis, H., \textit{Game Theory Evolving} Princeton University
1043: %Press 2000
1044: 
1045: %\bibitem{G+S92}  Grimmet, G.R. and Stirzaker, D.R. (1992) \emph{Probability
1046: %and Random Processes}. Oxford University Press, Oxford.
1047: 
\bibitem{harsanyi}  Harsanyi, J.C. (1975) The tracing procedure: a
Bayesian approach to defining a solution for $n$-person non-cooperative games.
\emph{International Journal of Game Theory} 4, 1--22.
1051: 
1052: \bibitem{H+S}  Harsanyi, J.C. and Selten, R. (1988) \emph{A General Theory
1053: of Equilibrium Selection in Games}. MIT Press, Cambridge, MA.
1054: 
1055: \bibitem{Hast70}  Hastings, W.K. (1970) Monte Carlo sampling methods using
1056: Markov chains and their application. \emph{Biometrika} 57, 97--109.
1057: 
1058: %\bibitem{Ko+M86}  Kohlberg, E. and Mertons, J-F. (1986) On the strategic
1059: %stability of equilibria. \emph{Econometrica} 54, 1003--1039.
1060: 
1061: %\bibitem{K+W82}  Kreps, D.M. and Wilson, R. (1982) Sequential equilibrium.
1062: %\emph{Econometrica} 50, 863--894.
1063: 
1064: \bibitem{Kuhn53}  Kuhn, H.W. (1953) Extensive games and the problem of
1065: information. In Kuhn, H.W. and Tucker, A.W. \emph{Contributions to the
1066: Theory of Games Vol I}, 193--216. Princeton University Press, Princeton N.J.
1067: 
\bibitem{L+H}  Lemke, C.E. and Howson, J.T. (1964) Equilibrium points of
bimatrix games. \emph{SIAM Journal on Applied Mathematics} 12, 413--423.
1070: 
1071: %\bibitem{maysmith} Maynard Smith, J. and Price G.,R., (1973) \textit{The
1072: %Logic of Animal Conflict}, Nature, 246, 15-18
1073: 
1074: %\bibitem{McKMcL96}  McKelvey, R.D. and McLennan A. (1996) Computation of
1075: %Equilibria in Finite Games. \emph{Handbook of Computational Economics Vol. 1}
1076: %. Elsevier Science B.V., Amersterdam.
1077: 
1078: %\bibitem{McKP95}  McKelvey, R.D. and Palfrey, T.R. (1995) Quantal Response
1079: %Equilibria for Normal Form Games. \emph{Games and Economic Behavior} 10,
1080: %6-38.
1081: 
1082: %\bibitem{McKP98}  McKelvey, R.D. and Palfrey, T.R. (1998) Quantal Response
1083: %Equilibria for Extensive Form Games. \emph{Experimental Economics} 1, 9-41.
1084: 
\bibitem{MRRTT53}  Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N.,
Teller, A.H. and Teller, E. (1953) Equation of state calculations by fast
computing machines. \emph{Journal of Chemical Physics} 21, 1087--1091.
1088: 
1089: \bibitem{Myers78}  Myerson, R.B. (1978) Refinements of the concept of Nash
1090: equilibrium. \emph{International Journal of Game Theory} 7, 73--80.
1091: 
1092: \bibitem{Myers91}  Myerson, R.B. (1991) \emph{Game Theory: Analysis of
1093: Conflict.} Harvard University Press, Cambridge, MA.
1094: 
1095: %\bibitem{Okada81}  Okada, A. (1981) On stability of perfect equilibrium
1096: %points. \emph{International Journal of Game Theory} 10, 67-73.
1097: 
1098: 
1099: %\bibitem{osb} Osbourne, M.J., and Rubinstein, A., (1994)
1100: %\emph{A Course in Game Theory} MIT Press
1101: 
1102: %\bibitem{Rob94}  Robert, C.P. (1994) Discussion. In Tierney, L. Markov
1103: %chains for exploring posterior distributions. \emph{Annals of Statistics}
1104: %22(4), 1742--1747.
1105: 
\bibitem{Rob96}  Roberts, G.O. (1996) Markov chain concepts related to
sampling algorithms. In Gilks, W.R., Richardson, S. and Spiegelhalter, D.J.
(Eds.) \emph{Markov Chain Monte Carlo in Practice}, 45--57. Chapman and
Hall, London.
1110: 
1111: \bibitem{Scar73}  Scarf, H.E. (1973) \emph{Computation of Economic
1112: Equilibria.} Yale University Press, New Haven, Conn.
1113: 
\bibitem{Selt75}  Selten, R. (1975) Reexamination of the perfectness concept
for equilibrium points in extensive games. \emph{International
Journal of Game Theory} 4, 25--55.
1117: 
1118: \bibitem{Selt78}  Selten, R. (1978) The Chain Store Paradox. \emph{Theory
1119: and Decision} 9, 127--159.
1120: 
1121: %\bibitem{SBGI96}  Spiegelhalter, D.J., Best, N.G., Gilks, W.R. and Inskip,
1122: %H. (1996) Hepatitis B: a case study in MCMC methods. In Gilks, W.R.,
1123: %Richardson, S. Spiegelhalter, D.J. (Eds.) \emph{Markov Chain Monte Carlo in
1124: %Practice}, 20--43. Chapman and Hall, London.
1125: 
1126: %\bibitem{S+C91}  Schwervish, M.J. and Carlin, B.P. (1992) On the convergence
1127: %of successive substitution sampling. \emph{Journal of Computational and
1128: %Graphical Statistic}s 1 111--127.
1129: 
1130: %\bibitem{Tiern94}  Tierney, L. (1994) Markov chains for exploring posterior
1131: %distributions (with discussion). \emph{Annals of Statistics} 22(4),
1132: %1701--1762
1133: 
1134: %\bibitem{Tiern96}  Tierney, L. (1996) Introduction to general state-space
1135: %Markov chain theory. In Gilks, W.R., Richardson, S. Spiegelhalter, D.J.
1136: %(Eds..) \emph{Markov Chain Monte Carlo in Practice}, 59--74. Chapman and
1137: %Hall, London.
1138: 
\bibitem{Ulam50}  Ulam, S. (1954) Applications of Monte Carlo methods to
tactical games. In Meyer, H.A. (Ed.) \emph{Symposium on Monte Carlo
Methods, University of Florida 1954}, p.~63. John Wiley and Sons, New York.
1142: 
1143: \bibitem{vDam91}  van Damme, E. (1991) \emph{Stability and Perfection of
1144: Nash Equilibria (2nd ed. rev. enl.)}. Springer-Verlag, Berlin.
1145: 
1146: \bibitem{LA}  van Laarhoven, P.J.M. and Aarts, E.H.L. (1987) \emph{Simulated
1147: Annealing: Theory and Applications}. D. Reidel Publishing, Dordrecht,
1148: Holland.
1149: 
1150: \bibitem{Wils71}  Wilson, R. (1971) Computing Equilibria of $N$-Person
1151: Games. \emph{SIAM Journal on Applied Mathematics} 21, 80--87.
1152: \end{thebibliography}
1153: 
1154: \end{document}
1155: