1: %------------------------------------------------------------------------------
2: % standard
3: \newcommand{\mytex}{}
4: \newcommand{\stdpackages}{
5:   \usepackage{amsmath}
6:   \usepackage{amssymb}
7:   \usepackage{amsfonts}
8:   \allowdisplaybreaks
9:   \usepackage{amsthm}
10:   \usepackage{eucal}
11:   %\usepackage{\mytex ntheorem}
12:   %\usepackage{\mytex calrsfs}
13:   %\usepackage{\mytex calligra}
14:   \usepackage{graphicx}
15:   \usepackage{color}
16:   %\usepackage{psfrag}
17:   \usepackage{multicol} 
18:   \usepackage{fancyhdr}
19:   \renewcommand{\headrulewidth}{.0pt}\renewcommand{\footrulewidth}{.0pt}\cfoot{}
20:   %\setlength{\headsep}{10mm}
21:   \fancyhead[OL]{\it\theauthor---\today}
22:   %\fancyhead[OL]{\rightmark}
23:   \fancyhead[ER]{\leftmark}
24:   \fancyhead[OR,EL]{\thepage}
25:   \fancyfoot[EL,OR]{}
26: 
27:   \newcommand{\draft}{\usepackage[light,first]{draftcopy}\draftcopyName{draft}{350}}
28:   \newcommand{\labels}{\usepackage{\mytex showlabels}}
29:   \newcommand{\maple}{\usepackage{maple2e}}
30:   \newcommand{\makeidx}{\usepackage{makeidx}\makeindex}
31:   \newcommand{\chicago}{\usepackage{\mytex chicago}\bibliographystyle{\mytex chicago}
32:     \renewcommand{\refname}{References\thispagestyle{empty}\renewcommand{\refname}{}}}
33:   \newcommand{\numberlines}{
34:     \usepackage[mathlines,modulo]{\mytex lineno} %options: pagewise, modulo, mathlines
35:     \newcommand{\BM}{\begin{linenomath}}
36:     \newcommand{\EM}{\end{linenomath}}
37:     \linenumbers
38:     \modulolinenumbers[5]
39:   }%\newcommand{\BM}{}\newcommand{\EM}{}
40:   \newcommand{\pdflatex}{
41:     \definecolor{bluecol}{rgb}{0,0,.5}
42:     \definecolor{greencol}{rgb}{0,.6,0}
43:     \usepackage[
44:     pdftex,
45: %    letterpaper,
46:     bookmarks,
47:     bookmarksnumbered,
48:     colorlinks,
49:     urlcolor=bluecol,
50:     citecolor=bluecol,
51:     linkcolor=bluecol,
52:     pagecolor=bluecol,
53:     pdfborder={0 0 0},
54: %    backref,     %link from bibliography back to sections
55: %    pagebackref, %link from bibliography back to pages
56: %    pdfstartview=FitH, %fitwidth instead of fit window
57:     pdfpagemode=None, %UseOutlines, %bookmarks are displayed by acrobat
58: %    pdftitle={\thetitle},
59:     pdfauthor={Marc Toussaint}
60:     ]{hyperref}
61:     \DeclareGraphicsExtensions{.jpg,.pdf}
62:     \renewcommand{\r}{\varrho}
63:     \renewcommand{\l}{\lambda}
64:     \renewcommand{\L}{\Lambda}
65:     \renewcommand{\s}{\sigma}
66:     \renewcommand{\O}{\Omega}
67:     \renewcommand{\SS}{{\cal S}}
68:     \renewcommand{\boldsymbol}{}
69:     %\renewcommand{\Chapter}{\chapter}
70:     %\renewcommand{\Subsection}{\subsection}
71:   }
72: }
73: \newcommand{\stdtheorems}{
74:   \theoremstyle{plain}
75:   \newtheorem{theorem}{Theorem}[section]
76:   \newtheorem{lemma}[theorem]{Lemma}
77:   \newtheorem{corollary}[theorem]{Corollary}
78:   \newtheorem{proposition}{Proposition}[section]
79:   \newtheorem{result}{Result}[section]
80:   \newtheorem{hypothesis}{Hypothesis}[section]
81:   \theoremstyle{definition}
82:   \newtheorem{definition}{Definition}[section]
83:   \theoremstyle{remark}
84:   \newtheorem{remark}{Remark}[section]
85:   \newtheorem{example}{Example}[section]
86: }
87: \newcommand{\stdstyle}[1]{
88:   \stdpackages
89:   \stdtheorems
90:   \renewcommand{\labelenumi}{\textbf{(\roman{enumi})}}
91:   \renewcommand{\theenumi}{(\roman{enumi})} %for ref
92:   %\renewcommand{\labelenumi}{${}^{\bf (\roman{enumi})}$}
93:   %\renewcommand{\labelitemi}{\bf $\cdot$}
94:   \newcommand{\itemdot}{\renewcommand{\labelitemi}{\bf $\cdot$}}
95:   \newcommand{\enumA}{\renewcommand{\labelenumi}{\textbf{\Alph{enumi}}}}
96:   \newcommand{\blockindent}{3ex}
97:   \renewcommand{\baselinestretch}{#1}
98:   \renewcommand{\arraystretch}{1.2}
99:   %\renewcommand{\textfloatsep}{3ex}
100:   %\setlength{\mathindent}{2.5em}
101:   %\setlength{\jot}{0pt} %between the math lines
102:   %\setlength{\abovedisplayskip}{-10pt}
103:   %\setlength{\belowdisplayskip}{-10pt}
104:   %\setlength{\mathsurround}{-10pt}
105:   %\renewcommand{\floatsep}{-1ex}
106:   \renewcommand{\topfraction}{1}
107:   \renewcommand{\bottomfraction}{1}
108:   \renewcommand{\textfraction}{0}
109:   \columnsep 5ex
110:   \parindent 3ex
111:   \parskip 1ex
112: 
113:   % Lists and paragraphs
114:   \parindent 0pt
115:   \topsep 4pt plus 1pt minus 2pt
116:   \partopsep 1pt plus 0.5pt minus 0.5pt
117:   \itemsep 2pt plus 1pt minus 0.5pt
118:   \parsep 2pt plus 1pt minus 0.5pt
119:   \parskip .5pc
120: 
121:   \setcounter{tocdepth}{3}
122:   \setcounter{secnumdepth}{3}
123: 
124:   \usepackage{\mytex geometry}
125:   \geometry{a4paper,hdivide={35mm,*,35mm},vdivide={35mm,*,35mm}}
126: 
127:   %\usepackage{layout}\layout
128: 
129:   %\thispagestyle{fancy}
130:   %\pagestyle{fancy}
131: 
132:   \renewenvironment{abstract}
133:     {\vspace*{5ex}\begin{rblock}\hrule\vspace{2ex}{\bf Abstract.~}\small}
134:     {\vspace{3ex}\hrule\end{rblock}\vspace{5ex}}
135:   \usepackage{\mytex mt}
136: }
137: \newcommand{\cleardefs}{
138:   \renewcommand{\article}[2]{}
139:   \renewcommand{\book}[2]{}
140:   \renewcommand{\draft}{}
141:   \renewcommand{\labels}{}
142:   \renewcommand{\maple}{}
143:   \renewcommand{\makeidx}{}
144:   \renewcommand{\chicago}{}
145:   \renewcommand{\pdflatex}{}
146:   \renewcommand{\header}{}
147: }
148: 
149: % A0  1189 x 841 mm   1,000 qm
150: % A1  841 x 594 mm    0,500 qm
151: % A2  594 x 420 mm    0,250 qm
152: % A3  420 x 297 mm    0,125 qm
153: % A4  297 x 210 mm    0,063 qm
154: % A5  210 x 148 mm    0,032 qm
155: % A6  148 x 105 mm    0,016 qm
156: % A7  105 x 74 mm     0,008 qm
157: % A8  74 x 52 mm      0,004 qm
158: % A9  52 x 37 mm      0,002 qm
159: % A10 37 x 26 mm      0,001 qm
160: % B0  1414 x 1000 mm  14.140 qcm
161: % B1  1000 x 707 mm   7.070 qcm
162: % B2  707 x 500 mm    3.535 qcm
163: % B3  500 x 353 mm    1.765 qcm
164: % B4  353 x 250 mm    882 qcm
165: % B5  250 x 176 mm    440 qcm
166: % B6  176 x 125 mm    220 qcm
167: % C0  1297 x 917 mm   11.894 qcm
168: % C1  917 x 648 mm    5.942 qcm
169: % C2  648 x 458 mm    2.968 qcm
170: % C3  458 x 324 mm    1.484 qcm
171: % C4  324 x 229 mm    742 qcm
172: % C5  229 x 162 mm    371 qcm
173: % C6  162 x 115 mm    186 qcm
174: % C7  115 x 81 mm     93 qcm
175: 
176: 
177: %------------------------------------------------------------------------------
178: % classes
179: 
180: \newcommand{\article}[2]{
181:   \documentclass[#1pt,twoside,fleqn]{article}
182:   \stdstyle{#2}
183:   \macros
184:   \newcommand{\mytitle}{
185:     \thispagestyle{empty}
186:     \mbox{~}
187:     \begin{list}{}{\leftmargin6ex \rightmargin6ex \topsep0ex \parsep3ex}\item[]
188:       \begin{center}
189:         {\LARGE\bf \thetitle \\}
190: 
191:         \vspace{5ex}
192:         {\large \theauthor}
193: 
194:         {\footnotesize{\sl \address}\\ \email}
195: 
196:         {\footnotesize \today}
197: 
198:         \vspace{1ex}
199:         {\small \published}
200:       \end{center}
201:     \end{list}
202:     \renewcommand{\mytitle}{\chapter{\thetitle}}
203:   }
204: }
205: \newcommand{\nips}{
206:   \documentclass{article}
207:   \usepackage{\mytex nips2003e,times}
208:   \stdpackages\macros
209: }
210: \newcommand{\ijcnn}{
211:   \documentclass[10pt,twocolumn]{\mytex ijcnn}
212:   %\documentclass[10pt,twocolumn]{article}\usepackage{\mytex wcci}
213:   \stdpackages\macros
214:   \bibliographystyle{abbrv} 
215: }
216: \newcommand{\springer}{
217:   \documentclass{\mytex springer_llncs}
218:   \renewcommand{\theenumi}{\alph{enumi}}
219:   \renewcommand{\labelenumi}{(\alph{enumi})}
220:   \renewcommand{\labelitemi}{$\bullet$}
221:   \stdpackages\macros
222: }
223: \newcommand{\foga}{
224:   \documentclass{article} 
225:   \stdpackages\macros
226:   \usepackage{\mytex foga-02}
227:   \usepackage{\mytex chicago}
228:   \bibliographystyle{\mytex foga-chicago}
229: }
230: \newcommand{\book}[2]{
231:   \documentclass[#1pt,twoside,fleqn]{book}
232:   \newenvironment{abstract}{\begin{rblock}{\bf Abstract.~}\small}{\end{rblock}}
233:   \stdstyle{#2}
234:   %\renewcommand{\thechapter}{\Roman{chapter}}
235:   \newcommand{\mytitle}{
236:     \thispagestyle{empty}
237:     \mbox{~}
238:     \begin{list}{}{\leftmargin4ex \rightmargin4ex \topsep10ex \parsep3ex}\item[]
239:       \begin{center}
240:         {\LARGE \thetitle \\}
241: 
242:         \vspace{8ex}
243:         {\large \theauthor}
244: 
245:         {\footnotesize{\sl \address}\\ \email}
246: 
247:         {\footnotesize \today}
248: 
249:         \vspace{1ex}
250:         {\small \published}
251:       \end{center}
252:     \end{list}
253:     \renewcommand{\mytitle}{\chapter{\thetitle}}
254:   }
255:   \macros
256: }
257: 
258: \newcommand{\slides}{
259:   \documentclass[fleqn]{article}
260:   \stdpackages
261:   \stdtheorems
262:   \renewcommand{\baselinestretch}{1}
263:   \renewcommand{\arraystretch}{1.2}
264: 
265:   \usepackage{\mytex geometry}
266:   \geometry{
267:     a4paper,landscape,
268:     headheight=30mm,
269:     headsep=0mm,
270:     footskip=5mm,
271:     hdivide={10mm,*,10mm},vdivide={30mm,*,8mm}}
272: 
273:   \columnsep 0mm
274:   \columnseprule 0pt
275:   \parindent 0ex
276:   \parskip 0ex
277:   \setlength{\itemsep}{8ex}
278:   \renewcommand{\labelitemi}{\rule[.4ex]{.6ex}{.6ex}~}
279: 
280:   \pagestyle{fancy}
281:   \renewcommand{\headrulewidth}{0pt} %1pt}
282:   \renewcommand{\footrulewidth}{0pt}
283:   \renewcommand{\labelenumi}{\textbf{\arabic{enumi}.}~~}
284:   \newcommand{\theauthor}{Marc Toussaint} 
285:   \rhead{}
286:   \lhead{}
287:   \rfoot{\thepage}
288: 
289:   \definecolor{grey}{rgb}{.9,.9,.9}
290:   \newcommand{\inverted}{
291:     \definecolor{main}{rgb}{1,1,1}
292:     \color{main}
293:     \pagecolor[rgb]{.3,.3,.3}
294:   }
295: 
296:   \macros
297: 
298:   \newcommand{\mytitle}{\huge\sf}
299: }
300: 
301: \newenvironment{titleslide}[2][30mm]{
302:   \onecolumn
303:   \lhead{{{\Huge\textsf{\quad#2}}\\}}
304:   \begin{center}\begin{list}{\labelitemi}{\leftmargin#1 \rightmargin#1
305:       \labelsep1ex \labelwidth3ex \topsep0pt}\item[]
306:     ~\vfill
307:     \begin{center}
308:       {\Huge\sc \thetitle}\\[3ex]
309:       \theauthor\\{\Large\ini}\\{\Large\email}
310:     \end{center}
311:     ~\vfill
312: }{
313:     ~\vfill
314:   \end{list}\end{center}
315: }
316: 
317: \newenvironment{slide}[2][30mm]{
318:   \onecolumn
319:   \lhead{{{\Huge\textsf{#2}}\\}}
320:   %\setlength{\unitlength}{1mm}
321:   %\begin{picture}(0,0)(20,-34)
322:   %\put(0,-35){\color{grey}\rule{296mm}{30mm}}
323:   %\put(0,-214){\color{grey}\rule{296mm}{10mm}}
324:   %\end{picture}
325:   \begin{center}\begin{list}{\labelitemi}{\leftmargin#1 \rightmargin#1
326:       \labelsep1ex \labelwidth3ex \topsep0pt}\huge\sf\item[]%\vfill
327: }{
328:   %\vfill
329:   \end{list}\end{center}
330: }
331: 
332: \newenvironment{slidetwo}[2][15mm]{
333:   \twocolumn
334:   \lhead{{{\Huge\textsf{#2}}\\}}
335:   \begin{center}\begin{list}{\labelitemi}{\leftmargin#1 \rightmargin#1
336:       \labelsep1ex \labelwidth3ex \topsep0pt}\huge\sf\item[]%\vfill
337: }{
338:   %\vfill
339:   \end{list}\end{center}
340: }
341: %\newcommand{\slidebreak}{\vfill\pagebreak\item[]\vfill}
342: \newcommand{\slidebreak}{\pagebreak\item[]}
343: 
344: \newcommand{\poster}{
345:   \documentclass[fleqn]{article}
346:   \stdpackages
347:   \renewcommand{\baselinestretch}{1}
348:   \renewcommand{\arraystretch}{1.8}
349: 
350:   \usepackage{\mytex geometry}
351:   \geometry{
352:     paperwidth=1189mm,
353:     paperheight=841mm, %841mm, %91.3cm, % 120cm
354: %    landscape,
355:     headheight=0mm,
356:     headsep=0mm,
357:     footskip=0mm,
358:     hdivide={5mm,*,5mm},vdivide={5mm,*,5mm}}
359: 
360: %\textwidth  86.3cm     %  Paper=91.3cm 
361: %\textheight  108cm     %  Paper=???,  banner=5cm 
362: %\oddsidemargin  0pt 
363: %\parindent  0pt 
364: %\parskip  0pt 
365: %\topmargin  1cm 
366: %\footskip  0pt 
367: %\headheight  0pt 
368: %\headsep  0pt 
369: 
370:   \setlength{\columnsep}{0ex}
371:   \columnseprule 3pt
372:   \renewcommand{\labelitemi}{\rule[.4ex]{.6ex}{.6ex}~}
373: 
374:   \pagestyle{fancy}
375:   \renewcommand{\headrulewidth}{0pt}
376:   \renewcommand{\footrulewidth}{0pt}
377:   \renewcommand{\labelenumi}{\textbf{(\roman{enumi})}}
378:   \newcommand{\theauthor}{Marc Toussaint}
379:   \rhead{}
380:   \lhead{}
381:   \rfoot{}
382: 
383:   \definecolor{grey}{rgb}{.9,.9,.9}
384:   \newcommand{\inverted}{
385:     \definecolor{main}{rgb}{1,1,1}
386:     \color{main}
387:     \pagecolor[rgb]{.3,.3,.3}
388:   }
389: 
390:   \macros
391: }
392: \newenvironment{postersection}[1]{
393: \vspace{1cm}
394: \section{#1}
395: \begin{list}{\labelitemi}{\leftmargin4ex \rightmargin3ex
396:       \labelsep1ex \labelwidth2ex \topsep0pt \parsep2ex}\item[]
397: }{
398: \end{list}
399: }
400: 
401: 
402: %------------------------------------------------------------------------------
403: % title page
404: 
405: \author{Marc Toussaint}
406: 
407: \newcommand{\inilogo}[1][.25]{\includegraphics[scale=#1]{\mytex INI}}
408: \newcommand{\rublogo}[1][.25]{\includegraphics[scale=#1]{\mytex RUB}}
409: 
410: \newcommand{\addressCologne}{
411:   Institute for Theoretical Physics\\
412:   University of Cologne\\
413:   50923 K\"oln---Germany\\
414:   {\tt mt@thp.uni-koeln.de}\\
415:   {\tt www.thp.uni-koeln.de/\~{}mt/}
416: }
417: 
418: \newcommand{\ini}{Institut f\"ur Neuroinformatik, Ruhr-Universit\"at Bochum, Germany}
419: \newcommand{\homepageINI}{\texttt{www.neuroinformatik.rub.de/PEOPLE/mt/}}
420: \newcommand{\emailINI}{mt@neuroinformatik.ruhr-uni-bochum.de}
421: \newcommand{\phoneINI}{+49-234-32-27974}
422: \newcommand{\faxINI}{+49-234-32-14209}
423: \newcommand{\phone}{+44 131 650 3089}
424: \newcommand{\fax}{+44 131 650 6899}
425: \newcommand{\email}{mtoussai@inf.ed.ac.uk}
426: \newcommand{\homepage}{homepages.inf.ed.ac.uk/mtoussai}
427: 
428: \newcommand{\addressINI}{
429:   Institut~f\"ur~Neuroinformatik,
430:   Ruhr-Universit\"at~Bochum, ND~04,
431:   44780~Bochum---Germany
432: }
433: \newcommand{\AddressINI}{
434:   Institut~f\"ur~Neuroinformatik\\
435:   Ruhr-Universit\"at Bochum, ND~04\\
436:   44780~Bochum---Germany
437: }
438: \newcommand{\address}{
439:   Institute~for~Adaptive~and~Neural~Computation,
440:   University~of~Edinburgh, 5~Forrest~Hill,
441:   Edinburgh~EH1~2QL, Scotland,~UK
442: }
443: \newcommand{\Address}{
444:   Institute~for~Adaptive~and~Neural~Computation\\
445:   University~of~Edinburgh, 5~Forrest~Hill\\
446:   Edinburgh~EH1~2QL, Scotland,~UK
447: }
448: 
449: \newcommand{\published}{}
450: 
451: %------------------------------------------------------------------------------
452: % environments / commands
453: 
454: \newlength{\subsecwidth}
455: 
456: \newcommand{\subsec}[1]{
457:   \addtocontents{toc}{
458:     \protect\setlength{\subsecwidth}{\textwidth}\protect\addtolength{\subsecwidth}{-27ex}
459:       \protect\vspace*{-1.5ex}\protect\hspace*{20ex}
460:       \protect\begin{minipage}[t]{\subsecwidth}\protect\footnotesize\protect\textsf{#1}\protect\end{minipage}
461:       \protect\par
462:   }
463:   \begin{rblock}\it #1\end{rblock}\medskip\noindent
464: }
465: \newcommand{\tocsep}{
466:   \addtocontents{toc}{\protect\bigskip}
467: }
468: \newcommand{\Chapter}[1]{
469: \chapter*{#1}\thispagestyle{empty}
470: \addcontentsline{toc}{chapter}{\protect\numberline{}#1}
471: }
472: \newcommand{\Section}[1]{
473:   \section*{#1}
474:   \addcontentsline{toc}{section}{\protect\numberline{}#1}
475: }
476: \newcommand{\Subsection}[1]{
477:   \subsection*{#1}
478:   \addcontentsline{toc}{subsection}{\protect\numberline{}#1}
479: }
480: 
481: \newcommand{\content}[1]{
482: %  \begin{rblock}\it #1\end{rblock}\medskip
483: %  \addtocontents{toc}{\protect\begin{list}{}{\leftmargin9ex
484: %        \rightmargin9ex \topsep-2ex \parsep.5ex}}
485: %  \addtocontents{toc}{\protect\item[] \protect\small\protect\it #1}
486: %  \addtocontents{toc}{\protect\end{list}\protect\medskip}
487: }
488: 
489: \newcommand{\sepline}[1][200]{
490:   \begin{center} \begin{picture}(#1,0)
491:     \line(1,0){#1}
492:   \end{picture}\end{center}
493: }
494: 
495: \newcommand{\sepstar}{
496:   \begin{center} {\vspace{0.5ex}\rule[1.2ex]{5ex}{.1pt}~*~\rule[1.2ex]{5ex}{.1pt}} \end{center}\vspace{-1.5ex}\noindent
497: }
498: 
499: \newcommand{\partsection}[1]{
500:   \vspace{5ex}
501:   \centerline{\sc\LARGE #1}
502:   \addtocontents{toc}{\contentsline{section}{{\sc #1}}{}}
503: }
504: 
505: \newcommand{\intro}[1]{\textbf{#1}\index{#1}}
506: 
507: 
508: \newcounter{parac}
509: \newcommand{\para}{\noindent\refstepcounter{parac}{\bf [{\roman{parac}}]}~~}
510: \newcommand{\Pref}[1]{[\emph{\ref{#1}}\,]}
511: 
512: 
513: 
514: \newenvironment{items}{
515: \begin{list}{}{\leftmargin1ex \topsep-\parskip}
516: \item[]
517: }{
518: \end{list}
519: }
520: 
521: \newenvironment{block}[1][]{{\noindent\bf #1}
522: \begin{list}{}{\leftmargin\blockindent \topsep-\parskip}
523: \item[]
524: }{
525: \end{list}
526: }
527: 
528: \newenvironment{rblock}{
529: \begin{list}{}{\leftmargin\blockindent \rightmargin\blockindent \topsep-\parskip}\item[]}{\end{list}}
530: 
531: \newenvironment{algorithm}{
532: \begin{list}{\raisebox{.3ex}{\footnotesize\bf\arabic{enumi}.}}
533: {\usecounter{enumi} \leftmargin7ex \rightmargin7ex \labelsep1ex
534:   \labelwidth5ex \topsep1ex \parsep.5ex \itemsep0pt} \small\sf
535: }{
536: \end{list}
537: }
538: 
539: %\newenvironment{keywords}{\paragraph{Keywords}\begin{rblock}\small}{\end{rblock}}
540: 
541: \newenvironment{colpage}{
542: \addtolength{\columnwidth}{-3ex}
543: \begin{minipage}{\columnwidth}
544: \vspace{.5ex}
545: }{
546: \vspace{.5ex}
547: \end{minipage}
548: }
549: 
550: \newenvironment{enum}{
551: \begin{list}{}{\leftmargin3ex \topsep0ex \itemsep0ex}
552: \item[\labelenumi]
553: }{
554: \end{list}
555: }
556: 
557: \newenvironment{cramp}{
558: \begin{quote} \begin{picture}(0,0)
559:         \put(-5,0){\line(1,0){20}}
560:         \put(-5,0){\line(0,-1){20}}
561: \end{picture}
562: }{
563: \begin{picture}(0,0)
564:         \put(-5,5){\line(1,0){20}}
565:         \put(-5,5){\line(0,1){20}}
566: \end{picture} \end{quote}
567: }
568: 
569: %------------------------------------------------------------------------------
570: % symbol & operator macros
571: 
572: \newcommand{\macros}{
573:   \newcommand{\0}{{\hat 0}}
574:   \newcommand{\1}{{\hat 1}}
575:   \newcommand{\2}{{\hat 2}}
576:   \newcommand{\3}{{\hat 3}}
577:   \newcommand{\5}{{\hat 5}}
578: 
579:   \renewcommand{\a}{\ensuremath\alpha}
580:   \renewcommand{\b}{\beta}
581:   \renewcommand{\c}{\gamma}
582:   \renewcommand{\d}{\delta}
583:     \newcommand{\D}{\Delta}
584:     \newcommand{\e}{\epsilon}
585:     \newcommand{\g}{\gamma}
586:     \newcommand{\G}{\Gamma}
587:   \renewcommand{\l}{\lambda}
588:   \renewcommand{\L}{\Lambda}
589:     \newcommand{\m}{\mu}
590:     \newcommand{\n}{\nu}
591:     \newcommand{\N}{\nabla}
592:   \renewcommand{\k}{\kappa}
593:   \renewcommand{\o}{\omega}
594:   \renewcommand{\O}{\Omega}
595:     \newcommand{\p}{\phi}
596:     \newcommand{\ph}{\varphi}
597:   \renewcommand{\P}{\Phi}
598:   \renewcommand{\r}{\varrho}
599:     \newcommand{\s}{\sigma}
600:     \newcommand{\Si}{\Sigma}
601:   \renewcommand{\t}{\theta}
602:     \newcommand{\T}{\Theta}
603:   \renewcommand{\v}{\vartheta}
604:     \newcommand{\x}{\xi}
605:     \newcommand{\X}{\Xi}
606:     \newcommand{\Y}{\Upsilon}
607: 
608:   \renewcommand{\AA}{{\cal A}}
609:     \newcommand{\BB}{{\cal B}}
610:     \newcommand{\CC}{{\cal C}}
611:     \newcommand{\EE}{{\cal E}}
612:     \newcommand{\FF}{{\cal F}}
613:     \newcommand{\GG}{{\cal G}}
614:     \newcommand{\HH}{{\cal H}}
615:     \newcommand{\II}{{\cal I}}
616:     \newcommand{\KK}{{\cal K}}
617:     \newcommand{\LL}{{\cal L}}
618:     \newcommand{\MM}{{\cal M}}
619:     \newcommand{\NN}{{\cal N}}
620:     \newcommand{\OO}{{\cal O}}
621:     \newcommand{\PP}{{\cal P}}
622:     \newcommand{\QQ}{{\cal Q}}
623:     \newcommand{\RR}{{\cal R}}
624:   \renewcommand{\SS}{{\cal S}}
625:     \newcommand{\TT}{{\cal T}}
626:     \newcommand{\uu}{{\cal u}}
627:     \newcommand{\UU}{{\cal U}}
628:     \newcommand{\XX}{{\cal X}}
629:     \newcommand{\YY}{{\cal Y}}
630:     \newcommand{\SOSO}{{\cal SO}}
631:     \newcommand{\GLGL}{{\cal GL}}
632: 
633:     \newcommand{\Ee}{{\rm E}}
634: 
635:   \newcommand{\NNN}{{\mathbb{N}}}
636:   \newcommand{\ZZZ}{{\mathbb{Z}}}
637:   %\newcommand{\RRR}{{\mathrm{I\!R}}}
638:   \newcommand{\RRR}{{\mathbb{R}}}
639:   \newcommand{\CCC}{{\mathbb{C}}}
640:   \newcommand{\one}{{{\bf 1}}}
641:   \newcommand{\eee}{\text{e}}
642: 
643:   \renewcommand{\[}{\Big[}
644:   \renewcommand{\]}{\Big]}
645:   \renewcommand{\(}{\Big(}
646:   \renewcommand{\)}{\Big)}
647:   \renewcommand{\|}{\big|}
648:   \newcommand{\<}{{\ensuremath\langle}}
649:   \renewcommand{\>}{{\ensuremath\rangle}}
650: 
651:   \newcommand{\Prob}{{\rm Prob}}
652:   \newcommand{\Aut}{{\rm Aut}}
653:   \newcommand{\cor}{{\rm cor}}
654:   \newcommand{\corr}{{\rm corr}}
655:   \newcommand{\cov}{{\rm cov}}
656:   \newcommand{\sd}{{\rm sd}}
657:   \newcommand{\tr}{{\rm tr}}
658:   \newcommand{\Tr}{{\rm Tr}}
659:   \newcommand{\id}{{\rm id}}
660:   \newcommand{\Gl}{{\rm Gl}}
661:   \newcommand{\lag}{\mathcal{L}}
662:   \newcommand{\inn}{\rfloor}
663:   \newcommand{\lie}{\pounds}
664:   \newcommand{\longto}{\longrightarrow}
665:   \newcommand{\speer}{\parbox{0.4ex}{\raisebox{0.8ex}{$\nearrow$}}}
666:   \renewcommand{\dag}{ {}^\dagger }
667:   \newcommand{\h}{{}^\star}
668:   \newcommand{\w}{\wedge}
669:   \newcommand{\too}{\longrightarrow}
670:   \newcommand{\To}{\Rightarrow}
671:   \newcommand{\Too}{\;\Longrightarrow\;}
672:   \newcommand{\oto}{\leftrightarrow}
673:   \newcommand{\ow}{\stackrel{\circ}\wedge}
674:   \newcommand{\feed}{\nonumber \\}
675:   \newcommand{\comma}{~,\quad}
676:   \newcommand{\period}{~.\quad}
677:   \newcommand{\del}{\partial}
678: %  \newcommand{\quabla}{\Delta}
679:   \newcommand{\point}{$\bullet~~$}
680:   \newcommand{\doubletilde}{
681:   ~ \raisebox{0.3ex}{$\widetilde {}$} \raisebox{0.6ex}{$\widetilde {}$} \!\!
682:   }
683:   \newcommand{\topcirc}{\parbox{0ex}{~\raisebox{2.5ex}{${}^\circ$}}}
684:   \newcommand{\topdot} {\parbox{0ex}{~\raisebox{2.5ex}{$\cdot$}}}
685:   \newcommand{\topddot} {\parbox{0ex}{~\raisebox{1.3ex}{$\ddot{~}$}}}
686:   \newcommand{\sym}{\topcirc}
687: 
688:   \newcommand{\half}{\frac{1}{2}}
689:   \newcommand{\third}{\frac{1}{3}}
690:   \newcommand{\fourth}{\frac{1}{4}}
691: 
692:   \newcommand{\ubar}{\underline}
693: 
694:   %\renewcommand{\vec}{\underline}
695:   \renewcommand{\vec}{\boldsymbol}
696:   \renewcommand{\_}{\underset}
697:   \renewcommand{\^}{\overset}
698:   %\renewcommand{\*}{{\rm\raisebox{-.6ex}{\text{*}}{}}}
699:   \renewcommand{\*}{\text{\footnotesize\raisebox{-.4ex}{*}{}}}
700: 
701:   \newcommand{\gto}{{\raisebox{.5ex}{${}_\rightarrow$}}}
702:   \newcommand{\gfrom}{{\raisebox{.5ex}{${}_\leftarrow$}}}
703:   \newcommand{\gnto}{{\raisebox{.5ex}{${}_\nrightarrow$}}}
704:   \newcommand{\gnfrom}{{\raisebox{.5ex}{${}_\nleftarrow$}}}
705: 
706:   \newcommand{\RND}{{\SS}}
707:   \newcommand{\IF}{\text{if }}
708:   \newcommand{\AND}{\textsc{and }}
709:   \newcommand{\OR}{\textsc{or }}
710:   \newcommand{\XOR}{\textsc{xor }}
711:   \newcommand{\NOT}{\textsc{not }}
712: }
713: 
714: %\newcommand{\argmax}[1]{{\rm arg}\!\max_{#1}}
715: %\newcommand{\argmin}[1]{{\rm arg}\!\min_{#1}}
716: \newcommand{\argmax}[1]{\underset{~#1}{\rm argmax}\;}
717: \newcommand{\argmin}[1]{\underset{~#1}{\rm argmin}\;}
718: \newcommand{\ee}[1]{\ensuremath{\cdot10^{#1}}}
719: \newcommand{\sub}[1]{\ensuremath{_{\text{#1}}}}
720: \newcommand{\up}[1]{\ensuremath{^{\text{#1}}}}
721: %\newcommand{\kld}[2]{D\big(#1\,\big|\!\big|\,#2\big)}
722: \newcommand{\kld}[2]{D\big(#1:#2\big)}
723: \newcommand{\sprod}[2]{\big<#1\,,\,#2\big>}
724: \newcommand{\End}{\text{End}}
725: \newcommand{\txt}[1]{\quad\text{#1}\quad}
726: \newcommand{\Over}[2]{\genfrac{}{}{0pt}{0}{#1}{#2}}
727: \newcommand{\mat}[1]{{\bf #1}}
728: \newcommand{\arr}[2]{\hspace*{-1ex}\begin{array}{#1}#2\end{array}\hspace*{-1ex}}
729: \newcommand{\matr}[2]{\left(\begin{array}{#1}#2\end{array}\right)}
730: \newcommand{\seq}[1]{\textsf{\<#1\>}}
731: \newcommand{\seqq}[1]{\textsf{#1}}
732: 
733: %------------------------------------------------------------------------------
734: % stuff
735: 
736: \newcommand{\url}[1]{\texttt{#1}}
737: \newcommand{\anchor}[2]{\begin{picture}(0,0)\put(#1){#2}\end{picture}}
738: \newcommand{\pagebox}{\begin{picture}(0,0)\put(-3,-23){
739: \textcolor[rgb]{.5,1,.5}{\framebox[\textwidth]{\rule[-\textheight]{0pt}{0pt}}}}
740: \end{picture}}
741: 
742: \newcommand{\pathmt}{./}
743: \newcommand{\basepath}{./}
744: \newcommand{\setpath}[1]{\renewcommand{\pathmt}{#1}\renewcommand{\basepath}{#1}}
745: \newcommand{\pathinput}[2]{
746:   \renewcommand{\pathmt}{\basepath #1}
747:   \input{\pathmt #2} \renewcommand{\pathmt}{\basepath}}
748: 
749: \newcommand{\hide}[1]{$\ll${\sf{\footnotesize #1}}$\gg$\message{^^JHIDE--Warning!^^J}}
750: %\newcommand{\hide}[1]{{\tt[hide:~}{\footnotesize\sf #1}{\tt]}\message{^^JHIDE--Warning!^^J}}
751: \newcommand{\Hide}{\renewcommand{\hide}[1]{\message{^^JHIDE--Warning (hidden)!^^J}}}
752: \newcommand{\HIDE}{\renewcommand{\hide}[1]{}}
753: \newcommand{\todo}[1]{{\tt[TODO: #1]}\message{^^JTODO--Warning: #1^^J}}
754: \newcommand{\Todo}{\renewcommand{\todo}[1]{\message{^^JTODO--Warning (hidden)!^^J}}}
755: \newcommand{\thetitle}{bla}
756: %\renewcommand{\title}[1]{\renewcommand{\thetitle}{#1}}
757: \newcommand{\header}{\begin{document}\mytitle\cleardefs}
758: \newcommand{\contents}{{\tableofcontents}\renewcommand{\contents}{}}
759: \newcommand{\footer}{\small\bibliography{\mytex bibs}\end{document}}
760: 
761: \article{10}{1}
762: \chicago
763: %\numberlines
764: %\usepackage{rotate}
765:  \Hide
766: 
767: \newcommand{\EM}{}\newcommand{\BM}{}
768: \newcommand{\pl}{{\protect\rule[.3ex]{1ex}{1ex}}}
769: \newcommand{\mi}{\ensuremath{\circ}}
770: \newcommand{\ze}{\ensuremath{\cdot}}
771: \newcommand{\rb}[1]{\hspace{4pt}\raisebox{-.5ex}{\rotatebox{90}{#1}}\hspace{4pt}}
772: 
773: \title{Notes on information geometry\\ and evolutionary processes}
774: 
775: \header
776: 
777: \begin{abstract}
778:   In order to analyze and extract different structural properties of
779:   distributions, one can introduce different coordinate systems over
780:   the manifold of distributions. In Evolutionary Computation, the
781:   Walsh bases and the Building Block Bases are often used to describe
782:   populations, which simplifies the analysis of evolutionary operators
783:   acting on populations. Quite independently of these approaches,
784:   information geometry has been developed as a geometric way to
785:   analyze dependencies of different orders between random variables
786:   (e.g., neural activations or genes).
787:   
788:   In these notes I briefly review the essentials of various coordinate
789:   bases and of information geometry. The goal is to give an overview
790:   and make the approaches comparable. Besides introducing meaningful
791:   coordinate bases, information geometry also offers an explicit way
792:   to distinguish interactions of different orders, and it offers a
793:   geometric view of the manifold and thereby also of operators that
794:   act on the manifold. For instance, uniform crossover can be
795:   interpreted as an orthogonal projection of a population along an
796:   $m$-geodesic, monotonically reducing the $\t$-coordinates that
797:   describe interactions between genes.
798: \end{abstract}
799: 
800: \section{Introduction}
801: 
802: Evolution can be understood as a process on the space $\L$ of
803: distributions over the search space $\O$. Essentially, a parent population
804: can be captured as a (finite) distribution $p \in \L$. Mutation and
805: recombination operators ($\MM \CC$) applied to the parent population
806: specify a search (offspring) distribution $q \in\L$, and a (stochastic) selection
807: operator ($\SS^\m\, \FF\, \SS^\n$) maps $q$ to a new parent population
808: $p'$. In this view, evolution can be understood as a process
809: \BM\begin{align*}
810: p
811:  ~\stackrel{\MM \CC}\longmapsto~ q
812:  ~\stackrel{\SS^\m \FF \SS^\n}\longmapsto~ p'
813:  ~\stackrel{\MM \CC}\longmapsto~ q'
814:  ~\stackrel{\SS^\m \FF \SS^\n}\longmapsto~ p''
815:  ~\stackrel{\MM \CC}\longmapsto~ \cdots
816: \end{align*}\EM
817: 
818: We do not need to go into the details of the indicated recombination,
819: mutation, and selection operators here. Instead, we would like to
820: emphasize an information-theoretic point of view on this process.
821: Typically, the mapping $p \mapsto q$ (which one could also call a search
822: heuristic) from the parent population to the search distribution adds
823: entropy, whereas selection $q \mapsto p'$ reduces entropy. Another
824: interesting observable in this process is the \emph{structure} of the
825: distributions---by which we mean the mutual information present in
826: these distributions. For instance, one can show that ordinary mutation
827: and crossover operators (on a direct genetic representation) generally
828: reduce mutual information, i.e., destroy structural content that might
829: have been present in $p$ after selection \cite{toussaint:04-ecj}.
830: 
831: The analysis of the structure of distributions is an important topic
832: in various areas. In evolutionary computation, the Walsh
833: spectrum is a prominent way to analyze the structure of $p$, often
834: with the aim of transporting it to $q$. The Walsh coefficients may also
835: be considered as a way of describing epistasis. In complex systems,
836: certain mutual information measures are often used to define the
837: structuredness (in their terms: complexity) of dynamical systems
838: \cite{langton:90,sporns-tononi:02}.
839: 
840: In these notes, I want to briefly review the information geometric way
841: to describe the structure of a distribution \cite{amari:99,amari:01}
842: and relate it to the field of evolutionary computation. The first step
843: is simply to present the coordinates introduced by Walsh coefficients
844: side-by-side with those used in information geometry to make them
845: comparable. This gives an intuition about the ``bases'' over which
846: distributions can be analyzed and reveals, for instance, that the
847: so-called Building-Block-Basis \cite{chryssomalakos-stephens:04}, as
848: introduced in Evolutionary Computation, is the same as Amari's
849: $\eta$-basis. Amari's $\t$-basis is perhaps most interesting for its
850: ability to capture $k$th-order mutual dependencies precisely. It
851: offers a notion of the ``order-spectrum of mutual information''
852: as an alternative to the Walsh spectrum. Ultimately, Amari's formalism
853: allows one to decompose any distribution completely into its different
854: $k$th-order components.
855: 
856: Finally, the \emph{geometry} introduced over the space of
857: distributions by Amari gives very insightful interpretations of
858: distances between distributions. A Pythagorean theorem can be
859: formulated for the Kullback-Leibler divergence. Under some conditions,
860: minimizing the Kullback-Leibler divergence can be
861: interpreted as an orthogonal projection. This offers a geometric view of
862: some evolutionary operators.
863: 
864: 
865: \section{Notations}
866: 
867: \paragraph{Distributions, $\log$-probabilities, and hypercube bases}
868: 
869: The most direct ``coordinate system'' that can be introduced on the
870: manifold of distributions is given by the probabilities $p(x)$ for all
871: $x\in \O$ themselves. To preserve notational uniformity with other
872: coordinate systems we write these numbers as $p_x := p(x)$, which
873: means that $p_x$ is the $x$-th component of $p \in \L$ in the direct
874: basis. Because of the normalization constraint $\sum_x p_x = 1$, only
875: $|\O|-1$ of these coordinates are independent.
876: 
Clearly, instead of using $p_x$ as coordinates, one can also use their
logarithms $l_x := \log p_x$. Taking the log of probabilities is, very
roughly speaking, related to changing to entropic units. (Note the
definition of the entropy of $p$ as $H(p) = -\sum_x p_x \log p_x = -{\rm
  E}_p \{l_x\}$.) Thus, coordinates that have some ``entropic
882: meaning'' (i.e., are related to information theoretic measures like
883: entropy, mutual information, or Kullback-Leibler divergence) will be
884: based on these log quantities. Namely, this will be the $\t$-coordinate
885: system introduced by Amari (see \citeNP{amari:99,amari:01}).
886: 
In the following we will speak of bases of coordinate systems.
Essentially, what we mean are basis functions, similar to the sine
and cosine in the Fourier transform. For illustration, we will always
think of $\O$ as the hypercube; the basis functions then correspond to
``colorings'' of the hypercube with function values (mostly $1$, $0$,
or $-1$). E.g., if $e_i:~ \O=\{0,1\}^3 \to \{1,0,-1\}$ is the $i$-th
basis function, then the $i$-th coordinate of a distribution $p$ in
this coordinate system is the inner product of $p$ with $e_i$: $p_i =
\<e_i,p\> := \sum_{x\in\O} e_i(x)\, p(x)$. We illustrate such basis
896: functions by 3D-hypercubes, \raisebox{-3mm}{\input{figtexs/samplecube}}
897: where the bullet corresponds to $1$, the circle to $-1$ and empty
898: vertices to $0$.
899: 
The basis of the direct coordinate system is the $\d$-basis: the set of
901: all hypercubes where only one vertex is $1$ and all others are $0$.
902: 
903: 
904: \paragraph{Marginals over $k$-tuples of variables and schemata}
905: 
906: In the following, we will also need a compact notation for the
907: different marginals of a distribution. Let $\O$ be a product space
908: $\O=\O^1 \times \cdots \times \O^n$ such that we can define the
marginals of a distribution $p$ over single variables, but also over pairs,
910: triples, and $k$-tuples of variables. We use indices $i,j,..\in I=
911: \{1,..,n\}$ to indicate variables and write the marginals as $p^{ij..}$,
912: \BM\begin{align*}
913: p^{ij..}(a,b,..)=\Pr\{x_i=a,~ x_j=b,~ ..\} ~.
914: \end{align*}\EM
915: The set of all possible marginals is given by considering all single
916: indices $i$, all pairs $i<j$, all triples $i<j<k$, etc. To simplify
917: notation (e.g., summation over such objects), we collect all these
918: tuples of indices in a set
919: \BM\begin{align*}
920: A
921: &= I
922:   ~\cup~ \{ (i,j) ~|~ i<j \in I \}
923:   ~\cup~ \{ (i,j,k) ~|~ i<j<k \in I \}
924:   ~\cup~ \cdots ~\cup~ \{ (1,2,..,n) \} \\
925: &=\{1,..,n,~ (1,2),(1,3)..,(1,n),(2,3),(2,4)...,(n-1,n),~
926:     (1,2,3),..,~ (1,2,3,..,n)\} ~.
927: \end{align*}\EM
In that way, all marginals of $p$ are given as $p^a$ for $a \in
A$. Note that $|A|=2^n-1$, which equals $|\O|-1$ when all variables are binary.
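
For illustration, assuming $n=3$ variables, the index set reads
\BM\begin{align*}
A = \{1,~ 2,~ 3,~ (1,2),~ (1,3),~ (2,3),~ (1,2,3)\} ~,
\end{align*}\EM
which has $2^3-1=7$ elements, one for each non-empty subset of the
three variables.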
930: 
931: Besides using $a \in A$ to indicate a marginal, one can equivalently
932: use the schemata notation of length-$n$ strings in $\{\*,d\}^n$:
933: For a given $a$, the corresponding schema is the string of all
934: $\*$'s except for those positions indicated in the tuple $a$. E.g.,
935: for $n=6$:
936: \BM\begin{align*}
937: p^{245} \equiv p^{\*d\*dd\*}
938: \end{align*}\EM
939: 
940: 
941: 
942: \section{Walsh, $\eta$-, $\theta$-, Building Block, and Haar bases}
943: 
944: Table \ref{tabBases} captures the basics of the Walsh, $\eta$-,
945: $\theta$-, and Haar bases. In all cases, the coordinate system is
946: defined by the basis functions $e_i$ depicted for the 3D-case as
947: hypercubes. Actually, these 3D illustrations of the basis functions
948: $e_i$ are already sufficient to infer the basis functions for all $n$
949: since they are constructed in a very systematic way---which seems
950: obvious by simply looking at them and becomes rigorous by considering
the transformation matrices into these coordinate systems:
952: 
The transformation matrices map linearly from the direct
coordinates $p_x$ (or, for the $\t$-basis, from the $\log$ coordinates
$l_x$) to the new coordinates. E.g., in the Walsh case,
$w_y = \sum_x W_{yx} p_x$. The rows in these matrices correspond to
the basis functions $e_y = W_{y\cdot}$. An important property is that
in all cases (except the Haar basis!), the transformation matrices can
be constructed by repeated tensor products of a $2\times 2$ matrix. For
instance, for $n=2$ in the Walsh case:
960: \BM\begin{align*}
961: W^{n=2}
962:  = \matr{rrrr}{1&1&1&1\\1&-1&1&-1\\1&1&-1&-1\\1&-1&-1&1}
963:  = \matr{rr}{1&1\\1&-1} \otimes \matr{rr}{1&1\\1&-1}
964:  =: \matr{rr}{1&1\\1&-1}^{\!\!\otimes 2}
965: \end{align*}\EM
966: Here, we introduced the superscript notation ${}^{\otimes n}$ to
967: indicate the $n$-fold tensor product.
968: 
969: 
970: 
971: 
972: 
973: 
974: 
975: \newcommand{\inclfig}[1]{\begin{minipage}[t]{35mm}\raisebox{-43mm}{\input{figtexs/#1}}\end{minipage}}
976: 
977: 
978: \begin{table}
979: \begin{tabular}{@{}p{43mm}rr@{}}
980: {\bf Walsh} \newline
981:  $w_y = \sum_x W_{yx}\, p_x$\newline
 $p_x = \frac{1}{2^n}\, \sum_y W_{xy} w_y$\newline
 $W_{yx} = (-1)^{|x\, \AND y|}$\newline
 $~~~~  = \matr{cc}{\pl & \pl \\ \pl & \mi}^{\!\!\otimes n}$\newline
 $W^{-1} = \frac{1}{2^n}\, W$
986:  & \begin{tabular}[t]{@{}c@{\,}|@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}}
987:     & \rb{000} & \rb{001} & \rb{010} & \rb{011} & \rb{100} & \rb{101} & \rb{110} & \rb{111} \\
988: \hline
989: 000 & \pl & \pl & \pl & \pl & \pl & \pl & \pl & \pl \\
990: 001 & \pl & \mi & \pl & \mi & \pl & \mi & \pl & \mi \\
991: 010 & \pl & \pl & \mi & \mi & \pl & \pl & \mi & \mi \\
992: 011 & \pl & \mi & \mi & \pl & \pl & \mi & \mi & \pl \\
993: 100 & \pl & \pl & \pl & \pl & \mi & \mi & \mi & \mi \\
994: 101 & \pl & \mi & \pl & \mi & \mi & \pl & \mi & \pl \\
995: 110 & \pl & \pl & \mi & \mi & \mi & \mi & \pl & \pl \\
996: 111 & \pl & \mi & \mi & \pl & \mi & \pl & \pl & \mi \\
997: \end{tabular}
998:   &\inclfig{walsh} \\\hline
999: {\bf Amari's $\eta$ / BBB} \newline
1000:  $\eta_a = \sum_x \bar B_{ax} p_x$\newline
1001:  $~~~~ = \sum_x (B^{-1})^T_{ax} p_x$ \newline
1002:  $p_x = \sum_a B^T_{xa} \eta_a$\newline
1003:  $\bar B = (B^{-1})^T = \matr{cc}{\pl & \pl \\ \ze & \pl}^{\!\!\otimes n}$ \newline
1004:  $\bar B^{-1} = \matr{cc}{\pl & \mi \\ \ze & \pl}^{\!\!\otimes n}$
1005:  & \begin{tabular}[t]{@{}c@{\,}|@{\,}c@{\,}|@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}}
1006:   &   & \rb{000} & \rb{001} & \rb{010} & \rb{011} & \rb{100} & \rb{101} & \rb{110} & \rb{111} \\
1007: \hline
1008: $\cdot$ & $\*\*\*$ & \pl & \pl & \pl & \pl & \pl & \pl & \pl & \pl \\
1009: 3 & $\*\*1$  & \ze & \pl & \ze & \pl & \ze & \pl & \ze & \pl \\
1010: 2  & $\*1\*$ & \ze & \ze & \pl & \pl & \ze & \ze & \pl & \pl \\
1011: 23 & $\*11$ & \ze & \ze & \ze & \pl & \ze & \ze & \ze & \pl \\
1012: 1  & $1\*\*$ & \ze & \ze & \ze & \ze & \pl & \pl & \pl & \pl \\
1013: 13 & $1\*1$ & \ze & \ze & \ze & \ze & \ze & \pl & \ze & \pl \\
1014: 12 & $11\*$ & \ze & \ze & \ze & \ze & \ze & \ze & \pl & \pl \\
1015: 123 & $111$ & \ze & \ze & \ze & \ze & \ze & \ze & \ze & \pl \\
1016: \end{tabular}
1017:  &\inclfig{eta} \\\hline
1018: {\bf Amari's $\t$} \newline
1019:  $\t_a = \sum_x B_{ax} l_x$ \newline
1020:  $l_x = \sum_a \bar B^T_{xa} \t_a$ \newline
1021:  $B =  (\bar B^{-1})^T = \matr{cc}{\pl & \ze \\ \mi & \pl}^{\!\!\otimes n}$ \newline
1022:  $B^{-1} = \matr{cc}{\pl & \ze \\ \pl & \pl}^{\!\!\otimes n}$
1023:  & \begin{tabular}[t]{@{}c@{\,}|@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}}
1024:     & \rb{000} & \rb{001} & \rb{010} & \rb{011} & \rb{100} & \rb{101} & \rb{110} & \rb{111} \\
1025: \hline
1026: $\cdot$ & \pl & \ze & \ze & \ze & \ze & \ze & \ze & \ze \\
1027: 3   & \mi & \pl & \ze & \ze & \ze & \ze & \ze & \ze \\
1028: 2   & \mi & \ze & \pl & \ze & \ze & \ze & \ze & \ze \\
1029: 23  & \pl & \mi & \mi & \pl & \ze & \ze & \ze & \ze \\
1030: 1   & \mi & \ze & \ze & \ze & \pl & \ze & \ze & \ze \\
1031: 13  & \pl & \mi & \ze & \ze & \mi & \pl & \ze & \ze \\
1032: 12  & \pl & \ze & \mi & \ze & \mi & \ze & \pl & \ze \\
1033: 123 & \mi & \pl & \pl & \mi & \pl & \mi & \mi & \pl \\
1034: \end{tabular}
1035:  &\inclfig{theta} \\\hline
1036: {\bf Haar}\newline
1037: please see \cite{khuri:94}
1038:  & \begin{tabular}[t]{@{}c@{\,}|@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}}
1039:     & \rb{000} & \rb{001} & \rb{010} & \rb{011} & \rb{100} & \rb{101} & \rb{110} & \rb{111} \\
1040: \hline
1041: 000 & \pl & \pl & \pl & \pl & \pl & \pl & \pl & \pl \\
1042: 001 & \pl & \pl & \pl & \pl & \mi & \mi & \mi & \mi \\
1043: 010 & \pl & \pl & \mi & \mi & \ze & \ze & \ze & \ze \\
1044: 011 & \ze & \ze & \ze & \ze & \pl & \pl & \mi & \mi \\
1045: 100 & \pl & \mi & \ze & \ze & \ze & \ze & \ze & \ze \\
1046: 101 & \ze & \ze & \pl & \mi & \ze & \ze & \ze & \ze \\
1047: 110 & \ze & \ze & \ze & \ze & \pl & \mi & \ze & \ze \\
1048: 111 & \ze & \ze & \ze & \ze & \ze & \ze & \pl & \mi \\
1049: \end{tabular}
1050:  &\inclfig{haar}
1051: \end{tabular}
1052: \caption{\label{tabBases}
Overview of the different bases for the space of distributions. The
first column gives the definitions of the transformations and their
inverses. Note that the $\t$-basis is defined in log-space. The
transformation matrices are illustrated in the second column
for $n=3$ using the symbols $\pl =1$, $\mi=-1$, and $\ze=0$. The third
column illustrates the basis functions $e_y$ (or $e_a$) as colorings
of the hypercube $\{0,1\}^3$. Note that the basis functions
correspond to rows of the transformation matrix. The 1-norm $|x\, \AND y|$
of the \AND of two binary strings counts the 1-bits that they have in common.
1062: }
1063: \end{table}
1064: 
1065: 
1066: 
Table \ref{tabBases} summarizes the most important properties of these
transformation matrices: their closed form expression, their tensor
product construction, and their inverse. When looking at the table one
should first observe the self-similar regularity of the transformation
matrices, which stems from their definition as repeated tensor
products. The meaning of the various bases becomes more intuitive when
looking at the hypercube illustrations of the basis functions. The Walsh basis,
e.g., can nicely be compared to a Fourier basis: $e_{000}$ corresponds
to the constant function $1$; $e_{001},e_{010},e_{100}$ can be viewed
as sine functions along the $x$-, $y$-, and $z$-axes, respectively;
$e_{011},e_{101},e_{110}$ are products of sine functions and
capture 2nd-order dependencies; and $e_{111}$ is the ``highest
frequency'' basis function capturing 3rd-order dependencies.
1080: 
The $\eta$-basis captures certain marginals relative to the all-1s string:
\BM\begin{align*}
\eta_a=p^a(11..) ~.
\end{align*}\EM
These can be thought of as the marginals over all possible
Building-Blocks; thus it is also called the Building-Block-Basis
(BBB, cf.\ \citeNP{chryssomalakos-stephens:04}). This marginalization becomes apparent
in the hypercube colorings as the abundance of zeros (non-colored
vertices and dots in the matrix).
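
To make this concrete, consider the smallest nontrivial case as a
sketch: assume $n=2$ binary variables and order the strings as
$x=00,01,10,11$, in analogy to the ordering used in table
\ref{tabBases}. Then
\BM\begin{align*}
\bar B^{n=2}
 = \matr{rr}{1&1\\0&1}^{\!\!\otimes 2}
 = \matr{rrrr}{1&1&1&1\\0&1&0&1\\0&0&1&1\\0&0&0&1} ~,
\end{align*}\EM
and the rows, indexed by $a=\cdot,\, 2,\, 1,\, 12$, give
\BM\begin{align*}
\eta_\cdot &= p_{00}+p_{01}+p_{10}+p_{11} = 1 ~,\\
\eta_2 &= p_{01}+p_{11} = \Pr\{x_2=1\} ~,\\
\eta_1 &= p_{10}+p_{11} = \Pr\{x_1=1\} ~,\\
\eta_{12} &= p_{11} = \Pr\{x_1=1,~ x_2=1\} ~.
\end{align*}\EM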
1090: 
The $\t$-basis combines the ``frequency'' idea of the Walsh
basis with the marginalization: The highest-order basis function
$e_{123}$ is analogous to the Walsh basis function $e_{111}$ and detects
highest-order dependencies. Lower-order dependencies, though, are only
detected on a marginal.
1096: 
However, note that the $\t$-basis is defined in log-space, $\t_a =
\sum_x B_{ax} \log p_x$. We will find some implications of this in the next
section. Note that the transformation matrices of the $\eta$-
(Building-Block-) and the $\t$-basis are related via $B =  (\bar
B^{-1})^T$.
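
As a worked sketch for the smallest nontrivial case, assume $n=2$
binary variables with all $p_x>0$ and the ordering $x=00,01,10,11$. Then
\BM\begin{align*}
B^{n=2}
 = \matr{rr}{1&0\\-1&1}^{\!\!\otimes 2}
 = \matr{rrrr}{1&0&0&0\\-1&1&0&0\\-1&0&1&0\\1&-1&-1&1} ~,
\end{align*}\EM
so that, with rows indexed by $a=\cdot,\, 2,\, 1,\, 12$,
\BM\begin{align*}
\t_2 = \log \frac{p_{01}}{p_{00}} ~,\qquad
\t_1 = \log \frac{p_{10}}{p_{00}} ~,\qquad
\t_{12} = \log \frac{p_{00}\, p_{11}}{p_{01}\, p_{10}} ~.
\end{align*}\EM
The coordinate $\t_{12}$ is the log odds ratio; it vanishes exactly
when $p_{00}\, p_{11} = p_{01}\, p_{10}$, i.e., when the two variables
are independent.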
1102: 
For completeness, we also indicated the Haar basis in table
\ref{tabBases}. It cannot be derived as repeated tensor products and
we do not discuss it any further here. One argument made about the
Haar basis \cite{khuri:94} is that the transformation matrix incorporates
a lot of 0s. Thus, the coefficients are more efficient to compute than
the Walsh coefficients. We add here that the ratio of zeros in the
$\eta$ and $\t$ transformation matrices is $1-(3/4)^n$ (each $2\times 2$
factor has three non-zero entries, so the $n$-fold tensor product has
$3^n$ non-zero entries out of $4^n$), which
approaches $1$ exponentially with the dimension $n$.
1111: 
1112: 
1113: 
1114: 
1115: 
1116: 
1117: 
1118: 
1119: 
1120: \section{Mathematical structure on the manifold $\L$}
1121: 
1122: In this section we want to develop a more geometric view on the
1123: manifold of distributions, following \cite{amari:99,amari:01}. This
1124: geometry will put a special emphasis on the $\eta$- and $\t$-bases.
1125: 
1126: 
1127: \paragraph{$m$- and $e$-geodesics}
1128: 
1129: An essential ingredient to describe the geometry of a manifold is the
1130: definition of the notion of ``straight lines'', or geodesics, connecting two
1131: points in the manifold. In the case of the manifold of distributions,
1132: there exist at least two ways of defining a straight path connecting two
1133: distributions $q$ and $r$: the one being the linear mixture in direct
1134: coordinates $p_x$, the other being the linear mixture in $\log$
1135: coordinates $l_x$,
1136: \BM\begin{align*}
1137: \text{$m$-geodesic:}\qquad& p(x) = (1\!-\!\a)\, q(x) + \a\, r(x) ~,\\
\text{$e$-geodesic:}\qquad& \log p(x) = (1\!-\!\a)\, \log q(x) + \a\, \log r(x) - \psi(\a) ~.
\end{align*}\EM
Here $m$ means \emph{mixture} and $e$ means \emph{exponential}. The
additional term $\psi(\a)$ in the $e$-geodesic, which depends on $\a$ but
not on $x$, is necessary to preserve the normalization of $p(x)$.
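
For illustration, exponentiating the $e$-geodesic and summing over $x$
shows that the normalization term is the logarithm of the normalizing
sum,
\BM\begin{align*}
p(x) = \frac{q(x)^{1-\a}\, r(x)^{\a}}{\sum_{x'} q(x')^{1-\a}\, r(x')^{\a}} ~,\qquad
\psi = \log \sum_{x'} q(x')^{1-\a}\, r(x')^{\a} ~.
\end{align*}\EM
As a numerical sketch (the values are chosen only for this example),
take a single binary variable with $q(1)=0.1$, $r(1)=0.5$, and
$\a=1/2$. The $m$-geodesic gives $p(1) = 0.3$, whereas the
$e$-geodesic gives $p(1) = \sqrt{0.05}\,/\,(\sqrt{0.05}+\sqrt{0.45}) =
0.25$; the two ``midpoints'' differ.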
1143: 
1144: The fact that there exist two ways of defining geodesics means that
1145: there exist two meaningful \emph{affine connections} on the manifold.
1146: %(instead of only the Christoffel symbol derived from the Fisher
1147: %metric---the manifold is non-Riemannian).
1148: Both define a notion of
flatness: we say that an $m$-geodesic is $m$-flat and an $e$-geodesic
is $e$-flat.
1151: 
1152: It turns out that the coordinate lines (and planes, hyperplanes, etc.)
1153: of $\eta$ are $m$-flat and those of $\t$ are $e$-flat. The former is
1154: obvious, since an $m$-geodesic can equivalently be written in the
1155: $\eta$ coordinate system as $\eta_a(p) = (1\!-\!\a)\, \eta_a(q) + \a\,
1156: \eta_a(r)$. The second becomes apparent when realizing that the Taylor
1157: expansion of $\log p$ reads
1158: \BM\begin{align*}
1159: l_x
1160:  = \sum_i \t_i x_i
1161:  + \sum_{i<j} \t_{ij}\, x_i x_j
1162:  + \sum_{i<j<k} \t_{ijk}\, x_i x_j x_k
1163:  + \cdots + \t_{1..n}\, x_1..x_n - \psi
1164:  = \sum_{a\in A} \t_a X^a - \psi
1165: \end{align*}\EM
1166: where $X^a$ is the product of the components $x_{i_1} x_{i_2}\cdots x_{i_k}
1167: \in \{0,1\}$ when $a=(i_1,i_2,..,i_k)$. Thus, an $e$-geodesic is
1168: written, in the $\t$ coordinate system, simply as $\t_a(p) = (1\!-\!\a)\,
1169: \t_a(q) + \a\, \t_a(r)$.
1170: 
1171: 
1172: \paragraph{Fisher metric, Kullback-Leibler divergence}
1173: 
1174: On this manifold $\L$, there is a metric defined, the \emph{Fisher
1175:   metric}.  In \emph{arbitrary} coordinates $v_i$ (it could be any of
1176: the Walsh, log, $\eta$-, or $\t$-coordinates), it reads
1177: \BM\begin{align*}
1178: g_{ij}(p) = {\rm E}\left\{ \frac{\del \log p}{\del v_i}\, \frac{\del \log p}{\del v_j}\right\} ~.
1179: \end{align*}\EM
1180: Some intuition can be gained by realizing that, locally, the distance
1181: measured by the Fisher metric coincides with the distance measured by
1182: the Kullback-Leibler divergence:\footnote{ The Kullback-Leibler
1183:   divergence $\kld{p}{q}$ (also called relative entropy or divergence)
1184:   is a measure for the loss of information (or gain of entropy) when a
  \emph{true} distribution $p$ is approximated by a model
  distribution $q$. For example, when $p(x,y)$ is approximated by
  $p(x)\,p(y)$ one loses information on the mutual dependence between
1188:   $x$ and $y$.  Accordingly, the relative entropy
1189:   $\kld{p(x,y)}{p(x)\,p(y)}$ is equal to the mutual information
1190:   between $x$ and $y$. Generally, when \emph{knowing} the real
1191:   distribution $p$ one needs on average $H(p)$ (entropy of $p$) bits to
1192:   describe a random sample. If, however, we know only an approximate
1193:   model $q$ we would need on average $H(p) + \kld{p}{q}$ bits to
1194:   describe a random sample of $p$.  The loss of knowledge about the
1195:   true distribution induces an increase of entropy and thereby an
1196:   increase of average description length for random samples. }
Consider a point $p \in \L$ and a nearby point $p+\d p$. When we
measure the squared length $\<\d p,\d p\>$ of the variation $\d p$ by
twice the Kullback-Leibler divergence we find
\BM\begin{align*}
\<\d p,\d p\>
 = 2\, \kld{p}{p+\d p}
 = 2\, {\rm E}\left\{ \log p - \log (p+\d p)\right\}
 \ddot=~ 2\, {\rm E}\left\{- \frac{\d p}{p} + \frac{\d p^2}{2\, p^2} \right\}
 = {\rm E}\left\{\frac{\d p^2}{p^2} \right\} ~.
\end{align*}\EM
Here, the 2nd-order approximation stems from the Taylor expansion of
$\log(p+\d p)$ and ${\rm E}\{\d p/p\} =0$ since $\sum_x \d p(x)=0$ to
preserve normalization. Note that, in this infinitesimal neighborhood,
the Kullback-Leibler divergence becomes symmetric. Generalizing this
to two small variations $\d_1 p= \del_{v_i} p := \frac{\del p}{\del v_i}$ and $\d_2
p= \del_{v_j} p := \frac{\del p}{\del v_j}$ induced by small shifts along some
coordinates we have
\BM\begin{align*}
\<\del_{v_i} p,\del_{v_j} p\>
 =  {\rm E}\left\{\frac{\del_{v_i} p}{p}\, \frac{\del_{v_j} p}{p} \right\}
 =  {\rm E}\left\{\frac{\del \log p}{\del v_i}\, \frac{\del \log p}{\del v_j} \right\}
\end{align*}\EM
and retrieve the Fisher metric. In turn, the Fisher metric can also be derived by considering the second order derivatives of the Kullback-Leibler divergence:
\BM\begin{align*}
g_{ij}(p) = \frac{\del}{\del \d v_i}\frac{\del}{\del \d v_j} \kld{p(v)}{p(v+\d v)}\Big|_{\d v=0} ~.
\end{align*}\EM
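
As a minimal worked example, assume a single binary variable with
$\eta := p(1)$ and $\t := \log\frac{p(1)}{p(0)}$ (the one-dimensional
analogues of the $\eta$- and $\t$-coordinates). Since $\log p(x) = x
\log \eta + (1-x) \log(1-\eta)$, the Fisher metric reads
\BM\begin{align*}
g_{\eta\eta}
 &= {\rm E}\left\{ \left( \frac{x}{\eta} - \frac{1-x}{1-\eta} \right)^2 \right\}
 = \frac{1}{\eta} + \frac{1}{1-\eta}
 = \frac{1}{\eta\,(1-\eta)} ~,\\
g_{\t\t}
 &= g_{\eta\eta} \left( \frac{\del \eta}{\del \t} \right)^2
 = \eta\,(1-\eta) ~,
\end{align*}\EM
where we used $\del\eta/\del\t = \eta\,(1-\eta)$. The same point thus
carries different metric components in different coordinates, while
the squared length of a given variation $\d p$ is unchanged.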
1221: 
1222: 
1223: \paragraph{Orthogonality of $\eta$ and $\t$, the Pythagoras}
1224: 
1225: The coordinate systems $\eta$ and $\t$ have a crucial property w.r.t.\ 
1226: the Fisher metric---they are mutually orthogonal: At any point $p$ in
1227: the manifold the variations induced by shifts along $\t$ and $\eta$
1228: coordinates fulfill
1229: \BM\begin{align*}
1230: \<\del_{\t_a} p, \del_{\eta_b} p\> = \d_{ab} ~,
1231: \end{align*}\EM
1232: where $\d_{ab}$ is the Kronecker delta. Based on this one can derive a
1233: Pythagoras theorem: Let $p$, $r$ and $q$ be three distributions where
1234: the $m$-geodesic connecting $p$ and $r$ is orthogonal to the
1235: $e$-geodesic connecting $r$ and $q$, then
1236: \BM\begin{align*}
1237: \kld{p}{q} = \kld{p}{r} + \kld{r}{q} ~.
1238: \end{align*}\EM
Please see figure \ref{figPy} for an illustration.
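
One instance that can be checked by hand, given here as a sketch: let
$r(x) = \prod_i p^i(x_i)$ be the product of the single-variable
marginals of $p$, and assume $q(x) = \prod_i q^i(x_i)$ is an arbitrary
product distribution. Then $p$ and $r$ share all single-variable
marginals, $r$ and $q$ are both product distributions, and
\BM\begin{align*}
\kld{p}{q}
 &= \sum_x p(x) \log \frac{p(x)}{r(x)}
  + \sum_x p(x) \sum_i \log \frac{p^i(x_i)}{q^i(x_i)} \\
 &= \kld{p}{r}
  + \sum_i \sum_{x_i} p^i(x_i) \log \frac{p^i(x_i)}{q^i(x_i)}
  = \kld{p}{r} + \kld{r}{q} ~.
\end{align*}\EM
The second step uses that each summand depends on a single variable
only, so its expectation under $p$ equals its expectation under $r$.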
1240: 
1241: \begin{figure}\center
1242: \input figtexs/pythagoras
1243: \caption{\label{figPy}
The Pythagoras in the case when a certain $k$-cut is used to define
the $m$- and $e$-geodesics connecting to $r$ and $r'$, respectively. It
holds that $\kld{p}{q} = \kld{p}{r} + \kld{r}{q}$ and $\kld{q}{p} = \kld{q}{r'} + \kld{r'}{p}$.
1247: }
1248: \end{figure}
1249: 
1250: 
1251: \paragraph{$k$-cuts}
1252: 
1253: Let $k$ denote an order of interactions that we are interested in.
1254: Then, the coordinates split into those describing interactions of
1255: order $\le k$ and those describing interactions of order $> k$,
1256: \BM\begin{align*}
1257: \vec \eta_k &:= (\text{all $\eta_a$ of order $|a| \le k$}) ~,\\
1258: \vec \t_{k^*} &:= (\text{all $\t_a$ of order $|a| >k$}) ~.
1259: \end{align*}\EM
1260: 
1261: These can be mixed into a new coordinate system $(\vec \eta_k,\vec
1262: \t_{k^*})$. The point is that those dimensions spanned by $\vec
1263: \eta_k$ are orthogonal to those spanned by $\vec \t_{k^*}$. To
1264: simplify the discussion we call $\vec \eta_k$ \emph{marginals}
1265: (although they include marginals over $k$-tuples of variables) and
1266: $\vec\t_{k^*}$ \emph{higher order interactions}. Keeping the marginals
$\vec \eta_k$ constant defines $m$-flat sub-manifolds $M_k(\vec\eta_k)$,
which are disjoint for different $\vec \eta_k$ and cover all of $\L$.
Keeping the higher order interactions $\vec \t_{k^*}$ constant defines
$e$-flat sub-manifolds $E_{k^*}(\vec\t_{k^*})$, which are disjoint for
different $\vec \t_{k^*}$ and cover all of $\L$.
1272: 
1273: 
1274: 
1275: \paragraph{Complete decomposition of different order interactions}
1276: 
1277: Given a distribution $p$, we define its $k$th order reduction
1278: $p^{(k)}$ as the distribution with same marginals $\vec \eta_k(p)$ as
1279: $p$ but vanishing higher order interactions $\vec \t_{k^*}=0$,
1280: %
1281: \BM\begin{align*}
1282:   p^{(k)} = (\vec \eta_k(p),\vec \t_{k^*}=0) ~.
1283: \end{align*}\EM
1284: %
That is, $p^{(k)}$ is the same distribution as $p$ except that all
interactions of order $>k$ have been canceled. Given the Pythagoras it should be clear
that $p^{(k)}$ can also be defined as the orthogonal projection of $p$
onto the submanifold $E_{k^*}(0)$ or as the orthogonal projection of
the uniform distribution $p^{(0)}$ onto $M_k(\vec\eta_k(p))$, please
see figure \ref{figDecomp} left,
1292: \BM\begin{align*}
1293: p^{(k)}
1294:  = \argmin{q \in E_{k^*}(0)} \kld{p}{q}
1295:  = \argmin{q \in M_k(\vec\eta_k(p))} \kld{q}{p^{(0)}} ~.
1296: \end{align*}\EM
Further, define $D_k(p) = \kld{p^{(k)}}{p^{(k-1)}}$. Then the
Pythagoras allows us to decompose the mutual information $I(p)$ in $p$
(i.e., the measure of all interactions in $p$) into a sum of different
order interactions (note that $p^{(n)}=p$):
1301: \BM\begin{align*}
1302: I(p) = \kld{p}{p^{(1)}} = \sum_{k=2}^n D_k(p)
1303: \end{align*}\EM
1304: Please see figure \ref{figDecomp} right for an illustration.
1305: 
This result should be highlighted. The given formalism allows one to
1307: explicitly distinguish different order interactions between variables
1308: in a distribution and directly assigns coordinates $\t$ to those
1309: different order interactions. The quantities $D_k(p) =
1310: \kld{p^{(k)}}{p^{(k-1)}}$ measure precisely and only the $k$th-order
1311: interactions in entropic units.
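
A standard illustration, sketched here under the assumption $n=3$: let
$x_1,x_2$ be independent and uniform, and let $x_3 = x_1 \oplus x_2$
(XOR). Strictly, this $p$ lies on the boundary of the manifold (four
strings have probability zero, so the $\t$-coordinates diverge), but
the reductions are still well defined via the projections given above.
All single-variable and pairwise marginals of $p$ are uniform, hence
$p^{(1)}=p^{(2)}=p^{(0)}$ is the uniform distribution and
\BM\begin{align*}
D_2(p) &= \kld{p^{(2)}}{p^{(1)}} = 0 ~,\\
D_3(p) &= \kld{p}{p^{(2)}}
  = \sum_{x:\, p_x>0} \frac{1}{4}\, \log \frac{1/4}{1/8} = \log 2 ~,
\end{align*}\EM
so that $I(p)=D_2(p)+D_3(p)=\log 2$ consists purely of 3rd-order
interaction.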
1312: 
For instance, consider three random variables $X_1,\, X_2,\, X_3$
which are pair-wise dependent in the sense $I(X_i;X_j) \not=0$. The
question is whether there exist ``true'' 3rd-order interactions or only
concatenated 2nd-order interactions, as would be the case, e.g., for
a Markov process $X_1 \to X_2 \to X_3$. The formalism
gives an answer: if $D_3(p)=0$ all interactions are of 2nd order (a Markov
process $X_1 \to X_2 \to X_3$ additionally requires $\t_{13}=0$); otherwise there
exist 3rd-order interactions.
1320: 
1321: \begin{figure}\center
1322: \input{figtexs/decomp}\hfill
1323: \input figtexs/cuts
1324: \caption{\label{figDecomp}
1325:   The left figure illustrates a distribution $p$ and its $k$th-order
1326:   reduction $p^{(k)}$: It is the orthogonal projection of $p$ along
  $M_k(\vec\eta_k(p))$ onto $E_{k^*}(0)$. The ``distance'' $D(p:p^{(k)})$ measures
  a ``norm'' of $\vec\t_{k^*}$, i.e., it measures the amount of mutual
1329:   information of order higher than $k$. The right figure illustrates
1330:   the complete decomposition of $p$ in reductions $p^{(k)}$ of all
1331:   orders. Every projection from $p^{(k)}$ to $p^{(k-1)}$ is an
1332:   orthogonal projection onto $E_{(k-1)^*}(0)$. Every ``distance''
1333:   $D(p^{(k)}:p^{(k-1)})$ measures the mutual information specifically
1334:   of order $k$. }
1335: \end{figure}
1336: 
1337: 
1338: 
1339: 
1340: 
1341: 
1342: \section{Geometric view on evolution operators}
1343: 
1344: \paragraph{Crossover}
1345: 
In Evolutionary Algorithms, crossover is one means of mixing a parent
population into an offspring population. Populations can be formalized
as distributions $p$, and a simple form of crossover
(uniform crossover with crossover probability $c\in[0,1]$) reads
1350: \BM\begin{align*}
1351: \CC p &= (1-c)\, p + c\, p^{(1)} ~.
1352: \end{align*}\EM
1353: See, for instance, \cite{toussaint:03-gecco-cross} for a general
1354: definition of a crossover operator in more conventional notation and
1355: details of when it reduces to this simple form.
1356: 
1357: This crossover simply mixes the original distribution (or population)
1358: $p$ with its $1$st-order reduction. The $1$st-order reduction is the
1359: product of all single variable marginals, i.e., it is the distribution
1360: with the same marginals (gene frequencies) as $p$ but all dependencies
1361: (gene linkages) between the variables eliminated. From the geometrical
1362: point of view, crossover makes a step along the $m$-geodesic
1363: connecting $p$ and $p^{(1)}$. It can be illustrated as a step along
the projection onto the submanifold $E_{1^*}(0)$, please see figure
1365: \ref{figCross}.
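
For illustration, consider the two-gene case $\O=\{0,1\}^2$ as a
sketch under the simple crossover form above. The $\eta$-coordinates
of $p^{(1)}$ are $\eta_1$, $\eta_2$, and $\eta_1\, \eta_2$, and since
the $\eta$-coordinates are linear in $p$, crossover acts as
\BM\begin{align*}
\eta_1(c) &= \eta_1 ~,\qquad \eta_2(c) = \eta_2 ~,\\
\eta_{12}(c) &= (1-c)\, \eta_{12} + c\, \eta_1\, \eta_2 ~,
\end{align*}\EM
so that $\eta_{12}(c) - \eta_1\, \eta_2 = (1-c)\, (\eta_{12} - \eta_1\,
\eta_2)$: the deviation from independence (the linkage disequilibrium)
shrinks by the factor $1-c$, while the gene frequencies stay fixed.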
1366: 
1367: 
1368: From this view it becomes clear that a reasonable coordinate system to
1369: describe crossover is $(\vec\eta_1,\vec\t_{1^*})$. Crossover does not
1370: change $\vec\eta_1$ (it operates orthogonally to $\eta_1$) but
1371: continuously reduces the $\vec\t_{1^*}$ variables. That $\vec\t_{1^*}$
are reduced and not increased is intuitive from figure \ref{figCross}
and becomes apparent from the fact that
the ``distance'' from $p$ to $p^{(1)}$, $I(p)=\kld{p}{p^{(1)}}$, is a
1375: norm of $\vec\t_{1^*}$.  \hide{ More explicitly, consider the
1376:   derivative of the $\t$-coordinates along the path, i.e., with
1377:   varying $c$, \BM\begin{align*}
1378:     \frac{\del}{\del c} p_x(c) &= p_x^{(1)} - p_x \\
1379:     \frac{\del}{\del c} \t_a(c) &= \sum_x B_{ax}\, \frac{\del}{\del c}
1380:     l_x(c) = \sum_x B_{ax}\, \frac{p_x^{(1)}-p_x}{p_x(c)}
1381: \end{align*}\EM
1382: 
1383: -- In the $\O=\{0,1\}^2$ two gene case we have
1384: \BM\begin{align*}
1385: & |a|\le 1 \To \eta_a(c) = \eta_a \\
1386: &\eta_{12}(c) = (1-c)\, \eta_{12} + c\, \eta_1\, \eta_2
1387: \end{align*}\EM
1388: with crossover probability $c$. Further
1389: \BM\begin{align*}
1390: \dot\eta_{12}
1391:  &= -\eta_{12} + \eta_1\, \eta_2 \\
1392: \t_{12}
1393:  &= \sum_x B_{(12)x} \log \sum_a B^T_{xa} \eta_a \\
1394: \dot \t_{12}
1395:  &= \sum_x B_{(12)x} \frac{1}{p_x}[\sum_a B^T_{xa} \dot\eta_a]
1396:   = \sum_x B_{(12)x} \frac{B^T_{x(12)} \dot\eta_{12}}{p_x}
1397:   = \dot\eta_{12} \sum_x \frac{B_{(12)x} B^T_{x(12)} }{p_x}
1398:   = \dot\eta_{12} \sum_x \frac{1}{p_x}
1399: \end{align*}\EM
1400: where the $\dot{}$ means the derivative $\del_c$ at $c=0$. Since, on
1401: the path $c\in [0,1]$, $\dot\eta_{12}$ is constant and does not change
1402: the sign, $\t_{12}$ monotonously approaches zero.
1403: }
1404: 
1405: \begin{figure}\center
1406: \input{figtexs/cross}
1407: \caption{\label{figCross}
1408:   Crossover is an operator that takes a step along the projection of
1409:   $p$ towards the first order reduction $p^{(1)}$.}
1410: \end{figure}
1411: 
1412: \paragraph{Max Entropy}
1413: 
1414: \citeN{wright-et-al:04} recently proposed an evolutionary search
1415: scheme that constructs the new search distribution (offspring
1416: population) via a maximum entropy principle: From the parent
population all second order schema frequencies are calculated. Then,
1418: from all the distributions which have the same second order schema
1419: frequencies, the new offspring distribution is the one with maximum
1420: entropy.
1421: 
1422: In our formalism, constraining the schema frequencies corresponds to
1423: fixing $\vec \eta_2$, i.e., constraining the offspring distribution to
1424: the submanifold $M_2(\vec \eta_2)$. The distribution with maximal
1425: entropy in $M_2(\vec \eta_2)$ must have minimal higher order
1426: (3rd-order or higher) interactions $\vec\t_{2^*}$ since interactions
1427: (mutual information) reduce entropy. Thus, the max entropy rule simply
1428: amounts to setting $\vec\t_{2^*}=0$, i.e., choosing
1429: $p^{(2)}=(\vec\eta_2,0)$ as the new offspring distribution.
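
This can be checked with a short Lagrange-multiplier sketch. The
multipliers $\lambda_a$ and $\lambda_0$ are auxiliary symbols
introduced only for this example, and we assume binary variables and
$q_x>0$. Maximizing the entropy $H(q)$ under the constraints $\sum_x
X^a(x)\, q_x = \eta_a$ for $|a|\le 2$ and $\sum_x q_x = 1$ requires,
for every $x$,
\BM\begin{align*}
0 &= \frac{\del}{\del q_x} \Big[ -\sum_{x'} q_{x'} \log q_{x'}
  + \sum_{|a|\le 2} \lambda_a \sum_{x'} X^a(x')\, q_{x'}
  + \lambda_0 \sum_{x'} q_{x'} \Big] \\
  &= -\log q_x - 1 + \sum_{|a|\le 2} \lambda_a\, X^a(x) + \lambda_0 ~.
\end{align*}\EM
Hence $\log q_x = \sum_{|a|\le 2} \lambda_a\, X^a(x) + \lambda_0 - 1$.
Comparing with the expansion $l_x = \sum_{a\in A} \t_a X^a - \psi$
shows $\t_a=\lambda_a$ for $|a|\le 2$ and $\t_a=0$ for $|a|>2$.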
1430: 
1431: Again, this can be viewed geometrically as the orthogonal projection
1432: of the parent population $p$ onto $E_{2^*}(0)$ according to
1433: \BM\begin{align*}
1434: \argmin{q \in E_{2^*}(0)} \kld{p}{q}
1435: \end{align*}\EM
1436: or as the orthogonal projection of the uniform distribution $p^{(0)}$
1437: onto $M_2(\vec\eta_2)$
1438: \BM\begin{align*}
1439: \argmin{q \in M_2(\vec\eta_2)} \kld{q}{p^{(0)}} ~.
1440: \end{align*}\EM
1441: This latter way of writing the max entropy principle is quite
1442: intuitive: find the distribution that fulfills the required constraints
1443: (lies on $M_2(\vec\eta_2)$) but is closest to the uniform distribution
1444: $p^{(0)}$.
1445: 
Finally, note the strong analogy between the maximum entropy principle
proposed by \citeN{wright-et-al:04} and the simple crossover operator
given before: Crossover moves $p$ toward $p^{(1)}$, while the search
heuristic considered by Wright et al.\ chooses $p^{(2)}$ as the new
search distribution.
1451: 
1452: 
1453: 
1454: \section{Discussion}
1455: 
The methods that information geometry provides to analyze and describe the
1457: structure of distributions are deeply grounded in information
1458: theory. For instance, it seems very beneficial to have coordinate
1459: systems for distributions which capture precisely arbitrary $k$th
1460: order interactions between variables and have a direct link to
1461: measures like mutual information and the Kullback-Leibler
1462: divergence. Also the geometric aspects, e.g., that some operations
1463: can be described as orthogonal to certain submanifolds, add to a more
1464: comprehensive picture of the space of distributions. In that sense,
1465: information geometric methods enhance more common approaches in
Evolutionary Computation, like the Walsh basis, in describing the
1467: structure of distributions and operators.
1468: 
However, the question remains whether and how these methods can be
used (1) to actually propose new heuristic search algorithms or (2) to
1471: provide new theoretical tools to analyze the dynamics of evolutionary
1472: processes.
1473: 
1474: 
1475: \subsection*{Acknowledgment}
1476: 
1477: I would like to thank the German Research Foundation (DFG) for their
1478: funding of the Emmy Noether fellowship TO 409/1-1.
1479: 
1480: \footer\small
1481: \bibliography{/cygdrive/c/home/tex/bibs}
1482: \end{document}
1483: