1: %------------------------------------------------------------------------------
2: % standard
3: \newcommand{\mytex}{}
4: \newcommand{\stdpackages}{
5: \usepackage{amsmath}
6: \usepackage{amssymb}
7: \usepackage{amsfonts}
8: \allowdisplaybreaks
9: \usepackage{amsthm}
10: \usepackage{eucal}
11: %\usepackage{\mytex ntheorem}
12: %\usepackage{\mytex calrsfs}
13: %\usepackage{\mytex calligra}
14: \usepackage{graphicx}
15: \usepackage{color}
16: %\usepackage{psfrag}
17: \usepackage{multicol}
18: \usepackage{fancyhdr}
19: \renewcommand{\headrulewidth}{.0pt}\renewcommand{\footrulewidth}{.0pt}\cfoot{}
20: %\setlength{\headsep}{10mm}
21: \fancyhead[OL]{\it\theauthor---\today}
22: %\fancyhead[OL]{\rightmark}
23: \fancyhead[ER]{\leftmark}
24: \fancyhead[OR,EL]{\thepage}
25: \fancyfoot[EL,OR]{}
26:
27: \newcommand{\draft}{\usepackage[light,first]{draftcopy}\draftcopyName{draft}{350}}
28: \newcommand{\labels}{\usepackage{\mytex showlabels}}
29: \newcommand{\maple}{\usepackage{maple2e}}
30: \newcommand{\makeidx}{\usepackage{makeidx}\makeindex}
31: \newcommand{\chicago}{\usepackage{\mytex chicago}\bibliographystyle{\mytex chicago}
32: \renewcommand{\refname}{References\thispagestyle{empty}\renewcommand{\refname}{}}}
33: \newcommand{\numberlines}{
34: \usepackage[mathlines,modulo]{\mytex lineno} %options: pagewise, modulo, mathlines
35: \newcommand{\BM}{\begin{linenomath}}
36: \newcommand{\EM}{\end{linenomath}}
37: \linenumbers
38: \modulolinenumbers[5]
39: }%\newcommand{\BM}{}\newcommand{\EM}{}
40: \newcommand{\pdflatex}{
41: \definecolor{bluecol}{rgb}{0,0,.5}
42: \definecolor{greencol}{rgb}{0,.6,0}
43: \usepackage[
44: pdftex,
45: % letterpaper,
46: bookmarks,
47: bookmarksnumbered,
48: colorlinks,
49: urlcolor=bluecol,
50: citecolor=bluecol,
51: linkcolor=bluecol,
52: pagecolor=bluecol,
53: pdfborder={0 0 0},
54: % backref, %link from bibliography back to sections
55: % pagebackref, %link from bibliography back to pages
56: % pdfstartview=FitH, %fitwidth instead of fit window
57: pdfpagemode=None, %UseOutlines, %bookmarks are displayed by acrobat
58: % pdftitle={\thetitle},
59: pdfauthor={Marc Toussaint}
60: ]{hyperref}
61: \DeclareGraphicsExtensions{.jpg,.pdf}
62: \renewcommand{\r}{\varrho}
63: \renewcommand{\l}{\lambda}
64: \renewcommand{\L}{\Lambda}
65: \renewcommand{\s}{\sigma}
66: \renewcommand{\O}{\Omega}
67: \renewcommand{\SS}{{\cal S}}
68: \renewcommand{\boldsymbol}{}
69: %\renewcommand{\Chapter}{\chapter}
70: %\renewcommand{\Subsection}{\subsection}
71: }
72: }
73: \newcommand{\stdtheorems}{
74: \theoremstyle{plain}
75: \newtheorem{theorem}{Theorem}[section]
76: \newtheorem{lemma}[theorem]{Lemma}
77: \newtheorem{corollary}[theorem]{Corollary}
78: \newtheorem{proposition}{Proposition}[section]
79: \newtheorem{result}{Result}[section]
80: \newtheorem{hypothesis}{Hypothesis}[section]
81: \theoremstyle{definition}
82: \newtheorem{definition}{Definition}[section]
83: \theoremstyle{remark}
84: \newtheorem{remark}{Remark}[section]
85: \newtheorem{example}{Example}[section]
86: }
87: \newcommand{\stdstyle}[1]{
88: \stdpackages
89: \stdtheorems
90: \renewcommand{\labelenumi}{\textbf{(\roman{enumi})}}
91: \renewcommand{\theenumi}{(\roman{enumi})} %for ref
92: %\renewcommand{\labelenumi}{${}^{\bf (\roman{enumi})}$}
93: %\renewcommand{\labelitemi}{\bf $\cdot$}
94: \newcommand{\itemdot}{\renewcommand{\labelitemi}{\bf $\cdot$}}
95: \newcommand{\enumA}{\renewcommand{\labelenumi}{\textbf{\Alph{enumi}}}}
96: \newcommand{\blockindent}{3ex}
97: \renewcommand{\baselinestretch}{#1}
98: \renewcommand{\arraystretch}{1.2}
99: %\renewcommand{\textfloatsep}{3ex}
100: %\setlength{\mathindent}{2.5em}
101: %\setlength{\jot}{0pt} %zwischen den math zeilen
102: %\setlength{\abovedisplayskip}{-10pt}
103: %\setlength{\belowdisplayskip}{-10pt}
104: %\setlength{\mathsurround}{-10pt}
105: %\renewcommand{\floatsep}{-1ex}
106: \renewcommand{\topfraction}{1}
107: \renewcommand{\bottomfraction}{1}
108: \renewcommand{\textfraction}{0}
109: \columnsep 5ex
110: \parindent 3ex
111: \parskip 1ex
112:
113: % Lists and paragraphs
114: \parindent 0pt
115: \topsep 4pt plus 1pt minus 2pt
116: \partopsep 1pt plus 0.5pt minus 0.5pt
117: \itemsep 2pt plus 1pt minus 0.5pt
118: \parsep 2pt plus 1pt minus 0.5pt
119: \parskip .5pc
120:
121: \setcounter{tocdepth}{3}
122: \setcounter{secnumdepth}{3}
123:
124: \usepackage{\mytex geometry}
125: \geometry{a4paper,hdivide={35mm,*,35mm},vdivide={35mm,*,35mm}}
126:
127: %\usepackage{layout}\layout
128:
129: %\thispagestyle{fancy}
130: %\pagestyle{fancy}
131:
132: \renewenvironment{abstract}
133: {\vspace*{5ex}\begin{rblock}\hrule\vspace{2ex}{\bf Abstract.~}\small}
134: {\vspace{3ex}\hrule\end{rblock}\vspace{5ex}}
135: \usepackage{\mytex mt}
136: }
137: \newcommand{\cleardefs}{
138: \renewcommand{\article}[2]{}
139: \renewcommand{\book}[2]{}
140: \renewcommand{\draft}{}
141: \renewcommand{\labels}{}
142: \renewcommand{\maple}{}
143: \renewcommand{\makeidx}{}
144: \renewcommand{\chicago}{}
145: \renewcommand{\pdflatex}{}
146: \renewcommand{\header}{}
147: }
148:
149: % A0 1189 x 841 mm 1,000 qm
150: % A1 841 x 594 mm 0,500 qm
% A2 594 x 420 mm 0,250 qm
152: % A3 420 x 297 mm 0,125 qm
153: % A4 297 x 210 mm 0,063 qm
154: % A5 210 x 148 mm 0,032 qm
155: % A6 148 x 105 mm 0,016 qm
156: % A7 105 x 74 mm 0,008 qm
157: % A8 74 x 52 mm 0,004 qm
% A9 52 x 37 mm 0,002 qm
% A10 37 x 26 mm 0,001 qm
160: % B0 1414 x 1000 mm 14.140 qcm
161: % B1 1000 x 707 mm 7.070 qcm
162: % B2 707 x 500 mm 3.535 qcm
163: % B3 500 x 353 mm 1.765 qcm
164: % B4 353 x 250 mm 882 qcm
165: % B5 250 x 176 mm 440 qcm
166: % B6 176 x 125 mm 220 qcm
167: % C0 1297 x 917 mm 11.894 qcm
168: % C1 917 x 648 mm 5.942 qcm
169: % C2 648 x 458 mm 2.968 qcm
170: % C3 458 x 324 mm 1.484 qcm
171: % C4 324 x 229 mm 742 qcm
172: % C5 229 x 162 mm 371 qcm
173: % C6 162 x 115 mm 186 qcm
174: % C7 115 x 81 mm 93 qcm
175:
176:
177: %------------------------------------------------------------------------------
178: % classes
179:
180: \newcommand{\article}[2]{
181: \documentclass[#1pt,twoside,fleqn]{article}
182: \stdstyle{#2}
183: \macros
184: \newcommand{\mytitle}{
185: \thispagestyle{empty}
186: \mbox{~}
187: \begin{list}{}{\leftmargin6ex \rightmargin6ex \topsep0ex \parsep3ex}\item[]
188: \begin{center}
189: {\LARGE\bf \thetitle \\}
190:
191: \vspace{5ex}
192: {\large \theauthor}
193:
194: {\footnotesize{\sl \address}\\ \email}
195:
196: {\footnotesize \today}
197:
198: \vspace{1ex}
199: {\small \published}
200: \end{center}
201: \end{list}
202: \renewcommand{\mytitle}{\chapter{\thetitle}}
203: }
204: }
205: \newcommand{\nips}{
206: \documentclass{article}
207: \usepackage{\mytex nips2003e,times}
208: \stdpackages\macros
209: }
210: \newcommand{\ijcnn}{
211: \documentclass[10pt,twocolumn]{\mytex ijcnn}
212: %\documentclass[10pt,twocolumn]{article}\usepackage{\mytex wcci}
213: \stdpackages\macros
214: \bibliographystyle{abbrv}
215: }
216: \newcommand{\springer}{
217: \documentclass{\mytex springer_llncs}
218: \renewcommand{\theenumi}{\alph{enumi}}
219: \renewcommand{\labelenumi}{(\alph{enumi})}
220: \renewcommand{\labelitemi}{$\bullet$}
221: \stdpackages\macros
222: }
223: \newcommand{\foga}{
224: \documentclass{article}
225: \stdpackages\macros
226: \usepackage{\mytex foga-02}
227: \usepackage{\mytex chicago}
228: \bibliographystyle{\mytex foga-chicago}
229: }
230: \newcommand{\book}[2]{
231: \documentclass[#1pt,twoside,fleqn]{book}
232: \newenvironment{abstract}{\begin{rblock}{\bf Abstract.~}\small}{\end{rblock}}
233: \stdstyle{#2}
234: %\renewcommand{\thechapter}{\Roman{chapter}}
235: \newcommand{\mytitle}{
236: \thispagestyle{empty}
237: \mbox{~}
238: \begin{list}{}{\leftmargin4ex \rightmargin4ex \topsep10ex \parsep3ex}\item[]
239: \begin{center}
240: {\LARGE \thetitle \\}
241:
242: \vspace{8ex}
243: {\large \theauthor}
244:
245: {\footnotesize{\sl \address}\\ \email}
246:
247: {\footnotesize \today}
248:
249: \vspace{1ex}
250: {\small \published}
251: \end{center}
252: \end{list}
253: \renewcommand{\mytitle}{\chapter{\thetitle}}
254: }
255: \macros
256: }
257:
258: \newcommand{\slides}{
259: \documentclass[fleqn]{article}
260: \stdpackages
261: \stdtheorems
262: \renewcommand{\baselinestretch}{1}
263: \renewcommand{\arraystretch}{1.2}
264:
265: \usepackage{\mytex geometry}
266: \geometry{
267: a4paper,landscape,
268: headheight=30mm,
269: headsep=0mm,
270: footskip=5mm,
271: hdivide={10mm,*,10mm},vdivide={30mm,*,8mm}}
272:
273: \columnsep 0mm
274: \columnseprule 0pt
275: \parindent 0ex
276: \parskip 0ex
277: \setlength{\itemsep}{8ex}
278: \renewcommand{\labelitemi}{\rule[.4ex]{.6ex}{.6ex}~}
279:
280: \pagestyle{fancy}
281: \renewcommand{\headrulewidth}{0pt} %1pt}
282: \renewcommand{\footrulewidth}{0pt}
283: \renewcommand{\labelenumi}{\textbf{\arabic{enumi}.}~~}
284: \newcommand{\theauthor}{Marc Toussaint}
285: \rhead{}
286: \lhead{}
287: \rfoot{\thepage}
288:
289: \definecolor{grey}{rgb}{.9,.9,.9}
290: \newcommand{\inverted}{
291: \definecolor{main}{rgb}{1,1,1}
292: \color{main}
293: \pagecolor[rgb]{.3,.3,.3}
294: }
295:
296: \macros
297:
298: \newcommand{\mytitle}{\huge\sf}
299: }
300:
301: \newenvironment{titleslide}[2][30mm]{
302: \onecolumn
303: \lhead{{{\Huge\textsf{\quad#2}}\\}}
304: \begin{center}\begin{list}{\labelitemi}{\leftmargin#1 \rightmargin#1
305: \labelsep1ex \labelwidth3ex \topsep0pt}\item[]
306: ~\vfill
307: \begin{center}
308: {\Huge\sc \thetitle}\\[3ex]
309: \theauthor\\{\Large\ini}\\{\Large\email}
310: \end{center}
311: ~\vfill
312: }{
313: ~\vfill
314: \end{list}\end{center}
315: }
316:
317: \newenvironment{slide}[2][30mm]{
318: \onecolumn
319: \lhead{{{\Huge\textsf{#2}}\\}}
320: %\setlength{\unitlength}{1mm}
321: %\begin{picture}(0,0)(20,-34)
322: %\put(0,-35){\color{grey}\rule{296mm}{30mm}}
323: %\put(0,-214){\color{grey}\rule{296mm}{10mm}}
324: %\end{picture}
325: \begin{center}\begin{list}{\labelitemi}{\leftmargin#1 \rightmargin#1
326: \labelsep1ex \labelwidth3ex \topsep0pt}\huge\sf\item[]%\vfill
327: }{
328: %\vfill
329: \end{list}\end{center}
330: }
331:
332: \newenvironment{slidetwo}[2][15mm]{
333: \twocolumn
334: \lhead{{{\Huge\textsf{#2}}\\}}
335: \begin{center}\begin{list}{\labelitemi}{\leftmargin#1 \rightmargin#1
336: \labelsep1ex \labelwidth3ex \topsep0pt}\huge\sf\item[]%\vfill
337: }{
338: %\vfill
339: \end{list}\end{center}
340: }
341: %\newcommand{\slidebreak}{\vfill\pagebreak\item[]\vfill}
342: \newcommand{\slidebreak}{\pagebreak\item[]}
343:
344: \newcommand{\poster}{
345: \documentclass[fleqn]{article}
346: \stdpackages
347: \renewcommand{\baselinestretch}{1}
348: \renewcommand{\arraystretch}{1.8}
349:
350: \usepackage{\mytex geometry}
351: \geometry{
352: paperwidth=1189mm,
353: paperheight=841mm, %841mm, %91.3cm, % 120cm
354: % landscape,
355: headheight=0mm,
356: headsep=0mm,
357: footskip=0mm,
358: hdivide={5mm,*,5mm},vdivide={5mm,*,5mm}}
359:
360: %\textwidth 86.3cm % Paper=91.3cm
361: %\textheight 108cm % Paper=???, banner=5cm
362: %\oddsidemargin 0pt
363: %\parindent 0pt
364: %\parskip 0pt
365: %\topmargin 1cm
366: %\footskip 0pt
367: %\headheight 0pt
368: %\headsep 0pt
369:
370: \setlength{\columnsep}{0ex}
371: \columnseprule 3pt
372: \renewcommand{\labelitemi}{\rule[.4ex]{.6ex}{.6ex}~}
373:
374: \pagestyle{fancy}
375: \renewcommand{\headrulewidth}{0pt}
376: \renewcommand{\footrulewidth}{0pt}
377: \renewcommand{\labelenumi}{\textbf{(\roman{enumi})}}
378: \newcommand{\theauthor}{Marc Toussaint}
379: \rhead{}
380: \lhead{}
381: \rfoot{}
382:
383: \definecolor{grey}{rgb}{.9,.9,.9}
384: \newcommand{\inverted}{
385: \definecolor{main}{rgb}{1,1,1}
386: \color{main}
387: \pagecolor[rgb]{.3,.3,.3}
388: }
389:
390: \macros
391: }
392: \newenvironment{postersection}[1]{
393: \vspace{1cm}
394: \section{#1}
395: \begin{list}{\labelitemi}{\leftmargin4ex \rightmargin3ex
396: \labelsep1ex \labelwidth2ex \topsep0pt \parsep2ex}\item[]
397: }{
398: \end{list}
399: }
400:
401:
402: %------------------------------------------------------------------------------
403: % title page
404:
405: \author{Marc Toussaint}
406:
407: \newcommand{\inilogo}[1][.25]{\includegraphics[scale=#1]{\mytex INI}}
408: \newcommand{\rublogo}[1][.25]{\includegraphics[scale=#1]{\mytex RUB}}
409:
410: \newcommand{\addressCologne}{
411: Institute for Theoretical Physics\\
412: University of Cologne\\
413: 50923 K\"oln---Germany\\
414: {\tt mt@thp.uni-koeln.de}\\
415: {\tt www.thp.uni-koeln.de/\~{}mt/}
416: }
417:
418: \newcommand{\ini}{Institut f\"ur Neuroinformatik, Ruhr-Universit\"at Bochum, Germany}
419: \newcommand{\homepageINI}{\texttt{www.neuroinformatik.rub.de/PEOPLE/mt/}}
420: \newcommand{\emailINI}{mt@neuroinformatik.ruhr-uni-bochum.de}
421: \newcommand{\phoneINI}{+49-234-32-27974}
422: \newcommand{\faxINI}{+49-234-32-14209}
423: \newcommand{\phone}{+44 131 650 3089}
424: \newcommand{\fax}{+44 131 650 6899}
425: \newcommand{\email}{mtoussai@inf.ed.ac.uk}
426: \newcommand{\homepage}{homepages.inf.ed.ac.uk/mtoussai}
427:
428: \newcommand{\addressINI}{
429: Institut~f\"ur~Neuroinformatik,
430: Ruhr-Universit\"at~Bochum, ND~04,
431: 44780~Bochum---Germany
432: }
433: \newcommand{\AddressINI}{
434: Institut~f\"ur~Neuroinformatik\\
435: Ruhr-Universit\"at Bochum, ND~04\\
436: 44780~Bochum---Germany
437: }
438: \newcommand{\address}{
439: Institute~for~Adaptive~and~Neural~Computation,
440: University~of~Edinburgh, 5~Forrest~Hill,
441: Edinburgh~EH1~2QL, Scotland,~UK
442: }
443: \newcommand{\Address}{
444: Institute~for~Adaptive~and~Neural~Computation\\
445: University~of~Edinburgh, 5~Forrest~Hill\\
446: Edinburgh~EH1~2QL, Scotland,~UK
447: }
448:
449: \newcommand{\published}{}
450:
451: %------------------------------------------------------------------------------
452: % environments / commands
453:
454: \newlength{\subsecwidth}
455:
456: \newcommand{\subsec}[1]{
457: \addtocontents{toc}{
458: \protect\setlength{\subsecwidth}{\textwidth}\protect\addtolength{\subsecwidth}{-27ex}
459: \protect\vspace*{-1.5ex}\protect\hspace*{20ex}
460: \protect\begin{minipage}[t]{\subsecwidth}\protect\footnotesize\protect\textsf{#1}\protect\end{minipage}
461: \protect\par
462: }
463: \begin{rblock}\it #1\end{rblock}\medskip\noindent
464: }
465: \newcommand{\tocsep}{
466: \addtocontents{toc}{\protect\bigskip}
467: }
468: \newcommand{\Chapter}[1]{
469: \chapter*{#1}\thispagestyle{empty}
470: \addcontentsline{toc}{chapter}{\protect\numberline{}#1}
471: }
472: \newcommand{\Section}[1]{
473: \section*{#1}
474: \addcontentsline{toc}{section}{\protect\numberline{}#1}
475: }
476: \newcommand{\Subsection}[1]{
477: \subsection*{#1}
478: \addcontentsline{toc}{subsection}{\protect\numberline{}#1}
479: }
480:
481: \newcommand{\content}[1]{
482: % \begin{rblock}\it #1\end{rblock}\medskip
483: % \addtocontents{toc}{\protect\begin{list}{}{\leftmargin9ex
484: % \rightmargin9ex \topsep-2ex \parsep.5ex}}
485: % \addtocontents{toc}{\protect\item[] \protect\small\protect\it #1}
486: % \addtocontents{toc}{\protect\end{list}\protect\medskip}
487: }
488:
489: \newcommand{\sepline}[1][200]{
490: \begin{center} \begin{picture}(#1,0)
491: \line(1,0){#1}
492: \end{picture}\end{center}
493: }
494:
495: \newcommand{\sepstar}{
496: \begin{center} {\vspace{0.5ex}\rule[1.2ex]{5ex}{.1pt}~*~\rule[1.2ex]{5ex}{.1pt}} \end{center}\vspace{-1.5ex}\noindent
497: }
498:
499: \newcommand{\partsection}[1]{
500: \vspace{5ex}
501: \centerline{\sc\LARGE #1}
502: \addtocontents{toc}{\contentsline{section}{{\sc #1}}{}}
503: }
504:
505: \newcommand{\intro}[1]{\textbf{#1}\index{#1}}
506:
507:
508: \newcounter{parac}
509: \newcommand{\para}{\noindent\refstepcounter{parac}{\bf [{\roman{parac}}]}~~}
510: \newcommand{\Pref}[1]{[\emph{\ref{#1}}\,]}
511:
512:
513:
514: \newenvironment{items}{
515: \begin{list}{}{\leftmargin1ex \topsep-\parskip}
516: \item[]
517: }{
518: \end{list}
519: }
520:
521: \newenvironment{block}[1][]{{\noindent\bf #1}
522: \begin{list}{}{\leftmargin\blockindent \topsep-\parskip}
523: \item[]
524: }{
525: \end{list}
526: }
527:
528: \newenvironment{rblock}{
529: \begin{list}{}{\leftmargin\blockindent \rightmargin\blockindent \topsep-\parskip}\item[]}{\end{list}}
530:
531: \newenvironment{algorithm}{
532: \begin{list}{\raisebox{.3ex}{\footnotesize\bf\arabic{enumi}.}}
533: {\usecounter{enumi} \leftmargin7ex \rightmargin7ex \labelsep1ex
534: \labelwidth5ex \topsep1ex \parsep.5ex \itemsep0pt} \small\sf
535: }{
536: \end{list}
537: }
538:
539: %\newenvironment{keywords}{\paragraph{Keywords}\begin{rblock}\small}{\end{rblock}}
540:
541: \newenvironment{colpage}{
542: \addtolength{\columnwidth}{-3ex}
543: \begin{minipage}{\columnwidth}
544: \vspace{.5ex}
545: }{
546: \vspace{.5ex}
547: \end{minipage}
548: }
549:
550: \newenvironment{enum}{
551: \begin{list}{}{\leftmargin3ex \topsep0ex \itemsep0ex}
552: \item[\labelenumi]
553: }{
554: \end{list}
555: }
556:
557: \newenvironment{cramp}{
558: \begin{quote} \begin{picture}(0,0)
559: \put(-5,0){\line(1,0){20}}
560: \put(-5,0){\line(0,-1){20}}
561: \end{picture}
562: }{
563: \begin{picture}(0,0)
564: \put(-5,5){\line(1,0){20}}
565: \put(-5,5){\line(0,1){20}}
566: \end{picture} \end{quote}
567: }
568:
569: %------------------------------------------------------------------------------
570: % symbol & operator macros
571:
572: \newcommand{\macros}{
573: \newcommand{\0}{{\hat 0}}
574: \newcommand{\1}{{\hat 1}}
575: \newcommand{\2}{{\hat 2}}
576: \newcommand{\3}{{\hat 3}}
577: \newcommand{\5}{{\hat 5}}
578:
579: \renewcommand{\a}{\ensuremath\alpha}
580: \renewcommand{\b}{\beta}
581: \renewcommand{\c}{\gamma}
582: \renewcommand{\d}{\delta}
583: \newcommand{\D}{\Delta}
584: \newcommand{\e}{\epsilon}
585: \newcommand{\g}{\gamma}
586: \newcommand{\G}{\Gamma}
587: \renewcommand{\l}{\lambda}
588: \renewcommand{\L}{\Lambda}
589: \newcommand{\m}{\mu}
590: \newcommand{\n}{\nu}
591: \newcommand{\N}{\nabla}
592: \renewcommand{\k}{\kappa}
593: \renewcommand{\o}{\omega}
594: \renewcommand{\O}{\Omega}
595: \newcommand{\p}{\phi}
596: \newcommand{\ph}{\varphi}
597: \renewcommand{\P}{\Phi}
598: \renewcommand{\r}{\varrho}
599: \newcommand{\s}{\sigma}
600: \newcommand{\Si}{\Sigma}
601: \renewcommand{\t}{\theta}
602: \newcommand{\T}{\Theta}
603: \renewcommand{\v}{\vartheta}
604: \newcommand{\x}{\xi}
605: \newcommand{\X}{\Xi}
606: \newcommand{\Y}{\Upsilon}
607:
608: \renewcommand{\AA}{{\cal A}}
609: \newcommand{\BB}{{\cal B}}
610: \newcommand{\CC}{{\cal C}}
611: \newcommand{\EE}{{\cal E}}
612: \newcommand{\FF}{{\cal F}}
613: \newcommand{\GG}{{\cal G}}
614: \newcommand{\HH}{{\cal H}}
615: \newcommand{\II}{{\cal I}}
616: \newcommand{\KK}{{\cal K}}
617: \newcommand{\LL}{{\cal L}}
618: \newcommand{\MM}{{\cal M}}
619: \newcommand{\NN}{{\cal N}}
620: \newcommand{\OO}{{\cal O}}
621: \newcommand{\PP}{{\cal P}}
622: \newcommand{\QQ}{{\cal Q}}
623: \newcommand{\RR}{{\cal R}}
624: \renewcommand{\SS}{{\cal S}}
625: \newcommand{\TT}{{\cal T}}
626: \newcommand{\uu}{{\cal u}}
627: \newcommand{\UU}{{\cal U}}
628: \newcommand{\XX}{{\cal X}}
629: \newcommand{\YY}{{\cal Y}}
630: \newcommand{\SOSO}{{\cal SO}}
631: \newcommand{\GLGL}{{\cal GL}}
632:
633: \newcommand{\Ee}{{\rm E}}
634:
635: \newcommand{\NNN}{{\mathbb{N}}}
636: \newcommand{\ZZZ}{{\mathbb{Z}}}
637: %\newcommand{\RRR}{{\mathrm{I\!R}}}
638: \newcommand{\RRR}{{\mathbb{R}}}
639: \newcommand{\CCC}{{\mathbb{C}}}
640: \newcommand{\one}{{{\bf 1}}}
641: \newcommand{\eee}{\text{e}}
642:
643: \renewcommand{\[}{\Big[}
644: \renewcommand{\]}{\Big]}
645: \renewcommand{\(}{\Big(}
646: \renewcommand{\)}{\Big)}
647: \renewcommand{\|}{\big|}
648: \newcommand{\<}{{\ensuremath\langle}}
649: \renewcommand{\>}{{\ensuremath\rangle}}
650:
651: \newcommand{\Prob}{{\rm Prob}}
652: \newcommand{\Aut}{{\rm Aut}}
653: \newcommand{\cor}{{\rm cor}}
654: \newcommand{\corr}{{\rm corr}}
655: \newcommand{\cov}{{\rm cov}}
656: \newcommand{\sd}{{\rm sd}}
657: \newcommand{\tr}{{\rm tr}}
658: \newcommand{\Tr}{{\rm Tr}}
659: \newcommand{\id}{{\rm id}}
660: \newcommand{\Gl}{{\rm Gl}}
661: \newcommand{\lag}{\mathcal{L}}
662: \newcommand{\inn}{\rfloor}
663: \newcommand{\lie}{\pounds}
664: \newcommand{\longto}{\longrightarrow}
665: \newcommand{\speer}{\parbox{0.4ex}{\raisebox{0.8ex}{$\nearrow$}}}
666: \renewcommand{\dag}{ {}^\dagger }
667: \newcommand{\h}{{}^\star}
668: \newcommand{\w}{\wedge}
669: \newcommand{\too}{\longrightarrow}
670: \newcommand{\To}{\Rightarrow}
671: \newcommand{\Too}{\;\Longrightarrow\;}
672: \newcommand{\oto}{\leftrightarrow}
673: \newcommand{\ow}{\stackrel{\circ}\wedge}
674: \newcommand{\feed}{\nonumber \\}
675: \newcommand{\comma}{~,\quad}
676: \newcommand{\period}{~.\quad}
677: \newcommand{\del}{\partial}
678: % \newcommand{\quabla}{\Delta}
679: \newcommand{\point}{$\bullet~~$}
680: \newcommand{\doubletilde}{
681: ~ \raisebox{0.3ex}{$\widetilde {}$} \raisebox{0.6ex}{$\widetilde {}$} \!\!
682: }
683: \newcommand{\topcirc}{\parbox{0ex}{~\raisebox{2.5ex}{${}^\circ$}}}
684: \newcommand{\topdot} {\parbox{0ex}{~\raisebox{2.5ex}{$\cdot$}}}
685: \newcommand{\topddot} {\parbox{0ex}{~\raisebox{1.3ex}{$\ddot{~}$}}}
686: \newcommand{\sym}{\topcirc}
687:
688: \newcommand{\half}{\frac{1}{2}}
689: \newcommand{\third}{\frac{1}{3}}
690: \newcommand{\fourth}{\frac{1}{4}}
691:
692: \newcommand{\ubar}{\underline}
693:
694: %\renewcommand{\vec}{\underline}
695: \renewcommand{\vec}{\boldsymbol}
696: \renewcommand{\_}{\underset}
697: \renewcommand{\^}{\overset}
698: %\renewcommand{\*}{{\rm\raisebox{-.6ex}{\text{*}}{}}}
699: \renewcommand{\*}{\text{\footnotesize\raisebox{-.4ex}{*}{}}}
700:
701: \newcommand{\gto}{{\raisebox{.5ex}{${}_\rightarrow$}}}
702: \newcommand{\gfrom}{{\raisebox{.5ex}{${}_\leftarrow$}}}
703: \newcommand{\gnto}{{\raisebox{.5ex}{${}_\nrightarrow$}}}
704: \newcommand{\gnfrom}{{\raisebox{.5ex}{${}_\nleftarrow$}}}
705:
706: \newcommand{\RND}{{\SS}}
707: \newcommand{\IF}{\text{if }}
708: \newcommand{\AND}{\textsc{and }}
709: \newcommand{\OR}{\textsc{or }}
710: \newcommand{\XOR}{\textsc{xor }}
711: \newcommand{\NOT}{\textsc{not }}
712: }
713:
714: %\newcommand{\argmax}[1]{{\rm arg}\!\max_{#1}}
715: %\newcommand{\argmin}[1]{{\rm arg}\!\min_{#1}}
716: \newcommand{\argmax}[1]{\underset{~#1}{\rm argmax}\;}
717: \newcommand{\argmin}[1]{\underset{~#1}{\rm argmin}\;}
718: \newcommand{\ee}[1]{\ensuremath{\cdot10^{#1}}}
719: \newcommand{\sub}[1]{\ensuremath{_{\text{#1}}}}
720: \newcommand{\up}[1]{\ensuremath{^{\text{#1}}}}
721: %\newcommand{\kld}[2]{D\big(#1\,\big|\!\big|\,#2\big)}
722: \newcommand{\kld}[2]{D\big(#1:#2\big)}
723: \newcommand{\sprod}[2]{\big<#1\,,\,#2\big>}
724: \newcommand{\End}{\text{End}}
725: \newcommand{\txt}[1]{\quad\text{#1}\quad}
726: \newcommand{\Over}[2]{\genfrac{}{}{0pt}{0}{#1}{#2}}
727: \newcommand{\mat}[1]{{\bf #1}}
728: \newcommand{\arr}[2]{\hspace*{-1ex}\begin{array}{#1}#2\end{array}\hspace*{-1ex}}
729: \newcommand{\matr}[2]{\left(\begin{array}{#1}#2\end{array}\right)}
730: \newcommand{\seq}[1]{\textsf{\<#1\>}}
731: \newcommand{\seqq}[1]{\textsf{#1}}
732:
733: %------------------------------------------------------------------------------
734: % stuff
735:
736: \newcommand{\url}[1]{\texttt{#1}}
737: \newcommand{\anchor}[2]{\begin{picture}(0,0)\put(#1){#2}\end{picture}}
738: \newcommand{\pagebox}{\begin{picture}(0,0)\put(-3,-23){
739: \textcolor[rgb]{.5,1,.5}{\framebox[\textwidth]{\rule[-\textheight]{0pt}{0pt}}}}
740: \end{picture}}
741:
742: \newcommand{\pathmt}{./}
743: \newcommand{\basepath}{./}
744: \newcommand{\setpath}[1]{\renewcommand{\pathmt}{#1}\renewcommand{\basepath}{#1}}
745: \newcommand{\pathinput}[2]{
746: \renewcommand{\pathmt}{\basepath #1}
747: \input{\pathmt #2} \renewcommand{\pathmt}{\basepath}}
748:
749: \newcommand{\hide}[1]{$\ll${\sf{\footnotesize #1}}$\gg$\message{^^JHIDE--Warning!^^J}}
750: %\newcommand{\hide}[1]{{\tt[hide:~}{\footnotesize\sf #1}{\tt]}\message{^^JHIDE--Warning!^^J}}
751: \newcommand{\Hide}{\renewcommand{\hide}[1]{\message{^^JHIDE--Warning (hidden)!^^J}}}
752: \newcommand{\HIDE}{\renewcommand{\hide}[1]{}}
753: \newcommand{\todo}[1]{{\tt[TODO: #1]}\message{^^JTODO--Warning: #1^^J}}
754: \newcommand{\Todo}{\renewcommand{\todo}[1]{\message{^^JTODO--Warning (hidden)!^^J}}}
755: \newcommand{\thetitle}{bla}
756: %\renewcommand{\title}[1]{\renewcommand{\thetitle}{#1}}
757: \newcommand{\header}{\begin{document}\mytitle\cleardefs}
758: \newcommand{\contents}{{\tableofcontents}\renewcommand{\contents}{}}
759: \newcommand{\footer}{\small\bibliography{\mytex bibs}\end{document}}
760:
761: \article{10}{1}
762: \chicago
763: %\numberlines
764: %\usepackage{rotate}
765: \Hide
766:
767: \newcommand{\EM}{}\newcommand{\BM}{}
768: \newcommand{\pl}{{\protect\rule[.3ex]{1ex}{1ex}}}
769: \newcommand{\mi}{\ensuremath{\circ}}
770: \newcommand{\ze}{\ensuremath{\cdot}}
771: \newcommand{\rb}[1]{\hspace{4pt}\raisebox{-.5ex}{\rotatebox{90}{#1}}\hspace{4pt}}
772:
773: \title{Notes on information geometry\\ and evolutionary processes}
774:
775: \header
776:
777: \begin{abstract}
778: In order to analyze and extract different structural properties of
779: distributions, one can introduce different coordinate systems over
780: the manifold of distributions. In Evolutionary Computation, the
781: Walsh bases and the Building Block Bases are often used to describe
populations, which simplifies the analysis of evolutionary operators
acting on populations. Quite independently of these approaches,
information geometry has been developed as a geometric way to
analyze dependencies of different order between random variables (e.g.,
neural activations or genes).
787:
788: In these notes I briefly review the essentials of various coordinate
789: bases and of information geometry. The goal is to give an overview
790: and make the approaches comparable. Besides introducing meaningful
coordinate bases, information geometry also offers an explicit way
to distinguish interactions of different order, and it offers a
geometric view on the manifold and thereby also on operators that
act on the manifold. For instance, uniform crossover can be
interpreted as an orthogonal projection of a population along an
$m$-geodesic, monotonically reducing the $\t$-coordinates that
describe interactions between genes.
798: \end{abstract}
799:
800: \section{Introduction}
801:
Evolution can be understood as a process on the space $\L$ of
distributions over the search space $\O$. Essentially, a parent population
can be captured as a (finite) distribution $p \in \L$. Mutation and
recombination operators ($\MM \CC$) applied to the parent population
specify a search (offspring) distribution $q \in\L$. And a (stochastic) selection
807: operator ($\SS^\m\, \FF\, \SS^\n$) maps $q$ to a new parent population
808: $p'$. In this view, evolution can be understood as a process
809: \BM\begin{align*}
810: p
811: ~\stackrel{\MM \CC}\longmapsto~ q
812: ~\stackrel{\SS^\m \FF \SS^\n}\longmapsto~ p'
813: ~\stackrel{\MM \CC}\longmapsto~ q'
814: ~\stackrel{\SS^\m \FF \SS^\n}\longmapsto~ p''
815: ~\stackrel{\MM \CC}\longmapsto~ \cdots
816: \end{align*}\EM
817:
818: We do not need to go into the details of the indicated recombination,
819: mutation, and selection operators here. Instead, we would like to
emphasize an information-theoretic point of view on this process.
Typically, the mapping $p \mapsto q$ (which one could also call the search
heuristic) from the parent population to the search distribution adds
entropy, whereas selection $q \mapsto p'$ reduces entropy. Another
824: interesting observable in this process is the \emph{structure} of the
825: distributions---by which we mean the mutual information present in
826: these distributions. For instance, one can show that ordinary mutation
827: and crossover operators (on a direct genetic representation) generally
828: reduce mutual information, i.e., destroy structural content that might
829: have been present in $p$ after selection \cite{toussaint:04-ecj}.
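
As a small illustration (the specific distributions are chosen here only
for concreteness and are not taken from the cited work), consider two
binary genes, $\O=\{0,1\}^2$, and a parent population with
$p(00)=p(11)=\tfrac12$. Both single-gene marginals are uniform and have
an entropy of $1$ bit each. Suppose, as an assumption for this sketch,
that the search distribution is the product of these marginals,
$q(x)=\tfrac14$ for all $x\in\O$. Writing $I$ for the mutual
information (sum of the marginal entropies minus the joint entropy) and
measuring in bits, we get
\BM\begin{align*}
H(p) &= -2\cdot\tfrac12\log_2\tfrac12 = 1 ~, &
I(p) &= 1 + 1 - H(p) = 1 ~, \\
H(q) &= -4\cdot\tfrac14\log_2\tfrac14 = 2 ~, &
I(q) &= 1 + 1 - H(q) = 0 ~.
\end{align*}\EM
In this example the step $p \mapsto q$ adds one bit of entropy and at
the same time removes the one bit of mutual information that was
present in $p$.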
830:
831: The analysis of the structure of distributions is an important topic
832: in various areas. In evolutionary computation, the Walsh
833: spectrum is a prominent way to analyze the structure of $p$, often
834: with the aim to transport it to $q$. The Walsh coefficients may also
835: be considered as a way of describing epistasis. In complex systems,
836: certain mutual information measures are often used to define the
structuredness (in their terms: complexity) of dynamical systems
838: \cite{langton:90,sporns-tononi:02}.
839:
840: In these notes, I want to briefly review the information geometric way
841: to describe the structure of a distribution \cite{amari:99,amari:01}
842: and relate it to the field of evolutionary computation. The first step
843: is simply to present the coordinates introduced by Walsh coefficients
844: side-by-side with those used in information geometry to make them
845: comparable. This gives an intuition about the ``bases'' over which
846: distributions can be analyzed and reveals, for instance, that the
847: so-called Building-Block-Basis \cite{chryssomalakos-stephens:04}, as
848: introduced in Evolutionary Computation, is the same as Amari's
$\eta$-basis. Amari's $\t$-basis is perhaps the most interesting one because
it precisely captures $k$th-order mutual dependencies. It
offers a notion of the ``order-spectrum of mutual information''
as an alternative to the Walsh spectrum. Eventually, Amari's formalism
allows one to completely decompose any distribution into its different
$k$th-order components.
855:
856: Finally, the \emph{geometry} introduced over the space of
857: distributions by Amari gives very insightful interpretations of
distances between distributions. A Pythagorean theorem can be
formulated for the Kullback-Leibler divergence. Under some conditions,
minimizations of the Kullback-Leibler divergence can be
interpreted as orthogonal projections. This offers a geometric view on
862: some evolutionary operators.
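
One simple instance of such a Pythagorean relation can be written down
directly. The following is only a sketch, and the notation is
introduced here for illustration: $x=(x_1,..,x_n)$, $p^i$ is the
marginal of $p$ over the $i$-th variable, $\bar p(x) = \prod_i
p^i(x_i)$ is the product of these marginals, and $\kld{p}{q} = \sum_x
p(x) \log\frac{p(x)}{q(x)}$ is the Kullback-Leibler divergence. For any
$q$ that factorizes over the variables, $q(x)=\prod_i q^i(x_i)$, we
have
\BM\begin{align*}
\kld{p}{q}
&= \sum_x p(x) \log\frac{p(x)}{\bar p(x)}
 + \sum_x p(x) \log\frac{\bar p(x)}{q(x)} \\
&= \kld{p}{\bar p}
 + \sum_x \bar p(x) \log\frac{\bar p(x)}{q(x)} \\
&= \kld{p}{\bar p} + \kld{\bar p}{q} ~.
\end{align*}\EM
The second step uses that $\log\frac{\bar p(x)}{q(x)}$ is a sum of
functions of single variables and that $p$ and $\bar p$ have the same
single-variable marginals. Since $\kld{\bar p}{q} \ge 0$ with equality
only for $q=\bar p$, minimizing $\kld{p}{q}$ over all factorizing $q$
yields $q=\bar p$, which can be read as a projection of $p$ onto the
set of factorizing distributions.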
863:
864:
865: \section{Notations}
866:
867: \paragraph{Distributions, $\log$-probabilities, and hypercube bases}
868:
869: The most direct ``coordinate system'' that can be introduced on the
manifold of distributions is given by the probabilities $p(x)$ for all
$x\in \O$ themselves. To preserve notational uniformity with other
coordinate systems we write these numbers as $p_x := p(x)$, which
means that $p_x$ is the $x$-th component of $p \in \L$ in the direct
basis. Because of the normalization constraint $\sum_x p_x = 1$, only
$|\O|-1$ of these coordinates are independent.
876:
Clearly, instead of using $p_x$ as coordinates, one can also use their
negative logarithms $l_x := -\log p_x$. Taking the log of probabilities is,
roughly speaking, related to changing to entropic units. (Note the
definition of the entropy of $p$ as $H(p) = -\sum_x p_x \log p_x =
{\rm E}_p \{l_x\}$.) Thus, coordinates that have some ``entropic
meaning'' (i.e., are related to information-theoretic measures like
entropy, mutual information, or Kullback-Leibler divergence) will be
based on these log quantities. In particular, this is the case for the
$\t$-coordinate system introduced by Amari (see \citeNP{amari:99,amari:01}).
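
For concreteness, here is a small worked example. The numbers and the
choice of base-2 logarithms (bits) are assumptions made only for this
illustration. Take $\O=\{0,1\}^2$ and
\BM\begin{align*}
(p_{00},p_{01},p_{10},p_{11})
&= (\tfrac12,\tfrac14,\tfrac18,\tfrac18) ~, \\
(l_{00},l_{01},l_{10},l_{11})
&= (1,2,3,3) ~, \\
H(p) = {\rm E}_p \{l_x\}
&= \tfrac12\cdot 1 + \tfrac14\cdot 2 + \tfrac18\cdot 3 + \tfrac18\cdot 3
 = \tfrac74 ~.
\end{align*}\EM
Only $|\O|-1=3$ of the four direct coordinates are independent, since
the fourth one follows from normalization.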
886:
In the following we will speak of the bases of coordinate systems.
Essentially, what we mean are basis functions, similar to the sine
and cosine in the Fourier transform. For illustration, we will always
think of $\O$ as the hypercube; the basis functions then correspond to
``colorings'' of the hypercube with function values (mostly $1$, $0$,
or $-1$). E.g., if $e_i:~ \O=\{0,1\}^3 \to \{1,0,-1\}$ is the $i$-th
basis function, then the $i$-th coordinate of a distribution $p$ in
this coordinate system is the inner product of $p$ with $e_i$: $p_i =
\<e_i,p\> := \sum_{x\in\O} e_i(x)\, p(x)$. We illustrate such basis
896: functions by 3D-hypercubes, \raisebox{-3mm}{\input{figtexs/samplecube}}
897: where the bullet corresponds to $1$, the circle to $-1$ and empty
898: vertices to $0$.
899:
The basis of the direct coordinate system is the $\d$-basis: the set of
all hypercubes where only one vertex is $1$ and all others are $0$.
902:
903:
904: \paragraph{Marginals over $k$-tuples of variables and schemata}
905:
906: In the following, we will also need a compact notation for the
907: different marginals of a distribution. Let $\O$ be a product space
908: $\O=\O^1 \times \cdots \times \O^n$ such that we can define the
marginals of a distribution $p$ over single variables but also pairs,
910: triples, and $k$-tuples of variables. We use indices $i,j,..\in I=
911: \{1,..,n\}$ to indicate variables and write the marginals as $p^{ij..}$,
912: \BM\begin{align*}
913: p^{ij..}(a,b,..)=\Pr\{x_i=a,~ x_j=b,~ ..\} ~.
914: \end{align*}\EM
915: The set of all possible marginals is given by considering all single
916: indices $i$, all pairs $i<j$, all triples $i<j<k$, etc. To simplify
917: notation (e.g., summation over such objects), we collect all these
918: tuples of indices in a set
919: \BM\begin{align*}
920: A
921: &= I
922: ~\cup~ \{ (i,j) ~|~ i<j \in I \}
923: ~\cup~ \{ (i,j,k) ~|~ i<j<k \in I \}
924: ~\cup~ \cdots ~\cup~ \{ (1,2,..,n) \} \\
925: &=\{1,..,n,~ (1,2),(1,3)..,(1,n),(2,3),(2,4)...,(n-1,n),~
926: (1,2,3),..,~ (1,2,3,..,n)\} ~.
927: \end{align*}\EM
In that way, all marginals of $p$ are given as $p^a$ for $a \in
A$. Note that $|A|=2^n-1$, which equals $|\O|-1$ for binary variables.
930:
931: Besides using $a \in A$ to indicate a marginal, one can equivalently
932: use the schemata notation of length-$n$ strings in $\{\*,d\}^n$:
933: For a given $a$, the corresponding schema is the string of all
934: $\*$'s except for those positions indicated in the tuple $a$. E.g.,
935: for $n=6$:
936: \BM\begin{align*}
937: p^{245} \equiv p^{\*d\*dd\*}
938: \end{align*}\EM
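As a small illustration (our own example, not taken from the cited
work), for $n=2$ binary variables the index set is $A=\{1,~2,~(1,2)\}$
with $|A|=3=|\O|-1$, and the marginals in tuple and schema notation are
\BM\begin{align*}
p^{1} \equiv p^{d\*} ~,\qquad
p^{2} \equiv p^{\*d} ~,\qquad
p^{12} \equiv p^{dd} = p ~.
\end{align*}\EM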
939:
940:
941:
942: \section{Walsh, $\eta$-, $\theta$-, Building Block, and Haar bases}
943:
944: Table \ref{tabBases} captures the basics of the Walsh, $\eta$-,
945: $\theta$-, and Haar bases. In all cases, the coordinate system is
946: defined by the basis functions $e_i$ depicted for the 3D-case as
947: hypercubes. Actually, these 3D illustrations of the basis functions
948: $e_i$ are already sufficient to infer the basis functions for all $n$
949: since they are constructed in a very systematic way---which seems
950: obvious by simply looking at them and becomes rigorous by considering
the transformation matrices into these coordinate systems:
952:
The transformation matrices map linearly from the direct
coordinates $p_x$ to the new coordinates. E.g., in the Walsh case,
$w_y = \sum_x W_{yx} p_x$. The rows in these matrices correspond to
the basis functions $e_y = W_{y\cdot}$. An important property is that
in all cases (except the Haar basis!), the transformation matrices can
be constructed by repeated tensor products of a $2\times 2$ matrix. For
instance, for $n=2$ in the Walsh case:
960: \BM\begin{align*}
961: W^{n=2}
962: = \matr{rrrr}{1&1&1&1\\1&-1&1&-1\\1&1&-1&-1\\1&-1&-1&1}
963: = \matr{rr}{1&1\\1&-1} \otimes \matr{rr}{1&1\\1&-1}
964: =: \matr{rr}{1&1\\1&-1}^{\!\!\otimes 2}
965: \end{align*}\EM
966: Here, we introduced the superscript notation ${}^{\otimes n}$ to
967: indicate the $n$-fold tensor product.
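
For illustration, a minimal worked case under the conventions above
(the notation ${\rm I}$ for the identity matrix is our assumption): for
$n=1$ and $p=(p_0,p_1)$ we get
\BM\begin{align*}
w_0 = p_0 + p_1 = 1 ~,\qquad
w_1 = p_0 - p_1 ~,\qquad
\matr{rr}{1&1\\1&-1} \matr{rr}{1&1\\1&-1} = \matr{rr}{2&0\\0&2} ~.
\end{align*}\EM
Because tensor products multiply factor-wise, $(M\otimes N)(M'\otimes
N') = MM'\otimes NN'$, the last identity carries over to the $n$-fold
tensor product, $W\, W = 2^n\, {\rm I}$, so the Walsh matrix is its own
inverse up to the factor $2^{-n}$.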
968:
969:
970:
971:
972:
973:
974:
975: \newcommand{\inclfig}[1]{\begin{minipage}[t]{35mm}\raisebox{-43mm}{\input{figtexs/#1}}\end{minipage}}
976:
977:
978: \begin{table}
979: \begin{tabular}{@{}p{43mm}rr@{}}
980: {\bf Walsh} \newline
981: $w_y = \sum_x W_{yx}\, p_x$\newline
$p_x = \frac{1}{2^n}\, \sum_y W_{xy} w_y$\newline
$W_{yx} = (-1)^{|x\, \AND y|}$\newline
$~~~~ = \matr{cc}{\pl & \pl \\ \pl & \mi}^{\!\!\otimes n}$\newline
$W^{-1} = \frac{1}{2^n}\, W$
986: & \begin{tabular}[t]{@{}c@{\,}|@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}}
987: & \rb{000} & \rb{001} & \rb{010} & \rb{011} & \rb{100} & \rb{101} & \rb{110} & \rb{111} \\
988: \hline
989: 000 & \pl & \pl & \pl & \pl & \pl & \pl & \pl & \pl \\
990: 001 & \pl & \mi & \pl & \mi & \pl & \mi & \pl & \mi \\
991: 010 & \pl & \pl & \mi & \mi & \pl & \pl & \mi & \mi \\
992: 011 & \pl & \mi & \mi & \pl & \pl & \mi & \mi & \pl \\
993: 100 & \pl & \pl & \pl & \pl & \mi & \mi & \mi & \mi \\
994: 101 & \pl & \mi & \pl & \mi & \mi & \pl & \mi & \pl \\
995: 110 & \pl & \pl & \mi & \mi & \mi & \mi & \pl & \pl \\
996: 111 & \pl & \mi & \mi & \pl & \mi & \pl & \pl & \mi \\
997: \end{tabular}
998: &\inclfig{walsh} \\\hline
999: {\bf Amari's $\eta$ / BBB} \newline
1000: $\eta_a = \sum_x \bar B_{ax} p_x$\newline
1001: $~~~~ = \sum_x (B^{-1})^T_{ax} p_x$ \newline
1002: $p_x = \sum_a B^T_{xa} \eta_a$\newline
1003: $\bar B = (B^{-1})^T = \matr{cc}{\pl & \pl \\ \ze & \pl}^{\!\!\otimes n}$ \newline
1004: $\bar B^{-1} = \matr{cc}{\pl & \mi \\ \ze & \pl}^{\!\!\otimes n}$
1005: & \begin{tabular}[t]{@{}c@{\,}|@{\,}c@{\,}|@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}}
1006: & & \rb{000} & \rb{001} & \rb{010} & \rb{011} & \rb{100} & \rb{101} & \rb{110} & \rb{111} \\
1007: \hline
1008: $\cdot$ & $\*\*\*$ & \pl & \pl & \pl & \pl & \pl & \pl & \pl & \pl \\
1009: 3 & $\*\*1$ & \ze & \pl & \ze & \pl & \ze & \pl & \ze & \pl \\
1010: 2 & $\*1\*$ & \ze & \ze & \pl & \pl & \ze & \ze & \pl & \pl \\
1011: 23 & $\*11$ & \ze & \ze & \ze & \pl & \ze & \ze & \ze & \pl \\
1012: 1 & $1\*\*$ & \ze & \ze & \ze & \ze & \pl & \pl & \pl & \pl \\
1013: 13 & $1\*1$ & \ze & \ze & \ze & \ze & \ze & \pl & \ze & \pl \\
1014: 12 & $11\*$ & \ze & \ze & \ze & \ze & \ze & \ze & \pl & \pl \\
1015: 123 & $111$ & \ze & \ze & \ze & \ze & \ze & \ze & \ze & \pl \\
1016: \end{tabular}
1017: &\inclfig{eta} \\\hline
1018: {\bf Amari's $\t$} \newline
1019: $\t_a = \sum_x B_{ax} l_x$ \newline
1020: $l_x = \sum_a \bar B^T_{xa} \t_a$ \newline
1021: $B = (\bar B^{-1})^T = \matr{cc}{\pl & \ze \\ \mi & \pl}^{\!\!\otimes n}$ \newline
1022: $B^{-1} = \matr{cc}{\pl & \ze \\ \pl & \pl}^{\!\!\otimes n}$
1023: & \begin{tabular}[t]{@{}c@{\,}|@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}}
1024: & \rb{000} & \rb{001} & \rb{010} & \rb{011} & \rb{100} & \rb{101} & \rb{110} & \rb{111} \\
1025: \hline
1026: $\cdot$ & \pl & \ze & \ze & \ze & \ze & \ze & \ze & \ze \\
1027: 3 & \mi & \pl & \ze & \ze & \ze & \ze & \ze & \ze \\
1028: 2 & \mi & \ze & \pl & \ze & \ze & \ze & \ze & \ze \\
1029: 23 & \pl & \mi & \mi & \pl & \ze & \ze & \ze & \ze \\
1030: 1 & \mi & \ze & \ze & \ze & \pl & \ze & \ze & \ze \\
1031: 13 & \pl & \mi & \ze & \ze & \mi & \pl & \ze & \ze \\
1032: 12 & \pl & \ze & \mi & \ze & \mi & \ze & \pl & \ze \\
1033: 123 & \mi & \pl & \pl & \mi & \pl & \mi & \mi & \pl \\
1034: \end{tabular}
1035: &\inclfig{theta} \\\hline
1036: {\bf Haar}\newline
1037: please see \cite{khuri:94}
1038: & \begin{tabular}[t]{@{}c@{\,}|@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}}
1039: & \rb{000} & \rb{001} & \rb{010} & \rb{011} & \rb{100} & \rb{101} & \rb{110} & \rb{111} \\
1040: \hline
1041: 000 & \pl & \pl & \pl & \pl & \pl & \pl & \pl & \pl \\
1042: 001 & \pl & \pl & \pl & \pl & \mi & \mi & \mi & \mi \\
1043: 010 & \pl & \pl & \mi & \mi & \ze & \ze & \ze & \ze \\
1044: 011 & \ze & \ze & \ze & \ze & \pl & \pl & \mi & \mi \\
1045: 100 & \pl & \mi & \ze & \ze & \ze & \ze & \ze & \ze \\
1046: 101 & \ze & \ze & \pl & \mi & \ze & \ze & \ze & \ze \\
1047: 110 & \ze & \ze & \ze & \ze & \pl & \mi & \ze & \ze \\
1048: 111 & \ze & \ze & \ze & \ze & \ze & \ze & \pl & \mi \\
1049: \end{tabular}
1050: &\inclfig{haar}
1051: \end{tabular}
1052: \caption{\label{tabBases}
Overview of the different bases for the space of distributions. The
first column gives the definitions of the transformations and their
inverses. Note that the $\t$-basis is defined in log-space. The
transformation matrices are illustrated in the second column
for $n=3$ using the symbols $\pl =1$, $\mi=-1$, and $\ze=0$. The third
column illustrates the basis functions $e_y$ (or $e_a$) as colorings
of the hypercube $\{0,1\}^3$. Note that the basis functions
correspond to rows of the transformation matrix. The 1-norm $|x\, \AND y|$
of the \AND of two binary strings counts the 1-bits that they have in common.
1062: }
1063: \end{table}
1064:
1065:
1066:
1067: Table \ref{tabBases} summarizes the most important properties of these
1068: transformation matrices: their closed form expression, their tensor
1069: product construction, and their inverse. When looking at the table one
should first observe the self-similar regularity of the transformation
matrices, which stems from their definition as repeated tensor
products. The meaning of the various bases becomes more intuitive when
looking at the hypercube illustrations of the basis functions. The Walsh basis,
e.g., can nicely be compared to a Fourier basis: $e_{000}$ corresponds
to the constant function $1$; $e_{001},e_{010},e_{100}$ could be viewed
as sine functions along the $x$-, $y$-, and $z$-axes, respectively;
$e_{011},e_{101},e_{110}$ are products of sine functions and
capture 2nd order dependencies; and $e_{111}$ is the ``highest
frequency'' basis function capturing 3rd order dependencies.
1080:
The $\eta$-basis captures certain marginals relative to the all-1s string:
1082: \BM\begin{align*}
1083: \eta_a=p^a(11..) ~.
1084: \end{align*}\EM
These can be thought of as the marginals over all possible
Building-Blocks; thus it is also called the Building-Block-Basis
(BBB, cf.\ \citeNP{chryssomalakos-stephens:04}). This marginalization becomes apparent
1088: in the hypercube colorings as the abundance of zeros (non-colored
1089: vertices and dots in the matrix).
1090:
The $\t$-basis combines the ``frequency'' idea of the Walsh
basis with the marginalization: The highest order basis function
$e_{123}$ is analogous to the Walsh basis function $e_{111}$ and detects
highest order dependencies. Lower order dependencies though are only
detected on a marginal.
1096:
However, note that the $\t$-basis is defined in log-space, $\t_a =
\sum_x B_{ax} \log p_x$. We will find some implications of this in the next
section. Note that the transformation matrices of the $\eta$-
(Building-Block-) and the $\t$-bases are related via $B = (\bar
B^{-1})^T$.
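
To make the two coordinate systems concrete, consider as an
illustration the case $n=2$ with $x=x_1x_2$ (a sketch read off from the
tensor-product matrices in table \ref{tabBases}; the subscripts of $p$
denote the binary string $x$):
\BM\begin{align*}
\eta_1 &= p_{10}+p_{11} ~,&
\eta_2 &= p_{01}+p_{11} ~,&
\eta_{12} &= p_{11} ~,\\
\t_1 &= \log\frac{p_{10}}{p_{00}} ~,&
\t_2 &= \log\frac{p_{01}}{p_{00}} ~,&
\t_{12} &= \log\frac{p_{00}\, p_{11}}{p_{01}\, p_{10}} ~.
\end{align*}\EM
The coordinate $\t_{12}$ is the log odds ratio: it vanishes exactly
when $p$ factorizes into the product of its two marginals, whereas
$\eta_{12}$ is in general non-zero also for independent variables.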
1102:
For completeness, we also indicated the Haar basis in table
\ref{tabBases}. It cannot be derived as repeated tensor products and
we do not discuss it any further here. One argument made about the
Haar basis \cite{khuri:94} is that the transformation matrix contains
many 0s. Thus, the coefficients are more efficient to compute than
the Walsh coefficients. We add here that the $2\times 2$ factor of the
$\eta$ and $\t$ transformation matrices has three nonzero entries out
of four, so the $n$-fold tensor product has $3^n$ nonzero entries out
of $4^n$: the ratio of zeros is $1-(3/4)^n$ and
approaches $1$ exponentially with the dimension $n$.
1111:
1112:
1113:
1114:
1115:
1116:
1117:
1118:
1119:
1120: \section{Mathematical structure on the manifold $\L$}
1121:
1122: In this section we want to develop a more geometric view on the
1123: manifold of distributions, following \cite{amari:99,amari:01}. This
1124: geometry will put a special emphasis on the $\eta$- and $\t$-bases.
1125:
1126:
1127: \paragraph{$m$- and $e$-geodesics}
1128:
1129: An essential ingredient to describe the geometry of a manifold is the
1130: definition of the notion of ``straight lines'', or geodesics, connecting two
1131: points in the manifold. In the case of the manifold of distributions,
1132: there exist at least two ways of defining a straight path connecting two
1133: distributions $q$ and $r$: the one being the linear mixture in direct
1134: coordinates $p_x$, the other being the linear mixture in $\log$
1135: coordinates $l_x$,
1136: \BM\begin{align*}
1137: \text{$m$-geodesic:}\qquad& p(x) = (1\!-\!\a)\, q(x) + \a\, r(x) ~,\\
\text{$e$-geodesic:}\qquad& \log p(x) = (1\!-\!\a)\, \log q(x) + \a\, \log r(x) - \psi(\a) ~.
\end{align*}\EM
Here $m$ means \emph{mixture} and $e$ means \emph{exponential}. The
additional term $\psi(\a)$ in the $e$-geodesic, which depends on $\a$
but not on $x$, is necessary to preserve
the normalization of $p(x)$.
1143:
1144: The fact that there exist two ways of defining geodesics means that
1145: there exist two meaningful \emph{affine connections} on the manifold.
1146: %(instead of only the Christoffel symbol derived from the Fisher
1147: %metric---the manifold is non-Riemannian).
Both define a notion of
flatness: we say that an $m$-geodesic is $m$-flat and an $e$-geodesic
is $e$-flat.
1151:
1152: It turns out that the coordinate lines (and planes, hyperplanes, etc.)
1153: of $\eta$ are $m$-flat and those of $\t$ are $e$-flat. The former is
1154: obvious, since an $m$-geodesic can equivalently be written in the
1155: $\eta$ coordinate system as $\eta_a(p) = (1\!-\!\a)\, \eta_a(q) + \a\,
1156: \eta_a(r)$. The second becomes apparent when realizing that the Taylor
1157: expansion of $\log p$ reads
1158: \BM\begin{align*}
1159: l_x
1160: = \sum_i \t_i x_i
1161: + \sum_{i<j} \t_{ij}\, x_i x_j
1162: + \sum_{i<j<k} \t_{ijk}\, x_i x_j x_k
1163: + \cdots + \t_{1..n}\, x_1..x_n - \psi
1164: = \sum_{a\in A} \t_a X^a - \psi
1165: \end{align*}\EM
1166: where $X^a$ is the product of the components $x_{i_1} x_{i_2}\cdots x_{i_k}
1167: \in \{0,1\}$ when $a=(i_1,i_2,..,i_k)$. Thus, an $e$-geodesic is
1168: written, in the $\t$ coordinate system, simply as $\t_a(p) = (1\!-\!\a)\,
1169: \t_a(q) + \a\, \t_a(r)$.
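
One way to fill in this step (a short sketch using the matrix $B$ of
table \ref{tabBases}) is to apply $\t_a = \sum_x B_{ax} \log p_x$
directly to the definition of the $e$-geodesic. For every $a\in A$ the
row $B_{a\cdot}$ contains at least one tensor factor $(-1,~1)$ and
therefore sums to zero, so the normalization term $\psi$, being
constant in $x$, drops out:
\BM\begin{align*}
\t_a(p)
&= \sum_x B_{ax} \big[ (1\!-\!\a)\, \log q(x) + \a\, \log r(x) - \psi \big] \\
&= (1\!-\!\a)\, \t_a(q) + \a\, \t_a(r) - \psi \sum_x B_{ax}
= (1\!-\!\a)\, \t_a(q) + \a\, \t_a(r) ~.
\end{align*}\EM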
1170:
1171:
1172: \paragraph{Fisher metric, Kullback-Leibler divergence}
1173:
1174: On this manifold $\L$, there is a metric defined, the \emph{Fisher
1175: metric}. In \emph{arbitrary} coordinates $v_i$ (it could be any of
1176: the Walsh, log, $\eta$-, or $\t$-coordinates), it reads
1177: \BM\begin{align*}
1178: g_{ij}(p) = {\rm E}\left\{ \frac{\del \log p}{\del v_i}\, \frac{\del \log p}{\del v_j}\right\} ~.
1179: \end{align*}\EM
1180: Some intuition can be gained by realizing that, locally, the distance
1181: measured by the Fisher metric coincides with the distance measured by
1182: the Kullback-Leibler divergence:\footnote{ The Kullback-Leibler
1183: divergence $\kld{p}{q}$ (also called relative entropy or divergence)
1184: is a measure for the loss of information (or gain of entropy) when a
\emph{true} distribution $p$ is approximated by a model
distribution $q$. For example, when $p(x,y)$ is approximated by
$p(x)\,p(y)$ one loses information on the mutual dependence between
1188: $x$ and $y$. Accordingly, the relative entropy
1189: $\kld{p(x,y)}{p(x)\,p(y)}$ is equal to the mutual information
1190: between $x$ and $y$. Generally, when \emph{knowing} the real
1191: distribution $p$ one needs on average $H(p)$ (entropy of $p$) bits to
1192: describe a random sample. If, however, we know only an approximate
1193: model $q$ we would need on average $H(p) + \kld{p}{q}$ bits to
1194: describe a random sample of $p$. The loss of knowledge about the
1195: true distribution induces an increase of entropy and thereby an
1196: increase of average description length for random samples. }
Consider a point $p \in \L$ and a nearby point $p+\d p$. When we
measure the squared length $\<\d p,\d p\>$ of the variation $\d p$ by
twice the Kullback-Leibler divergence we find
\BM\begin{align*}
\<\d p,\d p\>
= 2\, \kld{p}{p+\d p}
= 2\, {\rm E}\left\{ \log p - \log (p+\d p)\right\}
\ddot=~ 2\, {\rm E}\left\{- \frac{\d p}{p} + \frac{\d p^2}{2\, p^2} \right\}
= {\rm E}\left\{\frac{\d p^2}{p^2} \right\} ~.
\end{align*}\EM
Here, the 2nd-order approximation stems from the Taylor expansion of
$\log(p+\d p)$ and ${\rm E}\{\d p/p\} =0$ since $\sum_x \d p(x)=0$ to
preserve normalization. Note that, in this infinitesimal neighborhood,
the Kullback-Leibler divergence becomes symmetric. Generalizing this
to two small variations $\d_1 p= \del_{v_i} p := \frac{\del p}{\del v_i}$ and $\d_2
p= \del_{v_j} p := \frac{\del p}{\del v_j}$ induced by small shifts along some
coordinates we have
\BM\begin{align*}
\<\del_{v_i} p,\del_{v_j} p\>
= {\rm E}\left\{\frac{\del_{v_i} p}{p}\, \frac{\del_{v_j} p}{p} \right\}
= {\rm E}\left\{\frac{\del \log p}{\del v_i}\, \frac{\del \log p}{\del v_j} \right\}
\end{align*}\EM
and retrieve the Fisher metric. In turn, the Fisher metric can also be derived by considering the second order derivatives of the Kullback-Leibler divergence:
\BM\begin{align*}
g_{ij}(p) = \frac{\del}{\del v_i}\frac{\del}{\del v_j} \kld{p}{p+\d v}\Big|_{\d v=0} ~.
\end{align*}\EM
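As a sanity check (an illustrative example in our notation, not taken
from the cited work), take a single binary variable, $n=1$, with the
coordinate $\eta := \eta_1 = p_1$, so that $p_0 = 1-\eta$. Then
$\del \log p_1/\del\eta = 1/\eta$ and $\del \log p_0/\del\eta =
-1/(1-\eta)$, and the definition of the Fisher metric gives
\BM\begin{align*}
g_{\eta\eta}
= \eta\, \frac{1}{\eta^2} + (1-\eta)\, \frac{1}{(1-\eta)^2}
= \frac{1}{\eta\, (1-\eta)} ~.
\end{align*}\EM
The metric diverges towards the boundary $\eta\to 0$ or $\eta\to 1$: a
fixed shift in $\eta$ is a much larger change, in the Kullback-Leibler
sense, for a nearly deterministic variable than for a uniform one.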
1221:
1222:
1223: \paragraph{Orthogonality of $\eta$ and $\t$, the Pythagoras}
1224:
1225: The coordinate systems $\eta$ and $\t$ have a crucial property w.r.t.\
1226: the Fisher metric---they are mutually orthogonal: At any point $p$ in
1227: the manifold the variations induced by shifts along $\t$ and $\eta$
1228: coordinates fulfill
1229: \BM\begin{align*}
1230: \<\del_{\t_a} p, \del_{\eta_b} p\> = \d_{ab} ~,
1231: \end{align*}\EM
1232: where $\d_{ab}$ is the Kronecker delta. Based on this one can derive a
1233: Pythagoras theorem: Let $p$, $r$ and $q$ be three distributions where
1234: the $m$-geodesic connecting $p$ and $r$ is orthogonal to the
1235: $e$-geodesic connecting $r$ and $q$, then
1236: \BM\begin{align*}
1237: \kld{p}{q} = \kld{p}{r} + \kld{r}{q} ~.
1238: \end{align*}\EM
Please see figure \ref{figPy} for an illustration.
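
Returning to the orthogonality relation, it can be verified by hand in
the simplest case (an illustrative check under our own choice of
example): for a single binary variable with $\eta=p_1$ and
$\t=\log(p_1/p_0)$ we have $p=(1-\eta,~\eta)$, hence $\del_\eta p =
(-1,~1)$, and from $\eta = e^\t/(1+e^\t)$ also $\del_\t p =
\eta\,(1-\eta)\,(-1,~1)$. With $\<u,v\> = \sum_x u_x v_x / p_x$ this
gives
\BM\begin{align*}
\<\del_\t p, \del_\eta p\>
= \eta\,(1-\eta) \left[ \frac{1}{1-\eta} + \frac{1}{\eta} \right]
= 1 ~.
\end{align*}\EM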
1240:
1241: \begin{figure}\center
1242: \input figtexs/pythagoras
1243: \caption{\label{figPy}
1244: The Pythagoras in the case when a certain $k$-cut is used to define
the $m$- and $e$-geodesics connecting to $r$, respectively $r'$. It
1246: holds: $\kld{p}{q} = \kld{p}{r} + \kld{r}{q}$ and $\kld{q}{p} = \kld{q}{r'} + \kld{r'}{p}$.
1247: }
1248: \end{figure}
1249:
1250:
1251: \paragraph{$k$-cuts}
1252:
1253: Let $k$ denote an order of interactions that we are interested in.
1254: Then, the coordinates split into those describing interactions of
1255: order $\le k$ and those describing interactions of order $> k$,
1256: \BM\begin{align*}
1257: \vec \eta_k &:= (\text{all $\eta_a$ of order $|a| \le k$}) ~,\\
1258: \vec \t_{k^*} &:= (\text{all $\t_a$ of order $|a| >k$}) ~.
1259: \end{align*}\EM
1260:
1261: These can be mixed into a new coordinate system $(\vec \eta_k,\vec
1262: \t_{k^*})$. The point is that those dimensions spanned by $\vec
1263: \eta_k$ are orthogonal to those spanned by $\vec \t_{k^*}$. To
1264: simplify the discussion we call $\vec \eta_k$ \emph{marginals}
1265: (although they include marginals over $k$-tuples of variables) and
1266: $\vec\t_{k^*}$ \emph{higher order interactions}. Keeping the marginals
1267: $\vec \eta_k$ constant defines $m$-flat sub-manifolds $M_k(\eta_k)$,
1268: which are disjoint for different $\vec \eta_k$ and cover all $\L$.
1269: Keeping higher order interactions $\vec \t_{k^*}$ constant defines
1270: $e$-flat sub-manifolds $E_{k^*}(\t_{k^*})$, which are disjoint for
1271: different $\vec \t_{k^*}$ and cover all $\L$.
1272:
1273:
1274:
1275: \paragraph{Complete decomposition of different order interactions}
1276:
1277: Given a distribution $p$, we define its $k$th order reduction
1278: $p^{(k)}$ as the distribution with same marginals $\vec \eta_k(p)$ as
1279: $p$ but vanishing higher order interactions $\vec \t_{k^*}=0$,
1280: %
1281: \BM\begin{align*}
1282: p^{(k)} = (\vec \eta_k(p),\vec \t_{k^*}=0) ~.
1283: \end{align*}\EM
1284: %
That is, $p^{(k)}$ is the same distribution as $p$ except that all
interactions of order $>k$ have been canceled. Given the Pythagoras it should be clear
1288: that $p^{(k)}$ can also be defined as the orthogonal projection of $p$
1289: onto the submanifold $E_{k^*}(0)$ or as the orthogonal projection of
1290: the uniform distribution $p^{(0)}$ onto $M_k(\vec\eta_k(p))$, please
1291: see figure \ref{figDecomp} left,
1292: \BM\begin{align*}
1293: p^{(k)}
1294: = \argmin{q \in E_{k^*}(0)} \kld{p}{q}
1295: = \argmin{q \in M_k(\vec\eta_k(p))} \kld{q}{p^{(0)}} ~.
1296: \end{align*}\EM
Further, define $D_k(p) = \kld{p^{(k)}}{p^{(k-1)}}$. Then, since
$p^{(n)}=p$, the
Pythagoras allows us to decompose the mutual information $I(p)$ in $p$
(i.e., the measure of all interactions in $p$) into a sum of different
order interactions:
1301: \BM\begin{align*}
1302: I(p) = \kld{p}{p^{(1)}} = \sum_{k=2}^n D_k(p)
1303: \end{align*}\EM
1304: Please see figure \ref{figDecomp} right for an illustration.
1305:
This result should be highlighted. The given formalism allows us to
explicitly distinguish different order interactions between variables
1308: in a distribution and directly assigns coordinates $\t$ to those
1309: different order interactions. The quantities $D_k(p) =
1310: \kld{p^{(k)}}{p^{(k-1)}}$ measure precisely and only the $k$th-order
1311: interactions in entropic units.
1312:
1313: For instance, consider three random variables $X_1,\, X_2,\, X_3$
which are pair-wise dependent in the sense $I(X_i;X_j) \not=0$. The
1315: question is whether there exist ``true'' 3rd-order interactions or only
1316: concatenated 2nd-order interactions---in other terms, can they be
1317: described by a Markov process $X_1 \to X_2 \to X_3$. The formalism
1318: gives an answer: if $D_3(p)=0$ it is a Markov process, otherwise there
1319: exist 3rd-order interactions.
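
The complementary extreme can be worked out by hand (our own
illustrative example; strictly, this $p$ lies on the boundary of the
manifold because some probabilities vanish, so its $\t$-coordinates are
not finite and the example should be read as a limiting case). Let
$X_1,X_2$ be independent uniform bits and $X_3 = X_1 \oplus X_2$. All
single and pairwise marginals of $p$ coincide with those of the uniform
distribution on $\{0,1\}^3$, whose $\t$-coordinates all vanish, so
$p^{(1)}=p^{(2)}$ is uniform with probability $1/8$ per string, while
$p$ puts probability $1/4$ on four strings. Hence
\BM\begin{align*}
D_2(p) = \kld{p^{(2)}}{p^{(1)}} = 0 ~,\qquad
D_3(p) = \kld{p}{p^{(2)}} = 4 \cdot \frac{1}{4}\, \log\frac{1/4}{1/8} = \log 2 = I(p) ~.
\end{align*}\EM
All of the mutual information in this distribution is of 3rd order.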
1320:
1321: \begin{figure}\center
1322: \input{figtexs/decomp}\hfill
1323: \input figtexs/cuts
1324: \caption{\label{figDecomp}
1325: The left figure illustrates a distribution $p$ and its $k$th-order
1326: reduction $p^{(k)}$: It is the orthogonal projection of $p$ along
$M_k(p)$ onto $E_{k^*}(0)$. The ``distance'' $D(p:p^{(k)})$ measures
the ``norm'' of $\vec\t_{k^*}$, i.e., it measures the amount of mutual
1329: information of order higher than $k$. The right figure illustrates
1330: the complete decomposition of $p$ in reductions $p^{(k)}$ of all
1331: orders. Every projection from $p^{(k)}$ to $p^{(k-1)}$ is an
1332: orthogonal projection onto $E_{(k-1)^*}(0)$. Every ``distance''
1333: $D(p^{(k)}:p^{(k-1)})$ measures the mutual information specifically
1334: of order $k$. }
1335: \end{figure}
1336:
1337:
1338:
1339:
1340:
1341:
1342: \section{Geometric view on evolution operators}
1343:
1344: \paragraph{Crossover}
1345:
In Evolutionary Algorithms, crossover is one means of mixing a parent
population into an offspring population. Populations can be formalized
as distributions $p$ and a definition of a simple form of crossover
(uniform crossover parameterized with $c\in[0,1]$) reads
1350: \BM\begin{align*}
1351: \CC p &= (1-c)\, p + c\, p^{(1)} ~.
1352: \end{align*}\EM
1353: See, for instance, \cite{toussaint:03-gecco-cross} for a general
1354: definition of a crossover operator in more conventional notation and
1355: details of when it reduces to this simple form.
1356:
1357: This crossover simply mixes the original distribution (or population)
1358: $p$ with its $1$st-order reduction. The $1$st-order reduction is the
1359: product of all single variable marginals, i.e., it is the distribution
1360: with the same marginals (gene frequencies) as $p$ but all dependencies
1361: (gene linkages) between the variables eliminated. From the geometrical
1362: point of view, crossover makes a step along the $m$-geodesic
connecting $p$ and $p^{(1)}$. It can be illustrated as a step along
the projection onto the submanifold $E_{1^*}(0)$, please see figure
\ref{figCross}.
1366:
1367:
1368: From this view it becomes clear that a reasonable coordinate system to
1369: describe crossover is $(\vec\eta_1,\vec\t_{1^*})$. Crossover does not
1370: change $\vec\eta_1$ (it operates orthogonally to $\eta_1$) but
continuously reduces the $\vec\t_{1^*}$ variables. That $\vec\t_{1^*}$
are reduced in magnitude and not increased is intuitive from figure \ref{figCross}
and becomes apparent from the fact that
the ``distance'' from $p$ to $p^{(1)}$, $I(p)=\kld{p}{p^{(1)}}$, is a
norm of $\vec\t_{1^*}$. \hide{ More explicitly, consider the
1376: derivative of the $\t$-coordinates along the path, i.e., with
1377: varying $c$, \BM\begin{align*}
1378: \frac{\del}{\del c} p_x(c) &= p_x^{(1)} - p_x \\
1379: \frac{\del}{\del c} \t_a(c) &= \sum_x B_{ax}\, \frac{\del}{\del c}
1380: l_x(c) = \sum_x B_{ax}\, \frac{p_x^{(1)}-p_x}{p_x(c)}
1381: \end{align*}\EM
1382:
1383: -- In the $\O=\{0,1\}^2$ two gene case we have
1384: \BM\begin{align*}
1385: & |a|\le 1 \To \eta_a(c) = \eta_a \\
1386: &\eta_{12}(c) = (1-c)\, \eta_{12} + c\, \eta_1\, \eta_2
1387: \end{align*}\EM
1388: with crossover probability $c$. Further
1389: \BM\begin{align*}
1390: \dot\eta_{12}
1391: &= -\eta_{12} + \eta_1\, \eta_2 \\
1392: \t_{12}
1393: &= \sum_x B_{(12)x} \log \sum_a B^T_{xa} \eta_a \\
1394: \dot \t_{12}
1395: &= \sum_x B_{(12)x} \frac{1}{p_x}[\sum_a B^T_{xa} \dot\eta_a]
1396: = \sum_x B_{(12)x} \frac{B^T_{x(12)} \dot\eta_{12}}{p_x}
1397: = \dot\eta_{12} \sum_x \frac{B_{(12)x} B^T_{x(12)} }{p_x}
1398: = \dot\eta_{12} \sum_x \frac{1}{p_x}
1399: \end{align*}\EM
1400: where the $\dot{}$ means the derivative $\del_c$ at $c=0$. Since, on
1401: the path $c\in [0,1]$, $\dot\eta_{12}$ is constant and does not change
the sign, $\t_{12}$ monotonically approaches zero.
1403: }
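
For illustration, consider two binary genes, $\O=\{0,1\}^2$ (a sketch
in the $\eta$-coordinates; the example is ours). The first-order
reduction has $\eta_{12}(p^{(1)}) = \eta_1\, \eta_2$, and since
crossover is a linear mixture it leaves $\eta_1$ and $\eta_2$ unchanged
and yields
\BM\begin{align*}
\eta_{12}(\CC p) - \eta_1\, \eta_2
= (1-c)\, \eta_{12} + c\, \eta_1\, \eta_2 - \eta_1\, \eta_2
= (1-c)\, \big( \eta_{12} - \eta_1\, \eta_2 \big) ~.
\end{align*}\EM
The covariance of the two genes thus shrinks by the factor $(1-c)$ with
every application of $\CC$, while the gene frequencies stay fixed.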
1404:
1405: \begin{figure}\center
1406: \input{figtexs/cross}
1407: \caption{\label{figCross}
1408: Crossover is an operator that takes a step along the projection of
1409: $p$ towards the first order reduction $p^{(1)}$.}
1410: \end{figure}
1411:
1412: \paragraph{Max Entropy}
1413:
1414: \citeN{wright-et-al:04} recently proposed an evolutionary search
1415: scheme that constructs the new search distribution (offspring
1416: population) via a maximum entropy principle: From the parent
population all second order schema frequencies are calculated. Then,
1418: from all the distributions which have the same second order schema
1419: frequencies, the new offspring distribution is the one with maximum
1420: entropy.
1421:
1422: In our formalism, constraining the schema frequencies corresponds to
1423: fixing $\vec \eta_2$, i.e., constraining the offspring distribution to
1424: the submanifold $M_2(\vec \eta_2)$. The distribution with maximal
1425: entropy in $M_2(\vec \eta_2)$ must have minimal higher order
1426: (3rd-order or higher) interactions $\vec\t_{2^*}$ since interactions
1427: (mutual information) reduce entropy. Thus, the max entropy rule simply
1428: amounts to setting $\vec\t_{2^*}=0$, i.e., choosing
1429: $p^{(2)}=(\vec\eta_2,0)$ as the new offspring distribution.
1430:
1431: Again, this can be viewed geometrically as the orthogonal projection
1432: of the parent population $p$ onto $E_{2^*}(0)$ according to
1433: \BM\begin{align*}
1434: \argmin{q \in E_{2^*}(0)} \kld{p}{q}
1435: \end{align*}\EM
1436: or as the orthogonal projection of the uniform distribution $p^{(0)}$
1437: onto $M_2(\vec\eta_2)$
1438: \BM\begin{align*}
1439: \argmin{q \in M_2(\vec\eta_2)} \kld{q}{p^{(0)}} ~.
1440: \end{align*}\EM
1441: This latter way of writing the max entropy principle is quite
1442: intuitive: find the distribution that fulfills the required constraints
1443: (lies on $M_2(\vec\eta_2)$) but is closest to the uniform distribution
1444: $p^{(0)}$.
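
The equivalence with entropy maximization can be made explicit in one
line (a standard identity, stated here for the uniform distribution
$p^{(0)}(x) = 1/|\O|$):
\BM\begin{align*}
\kld{q}{p^{(0)}}
= \sum_x q(x)\, \log \frac{q(x)}{1/|\O|}
= \log|\O| - H(q) ~.
\end{align*}\EM
Minimizing the divergence to the uniform distribution over
$M_2(\vec\eta_2)$ is therefore the same as maximizing the entropy
$H(q)$ over $M_2(\vec\eta_2)$.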
1445:
Finally, note the strong analogy between the maximum entropy principle
proposed by \cite{wright-et-al:04} and the simple crossover operator
given before: Crossover moves $p$ toward $p^{(1)}$, while the search
heuristic considered by Wright et al.\ chooses $p^{(2)}$ as the new
search distribution.
1451:
1452:
1453:
1454: \section{Discussion}
1455:
1456: The methods information geometry provides to analyze and describe the
1457: structure of distributions are deeply grounded in information
1458: theory. For instance, it seems very beneficial to have coordinate
1459: systems for distributions which capture precisely arbitrary $k$th
1460: order interactions between variables and have a direct link to
1461: measures like mutual information and the Kullback-Leibler
1462: divergence. Also the geometric aspects, e.g., that some operations
1463: can be described as orthogonal to certain submanifolds, add to a more
1464: comprehensive picture of the space of distributions. In that sense,
1465: information geometric methods enhance more common approaches in
Evolutionary Computation, like the Walsh basis, in describing the
1467: structure of distributions and operators.
1468:
However, the question remains whether and how these methods can be
used (1) to actually propose new heuristic search algorithms or (2) to
provide new theoretical tools to analyze the dynamics of evolutionary
processes.
1473:
1474:
1475: \subsection*{Acknowledgment}
1476:
1477: I would like to thank the German Research Foundation (DFG) for their
1478: funding of the Emmy Noether fellowship TO 409/1-1.
1479:
1480: \footer\small
1481: \bibliography{/cygdrive/c/home/tex/bibs}
1482: \end{document}
1483: