
1: %------------------------------------------------------------------------------

2: % standard

3: \newcommand{\mytex}{}

4: \newcommand{\stdpackages}{

5:   \usepackage{amsmath}

6:   \usepackage{amssymb}

7:   \usepackage{amsfonts}

8:   \allowdisplaybreaks

9:   \usepackage{amsthm}

10:   \usepackage{eucal}

11:   %\usepackage{\mytex ntheorem}

12:   %\usepackage{\mytex calrsfs}

13:   %\usepackage{\mytex calligra}

14:   \usepackage{graphicx}

15:   \usepackage{color}

16:   %\usepackage{psfrag}

17:   \usepackage{multicol}

18:   \usepackage{fancyhdr}

19:   \renewcommand{\headrulewidth}{.0pt}\renewcommand{\footrulewidth}{.0pt}\cfoot{}

20:   %\setlength{\headsep}{10mm}

21:   \fancyhead[OL]{\it\theauthor---\today}

22:   %\fancyhead[OL]{\rightmark}

23:   \fancyhead[ER]{\leftmark}

24:   \fancyhead[OR,EL]{\thepage}

25:   \fancyfoot[EL,OR]{}

26:

27:   \newcommand{\draft}{\usepackage[light,first]{draftcopy}\draftcopyName{draft}{350}}

28:   \newcommand{\labels}{\usepackage{\mytex showlabels}}

29:   \newcommand{\maple}{\usepackage{maple2e}}

30:   \newcommand{\makeidx}{\usepackage{makeidx}\makeindex}

31:   \newcommand{\chicago}{\usepackage{\mytex chicago}\bibliographystyle{\mytex chicago}

32:     \renewcommand{\refname}{References\thispagestyle{empty}\renewcommand{\refname}{}}}

33:   \newcommand{\numberlines}{

34:     \usepackage[mathlines,modulo]{\mytex lineno} %options: pagewise, modulo, mathlines

35:     \newcommand{\BM}{\begin{linenomath}}

36:     \newcommand{\EM}{\end{linenomath}}

37:     \linenumbers

38:     \modulolinenumbers[5]

39:   }%\newcommand{\BM}{}\newcommand{\EM}{}

40:   \newcommand{\pdflatex}{

41:     \definecolor{bluecol}{rgb}{0,0,.5}

42:     \definecolor{greencol}{rgb}{0,.6,0}

43:     \usepackage[

44:     pdftex,

45: %    letterpaper,

46:     bookmarks,

47:     bookmarksnumbered,

48:     colorlinks,

49:     urlcolor=bluecol,

50:     citecolor=bluecol,

51:     linkcolor=bluecol,

52:     pagecolor=bluecol,

53:     pdfborder={0 0 0},

54: %    backref,     %link from bibliography back to sections

55: %    pagebackref, %link from bibliography back to pages

56: %    pdfstartview=FitH, %fitwidth instead of fit window

57:     pdfpagemode=None, %UseOutlines, %bookmarks are displayed by acrobat

58: %    pdftitle={\thetitle},

59:     pdfauthor={Marc Toussaint}

60:     ]{hyperref}

61:     \DeclareGraphicsExtensions{.jpg,.pdf}

62:     \renewcommand{\r}{\varrho}

63:     \renewcommand{\l}{\lambda}

64:     \renewcommand{\L}{\Lambda}

65:     \renewcommand{\s}{\sigma}

66:     \renewcommand{\O}{\Omega}

67:     \renewcommand{\SS}{{\cal S}}

68:     \renewcommand{\boldsymbol}{}

69:     %\renewcommand{\Chapter}{\chapter}

70:     %\renewcommand{\Subsection}{\subsection}

71:   }

72: }

73: \newcommand{\stdtheorems}{

74:   \theoremstyle{plain}

75:   \newtheorem{theorem}{Theorem}[section]

76:   \newtheorem{lemma}[theorem]{Lemma}

77:   \newtheorem{corollary}[theorem]{Corollary}

78:   \newtheorem{proposition}{Proposition}[section]

79:   \newtheorem{result}{Result}[section]

80:   \newtheorem{hypothesis}{Hypothesis}[section]

81:   \theoremstyle{definition}

82:   \newtheorem{definition}{Definition}[section]

83:   \theoremstyle{remark}

84:   \newtheorem{remark}{Remark}[section]

85:   \newtheorem{example}{Example}[section]

86: }

87: \newcommand{\stdstyle}[1]{

88:   \stdpackages

89:   \stdtheorems

90:   \renewcommand{\labelenumi}{\textbf{(\roman{enumi})}}

91:   \renewcommand{\theenumi}{(\roman{enumi})} %for ref

92:   %\renewcommand{\labelenumi}{${}^{\bf (\roman{enumi})}$}

93:   %\renewcommand{\labelitemi}{\bf $\cdot$}

94:   \newcommand{\itemdot}{\renewcommand{\labelitemi}{\bf $\cdot$}}

95:   \newcommand{\enumA}{\renewcommand{\labelenumi}{\textbf{\Alph{enumi}}}}

96:   \newcommand{\blockindent}{3ex}

97:   \renewcommand{\baselinestretch}{#1}

98:   \renewcommand{\arraystretch}{1.2}

99:   %\renewcommand{\textfloatsep}{3ex}

100:   %\setlength{\mathindent}{2.5em}

101:   %\setlength{\jot}{0pt} %between the math lines

102:   %\setlength{\abovedisplayskip}{-10pt}

103:   %\setlength{\belowdisplayskip}{-10pt}

104:   %\setlength{\mathsurround}{-10pt}

105:   %\renewcommand{\floatsep}{-1ex}

106:   \renewcommand{\topfraction}{1}

107:   \renewcommand{\bottomfraction}{1}

108:   \renewcommand{\textfraction}{0}

109:   \columnsep 5ex

110:   \parindent 3ex

111:   \parskip 1ex

112:

113:   % Lists and paragraphs

114:   \parindent 0pt

115:   \topsep 4pt plus 1pt minus 2pt

116:   \partopsep 1pt plus 0.5pt minus 0.5pt

117:   \itemsep 2pt plus 1pt minus 0.5pt

118:   \parsep 2pt plus 1pt minus 0.5pt

119:   \parskip .5pc

120:

121:   \setcounter{tocdepth}{3}

122:   \setcounter{secnumdepth}{3}

123:

124:   \usepackage{\mytex geometry}

125:   \geometry{a4paper,hdivide={35mm,*,35mm},vdivide={35mm,*,35mm}}

126:

127:   %\usepackage{layout}\layout

128:

129:   %\thispagestyle{fancy}

130:   %\pagestyle{fancy}

131:

132:   \renewenvironment{abstract}

133:     {\vspace*{5ex}\begin{rblock}\hrule\vspace{2ex}{\bf Abstract.~}\small}

134:     {\vspace{3ex}\hrule\end{rblock}\vspace{5ex}}

135:   \usepackage{\mytex mt}

136: }

137: \newcommand{\cleardefs}{

138:   \renewcommand{\article}[2]{}

139:   \renewcommand{\book}[2]{}

140:   \renewcommand{\draft}{}

141:   \renewcommand{\labels}{}

142:   \renewcommand{\maple}{}

143:   \renewcommand{\makeidx}{}

144:   \renewcommand{\chicago}{}

145:   \renewcommand{\pdflatex}{}

146:   \renewcommand{\header}{}

147: }

148:

149: % A0  1189 x 841 mm   1,000 qm

150: % A1  841 x 594 mm    0,500 qm

151: % A2  594 x 420 mm    0,250 qm

152: % A3  420 x 297 mm    0,125 qm

153: % A4  297 x 210 mm    0,063 qm

154: % A5  210 x 148 mm    0,032 qm

155: % A6  148 x 105 mm    0,016 qm

156: % A7  105 x 74 mm     0,008 qm

157: % A8  74 x 52 mm      0,004 qm

158: % A9  52 x 37 mm      0,002 qm

159: % A10 26 x 37 mm      0,001 qm

160: % B0  1414 x 1000 mm  14.140 qcm

161: % B1  1000 x 707 mm   7.070 qcm

162: % B2  707 x 500 mm    3.535 qcm

163: % B3  500 x 353 mm    1.765 qcm

164: % B4  353 x 250 mm    882 qcm

165: % B5  250 x 176 mm    440 qcm

166: % B6  176 x 125 mm    220 qcm

167: % C0  1297 x 917 mm   11.894 qcm

168: % C1  917 x 648 mm    5.942 qcm

169: % C2  648 x 458 mm    2.968 qcm

170: % C3  458 x 324 mm    1.484 qcm

171: % C4  324 x 229 mm    742 qcm

172: % C5  229 x 162 mm    371 qcm

173: % C6  162 x 115 mm    186 qcm

174: % C7  115 x 81 mm     93 qcm

175:

176:

177: %------------------------------------------------------------------------------

178: % classes

179:

180: \newcommand{\article}[2]{

181:   \documentclass[#1pt,twoside,fleqn]{article}

182:   \stdstyle{#2}

183:   \macros

184:   \newcommand{\mytitle}{

185:     \thispagestyle{empty}

186:     \mbox{~}

187:     \begin{list}{}{\leftmargin6ex \rightmargin6ex \topsep0ex \parsep3ex}\item[]

188:       \begin{center}

189:         {\LARGE\bf \thetitle \\}

190:

191:         \vspace{5ex}

192:         {\large \theauthor}

193:

194:         {\footnotesize{\sl \address}\\ \email}

195:

196:         {\footnotesize \today}

197:

198:         \vspace{1ex}

199:         {\small \published}

200:       \end{center}

201:     \end{list}

202:     \renewcommand{\mytitle}{\chapter{\thetitle}}

203:   }

204: }

205: \newcommand{\nips}{

206:   \documentclass{article}

207:   \usepackage{\mytex nips2003e,times}

208:   \stdpackages\macros

209: }

210: \newcommand{\ijcnn}{

211:   \documentclass[10pt,twocolumn]{\mytex ijcnn}

212:   %\documentclass[10pt,twocolumn]{article}\usepackage{\mytex wcci}

213:   \stdpackages\macros

214:   \bibliographystyle{abbrv}

215: }

216: \newcommand{\springer}{

217:   \documentclass{\mytex springer_llncs}

218:   \renewcommand{\theenumi}{\alph{enumi}}

219:   \renewcommand{\labelenumi}{(\alph{enumi})}

220:   \renewcommand{\labelitemi}{$\bullet$}

221:   \stdpackages\macros

222: }

223: \newcommand{\foga}{

224:   \documentclass{article}

225:   \stdpackages\macros

226:   \usepackage{\mytex foga-02}

227:   \usepackage{\mytex chicago}

228:   \bibliographystyle{\mytex foga-chicago}

229: }

230: \newcommand{\book}[2]{

231:   \documentclass[#1pt,twoside,fleqn]{book}

232:   \newenvironment{abstract}{\begin{rblock}{\bf Abstract.~}\small}{\end{rblock}}

233:   \stdstyle{#2}

234:   %\renewcommand{\thechapter}{\Roman{chapter}}

235:   \newcommand{\mytitle}{

236:     \thispagestyle{empty}

237:     \mbox{~}

238:     \begin{list}{}{\leftmargin4ex \rightmargin4ex \topsep10ex \parsep3ex}\item[]

239:       \begin{center}

240:         {\LARGE \thetitle \\}

241:

242:         \vspace{8ex}

243:         {\large \theauthor}

244:

245:         {\footnotesize{\sl \address}\\ \email}

246:

247:         {\footnotesize \today}

248:

249:         \vspace{1ex}

250:         {\small \published}

251:       \end{center}

252:     \end{list}

253:     \renewcommand{\mytitle}{\chapter{\thetitle}}

254:   }

255:   \macros

256: }

257:

258: \newcommand{\slides}{

259:   \documentclass[fleqn]{article}

260:   \stdpackages

261:   \stdtheorems

262:   \renewcommand{\baselinestretch}{1}

263:   \renewcommand{\arraystretch}{1.2}

264:

265:   \usepackage{\mytex geometry}

266:   \geometry{

267:     a4paper,landscape,

268:     headheight=30mm,

269:     headsep=0mm,

270:     footskip=5mm,

271:     hdivide={10mm,*,10mm},vdivide={30mm,*,8mm}}

272:

273:   \columnsep 0mm

274:   \columnseprule 0pt

275:   \parindent 0ex

276:   \parskip 0ex

277:   \setlength{\itemsep}{8ex}

278:   \renewcommand{\labelitemi}{\rule[.4ex]{.6ex}{.6ex}~}

279:

280:   \pagestyle{fancy}

281:   \renewcommand{\headrulewidth}{0pt} %1pt}

282:   \renewcommand{\footrulewidth}{0pt}

283:   \renewcommand{\labelenumi}{\textbf{\arabic{enumi}.}~~}

284:   \newcommand{\theauthor}{Marc Toussaint}

285:   \rhead{}

286:   \lhead{}

287:   \rfoot{\thepage}

288:

289:   \definecolor{grey}{rgb}{.9,.9,.9}

290:   \newcommand{\inverted}{

291:     \definecolor{main}{rgb}{1,1,1}

292:     \color{main}

293:     \pagecolor[rgb]{.3,.3,.3}

294:   }

295:

296:   \macros

297:

298:   \newcommand{\mytitle}{\huge\sf}

299: }

300:

301: \newenvironment{titleslide}[2][30mm]{

302:   \onecolumn

303:   \lhead{{{\Huge\textsf{\quad#2}}\\}}

304:   \begin{center}\begin{list}{\labelitemi}{\leftmargin#1 \rightmargin#1

305:       \labelsep1ex \labelwidth3ex \topsep0pt}\item[]

306:     ~\vfill

307:     \begin{center}

308:       {\Huge\sc \thetitle}\\[3ex]

309:       \theauthor\\{\Large\ini}\\{\Large\email}

310:     \end{center}

311:     ~\vfill

312: }{

313:     ~\vfill

314:   \end{list}\end{center}

315: }

316:

317: \newenvironment{slide}[2][30mm]{

318:   \onecolumn

319:   \lhead{{{\Huge\textsf{#2}}\\}}

320:   %\setlength{\unitlength}{1mm}

321:   %\begin{picture}(0,0)(20,-34)

322:   %\put(0,-35){\color{grey}\rule{296mm}{30mm}}

323:   %\put(0,-214){\color{grey}\rule{296mm}{10mm}}

324:   %\end{picture}

325:   \begin{center}\begin{list}{\labelitemi}{\leftmargin#1 \rightmargin#1

326:       \labelsep1ex \labelwidth3ex \topsep0pt}\huge\sf\item[]%\vfill

327: }{

328:   %\vfill

329:   \end{list}\end{center}

330: }

331:

332: \newenvironment{slidetwo}[2][15mm]{

333:   \twocolumn

334:   \lhead{{{\Huge\textsf{#2}}\\}}

335:   \begin{center}\begin{list}{\labelitemi}{\leftmargin#1 \rightmargin#1

336:       \labelsep1ex \labelwidth3ex \topsep0pt}\huge\sf\item[]%\vfill

337: }{

338:   %\vfill

339:   \end{list}\end{center}

340: }

341: %\newcommand{\slidebreak}{\vfill\pagebreak\item[]\vfill}

342: \newcommand{\slidebreak}{\pagebreak\item[]}

343:

344: \newcommand{\poster}{

345:   \documentclass[fleqn]{article}

346:   \stdpackages

347:   \renewcommand{\baselinestretch}{1}

348:   \renewcommand{\arraystretch}{1.8}

349:

350:   \usepackage{\mytex geometry}

351:   \geometry{

352:     paperwidth=1189mm,

353:     paperheight=841mm, %841mm, %91.3cm, % 120cm

354: %    landscape,

355:     headheight=0mm,

356:     headsep=0mm,

357:     footskip=0mm,

358:     hdivide={5mm,*,5mm},vdivide={5mm,*,5mm}}

359:

360: %\textwidth  86.3cm     %  Paper=91.3cm

361: %\textheight  108cm     %  Paper=???,  banner=5cm

362: %\oddsidemargin  0pt

363: %\parindent  0pt

364: %\parskip  0pt

365: %\topmargin  1cm

366: %\footskip  0pt

367: %\headheight  0pt

368: %\headsep  0pt

369:

370:   \setlength{\columnsep}{0ex}

371:   \columnseprule 3pt

372:   \renewcommand{\labelitemi}{\rule[.4ex]{.6ex}{.6ex}~}

373:

374:   \pagestyle{fancy}

375:   \renewcommand{\headrulewidth}{0pt}

376:   \renewcommand{\footrulewidth}{0pt}

377:   \renewcommand{\labelenumi}{\textbf{(\roman{enumi})}}

378:   \newcommand{\theauthor}{Marc Toussaint}

379:   \rhead{}

380:   \lhead{}

381:   \rfoot{}

382:

383:   \definecolor{grey}{rgb}{.9,.9,.9}

384:   \newcommand{\inverted}{

385:     \definecolor{main}{rgb}{1,1,1}

386:     \color{main}

387:     \pagecolor[rgb]{.3,.3,.3}

388:   }

389:

390:   \macros

391: }

392: \newenvironment{postersection}[1]{

393: \vspace{1cm}

394: \section{#1}

395: \begin{list}{\labelitemi}{\leftmargin4ex \rightmargin3ex

396:       \labelsep1ex \labelwidth2ex \topsep0pt \parsep2ex}\item[]

397: }{

398: \end{list}

399: }

400:

401:

402: %------------------------------------------------------------------------------

403: % title page

404:

405: \author{Marc Toussaint}

406:

407: \newcommand{\inilogo}[1][.25]{\includegraphics[scale=#1]{\mytex INI}}

408: \newcommand{\rublogo}[1][.25]{\includegraphics[scale=#1]{\mytex RUB}}

409:

410: \newcommand{\addressCologne}{

411:   Institute for Theoretical Physics\\

412:   University of Cologne\\

413:   50923 K\"oln---Germany\\

414:   {\tt mt@thp.uni-koeln.de}\\

415:   {\tt www.thp.uni-koeln.de/\~{}mt/}

416: }

417:

418: \newcommand{\ini}{Institut f\"ur Neuroinformatik, Ruhr-Universit\"at Bochum, Germany}

419: \newcommand{\homepageINI}{\texttt{www.neuroinformatik.rub.de/PEOPLE/mt/}}

420: \newcommand{\emailINI}{mt@neuroinformatik.ruhr-uni-bochum.de}

421: \newcommand{\phoneINI}{+49-234-32-27974}

422: \newcommand{\faxINI}{+49-234-32-14209}

423: \newcommand{\phone}{+44 131 650 3089}

424: \newcommand{\fax}{+44 131 650 6899}

425: \newcommand{\email}{mtoussai@inf.ed.ac.uk}

426: \newcommand{\homepage}{homepages.inf.ed.ac.uk/mtoussai}

427:

428: \newcommand{\addressINI}{

429:   Institut~f\"ur~Neuroinformatik,

430:   Ruhr-Universit\"at~Bochum, ND~04,

431:   44780~Bochum---Germany

432: }

433: \newcommand{\AddressINI}{

434:   Institut~f\"ur~Neuroinformatik\\

435:   Ruhr-Universit\"at Bochum, ND~04\\

436:   44780~Bochum---Germany

437: }

438: \newcommand{\address}{

439:   Institute~for~Adaptive~and~Neural~Computation,

440:   University~of~Edinburgh, 5~Forrest~Hill,

441:   Edinburgh~EH1~2QL, Scotland,~UK

442: }

443: \newcommand{\Address}{

444:   Institute~for~Adaptive~and~Neural~Computation\\

445:   University~of~Edinburgh, 5~Forrest~Hill\\

446:   Edinburgh~EH1~2QL, Scotland,~UK

447: }

448:

449: \newcommand{\published}{}

450:

451: %------------------------------------------------------------------------------

452: % environments / commands

453:

454: \newlength{\subsecwidth}

455:

456: \newcommand{\subsec}[1]{

457:   \addtocontents{toc}{

458:     \protect\setlength{\subsecwidth}{\textwidth}\protect\addtolength{\subsecwidth}{-27ex}

459:       \protect\vspace*{-1.5ex}\protect\hspace*{20ex}

460:       \protect\begin{minipage}[t]{\subsecwidth}\protect\footnotesize\protect\textsf{#1}\protect\end{minipage}

461:       \protect\par

462:   }

463:   \begin{rblock}\it #1\end{rblock}\medskip\noindent

464: }

465: \newcommand{\tocsep}{

466:   \addtocontents{toc}{\protect\bigskip}

467: }

468: \newcommand{\Chapter}[1]{

469: \chapter*{#1}\thispagestyle{empty}

470: \addcontentsline{toc}{chapter}{\protect\numberline{}#1}

471: }

472: \newcommand{\Section}[1]{

473:   \section*{#1}

474:   \addcontentsline{toc}{section}{\protect\numberline{}#1}

475: }

476: \newcommand{\Subsection}[1]{

477:   \subsection*{#1}

478:   \addcontentsline{toc}{subsection}{\protect\numberline{}#1}

479: }

480:

481: \newcommand{\content}[1]{

482: %  \begin{rblock}\it #1\end{rblock}\medskip

483: %  \addtocontents{toc}{\protect\begin{list}{}{\leftmargin9ex

484: %        \rightmargin9ex \topsep-2ex \parsep.5ex}}

485: %  \addtocontents{toc}{\protect\item[] \protect\small\protect\it #1}

486: %  \addtocontents{toc}{\protect\end{list}\protect\medskip}

487: }

488:

489: \newcommand{\sepline}[1][200]{

490:   \begin{center} \begin{picture}(#1,0)

491:     \line(1,0){#1}

492:   \end{picture}\end{center}

493: }

494:

495: \newcommand{\sepstar}{

496:   \begin{center} {\vspace{0.5ex}\rule[1.2ex]{5ex}{.1pt}~*~\rule[1.2ex]{5ex}{.1pt}} \end{center}\vspace{-1.5ex}\noindent

497: }

498:

499: \newcommand{\partsection}[1]{

500:   \vspace{5ex}

501:   \centerline{\sc\LARGE #1}

502:   \addtocontents{toc}{\contentsline{section}{{\sc #1}}{}}

503: }

504:

505: \newcommand{\intro}[1]{\textbf{#1}\index{#1}}

506:

507:

508: \newcounter{parac}

509: \newcommand{\para}{\noindent\refstepcounter{parac}{\bf [{\roman{parac}}]}~~}

510: \newcommand{\Pref}[1]{[\emph{\ref{#1}}\,]}

511:

512:

513:

514: \newenvironment{items}{

515: \begin{list}{}{\leftmargin1ex \topsep-\parskip}

516: \item[]

517: }{

518: \end{list}

519: }

520:

521: \newenvironment{block}[1][]{{\noindent\bf #1}

522: \begin{list}{}{\leftmargin\blockindent \topsep-\parskip}

523: \item[]

524: }{

525: \end{list}

526: }

527:

528: \newenvironment{rblock}{

529: \begin{list}{}{\leftmargin\blockindent \rightmargin\blockindent \topsep-\parskip}\item[]}{\end{list}}

530:

531: \newenvironment{algorithm}{

532: \begin{list}{\raisebox{.3ex}{\footnotesize\bf\arabic{enumi}.}}

533: {\usecounter{enumi} \leftmargin7ex \rightmargin7ex \labelsep1ex

534:   \labelwidth5ex \topsep1ex \parsep.5ex \itemsep0pt} \small\sf

535: }{

536: \end{list}

537: }

538:

539: %\newenvironment{keywords}{\paragraph{Keywords}\begin{rblock}\small}{\end{rblock}}

540:

541: \newenvironment{colpage}{

542: \addtolength{\columnwidth}{-3ex}

543: \begin{minipage}{\columnwidth}

544: \vspace{.5ex}

545: }{

546: \vspace{.5ex}

547: \end{minipage}

548: }

549:

550: \newenvironment{enum}{

551: \begin{list}{}{\leftmargin3ex \topsep0ex \itemsep0ex}

552: \item[\labelenumi]

553: }{

554: \end{list}

555: }

556:

557: \newenvironment{cramp}{

558: \begin{quote} \begin{picture}(0,0)

559:         \put(-5,0){\line(1,0){20}}

560:         \put(-5,0){\line(0,-1){20}}

561: \end{picture}

562: }{

563: \begin{picture}(0,0)

564:         \put(-5,5){\line(1,0){20}}

565:         \put(-5,5){\line(0,1){20}}

566: \end{picture} \end{quote}

567: }

568:

569: %------------------------------------------------------------------------------

570: % symbol & operator macros

571:

572: \newcommand{\macros}{

573:   \newcommand{\0}{{\hat 0}}

574:   \newcommand{\1}{{\hat 1}}

575:   \newcommand{\2}{{\hat 2}}

576:   \newcommand{\3}{{\hat 3}}

577:   \newcommand{\5}{{\hat 5}}

578:

579:   \renewcommand{\a}{\ensuremath\alpha}

580:   \renewcommand{\b}{\beta}

581:   \renewcommand{\c}{\gamma}

582:   \renewcommand{\d}{\delta}

583:     \newcommand{\D}{\Delta}

584:     \newcommand{\e}{\epsilon}

585:     \newcommand{\g}{\gamma}

586:     \newcommand{\G}{\Gamma}

587:   \renewcommand{\l}{\lambda}

588:   \renewcommand{\L}{\Lambda}

589:     \newcommand{\m}{\mu}

590:     \newcommand{\n}{\nu}

591:     \newcommand{\N}{\nabla}

592:   \renewcommand{\k}{\kappa}

593:   \renewcommand{\o}{\omega}

594:   \renewcommand{\O}{\Omega}

595:     \newcommand{\p}{\phi}

596:     \newcommand{\ph}{\varphi}

597:   \renewcommand{\P}{\Phi}

598:   \renewcommand{\r}{\varrho}

599:     \newcommand{\s}{\sigma}

600:     \newcommand{\Si}{\Sigma}

601:   \renewcommand{\t}{\theta}

602:     \newcommand{\T}{\Theta}

603:   \renewcommand{\v}{\vartheta}

604:     \newcommand{\x}{\xi}

605:     \newcommand{\X}{\Xi}

606:     \newcommand{\Y}{\Upsilon}

607:

608:   \renewcommand{\AA}{{\cal A}}

609:     \newcommand{\BB}{{\cal B}}

610:     \newcommand{\CC}{{\cal C}}

611:     \newcommand{\EE}{{\cal E}}

612:     \newcommand{\FF}{{\cal F}}

613:     \newcommand{\GG}{{\cal G}}

614:     \newcommand{\HH}{{\cal H}}

615:     \newcommand{\II}{{\cal I}}

616:     \newcommand{\KK}{{\cal K}}

617:     \newcommand{\LL}{{\cal L}}

618:     \newcommand{\MM}{{\cal M}}

619:     \newcommand{\NN}{{\cal N}}

620:     \newcommand{\OO}{{\cal O}}

621:     \newcommand{\PP}{{\cal P}}

622:     \newcommand{\QQ}{{\cal Q}}

623:     \newcommand{\RR}{{\cal R}}

624:   \renewcommand{\SS}{{\cal S}}

625:     \newcommand{\TT}{{\cal T}}

626:     \newcommand{\uu}{{\cal u}}

627:     \newcommand{\UU}{{\cal U}}

628:     \newcommand{\XX}{{\cal X}}

629:     \newcommand{\YY}{{\cal Y}}

630:     \newcommand{\SOSO}{{\cal SO}}

631:     \newcommand{\GLGL}{{\cal GL}}

632:

633:     \newcommand{\Ee}{{\rm E}}

634:

635:   \newcommand{\NNN}{{\mathbb{N}}}

636:   \newcommand{\ZZZ}{{\mathbb{Z}}}

637:   %\newcommand{\RRR}{{\mathrm{I\!R}}}

638:   \newcommand{\RRR}{{\mathbb{R}}}

639:   \newcommand{\CCC}{{\mathbb{C}}}

640:   \newcommand{\one}{{{\bf 1}}}

641:   \newcommand{\eee}{\text{e}}

642:

643:   \renewcommand{\[}{\Big[}

644:   \renewcommand{\]}{\Big]}

645:   \renewcommand{\(}{\Big(}

646:   \renewcommand{\)}{\Big)}

647:   \renewcommand{\|}{\big|}

648:   \newcommand{\<}{{\ensuremath\langle}}

649:   \renewcommand{\>}{{\ensuremath\rangle}}

650:

651:   \newcommand{\Prob}{{\rm Prob}}

652:   \newcommand{\Aut}{{\rm Aut}}

653:   \newcommand{\cor}{{\rm cor}}

654:   \newcommand{\corr}{{\rm corr}}

655:   \newcommand{\cov}{{\rm cov}}

656:   \newcommand{\sd}{{\rm sd}}

657:   \newcommand{\tr}{{\rm tr}}

658:   \newcommand{\Tr}{{\rm Tr}}

659:   \newcommand{\id}{{\rm id}}

660:   \newcommand{\Gl}{{\rm Gl}}

661:   \newcommand{\lag}{\mathcal{L}}

662:   \newcommand{\inn}{\rfloor}

663:   \newcommand{\lie}{\pounds}

664:   \newcommand{\longto}{\longrightarrow}

665:   \newcommand{\speer}{\parbox{0.4ex}{\raisebox{0.8ex}{$\nearrow$}}}

666:   \renewcommand{\dag}{ {}^\dagger }

667:   \newcommand{\h}{{}^\star}

668:   \newcommand{\w}{\wedge}

669:   \newcommand{\too}{\longrightarrow}

670:   \newcommand{\To}{\Rightarrow}

671:   \newcommand{\Too}{\;\Longrightarrow\;}

672:   \newcommand{\oto}{\leftrightarrow}

673:   \newcommand{\ow}{\stackrel{\circ}\wedge}

674:   \newcommand{\feed}{\nonumber \\}

675:   \newcommand{\comma}{~,\quad}

676:   \newcommand{\period}{~.\quad}

677:   \newcommand{\del}{\partial}

678: %  \newcommand{\quabla}{\Delta}

679:   \newcommand{\point}{$\bullet~~$}

680:   \newcommand{\doubletilde}{

681:   ~ \raisebox{0.3ex}{$\widetilde {}$} \raisebox{0.6ex}{$\widetilde {}$} \!\!

682:   }

683:   \newcommand{\topcirc}{\parbox{0ex}{~\raisebox{2.5ex}{${}^\circ$}}}

684:   \newcommand{\topdot} {\parbox{0ex}{~\raisebox{2.5ex}{$\cdot$}}}

685:   \newcommand{\topddot} {\parbox{0ex}{~\raisebox{1.3ex}{$\ddot{~}$}}}

686:   \newcommand{\sym}{\topcirc}

687:

688:   \newcommand{\half}{\frac{1}{2}}

689:   \newcommand{\third}{\frac{1}{3}}

690:   \newcommand{\fourth}{\frac{1}{4}}

691:

692:   \newcommand{\ubar}{\underline}

693:

694:   %\renewcommand{\vec}{\underline}

695:   \renewcommand{\vec}{\boldsymbol}

696:   \renewcommand{\_}{\underset}

697:   \renewcommand{\^}{\overset}

698:   %\renewcommand{\*}{{\rm\raisebox{-.6ex}{\text{*}}{}}}

699:   \renewcommand{\*}{\text{\footnotesize\raisebox{-.4ex}{*}{}}}

700:

701:   \newcommand{\gto}{{\raisebox{.5ex}{${}_\rightarrow$}}}

702:   \newcommand{\gfrom}{{\raisebox{.5ex}{${}_\leftarrow$}}}

703:   \newcommand{\gnto}{{\raisebox{.5ex}{${}_\nrightarrow$}}}

704:   \newcommand{\gnfrom}{{\raisebox{.5ex}{${}_\nleftarrow$}}}

705:

706:   \newcommand{\RND}{{\SS}}

707:   \newcommand{\IF}{\text{if }}

708:   \newcommand{\AND}{\textsc{and }}

709:   \newcommand{\OR}{\textsc{or }}

710:   \newcommand{\XOR}{\textsc{xor }}

711:   \newcommand{\NOT}{\textsc{not }}

712: }

713:

714: %\newcommand{\argmax}[1]{{\rm arg}\!\max_{#1}}

715: %\newcommand{\argmin}[1]{{\rm arg}\!\min_{#1}}

716: \newcommand{\argmax}[1]{\underset{~#1}{\rm argmax}\;}

717: \newcommand{\argmin}[1]{\underset{~#1}{\rm argmin}\;}

718: \newcommand{\ee}[1]{\ensuremath{\cdot10^{#1}}}

719: \newcommand{\sub}[1]{\ensuremath{_{\text{#1}}}}

720: \newcommand{\up}[1]{\ensuremath{^{\text{#1}}}}

721: %\newcommand{\kld}[2]{D\big(#1\,\big|\!\big|\,#2\big)}

722: \newcommand{\kld}[2]{D\big(#1:#2\big)}

723: \newcommand{\sprod}[2]{\big<#1\,,\,#2\big>}

724: \newcommand{\End}{\text{End}}

725: \newcommand{\txt}[1]{\quad\text{#1}\quad}

726: \newcommand{\Over}[2]{\genfrac{}{}{0pt}{0}{#1}{#2}}

727: \newcommand{\mat}[1]{{\bf #1}}

728: \newcommand{\arr}[2]{\hspace*{-1ex}\begin{array}{#1}#2\end{array}\hspace*{-1ex}}

729: \newcommand{\matr}[2]{\left(\begin{array}{#1}#2\end{array}\right)}

730: \newcommand{\seq}[1]{\textsf{\<#1\>}}

731: \newcommand{\seqq}[1]{\textsf{#1}}

732:

733: %------------------------------------------------------------------------------

734: % stuff

735:

736: \newcommand{\url}[1]{\texttt{#1}}

737: \newcommand{\anchor}[2]{\begin{picture}(0,0)\put(#1){#2}\end{picture}}

738: \newcommand{\pagebox}{\begin{picture}(0,0)\put(-3,-23){

739: \textcolor[rgb]{.5,1,.5}{\framebox[\textwidth]{\rule[-\textheight]{0pt}{0pt}}}}

740: \end{picture}}

741:

742: \newcommand{\pathmt}{./}

743: \newcommand{\basepath}{./}

744: \newcommand{\setpath}[1]{\renewcommand{\pathmt}{#1}\renewcommand{\basepath}{#1}}

745: \newcommand{\pathinput}[2]{

746:   \renewcommand{\pathmt}{\basepath #1}

747:   \input{\pathmt #2} \renewcommand{\pathmt}{\basepath}}

748:

749: \newcommand{\hide}[1]{$\ll${\sf{\footnotesize #1}}$\gg$\message{^^JHIDE--Warning!^^J}}

750: %\newcommand{\hide}[1]{{\tt[hide:~}{\footnotesize\sf #1}{\tt]}\message{^^JHIDE--Warning!^^J}}

751: \newcommand{\Hide}{\renewcommand{\hide}[1]{\message{^^JHIDE--Warning (hidden)!^^J}}}

752: \newcommand{\HIDE}{\renewcommand{\hide}[1]{}}

753: \newcommand{\todo}[1]{{\tt[TODO: #1]}\message{^^JTODO--Warning: #1^^J}}

754: \newcommand{\Todo}{\renewcommand{\todo}[1]{\message{^^JTODO--Warning (hidden)!^^J}}}

755: \newcommand{\thetitle}{bla}

756: %\renewcommand{\title}[1]{\renewcommand{\thetitle}{#1}}

757: \newcommand{\header}{\begin{document}\mytitle\cleardefs}

758: \newcommand{\contents}{{\tableofcontents}\renewcommand{\contents}{}}

759: \newcommand{\footer}{\small\bibliography{\mytex bibs}\end{document}}

760:

761: \article{10}{1}

762: \chicago

763: %\numberlines

764: %\usepackage{rotate}

765:  \Hide

766:

767: \newcommand{\EM}{}\newcommand{\BM}{}

768: \newcommand{\pl}{{\protect\rule[.3ex]{1ex}{1ex}}}

769: \newcommand{\mi}{\ensuremath{\circ}}

770: \newcommand{\ze}{\ensuremath{\cdot}}

771: \newcommand{\rb}[1]{\hspace{4pt}\raisebox{-.5ex}{\rotatebox{90}{#1}}\hspace{4pt}}

772:

773: \title{Notes on information geometry\\ and evolutionary processes}

774:

775: \header

776:

777: \begin{abstract}

778:   In order to analyze and extract different structural properties of

779:   distributions, one can introduce different coordinate systems over

780:   the manifold of distributions. In Evolutionary Computation, the

781:   Walsh bases and the Building Block Bases are often used to describe

782:   populations, which simplifies the analysis of evolutionary operators

783:   acting on populations. Quite independently of these approaches,

784:   information geometry has been developed as a geometric way to

785:   analyze dependencies of different orders between random variables (e.g.,

786:   neural activations or genes).

787:

788:   In these notes I briefly review the essentials of various coordinate

789:   bases and of information geometry. The goal is to give an overview

790:   and make the approaches comparable. Besides introducing meaningful

791:   coordinate bases, information geometry also offers an explicit way

792:   to distinguish interactions of different orders and it offers a

793:   geometric view on the manifold and thereby also on operators that

794:   act on the manifold. For instance, uniform crossover can be

795:   interpreted as an orthogonal projection of a population along an

796:   $m$-geodesic, monotonically reducing the $\t$-coordinates that

797:   describe interactions between genes.

798: \end{abstract}

799:

800: \section{Introduction}

801:

802: Evolution can be understood as a process on the space $\L$ of

803: distributions over the search space $\O$. Essentially, a parent population

804: can be captured as a (finite) distribution $p \in \L$. Mutation and

805: recombination operators ($\MM \CC$) applied to the parent population

806: specify a search (offspring) distribution $q \in\L$, and a (stochastic) selection

807: operator ($\SS^\m\, \FF\, \SS^\n$) maps $q$ to a new parent population

808: $p'$. In this view, evolution can be understood as a process

809: \BM\begin{align*}

810: p

811:  ~\stackrel{\MM \CC}\longmapsto~ q

812:  ~\stackrel{\SS^\m \FF \SS^\n}\longmapsto~ p'

813:  ~\stackrel{\MM \CC}\longmapsto~ q'

814:  ~\stackrel{\SS^\m \FF \SS^\n}\longmapsto~ p''

815:  ~\stackrel{\MM \CC}\longmapsto~ \cdots

816: \end{align*}\EM

817:

818: We do not need to go into the details of the indicated recombination,

819: mutation, and selection operators here. Instead, we would like to

820: emphasize an information-theoretic point of view on this process.

821: Typically, the mapping $p \mapsto q$ (which one could also call a search

822: heuristic) from the parent population to the search distribution adds

823: entropy whereas selection $q \mapsto p'$ reduces entropy. Another

824: interesting observable in this process is the \emph{structure} of the

825: distributions---by which we mean the mutual information present in

826: these distributions. For instance, one can show that ordinary mutation

827: and crossover operators (on a direct genetic representation) generally

828: reduce mutual information, i.e., destroy structural content that might

829: have been present in $p$ after selection \cite{toussaint:04-ecj}.
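
As a small illustration of these two statements, consider two binary
genes, $\O=\{0,1\}^2$. The measure of structure used in this sketch,
$I(p) = \sum_i H(p^i) - H(p)$ with $p^i$ the marginal distribution of
gene $i$, is one common formalization of mutual information and is an
assumption made here, not a definition taken from the text above. For the
perfectly correlated distribution $p_{00}=p_{11}=\tfrac{1}{2}$,
$p_{01}=p_{10}=0$, one finds
\BM\begin{align*}
H(p) = \log 2 ~,\quad
H(p^1) = H(p^2) = \log 2 ~,\quad
I(p) = 2\log 2 - \log 2 = \log 2 ~.
\end{align*}\EM
The product of its marginals, $q_{x_1 x_2} = p^1_{x_1}\, p^2_{x_2} =
\tfrac{1}{4}$, has
\BM\begin{align*}
H(q) = 2\log 2 ~,\quad
I(q) = 0 ~.
\end{align*}\EM
Replacing $p$ by $q$ thus adds entropy and removes all structure. This is
an idealized limiting case chosen for illustration, not a model of any
particular mutation or crossover operator.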

830:

831: The analysis of the structure of distributions is an important topic

832: in various areas. In evolutionary computation, the Walsh

833: spectrum is a prominent way to analyze the structure of $p$, often

834: with the aim of transporting it to $q$. The Walsh coefficients may also

835: be considered as a way of describing epistasis. In complex systems,

836: certain mutual information measures are often used to define the

837: structuredness (in their terms: complexity) of dynamical systems

838: \cite{langton:90,sporns-tononi:02}.

839:

840: In these notes, I want to briefly review the information geometric way

841: to describe the structure of a distribution \cite{amari:99,amari:01}

842: and relate it to the field of evolutionary computation. The first step

843: is simply to present the coordinates introduced by Walsh coefficients

844: side-by-side with those used in information geometry to make them

845: comparable. This gives an intuition about the ``bases'' over which

846: distributions can be analyzed and reveals, for instance, that the

847: so-called Building-Block-Basis \cite{chryssomalakos-stephens:04}, as

848: introduced in Evolutionary Computation, is the same as Amari's

849: $\eta$-basis. Amari's $\t$-basis is perhaps most interesting for its

850: capability to capture $k$th-order mutual dependencies precisely. It

851: offers a notion of the ``order-spectrum of mutual information''

852: alternative to the Walsh spectrum. Ultimately, Amari's formalism

853: allows one to completely decompose any distribution into its different

854: $k$th-order components.

855:

856: Finally, the \emph{geometry} introduced over the space of

857: distributions by Amari gives very insightful interpretations of

858: distances between distributions. A Pythagoras theorem can be

859: formulated for the Kullback-Leibler divergence. Under some conditions,

860: minimizations of the Kullback-Leibler divergence can be

861: interpreted as orthogonal projections. This offers a geometric view on

862: some evolutionary operators.

863:

864:

865: \section{Notations}

866:

867: \paragraph{Distributions, $\log$-probabilities, and hypercube bases}

868:

869: The most direct ``coordinate system'' that can be introduced on the

870: manifold of distributions is given by the probabilities $p(x)$ for all

871: $x\in \O$ themselves. To preserve notational uniformity with other

872: coordinate systems we write these numbers as $p_x := p(x)$, which

873: means that $p_x$ is the $x$-th component of $p \in \L$ in the direct

874: basis. Because of the normalization constraint $\sum_x p_x = 1$, these

875: are only $|\O|-1$ independent coordinates.

876:

877: Clearly, instead of using $p_x$ as coordinates, one can also use their

878: $\log$'s $l_x := \log p_x$. Taking the log of probabilities is, very

879: roughly speaking, related to changing to entropic units. (Note the

880: definition of the entropy of $p$ as $H(p) = -\sum_x p_x \log p_x = -{\rm

881:   E}_p \{l_x\}$.) Thus, coordinates that have some ``entropic

882: meaning'' (i.e., are related to information theoretic measures like

883: entropy, mutual information, or Kullback-Leibler divergence) will be

884: based on these log quantities. Namely, this will be the $\t$-coordinate

885: system introduced by Amari (see \citeNP{amari:99,amari:01}).

886:

887: In the following we will speak of bases of coordinate systems.

888: Essentially, what we mean are basis functions, similar to the sine

889: and cosine in the Fourier transform. For illustration, we will always

890: think of $\O$ as the hypercube; the basis functions then correspond to

891: ``colorings'' of the hypercube with function values (mostly $1$, $0$,

892: or $-1$). E.g., if $e_i:~ \O=\{0,1\}^3 \to \{1,0,-1\}$ is the $i$-th

893: basis function, then the $i$-th coordinate of a distribution $p$ in

894: this coordinate system is the inner product of $p$ with $e_i$: $p_i =

895: \<e_i,p\> := \sum_{x\in\O} e_i(x)\, p(x)$. We illustrate such basis

896: functions by 3D-hypercubes, \raisebox{-3mm}{\input{figtexs/samplecube}}

897: where the bullet corresponds to $1$, the circle to $-1$ and empty

898: vertices to $0$.

899:

900: The basis of the direct coordinate system is the $\d$-basis: the set of

901: all hypercubes where only one vertex is $1$ and all others are $0$.

902:

903:

904: \paragraph{Marginals over $k$-tuples of variables and schemata}

905:

906: In the following, we will also need a compact notation for the

907: different marginals of a distribution. Let $\O$ be a product space

908: $\O=\O^1 \times \cdots \times \O^n$ such that we can define the

909: marginals of a distribution $p$ over single variables but also pairs,

910: triples, and $k$-tuples of variables. We use indices $i,j,..\in I=

911: \{1,..,n\}$ to indicate variables and write the marginals as $p^{ij..}$,

912: \BM\begin{align*}

913: p^{ij..}(a,b,..)=\Pr\{x_i=a,~ x_j=b,~ ..\} ~.

914: \end{align*}\EM

915: The set of all possible marginals is given by considering all single

916: indices $i$, all pairs $i<j$, all triples $i<j<k$, etc. To simplify

917: notation (e.g., summation over such objects), we collect all these

918: tuples of indices in a set

919: \BM\begin{align*}

920: A

921: &= I

922:   ~\cup~ \{ (i,j) ~|~ i<j \in I \}

923:   ~\cup~ \{ (i,j,k) ~|~ i<j<k \in I \}

924:   ~\cup~ \cdots ~\cup~ \{ (1,2,..,n) \} \\

925: &=\{1,..,n,~ (1,2),(1,3)..,(1,n),(2,3),(2,4)...,(n-1,n),~

926:     (1,2,3),..,~ (1,2,3,..,n)\} ~.

927: \end{align*}\EM

928: In that way, all marginals of $p$ are given as $p^a$ for $a \in

929: A$. Note that $|A|=2^n-1$, which equals $|\O|-1$ when all variables are binary.
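
As a small illustration (a sketch for three binary variables, $n=3$, chosen here only as an example), the index set reads
\BM\begin{align*}
A = \{1,~ 2,~ 3,~ (1,2),~ (1,3),~ (2,3),~ (1,2,3)\} ~,\qquad |A| = 7 = 2^3-1 ~,
\end{align*}\EM
and, e.g., $p^{13}(a,b) = \sum_{x_2} p(a,x_2,b)$ is the marginal over the first and third variable.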

930:

931: Besides using $a \in A$ to indicate a marginal, one can equivalently

932: use the schemata notation of length-$n$ strings in $\{\*,d\}^n$:

933: For a given $a$, the corresponding schema is the string of all

934: $\*$'s except for those positions indicated in the tuple $a$. E.g.,

935: for $n=6$:

936: \BM\begin{align*}

937: p^{245} \equiv p^{\*d\*dd\*}

938: \end{align*}\EM

939:

940:

941:

942: \section{Walsh, $\eta$-, $\theta$-, Building Block, and Haar bases}

943:

944: Table \ref{tabBases} captures the basics of the Walsh, $\eta$-,

945: $\theta$-, and Haar bases. In all cases, the coordinate system is

946: defined by the basis functions $e_i$ depicted for the 3D-case as

947: hypercubes. Actually, these 3D illustrations of the basis functions

948: $e_i$ are already sufficient to infer the basis functions for all $n$

949: since they are constructed in a very systematic way---which seems

950: obvious by simply looking at them and becomes rigorous by considering

951: the transformation matrices into these coordinate systems:

952:

953: The transformation matrices map linearly from the direct

954: coordinates $p_x$ to the new coordinates. E.g., in the Walsh case,

955: $w_y = \sum_x W_{yx} p_x$. The rows in these matrices correspond to

956: the basis functions $e_y = W_{y\cdot}$. An important property is that

957: in all cases (except the Haar basis!), the transformation matrices can

958: be constructed by repeated tensor products of a $2\times 2$ matrix. For

959: instance, for $n=2$ in the Walsh case:

960: \BM\begin{align*}

961: W^{n=2}

962:  = \matr{rrrr}{1&1&1&1\\1&-1&1&-1\\1&1&-1&-1\\1&-1&-1&1}

963:  = \matr{rr}{1&1\\1&-1} \otimes \matr{rr}{1&1\\1&-1}

964:  =: \matr{rr}{1&1\\1&-1}^{\!\!\otimes 2}

965: \end{align*}\EM

966: Here, we introduced the superscript notation ${}^{\otimes n}$ to

967: indicate the $n$-fold tensor product.
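
To make the construction concrete, consider a sketch for $n=2$ with a distribution chosen purely for illustration, $(p_{00},p_{01},p_{10},p_{11}) = (\frac{1}{2},0,0,\frac{1}{2})$. Applying $W^{n=2}$ row by row gives
\BM\begin{align*}
w_{00} &= p_{00}+p_{01}+p_{10}+p_{11} = 1 ~,&
w_{01} &= p_{00}-p_{01}+p_{10}-p_{11} = 0 ~,\\
w_{10} &= p_{00}+p_{01}-p_{10}-p_{11} = 0 ~,&
w_{11} &= p_{00}-p_{01}-p_{10}+p_{11} = 1 ~.
\end{align*}\EM
The coefficient $w_{00}$ only expresses normalization, the vanishing first-order coefficients reflect uniform single-variable marginals, and $w_{11}=1$ captures the perfect correlation of the two bits. Since the rows of $W^{n=2}$ are mutually orthogonal with squared length $4$, multiplying $W^{n=2}$ with itself gives $4$ times the identity, so the direct coordinates are recovered as $p_x = \frac{1}{4} \sum_y W_{xy} w_y$; e.g., $p_{00} = \frac{1}{4}(1+0+0+1) = \frac{1}{2}$.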

968:

969:

970:

971:

972:

973:

974:

975: \newcommand{\inclfig}[1]{\begin{minipage}[t]{35mm}\raisebox{-43mm}{\input{figtexs/#1}}\end{minipage}}

976:

977:

978: \begin{table}

979: \begin{tabular}{@{}p{43mm}rr@{}}

980: {\bf Walsh} \newline

981:  $w_y = \sum_x W_{yx}\, p_x$\newline

982:  $p_x = \frac{1}{2^n}\, \sum_y W_{xy} w_y$\newline

983:  $W_{yx} = (-1)^{|x\, \AND y|}$\newline

984:  $~~~~  = \matr{cc}{\pl & \pl \\ \pl & \mi}^{\!\!\otimes n}$\newline

985:  $W^{-1} = \frac{1}{2^n}\, W$

986:  & \begin{tabular}[t]{@{}c@{\,}|@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}}

987:     & \rb{000} & \rb{001} & \rb{010} & \rb{011} & \rb{100} & \rb{101} & \rb{110} & \rb{111} \\

988: \hline

989: 000 & \pl & \pl & \pl & \pl & \pl & \pl & \pl & \pl \\

990: 001 & \pl & \mi & \pl & \mi & \pl & \mi & \pl & \mi \\

991: 010 & \pl & \pl & \mi & \mi & \pl & \pl & \mi & \mi \\

992: 011 & \pl & \mi & \mi & \pl & \pl & \mi & \mi & \pl \\

993: 100 & \pl & \pl & \pl & \pl & \mi & \mi & \mi & \mi \\

994: 101 & \pl & \mi & \pl & \mi & \mi & \pl & \mi & \pl \\

995: 110 & \pl & \pl & \mi & \mi & \mi & \mi & \pl & \pl \\

996: 111 & \pl & \mi & \mi & \pl & \mi & \pl & \pl & \mi \\

997: \end{tabular}

998:   &\inclfig{walsh} \\\hline

999: {\bf Amari's $\eta$ / BBB} \newline

1000:  $\eta_a = \sum_x \bar B_{ax} p_x$\newline

1001:  $~~~~ = \sum_x (B^{-1})^T_{ax} p_x$ \newline

1002:  $p_x = \sum_a B^T_{xa} \eta_a$\newline

1003:  $\bar B = (B^{-1})^T = \matr{cc}{\pl & \pl \\ \ze & \pl}^{\!\!\otimes n}$ \newline

1004:  $\bar B^{-1} = \matr{cc}{\pl & \mi \\ \ze & \pl}^{\!\!\otimes n}$

1005:  & \begin{tabular}[t]{@{}c@{\,}|@{\,}c@{\,}|@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}}

1006:   &   & \rb{000} & \rb{001} & \rb{010} & \rb{011} & \rb{100} & \rb{101} & \rb{110} & \rb{111} \\

1007: \hline

1008: $\cdot$ & $\*\*\*$ & \pl & \pl & \pl & \pl & \pl & \pl & \pl & \pl \\

1009: 3 & $\*\*1$  & \ze & \pl & \ze & \pl & \ze & \pl & \ze & \pl \\

1010: 2  & $\*1\*$ & \ze & \ze & \pl & \pl & \ze & \ze & \pl & \pl \\

1011: 23 & $\*11$ & \ze & \ze & \ze & \pl & \ze & \ze & \ze & \pl \\

1012: 1  & $1\*\*$ & \ze & \ze & \ze & \ze & \pl & \pl & \pl & \pl \\

1013: 13 & $1\*1$ & \ze & \ze & \ze & \ze & \ze & \pl & \ze & \pl \\

1014: 12 & $11\*$ & \ze & \ze & \ze & \ze & \ze & \ze & \pl & \pl \\

1015: 123 & $111$ & \ze & \ze & \ze & \ze & \ze & \ze & \ze & \pl \\

1016: \end{tabular}

1017:  &\inclfig{eta} \\\hline

1018: {\bf Amari's $\t$} \newline

1019:  $\t_a = \sum_x B_{ax} l_x$ \newline

1020:  $l_x = \sum_a \bar B^T_{xa} \t_a$ \newline

1021:  $B =  (\bar B^{-1})^T = \matr{cc}{\pl & \ze \\ \mi & \pl}^{\!\!\otimes n}$ \newline

1022:  $B^{-1} = \matr{cc}{\pl & \ze \\ \pl & \pl}^{\!\!\otimes n}$

1023:  & \begin{tabular}[t]{@{}c@{\,}|@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}}

1024:     & \rb{000} & \rb{001} & \rb{010} & \rb{011} & \rb{100} & \rb{101} & \rb{110} & \rb{111} \\

1025: \hline

1026: $\cdot$ & \pl & \ze & \ze & \ze & \ze & \ze & \ze & \ze \\

1027: 3   & \mi & \pl & \ze & \ze & \ze & \ze & \ze & \ze \\

1028: 2   & \mi & \ze & \pl & \ze & \ze & \ze & \ze & \ze \\

1029: 23  & \pl & \mi & \mi & \pl & \ze & \ze & \ze & \ze \\

1030: 1   & \mi & \ze & \ze & \ze & \pl & \ze & \ze & \ze \\

1031: 13  & \pl & \mi & \ze & \ze & \mi & \pl & \ze & \ze \\

1032: 12  & \pl & \ze & \mi & \ze & \mi & \ze & \pl & \ze \\

1033: 123 & \mi & \pl & \pl & \mi & \pl & \mi & \mi & \pl \\

1034: \end{tabular}

1035:  &\inclfig{theta} \\\hline

1036: {\bf Haar}\newline

1037: please see \cite{khuri:94}

1038:  & \begin{tabular}[t]{@{}c@{\,}|@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}}

1039:     & \rb{000} & \rb{001} & \rb{010} & \rb{011} & \rb{100} & \rb{101} & \rb{110} & \rb{111} \\

1040: \hline

1041: 000 & \pl & \pl & \pl & \pl & \pl & \pl & \pl & \pl \\

1042: 001 & \pl & \pl & \pl & \pl & \mi & \mi & \mi & \mi \\

1043: 010 & \pl & \pl & \mi & \mi & \ze & \ze & \ze & \ze \\

1044: 011 & \ze & \ze & \ze & \ze & \pl & \pl & \mi & \mi \\

1045: 100 & \pl & \mi & \ze & \ze & \ze & \ze & \ze & \ze \\

1046: 101 & \ze & \ze & \pl & \mi & \ze & \ze & \ze & \ze \\

1047: 110 & \ze & \ze & \ze & \ze & \pl & \mi & \ze & \ze \\

1048: 111 & \ze & \ze & \ze & \ze & \ze & \ze & \pl & \mi \\

1049: \end{tabular}

1050:  &\inclfig{haar}

1051: \end{tabular}

1052: \caption{\label{tabBases}

1053: Overview of the different bases for the space of distributions. The

1054: first column gives the definitions of the transformations and their

1055: inverses. Note that the $\t$-basis is defined in log-space. The

1056: transformation matrices are illustrated in the second column

1057: for $n=3$ using the symbols $\pl =1$, $\mi=-1$, and $\ze=0$. The third

1058: column illustrates the basis functions $e_y$ (or $e_a$) as colorings

1059: of the hypercube $\{0,1\}^3$. Note that the basis functions

1060: correspond to rows of the transformation matrix. The 1-norm $|x\, \AND y|$

1061: of the \AND of two binary strings counts the 1-bits that they have in common.

1062: }

1063: \end{table}

1064:

1065:

1066:

1067: Table \ref{tabBases} summarizes the most important properties of these

1068: transformation matrices: their closed form expression, their tensor

1069: product construction, and their inverse. When looking at the table one

1070: should first observe the self-similar regularity of the transformation

1071: matrices, which stems from their definition as repeated tensor

1072: products. The meaning of the various bases becomes more intuitive when

1073: looking at the hypercube illustrations of the basis functions. The Walsh basis,

1074: e.g., can nicely be compared to a Fourier basis: $e_{000}$ corresponds

1075: to the constant function $1$; $e_{001},e_{010},e_{100}$ can be viewed

1076: as sine functions along the $x$-, $y$-, and $z$-axes, respectively;

1077: $e_{011},e_{101},e_{110}$ are products of sine functions and

1078: capture 2nd-order dependencies; and $e_{111}$ is the ``highest

1079: frequency'' basis function capturing 3rd-order dependencies.

1080:

1081: The $\eta$-basis captures certain marginals relative to the all-1s string:

1082: \BM\begin{align*}

1083: \eta_a=p^a(11..) ~.

1084: \end{align*}\EM

1085: These can be thought of as the marginals over all possible

1086: Building-Blocks; hence it is also called the Building-Block-Basis

1087: (BBB, cf.\ \citeNP{chryssomalakos-stephens:04}). This marginalization becomes apparent

1088: in the hypercube colorings as the abundance of zeros (non-colored

1089: vertices and dots in the matrix).
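
For illustration, a sketch of the two-variable case $\O=\{0,1\}^2$ (with rows and columns ordered as in Table \ref{tabBases}): the $\eta$-coordinates are
\BM\begin{align*}
\eta_1 &= p_{10}+p_{11} ~,& \eta_2 &= p_{01}+p_{11} ~,& \eta_{12} &= p_{11} ~,
\end{align*}\EM
and the inverse transformation reads
\BM\begin{align*}
p_{11} &= \eta_{12} ~,& p_{10} &= \eta_1-\eta_{12} ~,& p_{01} &= \eta_2-\eta_{12} ~,& p_{00} &= 1-\eta_1-\eta_2+\eta_{12} ~.
\end{align*}\EM
The alternating signs are those of $\bar B^{-1}$; the constant $1$ in $p_{00}$ is the coordinate of the empty index, which is fixed by normalization.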

1090:

1091: The $\t$-basis combines the ``frequency'' idea of the Walsh

1092: basis with the marginalization: The highest-order basis function

1093: $e_{123}$ is analogous to the Walsh basis function $e_{111}$ and detects

1094: highest-order dependencies. Lower-order dependencies, though, are only

1095: detected on a marginal.

1096:

1097: However, note that the $\t$-basis is defined in log-space, $\t_a =

1098: \sum_x B_{ax} \log p_x$. We will find some implications of this in the next

1099: section. Note that the transformation matrices of the $\eta$-

1100: (Building-Block-) and the $\t$-bases are related via $B =  (\bar

1101: B^{-1})^T$.
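
Again as a sketch for $\O=\{0,1\}^2$ (assuming all $p_x>0$ so that the logarithms exist), the rows of $B$ applied to $\log p_x$ give
\BM\begin{align*}
\t_1 &= \log\frac{p_{10}}{p_{00}} ~,& \t_2 &= \log\frac{p_{01}}{p_{00}} ~,& \t_{12} &= \log\frac{p_{00}\,p_{11}}{p_{01}\,p_{10}} ~.
\end{align*}\EM
The second-order coordinate $\t_{12}$ is the log odds ratio; it vanishes exactly when $p_{00}\,p_{11}=p_{01}\,p_{10}$, i.e., when the two variables are independent.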

1102:

1103: For completeness, we also included the Haar basis in Table

1104: \ref{tabBases}. It cannot be derived as repeated tensor products and

1105: we do not discuss it any further here. One argument made about the

1106: Haar basis \cite{khuri:94} is that the transformation matrix contains

1107: many 0s. Thus, the coefficients are more efficient to compute than

1108: the Walsh coefficients. We add here that the ratio of zeros in the

1109: $\eta$ and $\t$ transformation matrices is $1-(3/4)^{n}$ (each $2\times 2$ factor has three non-zero entries, giving $3^n$ non-zeros among $4^n$ entries) and

1110: approaches $1$ exponentially with the dimension $n$.

1111:

1112:

1113:

1114:

1115:

1116:

1117:

1118:

1119:

1120: \section{Mathematical structure on the manifold $\L$}

1121:

1122: In this section we want to develop a more geometric view on the

1123: manifold of distributions, following \cite{amari:99,amari:01}. This

1124: geometry will put a special emphasis on the $\eta$- and $\t$-bases.

1125:

1126:

1127: \paragraph{$m$- and $e$-geodesics}

1128:

1129: An essential ingredient to describe the geometry of a manifold is the

1130: definition of the notion of ``straight lines'', or geodesics, connecting two

1131: points in the manifold. In the case of the manifold of distributions,

1132: there exist at least two ways of defining a straight path connecting two

1133: distributions $q$ and $r$: the one being the linear mixture in direct

1134: coordinates $p_x$, the other being the linear mixture in $\log$

1135: coordinates $l_x$,

1136: \BM\begin{align*}

1137: \text{$m$-geodesic:}\qquad& p(x) = (1\!-\!\a)\, q(x) + \a\, r(x) ~,\\

1138: \text{$e$-geodesic:}\qquad& \log p(x) = (1\!-\!\a)\, \log q(x) + \a\, \log r(x) - \psi(\a) ~.

1139: \end{align*}\EM

1140: Here $m$ means \emph{mixture} and $e$ means \emph{exponential}. The

1141: additional term $\psi(\a)$ in the $e$-geodesic, which does not depend on $x$, is necessary to preserve

1142: the normalization of $p(x)$.
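
The normalization term can be made explicit. Exponentiating the $e$-geodesic and summing over $x$ (a short derivation that uses only $\sum_x p(x)=1$) gives
\BM\begin{align*}
p(x) = \frac{q(x)^{1-\a}\, r(x)^{\a}}{\sum_{x'} q(x')^{1-\a}\, r(x')^{\a}} ~,\qquad
\psi = \log \sum_{x'} q(x')^{1-\a}\, r(x')^{\a} ~.
\end{align*}\EM
Thus the $e$-geodesic is a normalized geometric mixture of $q$ and $r$, whereas the $m$-geodesic is their arithmetic mixture; at $\a=0$ and $\a=1$ both paths pass through $q$ and $r$, respectively.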

1143:

1144: The fact that there exist two ways of defining geodesics means that

1145: there exist two meaningful \emph{affine connections} on the manifold.

1146: %(instead of only the Christoffel symbol derived from the Fisher

1147: %metric---the manifold is non-Riemannian).

1148: Both define a notion of

1149: flatness: we say that an $m$-geodesic is $m$-flat and an $e$-geodesic

1150: is $e$-flat.

1151:

1152: It turns out that the coordinate lines (and planes, hyperplanes, etc.)

1153: of $\eta$ are $m$-flat and those of $\t$ are $e$-flat. The former is

1154: obvious, since an $m$-geodesic can equivalently be written in the

1155: $\eta$ coordinate system as $\eta_a(p) = (1\!-\!\a)\, \eta_a(q) + \a\,

1156: \eta_a(r)$. The second becomes apparent when realizing that the Taylor

1157: expansion of $\log p$ reads

1158: \BM\begin{align*}

1159: l_x

1160:  = \sum_i \t_i x_i

1161:  + \sum_{i<j} \t_{ij}\, x_i x_j

1162:  + \sum_{i<j<k} \t_{ijk}\, x_i x_j x_k

1163:  + \cdots + \t_{1..n}\, x_1..x_n - \psi

1164:  = \sum_{a\in A} \t_a X^a - \psi

1165: \end{align*}\EM

1166: where $X^a$ is the product of the components $x_{i_1} x_{i_2}\cdots x_{i_k}

1167: \in \{0,1\}$ when $a=(i_1,i_2,..,i_k)$. Thus, an $e$-geodesic is

1168: written, in the $\t$ coordinate system, simply as $\t_a(p) = (1\!-\!\a)\,

1169: \t_a(q) + \a\, \t_a(r)$.

1170:

1171:

1172: \paragraph{Fisher metric, Kullback-Leibler divergence}

1173:

1174: On this manifold $\L$, there is a metric defined, the \emph{Fisher

1175:   metric}.  In \emph{arbitrary} coordinates $v_i$ (it could be any of

1176: the Walsh, log, $\eta$-, or $\t$-coordinates), it reads

1177: \BM\begin{align*}

1178: g_{ij}(p) = {\rm E}\left\{ \frac{\del \log p}{\del v_i}\, \frac{\del \log p}{\del v_j}\right\} ~.

1179: \end{align*}\EM

1180: Some intuition can be gained by realizing that, locally, the distance

1181: measured by the Fisher metric coincides with the distance measured by

1182: the Kullback-Leibler divergence:\footnote{ The Kullback-Leibler

1183:   divergence $\kld{p}{q}$ (also called relative entropy or divergence)

1184:   is a measure for the loss of information (or gain of entropy) when a

1185:   \emph{true} distribution $p$ is approximated by a model

1186:   distribution $q$. For example, when $p(x,y)$ is approximated by

1187:   $p(x)\,p(y)$ one loses information on the mutual dependence between

1188:   $x$ and $y$.  Accordingly, the relative entropy

1189:   $\kld{p(x,y)}{p(x)\,p(y)}$ is equal to the mutual information

1190:   between $x$ and $y$. Generally, when \emph{knowing} the real

1191:   distribution $p$ one needs on average $H(p)$ (entropy of $p$) bits to

1192:   describe a random sample. If, however, we know only an approximate

1193:   model $q$ we would need on average $H(p) + \kld{p}{q}$ bits to

1194:   describe a random sample of $p$.  The loss of knowledge about the

1195:   true distribution induces an increase of entropy and thereby an

1196:   increase of average description length for random samples. }

1197: Consider a point $p \in \L$ and a nearby point $p+\d p$. When we

1198: measure the squared length $\<\d p,\d p\>$ of the variation $\d p$ by

1199: twice the Kullback-Leibler divergence we find \BM\begin{align*} \<\d p,\d

1200:   p\> = 2\, \kld{p}{p+\d p} = 2\, {\rm E}\left\{ \log p - \log (p+\d

1201:     p)\right\} \ddot=~ 2\, {\rm E}\left\{- \frac{\d p}{p} + \frac{\d

1202:       p^2}{2\, p^2} \right\} = {\rm E}\left\{\frac{\d p^2}{p^2} \right\}

1203:   ~.

1204: \end{align*}\EM

1205: Here, the 2nd-order approximation stems from the Taylor expansion

1206: $\log(p+\d p) \approx \log p + \d p/p - \d p^2/(2p^2)$, and ${\rm E}\{\d p/p\} =0$ since $\sum_x \d p(x)=0$ to

1207: preserve normalization. Note that, in this infinitesimal neighborhood,

1208: the Kullback-Leibler divergence becomes symmetric. Generalizing this

1209: to two small variations $\d_1 p= \del_{v_i} p := \frac{\del p}{\del v_i}$ and $\d_2

1210: p= \del_{v_j} p := \frac{\del p}{\del v_j}$ induced by small shifts along some

1211: coordinates we have

1212: \BM\begin{align*}

1213: \<\del_{v_i} p,\del_{v_j} p\>

1214:  =  {\rm E}\left\{\frac{\del_{v_i} p}{p}\, \frac{\del_{v_j} p}{p} \right\}

1215:  =  {\rm E}\left\{\frac{\del \log p}{\del v_i}\, \frac{\del \log p}{\del v_j} \right\}

1216: \end{align*}\EM

1217: and retrieve the Fisher metric. Conversely, the Fisher metric can also be derived from the second-order derivatives of the Kullback-Leibler divergence with respect to a small coordinate shift $\d v$:

1218: \BM\begin{align*}

1219: g_{ij}(p) = \frac{\del}{\del \d v_i}\frac{\del}{\del \d v_j} \kld{p(v)}{p(v+\d v)}\Big|_{\d v=0} ~.

1220: \end{align*}\EM
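
As a worked example (a sketch for a single binary variable, $n=1$, with $\eta := p_1$ and $\t := \log(p_1/p_0)$ as obtained from Table \ref{tabBases}), the definition of $g_{ij}$ gives in the $\eta$-coordinate
\BM\begin{align*}
g_{\eta\eta} = \eta\, \Big(\frac{1}{\eta}\Big)^2 + (1-\eta)\, \Big(\frac{-1}{1-\eta}\Big)^2 = \frac{1}{\eta\,(1-\eta)} ~,
\end{align*}\EM
and, using $\eta = e^{\t}/(1+e^{\t})$ so that $\del_\t \log p_1 = 1-\eta$ and $\del_\t \log p_0 = -\eta$, in the $\t$-coordinate
\BM\begin{align*}
g_{\t\t} = \eta\,(1-\eta)^2 + (1-\eta)\,\eta^2 = \eta\,(1-\eta) ~.
\end{align*}\EM
Measured in $\eta$, the metric diverges toward the boundary $\eta\to 0,1$, and the two expressions are inverse to each other.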

1221:

1222:

1223: \paragraph{Orthogonality of $\eta$ and $\t$, the Pythagoras}

1224:

1225: The coordinate systems $\eta$ and $\t$ have a crucial property w.r.t.\

1226: the Fisher metric---they are mutually orthogonal: At any point $p$ in

1227: the manifold the variations induced by shifts along $\t$ and $\eta$

1228: coordinates fulfill

1229: \BM\begin{align*}

1230: \<\del_{\t_a} p, \del_{\eta_b} p\> = \d_{ab} ~,

1231: \end{align*}\EM

1232: where $\d_{ab}$ is the Kronecker delta. Based on this one can derive a

1233: Pythagoras theorem: Let $p$, $r$ and $q$ be three distributions where

1234: the $m$-geodesic connecting $p$ and $r$ is orthogonal to the

1235: $e$-geodesic connecting $r$ and $q$, then

1236: \BM\begin{align*}

1237: \kld{p}{q} = \kld{p}{r} + \kld{r}{q} ~.

1238: \end{align*}\EM

1239: Please see figure \ref{figPy} for an illustration.
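
A simple instance of the Pythagoras can be verified directly. In the following sketch, $r$ is taken to be the product of the single-variable marginals $p^i$ of $p$, and $q$ is an arbitrary product distribution $q(x)=\prod_i q^i(x_i)$ (both choices are assumptions made for this illustration). Splitting the logarithm,
\BM\begin{align*}
\kld{p}{q}
 = {\rm E}_p\Big\{\log\frac{p}{r}\Big\} + {\rm E}_p\Big\{\log\frac{r}{q}\Big\}
 = \kld{p}{r} + \sum_i {\rm E}_{p^i}\Big\{\log\frac{p^i}{q^i}\Big\}
 = \kld{p}{r} + \kld{r}{q} ~.
\end{align*}\EM
The middle step uses that $\log(r/q)$ is a sum of single-variable terms, so its expectation depends only on the marginals $p^i$, which $p$ and $r$ share.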

1240:

1241: \begin{figure}\center

1242: \input figtexs/pythagoras

1243: \caption{\label{figPy}

1244: The Pythagoras in the case when a certain $k$-cut is used to define

1245: the $m$- and $e$-geodesics connecting to $r$, respectively $r'$. It

1246: holds: $\kld{p}{q} = \kld{p}{r} + \kld{r}{q}$ and $\kld{q}{p} = \kld{q}{r'} + \kld{r'}{p}$.

1247: }

1248: \end{figure}

1249:

1250:

1251: \paragraph{$k$-cuts}

1252:

1253: Let $k$ denote an order of interactions that we are interested in.

1254: Then, the coordinates split into those describing interactions of

1255: order $\le k$ and those describing interactions of order $> k$,

1256: \BM\begin{align*}

1257: \vec \eta_k &:= (\text{all $\eta_a$ of order $|a| \le k$}) ~,\\

1258: \vec \t_{k^*} &:= (\text{all $\t_a$ of order $|a| >k$}) ~.

1259: \end{align*}\EM

1260:

1261: These can be mixed into a new coordinate system $(\vec \eta_k,\vec

1262: \t_{k^*})$. The point is that those dimensions spanned by $\vec

1263: \eta_k$ are orthogonal to those spanned by $\vec \t_{k^*}$. To

1264: simplify the discussion we call $\vec \eta_k$ \emph{marginals}

1265: (although they include marginals over $k$-tuples of variables) and

1266: $\vec\t_{k^*}$ \emph{higher order interactions}. Keeping the marginals

1267: $\vec \eta_k$ constant defines $m$-flat sub-manifolds $M_k(\vec\eta_k)$,

1268: which are disjoint for different $\vec \eta_k$ and cover all of $\L$.

1269: Keeping higher order interactions $\vec \t_{k^*}$ constant defines

1270: $e$-flat sub-manifolds $E_{k^*}(\vec\t_{k^*})$, which are disjoint for

1271: different $\vec \t_{k^*}$ and cover all of $\L$.
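
For illustration, a sketch of the $1$-cut in the two-variable case $\O=\{0,1\}^2$: the mixed coordinates are $(\vec\eta_1,\vec\t_{1^*}) = (\eta_1,\eta_2;\,\t_{12})$ with
\BM\begin{align*}
\eta_1 = p_{10}+p_{11} ~,\qquad \eta_2 = p_{01}+p_{11} ~,\qquad \t_{12} = \log\frac{p_{00}\,p_{11}}{p_{01}\,p_{10}} ~.
\end{align*}\EM
Here the sub-manifold of fixed $(\eta_1,\eta_2)$ is the one-dimensional family of all distributions with the given single-variable marginals, and the sub-manifold with $\t_{12}=0$ is the two-dimensional family of product distributions $p(x)=p^1(x_1)\,p^2(x_2)$. They intersect in exactly one point, the product of the given marginals.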

1272:

1273:

1274:

1275: \paragraph{Complete decomposition of different order interactions}

1276:

1277: Given a distribution $p$, we define its $k$th order reduction

1278: $p^{(k)}$ as the distribution with the same marginals $\vec \eta_k(p)$ as

1279: $p$ but vanishing higher order interactions $\vec \t_{k^*}=0$,

1280: %

1281: \BM\begin{align*}

1282:   p^{(k)} = (\vec \eta_k(p),\vec \t_{k^*}=0) ~.

1283: \end{align*}\EM

1284: %

1285: That is, $p^{(k)}$ is the same distribution as $p$ except that all

1286: interactions of order $>k$ have been canceled.

1287: Given the Pythagoras it should be clear

1288: that $p^{(k)}$ can also be defined as the orthogonal projection of $p$

1289: onto the submanifold $E_{k^*}(0)$ or as the orthogonal projection of

1290: the uniform distribution $p^{(0)}$ onto $M_k(\vec\eta_k(p))$, please

1291: see figure \ref{figDecomp} left,

1292: \BM\begin{align*}

1293: p^{(k)}

1294:  = \argmin{q \in E_{k^*}(0)} \kld{p}{q}

1295:  = \argmin{q \in M_k(\vec\eta_k(p))} \kld{q}{p^{(0)}} ~.

1296: \end{align*}\EM

1297: Further, define $D_k(p) = \kld{p^{(k)}}{p^{(k-1)}}$. Then the

1298: Pythagoras allows us to decompose the mutual information $I(p)$ in $p$

1299: (i.e., the measure of all interactions in $p$) into a sum of different

1300: order interactions:

1301: \BM\begin{align*}

1302: I(p) = \kld{p}{p^{(1)}} = \sum_{k=2}^n D_k(p)

1303: \end{align*}\EM

1304: Please see figure \ref{figDecomp} right for an illustration.
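
The decomposition can also be written in terms of entropies. The following is a short derivation sketch, assuming $p^{(k-1)}$ has full support so that its expansion $\log p^{(k-1)} = \sum_{|a|\le k-1} \t_a X^a - \psi$ exists. Since $p^{(k)}$ and $p^{(k-1)}$ share all $\eta_a = {\rm E}\{X^a\}$ with $|a|\le k-1$, the expectation of $\log p^{(k-1)}$ is the same under both, hence
\BM\begin{align*}
D_k(p) &= {\rm E}_{p^{(k)}}\{\log p^{(k)}\} - {\rm E}_{p^{(k-1)}}\{\log p^{(k-1)}\}
        = H(p^{(k-1)}) - H(p^{(k)}) ~,\\
\sum_{k=2}^n D_k(p) &= H(p^{(1)}) - H(p^{(n)}) = \sum_i H(p^i) - H(p) = I(p) ~,
\end{align*}\EM
where $p^{(n)}=p$ and the entropy of the product distribution $p^{(1)}$ is the sum of the single-variable entropies $H(p^i)$.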

1305:

1306: This result should be highlighted. The given formalism allows one to

1307: explicitly distinguish different order interactions between variables

1308: in a distribution and directly assigns coordinates $\t$ to those

1309: different order interactions. The quantities $D_k(p) =

1310: \kld{p^{(k)}}{p^{(k-1)}}$ measure precisely and only the $k$th-order

1311: interactions in entropic units.

1312:

1313: For instance, consider three random variables $X_1,\, X_2,\, X_3$

1314: which are pair-wise dependent in the sense $I(X_i;X_j) \not=0$. The

1315: question is whether there exist ``true'' 3rd-order interactions or only

1316: concatenated 2nd-order interactions, as is the case, for example, when they are

1317: described by a Markov process $X_1 \to X_2 \to X_3$. The formalism

1318: gives an answer: if $D_3(p)=0$ all dependencies are of 2nd order (a Markov process is one such case), otherwise there

1319: exist 3rd-order interactions.
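
A complementary toy example (an illustration added here, not taken from the cited works) is the parity distribution: let $X_1,X_2$ be independent fair bits and $X_3 = X_1 \oplus X_2$. All single and pairwise marginals of $p$ are uniform; the uniform distribution has the same marginals and vanishing $\t$-coordinates, so $p^{(2)}=p^{(1)}=p^{(0)}$. Measuring in bits,
\BM\begin{align*}
D_2(p) = \kld{p^{(2)}}{p^{(1)}} = 0 ~,\qquad
D_3(p) = \kld{p}{p^{(2)}} = \sum_x p_x \log_2 \frac{p_x}{1/8} = \log_2 \frac{1/4}{1/8} = 1 ~.
\end{align*}\EM
All dependence is of 3rd order here. Note that this $p$ has zero entries and thus lies on the boundary of $\L$, where the $\t$-coordinates diverge; the divergences themselves remain well defined.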

1320:

1321: \begin{figure}\center

1322: \input{figtexs/decomp}\hfill

1323: \input figtexs/cuts

1324: \caption{\label{figDecomp}

1325:   The left figure illustrates a distribution $p$ and its $k$th-order

1326:   reduction $p^{(k)}$: It is the orthogonal projection of $p$ along

1327:   $M_k(\vec\eta_k(p))$ onto $E_{k^*}(0)$. The ``distance'' $\kld{p}{p^{(k)}}$ measures

1328:   the ``norm'' of $\vec\t_{k^*}$, i.e., it measures the amount of mutual

1329:   information of order higher than $k$. The right figure illustrates

1330:   the complete decomposition of $p$ into reductions $p^{(k)}$ of all

1331:   orders. Every projection from $p^{(k)}$ to $p^{(k-1)}$ is an

1332:   orthogonal projection onto $E_{(k-1)^*}(0)$. Every ``distance''

1333:   $\kld{p^{(k)}}{p^{(k-1)}}$ measures the mutual information specifically

1334:   of order $k$. }

1335: \end{figure}

1336:

1337:

1338:

1339:

1340:

1341:

1342: \section{Geometric view on evolution operators}

1343:

1344: \paragraph{Crossover}

1345:

1346: In Evolutionary Algorithms, crossover is one means of mixing a parent

1347: population into an offspring population. Populations can be formalized

1348: as distributions $p$ and a definition of a simple form of crossover

1349: (uniform crossover parameterized with $c\in[0,1]$) reads

1350: \BM\begin{align*}

1351: \CC p &= (1-c)\, p + c\, p^{(1)} ~.

1352: \end{align*}\EM

1353: See, for instance, \cite{toussaint:03-gecco-cross} for a general

1354: definition of a crossover operator in more conventional notation and

1355: details of when it reduces to this simple form.

1356:

1357: This crossover simply mixes the original distribution (or population)

1358: $p$ with its $1$st-order reduction. The $1$st-order reduction is the

1359: product of all single variable marginals, i.e., it is the distribution

1360: with the same marginals (gene frequencies) as $p$ but all dependencies

1361: (gene linkages) between the variables eliminated. From the geometrical

1362: point of view, crossover makes a step along the $m$-geodesic

1363: connecting $p$ and $p^{(1)}$. It can be illustrated as a step along

1364: the projection of $p$ onto the submanifold $E_{1^*}(0)$; please see figure

1365: \ref{figCross}.
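
In coordinates this step is particularly simple. A sketch for two binary variables, using that the $\eta$-coordinates are linear in $p$ and that the product distribution $p^{(1)}$ has $\eta_{12}(p^{(1)}) = \eta_1\,\eta_2$:
\BM\begin{align*}
\eta_1(\CC p) = \eta_1 ~,\qquad \eta_2(\CC p) = \eta_2 ~,\qquad
\eta_{12}(\CC p) = (1-c)\, \eta_{12} + c\, \eta_1\, \eta_2 ~.
\end{align*}\EM
The deviation $\eta_{12}-\eta_1\eta_2$ (the covariance of the two bits, known as linkage disequilibrium in population genetics) is thus scaled by the factor $(1-c)$ in each application of $\CC$.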

1366:

1367:

1368: From this view it becomes clear that a reasonable coordinate system to

1369: describe crossover is $(\vec\eta_1,\vec\t_{1^*})$. Crossover does not

1370: change $\vec\eta_1$ (it operates orthogonally to $\vec\eta_1$) but

1371: continuously shrinks the $\vec\t_{1^*}$ variables toward zero. That $\vec\t_{1^*}$

1372: shrinks rather than grows is intuitive from figure \ref{figCross}

1373: and becomes apparent from the fact that

1374: the ``distance'' from $p$ to $p^{(1)}$, $I(p)=\kld{p}{p^{(1)}}$, is a

1375: norm of $\vec\t_{1^*}$.  \hide{ More explicitly, consider the

1376:   derivative of the $\t$-coordinates along the path, i.e., with

1377:   varying $c$, \BM\begin{align*}

1378:     \frac{\del}{\del c} p_x(c) &= p_x^{(1)} - p_x \\

1379:     \frac{\del}{\del c} \t_a(c) &= \sum_x B_{ax}\, \frac{\del}{\del c}

1380:     l_x(c) = \sum_x B_{ax}\, \frac{p_x^{(1)}-p_x}{p_x(c)}

1381: \end{align*}\EM

1382:

1383: -- In the $\O=\{0,1\}^2$ two gene case we have

1384: \BM\begin{align*}

1385: & |a|\le 1 \To \eta_a(c) = \eta_a \\

1386: &\eta_{12}(c) = (1-c)\, \eta_{12} + c\, \eta_1\, \eta_2

1387: \end{align*}\EM

1388: with crossover probability $c$. Further

1389: \BM\begin{align*}

1390: \dot\eta_{12}

1391:  &= -\eta_{12} + \eta_1\, \eta_2 \\

1392: \t_{12}

1393:  &= \sum_x B_{(12)x} \log \sum_a B^T_{xa} \eta_a \\

1394: \dot \t_{12}

1395:  &= \sum_x B_{(12)x} \frac{1}{p_x}[\sum_a B^T_{xa} \dot\eta_a]

1396:   = \sum_x B_{(12)x} \frac{B^T_{x(12)} \dot\eta_{12}}{p_x}

1397:   = \dot\eta_{12} \sum_x \frac{B_{(12)x} B^T_{x(12)} }{p_x}

1398:   = \dot\eta_{12} \sum_x \frac{1}{p_x}

1399: \end{align*}\EM

1400: where the $\dot{}$ means the derivative $\del_c$ at $c=0$. Since, on

1401: the path $c\in [0,1]$, $\dot\eta_{12}$ is constant and does not change

1402: the sign, $\t_{12}$ monotonically approaches zero.

1403: }

1404:

1405: \begin{figure}\center

1406: \input{figtexs/cross}

1407: \caption{\label{figCross}

1408:   Crossover is an operator that takes a step along the projection of

1409:   $p$ towards the first order reduction $p^{(1)}$.}

1410: \end{figure}

1411:

1412: \paragraph{Max Entropy}

1413:

1414: \citeN{wright-et-al:04} recently proposed an evolutionary search

1415: scheme that constructs the new search distribution (offspring

1416: population) via a maximum entropy principle: From the parent

1417: population all second order schema frequencies are calculated. Then,

1418: from all the distributions which have the same second order schema

1419: frequencies, the new offspring distribution is the one with maximum

1420: entropy.

1421:

1422: In our formalism, constraining the schema frequencies corresponds to

1423: fixing $\vec \eta_2$, i.e., constraining the offspring distribution to

1424: the submanifold $M_2(\vec \eta_2)$. The distribution with maximal

1425: entropy in $M_2(\vec \eta_2)$ must have minimal higher order

1426: (3rd-order or higher) interactions $\vec\t_{2^*}$ since interactions

1427: (mutual information) reduce entropy. Thus, the max entropy rule simply

1428: amounts to setting $\vec\t_{2^*}=0$, i.e., choosing

1429: $p^{(2)}=(\vec\eta_2,0)$ as the new offspring distribution.

1430:

1431: Again, this can be viewed geometrically as the orthogonal projection

1432: of the parent population $p$ onto $E_{2^*}(0)$ according to

1433: \BM\begin{align*}

1434: \argmin{q \in E_{2^*}(0)} \kld{p}{q}

1435: \end{align*}\EM

1436: or as the orthogonal projection of the uniform distribution $p^{(0)}$

1437: onto $M_2(\vec\eta_2)$

1438: \BM\begin{align*}

1439: \argmin{q \in M_2(\vec\eta_2)} \kld{q}{p^{(0)}} ~.

1440: \end{align*}\EM

1441: This latter way of writing the max entropy principle is quite

1442: intuitive: find the distribution that fulfills the required constraints

1443: (lies on $M_2(\vec\eta_2)$) but is closest to the uniform distribution

1444: $p^{(0)}$.
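
For three binary variables the resulting offspring distribution has an explicit form (a sketch based on the expansion of $\log p$ into $\t$-coordinates, with the 3rd-order term dropped):
\BM\begin{align*}
\log p^{(2)}(x) = \t_1 x_1 + \t_2 x_2 + \t_3 x_3 + \t_{12}\, x_1 x_2 + \t_{13}\, x_1 x_3 + \t_{23}\, x_2 x_3 - \psi ~,
\end{align*}\EM
where the six coefficients are in general not those of the parent $p$ but are determined implicitly by requiring $\eta_a(p^{(2)}) = \eta_a(p)$ for all $|a|\le 2$. In practice they typically have to be found numerically, e.g., by iterative proportional fitting.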

1445:

1446: Finally, note the strong analogy between the maximum entropy principle

1447: proposed by \citeN{wright-et-al:04} and the simple crossover operator

1448: given before: Crossover moves $p$ toward $p^{(1)}$, while the search

1449: heuristic considered by Wright et al.\ chooses $p^{(2)}$ as the new

1450: search distribution.

1451:

1452:

1453:

1454: \section{Discussion}

1455:

1456: The methods information geometry provides to analyze and describe the

1457: structure of distributions are deeply grounded in information

1458: theory. For instance, it seems very beneficial to have coordinate

1459: systems for distributions which capture precisely arbitrary $k$th

1460: order interactions between variables and have a direct link to

1461: measures like mutual information and the Kullback-Leibler

1462: divergence. Also the geometric aspects, e.g., that some operations

1463: can be described as orthogonal to certain submanifolds, add to a more

1464: comprehensive picture of the space of distributions. In that sense,

1465: information geometric methods enhance more common approaches in

1466: Evolutionary Computation, like the Walsh basis, in describing the

1467: structure of distributions and operators.

1468:

1469: However, the question remains how and whether these methods can be

1470: used to (1) actually propose new heuristic search algorithms or (2) to

1471: provide new theoretical tools to analyze the dynamics of evolutionary

1472: processes.

1473:

1474:

1475: \subsection*{Acknowledgment}

1476:

1477: I would like to thank the German Research Foundation (DFG) for their

1478: funding of the Emmy Noether fellowship TO 409/1-1.

1479:

1480: \footer\small

1481: \bibliography{/cygdrive/c/home/tex/bibs}

1482: \end{document}

1483: