0610:cs0610126/Fuo.tex

1:

2: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

3: %%                Fitness Uniform Optimization               %%

4: %%           Marcus Hutter & Shane Legg: (2000-2005)         %%

5: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

6:

7: \documentclass[twoside,twocolumn,10pt]{article}

8: \usepackage{graphicx,wrapfig}

9:

10: \topmargin=-15mm  \oddsidemargin=-10mm \evensidemargin=-10mm

11: \textwidth=18cm \textheight=25cm

12: \sloppy\lineskip=0pt

13: \pagestyle{myheadings}

14: \markboth{\sc Marcus Hutter \& Shane Legg % \hfil IDSIA-16-06

15: }{\sc Fitness Uniform Optimization}

16:

17: %-------------------------------%

18: %     My Math-Environments      %

19: %-------------------------------%

20:

21: \def\,{\mskip 3mu} \def\>{\mskip 4mu plus 2mu minus 4mu} \def\;{\mskip 5mu plus 5mu} \def\!{\mskip-3mu}

22: \def\dispmuskip{\thinmuskip= 3mu plus 0mu minus 2mu \medmuskip=  4mu plus 2mu minus 2mu \thickmuskip=5mu plus 5mu minus 2mu}

23: \def\textmuskip{\thinmuskip= 0mu                    \medmuskip=  1mu plus 1mu minus 1mu \thickmuskip=2mu plus 3mu minus 1mu}

24: \textmuskip

25: \def\beq{\dispmuskip\begin{equation}}    \def\eeq{\end{equation}\textmuskip}

26: \def\beqn{\dispmuskip\begin{displaymath}}\def\eeqn{\end{displaymath}\textmuskip}

27: \def\bqa{\dispmuskip\begin{eqnarray}}    \def\eqa{\end{eqnarray}\textmuskip}

28: \def\bqan{\dispmuskip\begin{eqnarray*}}  \def\eqan{\end{eqnarray*}\textmuskip}

29:

30: %-------------------------------%

31: %   Macro-Definitions           %

32: %-------------------------------%

33:

34: \newenvironment{keywords}{\vskip 3ex\noindent {\bf\Large Keywords}\vskip 2ex\noindent}{\par\vskip 1ex}

35: %\def\subsection#1{\paragraph{#1.}}

36: \def\subsection#1{\vspace{1ex plus 0.5ex minus 0.5ex}\noindent{\bfseries\boldmath{#1.}}}

37:

38: \def\toinfty#1{\stackrel{#1\to\infty}{\longrightarrow}}

39: \def\nq{\hspace{-1em}}

40: \def\qed{\hspace*{\fill}$\Box\quad$\\}

41: \def\odt{{\textstyle{1\over 2}}}

42: \def\eps{\varepsilon}

43: \def\v#1{{\bf #1}}

44: \def\approxleq{\mbox{\raisebox{-0.8ex}{$\stackrel{\displaystyle<}\sim$}}} %% make nicer

45: \def\approxgeq{\mbox{\raisebox{-0.8ex}{$\stackrel{\displaystyle>}\sim$}}} %% make nicer

46: \def\SetR{{I\!\!R}}

47:

48: \begin{document}

49: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

50: %                      T i t l e - P a g e                      %

51: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

52:

53: \title{\vskip -10mm\normalsize\sc Technical Report \hfill IDSIA-16-06

54: \vskip 2mm\bf\huge\hrule height5pt \vskip 6mm

55: Fitness Uniform Optimization

56: \vskip 6mm \hrule height2pt \vskip 5mm}

57: \author{

58: {\bf Marcus Hutter}\\[2mm]

59: IDSIA, Galleria 2, CH-6928\\ Manno-Lugano, Switzerland\\

60: marcus@idsia.ch

61: \and

62: {\bf Shane Legg}\\[2mm]

63: IDSIA, Galleria 2, CH-6928\\ Manno-Lugano, Switzerland\\

64: shane@idsia.ch}

65:

66: \maketitle

67:

68: \begin{abstract}

69: In evolutionary algorithms, the fitness of a population increases with

70: time by mutating and recombining individuals and by a biased selection

71: of more fit individuals. The right selection pressure is critical in

72: ensuring sufficient optimization progress on the one hand and in

73: preserving genetic diversity to be able to escape from local optima on

74: the other hand. Motivated by a universal similarity relation on the

75: individuals, we propose a new selection scheme, which is uniform in

76: the fitness values. It generates selection pressure toward sparsely

77: populated fitness regions, not necessarily toward higher fitness, as

78: is the case for all other selection schemes. We show analytically on a

79: simple example that the new selection scheme can be much more

80: effective than standard selection schemes.  We also propose a new

81: deletion scheme which achieves a similar result via deletion and show

82: how such a scheme preserves genetic diversity more effectively than

83: standard approaches.  We compare the performance of the new schemes to

84: tournament selection and random deletion on an artificial deceptive

85: problem and a range of NP-hard problems: traveling salesman, set

86: covering and satisfiability.

87: \end{abstract}

88:

89: \begin{keywords}

90: Evolutionary algorithms, fitness uniform selection scheme, fitness

91: uniform deletion scheme, preserve diversity, local optima, evolution,

92: universal similarity relation, correlated recombination, fitness tree

93: model, traveling salesman, set covering, satisfiability

94: \end{keywords}

95:

96: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

97: \section{Introduction}\label{secInt}

98: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

99:

100: %------------------------------%

101: \subsection{Evolutionary algorithms (EA)}

102: %------------------------------%

103: Evolutionary algorithms are capable of solving complicated

104: optimization tasks in which an objective function $f:I\to\SetR$ is to

105: be maximized. Here $i\in I$ is an individual from the set $I$ of feasible

106: solutions. Infeasible solutions due to constraints may also be

107: considered by reducing $f$ for each violated constraint. A population

108: $P$ is a multi-set of individuals from $I$ which is maintained and

109: updated as follows: one or more individuals are selected according to

110: some selection strategy.

111: %

112: In generation based EAs, the selected individuals are recombined

113: (e.g.\ crossover) and mutated, and constitute the new population.

114: We prefer the more incremental, steady-state population update,

115: which selects (and possibly deletes) only one or two individuals from

116: the current population and adds the newly recombined and mutated

117: individuals to it.

118: %

119: We are interested in finding a single individual of maximal objective value

120: $f$ for difficult multi-modal and deceptive problems.

121:

122: %------------------------------%

123: \subsection{Standard selection schemes (STD)}

124: %------------------------------%

125: The standard selection schemes (abbreviated by STD in the

126: following), proportionate, truncation, ranking and tournament

127: selection all favor individuals of higher fitness

128: \cite{Goldberg:89,Goldberg:91,Blickle:95a,Blickle:97}. This is

129: also true for less common schemes, like Boltzmann selection

130: \cite{Maza:93}. The

131: fitness function is identified with the objective

132: function (possibly after a monotone transformation).

133: In linear proportionate selection the probability

134: of selecting an individual depends linearly on its fitness

135: \cite{Holland:75}. In truncation selection the $\alpha\%$ fittest

136: individuals are selected, usually with multiplicity

137: ${1\over\alpha\%}$ to keep the population size fixed

138: \cite{Muehlenbein:94}. (Linear) ranking selection orders the

139: individuals according to their fitness. The selection probability

140: is, then, a (linear) function of the rank \cite{Whitley:89}.

141: Tournament selection \cite{Baker:85}, which selects the best $l$

142: out of $k$ individuals, has primarily been developed for steady-state

143: EAs, but can be adapted to generation based EAs. All these

144: selection schemes have the property (and goal!) to increase the

145: average fitness of a population, i.e.\ to evolve the population

146: toward higher fitness.

147:

148: %------------------------------%

149: \subsection{The problem of the right selection pressure}

150: %------------------------------%

151: The standard selection schemes STD, together with mutation and

152: recombination, evolve the population toward higher fitness. If the

153: selection pressure is too high, the EA gets stuck in a local optimum,

154: since the genetic diversity rapidly decreases. The suboptimal genetic

155: material which might help in finding the global optimum is deleted too

156: rapidly (premature convergence). On the other hand, the selection

157: pressure cannot be chosen arbitrarily low if we want the EA to be

158: effective. In difficult optimization problems, suitable population

159: sizes, mutation and recombination rates, and selection parameters,

160: which influence the selection intensity, are usually not known

161: beforehand. Often, constant values are not sufficient at all

162: \cite{Eiben:99}.  There are various suggestions to dynamically

163: determine and adapt the parameters

164: \cite{Eshelman:91,Baeck:91,Herdy:92,Schlierkamp:94}.

165: Other approaches to preserve genetic diversity are fitness sharing

166: \cite{Goldberg:87} and crowding \cite{DeJong:75}.

167: They depend on the proper design of a neighborhood function based on

168: the specific problem structure and/or coding.  One approach which does

169: not require a neighborhood function based on the genome is local

170: mating \cite{Collins:91}; however, it has been shown that rapid

171: takeover can still occur for basic spatial topologies

172: \cite{Rudolph:00}.  Another approach which has not been widely studied

173: is preselection \cite{Cavicchio:70}.

174:

175: We are interested in evolutionary algorithms which do not require

176: special problem insight (problem specific neighborhood function and/or

177: coding) and are able to effectively prevent population takeover.  In

178: this paper we introduce and analyze two potential approaches to this

179: problem: the Fitness Uniform Selection Scheme (FUSS) and the Fitness

180: Uniform Deletion Scheme (FUDS).

181:

182: %------------------------------%

183: \subsection{The fitness uniform selection scheme}

184: %------------------------------%

185: FUSS is based on the insight that we are not primarily interested in a

186: population converging to maximal fitness, but only in a single

187: individual of maximal fitness.  The scheme automatically creates a

188: suitable selection pressure and preserves genetic diversity better

189: than STD. The proposed fitness uniform selection scheme FUSS (see also

190: Figure \ref{figsel}) is defined as follows: {\em if the lowest/highest

191: fitness values in the current population $P$ are $f_{min/max}$ we

192: select a fitness value $f$ uniformly in the interval

193: $[f_{min},f_{max}]$. Then, the individual $i\in P$ with fitness

194: nearest to $f$ is selected and a copy is added to $P$, possibly after

195: mutation and recombination.} We will see that FUSS maintains genetic

196: diversity better than STD, since a distribution over the fitness

197: values is used, unlike STD, which all use a distribution over

198: individuals. Premature convergence is avoided in FUSS by abandoning

199: convergence altogether. Nevertheless, there is a selection pressure in FUSS

200: toward higher fitness.

201: %

202: The probability of selecting a specific individual is proportional

203: to the distance to its nearest fitness neighbor. In a

204: population with a high density of unfit and low density of fit

205: individuals, the fitter ones are effectively favored.

206:

207: %------------------------------%

208: \subsection{The fitness uniform deletion scheme}

209: %------------------------------%

210: We may also preserve diversity through deletion rather than through

211: selection.  By always deleting from those individuals which have very

212: commonly occurring fitness values we achieve a population which is

213: uniformly distributed across fitness values, like with FUSS.  Because

214: these deleted individuals are ``commonly occurring'' in some sense

215: this should help preserve population diversity.  Under FUDS the role

216: of the selection scheme is to govern how actively different parts of

217: the solution space are searched rather than to move the population as

218: a whole toward higher fitness.  Thus, like with FUSS, premature

219: convergence is avoided by abandoning convergence as our goal.  However,

220: as FUDS is only a deletion scheme, the EA still requires a selection

221: scheme which may require a selection intensity parameter to be set.

222: Thus we do not necessarily have a parameterless EA, as we do with

223: FUSS.  Nevertheless, due to the impossibility of population collapse

224: the performance is more robust than usual with respect to variation in

225: selection intensity.  Thus FUDS is at least a partial solution to the

226: problem of having to correctly set a selection intensity parameter.

227:

228: %------------------------------%

229: \subsection{Contents}

230: %------------------------------%

231: This paper extends and supersedes the earlier results reported in the

232: conference papers \cite{Hutter:01fuss}, \cite{Legg:04fussexp} and

233: \cite{Legg:05fuds}.

234: Among other things, this paper: extends the previous theoretical

235: analysis of FUSS and gives the first theoretical analysis of FUDS and

236: of their performance when combined; presents a new method of analysis

237: called fitness tree analysis; reports the first set of experimental results

238: which directly compares the two proposed schemes on the same problems

239: with the same parameters, including when they are used together; gives

240: the first full analysis of population diversity measurements for FUSS

241: and in particular extends and corrects some of the earlier speculation

242: about performance problems in some situations.

243:

244: The paper is structured as follows:

245:

246: In {\em Section \ref{secSim}} we discuss the problems of local

247: optima and population takeover \cite{Goldberg:91} in STD, which

248: could be lowered by restricting the number of {\em similar}

249: individuals in a population. As we often do not have an

250: appropriate functional similarity relation, we define a universal

251: distance (semi-metric) $d(i,j):=|f(i)-f(j)|$ based on the

252: available fitness only, which will serve our needs.

253:

254: Motivated by the universal similarity relation $d$ and by the need to

255: preserve genetic diversity, we define in {\em Section

256: \ref{secFuss}} the fitness uniform selection scheme. We

257: discuss under which circumstances FUSS leads to an (approximate)

258: fitness uniform population.

259:

260: Further properties of FUSS are discussed in \emph{Section

261: \ref{secProp}}, especially, how FUSS creates selection pressure

262: toward higher fitness and how it preserves diversity better than

263: STD. Further topics are the equilibrium distribution and the

264: transformation properties of FUSS under linear and non-linear

265: transformations of $f$.

266:

267: Another way to utilize the ability of the universal similarity

268: relation $d$ to preserve diversity, is to use it to help target

269: deletion.  This gives us the fitness uniform {\em deletion} scheme

270: which we define in {\em Section \ref{secFUDS}}.  As this produces a

271: population which is approximately uniformly distributed across fitness

272: levels, like with FUSS, many of the properties of FUSS carry over to

273: an EA using FUDS.  Some of these properties are highlighted in {\em

274: Section

275: \ref{secPropFUDS}}.

276:

277: In {\em Section \ref{secEx}} we theoretically demonstrate, by way of a

278: simple optimization example, that an EA with FUSS or FUDS can optimize

279: much faster than with STD. We show that crossover can be effective in

280: FUSS, even when ineffective in STD. Furthermore, FUSS, FUDS and STD

281: are compared to random search with and without crossover.

282:

283: In {\em Section \ref{secTree}} we develop a fitness tree model, which

284: we believe to cover the essential features of fitness landscapes for

285: difficult problems with many local optima. Within this model we derive

286: heuristic expressions for the optimization time of random walk, FUSS,

287: FUDS and STD.  They are compared, and a worst case slowdown of FUSS

288: relative to STD is obtained.

289:

290: There is a possible additional slowdown when including recombination,

291: as discussed in {\em Section \ref{secCross}}, which can be avoided by

292: using a scale independent pair selection. It is a ``best'' compromise

293: between unrestricted recombination and recombination of $d$-similar

294: individuals only.  It also has other interesting properties when used

295: without crossover.

296:

297: To simplify the discussion we have concentrated on the case of

298: discrete, equi-spaced fitness values. In many practical problems, the

299: fitness function is continuously valued. FUSS and some of the

300: discussion of the previous sections are generalized to the continuous

301: case in {\em Section~\ref{secCont}}.

302:

303:

304:

305: \emph{Section~\ref{secJfuss}} begins our experimental analysis of

306: FUSS and FUDS.  In this section we give a detailed account of the EA

307: software we have used for our experiments, including links to where

308: the source code can be downloaded.

309:

310: \emph{Section \ref{secEx2}} examines the empirical performance of

311: FUSS and FUDS on the artificially constructed deceptive optimization

312: problem described in Section~\ref{secEx}.  These results confirm the

313: correctness of our theoretical analysis.

314:

315: In \emph{Section \ref{secTSP}} we test randomly generated traveling

316: salesman problems.

317:

318: In \emph{Section \ref{secSetCover}} we examine the set covering

319: problem, an NP-hard optimization problem which has many real-world

320: applications.

321:

322: For our final test in \emph{Section \ref{secSAT}} we look at random

323: CNF3 SAT problems.  These are also NP-hard optimization problems.

324:

325: \emph{Section \ref{secConc}} contains a summary of our results and

326: possible avenues for future research.

327:

328:

329: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

330: \section{Universal Similarity Relation}\label{secSim}

331: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

332:

333: %------------------------------%

334: \subsection{The problem of local optima}

335: %------------------------------%

336: Proportionate, truncation, ranking and tournament are the standard

337: (STD) selection algorithms used in evolutionary optimization. They

338: have the following property: if a local optimum $i^{lopt}$ has been

339: found, the number of individuals with fitness $f^{lopt}=f(i^{lopt})$

340: tends to increase rapidly. Assume a low mutation and recombination

341: rate, or, for instance, truncation selection {\em after} mutation and

342: recombination. Further, assume that it is very difficult to find an

343: individual fitter than $i^{lopt}$. The population will then degenerate

344: and will consist mostly of $i^{lopt}$ after a few rounds. This

345: decreased diversity makes it even less likely that $f^{lopt}$ gets

346: improved. The suboptimal genetic material which might help in finding

347: the global optimum has been deleted too rapidly. On the other hand,

348: too high mutation and recombination rates convert the EA into an

349: inefficient random search.

350:

351: %------------------------------%

352: \subsection{Possible solution}

353: %------------------------------%

354: Sometimes it is possible to appropriately choose the mutation and

355: recombination rate and population size by some insight into the nature

356: of the problem. More often this is a trial and error process, or no

357: single fixed rate works at all.

358:

359: A naive fix of the problem is to artificially limit the number of

360: identical individuals to a significant but small fraction $\eps$.

361: If the space of individuals $I$ is large, there could be many very

362: similar (but not identical) individuals of, for instance, fitness

363: $f^{lopt}$. The EA can still converge to a population containing

364: only this class of similar individuals, with all others becoming

365: extinct. In order for the limitation approach to work, one has to

366: restrict the number of {\em similar} individuals. Significant

367: contributions in this direction are fitness sharing

368: \cite{Goldberg:87} and crowding \cite{DeJong:75}.

369:

370: %------------------------------%

371: \subsection{The problem of finding a similarity relation}

372: %------------------------------%

373: If the individuals are binary coded, one might use the Hamming distance

374: as a similarity relation. This distance is consistent with a mutation

375: operator which flips a few bits. It produces Hamming-similar

376: individuals, but recombination (like crossover) can produce very

377: dissimilar individuals w.r.t.\ this measure. In any case, genotypic

378: similarity relations, like the Hamming distance, depend on the

379: representation of the individuals as binary strings. Individuals with

380: very dissimilar genomes might actually be functionally

381: (phenotypically) very similar. For instance, when most bits are unused

382: (like introns in genetic programming), they can be randomly disturbed

383: without affecting the properties of the individual. For specific

384: problems at hand, it might be possible to find suitable

385: representation-independent functional similarity relations. On the

386: other hand, in genetic programming, for instance, it is in general

387: undecidable whether two individuals are functionally similar.

388:

389: %------------------------------%

390: \subsection{A universal similarity relation}

391: %------------------------------%

392: Here we want to take a different approach. We define the

393: difference or distance between two individuals as

394: \beqn

395:   d(i,j) \;:=\; |f(i)-f(j)|.

396: \eeqn

397: The distance is based solely on the fitness function, which is

398: provided as part of the problem specification.

399: %

400: It is independent of the coding/representation and other problem

401: details, and of the optimization algorithm (e.g.\ the genetic mutation

402: and recombination operators), and can trivially be computed from

403: the fitness values.

404: %

405: If we make the natural assumption that functionally similar

406: individuals have similar fitness, they are also similar w.r.t.\ the

407: distance $d$. On the other hand, individuals with very different

408: coding, and even functionally dissimilar individuals may be

409: $d$-similar, but we will see that this is acceptable. For instance,

410: individuals from different local optima of equal height are

411: $d$-similar.

412:

413:

414: %------------------------------%

415: \subsection{Relation to niching and crowding}

416: %------------------------------%

417: Unlike fitness uniform optimization, diversity control methods like

418: niching or crowding require a metric $g$ to be defined over the genome

419: space.  By looking at the relationship between $g$ and $f$ we can

420: relate these two types of diversity control: We say that a fitness

421: function $f$ is \emph{smooth} with respect to $g$, if $g( i, j )$

422: being small implies that $|f(i) - f(j)|$ is also small, that is, $d(

423: i, j )$ is small.  This implies that if $d( i, j )$ is not small, $g(

424: i, j )$ also cannot be small.  Thus, if we limit the number of $d$

425: similar individuals, as we do in fitness uniform optimization, this

426: will also limit the number of $g$ similar individuals, as is done in

427: crowding and niching methods.  The advantage of fitness uniform

428: optimization is that we do not need to know what $g$ is, or to compute

429: its value.  Indeed, the above argument is true for \emph{any} metric

430: $g$ on the genome space that $f$ is smooth with respect to.
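%
One possible way to make the notion of smoothness precise (an
assumption introduced here for illustration only; the argument above
does not depend on it) is a Lipschitz condition with a hypothetical
constant $L>0$:
\beqn
  d(i,j) \;=\; |f(i)-f(j)| \;\leq\; L\cdot g(i,j)
  \quad\mbox{for all}\quad i,j\in I.
\eeqn
For any threshold $\delta>0$, $g(i,j)\leq\delta/L$ then implies
$d(i,j)\leq\delta$, so for every individual $i$
\beqn
  \{j\in P : g(i,j)\leq\delta/L\} \;\subseteq\;
  \{j\in P : d(i,j)\leq\delta\}.
\eeqn
Under this assumption, any bound on the number of individuals within
$d$-distance $\delta$ of $i$ is also a bound on the number of
individuals within $g$-distance $\delta/L$ of $i$, without $g$ or $L$
ever being computed.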

431:

432: On the other hand, if the fitness function $f$ is not generally smooth

433: with respect to $g$, then such a comparison between the methods cannot

434: be made.  However, in this case an EA is less likely to be effective

435: as small mutations in genome space with respect to $g$ will produce

436: unpredictable changes in fitness.

437:

438: %------------------------------%

439: \subsection{Topologies on individual space $I$}

440: %------------------------------%

441: The distance $d:I \times I \to \SetR_0^+$ induced by the fitness

442: function $f$ is a semi-metric on the individual space $I$ (semi only

443: because $d(i,j)=0$ for $i\neq j$ is possible). The semi-metric induces

444: a topology on $I$. Equal fitness suffices to declare two individuals

445: as $d$-equivalent, i.e.\ $d$ is a rather small semi-metric in the sense

446: that the induced topology is rather coarse. We will see that a

447: non-zero distance between individuals of different fitness is

448: sufficient to avoid population takeover. $d$ induces the

449: coarsest topology (is the ``smallest'' distance) avoiding population

450: takeover.
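%
The semi-metric properties can be checked directly from the definition
$d(i,j):=|f(i)-f(j)|$; the following short verification uses nothing
beyond the properties of the absolute value on $\SetR$. For all
$i,j,k\in I$,
\bqan
  d(i,i) &=& |f(i)-f(i)| \;=\; 0, \\
  d(i,j) &=& |f(i)-f(j)| \;=\; |f(j)-f(i)| \;=\; d(j,i), \\
  d(i,k) &=& |f(i)-f(j)+f(j)-f(k)| \;\leq\; d(i,j)+d(j,k),
\eqan
whereas $d(i,j)=0$ only implies $f(i)=f(j)$ and not $i=j$.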

451:

452: %------------------------------%

453: \subsection{The problem of genetic drift}

454: %------------------------------%

455: Besides elitist selection, the other major cause of diversity loss in

456: a population is genetic drift.  This occurs due to the stochastic

457: nature of the selection operator breeding some individuals more often

458: than others.  In a finite population this will cause some individuals

459: to be replaced which have no close relatives, thus reducing diversity.

460: Indeed, without a sufficient rate of mutation, eventually a population

461: will converge on a single genome, even if no selection pressure is

462: applied.

463:

464: Although fitness uniform optimization does not attempt to address this

465: problem, some implications can be drawn.  Clearly, with fitness

466: uniform optimization a complete collapse in diversity is impossible as

467: individuals with a wide range of fitness values are always preserved

468: in the population.  However, within a given fitness level genetic

469: drift can occur, although the sustained presence of many individuals

470: in other fitness levels to breed with will reduce this effect.

471:

472: Theoretical analysis of genetic drift is often performed by

473: calculating the Markov chain transition matrices to compute the time

474: for the system to reach an absorption state where all of the

475: population members have the same genome.  As these results can be

476: difficult to generalize, an alternative approach has been to measure

477: genetic drift by measuring the loss in fitness diversity in a

478: population over time \cite{Rogers:99gd}.  This is interesting as

479: fitness uniform optimization attempts to maximize the entropy of the

480: fitness values in the population, producing a very high variance in

481: population fitness.  Thus, at least according to the second method of

482: analysis, very little genetic drift would be evident in the

483: population.

484:

485:

486: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

487: \section{Fitness Uniform Selection Scheme (FUSS)}\label{secFuss}

488: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

489:

490: %------------------------------%

491: \subsection{Discrete fitness function}

492: %------------------------------%

493: In this section we propose a new selection scheme, which limits

494: the fraction of $d$-similar individuals. For simplicity we start

495: with a fitness function $f:I\to F$ with discrete equi-spaced

496: values $F=\{f_{min},f_{min}+\eps,f_{min}+2\eps,...,

497: f_{max}-\eps,f_{max}\}$. We call two individuals $i$ and $j$

498: $\delta$-similar if $d(i,j)\equiv|f(i)-f(j)|\leq\delta$. The

499: continuous valued case $F=[f_{min},f_{max}]$ is considered

500: later. In the following we assume $\delta<\eps$. In this case,

501: two individuals are $\delta$-similar if and only if they have the

502: same fitness.

503:

504: %------------------------------%

505: \subsection{The goal}

506: %------------------------------%

507: We have argued that in order to escape local optima, genetic

508: variety should be preserved somehow. One way is to limit the

509: number of $\delta$-similar individuals in the population. In an exact

510: fitness uniform distribution there would be $|P|/|F|$ individuals

511: for each of the $|F|$ fitness values, i.e.\ each fitness level

512: would be occupied by a fraction of $1/|F|$ individuals.

513: The following selection scheme asymptotically transforms any

514: finite population into a fitness uniform one.

515:

516: %------------------------------%

517: \subsection{The fitness uniform selection scheme (FUSS)}

518: %------------------------------%

519: FUSS is defined as follows: randomly select a fitness value $f$

520: uniformly from the fitness values $F$.  Then, uniformly at random

521: select an individual $i\in P$ with fitness $f$. Add another copy of

522: $i$ to $P$.

523:

524: Note the two stage uniform selection process which is very

525: different from a one step uniform selection of an individual of

526: $P$ (see Figure \ref{figsel}).
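%
To make the difference concrete, consider a small illustrative
population (the numbers are an assumption chosen for this example, not
taken from the experiments) with $|F|=2$ fitness levels, $99$
individuals at $f_{min}$ and a single individual $i^*$ at $f_{max}$.
If $n(f)$ denotes the number of individuals with fitness $f$ and every
level is occupied, the two stage process selects a specific individual
$i$ with probability
\beqn
  p_{FUSS}(i) \;=\; {1\over|F|}\cdot{1\over n(f(i))},
  \qquad\mbox{whereas}\qquad
  p_{uniform}(i) \;=\; {1\over|P|}.
\eeqn
Here $p_{FUSS}(i^*)={1\over 2}\cdot{1\over 1}={1\over 2}$, while
$p_{uniform}(i^*)={1\over 100}$. Each of the $99$ unfit individuals is
selected by FUSS with probability ${1\over 2}\cdot{1\over
99}={1\over 198}$, so the probabilities sum to
${1\over 2}+{99\over 198}=1$.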

527: \begin{figure}

528: \centerline{\includegraphics[width=1.0\columnwidth,height=0.6\textheight]{select.eps}}

529: \caption{\label{figsel}Effects of proportionate, truncation,

530: ranking \& tournament, uniform, and fitness uniform (FUSS) selection

531: on the fitness distribution in a generation based EA.  The left/right

532: diagrams depict fitness distributions before/after applying the

533: selection schemes depicted in the middle diagrams.  Note that for

534: populations with a non-Gaussian distribution of fitness values (left

535: column), the graph of selection probability vs.\ fitness for FUSS

536: (center bottom) can be totally different from that illustrated above;

537: however, the population distribution that results (right bottom) will be

538: the same.}

539: \end{figure}

540: In STD, inertia increases with population size. A large mass of

541: unfit individuals reduces the probability of selecting fit

542: individuals. This is not the case for FUSS. Hence, without loss of

543: performance, we can define a {\em pure model}, in which no

544: individual is ever deleted; the population size increases with

545: time. No genetic material is ever discarded and no fine-tuning in

546: population size is necessary. What may prevent the pure model from

547: being applied to practical problems are not computation time

548: issues, but memory problems.

549: If space becomes a problem we delete random individuals, as is

550: usually done with a steady state EA.

551:

552: %------------------------------%

553: \subsection{Asymptotically fitness uniform distribution}

554: %------------------------------%

555: The expected number of

556: individuals per fitness level $f$ after $t$ selections is

557: $n_t(f)=n_0(f)+t/|F|$, where $n_0(f)$ is the initial distribution.

558: Hence, asymptotically each fitness level gets occupied uniformly

559: by a fraction

560: \beqn

561:   {n_t(f)\over |P_t|} \;=\;

562:   {n_0(f)+t/|F|\over |P_0|+t} \;\to\; {1\over |F|}

563:   \quad\mbox{for}\quad t\to\infty,

564: \eeqn

565: where $P_t$ is the population at time $t$. The same limit holds if

566: each selection is accompanied by uniformly deleting one individual

567: from the (now constant sized) population.
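%
As a numerical illustration under assumed values (chosen for this
example only), take $|F|=10$ levels and $|P_0|=100$ with $n_0(f_0)=91$
for one level $f_0$ and $n_0(f)=1$ for each of the nine other levels
$f\neq f_0$. In the pure model the expected fractions are
\bqan
  t=900: & & {n_t(f_0)\over|P_t|} = {181\over 1000} = 0.181, \quad
             {n_t(f)\over|P_t|} = {91\over 1000} = 0.091, \\
  t=9900: & & {n_t(f_0)\over|P_t|} = {1081\over 10000} = 0.1081, \quad
              {n_t(f)\over|P_t|} = {991\over 10000} = 0.0991,
\eqan
both approaching $1/|F|=0.1$. For the constant sized variant, a
heuristic fixed-point sketch (assuming the deletion is applied
uniformly to the enlarged population of size $|P|+1$, that no level
becomes empty, and ignoring fluctuations) gives the expected update
\beqn
  n_{t+1}(f) \;=\; \Big(n_t(f)+{1\over|F|}\Big)\cdot{|P|\over|P|+1},
\eeqn
whose stationary solution $n_{t+1}(f)=n_t(f)$ is $n(f)=|P|/|F|$,
i.e.\ again a fraction $1/|F|$ per level.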

568:

569: %------------------------------%

570: \subsection{Fitness gaps and continuous fitness}

571: %------------------------------%

572: We made two unrealistic assumptions. First, we assumed that each

573: fitness level is initially occupied. If the smallest/largest

574: fitness values in $P_t$ are $f_{min/max}^t$ we extend the

575: definition of FUSS by selecting a fitness value $f$ uniformly in

576: the interval $[f_{min}^t-\odt\eps,f_{max}^t+\odt\eps]$ and an

577: individual $i\in P_t$ with fitness nearest to $f$ (see Figure

578: \ref{figfuss}). This also covers the case when there are missing

579: intermediate fitness values, and also works for continuous valued

580: fitness functions ($\eps\to 0$).

581:

582: \begin{figure}

583: \centerline{\includegraphics[width=1.0\columnwidth,height=0.2\textheight]{FussPop.eps}}

584: \caption{\label{figfuss}If the lowest/highest fitness values

585: in the current population $P$ are $f_{min/max}$, FUSS selects a

586: fitness value $f$ uniformly in the interval $[f_{min},f_{max}]$,

587: then, the individual $i\in P$ with fitness nearest to $f$ is

588: selected and a copy is added to $P$, possibly after mutation and

589: recombination.}

590: \end{figure}

591:

592: %------------------------------%

593: \subsection{Mutation and recombination}

594: %------------------------------%

595: The second assumption was that there is no mutation and

596: recombination. In the presence of a small mutation and/or

597: recombination rate eventually each fitness level will become

598: occupied and the occupation fraction is still asymptotically

599: approximately uniform. For larger rates the distribution will no

600: longer be uniform, but the important point is that the occupation

601: fraction of {\em no} fitness level decreases to zero for

602: $t\to\infty$, unlike for STD.

603: Furthermore, FUSS selects by construction uniformly in the fitness

604: levels, even if the levels are not uniformly occupied.

605:

606: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

607: \section{Properties of FUSS}\label{secProp}

608: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

609:

610: \begin{figure}

611: \centerline{\includegraphics[width=1\columnwidth,height=0.2\textheight]{popinit.eps}}

612: \centerline{\includegraphics[width=1\columnwidth,height=0.2\textheight]{popstd.eps}}

613: \centerline{\includegraphics[width=1\columnwidth,height=0.2\textheight]{popfuss.eps}}

614: \caption{\label{figevolve}Evolution of the population under

615: FUSS versus standard selection schemes (STD): STD may get stuck

616: in a local optimum if all unfit individuals were eliminated too

617: quickly. In FUSS, all fitness levels remain occupied with ``free''

618: drift within and in-between fitness levels, from which new mutants

619: are steadily created, occasionally leading to further

620: evolution in a more promising direction.}

621: \end{figure}

622:

623: %------------------------------%

624: \subsection{FUSS effectively favors fit individuals}

625: %------------------------------%

626: FUSS preserves diversity better than STD, but the latter have a

627: (higher) selection pressure toward higher fitness, which is

628: necessary for optimization. At first glance it seems that there is

629: no such pressure at all in FUSS, but this is deceiving. As FUSS

630: selects uniformly in the fitness levels, individuals of low

631: populated fitness levels are effectively favored. The probability

632: of selecting a specific individual with fitness $f$ is inversely

633: proportional to $n_t(f)$ (see Figure \ref{figsel}). In a typical initial

634: (FUSS) population there are many unfit and only a few fit

635: individuals. Hence, fit individuals are effectively favored until

636: the population becomes fitness uniform. Occasionally, a new higher

637: fitness level is discovered and occupied by a single new

638: individual, which then, again, is favored.

639:
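The inverse proportionality can be made explicit under two simplifying
assumptions that are ours for illustration: all $|F|$ levels are
occupied, and within the selected level an individual is drawn
uniformly. A specific individual $i$ is then selected with probability
\beqn
  p_t(i) \;=\; {1\over |F|}\cdot{1\over n_t(f(i))}.
\eeqn
For a hypothetical population with $|F|=4$ and $n_t=(97,1,1,1)$, the
single individual in each sparsely populated level is selected with
probability ${1\over 4}$, whereas each individual in the crowded level
is selected with probability ${1\over 388}\approx 0.0026$, compared
with ${1\over 100}$ for every individual under uniform selection.
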

640: %------------------------------%

641: \subsection{No takeover in FUSS}

642: %------------------------------%

643: With FUSS, takeover of the highest fitness level never happens.  The

644: concept of takeover time \cite{Goldberg:91} is meaningless for

645: FUSS. The fraction of fittest individuals in a population is always

646: small. This implies that the average population fitness is always much

647: lower than the best fitness. Actually, a large number of fit

648: individuals is usually not the true optimization goal. A single

649: fittest individual usually suffices to solve the optimization task.

650:

651: %------------------------------%

652: \subsection{FUSS may also favor unfit individuals}

653: %------------------------------%

654: Note, if it is also difficult to find individuals of low fitness,

655: i.e.\ if there are only a few individuals of low fitness, FUSS will

656: also favor these individuals. Half of the time is ``wasted'' in

657: searching on the wrong end of the fitness scale. This possible

658: slowdown by a factor of 2 is usually acceptable. In Section

659: \ref{secEx} we will see that in certain circumstances this

660: behavior can actually speed up the search. In general, fitness

661: levels which are difficult to reach are favored.

662:

663: %------------------------------%

664: \subsection{Distribution within a fitness level}

665: %------------------------------%

666: Within a fitness level there is no selection pressure which could

667: further exponentially decrease the population in certain regions

668: of the individual space. This (exponential) reduction is the major

669: enemy of diversity, which is suppressed by FUSS. Within a fitness

670: level, the individuals freely drift around (by mutation).

671: Furthermore, there is a steady stream of individuals into and out

672: of a level by (d)evolution from (higher) lower levels.

673: Consequently, FUSS develops an equilibrium distribution which is

674: nowhere zero. This does not mean that the distribution within a

675: level is uniform. For instance, if there are two (local) maxima of

676: the same height, a very broad one and a very narrow one, the broad one

677: may be populated much more than the narrow one, since it is much

678: easier to ``find''.

679:

680: %------------------------------%

681: \subsection{Steady creation of individuals from every fitness level}

682: %------------------------------%

683: In STD, a wrong step (mutation) at some point in evolution might cause

684: further evolution in the wrong direction. Once a local optimum has

685: been found and all unfit individuals were eliminated it is very

686: difficult to undo the wrong step. In FUSS, all fitness levels remain

687: occupied from which new mutants are steadily created, occasionally

688: leading to further evolution in a more promising direction (see Figure

689: \ref{figevolve}).

690:

691: %------------------------------%

692: \subsection{Transformation properties of FUSS}

693: %------------------------------%

694: FUSS (with continuous fitness) is independent of a scaling and a

695: shift of the fitness function, i.e.\ FUSS($\tilde f$) with $\tilde

696: f(i):=a\cdot f(i)+b$ is identical to FUSS($f$). This is true

697: even for $a<0$, since FUSS searches for maxima {\em and}

698: minima, as we have seen. It is not independent of a non-linear

699: (monotone) transformation unlike tournament, ranking and

700: truncation selection. The non-linear transformation properties are

701: more like the ones of proportionate selection.

702:
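The linear invariance can be sketched in one line for continuous
fitness ($\eps\to 0$); the symbol $g$ for the sampled fitness value is
introduced here only for this sketch. If $g$ is uniform in
$[f_{min},f_{max}]$, then $\tilde g:=a\cdot g+b$ with $a\neq 0$ is
uniform in the interval spanned by $a f_{min}+b$ and $a f_{max}+b$, and
all distances to the sampled value are rescaled by the same factor,
\beqn
  |\tilde f(i)-\tilde g| \;=\; |a|\cdot|f(i)-g|,
\eeqn
so the individual nearest to $\tilde g$ under $\tilde f$ is the
individual nearest to $g$ under $f$. A non-linear monotone
transformation neither preserves uniformity of the sampled value nor
rescales all distances by a common factor, which is why the invariance
fails in that case.
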

703:

704: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

705: \section{Fitness Uniform Deletion Scheme (FUDS)}\label{secFUDS}

706: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

707:

708: For a steady state evolutionary algorithm each cycle of the system

709: consists of both selecting which individual or individuals to

710: crossover and mutate, and then selecting which individual is to be

711: deleted in order to make space for the new child.  The usual deletion

712: scheme used is \emph{random deletion} as this is neutral in the sense

713: that it does not bias the distribution of the population in any way

714: and does not require additional work to be done, such as evaluating

715: the similarity of individuals based on their genes.  Another common

716: strategy is to use an elitist deletion scheme.

717:

718: Here we propose to use the similarity semi-metric $d$ defined in

719: Section \ref{secSim} to achieve a uniform distribution across fitness

720: levels, like with FUSS, except that we achieve this by selectively

721: deleting those members of the population which have very commonly

722: occurring fitness values.  Of course this leaves the selection scheme

723: unspecified, indeed we may use any standard selection scheme such as

724: tournament selection in combination with FUDS.  It also means that we

725: lose one of the nice features of FUSS as we now need to manually tune

726: the selection intensity for our application --- FUSS of course is

727: parameterless.  Nevertheless it allows us to give many FUSS-like

728: properties to an existing EA using a standard selection scheme with

729: only a minor modification to the deletion scheme.

730:

731: The intuition behind why FUDS preserves population diversity is very

732: simple: If an individual has a fitness value which is very rare in the

733: population then this individual almost certainly contains unique

734: information which, if it were to be deleted, would decrease the total

735: population diversity.  Conversely, if we delete an individual with

736: very commonly occurring fitness then we are unlikely to be losing

737: significant diversity.  Presumably most of these individuals are

738: common in some sense and likely exist in parts of the solution space

739: which are easy to reach.  Thus the fitness uniform deletion strategy

740: is now clear: Only delete individuals with very commonly occurring

741: fitness values as these individuals are less likely to contain

742: important genetic diversity.

743:

744: Practically FUDS is implemented as follows.  Let $f_{min}$ and

745: $f_{max}$ be the minimum and maximum fitness values possible for a

746: problem, or at least reasonable upper and lower bounds.  We divide the

747: interval $[f_{min}, f_{max}]$ into a collection of subintervals of

748: equal length $\{ [f_{min}, f_{min} + a ), [f_{min} + a, f_{min} + 2a

749: ), \ldots, [f_{max}-a, f_{max}] \}$ which we call \emph{fitness

750: levels}.  As individuals are added to the population their fitness is

751: computed and they are placed in the set of individuals corresponding

752: to the fitness level they belong to.  Thus the number of individuals

753: in each fitness level describes how common fitness values within this

754: interval are in the current population.  When a deletion is required

755: the algorithm locates the fitness level with the greatest number of

756: individuals and then deletes a random individual from this level.  In

757: the case where multiple fitness levels have maximal size the lowest of

758: these levels is used.

759:
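In symbols, the bookkeeping above might be written as follows; the
level index $k$, the number of levels $m$, and the count $n(k)$ are
notation introduced here for illustration only. With $m$ fitness levels
of width $a=(f_{max}-f_{min})/m$, an individual of fitness $f$ is
assigned to level
\beqn
  k(f) \;=\; \min\{\lfloor (f-f_{min})/a\rfloor,\; m-1\}
  \;\in\;\{0,\ldots,m-1\},
\eeqn
where the minimum only matters for $f=f_{max}$, which belongs to the
closed last interval. A deletion then removes a random member of the
lowest level $k$ for which the current count $n(k)$ is maximal. As a
hypothetical example, $f_{min}=0$, $f_{max}=100$ and $m=20$ give $a=5$,
and an individual of fitness $37$ falls into level $k=7$, i.e.\ the
interval $[35,40)$.
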

760: If the number of fitness levels is chosen too low, say 5 levels, then

761: the resulting model of the distribution of individuals across the

762: fitness range will be too coarse.  Alternatively if a large number of

763: fitness levels is used with a very small population the individuals

764: may become too thinly spread across the fitness levels.  While in

765: these extreme cases this could affect the performance of FUDS, in

766: practice we have found that the system is not very sensitive to the

767: setting of this parameter.  If $n$ is the population size then setting

768: the number of fitness levels to be $\sqrt{n}$ is a good rule of thumb.

769:

770: For discrete valued fitness functions there is a natural lower bound on

771: the interval length $a$ because below a certain value there will be

772: more intervals than unique fitness values.  Of course this cannot

773: happen when the fitness function is continuous.  Other than this small

774: technical detail, the two cases are treated identically.

775:

776: As FUDS spreads the individuals out across a wide range of fitness

777: values, for small populations the EA may become inefficient as only a

778: few individuals will have relatively high fitness.  For problems which

779: are not deceptive this is especially true as there will be little

780: value in having individuals in the population with low to medium

781: fitness.  Of course these are not the kinds of problems for which FUDS

782: was designed.  In practice we have always used populations of between

783: 250 and 5,000 individuals and have not observed a decline in

784: performance relative to random deletion at the lower end of this

785: range.

786:

787: An alternative implementation that avoids discretization is to choose

788: the two individuals that have the most similar fitness and delete one

789: of them. An efficient implementation keeps a list of the individuals

790: ordered by their fitness along with an ordered list of the distances

791: between the individuals.  Then in each cycle one of the two

792: individuals with closest fitness to each other is selected for

793: deletion.  Although the performance of this algorithm was better than

794: random deletion, it was not as good as the implementation of FUDS

795: using bins.  We conjecture that the reason for this is as follows:

796: When there are just a few very fit individuals in the population it is

797: quite likely that they will be highly related to each other and have

798: very similar fitness.  This means that if we delete the individuals

799: with most similar fitness it is likely that many of the very fit

800: individuals will be deleted.  However with the bins approach this will

801: not happen as there are typically few individuals in the high fitness

802: bins.  Thus, although deleting one of the closest individuals in terms

803: of fitness might preserve diversity well, it also changes the pressure

804: on the population distribution over fitness levels.  This small change

805: in distribution dynamics appears to reduce performance in practice.

806:

807:

808: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

809: \section{Properties of FUDS}\label{secPropFUDS}

810: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

811:

812: As FUDS uniformly distributes the population across fitness levels,

813: like FUSS does, many of the key properties of FUSS also carry over to

814: an EA that is using a standard selection scheme (STD) combined with

815: FUDS deletion.

816:

817: %------------------------------%

818: \subsection{No takeover in FUDS}

819: %------------------------------%

820: Under FUDS the takeover of the highest fitness level, or indeed any

821: fitness level, is impossible.  This is easy to see because as soon as

822: any fitness level starts to dominate, all of the deletions become

823: focused on this level until it is no longer the most populated fitness

824: level.  As a by-product, this also means that individuals on

825: relatively unpopulated fitness levels are preserved.

826:

827: %------------------------------%

828: \subsection{Steady creation of individuals from every fitness level}

829: %------------------------------%

830: Another similarity with FUSS is the steady creation of individuals on

831: many different fitness levels.  This occurs because under FUDS some

832: individuals on each fitness level are always kept.  This makes it

833: relatively easy for the EA to find its way out of local optima as it

834: keeps on exploring evolutionary paths which do not at first appear to

835: be promising.

836:

837: %------------------------------%

838: \subsection{Robust performance with respect to selection intensity}

839: %------------------------------%

840: Because FUDS is only a deletion scheme, we still need to choose a

841: selection scheme for the EA.  Of course this selection scheme may then

842: require us to set a selection intensity parameter.  While this is not

843: as desirable as FUSS, which has no such parameter, at least with FUDS

844: we expect the performance of the system to be less sensitive to the

845: correct setting of this parameter.  For example, if the selection

846: intensity is set too high the normal problem is that the population

847: rushes into a local optimum too soon and becomes stuck before it has

848: had a chance to properly explore the genotype space for other

849: promising regions.  However, as we noted above, with FUDS a total

850: collapse in population diversity is impossible.  Thus much higher

851: levels of selection intensity may be used without the risk of

852: premature convergence.

853:

854: In some situations if very low selection intensity is used along with

855: random deletion, the population tends not to explore the higher areas

856: of the fitness landscape at all.  This can be illustrated by a simple

857: example.  Consider a population which contains 1,000 individuals.

858: Under random deletion all of these individuals, including the highly

859: fit ones, will have a 1 in 1,000 chance of being deleted in each cycle

860: and so the expected life time of an individual is 1,000 deletion

861: cycles.  Thus if a highly fit individual is to contribute a child of

862: the same fitness or higher, it must do so reasonably quickly.  However

863: for some optimization problems the probability of a fit individual

864: having such a child when it is selected is very low, so low in fact

865: that it is more likely to be deleted before this happens.  As a result

866: the population becomes stuck, unable to find individuals of greater

867: fitness before the fittest individuals are killed off.

868:
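A rough calculation illustrates the effect. Assume, as a
simplification, that in every cycle the fit individual independently
produces a suitable child with probability $p$ and, if it does not, is
deleted with probability $q={1\over n}$, where $n$ is the population
size. The probability that a suitable child is produced before the
individual is deleted is then
\beqn
  {p\over 1-(1-p)(1-q)} \;=\; {p\over p+q-pq}
  \;\approx\; n\cdot p \quad\mbox{for}\quad p\ll q.
\eeqn
For $n=1000$ and a hypothetical $p=10^{-4}$ this gives about $0.09$, so
in roughly ten out of eleven cases the fit individual is lost before it
has contributed a child of equal or higher fitness.
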

869: The usual solution to this problem is to increase the selection

870: intensity because then the fit individuals are selected more often and

871: thus are more likely to contribute a child of similar or greater

872: fitness before they are deleted.  Another is to change the deletion

873: scheme so that these individuals live longer.  This is what happens

874: with FUDS as rare fit individuals are not deleted.  Effectively it

875: means that with FUDS we can often use much lower selection intensity

876: without the population becoming stuck.

877:

878: %------------------------------%

879: \subsection{Transformation properties of FUDS}

880: %------------------------------%

881: While with FUDS we have the added complication of having to choose the

882: number of subintervals with which to break up the fitness values, this

883: number is only a function of the population size and distributional

884: characteristics of the problem.  Thus any linear transformation of the

885: fitness function has no effect on FUDS.  However, non-linear

886: transformations will affect performance.

887:

888: %------------------------------%

889: \subsection{Problem and representation independence}

890: %------------------------------%

891: Because FUDS only requires the fitness of individuals, the method is

892: completely independent of the problem and genotype representation,

893: i.e.\ how the individuals are coded.

894:

895: %------------------------------%

896: \subsection{Simple implementation and low computational cost}

897: %------------------------------%

898: As the algorithm is simple and the fitness function is given as part

899: of the problem specification, FUDS is very easy to implement and

900: requires few computational resources.

901:

902:

903: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

904: \section{A Simple Example}\label{secEx}

905: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

906:

907: In the following, we use a simple example problem to compare the

908: performance of fitness uniform selection (FUSS), random search (RAND)

909: and standard selection (STD), each used both with and without

910: recombination.  We also examine the performance of standard selection

911: when used with the fitness uniform deletion scheme (FUDS).  We regard

912: this problem as a prototype for deceptive multi-modal functions.  The

913: example demonstrates how FUSS and FUDS can be superior to RAND and STD

914: in some situations.  More generic situations will be considered in

915: Section \ref{secTree}.  An experimental analysis of this problem

916: appears in Section~\ref{secEx2}.

917:

918: %------------------------------%

919: \subsection{Simple 2D example}

920: %------------------------------%

921: Consider individuals $(x,y)\in I:=[0,1]\times[0,1]$,

922: which are tuples of real numbers, each coordinate in the interval $[0,1]$.

923: The example models individuals possessing up to 2 ``features''.

924: Individual $i$ possesses feature $I_1$ if

925: $i\in I_1:=[a,a+\Delta]\times[0,1]$, and feature $I_2$

926: if $i\in I_2:=[0,1]\times[b,b+\Delta]$.

927: The fitness function $f:I\to\{1,2,3,4\}$ is defined as

928: \beqn

929:   f(x,y) = \left\{

930:   \begin{array}{l}

931:     1 \quad\mbox{if}\quad (x,y)\in I_1\backslash I_2, \\

932:     2 \quad\mbox{if}\quad (x,y)\in I_2\backslash I_1, \\

933:     3 \quad\mbox{if}\quad (x,y)\not\in I_1\cup I_2, \\

934:     4 \quad\mbox{if}\quad (x,y)\in I_1\cap I_2. \\

935:   \end{array}\right.

936: \parbox{2cm}{\hfill \unitlength=0.6mm

937: %\linethickness{0.4pt}

938: \begin{picture}(45,45)

939: \scriptsize

940: \put(5,5){\vector(0,1){40}}

941: \put(5,5){\vector(1,0){40}}

942: \put(20,5){\line(0,1){35}}

943: \put(25,5){\line(0,1){35}}

944: \put(40,5){\line(0,1){35}}

945: \put(5,15){\line(1,0){35}}

946: \put(5,20){\line(1,0){35}}

947: \put(5,40){\line(1,0){35}}

948: \put(22.5,17.5){\makebox(0,0)[cc]{4}}

949: \put(12.5,30){\makebox(0,0)[cc]{3}}

950: \put(22.5,30){\makebox(0,0)[cc]{1}}

951: \put(32.5,30){\makebox(0,0)[cc]{3}}

952: \put(32.5,10){\makebox(0,0)[cc]{3}}

953: \put(12.5,10){\makebox(0,0)[cc]{3}}

954: \put(12.5,17.5){\makebox(0,0)[cc]{2}}

955: \put(32.5,17.5){\makebox(0,0)[cc]{2}}

956: \put(22.5,10){\makebox(0,0)[cc]{1}}

957: \put(44,2.5){\makebox(0,0)[cc]{$x$}}

958: \put(22.5,2.5){\makebox(0,0)[cc]{$\Delta$}}

959: \put(20,3.5){\makebox(0,0)[cc]{$a$}}

960: \put(2.5,17.5){\makebox(0,0)[cc]{$\Delta$}}

961: \put(4,14.5){\makebox(0,0)[cc]{$b$}}

962: \put(40,3){\makebox(0,0)[cc]{1}}

963: \put(3.5,40){\makebox(0,0)[cc]{1}}

964: \put(2.5,44){\makebox(0,0)[cc]{$y$}}

965: \put(22.5,42.5){\makebox(0,0)[cc]{$f(x,y)$}}

966: \end{picture}

967: }

968: \eeqn

969: We assume $\Delta\ll 1$. Individuals with neither of the two features

970: ($i\in I\backslash(I_1\cup I_2)$) have fitness $f=3$. These ``local

971: $f=3$ optima'' occupy most of the individual space $I$, namely a

972: fraction $(1-\Delta)^2$. It is disadvantageous for an individual to

973: possess only one of the two features ($i\in(I_1\backslash I_2)\cup

974: (I_2\backslash I_1)$), since $f=1$ or 2 in this case. In combination

975: ($i\in I_1\cap I_2$), the two features lead to the highest fitness,

976: but the global maximum $f=4$ occupies the smallest fraction $\Delta^2$

977: of the individual space $I$. With a fraction $\Delta(1-\Delta)$, the

978: $f=1/f=2$ minima are in between. The example has sort of an XOR

979: structure, which is hard for many optimizers.

980:

981: %------------------------------%

982: \subsection{Random search}

983: %------------------------------%

984: Individuals are created uniformly in the unit square. The ``local

985: optimum'' $f=3$ is easy to ``find'', since it occupies nearly the

986: whole space. The global optimum $f=4$ is difficult to find, since it

987: occupies only $\Delta^2\ll 1$ of the space. The expected time, i.e.\

988: the expected number of individuals created and tested until one with

989: $f=4$ is found, is $T_{RAND}={1\over\Delta^2}$. Here and in the

990: following, the ``time'' $T$ is defined as the expected number of

991: created individuals until the {\it first} optimal individual (with

992: $f=4$) is found. $T$ is neither a takeover time nor the number of

993: generations (we consider steady-state EAs).

994:

995: %------------------------------%

996: \subsection{Random search with crossover}

997: %------------------------------%

998: Let us occasionally perform a recombination of individuals in the

999: current population. We combine the $x$-coordinate of one uniformly

1000: selected individual $i_1$ with the $y$ coordinate of another

1001: individual $i_2$. This crossover operation maintains a uniform

1002: distribution of individuals in $[0,1]^2$. It leads to the global

1003: optimum if $i_1\in I_1$ and $i_2\in I_2$. The probability of

1004: selecting an individual in $I_i$ is

1005: $\Delta(1-\Delta)\approx\Delta$ (we assumed that the global

1006: optimum has not yet been found). Hence, the probability that $I_1$

1007: crosses with $I_2$ is $\Delta^2$. The time to find the global

1008: optimum by random search including crossover is still

1009: $\sim{1\over\Delta^2}$ ($\sim$ denotes asymptotic proportionality).

1010:

1011: %------------------------------%

1012: \subsection{Mutation}

1013: %------------------------------%

1014: The result remains valid (to leading order in ${1\over\Delta}$)

1015: if, instead of a random search, we uniformly select an individual

1016: and mutate it according to some probabilistic, sufficiently mixing

1017: rule, which preserves uniformity in $[0,1]$. One popular such

1018: mutation operator is to use a sufficiently long binary

1019: representation of each coordinate, like in genetic algorithms, and

1020: flip a single bit. For simplicity we assume in the following a

1021: mutation operator which replaces with probability $\odt/\odt$ the

1022: first/second coordinate by a new uniform random number. Other

1023: mutation operators which mutate with probability $\odt/\odt$ the

1024: first/second coordinate, preserve uniformity, are sufficiently

1025: mixing, and leave the other coordinate unchanged (like the

1026: single-bit-flip operator) lead to the same scaling of $T$ with

1027: $\Delta$ (but with different proportionality constants).

1028:

1029: %------------------------------%

1030: \subsection{Standard selection with crossover}

1031: %------------------------------%

1032: The $f=1$ and $f=2$ individuals contain useful building

1033: blocks, which could speed up the search by a suitable selection and

1034: crossover scheme. Unfortunately, the standard selection schemes

1035: favor individuals of higher fitness and will diminish the

1036: $f=1/f=2$ population fraction. The probability of

1037: selecting $f=1/f=2$ individuals is even smaller than in

1038: random search. Hence $T_{STD}\sim{1\over\Delta^2}$. Standard

1039: selection does not improve performance, not even in combination

1040: with crossover, although crossover is well suited to produce the

1041: needed recombination.

1042:

1043: %------------------------------%

1044: \subsection{FUSS}

1045: %------------------------------%

1046: At the beginning, only the $f=3$ level is occupied and

1047: individuals are uniformly selected and mutated. The expected time

1048: until an $f=1$ or $f=2$ individual in $I_1\cup I_2$ is created is

1049: $T_1\approx{1\over \Delta}$ (not ${1\over 2\Delta}$, since only

1050: one coordinate is mutated). From this time on FUSS will select one

1051: half(!) of the time the $f=1/f=2$ individual(s) and only the

1052: remaining half the abundant $f=3$ individuals. When level

1053: $f=1$ {\em and} level $f=2$ are occupied, the selection

1054: probability is ${1\over 3}+{1\over 3}$ for these levels.

1055: With probability $\odt$ the

1056: mutation operator will mutate the $y$ coordinate of an individual

1057: in $I_1$ or the $x$ coordinate of an individual in $I_2$ and

1058: produces a new $f=1/2/4$ individual. The relative probability

1059: of creating an $f=4$ individual is $\Delta$. The expected time

1060: to find this global optimum from the $f=1/f=2$ individuals, hence,

1061: is $T_2=[({1\over 2}\ldots{2\over 3})\times{1\over

1062: 2}\times\Delta]^{-1}$. The total expected time is

1063: $T_{FUSS}\approx T_1+T_2= {4\over\Delta}\ldots{5\over\Delta}\ll

1064: {1\over\Delta^2}\sim T_{STD}$. FUSS is much faster by exploiting

1065: unfit $f=1/f=2$ individuals. This is an example where (local)

1066: minima can help the search. Examples where a low local maximum

1067: can help in finding the global maximum, but where standard

1068: selection sweeps over too quickly to higher but useless local

1069: maxima, can also be constructed.

1070:
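Spelling out the arithmetic behind these bounds: the fraction of
selections falling on the $f=1/f=2$ levels is ${1\over 2}$ while only
one of the two levels is occupied and ${2\over 3}$ once both are, so
\beqn
  T_2 = \Big[{1\over 2}\times{1\over 2}\times\Delta\Big]^{-1}
      = {4\over\Delta}
  \quad\mbox{or}\quad
  T_2 = \Big[{2\over 3}\times{1\over 2}\times\Delta\Big]^{-1}
      = {3\over\Delta},
\eeqn
and adding $T_1\approx{1\over\Delta}$ gives a total between
${4\over\Delta}$ and ${5\over\Delta}$. For $\Delta=0.01$ (an arbitrary
illustrative value) this amounts to $400$ to $500$ created individuals,
compared with ${1\over\Delta^2}=10^4$ for random search.
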

1071: %------------------------------%

1072: \subsection{FUSS with crossover}

1073: %------------------------------%

1074: The expected time until an $f=1$ individual in $I_1$ and an

1075: $f=2$ individual in $I_2$ is found is $T_1\sim{1\over

1076: \Delta}$, even with crossover. The probability of selecting an

1077: $f=1/f=2$ individual is ${1\over 3}/{1\over 3}$. Thus, the

1078: probability that a crossing operation crosses $I_1$ with $I_2$ is

1079: $({1\over 3})^2$. The expected time to find the global optimum

1080: from the $f=1/f=2$ individuals, hence, is $T_2=9\cdot O(1)$,

1081: where the $O(1)$ factor depends on the frequency of crossover

1082: operations. This is far faster than by STD, even if the

1083: $f=1/f=2$ levels were local maxima, since to get a high standard

1084: selection probability, the level has first to be taken over, which

1085: itself needs some time depending on the population size. In FUSS a

1086: single $f=1$ and a single $f=2$ individual suffice to

1087: guarantee a high selection probability and an effective crossover.

1088: Crossover does not significantly decrease the {\em total} time

1089: $T_{FUSSX}\approx T_1+T_2\sim {1\over \Delta}+O(9)$, but for a

1090: suitable 3D generalization we get a large speedup by a factor of

1091: ${1\over\Delta}$.

1092:

1093: %------------------------------%

1094: \subsection{FUDS with crossover}

1095: %------------------------------%

1096: Assume that initially all of the individuals have $f=3$ and that we

1097: are using random selection.  For any mutation the probability of the

1098: child being in $I_1 \cup I_2$ is $\Delta$.  Until $I_1 \cup I_2$

1099: becomes quite full FUDS will never delete individuals from these

1100: areas.  Furthermore if an individual in $I_1 \cup I_2$ is mutated then

1101: the mutant will also be in $I_1 \cup I_2$ with probability

1102: $\frac{1}{2}( 1 + \Delta) \gg \Delta$.  Therefore while most of the

1103: population has $f=3$ we can lower bound the probability of a new child

1104: being in $I_1 \cup I_2$ by $\Delta$.  It then follows that if $P$ is

1105: the size of the population we can upper bound the expected time for

1106: $I_1 \cup I_2$ to contain half the total population by $\frac{P}{2}

1107: \frac{1}{\Delta}

1108: \propto \frac{1}{\Delta}$.  Once this occurs (and most likely well

1109: before this point) crossover will produce an individual with $f=4$

1110: almost immediately by crossing a member of $I_1$ with a member of

1111: $I_2$.  Thus $T_{FUDS} \propto \frac{1}{\Delta} \ll \frac{1}{\Delta^2}

1112: \sim T_{STD}$.  This gives FUDS when used with random selection scaling

1113: characteristics which are similar to FUSS.  If we use a selection

1114: scheme with higher intensity our bound on the expected time for half

1115: the population to lie in $I_1\cup I_2$ remains unchanged as the bound holds in

1116: the worst case situation where only individuals with $f=3$ are

1117: selected.  However higher selection intensity makes the final

1118: crossover required to find an individual with $f=4$ less likely.  For

1119: moderate levels of selection intensity this is clearly not a

1120: significant factor and more importantly it is $O(1)$ and independent of

1121: $\Delta$.  Thus the order of scaling for $T_{FUDS}$ is just

1122: $\frac{1}{\Delta}$ for this difficult problem, which is the same as

1123: $T_{FUSSX}$.

1124:

1125: %------------------------------%

1126: \subsection{Simple 3D example}

1127: %------------------------------%

1128: We generalize the 2D example to $D$-dimensional individuals

1129: $\vec x\in[0,1]^D$ and a

1130: fitness function

1131: \beqn

1132:   f(\vec x) \;:=\; (D+1)\!\cdot\!\prod_{d=1}^D\chi_d(\vec x)\;

1133:   - \max_{1\leq d\leq D} d\!\cdot\!\chi_d(\vec x)\; +D+1,

1134: \eeqn

1135: where $\chi_d(\vec x)$ is the characteristic function of feature

1136: $I_d$

1137: \beqn

1138:   \chi_d(\vec x) \;:=\; \left\{

1139:   \begin{array}{l}

1140:     1 \quad\mbox{if}\quad a_i\leq x_i\leq a_i+\Delta\;\;\forall i\neq d, \\

1141:     0 \quad\mbox{else.} \\

1142:   \end{array}\right.

1143: \eeqn

1144: For $D=2$, $f$ coincides with the 2D example. For $D=3$,

1145: the fractions of $[0,1]^3$ where $f=1/2/3/4/5$ are approximately

1146: $\Delta^2/\Delta^2/\Delta^2/1/\Delta^3$.

1147: With the same line of reasoning we get the following expected

1148: search times for the global optimum:

1149: \beqn

1150:   T_{RAND}\sim T_{STD}\sim {1\over\Delta^3},

1151: \eeqn \beqn

1152:   T_{FUSS}\sim {1\over\Delta^2},\quad

1153:   T_{FUSSX}\sim T_{FUDS} \sim {1\over\Delta}.

1154: \eeqn

1155: This demonstrates the existence of problems where FUSS is much faster

1156: than RAND and STD, and where crossover can give a further boost to

1157: FUSS, even when it is ineffective in combination with STD.
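
To make the fitness values explicit, the definition of $f$ can be
evaluated case by case. Writing $d^*$ (a symbol introduced here only for
illustration) for the largest index $d$ with $\chi_d(\vec x)=1$, one
obtains
\beqn
  f(\vec x) \;=\; \left\{
  \begin{array}{ll}
    D+1 & \mbox{if no feature is present,} \\
    D+1-d^* & \mbox{if some but not all features are present,} \\
    D+2 & \mbox{if all $D$ features are present.} \\
  \end{array}\right.
\eeqn
For $D=2$ this gives the values $3$, $\{2,1\}$ and $4$, and for $D=3$
the values $4$, $\{3,2,1\}$ and $5$, so the global optimum $f=D+2$
requires all features simultaneously.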

1158:

1159:

1160: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1161: \section{Fitness-Tree Analysis}\label{secTree}

1162: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1163:

1164: \begin{figure}

1165: \centerline{\includegraphics[width=\columnwidth,height=0.25\textheight]{tree.eps}}

1166: \caption{\label{figtree}Generic 2D fitness landscape with

1167: evolution tree. Each connected slice represents a species. A species

1168: is also symbolized by a node in the slice. The number in a slice and

1169: near a node is the fitness value of the species. If individuals from

1170: one species can evolve to individuals of another species, the nodes are

1171: connected by a solid line. Altogether, they form the fitness tree. The

1172: branching factor $b$ is $2$ and the number of species per fitness

1173: level $s$ is $4$ for intermediate fitness values (3,4,5).}

1174: \end{figure}

1175:

1176: %------------------------------%

1177: \subsection{The fitness tree model}

1178: %------------------------------%

1179: A general, problem independent comparison of the various

1180: optimization algorithms is difficult. We are interested in the

1181: performance for difficult fitness landscapes with many local

1182: optima.

1183:

1184: We only consider mutation; recombination is discussed in the next

1185: section. The evolutionary neighborhood (not to be confused with

1186: $d$-similarity) of an individual $i$ is defined as the set of

1187: individuals that can be created from $i$ by a single

1188: mutation\footnote{We have ``small'' mutations in mind, e.g.\ single

1189: bit flips, not macro mutations, which connect {\em all}

1190: individuals.}. Two individuals $i$ and $j$ with the same fitness are

1191: defined to belong to the same {\em species} if there is a finite

1192: sequence of mutations which transforms $i$ into $j$ {\em and} all

1193: individuals of the sequence also have fitness $f(i)=f(j)$. Each

1194: fitness level is partitioned in this way into disjoint species. We say

1195: a species of fitness $f+\eps$ can {\em evolve} from a species of

1196: fitness $f$, if there is a mutation which transforms an individual

1197: from the latter species to one of the former. Those species are

1198: connected by an edge in Figures \ref{figtree} and

1199: \ref{figtree1d}. A species is said to be {\em promising} if it

1200: {\em can} evolve to the global optimum $f_{max}$.

1201:

1202:

1203: %------------------------------%

1204: \subsection{Additional definitions and simplifying assumptions}

1205: %------------------------------%

1206: \begin{itemize}\parskip=0ex\parsep=0ex\itemsep=0ex

1207: \item[i)] Evolution which skips fitness levels is ignored, and also

1208: devolution to species of lower fitness other than the

1209: primordial species.

1210: \item[ii)] Random individuals have

1211: lowest fitness $f_{min}$ with high probability, and there is

1212: only one species of fitness $f_{min}$.

1213: \item[iii)] There is a fixed branching factor $b$, i.e.\ each species

1214: can evolve into $b$ improved species, or represents a local optimum

1215: from which no further evolution is possible.

1216: \item[iv)] There is a single global optimum

1217: $f_{max}$ (or $b$ optima to be consistent with the previous item).

1218: \item[v)] There are $s$ different species per fitness level (except

1219: near $f_{min}$ and $f_{max}$ where there must be fewer to be consistent

1220: with the previous items).

1221: \item[vi)] The probability $p$ that an individual evolves to a higher

1222: fitness is very small. In most cases a mutation keeps an

1223: individual within its species or devolves it.

1224: \item[vii)] The probability to evolve to one of the offspring species is

1225: uniform, i.e.\ $1/b$ for all offspring species.

1226: \end{itemize}

1227:

1228: We have the feeling that this picture covers the essential

1229: features of fitness landscapes for difficult problems. The

1230: qualitative conclusions we will draw should still hold when some

1231: or all of the additional simplifying assumptions are violated.

1232:

1233: \begin{figure}

1234: \centerline{\includegraphics[width=\columnwidth,height=0.25\textheight]{tree1d.eps}}

1235: \caption{\label{figtree1d}

1236: Generic fitness function with evolution tree. Individuals which

1237: are evolutionary neighbors are connected by a dashed line. They

1238: belong to the species indicated by a node on the dashed

1239: line. A species which can evolve from another is connected to

1240: it by a solid line. The smooth curve

1241: visualizes (somewhat misleadingly, since the fitness is discrete)

1242: the fitness function with many local maxima. \vspace{1.1em}}

1243: \end{figure}

1244:

1245: %------------------------------%

1246: \subsection{Example}

1247: %------------------------------%

1248: Consider the case of individuals, which are real-valued $D$

1249: dimensional vectors, i.e.\ $I=\SetR^D$. Let the fitness function

1250: $\tilde f$ be continuous and positive with many local maxima,

1251: which tends to zero for large arguments. This covers a large range

1252: of physical optimization problems. Mutation shall be local in

1253: $\SetR^D$, i.e.\ $||i_{original}-i_{mutated}||\ll D$. As FUSS and

1254: the fitness tree model are only defined for discrete fitness

1255: functions, we discretize $\tilde f$ to

1256: $f:=\,_\lfloor{1\over\tilde\eps}\tilde f_\rfloor$, which is

1257: acceptable for sufficiently small $\tilde\eps$. Typical fitness

1258: landscapes for $D=2$ and $D=1$, together with their fitness trees, are

1259: depicted in Figures \ref{figtree} and \ref{figtree1d}. Since

1260: mutation is a local operation, each species is a (possibly

1261: multiply punched) connected slice ($D$-dimensional sub-volume) and

1262: evolution can only occur from $f$ to $f+1$ ($\eps=1$). Assumption

1263: (i) is generally satisfied. The special fitness landscapes

1264: depicted in Figures \ref{figtree} and \ref{figtree1d} also satisfy

1265: (ii,iii,iv,v) with $b=2$ and $s=4$.

1266:

1267: %------------------------------%

1268: \subsection{Random walk}

1269: %------------------------------%

1270: Consider a mutation induced random walk of a single individual.  Due

1271: to the low evolution probability $p\ll 1$, most of the time will be

1272: spent on individuals of the lowest fitness $f_{min}$. As evolution is

1273: a tree, there is only one evolution sequence which leads to the global

1274: optimum. At each evolution step, the correct offspring species (out of

1275: $b$) has to be evolved. The probability of an evolution step in the

1276: right direction, hence, is $p/b$. $|F|$ evolution steps are necessary

1277: to reach $f_{max}$. Therefore, the expected time to find the global

1278: maximum by random walk is $T_{RW}\approx(b/p)^{|F|}$. Random walk is

1279: very slow; it is exponential in the number of fitness levels $|F|$ to

1280: a very large basis $b/p$.

1281:

1282: %------------------------------%

1283: \subsection{FUSS}

1284: %------------------------------%

1285: Assume that $L$ fitness levels from $f_{min}$ to $f$ are occupied.

1286: The probability that FUSS selects an individual of fitness $f$ is

1287: $1/L$. Under the additional assumption that the occupation of species

1288: within one fitness level is approximately uniform most of the time,

1289: the probability of selecting an individual of the promising species,

1290: which can evolve to the global optimum, is $1/s$. The probability of

1291: an evolution step in the right direction is $p/b$ as in the random

1292: walk case. Hence, the total expected time for an evolution in the

1293: right direction is $L\cdot s\cdot b/p$. The total time

1294: $T_{FUSS}\approx\odt|F|^2\cdot s\cdot b/p$ for an evolution from $L=1$

1295: to the global optimum $L=|F|$ is obtained by summation over

1296: $L=1...|F|$.
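
The summation can be written out explicitly. Using the expected time
$L\cdot s\cdot b/p$ per evolution step from above, and treating the
steps as sequential (a simplification already implicit in the argument),
\beqn
  T_{FUSS} \;\approx\; \sum_{L=1}^{|F|} L\cdot s\cdot{b\over p}
  \;=\; {|F|(|F|+1)\over 2}\cdot s\cdot{b\over p}
  \;\approx\; \odt|F|^2\cdot s\cdot{b\over p}
\eeqn
for $|F|\gg 1$. The quadratic dependence on $|F|$ arises because the
chance $1/L$ of selecting the top level shrinks as more levels become
occupied.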

1297:

1298: %------------------------------%

1299: \subsection{FUDS}

1300: %------------------------------%

1301: A similar analysis can be applied to FUDS.  Assume again that the $L$

1302: fitness levels from $f_{min}$ to $f$ are occupied and that the

1303: occupation of species within each fitness level is approximately

1304: uniform most of the time.  Because FUDS tends to spread the population

1305: out, like FUSS, this assumption is not unreasonable.  As FUDS is only

1306: a deletion scheme we must also specify a selection scheme.  For our

1307: analysis we will take a very simple elitist selection scheme that half

1308: of the time selects an individual from the highest fitness level, and

1309: the other half of the time selects an individual from a lower level.

1310: It follows then that the probability of selecting a promising species

1311: is $1/(2s)$ and the probability that this then results in an

1312: evolutionary step in the right direction is $p/b$.  Thus the total

1313: expected time for an evolutionary step in the right direction is $2

1314: \cdot s \cdot b/p$.  Therefore by summation the total expected time to

1315: evolve to the global optimum is $T_{FUDS} \approx 2|F| \cdot s \cdot

1316: b/p$.  Of course this analysis rests on our choice of selection scheme

1317: and the assumptions about the uniformity of the population that we

1318: have made.  When FUDS is used with selection schemes which are very

1319: greedy these uniformity assumptions will likely be violated and less

1320: favorable bounds could result.
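
Written out, the summation for FUDS with this particular selection
scheme reads
\beqn
  T_{FUDS} \;\approx\; \sum_{L=1}^{|F|} 2\cdot s\cdot{b\over p}
  \;=\; 2|F|\cdot s\cdot{b\over p}.
\eeqn
In contrast to FUSS, the per-step time does not grow with the number $L$
of occupied levels, since the top level is selected with fixed
probability $\odt$; this is why the total is linear rather than
quadratic in $|F|$.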

1321:

1322: %------------------------------%

1323: \subsection{Standard selection}

1324: %------------------------------%

1325: We assumed a fixed number of $s$ species per fitness level and $0$ or

1326: $b$ offspring species. This implies that only a fraction of $1/b$

1327: species can evolve to higher fitness. We assume that fitness level

1328: $f$ has been taken over, i.e.\ most individuals have fitness $f$. The

1329: probability of evolution is $p$. A significant fraction (for

1330: simplicity we assume most) of the $|P|$ individuals must evolve to the

1331: next fitness level before evolution with a relevant rate can occur to

1332: the next to next level. Hence, the time to take over the next fitness

1333: level is roughly $|P|\cdot b/p$.  As there are $|F|$ fitness levels,

1334: the total time is $T_{STD}

1335: \approxgeq |F|\cdot|P|\cdot b/p$.

1336:

1337: %\subsection{The problem of the population size}

1338: We wrote $\approxgeq$ as we have made two significant favorable

1339: assumptions. In order to ensure convergence, the promising species in

1340: the current fitness level has to be occupied. If we assume a uniform

1341: occupation of species within one fitness level, as for FUSS, this

1342: means that all species of the current fitness level have to be

1343: populated. As there are $s$ species, $|P|$ has to be at least $s$,

1344: which can be quite large. On the other hand, STD linearly slows down

1345: with $|P|$, unlike FUSS. Hence, there is a trade-off in the choice of

1346: $|P|$.

1347:

1348: %\subsection{The problem of non-promising takeover}

1349: More serious is the following problem. Assume that the first

1350: individual evolved with fitness $f+\eps$ is one in a non-promising

1351: species $a$. Due to selection pressure it might happen that

1352: species $a$ takes over the whole population before all (or at

1353: least the promising) species with fitness $f+\eps$ can evolve from

1354: the ones of fitness $f$. The probability to find the global

1355: optimum in the worst case scenario, where at each level only one

1356: species is occupied, is $(1/b)^{|F|}$. This is the original

1357: problem of the loss of genetic diversity discussed at the outset,

1358: which led to the invention of FUSS.

1359:

1360: %\subsection{Conventional fix(es)}

1361: Every other fix the authors are aware of only seems to diminish the

1362: problem, but does not solve it. One fix is to repeatedly restart

1363: the EA, but a huge number $b^{|F|}$ of restarts might be

1364: necessary. The time is exponential in $|F|$ like for random walk

1365: but with a smaller basis $b$. The true time is expected to be

1366: somewhere in between $|F|\cdot|P|\cdot b/p$ and this worst

1367: case analysis, although an unfavorable setting may never reach the

1368: global optimum ($T_{STD}=\infty$ in this case).

1369:

1370:

1371: %------------------------------%

1372: \subsection{Performance comparison}

1373: %------------------------------%

1374: The times $T_{FUSS}$, $T_{FUDS}$ and $T_{STD}$ should be regarded, at

1375: best, as rules of thumb, since the derivation was rather heuristic due

1376: to the list of assumptions. The quotient is more reliable:

1377: \beqn

1378:   {T_{FUSS}\over T_{STD}} \quad\approxleq\quad

1379:   {|F|\!\cdot\!s\over 2|P|} \quad\approxleq\quad

1380:   \odt|F| \quad\leq\quad |F|,

1381: \eeqn

1382: and

1383: \beqn

1384:   {T_{FUDS}\over T_{STD}} \quad\approx\quad

1385:   {\frac{s }{|P|}} \quad\approx\quad 1.

1386: \eeqn
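
These quotients follow by dividing the estimates of the previous
subsections, in which the common factor $b/p$ cancels. The following
lines are only a sketch of that arithmetic, using $|P|\geq s$ from the
discussion of standard selection:
\beqn
  {T_{FUSS}\over T_{STD}} \;\approxleq\;
  {\odt|F|^2\cdot s\cdot b/p \over |F|\cdot|P|\cdot b/p}
  \;=\; {|F|\cdot s\over 2|P|} \;\leq\; \odt|F|,
\eeqn
\beqn
  {T_{FUDS}\over T_{STD}} \;\approxleq\;
  {2|F|\cdot s\cdot b/p \over |F|\cdot|P|\cdot b/p}
  \;=\; {2s\over|P|}.
\eeqn
The second quotient is of order one whenever $|P|$ is a small multiple
of $s$; the constant factor $2$ stems from the particular elitist
selection scheme assumed in the FUDS analysis and is dropped in the
relation above.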

1387:

1388: We will give a more direct argument in Section \ref{secCross} that

1389: the slowdown of FUSS relative to STD is at most $|F|$.

1390:

1391: Finally, a truism has been recovered, namely that an EA can, under

1392: certain circumstances, be much faster than random walk, that is,

1393: $T_{RW}\gg T_{FUSS}, T_{FUDS}, T_{STD}$.
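
For a feeling of the magnitudes involved, consider purely hypothetical
values (chosen here for illustration only, not taken from any
experiment): $b=2$, $p=10^{-2}$, $|F|=10$ and $|P|=s=4$. The estimates
of this section then give
\beqn
  T_{RW}\approx 200^{10}\approx 10^{23},\quad
  T_{FUSS}\approx \odt\cdot 10^2\cdot 4\cdot 200 = 4\cdot 10^4,
\eeqn
\beqn
  T_{FUDS}\approx 2\cdot 10\cdot 4\cdot 200 = 1.6\cdot 10^4,\quad
  T_{STD}\approxgeq 10\cdot 4\cdot 200 = 8\cdot 10^3,
\eeqn
so the three EA estimates lie within a factor of $|F|$ of one another,
while random walk is slower by many orders of magnitude.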

1394:

1395:

1396: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1397: \section{Scale-Independent Selection and Recombination}\label{secCross}

1398: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1399:

1400: %------------------------------%

1401: \subsection{Worst case analysis}

1402: %------------------------------%

1403: We now want to estimate the maximal possible slowdown of FUSS

1404: compared to STD.

1405: %\subsection{Best case for STD}

1406: Let us assume that all individuals in STD have fitness $f$, and

1407: once one individual with fitness $f+\eps$ has been found,

1408: takeover of level $f+\eps$ is quick. Let us assume that this

1409: quick takeover is actually good (e.g.\ if there are no local maxima).

1410: The selection probability of individuals of the same fitness is equal.

1411: %\subsection{Worst case for FUSS}

1412: For FUSS we assume individuals in the range of $f_{min}$ and $f$.

1413: Uniformity is {\em not} necessary. In the worst case, a selection of

1414: an individual of fitness $<f$ never leads to an individual of

1415: fitness $\geq f$, i.e.\ is always useless. The probability of selecting

1416: an individual with fitness $f$ is $\geq{1\over|F|}$.

1417: %\subsection{Comparison}

1418: At least every $|F|$th FUSS selection corresponds to a STD

1419: selection. Hence, we expect a maximal slowdown by a factor of

1420: $|F|$, since FUSS ``simulates'' STD statistically every $|F|$th

1421: selection.

1422: %

1423: It is possible to construct problems where this slowdown occurs

1424: (unimodal function, local mutation $x\to x\pm\eps$, no

1425: crossover). Gradient ascent would be the algorithm of choice in this

1426: case. On the other hand, we have not observed this slowdown in our

1427: simple 2D example and the TSP experiments, where FUSS outperformed STD

1428: in solution quality/time (see the experimental results in

1429: Section~\ref{secEx2}). Since real world problems often lie in between

1430: these extreme cases it is desirable to modify FUSS to cope with simple

1431: problems as well, without destroying its advantages for complex

1432: objective functions.

1433:

1434: %------------------------------%

1435: \subsection{Quadratic slowdown due to recombination}

1436: %------------------------------%

1437: We have seen that $T_{FUSS}\leq|F|\cdot T_{STD}$. In the

1438: presence of recombination, a {\em pair} of individuals has to be

1439: selected. The probability that FUSS selects {\em two} individuals

1440: with fitness $f$ is $\geq{1\over|F|^2}$. Hence, in the worst case,

1441: there could be a slowdown by a factor of $|F|^2$ --- for {\em

1442: independent} selection we expect

1443: $T_{FUSS}\leq|F|^2\cdot T_{STD}$. This potential quadratic

1444: slowdown can be avoided by selecting one fitness value at random,

1445: and then two individuals of this single fitness value. For this

1446: {\em dependent} selection, we expect

1447: $T_{FUSS}\leq|F|\cdot T_{STD}$. On the other hand,

1448: crossing two individuals of different fitness can also be

1449: advantageous, like the crossing of $f=1$ with $f=2$

1450: individuals in the 2D example of Section \ref{secEx}.

1451:

1452: %0012(2ei)

1453: %------------------------------%

1454: \subsection{Scale independent selection}

1455: %------------------------------%

1456: A near optimal compromise is possible: a high selection

1457: probability $p(f)\sim 1$ if $f\approx f_{max}$ and $p(f)\sim

1458: {1\over|F|}$ otherwise. A ``scale

1459: independent'' probability distribution $p(f)\sim{1\over|f_{max}-f|}$

1460: is appropriate for this.

1461: %

1462: We define

1463: \beq\label{ptscale}

1464:   p(f) \;:=\; {c\over\ln|F|}\cdot

1465:   {1\over {1\over\eps}|f_{max}-f|+1}.

1466: \eeq

1467: The $+1$ in the denominator has been added to regularize the

1468: expression for $f=f_{max}$. The factor $c/\ln|F|$ ensures

1469: correct normalization ($\sum_f p(f)=1$). By using $\ln{b+1\over

1470: a}\leq\sum_{i=a}^b{1\over i}\leq\ln{b\over a-1}$, one can show

1471: that

1472: $

1473:   {\ln|F|\over 1+\ln|F|} \leq c \leq 1

1474: $,

1475: i.e.\ $c\to 1$ for $|F|\to\infty$. In the following we assume

1476: $|F|\geq 3$, i.e.\ $c\geq \odt$.

1477: Apart from a minor additional logarithmic suppression of order

1478: $\ln|F|$ we have the desired behavior $p(f)\sim 1$

1479: for $f\approx f_{max}$ and $p(f)\sim {1\over|F|}$ otherwise:

1480: \beqn

1481:   p(f_{max}-m\eps) \geq {1\over 2\ln|F|} \cdot

1482:   {1\over m+1},

1483: \eeqn

1484: \beqn

1485:   p(f) \geq {1\over 2\ln|F|} \cdot

1486:   {1\over |F|} \quad\forall\,f.

1487: \eeqn

1488: During optimization, the minimal/maximal fitness of an individual in

1489: population $P_t$ is $f_{min/max}^t$. In the definition of $p$ one has

1490: to use $F_t:=\{f_{min}^t,f_{min}^t+\eps,...,f_{max}^t\}$ instead of

1491: $F$, i.e.\ $|F|$ replaced with

1492: $|F_t|={1\over\eps}(f_{max}^t-f_{min}^t)+1\leq|F|$. So (\ref{ptscale})

1493: cannot be achieved by a static re-parametrization in which fitness $f$ is

1494: replaced with $g(f)$. Furthermore, the important idea of sampling from

1495: a fitness level instead of individuals directly is still

1496: maintained. The only difference now is that the population will no

1497: longer converge to a fitness uniform one but to one with distribution

1498: $p(f)$ which is biased toward higher fitness but still never converges

1499: to a fittest individual. In the worst case, we expect a small slowdown

1500: of the order of $\ln|F|$ as compared to FUSS, as well as compared to

1501: STD.
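
The bounds on $c$ can be checked directly. Writing
$m={1\over\eps}|f_{max}-f|$, which runs over $0,...,|F|-1$, and
$H_n:=\sum_{i=1}^n{1\over i}$ for the harmonic number (notation
introduced here for convenience), normalization of (\ref{ptscale})
requires
\beqn
  1 \;=\; \sum_f p(f)
  \;=\; {c\over\ln|F|}\sum_{m=0}^{|F|-1}{1\over m+1}
  \;=\; {c\over\ln|F|}\cdot H_{|F|},
  \quad\mbox{i.e.}\quad c={\ln|F|\over H_{|F|}}.
\eeqn
Applying the lower bound of the stated inequality with $a=1$, and the
upper bound with $a=2$ after splitting off the term $i=1$, gives
$\ln(|F|+1)\leq H_{|F|}\leq 1+\ln|F|$, hence
\beqn
  {\ln|F|\over 1+\ln|F|} \;\leq\; c \;\leq\;
  {\ln|F|\over\ln(|F|+1)} \;\leq\; 1.
\eeqn
For $|F|=3$ the lower bound evaluates to $\ln 3/(1+\ln 3)\approx 0.52$,
and it increases with $|F|$, consistent with $c\geq\odt$.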

1502:

1503: %------------------------------%

1504: \subsection{Scale independent pair selection}

1505: %------------------------------%

1506: It is possible to (nearly) have the best of independent and

1507: dependent selection: a high selection probability $p(f,f')\sim

1508: {1\over|F|}$ if $f\approx f'$ and $p(f,f')\sim {1\over|F|^2}$

1509: otherwise, with uniform marginal $p(f)={1\over|F|}$. The idea

1510: is to use a strongly correlated joint distribution for selecting a

1511: fitness pair. A ``scale independent'' probability distribution

1512: $p(f,f')\sim{1\over|f-f'|}$ is appropriate. We define the joint

1513: probability $\tilde p(f,f')$ of selecting two individuals of

1514: fitness $f$ and $f'$ and the marginal $\tilde p(f)$ as

1515: \begin{equation} \label{ptjoint}

1516:   \tilde p(f,f') \;:=\; {1\over 2|F|\ln|F|}\cdot

1517:   {1\over {1\over\eps}|f\!-\!f'|+1},

1518: \end{equation}

1519: \[

1520:   \tilde p(f) \;:=\; \sum_{f'\in F}\tilde p(f,f')

1521:   = \sum_{f'\in F}\tilde p(f',f).

1522: \]

1523:

1524: We assume $|F|\geq 3$ in the following. The $+1$ in the

1525: denominator has been added to regularize the expression for

1526: $f=f'$. The factor $(2|F|\ln|F|)^{-1}$ ensures correct

1527: normalization for $|F|\to\infty$. More precisely, using

1528: $\ln{b+1\over a}\leq\sum_{i=a}^b{1\over i}\leq\ln{b\over a-1}$,

1529: one can show that

1530: \beqn

1531:   1-{\textstyle{1\over\ln|F|}} \;\leq\;

1532:   \sum_{f,f'\in F}\tilde p(f,f') \;\leq\; 1,\quad

1533:   \odt \;\leq\; |F|\!\cdot\!\tilde p(f) \;\leq\; 1,

1534: \eeqn

1535: i.e.\ $\tilde p$ is not strictly normalized to $1$ and the

1536: marginal $\tilde p(f)$ is only approximately (within a factor of 2)

1537: uniform. The first defect can be corrected by appropriately

1538: increasing the diagonal probabilities $\tilde p(f,f)$. This also

1539: solves the second problem.

1540: \beq\label{pjoint}

1541:   p(f,f') \;:=\; \left\{

1542:   \begin{array}{ll}

1543:     \tilde p(f,f') & \mbox{for}\quad f\neq f' \\

1544:     \tilde p(f,f')+[{1\over|F|}-\tilde p(f)] &

1545:     \mbox{for}\quad f=f'

1546:   \end{array}

1547: \right.

1548: \eeq

1549:

1550: %------------------------------%

1551: \subsection{Properties of $p(f,f')$}

1552: %------------------------------%

1553: $p$ is normalized to $1$ with uniform marginal

1554: \[

1555:   p(f):= \sum_{f'\in F} p(f,f') = {1\over|F|},

1556: \]

1557: \[

1558:   \sum_{f,f'\in F} p(f,f') =

1559:   \sum_{f\in F} p(f) = 1,

1560: \]

1561: \[

1562:   p(f,f')\geq \tilde p(f,f').

1563: \]

1564: Apart from a minor additional logarithmic suppression of order

1565: $\ln|F|$ we have the desired behavior $p(f,f')\sim {1\over|F|}$

1566: for $f\approx f'$ and $p(f,f')\sim {1\over|F|^2}$ otherwise:

1567: \[

1568:   p(f,f\pm m\eps) \geq {1\over 2\ln|F|} \cdot

1569:   {1\over m+1} \cdot {1\over|F|},

1570: \]

1571: \[

1572:   p(f,f') \geq {1\over 2\ln|F|} \cdot

1573:   {1\over |F|^2}.

1574: \]

1575: During optimization, the minimal/maximal fitness of an individual in

1576: population $P_t$ is $f_{min/max}^t$. In the definition of $p$ one has

1577: to use $F_t:=\{f_{min}^t,f_{min}^t+\eps,...,f_{max}^t\}$ instead of

1578: $F$, i.e.\ $|F|$ replaced with

1579: $L:=|F_t|={1\over\eps}(f_{max}^t-f_{min}^t)+1\leq|F|$.
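
That the diagonal correction in (\ref{pjoint}) yields an exactly uniform
marginal can be verified in one line. For fixed $f$,
\beqn
  \sum_{f'\in F}p(f,f') \;=\; \sum_{f'\in F}\tilde p(f,f')
  + \Big[{1\over|F|}-\tilde p(f)\Big]
  \;=\; \tilde p(f)+{1\over|F|}-\tilde p(f) \;=\; {1\over|F|}.
\eeqn
Since $\tilde p(f)\leq{1\over|F|}$, the added term is non-negative,
which gives $p(f,f)\geq\tilde p(f,f)$ and leaves the off-diagonal
entries unchanged.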

1580:

1581: %------------------------------%

1582: \subsection{Scale-Independent Deletion}

1583: %------------------------------%

1584: Just as the selection scheme FUSS has its dual in the deletion scheme

1585: FUDS, we can likewise create the dual of Scale-Independent Selection

1586: in the form of Scale-Independent Deletion.  Thus rather than targeting

1587: deletion from the population so that the distribution becomes flat, as

1588: we do with FUDS, we now define a convex curve $g$ which is peaked at

1589: the fittest individual in the population and delete the population

1590: down so that it follows the shape of this curve.  This retains some of

1591: the advantages of FUDS, for example the population cannot collapse to

1592: just a few fitness levels, and yet it recognizes that for many

1593: problems it is useful to bias the population distribution toward fit

1594: individuals.  Of course such problems are less deceptive than the kind

1595: that FUSS and FUDS are intended for.

1596:

1597: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1598: \section{Continuous Fitness Functions}\label{secCont}

1599: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1600:

1601: %------------------------------%

1602: \subsection{Effective discretization scale}

1603: %------------------------------%

1604: Up to now we have considered a discrete valued fitness function

1605: with values in $F=\{f_{min},f_{min}+\eps,...,f_{max}\}$.

1606: In many practical problems, the fitness function is continuous

1607: valued with $F=[f_{min},f_{max}]$. We generalize FUSS, and some of

1608: the discussion of the previous sections to the continuous case by

1609: replacing the discretization scale $\eps$ by an effective

1610: (time-dependent) discretization scale $\hat\eps$. By construction, FUSS shifts

1611: the population toward a more uniform one. Although the fitness

1612: values are no longer equi-spaced, they still form a discrete set

1613: for finite population $P$. For a fitness uniform distribution, the

1614: average distance between (fitness) neighboring individuals is

1615: ${1\over|P_t|-1}(f^t_{max}-f^t_{min})=:\hat\eps$. We

1616: define $\hat

1617: F_t:=\{f^t_{min},f^t_{min}+\hat\eps,...,f^t_{max}\}$.

1618: $|\hat F_t| = {1\over\hat\eps}(f^t_{max}-f^t_{min})+1 =

1619: |P_t|$.

1620:

1621: %------------------------------%

1622: \subsection{FUSS}

1623: %------------------------------%

1624: Fitness uniform selection for a continuous valued function has already

1625: been mentioned in Section \ref{secFuss}. We just take a uniform random

1626: fitness $f$ in the interval

1627: $[f_{min}^t-\odt\hat\eps,f_{max}^t+\odt\hat\eps]$.

1628: Independent and dependent fitness pair selection as described in

1629: the last section works analogously. An $\hat\eps=0$ version of

1630: correlated selection does not exist; a non-zero $\hat\eps$ is

1631: important. A discrete pair $(f,f')$ is drawn with probability

1632: $p(f,f')$ as defined in (\ref{ptjoint}) and (\ref{pjoint}) with

1633: $\eps$ and $F$ replaced by $\hat\eps$ and $\hat F_t$. The

1634: additional suppression $\ln|\hat F|=\ln|P_t|$ is small for all

1635: practically realizable population sizes.

1636: In all cases an individual with fitness nearest to $f$ ($f'$) is

1637: selected from the population $P$ (randomly if there is

1638: more than one nearest individual).

1639:

1640: If we assume a fitness uniform distribution then our worst case bound

1641: of $T_{FUSS}\approxleq\sum_{t=1}^{T_{STD}}|P_t|$ is plausible, since

1642: the probability of selecting the best individual is approximately

1643: $1/|P_t|$. For constant population size we get a bound

1644: $T_{FUSS}\approxleq|P|\cdot T_{STD}$. For the preferred non-deletion

1645: case with population size $|P_t|=t$ the bound gets much worse:

1646: $T_{FUSS}\approxleq\odt T_{STD}^2$.

1647: %\subsection{Problems of proportionate selection}

1648: This possible (but not necessary!) slowdown has similarities to

1649: the slowdown problems of proportionate selection in later

1650: optimization stages.

1651: %\subsection{Species definition}

1652: The species definition in Section \ref{secTree} has to be relaxed

1653: by allowing mutation sequences of individuals with

1654: $\hat\eps$-similar fitness.

1655: %\subsection{Larger $\hat\eps$}

1656: Larger choices of $\hat\eps$ may be favorable if the standard

1657: choice causes problems.
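
The two special cases of the worst case bound
$T_{FUSS}\approxleq\sum_{t=1}^{T_{STD}}|P_t|$ stated above follow by
elementary summation; as a sketch, writing $T:=T_{STD}$ for brevity,
\beqn
  \sum_{t=1}^{T}|P| \;=\; |P|\cdot T, \qquad
  \sum_{t=1}^{T}t \;=\; {T(T+1)\over 2} \;\approx\; \odt T^2
  \quad\mbox{for}\quad T\gg 1.
\eeqn
The first sum corresponds to a constant population size and the second
to the non-deletion case $|P_t|=t$.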

1658:

1659: %------------------------------%

1660: \subsection{FUDS}

1661: %------------------------------%

1662: Fitness uniform deletion already requires the range of the fitness

1663: function to be broken up into a finite number of intervals.  While for

1664: discrete valued fitness functions the intervals may correspond to the

1665: unique values of the fitness function, this is not a requirement.

1666: Indeed if the population is small and the fitness function has a large

1667: number of possible values then a more coarse discretization is

1668: necessary.  Continuous valued fitness functions can therefore be

1669: treated in exactly the same way and do not cause any special problems.

1670: In fact they are slightly simpler in that we are now free to choose

1671: the discretization as fine as we like without being limited by the

1672: number of possible fitness values.  Of course, like in the discrete

1673: case, we still must choose a discretization which is appropriate given

1674: the size of the population.

1675:

1676:

1677: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1678: \section{The EA Test System}\label{secJfuss}

1679: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1680:

1681: To test FUSS and FUDS we have implemented an EA test system in Java.

1682: The complete source code along with the test problems presented in

1683: this paper and basic usage instructions can be downloaded from

1684: \cite{Legg:website}.  The EA model we have chosen for our tests is the

1685: so-called ``steady state'' model as opposed to the more usual

1686: ``generational'' model.  In a generational EA in each generation we

1687: select an entirely new population based on the old population.  The

1688: old population is then simply discarded.  Under the steady state model

1689: that we use, each step of the optimization adds and removes just one

1690: individual at a time.  Specifically the process occurs as follows:

1691: Firstly an individual is selected by the

1692: \emph{selection scheme} and then with a certain probability another

1693: individual is also selected and the \emph{crossover operator} is

1694: applied to produce a new individual.  Then with another probability a

1695: \emph{mutation operator} is applied to produce the child individual

1696: which is then added to the population.  We refer to the probability of

1697: crossing as the \emph{crossover probability} and the probability of

1698: mutating following a crossover as the \emph{mutation probability}.  In

1699: the case where no crossover takes place the individual is always

1700: mutated to ensure that we are not simply adding a clone of an existing

1701: individual into the population.  Finally, an individual must be

1702: deleted in order to keep the population size constant.  This

1703: individual is selected by the \emph{deletion scheme}.  The deletion

1704: scheme is important as it has the power to bias the population in a

1705: similar way to the selection scheme.

1706:

1707: Our task in this paper is to experimentally analyze how FUSS performs

1708: relative to other selection schemes and how FUDS performs relative to

1709: other deletion schemes.  Because any particular run of a steady state

1710: EA requires both a selection and a deletion scheme to be used, there

1711: are many possible combinations that we could test.  We have narrowed

1712: this range of possibilities down to just a few that are commonly used.

1713:

1714: Among the selection schemes, tournament selection is one of the

1715: simplest and most commonly used and we consider it to be roughly

1716: representative of other standard selection schemes which favor the

1717: fitter individuals in the population; indeed in the case of tournament

1718: size 2 it can be shown that tournament selection is equivalent to the

1719: linear ranking selection scheme \cite[Sec.2.2.4]{Hutter:92cfs}.  With

1720: tournament selection we randomly pick a group of individuals and then

1721: select the fittest individual from this group.  The size of the group

1722: is called the \emph{tournament size} and it is clear that the larger

1723: this group is the more likely we are to select a highly fit individual

1724: from the population.  At some point in the future we may implement

1725: other standard selection schemes to broaden our comparison; however, we

1726: expect the performance of these schemes to be at best comparable to

1727: tournament selection when used with a correctly tuned selection

1728: intensity.
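
A minimal sketch of tournament selection as described above. The function name \texttt{tournament\_select} and the choice of drawing the group with replacement are assumptions made for this illustration.

```python
import random

def tournament_select(population, fitness, size, rng=random):
    """Pick `size` individuals at random and return the fittest of them."""
    # Drawing with replacement is an assumption of this sketch.
    group = [rng.choice(population) for _ in range(size)]
    return max(group, key=fitness)
```

With \texttt{size=1} the sketch reduces to picking a single individual uniformly at random, and larger values of \texttt{size} make a highly fit individual more likely to win.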

1729:

1730: Among the deletion schemes one of the most commonly used in steady

1731: state EAs is random deletion.  The rationale for this is that it is

1732: neutral in the sense that it does not skew the distribution of the

1733: population in any way.  Thus whether the population tends toward high

1734: or low fitness etc.\ is solely a function of the selection scheme and

1735: its settings.  Of course random deletion, unlike FUDS, makes no effort

1736: to preserve diversity in the population as all individuals have an

1737: equal chance of being removed.  In this paper we will compare FUDS

1738: against random deletion as this is the standard deletion scheme in

1739: situations where it is difficult or impossible to directly measure the

1740: similarity of individuals based on their genomes.

1741:

1742: \begin{figure*}[t]

1743: \includegraphics[width=0.485\textwidth]{Deceptive-R.eps}

1744: \includegraphics[width=0.01\textwidth]{space.eps}

1745: \includegraphics[width=0.485\textwidth]{Deceptive-F.eps}

1746: \caption{\label{SimpleProb} With random deletion (left graph) FUSS

1747: significantly outperforms TOURx and RAND.  By switching to FUDS (right

1748: graph) the performance of TOURx and RAND now scales the same as FUSS.}

1749: \end{figure*}

1750:

1751: When reporting test results we will adopt the following notation:

1752: TOUR2 means tournament selection with a tournament size of 2.

1753: Similarly for TOUR3, TOUR4 and so on.  Under random selection, denoted

1754: RAND, all members of the population have an equal probability of being

1755: selected.  This is sometimes called uniform selection.  When a graph

1756: shows the performance of tournament selection over a range of

1757: tournament sizes we will simply write TOURx.  Naturally FUSS indicates

1758: the fitness uniform selection scheme.  To indicate the deletion scheme

1759: used we will add either the suffix \mbox{-R} or \mbox{-F} to indicate

1760: random deletion or FUDS respectively.  Thus, TOUR10-R is tournament

1761: selection with a tournament size of 10 used with random deletion,

1762: while FUSS-F is FUSS selection used with FUDS deletion.

1763:

1764: The important free parameters to set for each test are the population

1765: size, and the crossover and mutation probabilities.  Good values for

1766: the crossover and mutation probabilities depend on the problem and

1767: must be manually tuned based on experience as there are few

1768: theoretical guidelines on how to do this.  For some problems

1769: performance can be quite sensitive to these values while for others

1770: they are less important.  Our default values are 0.5 for both as this

1771: has often provided us with reasonable performance in the past.

1772:

1773: For each test we ran the system multiple times with the same mutation

1774: and crossover probabilities and the same population size.  The only

1775: difference was which selection and deletion schemes were used by the

1776: code.  Thus even if our various parameters, mutation operators etc.\

1777: were not optimal for a given problem, the comparison is still fair.

1778: Indeed we often deliberately set the optimization parameters to

1779: non-optimal values in order to compare the robustness of the systems.

1780:

1781: As a steady state optimizer operates on just one individual at a time,

1782: the number of cycles within a given run can be high, perhaps 100,000

1783: or more.  In order to make our results more comparable to a

1784: generational optimizer we divide this number by the size of the

1785: population to give the approximate number of generations.

1786: Unfortunately the theoretical understanding of the relationship

1787: between steady state and generational optimizers is not strong.  It

1788: has been shown that under the assumption of no crossover the effective

1789: selection intensity using tournament selection with size 2 is

1790: approximately twice as strong under a steady state EA as it is with a

1791: generational EA \cite{Rogers:99}.  As far as we are aware a similar

1792: comparison for systems with crossover has not been performed.

1793:

1794: Depending on the purpose of a test run, different stopping criteria

1795: were applied.  For example, in situations where we wanted to graph how

1796: rapidly different strategies converged with respect to generations, it

1797: made sense to fix the number of generations.  In other situations we

1798: wanted to stop a run once the optimizer appeared to have become stuck,

1799: that is, when the maximum fitness had not improved after some

1800: specified number of generations.  In any case we explain for each test

1801: the stopping criterion that has been used.
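
The conversion from steady state steps to approximate generations and the ``stuck'' stopping criterion can be sketched as follows. The names \texttt{generations}, \texttt{has\_stalled} and \texttt{patience} are hypothetical, and the sketch assumes that the maximum fitness is recorded once per generation.

```python
def generations(steps, pop_size):
    """Approximate number of generations for a steady state run."""
    return steps / pop_size

def has_stalled(best_per_generation, patience):
    """True if the maximum fitness has not improved in the last `patience` generations.

    Assumes patience >= 1 and one recorded maximum per generation.
    """
    if len(best_per_generation) <= patience:
        return False
    recent = max(best_per_generation[-patience:])
    earlier = max(best_per_generation[:-patience])
    return recent <= earlier
```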

1802:

1803: In order to generate reliable statistics we ran each test multiple

1804: times; typically 30 times but sometimes up to 100 times if the results

1805: were noisy.  From these runs we then calculated the mean performance

1806: as well as the sample standard deviation and from this the standard

1807: error in our estimate of the mean.  This value was then used to

1808: generate the 95\% confidence intervals which appear as error bars on

1809: the graphs.
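
For concreteness, the error bars can be computed along the following lines. The text does not state which quantile was used, so the normal approximation factor of 1.96 is an assumption of this sketch, as is the function name \texttt{mean\_with\_ci}.

```python
import math
import statistics

def mean_with_ci(values, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean: sample standard deviation divided by sqrt(n).
    std_err = statistics.stdev(values) / math.sqrt(n)
    return mean, z * std_err
```

With only 30 runs a Student $t$ quantile would give slightly wider intervals than the normal approximation used here.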

1810:

1811:

1812: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1813: \section{A Deceptive 2D Problem}\label{secEx2}

1814: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1815:

1816: The first problem we examine is the simple but highly deceptive 2D

1817: optimization problem which was theoretically analyzed in

1818: Section~\ref{secEx}.  As in the theoretical analysis, we set up the

1819: mutation operator to randomly replace either the $x$ or $y$ position

1820: of an individual and the crossover to take the $x$ position from one

1821: individual and the $y$ position from another to produce an offspring.

1822: The size of the domain for which the function is maximized is just

1823: $\delta^2$ which is very small for small values of $\delta$, while the

1824: local maximum at fitness level 3 covers most of the space.  Clearly the

1825: only way to reach the global maximum is by leaving this local maximum

1826: and exploring the space of individuals with lower fitness values of 1

1827: or 2.  Thus, with respect to the mutation and crossover operators we

1828: have defined, this is a deceptive optimization problem as these

1829: partitions mislead the EA \cite{Forrest:93}.
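
The two operators for this problem can be sketched as follows. Individuals are represented as $(x,y)$ tuples; that each coordinate is drawn uniformly from the unit interval and that $x$ and $y$ are replaced with equal probability are assumptions of this sketch, since those details are given with the theoretical analysis rather than here.

```python
import random

def mutate(ind, rng=random):
    """Randomly replace either the x or the y position of an individual."""
    x, y = ind
    if rng.random() < 0.5:
        return (rng.random(), y)
    return (x, rng.random())

def crossover(a, b, rng=random):
    """Take the x position from one parent and the y position from the other."""
    return (a[0], b[1])
```

Since one mutation can replace $x$ and a second can replace $y$, any individual can reach any other in two mutation steps.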

1830:

1831: For this test we set the maximum population size to 1,000 and made 20

1832: runs for each $\delta$ value.  With a steady state EA it is usual to

1833: start with a full population of random individuals.  However for this

1834: particular problem we reduced the initial population size down to just

1835: 10 in order to avoid the effect of doing a large random search when we

1836: created the initial population and thereby distorting the scaling.

1837: Usually this might create difficulties due to the poor genetic

1838: diversity in the initial population.  However due to the fact that any

1839: individual can mutate to any other in just two steps this is not a

1840: problem in this situation.  Initial tests indicated that reducing the

1841: crossover probability from 0.5 to 0.25 improved the performance

1842: slightly and so we have used the latter value.

1843:

1844: The first set of results for the selection schemes used with random

1845: deletion appear in the left graph of Figure~\ref{SimpleProb}.  As

1846: expected, higher selection intensity is a significant disadvantage for

1847: this problem.  Indeed even with just a tournament size of 3 the number

1848: of generations required to find the maximum became infeasible to

1849: compute for smaller values of $\delta$.  Our results confirm the

1850: theoretical scaling orders of $1\over\delta^2$ for TOUR2-R, and

1851: $1\over\delta$ for FUSS-R, as predicted in Section~\ref{secEx}.  Be

1852: aware that this is a log-log scaled graph and so the different slopes

1853: indicate significantly different orders of scaling.

1854:

1855: In the second set of tests we switch from random deletion to FUDS.

1856: These results appear in the right graph of Figure~\ref{SimpleProb}.

1857: We see that with FUDS as the deletion scheme the scaling improves

1858: dramatically for RAND, TOUR2 and TOUR3.  Indeed they are now of the

1859: same order $\frac{1}{\delta}$ as FUSS, as predicted in

1860: Section~\ref{secEx}.  This shows that for very deceptive problems much

1861: higher levels of selection intensity can be applied when using FUDS

1862: rather than random deletion.  The performance of FUSS-R is very

1863: similar to that of FUSS-F.  This is not surprising as the population

1864: distribution under FUSS already tends to be approximately uniform

1865: across fitness levels and thus we expect the effect of FUDS to be

1866: quite weak.

1867:

1868: Although this problem was artificially constructed, the results

1869: clearly demonstrate how FUSS and FUDS can dramatically improve

1870: performance in some situations.

1871:

1872:

1873: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1874: \section{Traveling Salesman Problem}\label{secTSP}

1875: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1876:

1877: \begin{figure*}[t]

1878: \includegraphics[width=0.485\textwidth]{DTSPI-300gen-R.eps}

1879: \includegraphics[width=0.01\textwidth]{space.eps}

1880: \includegraphics[width=0.485\textwidth]{DTSPI-300gen-F.eps}

1881: \caption{\label{DTSP-1} TOUR3-R converged too slowly while TOUR12-R

1882: converged prematurely and became stuck.  TOUR6-R appears to be about

1883: the correct tournament size for this problem, however it is still

1884: inferior to FUSS-R.  With FUDS all of the selection schemes performed

1885: well though FUSS was still the best.}

1886: \end{figure*}

1887:

1888: A well known optimization problem is the so called Traveling Salesman

1889: Problem (TSP).  The task is to find the shortest Hamiltonian cycle

1890: (path) in a graph of $N$ vertices (cities) connected by edges of

1891: certain lengths.  There exist highly specialized population based

1892: optimizers which use advanced mutation and crossover operators and are

1893: capable of finding paths less than one percent longer than the optimal

1894: path for up to $10^7$ cities

1895: \cite{Lin:73,Martin:96,Johnson:97,Applegate:00}.  As our goal is only

1896: to study the relative performance of selection and deletion schemes,

1897: having a highly refined implementation is not important.  Thus the

1898: mutation and crossover operators we used were quite simple: Mutation

1899: was achieved by just switching the position of two of the cities in

1900: the solution, while for crossover we used the partial mapped crossover

1901: technique \cite{Goldberg:85}.  Fitness was computed by taking the

1902: reciprocal of the tour length.
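
A minimal sketch of the swap mutation and the fitness computation, assuming that a tour is stored as a list of city indices and that \texttt{dist} is a matrix of pairwise distances. The function names are hypothetical, and the partial mapped crossover is omitted.

```python
import random

def tour_length(tour, dist):
    """Length of the closed tour visiting the cities in the given order."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def fitness(tour, dist):
    """Fitness is the reciprocal of the tour length."""
    return 1.0 / tour_length(tour, dist)

def swap_mutation(tour, rng=random):
    """Return a copy of the tour with the positions of two cities switched."""
    i, j = rng.sample(range(len(tour)), 2)
    child = list(tour)
    child[i], child[j] = child[j], child[i]
    return child
```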

1903:

1904: For our first set of tests we used randomly generated TSP problems,

1905: that is, the distance between any two cities was chosen uniformly from

1906: the unit interval $[0,1]$.  We chose this as it is known to be a

1907: particularly deceptive form of the TSP problem as the usual triangle

1908: inequality relation does not hold in general.  For example, the

1909: distance between cities $A$ and $B$ might be $0.1$, between cities $B$

1910: and $C$ $0.2$, and yet the distance between $A$ and $C$ might be

1911: $0.8$.  The problem still has some structure though as efficient

1912: partial solutions tend to be useful building blocks for efficient

1913: complete tours.
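
Such an instance can be generated along the following lines. That the distances are symmetric is an assumption of this sketch, and the helper \texttt{triangle\_violations} is a hypothetical addition that counts how often the triangle inequality fails.

```python
import random

def random_distance_matrix(n, rng=random):
    """Symmetric n-by-n matrix with off-diagonal entries drawn uniformly from [0, 1)."""
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = rng.random()
    return dist

def triangle_violations(dist):
    """Count ordered triples (a, b, c) with dist[a][c] > dist[a][b] + dist[b][c]."""
    n = len(dist)
    return sum(
        1
        for a in range(n) for b in range(n) for c in range(n)
        if len({a, b, c}) == 3 and dist[a][c] > dist[a][b] + dist[b][c]
    )
```

For the three cities in the example above, the direct route between $A$ and $C$ is longer than the detour via $B$ in either direction, so two ordered triples violate the inequality.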

1914:

1915: For this test we used random distance TSP problems with 20 cities and

1916: a population size of 1000.  We found that changing the crossover and

1917: mutation probabilities did not improve performance and so these have

1918: been left at their default values of 0.5.  Our stopping criterion was

1919: simply to let the EA run for 300 generations as this appeared to be

1920: adequate for all of the methods to converge and allowed us to easily

1921: graph performance versus generations.

1922:

1923: The first graph in Figure~\ref{DTSP-1} shows each of the selection

1924: schemes used with random deletion.  We see that TOUR3-R has

1925: insufficient selection intensity for adequate convergence while

1926: TOUR12-R quickly converges to a local optimum and then becomes stuck.

1927: TOUR6-R has about the correct level of selection intensity for this

1928: problem and population size.  FUSS-R however initially converges as

1929: rapidly as TOUR12-R but avoids becoming stuck in local optima.  This

1930: suggests improved population diversity.  The performance curve for

1931: FUSS-R is impressive, especially considering that it is parameterless.

1932:

1933: At first it might seem surprising that the maximum fitness with FUSS

1934: climbs very quickly for the first 20 generations, especially

1935: considering that FUSS makes no attempt to increase the average fitness

1936: in the population.  However we can explain this very rapid rise in

1937: solution fitness by considering a simple example.  Consider a

1938: situation where there is a large number of individuals in a small band

1939: of fitness levels, say 1,000 with fitness values ranging from 50 to

1940: 70.  Add to this population one individual with a fitness value of 73.

1941: Thus the total fitness range contains 24 values.  Whenever FUSS picks

1942: a random point from 72 to 73 inclusive this single individual with

1943: maximal fitness will be selected.  That is, the probability that the

1944: single fittest individual will be selected is 2/24 = 0.083.  In

1945: comparison under TOUR12 the probability that the fittest individual is

1946: selected is the same as the probability that it is picked for the

1947: sample of 12 elements used for the tournament, which is approximately

1948: 12/1000 = 0.012.  Thus the probability of the fittest individual in

1949: the population being selected is higher under FUSS than under TOUR12

1950: and so the maximum fitness would rise quickly to start with.
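
The two probabilities in this example can be checked with a short calculation. The sketch assumes integer fitness levels, that every level from 50 to 70 is occupied, that FUSS selects the individual whose fitness is nearest to a uniformly chosen level, and that the tournament group is drawn with replacement. Both function names are hypothetical.

```python
def fuss_hit_probability(fitness_values, target):
    """Fraction of integer levels in [min, max] whose nearest occupied level is `target`."""
    occupied = sorted(set(fitness_values))
    lo, hi = occupied[0], occupied[-1]
    hits = sum(
        1 for level in range(lo, hi + 1)
        if min(occupied, key=lambda f: abs(f - level)) == target
    )
    return hits / (hi - lo + 1)

def tournament_hit_probability(pop_size, size):
    """Probability that one given individual is among `size` draws with replacement."""
    return 1.0 - (1.0 - 1.0 / pop_size) ** size

# 1,000 individuals spread over fitness levels 50..70, plus one individual at 73.
population_fitness = [50 + i % 21 for i in range(1000)] + [73]
p_fuss = fuss_hit_probability(population_fitness, 73)
p_tour12 = tournament_hit_probability(len(population_fitness), 12)
```

Under these assumptions \texttt{p\_fuss} equals $2/24$ and \texttt{p\_tour12} is roughly $0.012$, in agreement with the figures above.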

1951:

1952: Previously in \cite{Legg:04fussexp} we speculated that this may have

1953: been responsible for performance problems that we had observed with

1954: FUSS in some situations.  However further experimentation has shown

1955: that very rapid rises in maximal fitness are quite rare and are also

1956: very short-lived when they do occur, too short to cause any

1957: significant diversity problems in the population.  We now believe that

1958: the population distribution is to blame in these situations; something

1959: that we will explore in detail in Section~\ref{secSAT}.

1960:

1961: \begin{figure*}[t]

1962: \includegraphics[width=0.48\textwidth,height=0.30\textheight]{dtpsi20-s40-p250.eps}

1963: \includegraphics[width=0.02\textwidth]{space.eps}

1964: \includegraphics[width=0.48\textwidth,height=0.30\textheight]{dtpsi20-s40-p500.eps}

1965: \includegraphics[width=1.00\textwidth,height=0.02\textheight]{space.eps}

1966: \includegraphics[width=0.48\textwidth,height=0.30\textheight]{dtpsi20-s40-p1k.eps}

1967: \includegraphics[width=0.02\textwidth]{space.eps}

1968: \includegraphics[width=0.48\textwidth,height=0.30\textheight]{dtpsi20-s40-p5k.eps}

1969: \caption{\label{tsp} The performance of TOURx-F is much more stable

1970: than TOURx-R under variation in the selection intensity.  Also both

1971: FUSS-R and FUSS-F produce very good results, especially with the

1972: larger populations.}

1973: \end{figure*}

1974:

1975: The second graph in Figure~\ref{DTSP-1} shows the same set of

1976: selection schemes but now using FUDS as the deletion scheme.  With

1977: FUDS the performance of all of the selection schemes either stayed the

1978: same or improved.  In the case of TOUR3 the improvement was dramatic

1979: and for TOUR12 the improvement was also quite significant.  This is

1980: interesting because it shows that with fitness uniform deletion,

1981: performance can improve when the selection intensity is either too

1982: high or too low.  That is, when using FUDS the performance of the EA

1983: now appears to be more robust with respect to variation in selection

1984: intensity.

1985:

1986: In the case of TOUR12-F this is evidence of improved population

1987: diversity as the EA is no longer becoming stuck.  However for TOUR3-R

1988: the selection intensity is quite low and thus we would expect the

1989: population diversity to be relatively good.  Thus the fact that

1990: TOUR3-F was so much better than TOUR3-R suggests that FUDS can have

1991: significant performance benefits that are not related to improved

1992: population diversity.

1993:

1994: Investigating further it seems that this effect is due to the way that

1995: FUDS focuses the deletion on the large mass of individuals which have

1996: an average level of fitness while completely leaving the less common

1997: fit individuals alone.  This helps a system with very weak selection

1998: intensity move the mass of the population up through the fitness

1999: space.  With higher selection intensity this problem tends not to

2000: occur as individuals in this central mass are less likely to be

2001: selected thus reducing the rate at which new individuals of average

2002: fitness are added to the population.
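
This behavior can be illustrated with a rough sketch of a deletion rule that removes a random individual from the most crowded fitness level. The sketch is an assumption based on the informal description in this section, not the implementation used for the experiments: the level width \texttt{eps} and the function name \texttt{fuds\_delete\_index} are hypothetical, and ties between equally crowded levels are broken arbitrarily.

```python
import random
from collections import defaultdict

def fuds_delete_index(fitnesses, eps=1.0, rng=random):
    """Return the index of a random individual from the most crowded fitness level."""
    lo = min(fitnesses)
    levels = defaultdict(list)
    for idx, f in enumerate(fitnesses):
        # Group individuals into fitness levels of width eps.
        levels[int((f - lo) // eps)].append(idx)
    crowded = max(levels.values(), key=len)
    return rng.choice(crowded)
```

Under this rule a rare, highly fit individual is never deleted while a more crowded level exists elsewhere in the population.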

2003:

2004:

2005: In order to better understand how stable FUDS performance is when used

2006: with different selection intensities we ran another set of tests on

2007: random TSP problems with 20 cities and graphed how performance varied

2008: by tournament size.  For these tests we set the EA to stop each run

2009: when no improvement had occurred in 40 generations.  We also tested on

2010: a range of population sizes: 250, 500, 1000 and 5000.  The results

2011: appear in Figure~\ref{tsp}.

2012:

2013: In these graphs we can now clearly see how the performance of TOURx-R

2014: varies significantly with tournament size.  Below the optimal

2015: tournament size performance worsened quickly while above this value it

2016: also worsened, though more slowly.  Interestingly, with a population

2017: size of 5000 the optimal tournament size was about 6, while with small

2018: populations the optimal value fell to just 4.  Presumably this was

2019: partly because smaller populations have lower diversity and thus

2020: cannot withstand as much selection intensity.

2021:

2022: In contrast FUSS-R and FUSS-F appear as horizontal lines as they do

2023: not have a tournament size parameter.  We see that they have performed

2024: as well as the optimal performance of TOURx-R without requiring any

2025: tuning.  Indeed for larger populations FUSS-R appears to be even

2026: better than the optimally tuned performance of TOURx-R.  This is a

2027: very positive result for the parameterless FUSS.

2028:

2029: Comparing FUDS with random deletion we also see impressive results.

2030: For every combination of selection scheme, tournament size and

2031: population size the result with FUDS was better than the corresponding

2032: result with random deletion, and in some cases much better.

2033: Furthermore these graphs clearly display the improved robustness of

2034: tournament selection with FUDS as TOURx-F produced near optimal

2035: results for all tournament sizes.  Even with an optimally tuned

2036: tournament size FUDS increased performance, particularly with the

2037: smaller populations.  Indeed for each population size tested the worst

2038: performance of TOURx-F was equal to the best performance of

2039: \mbox{TOURx-R}.

2040:

2041: With FUSS there was also a performance advantage when using FUDS,

2042: again more so with the smaller populations.  The combination of both

2043: FUSS and FUDS was especially effective as can be seen by the

2044: consistently superior performance of FUSS-F across all of the graphs.

2045:

2046: More tests were run exploring performance with up to 100 cities.

2047: Although the performance of FUDS remained stronger than random

2048: deletion for very low selection intensity, for high selection

2049: intensity the two were equal.  We believe that the reason for this is

2050: the following: When the space of potential solutions is very large

2051: finding anything close to a global optimum is practically impossible,

2052: indeed it is difficult to even find the top of a reasonable local

2053: optimum as the space has so many dimensions.  In these situations it

2054: is more important to put effort into simply climbing in the space

2055: rather than spreading out and trying to thoroughly explore.  Thus

2056: higher selection intensity can be an advantage for large problem

2057: spaces.  At any rate, for large problems and with high selection

2058: intensity FUDS did not appear to hinder the performance, while with

2059: low selection intensity it continued to significantly improve it.

2060:

2061: \begin{figure*}[t]

2062: \includegraphics[width=0.485\textwidth]{SCPI42-p250.eps}

2063: \includegraphics[width=0.01\textwidth]{space.eps}

2064: \includegraphics[width=0.485\textwidth]{SCPI42-p500.eps}

2065: \includegraphics[width=1.00\textwidth,height=0.01\textheight]{space.eps}

2066: \includegraphics[width=0.485\textwidth]{SCPI42-p1k.eps}

2067: \includegraphics[width=0.01\textwidth]{space.eps}

2068: \includegraphics[width=0.485\textwidth]{SCPI42-p5k.eps}

2069: \caption{\label{scp-unbal} The performance of FUSS for the two smaller

2070: populations was relatively poor, while for the larger populations it

2071: matched the optimal performance of TOURx-R.  FUDS again produced

2072: superior results to random deletion in all situations tested.}

2073: \end{figure*}

2074:

2075: Experiments were also performed using the more efficient ``2-Opt''

2076: mutation operator.  As expected, this increased performance and allowed

2077: much higher selection pressure to be used.  Of course the problem then

2078: no longer had the kind of deceptive structure that heavily punishes

2079: high selection pressure that we are looking for.  Nevertheless, FUDS

2080: continued to significantly boost the performance of tournament

2081: selection, in particular when the tournament size was too small.

2082:

2083:

2084: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

2085: \section{Set Covering Problem}\label{secSetCover}

2086: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

2087:

2088: The set covering problem (SCP) is a reasonably well known NP-complete

2089: optimization problem with many real world applications.  Let $M \in

2090: \{0,1\}^{m \times n}$ be a binary valued matrix and let $c_j > 0$ for

2091: $j \in \{1, \ldots, n\}$ be the cost of column $j$.  The goal is to

2092: find a subset of the columns that covers every row at minimal total cost.  Define

2093: $x_j = 1$ if column $j$ is in our solution and 0 otherwise.  We can

2094: then express the cost of this solution as $\sum_{j=1}^n c_j x_j$

2095: subject to the condition that $\sum_{j=1}^n m_{ij} x_j \geq 1$ for $i

2096: \in \{1, \ldots, m\}$.
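
The cost and the covering condition translate directly into code. In this sketch \texttt{x} is a 0/1 list over the columns, \texttt{costs} holds the $c_j$ and \texttt{matrix} holds the rows of $M$. The function names are hypothetical.

```python
def scp_cost(x, costs):
    """Total cost of the selected columns: the sum of c_j over all j with x_j = 1."""
    return sum(c for c, xj in zip(costs, x) if xj)

def scp_is_cover(x, matrix):
    """True if every row i has at least one selected column j with m_ij = 1."""
    return all(any(m and xj for m, xj in zip(row, x)) for row in matrix)
```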

2097:

2098: Our system of representation, mutation operators and crossover follow

2099: that used by Beasley \cite{Beasley:96} and we compute the fitness by

2100: taking the reciprocal of the cost.  The results presented here are

2101: based on the ``scp42'' problem from a standard collection of SCP

2102: problems \cite{Beasley:03}.  The results obtained on other problems in

2103: this test set were similar.  We found that increasing the crossover

2104: probability and reducing the mutation probability improved

2105: performance, especially when the selection intensity was low.  Thus we

2106: have tested the system with a crossover probability of 0.8 and a

2107: mutation probability of 0.2.  We performed each test at least 50 times

2108: in order to minimize the error bars.  Our stopping criterion was to

2109: terminate each run after no improvement in minimal cost

2110: had occurred

2111: for 40 generations.  The results for this test appear in

2112: Figure~\ref{scp-unbal}.

2113:

2114: \begin{figure*}[t]

2115: \includegraphics[width=0.485\textwidth]{CNF150-p500.eps}

2116: \includegraphics[width=0.01\textwidth]{space.eps}

2117: \includegraphics[width=0.485\textwidth]{CNF150-p5k.eps}

2118: \caption{\label{cnf-all}With low selection intensity TOURx-F performed

2119: slightly below TOURx-R, but was otherwise comparable.  FUSS had

2120: serious difficulties.}

2121: \end{figure*}

2122:

2123:

2124: Similar to the TSP graphs we again see the importance of correctly

2125: tuning the tournament size with TOURx-R.  We also see the optimal

2126: range of performance for TOURx-R moving to the right as the population

2127: size increases.  This is what we would expect due to the greater

2128: diversity in larger populations.  This kind of variability is one of

2129: the reasons why the selection intensity parameter usually has to be

2130: determined by experimentation.

2131:

2132: Unlike with TSP however, the performance of FUSS was less convincing

2133: in these results.  With the smaller populations of 250 and 500 FUSS-R

2134: was only better than TOURx-R when the tournament size was very low or

2135: very high.  With the larger populations of 1,000 and 5,000 the results

2136: were much better with FUSS-R performing as well as the optimal

2137: performance of TOURx-R.  FUSS-F performed better than FUSS-R, in

2138: particular with the smaller populations though this improvement was

2139: still insufficient for it to match the optimal performance of TOURx-R

2140: in these cases.  The fact that the performance of FUSS varied by

2141: population size suggests that FUSS might be experiencing some kind of

2142: population diversity problem.  We will look more carefully at

2143: diversity issues in the next section.

2144:

2145: With FUDS the results were again very impressive.  As with the TSP

2146: tests, for all combinations of selection scheme, tournament size and

2147: population size that we tested, the performance with FUDS was superior

2148: to the corresponding performance with random deletion.  This was true

2149: even when the tournament size was optimal.  While the performance of

2150: TOURx-F did vary significantly with different tournament sizes, the

2151: results were more robust than TOURx-R, especially with the larger

2152: populations.  Indeed for the larger two populations we again have a

2153: situation where the worst performance of TOURx-F is equal to the

2154: optimal performance of TOURx-R.

2155:

2156:

2157: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

2158: \section{Maximum CNF3 SAT}\label{secSAT}

2159: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

2160:

2161: Maximum CNF3 SAT is a well known NP hard optimization problem

2162: \cite{Crescenzi:04} that has been extensively studied.  A three

2163: literal conjunctive normal form (CNF) logical equation is a boolean

2164: equation that consists of a conjunction of clauses where each clause

2165: contains a disjunction of three literals.  So for example, $(a \lor b

2166: \lor \lnot c) \land ( a \lor \lnot e \lor f)$ is a CNF3 expression.

2167: The goal in the maximum CNF3 SAT problem is to find an instantiation

2168: of the variables such that the maximum number of clauses evaluate to

2169: true.  Thus for the above equation if $a = F$, $b = T$, $c = T$, $e =

2170: T$, and $f = F$ then just one clause evaluates to true and thus this

2171: instantiation gets a score of one.  Achieving significant results in

2172: this area would be difficult and this is not our aim; we are simply

2173: using this problem as a test to compare selection and deletion

2174: schemes.
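
The score of the example instantiation can be checked with a few lines of code. The encoding of a literal as a (variable, negated) pair and the name \texttt{satisfied\_clauses} are assumptions of this sketch.

```python
def satisfied_clauses(clauses, assignment):
    """Count the clauses that contain at least one true literal.

    Each literal is a (variable, negated) pair.
    """
    return sum(
        any(assignment[var] != negated for var, negated in clause)
        for clause in clauses
    )

clauses = [
    [("a", False), ("b", False), ("c", True)],  # (a or b or not c)
    [("a", False), ("e", True), ("f", False)],  # (a or not e or f)
]
assignment = {"a": False, "b": True, "c": True, "e": True, "f": False}
print(satisfied_clauses(clauses, assignment))  # prints 1
```

Only the first clause is satisfied, through $b = T$, which gives the score of one stated above.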

2175:

2176: Our test problems have been taken from the SATLIB collection of SAT

2177: benchmark tests \cite{Hoos:00}.  The first test was performed on the

2178: full set of 100 instances of randomly generated CNF3 formulas with 150

2179: variables and 645 clauses, all of which are known to be satisfiable.

2180: Based on test results the crossover and mutation probabilities were

2181: left at the default values.  Our mutation operator simply flips one

2182: boolean variable and the crossover operator forms a new individual by

2183: randomly selecting for each variable which parent's state to take.

2184: Fitness was simply taken to be the number of clauses satisfied.  Again

2185: we tested across a range of tournament sizes and population sizes.

2186: The results of these tests appear in Figure~\ref{cnf-all}.
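
The two operators can be sketched as follows, assuming that an individual is a list of boolean values with one entry per variable and that each parent is chosen with equal probability in the crossover. The function names are hypothetical.

```python
import random

def flip_mutation(bits, rng=random):
    """Return a copy of the individual with one randomly chosen variable flipped."""
    child = list(bits)
    i = rng.randrange(len(child))
    child[i] = not child[i]
    return child

def uniform_crossover(a, b, rng=random):
    """For each variable, randomly take the state of one of the two parents."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
```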

2187:

2188: We have shown only the population sizes of 500 and 5,000 as the other

2189: population sizes tested followed the same pattern.  Interestingly for

2190: this problem there was no evidence of better performance with FUDS at

2191: higher selection intensities.  Nor for that matter was there the

2192: decline in performance with TOURx-R that we have seen elsewhere.

2193: Indeed with random deletion the selection intensity appeared to have

2194: no impact on performance at all.  While maximum CNF3 SAT is an NP hard

2195: optimization problem, this lack of dependence on our selection

2196: intensity parameter suggests that it may not have the deceptive

2197: structure that FUSS and FUDS are designed for.

2198:

2199: With low selection intensity FUDS caused performance to fall below

2200: that of random deletion; something that we have not seen before.

2201: Because the advantages of FUDS have been more apparent with small

2202: populations in other test problems, we also tested the system with a

2203: population size of only 150. Unfortunately no interesting changes in

2204: behavior were observed.

2205:

2206: \begin{figure*}[t]

2207: \includegraphics[width=0.485\textwidth]{CNF-TOUR4-R-popDist.eps}

2208: \includegraphics[width=0.01\textwidth]{space.eps}

2209: \includegraphics[width=0.485\textwidth]{CNF-TOUR4-B-popDist.eps}

2210: \includegraphics[width=1.00\textwidth,height=0.01\textheight]{space.eps}

2211: \includegraphics[width=0.485\textwidth]{CNF-FUSS-R-popDist.eps}

2212: \includegraphics[width=0.01\textwidth]{space.eps}

2213: \includegraphics[width=0.485\textwidth]{CNF-FUSS-B-popDist.eps}

2214: \caption{\label{cnf-pop} With TOUR4-R the population collapses to a

2215: narrow band of fitness levels while with TOUR4-F the distribution is

2216: flat.  Under FUSS the population spreads out in both directions with

2217: FUSS-F in particular giving an extremely uniform distribution.}

2218: \end{figure*}

2219:

2220: While FUDS had minor difficulties, FUSS had serious problems for all

2221: the population sizes that we tested.  We suspected that the uniform

2222: nature of the population distribution that should occur with both FUSS

2223: and FUDS might be to blame as we only expect this to be a benefit for

2224: very deceptive problems which are sensitive to the tuning of the

2225: selection intensity parameter.  Thus we ran the EA with a population

2226: of 1000 and graphed the population distribution across the number of

2227: clauses satisfied at the end of the run.  We stopped each run when the

2228: EA made no progress in 40 generations.  The results of this appear in

2229: Figure~\ref{cnf-pop}.

2230:

2231: The first thing to note is that with TOUR4-R the population collapses

2232: to a narrow band of fitness levels, as expected.  With TOUR4-F the

2233: distribution is now uniform, though practically none of the population

2234: satisfies fewer than 550 clauses.  The reason for this is quite

2235: simple: While FUDS levels the population distribution out, TOUR4 tends

2236: to select the most fit individuals and thus pushes the population to

2237: the right from its starting point.  In contrast, FUSS pushes the

2238: population toward currently unoccupied fitness levels.  This results

2239: in the population spreading out in both directions and so the number

2240: of individuals with extremely poor fitness is much higher.

2241:

2242: Given that our goal is to find an instantiation that satisfies all 645

2243: clauses, it is questionable whether having a large percentage of the

2244: population unable to satisfy even 600 clauses is of much benefit.

2245: While the total population diversity under FUSS-F might be very high,

2246: perhaps the kind of diversity that matters the most is the diversity

2247: among the relatively fit individuals in the population.  This should

2248: be true for all but the most excessively deceptive problems.  By

2249: thinly spreading the population across a very wide range of fitness

2250: levels we actually end up with very few individuals with the kind of

2251: diversity that matters.  Of course this depends on the nature of the

2252: problem we are trying to solve and the fitness function that we use.

2253:

2254: \begin{figure*}[t]

2255: \includegraphics[width=0.485\textwidth]{CNF-total-diversity-R.eps}

2256: \includegraphics[width=0.01\textwidth]{space.eps}

2257: \includegraphics[width=0.485\textwidth]{CNF-total-diversity-F.eps}

2258: \includegraphics[width=1.00\textwidth,height=0.01\textheight]{space.eps}

2259: \includegraphics[width=0.485\textwidth]{CNF-top-diversity-R.eps}

2260: \includegraphics[width=0.01\textwidth]{space.eps}

2261: \includegraphics[width=0.485\textwidth]{CNF-top-diversity-F.eps}

2262: \caption{\label{cnf-diver} While the total population diversity is

2263: very strong under FUSS, the diversity among fit individuals is weak.

2264: FUDS improves the total population diversity compared to random

2265: deletion, but has little effect on the diversity among the fit

2266: individuals.}

2267: \end{figure*}

2268:

2269: Fortunately with CNF3 SAT we can directly measure population diversity

2270: by taking the average Hamming distance between individuals' genomes.

2271: While this means that the value of the fitness based similarity metric

2272: is questionable for this problem, as more direct methods like crowding

2273: can be applied, it is a useful situation for our analysis as it allows

2274: us to directly measure how effective FUSS and FUDS are at preserving

2275: population diversity.  The hope of course is that any positive

2276: benefits that we have seen here will also carry over to problems where

2277: directly measuring the diversity is problematic.

2278:

2279: For the diversity tests we used a population size of 1000 again.  For

2280: comparison we used FUSS, TOUR3 and TOUR12 both with random deletion

2281: and with FUDS.  In each run we calculated two different statistics:

2282: the average Hamming distance between individuals in the whole

2283: population, and the average Hamming distance between individuals whose

2284: fitness was no more than 20 below the fittest individual in the

2285: population at the time.  These two measurements give us the ``total

2286: population diversity'' and ``high fitness diversity'' graphs in

2287: Figure~\ref{cnf-diver}.
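
As a sketch of how these two statistics could be formalized (the notation
is introduced here only for illustration), let the population consist of
$N$ individuals with genomes $x_1,\ldots,x_N\in\{0,1\}^n$, one bit per
Boolean variable, let $f(x_i)$ be the number of clauses satisfied by
$x_i$, and let $d_H(x_i,x_j)$ be the Hamming distance between two genomes.
Assuming that the average is taken over distinct pairs of individuals,
the total population diversity and the high fitness diversity would read
\beqn
  D_{\rm total} = \frac{2}{N(N-1)}\sum_{i<j} d_H(x_i,x_j),
  \qquad
  D_{\rm top} = \frac{2}{M(M-1)}\sum_{i,j\in T,\,i<j} d_H(x_i,x_j),
\eeqn
where $T=\{i : f(x_i)\geq \max_k f(x_k)-20\}$ is the set of relatively fit
individuals and $M=|T|\geq 2$.  Both quantities lie between $0$ and $n$.
By construction $D_{\rm top}$ ignores all individuals whose fitness is
more than 20 below that of the current best individual.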

2288:

2289: We graphed these measurements against the solution cost of the fittest

2290: individual rather than the number of generations.  This is only fair

2291: because if good solutions are found very quickly then an equally rapid

2292: decline in diversity is acceptable and to be expected.  Indeed it is

2293: trivial to come up with a system which always maintains high

2294: population diversity however long it runs, but is unlikely to find

2295: any good solutions.  The results were averaged over all 100 problems in

2296: the test set.  Because the best solution found in each run varied we

2297: have only graphed each curve until such a point where fewer than 50\%

2298: of the runs were able to achieve this level of fitness.  Thus the

2299: terminal point at the right of each curve is representative of fairly

2300: typical runs rather than just a few exceptional ones that perhaps

2301: found unusually good solutions by chance.

2302:

2303: The top two graphs in Figure~\ref{cnf-diver} show the total population

2304: diversity.  As expected the diversity with TOUR3-R and TOUR12-R

2305: declines steadily as finding better solutions becomes increasingly

2306: difficult and the population tends to collapse into a narrow band of

2307: fitness.  As we would expect, the total population diversity with

2308: TOUR3-R is higher than with TOUR12-R.  While FUSS-R declines initially

2309: it then stabilizes at around 50 before becoming stuck.  As the TOUR3-R

2310: and TOUR12-R curves both extend further to the right, even though the

2311: total population diversity becomes quite low, this shows that diversity

2312: problems in the population as a whole are not a significant factor

2313: behind the performance problems with FUSS-R.

2314:

2315: The top right graph shows the same selection schemes, but this time

2316: with FUDS.  As expected FUDS has significantly improved the total

2317: population diversity with both TOUR3 and TOUR12, while having little

2318: impact on FUSS which already has a relatively flat population

2319: distribution.  As the best solutions found by TOUR3-F and TOUR12-F

2320: were no better than those found by TOUR3-R and TOUR12-R, this indicates that

2321: improved total population diversity is not a significant factor in the

2322: performance of the EA for this type of optimization problem.  That

2323: FUDS has lifted the total diversity for TOUR3 and TOUR12 so that they

2324: are now above FUSS-F, is particularly interesting.  This suggests that

2325: while FUSS has high total population diversity, there appear to be

2326: some more subtle effects that are causing the diversity to be lower

2327: than it could be.  It may be related to the fact that FUSS sometimes

2328: heavily selects from small groups within the population during the

2329: early stages of the optimization process, as we noted in

2330: Section~\ref{secTSP}.  However we are not certain whether this is

2331: occurring in this case.

2332:

2333: On the lower set of graphs we see the diversity among the fitter

2334: individuals in the population; specifically those whose fitness is no

2335: more than 20 below the fittest individual in the population at the

2336: time.  On the first graph on the left we see that TOUR3 has

2337: significantly greater diversity than TOUR12 with both deletion

2338: schemes.  This is expected as TOUR3 tends to search more evolutionary

2339: paths while TOUR12 just rushes down a few.  Disappointingly FUDS does

2340: not appear to have made much difference to the diversity among these

2341: highly fit individuals, though the curves do flatten out a little as

2342: the diversity drops below 30, so perhaps FUDS is having a slight

2343: impact.

2344:

2345: For both FUSS-R and FUSS-F the diversity among the fit individuals was

2346: poor; indeed it was even worse than TOUR12 for both deletion schemes.

2347: Thus, while the total population diversity with FUSS tends to be high,

2348: the diversity among the fittest individuals in the population can be

2349: quite poor.  Furthermore, the curves for high fitness diversity all

2350: end once the diversity drops into the 12 to 17 range.  As this pattern

2351: was absent from the graphs of total population diversity, this

2352: indicates that it is indeed the diversity among the relatively fit

2353: individuals in the population that most determines when the EA is

2354: going to become stuck.

2355:

2356: In summary, these results show that while FUSS has been successful in

2357: maximizing total population diversity, for problems such as CNF3 SAT

2358: this is not sufficient.  It appears to be more important that the EA

2359: maximizes the diversity among those individuals which have higher

2360: fitness and in this regard FUSS is poor, which leads to poor

2361: performance.  This is most likely a characteristic of optimization

2362: problems which, while still difficult, are not as deceptive as SCP or

2363: random TSP.

2364:

2365:

2366: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

2367: \section{Conclusions and Future Research Directions}\label{secConc}

2368: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

2369:

2370: We have addressed the problem of balancing the selection intensity

2371: in EAs, which determines speed versus quality of a solution. We

2372: invented a new fitness uniform selection scheme FUSS. It generates

2373: a selection pressure toward sparsely populated fitness levels.

2374: This property is unique to FUSS as compared to other selection

2375: schemes (STD).

2376: %

2377: It results in the desired high selection pressure toward higher

2378: fitness if there are only a few fit individuals. The selection

2379: pressure is automatically reduced when the number of fit

2380: individuals increases.

2381: %

2382: We motivated FUSS as a scheme which bounds the number of {\em

2383: similar} individuals in a population. We defined a universal

2384: similarity relation solely depending on the fitness, independent

2385: of the problem structure, representation and EA details.

2386: %

2387: We showed analytically by way of a simple example that FUSS can be

2388: much more effective than STD. A joint pair selection scheme for

2389: recombination has been defined.

2390: %

2391: A heuristic worst case analysis of FUSS compared to STD has been

2392: given. For this, the fitness tree model has been defined, which is an

2393: interesting analytic tool in itself.

2394: %

2395: FUSS solves the problem of population takeover and the resulting

2396: loss of genetic diversity of STD, while still generating enough

2397: selection pressure. It does not help in getting a more uniform

2398: distribution within a fitness level.

2399:

2400: We have also invented a related system called FUDS which achieves a

2401: similar effect to FUSS except that it works through deletion rather

2402: than through selection.  This means that FUDS shares many of the

2403: important characteristics of FUSS including strong total population

2404: diversity and the impossibility of population collapse.  We showed

2405: analytically that for a simple deceptive optimization problem the

2406: performance of STD when used with FUDS scales similarly to FUSS.

2407:

2408: A test system has been constructed and used to evaluate the empirical

2409: performance of both FUSS and FUDS on a range of optimization problems

2410: with different population sizes, mutation probabilities and crossover

2411: probabilities.  Their performance has been compared to the more

2412: standard methods of tournament selection and random deletion.  For the

2413: artificial deceptive 2D optimization problem and random distance

2414: matrix TSP problems both FUSS and FUDS performed extremely well.  For

2415: the deceptive 2D problem they dramatically improved the scaling

2416: exponent in the number of generations needed to find the global

2417: optimum.  For the TSP problems FUSS-R performed as well as optimally

2418: tuned TOURx-R for all population sizes, and FUDS caused TOURx to

2419: perform near optimally for all tournament sizes and population sizes.

2420:

2421: For SCP problems with small populations the performance of FUSS-R was

2422: only better than TOURx-R when the tournament size was poorly set.  For

2423: populations larger than 1,000 however, FUSS-R continued to perform as

2424: well as the optimal results for TOURx-R.  FUDS was again consistently

2425: superior, returning better results than random deletion for every

2426: combination of selection scheme, tournament size and population size

2427: tested.

2428:

2429: For CNF3 SAT problems, however, we ran into difficulties.  While FUDS

2430: significantly improved the performance of FUSS, it was inferior to

2431: random deletion for low selection intensities.  In other cases the

2432: performance was comparable.  FUSS however had serious performance

2433: problems.  Further investigations revealed that this appears to be due

2434: to the small number of individuals in the population that have

2435: relatively high fitness when using FUSS.  We measured the diversity in

2436: the population and found that while the total population diversity

2437: with FUSS was high, the diversity among the fit individuals was

2438: relatively poor.  This produced a serious diversity problem in the

2439: population when combined with the fact that there are relatively few

2440: individuals of high fitness when using FUSS.

2441:

2442: As the performance of TOURx-R was not impacted by high selection

2443: intensity on the CNF3 SAT problem this indicates that this problem

2444: does not have the kind of deceptive nature that harshly punishes

2445: greedy exploration that we were looking for.  Perhaps for such

2446: problems a less extreme approach is called for.  For example, rather

2447: than trying to spread the population across all fitness levels

2448: uniformly we should instead control the distribution so that it is

2449: biased toward high fitness but never collapses totally as it does with

2450: TOURx-R.

2451:

2452: We have experimented with a deletion scheme which deletes the

2453: population distribution down to a convex curve peaked at the fittest

2454: individual in the population.  This is the deletion equivalent of the

2455: scale independent selection scheme described in

2456: Section~\ref{secCross}.  Our results thus far indicate that the

2457: performance is equal to or slightly superior to that of random deletion in all

2458: situations.  However the dramatic improvements that FUDS has over

2459: random deletion in some cases are now less significant.

2460:

2461: Another possibility is to manipulate the fitness function to

2462: effectively achieve the same thing.  For example, we have found that

2463: by taking the fitness to be the reciprocal of the number of

2464: unsatisfied clauses in the CNF3 SAT problem the performance of FUSS

2465: improves significantly; indeed it is then comparable to TOURx.

2466: Perhaps however it would be better to avoid these performance tricks

2467: and instead focus on extremely deceptive problems where high selection

2468: intensity is heavily punished, that is, the kinds of problems that

2469: FUSS and FUDS were specifically designed for.
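
A short calculation illustrates why the reciprocal rescaling mentioned
above might help.  It is only a sketch, under the assumption that FUSS
spreads the population roughly uniformly along the fitness axis, and the
symbols $u$ and $\tilde f$ are introduced here for illustration.  Let
$u(x)\geq 1$ be the number of unsatisfied clauses and
$\tilde f(x)=1/u(x)$ the rescaled fitness.  Neighboring fitness levels
are then separated by
\beqn
  \frac{1}{u}-\frac{1}{u+1} \;=\; \frac{1}{u(u+1)},
\eeqn
which is $1/2$ between $u=1$ and $u=2$ but only $1/2070\approx 0.0005$
between $u=45$ and $u=46$, whereas under the original fitness $645-u$
every gap equals $1$.  If, for example, the population occupied the
levels $u=1,\ldots,100$, the individuals with $u\leq 10$ would cover the
interval $[0.1,1]$, about $91\%$ of the range $[0.01,1]$ of $\tilde f$,
compared with $9/99\approx 9\%$ of the range of the original fitness.  A
scheme that is uniform in $\tilde f$ would therefore devote most of the
population to nearly optimal individuals.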

2470:

2471: %------------------------------%

2472: \subsection{Acknowledgments}

2473: %------------------------------%

2474: This work was supported by SNF grants 2100-67712.02 and 200020-107616.

2475:

2476: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

2477: %         Bibliography        %

2478: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

2479:

2480: \begin{small}

2481: \begin{thebibliography}{XXXX}

2482:

2483: \bibitem[ACR00]{Applegate:00}

2484: D.~Applegate, W.~Cook, and A.~Rohe.

2485: \newblock Chained {L}in-{K}ernighan for large traveling salesman problems.

2486: \newblock Technical report, Department of Computational and Applied

2487:   Mathematics, Rice University, Houston, TX, 2000.

2488:

2489: \bibitem[Bak85]{Baker:85}

2490: J.~E. Baker.

2491: \newblock Adaptive selection methods for genetic algorithms.

2492: \newblock In {\em Proc. 1st International Conference on Genetic Algorithms and

2493:   their Applications}, pages 101--111, Pittsburgh, PA, 1985. Lawrence Erlbaum

2494:   Associates.

2495:

2496: \bibitem[BC96]{Beasley:96}

2497: J.~Beasley and P.~Chu.

2498: \newblock A genetic algorithm for the set covering problem.

2499: \newblock {\em European Journal of Operational Research}, 94:392--404, 1996.

2500:

2501: \bibitem[Bea03]{Beasley:03}

2502: J.~Beasley.

2503: \newblock {OR}-library.

2504: \newblock {\em mscmga.ms.ic.ac.uk/jeb/orlib/scpinfo.html}, 2003.

2505:

2506: \bibitem[BHS91]{Baeck:91}

2507: T.~B{\"a}ck, F.~Hoffmeister, and H.~P. Schwefel.

2508: \newblock A survey of evolution strategies.

2509: \newblock In {\em Proc. 4th International Conference on Genetic Algorithms},

2510:   pages 2--9, San Diego, CA, July 1991. Morgan Kaufmann.

2511:

2512: \bibitem[BT95]{Blickle:95a}

2513: T.~Blickle and L.~Thiele.

2514: \newblock A mathematical analysis of tournament selection.

2515: \newblock In {\em Proc. Sixth International Conference on Genetic Algorithms

2516:   ({ICGA}'95)}, pages 9--16, San Francisco, California, 1995. Morgan Kaufmann

2517:   Publishers.

2518:

2519: \bibitem[BT97]{Blickle:97}

2520: T.~Blickle and L.~Thiele.

2521: \newblock A comparison of selection schemes used in evolutionary algorithms.

2522: \newblock {\em Evolutionary {C}omputation}, 4(4):361--394, 1997.

2523:

2524: \bibitem[Cav70]{Cavicchio:70}

2525: D.~J. Cavicchio.

2526: \newblock {\em Adaptive search using simulated evolution}.

2527: \newblock PhD thesis, University of

2528:   Michigan, Ann Arbor, 1970.

2529:

2530: \bibitem[CJ91]{Collins:91}

2531: R.~J. Collins and D.~R. Jefferson.

2532: \newblock Selection in massively parallel genetic algorithms.

2533: \newblock In {\em Proc. Fourth International Conference on Genetic Algorithms},

2534:   San Mateo, CA, 1991. Morgan Kaufmann Publishers.

2535:

2536: \bibitem[CK03]{Crescenzi:04}

2537: P.~Crescenzi and V.~Kann.

2538: \newblock A compendium of {NP} optimization problems.

2539: \newblock {\em www.nada.kth.se/$\sim$viggo/problemlist/compendium.html}, 2003.

2540:

2541: \bibitem[EHM99]{Eiben:99}

2542: A.~E. Eiben, R.~Hinterding, and Z.~Michalewicz.

2543: \newblock Parameter control in evolutionary algorithms.

2544: \newblock {\em IEEE Transactions on Evolutionary Computation}, 3(2):124--141,

2545:   1999.

2546:

2547: \bibitem[Esh91]{Eshelman:91}

2548: L.~J. Eshelman.

2549: \newblock The {CHC} adaptive search algorithm: How to have safe search when engaging

2550:   in nontraditional genetic recombination.

2551: \newblock In G.~J.~E. Rawlins, editor, {\em Foundations of genetic

2552:   algorithms}, pages 265--283. Morgan Kaufmann, San Mateo, 1991.

2553:

2554: \bibitem[FM93]{Forrest:93}

2555: S.~Forrest and M.~Mitchell.

2556: \newblock What makes a problem hard for a genetic algorithm? {S}ome anomalous

2557:   results and their explanation.

2558: \newblock {\em Machine Learning}, 13(2--3):285--319, 1993.

2559:

2560: \bibitem[GA85]{Goldberg:85}

2561: D.~Goldberg and R.~Lingle.

2562: \newblock Alleles, loci, and the traveling salesman problem.

2563: \newblock In {\em Proc. International Conference on Genetic Algorithms and

2564:   their Applications}, pages 154--159. Lawrence Erlbaum Associates, 1985.

2565:

2566: \bibitem[GD91]{Goldberg:91}

2567: D.~E. Goldberg and K.~Deb.

2568: \newblock A comparative analysis of selection schemes used in genetic

2569:   algorithms.

2570: \newblock In G.~J.~E. Rawlins, editor, {\em Foundations of genetic

2571:   algorithms}, pages 69--93. Morgan Kaufmann, San Mateo, 1991.

2572:

2573: \bibitem[Gol89]{Goldberg:89}

2574: D.~E. Goldberg.

2575: \newblock {\em Genetic Algorithms in Search, Optimization, and Machine

2576:   Learning}.

2577: \newblock Addison-Wesley, Reading, Mass., 1989.

2578:

2579: \bibitem[GR87]{Goldberg:87}

2580: D.~E. Goldberg and J.~Richardson.

2581: \newblock Genetic algorithms with sharing for multi-modal function

2582:   optimization.

2583: \newblock In {\em Proc. 2nd International Conference on Genetic Algorithms and

2584:   their Applications}, pages 41--49, Cambridge, MA, July 1987. Lawrence Erlbaum

2585:   Associates.

2586:

2587: \bibitem[Her92]{Herdy:92}

2588: M.~Herdy.

2589: \newblock Reproductive isolation as strategy parameter in hierarchically

2590:   organized evolution strategies.

2591: \newblock In {\em Parallel problem solving from nature 2}, pages 207--217,

2592:   Amsterdam, 1992. North-Holland.

2593:

2594: \bibitem[Hol75]{Holland:75}

2595: John~H. Holland.

2596: \newblock {\em Adaptation in Natural and Artificial Systems}.

2597: \newblock University of Michigan Press, Ann Arbor, MI, 1975.

2598:

2599: \bibitem[HS00]{Hoos:00}

2600: H.~H. Hoos and T.~St{\"u}tzle.

2601: \newblock {SATLIB: An Online Resource for Research on {SAT}}.

2602: \newblock In {\em SAT 2000}, pages 283--292. IOS press, 2000.

2603:

2604: \bibitem[Hut91]{Hutter:92cfs}

2605: M.~Hutter.

2606: \newblock {I}mplementierung eines {K}lassifizierungs-{S}ystems.

2607: \newblock Master's thesis, Theoretische Informatik, TU M{\"u}nchen, 1991.

2608: \newblock 72 pages with C listing, in German,

2609:   http://www.idsia.ch/$\sim$marcus/ai/pcfs.htm.

2610:

2611: \bibitem[Hut02]{Hutter:01fuss}

2612: M.~Hutter.

2613: \newblock Fitness uniform selection to preserve genetic diversity.

2614: \newblock In {\em Proc. 2002 Congress on Evolutionary Computation (CEC-2002)},

2615:   pages 783--788, Washington D.C, USA, May 2002. IEEE.

2616:

2617: \bibitem[JM97]{Johnson:97}

2618: D.~S. Johnson and A.~McGeoch.

2619: \newblock The traveling salesman problem: {A} case study.

2620: \newblock In E.~H.~L. Aarts and J.~K. Lenstra, editors, {\em Local Search in

2621:   Combinatorial Optimization}, Discrete Mathematics and Optimization,

2622:   chapter~8, pages 215--310. Wiley-Interscience, Chichester, England, 1997.

2623:

2624: \bibitem[Jon75]{DeJong:75}

2625: {K. de} Jong.

2626: \newblock An analysis of the behavior of a class of genetic adaptive systems.

2627: \newblock {\em Dissertation Abstracts International}, 36(10), 5140B, 1975.

2628:

2629: \bibitem[Leg04]{Legg:website}

2630: S.~Legg.

2631: \newblock Website.

2632: \newblock {\em www.idsia.ch/$\sim$shane}, 2004.

2633:

2634: \bibitem[LH05]{Legg:05fuds}

2635: S.~Legg and M.~Hutter.

2636: \newblock Fitness uniform deletion for robust optimization.

2637: \newblock In {\em Proc. Genetic and Evolutionary Computation Conference

2638:   ({GECCO'05})}, pages 1271--1278, Washington, DC, 2005. ACM SigEvo.

2639:

2640: \bibitem[LHK04]{Legg:04fussexp}

2641: S.~Legg, M.~Hutter, and A.~Kumar.

2642: \newblock Tournament versus fitness uniform selection.

2643: \newblock In {\em Proc. 2004 Congress on Evolutionary Computation ({CEC'04})},

2644:   pages 2144--2151, Portland, OR, 2004. IEEE.

2645:

2646: \bibitem[LK73]{Lin:73}

2647: S.~Lin and B.~W. Kernighan.

2648: \newblock An effective heuristic for the travelling salesman problem.

2649: \newblock {\em Operations Research}, 21:498--516, 1973.

2650:

2651: \bibitem[MO96]{Martin:96}

2652: O.~Martin and S.~Otto.

2653: \newblock Combining simulated annealing with local search heuristics.

2654: \newblock {\em Annals of Operations Research}, 63:57--75, 1996.

2655:

2656: \bibitem[MSV94]{Muehlenbein:94}

2657: Heinz M{\"u}hlenbein and Dirk Schlierkamp-Voosen.

2658: \newblock The science of breeding and its application to the breeder genetic

2659:   algorithm ({BGA}).

2660: \newblock {\em Evolutionary {C}omputation}, 1(4):335--360, 1994.

2661:

2662: \bibitem[MT93]{Maza:93}

2663: {M. de la} Maza and B.~Tidor.

2664: \newblock An analysis of selection procedures with particular attention paid to

2665:   proportional and {B}oltzmann selection.

2666: \newblock In {\em Proc. 5th International Conference on Genetic Algorithms},

2667:   pages 124--131, San Mateo, CA, USA, 1993. Morgan Kaufmann.

2668:

2669: \bibitem[RPB99a]{Rogers:99gd}

2670: A.~Rogers and A.~Pr{\"u}gel-Bennett.

2671: \newblock Genetic drift in genetic algorithm selection schemes.

2672: \newblock {\em IEEE Transactions on Evolutionary Computation}, 3(4):298--303,

2673:   1999.

2674:

2675: \bibitem[RPB99b]{Rogers:99}

2676: A.~Rogers and A.~Pr{\"u}gel-Bennett.

2677: \newblock Modelling the dynamics of a steady-state genetic algorithm.

2678: \newblock In Wolfgang Banzhaf and Colin Reeves, editors, {\em Foundations of

2679:   {G}enetic {A}lgorithms 5}, pages 57--68. Morgan Kaufmann, San Francisco, CA,

2680:   1999.

2681:

2682: \bibitem[Rud00]{Rudolph:00}

2683: G{\"u}nter Rudolph.

2684: \newblock On takeover times in spatially structured populations: array and

2685:   ring.

2686: \newblock In K.~K. Lai, O.~Katai, M.~Gen, and B.~Lin, editors, {\em Proceedings

2687:   of the Second Asia-Pacific Conference on Genetic Algorithms and Applications

2688:   ({APGA}'00)}, pages 144--151, Hong Kong, PR China, 2000. Global-Link

2689:   Publishing Company.

2690:

2691: \bibitem[SVM94]{Schlierkamp:94}

2692: D.~Schlierkamp-Voosen and H.~M{\"u}hlenbein.

2693: \newblock Strategy adaptation by competing subpopulations.

2694: \newblock In {\em {P}arallel {P}roblem {S}olving from {N}ature -- {PPSN III}},

2695:   pages 199--208, Berlin, 1994. Springer.

2696: \newblock {L}ecture {N}otes in {C}omputer {S}cience 866.

2697:

2698: \bibitem[Whi89]{Whitley:89}

2699: D.~Whitley.

2700: \newblock The {G}{E}{N}{I}{T}{O}{R} algorithm and selection pressure: Why

2701:   rank-based allocation of reproductive trials is best.

2702: \newblock In {\em Proc. Third International Conference on Genetic Algorithms

2703:   ({ICGA}'89)}, pages 116--123, San Mateo, California, 1989. Morgan Kaufmann

2704:   Publishers, Inc.

2705:

2706: \end{thebibliography}

2707: \end{small}

2708:

2709: \vspace{5ex}

2710: \begin{wrapfigure}[9]{l}{0.3\columnwidth}

2711: \vspace{-2.5ex}\includegraphics[width=0.3\columnwidth]{hutter.eps}

2712: \end{wrapfigure}

2713: \noindent{\bf Marcus Hutter}

2714: received the M.Sc.\ degree in computer science and the

2715: Ph.D.\ degree in theoretical particle physics from the (Technical)

2716: University, Munich, Germany, in 1992 and 1995, respectively.

2717:

2718: Thereafter, he developed algorithms in a medical software company

2719: for five years. Since 2000, he has published over 35 research papers

2720: while a Researcher with the Dalle Molle Institute for Artificial

2721: Intelligence (IDSIA), Lugano, Switzerland. He is the

2722: author of {\em Universal Artificial Intelligence} (EATCS: Springer,

2723: 2004). His current interests are centered around reinforcement

2724: learning, algorithmic information theory and statistics, universal

2725: induction schemes, adaptive control theory, and related areas.

2726:

2727: \vspace{5ex}

2728: \begin{wrapfigure}[9]{l}{0.3\columnwidth}

2729: \vspace{-2.5ex}\includegraphics[width=0.3\columnwidth]{legg.eps}

2730: \end{wrapfigure}

2731: \noindent{\bf Shane Legg}

2732: received the B.C.M.S.\ degree in mathematical and computer sciences

2733: from the University of Waikato, Hamilton, New Zealand, in 1996 and

2734: the M.Sc.\ degree in mathematics from Auckland University, Auckland,

2735: New Zealand, in 1997. He is currently working towards the Ph.D.\

2736: degree at the Dalle Molle Institute for Artificial Intelligence

2737: (IDSIA), Lugano, Switzerland.

2738:

2739: After receiving the M.Sc.\ degree in 1997, he then worked in a

2740: number of companies in New Zealand and the United States mainly

2741: focusing on commercial applications of artificial intelligence. His

2742: research is focused on genetic algorithms, complexity theory, and

2743: theoretical models of artificial intelligence.

2744:

2745: \end{document}

2746: