1:
2: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
3: %% Fitness Uniform Optimization %%
4: %% Marcus Hutter & Shane Legg: (2000-2005) %%
5: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
6:
7: \documentclass[twoside,twocolumn,10pt]{article}
8: \usepackage{graphicx,wrapfig}
9:
10: \topmargin=-15mm \oddsidemargin=-10mm \evensidemargin=-10mm
11: \textwidth=18cm \textheight=25cm
12: \sloppy\lineskip=0pt
13: \pagestyle{myheadings}
14: \markboth{\sc Marcus Hutter \& Shane Legg % \hfil IDSIA-16-06
15: }{\sc Fitness Uniform Optimization}
16:
17: %-------------------------------%
18: % My Math-Environments %
19: %-------------------------------%
20:
21: \def\,{\mskip 3mu} \def\>{\mskip 4mu plus 2mu minus 4mu} \def\;{\mskip 5mu plus 5mu} \def\!{\mskip-3mu}
22: \def\dispmuskip{\thinmuskip= 3mu plus 0mu minus 2mu \medmuskip= 4mu plus 2mu minus 2mu \thickmuskip=5mu plus 5mu minus 2mu}
23: \def\textmuskip{\thinmuskip= 0mu \medmuskip= 1mu plus 1mu minus 1mu \thickmuskip=2mu plus 3mu minus 1mu}
24: \textmuskip
25: \def\beq{\dispmuskip\begin{equation}} \def\eeq{\end{equation}\textmuskip}
26: \def\beqn{\dispmuskip\begin{displaymath}}\def\eeqn{\end{displaymath}\textmuskip}
27: \def\bqa{\dispmuskip\begin{eqnarray}} \def\eqa{\end{eqnarray}\textmuskip}
28: \def\bqan{\dispmuskip\begin{eqnarray*}} \def\eqan{\end{eqnarray*}\textmuskip}
29:
30: %-------------------------------%
31: % Macro-Definitions %
32: %-------------------------------%
33:
34: \newenvironment{keywords}{\vskip 3ex\noindent {\bf\Large Keywords}\vskip 2ex\noindent}{\par\vskip 1ex}
35: %\def\subsection#1{\paragraph{#1.}}
36: \def\subsection#1{\vspace{1ex plus 0.5ex minus 0.5ex}\noindent{\bfseries\boldmath{#1.}}}
37:
38: \def\toinfty#1{\stackrel{#1\to\infty}{\longrightarrow}}
39: \def\nq{\hspace{-1em}}
40: \def\qed{\hspace*{\fill}$\Box\quad$\\}
41: \def\odt{{\textstyle{1\over 2}}}
42: \def\eps{\varepsilon}
43: \def\v#1{{\bf #1}}
44: \def\approxleq{\mbox{\raisebox{-0.8ex}{$\stackrel{\displaystyle<}\sim$}}} %% make nicer
45: \def\approxgeq{\mbox{\raisebox{-0.8ex}{$\stackrel{\displaystyle>}\sim$}}} %% make nicer
46: \def\SetR{{I\!\!R}}
47:
48: \begin{document}
49: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
50: % T i t l e - P a g e %
51: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
52:
53: \title{\vskip -10mm\normalsize\sc Technical Report \hfill IDSIA-16-06
54: \vskip 2mm\bf\huge\hrule height5pt \vskip 6mm
55: Fitness Uniform Optimization
56: \vskip 6mm \hrule height2pt \vskip 5mm}
57: \author{
58: {\bf Marcus Hutter}\\[2mm]
59: IDSIA, Galleria 2, CH-6928\\ Manno-Lugano, Switzerland\\
60: marcus@idsia.ch
61: \and
62: {\bf Shane Legg}\\[2mm]
63: IDSIA, Galleria 2, CH-6928\\ Manno-Lugano, Switzerland\\
64: shane@idsia.ch}
65:
66: \maketitle
67:
68: \begin{abstract}
69: In evolutionary algorithms, the fitness of a population increases with
70: time by mutating and recombining individuals and by a biased selection
71: of more fit individuals. The right selection pressure is critical in
72: ensuring sufficient optimization progress on the one hand and in
73: preserving genetic diversity to be able to escape from local optima on
74: the other hand. Motivated by a universal similarity relation on the
75: individuals, we propose a new selection scheme, which is uniform in
76: the fitness values. It generates selection pressure toward sparsely
77: populated fitness regions, not necessarily toward higher fitness, as
78: is the case for all other selection schemes. We show analytically on a
79: simple example that the new selection scheme can be much more
80: effective than standard selection schemes. We also propose a new
81: deletion scheme which achieves a similar result via deletion and show
82: how such a scheme preserves genetic diversity more effectively than
83: standard approaches. We compare the performance of the new schemes to
84: tournament selection and random deletion on an artificial deceptive
85: problem and a range of NP-hard problems: traveling salesman, set
86: covering and satisfiability.
87: \end{abstract}
88:
89: \begin{keywords}
90: Evolutionary algorithms, fitness uniform selection scheme, fitness
91: uniform deletion scheme, preserve diversity, local optima, evolution,
92: universal similarity relation, correlated recombination, fitness tree
93: model, traveling salesman, set covering, satisfiability
94: \end{keywords}
95:
96: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
97: \section{Introduction}\label{secInt}
98: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
99:
100: %------------------------------%
101: \subsection{Evolutionary algorithms (EA)}
102: %------------------------------%
103: Evolutionary algorithms are capable of solving complicated
104: optimization tasks in which an objective function $f:I\to\SetR$ is to
105: be maximized. Here $i\in I$ is an individual from the set $I$ of feasible
106: solutions. Infeasible solutions due to constraints may also be
107: considered by reducing $f$ for each violated constraint. A population
108: $P$ is a multi-set of individuals from $I$ which is maintained and
109: updated as follows: one or more individuals are selected according to
110: some selection strategy.
111: %
112: In generation based EAs, the selected individuals are recombined
113: (e.g.\ crossover) and mutated, and constitute the new population.
114: We prefer the more incremental, steady-state population update,
115: which selects (and possibly deletes) only one or two individuals from
116: the current population and adds the newly recombined and mutated
117: individuals to it.
118: %
119: We are interested in finding a single individual of maximal objective value
120: $f$ for difficult multi-modal and deceptive problems.
121:
122: %------------------------------%
123: \subsection{Standard selection schemes (STD)}
124: %------------------------------%
125: The standard selection schemes (abbreviated by STD in the
126: following), proportionate, truncation, ranking and tournament
127: selection, all favor individuals of higher fitness
128: \cite{Goldberg:89,Goldberg:91,Blickle:95a,Blickle:97}. This is
129: also true for less common schemes, like Boltzmann selection
130: \cite{Maza:93}. The
131: fitness function is identified with the objective
132: function (possibly after a monotone transformation).
133: In linear proportionate selection the probability
134: of selecting an individual depends linearly on its fitness
135: \cite{Holland:75}. In truncation selection the $\alpha\%$ fittest
136: individuals are selected, usually with multiplicity
137: ${1\over\alpha\%}$ to keep the population size fixed
138: \cite{Muehlenbein:94}. (Linear) ranking selection orders the
139: individuals according to their fitness. The selection probability
140: is, then, a (linear) function of the rank \cite{Whitley:89}.
141: Tournament selection \cite{Baker:85}, which selects the best $l$
142: out of $k$ individuals, was primarily developed for steady-state
143: EAs, but can be adapted to generation based EAs. All these
144: selection schemes have the property (and goal!) to increase the
145: average fitness of a population, i.e.\ to evolve the population
146: toward higher fitness.
147:
148: %------------------------------%
149: \subsection{The problem of the right selection pressure}
150: %------------------------------%
151: The standard selection schemes STD, together with mutation and
152: recombination, evolve the population toward higher fitness. If the
153: selection pressure is too high, the EA gets stuck in a local optimum,
154: since the genetic diversity rapidly decreases. The suboptimal genetic
155: material which might help in finding the global optimum is deleted too
156: rapidly (premature convergence). On the other hand, the selection
157: pressure cannot be chosen arbitrarily low if we want the EA to be
158: effective. In difficult optimization problems, suitable population
159: sizes, mutation and recombination rates, and selection parameters,
160: which influence the selection intensity, are usually not known
161: beforehand. Often, constant values are not sufficient at all
162: \cite{Eiben:99}. There are various suggestions to dynamically
163: determine and adapt the parameters
164: \cite{Eshelman:91,Baeck:91,Herdy:92,Schlierkamp:94}.
165: Other approaches to preserve genetic diversity are fitness sharing
166: \cite{Goldberg:87} and crowding \cite{DeJong:75}.
167: They depend on the proper design of a neighborhood function based on
168: the specific problem structure and/or coding. One approach which does
169: not require a neighborhood function based on the genome is local
170: mating \cite{Collins:91}; however, it has been shown that rapid
171: takeover can still occur for basic spatial topologies
172: \cite{Rudolph:00}. Another approach which has not been widely studied
173: is preselection \cite{Cavicchio:70}.
174:
175: We are interested in evolutionary algorithms which do not require
176: special problem insight (problem specific neighborhood function and/or
177: coding) and are able to effectively prevent population takeover. In
178: this paper we introduce and analyze two potential approaches to this
179: problem: the Fitness Uniform Selection Scheme (FUSS) and the Fitness
180: Uniform Deletion Scheme (FUDS).
181:
182: %------------------------------%
183: \subsection{The fitness uniform selection scheme}
184: %------------------------------%
185: FUSS is based on the insight that we are not primarily interested in a
186: population converging to maximal fitness, but only in a single
187: individual of maximal fitness. The scheme automatically creates a
188: suitable selection pressure and preserves genetic diversity better
189: than STD. The proposed fitness uniform selection scheme FUSS (see also
190: Figure \ref{figsel}) is defined as follows: {\em if the lowest/highest
191: fitness values in the current population $P$ are $f_{min/max}$ we
192: select a fitness value $f$ uniformly in the interval
193: $[f_{min},f_{max}]$. Then, the individual $i\in P$ with fitness
194: nearest to $f$ is selected and a copy is added to $P$, possibly after
195: mutation and recombination.} We will see that FUSS maintains genetic
196: diversity better than STD, since a distribution over the fitness
197: values is used, unlike STD, which all use a distribution over
198: individuals. Premature convergence is avoided in FUSS by abandoning
199: convergence altogether. Nevertheless, there is a selection pressure in FUSS
200: toward higher fitness.
201: %
202: The probability of selecting a specific individual is proportional
203: to the distance to its nearest fitness neighbor. In a
204: population with a high density of unfit and low density of fit
205: individuals, the fitter ones are effectively favored.
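
To make this concrete, consider a small illustrative example (the numbers
are chosen here for illustration only and assume distinct fitness
values): a population with fitness values $0$, $1$, $2$ and $10$, so that
$[f_{min},f_{max}]=[0,10]$. An individual is selected whenever the
uniformly drawn $f$ lies closer to its fitness than to any other, i.e.\
within the interval bounded by the midpoints to its lower and upper
fitness neighbors. The selection probabilities are therefore
\bqan
  p(0)  &=& (0.5-0)/10 \;=\; 0.05, \\
  p(1)  &=& (1.5-0.5)/10 \;=\; 0.1, \\
  p(2)  &=& (6-1.5)/10 \;=\; 0.45, \\
  p(10) &=& (10-6)/10 \;=\; 0.4.
\eqan
The individuals with fitness $2$ and $10$, which border the sparsely
populated region, are together selected with probability $0.85$, compared
with $0.5$ under uniform selection over individuals.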
206:
207: %------------------------------%
208: \subsection{The fitness uniform deletion scheme}
209: %------------------------------%
210: We may also preserve diversity through deletion rather than through
211: selection. By always deleting from those individuals which have very
212: commonly occurring fitness values we achieve a population which is
213: uniformly distributed across fitness values, as with FUSS. Because
214: these deleted individuals are ``commonly occurring'' in some sense
215: this should help preserve population diversity. Under FUDS the role
216: of the selection scheme is to govern how actively different parts of
217: the solution space are searched rather than to move the population as
218: a whole toward higher fitness. Thus, as with FUSS, premature
219: convergence is avoided by abandoning convergence as our goal. However,
220: as FUDS is only a deletion scheme, the EA still requires a selection
221: scheme which may require a selection intensity parameter to be set.
222: Thus we do not necessarily have a parameterless EA, as we do with
223: FUSS. Nevertheless, due to the impossibility of population collapse,
224: the performance is more robust than usual with respect to variation in
225: selection intensity. Thus FUDS is at least a partial solution to the
226: problem of having to correctly set a selection intensity parameter.
227:
228: %------------------------------%
229: \subsection{Contents}
230: %------------------------------%
231: This paper extends and supersedes the earlier results reported in the
232: conference papers \cite{Hutter:01fuss}, \cite{Legg:04fussexp} and
233: \cite{Legg:05fuds}.
234: Among other things, this paper: extends the previous theoretical
235: analysis of FUSS and gives the first theoretical analysis of FUDS and
236: of their performance when combined; presents a new method of analysis
237: called fitness tree analysis; presents the first set of experimental results
238: which directly compare the two proposed schemes on the same problems
239: with the same parameters, including when they are used together; gives
240: the first full analysis of population diversity measurements for FUSS
241: and in particular extends and corrects some of the earlier speculation
242: about performance problems in some situations.
243:
244: The paper is structured as follows:
245:
246: In {\em Section \ref{secSim}} we discuss the problems of local
247: optima and population takeover \cite{Goldberg:91} in STD, which
248: could be mitigated by restricting the number of {\em similar}
249: individuals in a population. As we often do not have an
250: appropriate functional similarity relation, we define a universal
251: distance (semi-metric) $d(i,j):=|f(i)-f(j)|$ based on the
252: available fitness only, which will serve our needs.
253:
254: Motivated by the universal similarity relation $d$ and by the need to
255: preserve genetic diversity, we define in {\em Section
256: \ref{secFuss}} the fitness uniform selection scheme. We
257: discuss under which circumstances FUSS leads to an (approximate)
258: fitness uniform population.
259:
260: Further properties of FUSS are discussed in \emph{Section
261: \ref{secProp}}, especially, how FUSS creates selection pressure
262: toward higher fitness and how it preserves diversity better than
263: STD. Further topics are the equilibrium distribution and the
264: transformation properties of FUSS under linear and non-linear
265: transformations of $f$.
266:
267: Another way to utilize the ability of the universal similarity
268: relation $d$ to preserve diversity is to use it to help target
269: deletion. This gives us the fitness uniform {\em deletion} scheme
270: which we define in {\em Section \ref{secFUDS}}. As this produces a
271: population which is approximately uniformly distributed across fitness
272: levels, as with FUSS, many of the properties of FUSS carry over to
273: an EA using FUDS. Some of these properties are highlighted in {\em
274: Section
275: \ref{secPropFUDS}}.
276:
277: In {\em Section \ref{secEx}} we theoretically demonstrate, by way of a
278: simple optimization example, that an EA with FUSS or FUDS can optimize
279: much faster than with STD. We show that crossover can be effective in
280: FUSS, even when ineffective in STD. Furthermore, FUSS, FUDS and STD
281: are compared to random search with and without crossover.
282:
283: In {\em Section \ref{secTree}} we develop a fitness tree model, which
284: we believe to cover the essential features of fitness landscapes for
285: difficult problems with many local optima. Within this model we derive
286: heuristic expressions for the optimization time of random walk, FUSS,
287: FUDS and STD. They are compared, and a worst case slowdown of FUSS
288: relative to STD is obtained.
289:
290: There is a possible additional slowdown when including recombination,
291: as discussed in {\em Section \ref{secCross}}, which can be avoided by
292: using a scale independent pair selection. It is a ``best'' compromise
293: between unrestricted recombination and recombination of $d$-similar
294: individuals only. It also has other interesting properties when used
295: without crossover.
296:
297: To simplify the discussion we have concentrated on the case of
298: discrete, equi-spaced fitness values. In many practical problems, the
299: fitness function is continuously valued. FUSS and some of the
300: discussion of the previous sections are generalized to the continuous
301: case in {\em Section~\ref{secCont}}.
302:
303:
304:
305: \emph{Section~\ref{secJfuss}} begins our experimental analysis of
306: FUSS and FUDS. In this section we give a detailed account of the EA
307: software we have used for our experiments, including links to where
308: the source code can be downloaded.
309:
310: \emph{Section \ref{secEx2}} examines the empirical performance of
311: FUSS and FUDS on the artificially constructed deceptive optimization
312: problem described in Section~\ref{secEx}. These results confirm the
313: correctness of our theoretical analysis.
314:
315: In \emph{Section \ref{secTSP}} we test randomly generated traveling
316: salesman problems.
317:
318: In \emph{Section \ref{secSetCover}} we examine the set covering
319: problem, an NP-hard optimization problem which has many real world
320: applications.
321:
322: For our final test in \emph{Section \ref{secSAT}} we look at random
323: CNF3 SAT problems. These are also NP-hard optimization problems.
324:
325: \emph{Section \ref{secConc}} contains a summary of our results and
326: possible avenues for future research.
327:
328:
329: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
330: \section{Universal Similarity Relation}\label{secSim}
331: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
332:
333: %------------------------------%
334: \subsection{The problem of local optima}
335: %------------------------------%
336: Proportionate, truncation, ranking and tournament are the standard
337: (STD) selection algorithms used in evolutionary optimization. They
338: have the following property: if a local optimum $i^{lopt}$ has been
339: found, the number of individuals with fitness $f^{lopt}=f(i^{lopt})$
340: tends to increase rapidly. Assume a low mutation and recombination
341: rate, or, for instance, truncation selection {\em after} mutation and
342: recombination. Further, assume that it is very difficult to find an
343: individual fitter than $i^{lopt}$. The population will then degenerate
344: and will consist mostly of $i^{lopt}$ after a few rounds. This
345: decreased diversity makes it even less likely that $f^{lopt}$ gets
346: improved. The suboptimal genetic material which might help in finding
347: the global optimum has been deleted too rapidly. On the other hand,
348: too high mutation and recombination rates convert the EA into an
349: inefficient random search.
350:
351: %------------------------------%
352: \subsection{Possible solution}
353: %------------------------------%
354: Sometimes it is possible to appropriately choose the mutation and
355: recombination rate and population size by some insight into the nature
356: of the problem. More often this is a trial and error process, or no
357: single fixed rate works at all.
358:
359: A naive fix of the problem is to artificially limit the number of
360: identical individuals to a significant but small fraction $\eps$.
361: If the space of individuals $I$ is large, there could be many very
362: similar (but not identical) individuals of, for instance, fitness
363: $f^{lopt}$. The EA can still converge to a population containing
364: only this class of similar individuals, with all others becoming
365: extinct. In order for the limitation approach to work, one has to
366: restrict the number of {\em similar} individuals. Significant
367: contributions in this direction are fitness sharing
368: \cite{Goldberg:87} and crowding \cite{DeJong:75}.
369:
370: %------------------------------%
371: \subsection{The problem of finding a similarity relation}
372: %------------------------------%
373: If the individuals are binary coded, one might use the Hamming distance
374: as a similarity relation. This distance is consistent with a mutation
375: operator which flips a few bits. It produces Hamming-similar
376: individuals, but recombination (like crossover) can produce very
377: dissimilar individuals w.r.t.\ this measure. In any case, genotypic
378: similarity relations, like the Hamming distance, depend on the
379: representation of the individuals as binary strings. Individuals with
380: very dissimilar genomes might actually be functionally
381: (phenotypically) very similar. For instance, when most bits are unused
382: (like introns in genetic programming), they can be randomly disturbed
383: without affecting the properties of the individual. For specific
384: problems at hand, it might be possible to find suitable
385: representation-independent functional similarity relations. On the
386: other hand, in genetic programming, for instance, it is in general
387: undecidable whether two individuals are functionally similar.
388:
389: %------------------------------%
390: \subsection{A universal similarity relation}
391: %------------------------------%
392: Here we want to take a different approach. We define the
393: difference or distance between two individuals as
394: \beqn
395: d(i,j) \;:=\; |f(i)-f(j)|.
396: \eeqn
397: The distance is based solely on the fitness function, which is
398: provided as part of the problem specification.
399: %
400: It is independent of the coding/representation and other problem
401: details, and of the optimization algorithm (e.g.\ the genetic mutation
402: and recombination operators), and can trivially be computed from
403: the fitness values.
404: %
405: If we make the natural assumption that functionally similar
406: individuals have similar fitness, they are also similar w.r.t.\ the
407: distance $d$. On the other hand, individuals with very different
408: coding, and even functionally dissimilar individuals may be
409: $d$-similar, but we will see that this is acceptable. For instance,
410: individuals from different local optima of equal height are
411: $d$-similar.
412:
413:
414: %------------------------------%
415: \subsection{Relation to niching and crowding}
416: %------------------------------%
417: Unlike fitness uniform optimization, diversity control methods like
418: niching or crowding require a metric $g$ to be defined over the genome
419: space. By looking at the relationship between $g$ and $f$ we can
420: relate these two types of diversity control: We say that a fitness
421: function $f$ is \emph{smooth} with respect to $g$, if $g( i, j )$
422: being small implies that $|f(i) - f(j)|$ is also small, that is, $d(
423: i, j )$ is small. This implies that if $d( i, j )$ is not small, $g(
424: i, j )$ also cannot be small. Thus, if we limit the number of $d$
425: similar individuals, as we do in fitness uniform optimization, this
426: will also limit the number of $g$ similar individuals, as is done in
427: crowding and niching methods. The advantage of fitness uniform
428: optimization is that we do not need to know what $g$ is, or to compute
429: its value. Indeed, the above argument is true for \emph{any} metric
430: $g$ on the genome space that $f$ is smooth with respect to.
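
One way to make this argument precise (an illustrative assumption added
here; the informal notion of smoothness above does not require it) is a
Lipschitz condition with some constant $c>0$:
\beqn
  d(i,j) \;=\; |f(i)-f(j)| \;\leq\; c\cdot g(i,j)
  \quad\mbox{for all}\quad i,j\in I.
\eeqn
Then, for any threshold $\delta>0$, $g(i,j)\leq\delta/c$ implies
$d(i,j)\leq\delta$, so for every individual $i$
\beqn
  \{j\in P : g(i,j)\leq\delta/c\} \;\subseteq\;
  \{j\in P : d(i,j)\leq\delta\}.
\eeqn
Any bound on the number of individuals within $d$-distance $\delta$ of
$i$ is hence also a bound on the number of individuals within
$g$-distance $\delta/c$ of $i$.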
431:
432: On the other hand, if the fitness function $f$ is not generally smooth
433: with respect to $g$, then such a comparison between the methods cannot
434: be made. However, in this case an EA is less likely to be effective
435: as small mutations in genome space with respect to $g$ will produce
436: unpredictable changes in fitness.
437:
438: %------------------------------%
439: \subsection{Topologies on individual space $I$}
440: %------------------------------%
441: The distance $d:I \times I \to \SetR_0^+$ induced by the fitness
442: function $f$ is a semi-metric on the individual space $I$ (semi only
443: because $d(i,j)=0$ for $i\neq j$ is possible). The semi-metric induces
444: a topology on $I$. Equal fitness suffices to declare two individuals
445: as $d$-equivalent, i.e.\ $d$ is a rather small semi-metric in the sense
446: that the induced topology is rather coarse. We will see that a
447: non-zero distance between individuals of different fitness is
448: sufficient to avoid population takeover. $d$ induces the
449: coarsest topology (is the ``smallest'' distance) avoiding population
450: takeover.
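
For completeness, the semi-metric properties follow directly from those
of the absolute value: for all $i,j,k\in I$,
\bqan
  d(i,i) &=& |f(i)-f(i)| \;=\; 0, \\
  d(i,j) &=& |f(i)-f(j)| \;=\; |f(j)-f(i)| \;=\; d(j,i), \\
  d(i,k) &=& |f(i)-f(j)+f(j)-f(k)| \\
         &\leq& |f(i)-f(j)|+|f(j)-f(k)| \\
         &=& d(i,j)+d(j,k).
\eqan
Only the implication $d(i,j)=0\Rightarrow i=j$ fails, namely whenever two
distinct individuals have equal fitness.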
451:
452: %------------------------------%
453: \subsection{The problem of genetic drift}
454: %------------------------------%
455: Besides elitist selection, the other major cause of diversity loss in
456: a population is genetic drift. This occurs due to the stochastic
457: nature of the selection operator breeding some individuals more often
458: than others. In a finite population this will cause some individuals
459: to be replaced which have no close relatives, thus reducing diversity.
460: Indeed, without a sufficient rate of mutation, eventually a population
461: will converge on a single genome, even if no selection pressure is
462: applied.
463:
464: Although fitness uniform optimization does not attempt to address this
465: problem, some implications can be drawn. Clearly, with fitness
466: uniform optimization a complete collapse in diversity is impossible as
467: individuals with a wide range of fitness values are always preserved
468: in the population. However, within a given fitness level genetic
469: drift can occur, although the sustained presence of many individuals
470: in other fitness levels to breed with will reduce this effect.
471:
472: Theoretical analysis of genetic drift is often performed by
473: calculating the Markov chain transition matrices to compute the time
474: for the system to reach an absorption state where all of the
475: population members have the same genome. As these results can be
476: difficult to generalize, an alternative approach has been to measure
477: genetic drift by measuring the loss in fitness diversity in a
478: population over time \cite{Rogers:99gd}. This is interesting as
479: fitness uniform optimization attempts to maximize the entropy of the
480: fitness values in the population, producing a very high variance in
481: population fitness. Thus, at least according to the second method of
482: analysis, very little genetic drift would be evident in the
483: population.
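
The link between uniformity and entropy can be stated explicitly. As a
sketch (the symbols $p$ and $H$ are introduced here for illustration
only), let $p(f)$ denote the fraction of the population with fitness $f$,
where $f$ ranges over a finite set $F$ of possible fitness values. The
entropy of the fitness distribution satisfies
\beqn
  H(p) \;:=\; -\sum_{f\in F} p(f)\log p(f) \;\leq\; \log|F|,
\eeqn
with equality if and only if $p(f)=1/|F|$ for all $f\in F$, i.e.\ exactly
for a fitness uniform population.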
484:
485:
486: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
487: \section{Fitness Uniform Selection Scheme (FUSS)}\label{secFuss}
488: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
489:
490: %------------------------------%
491: \subsection{Discrete fitness function}
492: %------------------------------%
493: In this section we propose a new selection scheme, which limits
494: the fraction of $d$-similar individuals. For simplicity we start
495: with a fitness function $f:I\to F$ with discrete equi-spaced
496: values $F=\{f_{min},f_{min}+\eps,f_{min}+2\eps,...,
497: f_{max}-\eps,f_{max}\}$. We call two individuals $i$ and $j$
498: $\delta$-similar if $d(i,j)\equiv|f(i)-f(j)|\leq\delta$. The
499: continuous valued case $F=[f_{min},f_{max}]$ is considered
500: later. In the following we assume $\delta<\eps$. In this case,
501: two individuals are $\delta$-similar if and only if they have the
502: same fitness.
503:
504: %------------------------------%
505: \subsection{The goal}
506: %------------------------------%
507: We have argued that in order to escape local optima, genetic
508: variety should be preserved somehow. One way is to limit the
509: number of $\delta$-similar individuals in the population. In an exact
510: fitness uniform distribution there would be $|P|/|F|$ individuals
511: for each of the $|F|$ fitness values, i.e.\ each fitness level
512: would be occupied by a fraction $1/|F|$ of the individuals.
513: The following selection scheme asymptotically transforms any
514: finite population into a fitness uniform one.
515:
516: %------------------------------%
517: \subsection{The fitness uniform selection scheme (FUSS)}
518: %------------------------------%
519: FUSS is defined as follows: randomly select a fitness value $f$
520: uniformly from the fitness values $F$. Then, uniformly at random
521: select an individual $i\in P$ with fitness $f$. Add another copy of
522: $i$ to $P$.
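
As an illustration (with hypothetical numbers, and assuming that every
fitness level is occupied), let $n(f)$ denote the number of individuals
in $P$ with fitness $f$. The two stage process selects a specific
individual $i$ with probability
\beqn
  p(i) \;=\; {1\over|F|}\cdot{1\over n(f(i))}.
\eeqn
For $|F|=3$ fitness levels occupied by $90$, $9$ and $1$ of $|P|=100$
individuals, this gives $p(i)=1/270$, $1/27$ and $1/3$, respectively,
whereas one step uniform selection gives $p(i)=1/100$ for every
individual. Each level as a whole is selected with probability
$n(f)\cdot p(i)=1/3$.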
523:
524: Note the two stage uniform selection process which is very
525: different from a one step uniform selection of an individual of
526: $P$ (see Figure \ref{figsel}).
527: \begin{figure}
528: \centerline{\includegraphics[width=1.0\columnwidth,height=0.6\textheight]{select.eps}}
529: \caption{\label{figsel}Effects of proportionate, truncation,
530: ranking \& tournament, uniform, and fitness uniform (FUSS) selection
531: on the fitness distribution in a generation based EA. The left/right
532: diagrams depict fitness distributions before/after applying the
533: selection schemes depicted in the middle diagrams. Note that for
534: populations with a non-Gaussian distribution of fitness values (left
535: column), the graph of selection probability vs.\ fitness for FUSS
536: (center bottom) can be totally different to that illustrated above,
537: however the population distribution that results (right bottom) will be
538: the same.}
539: \end{figure}
540: In STD, inertia increases with population size. A large mass of
541: unfit individuals reduces the probability of selecting fit
542: individuals. This is not the case for FUSS. Hence, without loss of
543: performance, we can define a {\em pure model}, in which no
544: individual is ever deleted; the population size increases with
545: time. No genetic material is ever discarded and no fine-tuning in
546: population size is necessary. What may prevent the pure model from
547: being applied to practical problems are not computation time
548: issues, but memory problems.
549: If space becomes a problem we delete random individuals, as is
550: usually done with a steady state EA.
551:
552: %------------------------------%
553: \subsection{Asymptotically fitness uniform distribution}
554: %------------------------------%
555: The expected number of
556: individuals per fitness level $f$ after $t$ selections is
557: $n_t(f)=n_0(f)+t/|F|$, where $n_0(f)$ is the initial distribution.
558: Hence, asymptotically each fitness level gets occupied uniformly
559: by a fraction
560: \beqn
561: {n_t(f)\over |P_t|} \;=\;
562: {n_0(f)+t/|F|\over |P_0|+t} \;\to\; {1\over |F|}
563: \quad\mbox{for}\quad t\to\infty,
564: \eeqn
565: where $P_t$ is the population at time $t$. The same limit holds if
566: each selection is accompanied by uniformly deleting one individual
567: from the (now constant sized) population.
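
For the pure model, the rate of convergence can be read off by
subtracting the limit from the expression above (a short calculation
added here as a sketch):
\beqn
  {n_t(f)\over|P_t|}-{1\over|F|} \;=\;
  {n_0(f)-|P_0|/|F|\over |P_0|+t},
\eeqn
i.e.\ the deviation from uniformity decays like $1/t$. For example
(hypothetical numbers), for $|F|=2$ levels initially occupied by $99$ and
$1$ individuals, the expected fraction of the sparse level is
$(1+t/2)/(100+t)$, which is $0.255$ after $t=100$, approximately $0.455$
after $t=1000$, and approximately $0.495$ after $t=10000$ selections.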
568:
569: %------------------------------%
570: \subsection{Fitness gaps and continuous fitness}
571: %------------------------------%
572: We made two unrealistic assumptions. First, we assumed that each
573: fitness level is initially occupied. If the smallest/largest
574: fitness values in $P_t$ are $f_{min/max}^t$ we extend the
575: definition of FUSS by selecting a fitness value $f$ uniformly in
576: the interval $[f_{min}^t-\odt\eps,f_{max}^t+\odt\eps]$ and an
577: individual $i\in P_t$ with fitness nearest to $f$ (see Figure
578: \ref{figfuss}). This also covers the case when there are missing
579: intermediate fitness values, and also works for continuous valued
580: fitness functions ($\eps\to 0$).
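
As a consistency check, suppose that all levels between $f_{min}^t$ and
$f_{max}^t$ are occupied. The extended interval has length
$f_{max}^t-f_{min}^t+\eps$, and each level is nearest to exactly those
values $f$ within $\odt\eps$ of it, so it is selected with probability
\beqn
  {\eps\over f_{max}^t-f_{min}^t+\eps},
\eeqn
which equals $1/|F|$ when all of $F$ is occupied, in agreement with the
original definition. With gaps, neighboring levels share the vacant
range. For instance (an illustrative example with $\eps=1$), if only the
levels $0$, $1$ and $4$ are occupied, $f$ is drawn from $[-0.5,4.5]$ and
the three levels are selected with probabilities $1/5$, $2/5$ and $2/5$,
respectively.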
581:
582: \begin{figure}
583: \centerline{\includegraphics[width=1.0\columnwidth,height=0.2\textheight]{FussPop.eps}}
584: \caption{\label{figfuss}If the lowest/highest fitness values
585: in the current population $P$ are $f_{min/max}$, FUSS selects a
586: fitness value $f$ uniformly in the interval $[f_{min},f_{max}]$,
587: then, the individual $i\in P$ with fitness nearest to $f$ is
588: selected and a copy is added to $P$, possibly after mutation and
589: recombination.}
590: \end{figure}
591:
592: %------------------------------%
593: \subsection{Mutation and recombination}
594: %------------------------------%
595: The second assumption was that there is no mutation and
596: recombination. In the presence of a small mutation and/or
597: recombination rate eventually each fitness level will become
598: occupied and the occupation fraction is still asymptotically
approximately uniform. For larger rates the distribution will no
longer be uniform, but the important point is that the occupation
601: fraction of {\em no} fitness level decreases to zero for
602: $t\to\infty$, unlike for STD.
603: Furthermore, FUSS selects by construction uniformly in the fitness
604: levels, even if the levels are not uniformly occupied.
605:
606: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
607: \section{Properties of FUSS}\label{secProp}
608: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
609:
610: \begin{figure}
611: \centerline{\includegraphics[width=1\columnwidth,height=0.2\textheight]{popinit.eps}}
612: \centerline{\includegraphics[width=1\columnwidth,height=0.2\textheight]{popstd.eps}}
613: \centerline{\includegraphics[width=1\columnwidth,height=0.2\textheight]{popfuss.eps}}
614: \caption{\label{figevolve}Evolution of the population under
615: FUSS versus standard selection schemes (STD): STD may get stuck
616: in a local optimum if all unfit individuals were eliminated too
617: quickly. In FUSS, all fitness levels remain occupied with ``free''
618: drift within and in-between fitness levels, from which new mutants
619: are steadily created, occasionally leading to further
620: evolution in a more promising direction.}
621: \end{figure}
622:
623: %------------------------------%
624: \subsection{FUSS effectively favors fit individuals}
625: %------------------------------%
626: FUSS preserves diversity better than STD, but the latter have a
627: (higher) selection pressure toward higher fitness, which is
628: necessary for optimization. At first glance it seems that there is
629: no such pressure at all in FUSS, but this is deceiving. As FUSS
630: selects uniformly in the fitness levels, individuals of low
631: populated fitness levels are effectively favored. The probability
632: of selecting a specific individual with fitness $f$ is inversely
633: proportional to $n_t(f)$ (see Figure \ref{figsel}). In an initial typical
634: (FUSS) population there are many unfit and only a few fit
635: individuals. Hence, fit individuals are effectively favored until
636: the population becomes fitness uniform. Occasionally, a new higher
637: fitness level is discovered and occupied by a single new
638: individual, which then, again, is favored.
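
The effective pressure can be made explicit. As a sketch, assuming
that all $|F|$ fitness levels are occupied, a specific individual $i$
is selected with probability
\beqn
  p_t(i) \;=\; {1\over |F|}\cdot{1\over n_t(f(i))}.
\eeqn
For a hypothetical population with $|F|=4$ and occupations
$(70,20,9,1)$, ordered from lowest to highest fitness, the single
fittest individual is selected with probability ${1\over 4}$, whereas
each individual on the lowest level is selected with probability
${1\over 280}$, i.e.\ $70$ times less often.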
639:
640: %------------------------------%
641: \subsection{No takeover in FUSS}
642: %------------------------------%
643: With FUSS, takeover of the highest fitness level never happens. The
644: concept of takeover time \cite{Goldberg:91} is meaningless for
645: FUSS. The fraction of fittest individuals in a population is always
646: small. This implies that the average population fitness is always much
647: lower than the best fitness. Actually, a large number of fit
648: individuals is usually not the true optimization goal. A single
649: fittest individual usually suffices to solve the optimization task.
650:
651: %------------------------------%
652: \subsection{FUSS may also favor unfit individuals}
653: %------------------------------%
654: Note, if it is also difficult to find individuals of low fitness,
655: i.e.\ if there are only a few individuals of low fitness, FUSS will
656: also favor these individuals. Half of the time is ``wasted'' in
657: searching on the wrong end of the fitness scale. This possible
slowdown by a factor of 2 is usually acceptable. In Section
\ref{secEx} we will see that in certain circumstances this
behavior can actually speed up the search. In general, fitness
levels which are difficult to reach are favored.
662:
663: %------------------------------%
664: \subsection{Distribution within a fitness level}
665: %------------------------------%
666: Within a fitness level there is no selection pressure which could
667: further exponentially decrease the population in certain regions
668: of the individual space. This (exponential) reduction is the major
669: enemy of diversity, which is suppressed by FUSS. Within a fitness
670: level, the individuals freely drift around (by mutation).
671: Furthermore, there is a steady stream of individuals into and out
672: of a level by (d)evolution from (higher)lower levels.
673: Consequently, FUSS develops an equilibrium distribution which is
674: nowhere zero. This does not mean that the distribution within a
675: level is uniform. For instance, if there are two (local) maxima of
676: same height, a very broad one and a very narrow one, the broad one
677: may be populated much more than the narrow one, since it is much
678: easier to ``find''.
679:
680: %------------------------------%
681: \subsection{Steady creation of individuals from every fitness level}
682: %------------------------------%
683: In STD, a wrong step (mutation) at some point in evolution might cause
684: further evolution in the wrong direction. Once a local optimum has
685: been found and all unfit individuals were eliminated it is very
686: difficult to undo the wrong step. In FUSS, all fitness levels remain
687: occupied from which new mutants are steadily created, occasionally
688: leading to further evolution in a more promising direction (see Figure
689: \ref{figevolve}).
690:
691: %------------------------------%
692: \subsection{Transformation properties of FUSS}
693: %------------------------------%
694: FUSS (with continuous fitness) is independent of a scaling and a
695: shift of the fitness function, i.e.\ FUSS($\tilde f$) with $\tilde
696: f(i):=a\cdot f(i)+b$ is identical to FUSS($f$). This is true
697: even for $a<0$, since FUSS searches for maxima {\em and}
698: minima, as we have seen. It is not independent of a non-linear
699: (monotone) transformation unlike tournament, ranking and
700: truncation selection. The non-linear transformation properties are
701: more like the ones of proportionate selection.
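
The linear invariance can be checked directly. As a sketch, assuming
$a\neq 0$ and continuous fitness, a value $f$ drawn uniformly between
the extreme fitness values of the population is mapped to
$\tilde f=a\cdot f+b$, which is again uniform between the transformed
extremes, and all fitness distances are rescaled by the same factor:
\beqn
  |\tilde f-\tilde f(i)| \;=\; |a|\cdot|f-f(i)|
  \quad\mbox{for all}\quad i\in P_t.
\eeqn
Hence the individual nearest to $f$ under $f$ is also nearest to
$\tilde f$ under $\tilde f$, and every individual keeps its selection
probability. A non-linear monotone map changes the relative lengths of
the gaps between occupied fitness values, and with them the selection
probabilities.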
702:
703:
704: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
705: \section{Fitness Uniform Deletion Scheme (FUDS)}\label{secFUDS}
706: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
707:
708: For a steady state evolutionary algorithm each cycle of the system
709: consists of both selecting which individual or individuals to
710: crossover and mutate, and then selecting which individual is to be
711: deleted in order to make space for the new child. The usual deletion
712: scheme used is \emph{random deletion} as this is neutral in the sense
713: that it does not bias the distribution of the population in any way
714: and does not require additional work to be done, such as evaluating
715: the similarity of individuals based on their genes. Another common
716: strategy is to use an elitist deletion scheme.
717:
718: Here we propose to use the similarity semi-metric $d$ defined in
719: Section \ref{secSim} to achieve a uniform distribution across fitness
720: levels, like with FUSS, except that we achieve this by selectively
721: deleting those members of the population which have very commonly
722: occurring fitness values. Of course this leaves the selection scheme
723: unspecified, indeed we may use any standard selection scheme such as
724: tournament selection in combination with FUDS. It also means that we
725: lose one of the nice features of FUSS as we now need to manually tune
726: the selection intensity for our application --- FUSS of course is
727: parameterless. Nevertheless it allows us to give many FUSS like
728: properties to an existing EA using a standard selection scheme with
729: only a minor modification to the deletion scheme.
730:
731: The intuition behind why FUDS preserves population diversity is very
732: simple: If an individual has a fitness value which is very rare in the
733: population then this individual almost certainly contains unique
734: information which, if it were to be deleted, would decrease the total
735: population diversity. Conversely, if we delete an individual with
736: very commonly occurring fitness then we are unlikely to be losing
737: significant diversity. Presumably most of these individuals are
738: common in some sense and likely exist in parts of the solution space
739: which are easy to reach. Thus the fitness uniform deletion strategy
740: is now clear: Only delete individuals with very commonly occurring
741: fitness values as these individuals are less likely to contain
742: important genetic diversity.
743:
744: Practically FUDS is implemented as follows. Let $f_{min}$ and
745: $f_{max}$ be the minimum and maximum fitness values possible for a
746: problem, or at least reasonable upper and lower bounds. We divide the
747: interval $[f_{min}, f_{max}]$ into a collection of subintervals of
748: equal length $\{ [f_{min}, f_{min} + a ), [f_{min} + a, f_{min} + 2a
749: ), \ldots, [f_{max}-a, f_{max}] \}$ which we call \emph{fitness
750: levels}. As individuals are added to the population their fitness is
751: computed and they are placed in the set of individuals corresponding
752: to the fitness level they belong to. Thus the number of individuals
753: in each fitness level describes how common fitness values within this
754: interval are in the current population. When a deletion is required
755: the algorithm locates the fitness level with the greatest number of
756: individuals and then deletes a random individual from this level. In
757: the case where multiple fitness levels have maximal size the lowest of
758: these levels is used.
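
A small worked instance may help; the numbers and the level index $k$
are hypothetical and only serve as illustration. Assuming the levels
are numbered $k=1,\ldots,m$ with $m=(f_{max}-f_{min})/a$, an
individual with fitness $f$ falls into level
\beqn
  k(f) \;=\; \min\left\{ \left\lfloor {f-f_{min}\over a}\right\rfloor
  +1,\; m \right\},
\eeqn
where the minimum merely assigns $f=f_{max}$ to the last (closed)
level. For $f_{min}=0$, $f_{max}=10$ and $a=2$ there are $m=5$ levels;
a fitness of $3.7$ falls into level $2$ and a fitness of $10$ into
level $5$. If the level occupations are $(3,7,7,2,1)$, then levels $2$
and $3$ tie for the maximal size $7$, so the next deletion removes a
uniformly chosen individual from level $2$.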
759:
760: If the number of fitness levels is chosen too low, say 5 levels, then
761: the resulting model of the distribution of individuals across the
762: fitness range will be too coarse. Alternatively if a large number of
763: fitness levels is used with a very small population the individuals
764: may become too thinly spread across the fitness levels. While in
765: these extreme cases this could affect the performance of FUDS, in
766: practice we have found that the system is not very sensitive to the
767: setting of this parameter. If $n$ is the population size then setting
768: the number of fitness levels to be $\sqrt{n}$ is a good rule of thumb.
769:
770: For discrete valued fitness functions there is a natural lower bound on
771: the interval length $a$ because below a certain value there will be
772: more intervals than unique fitness values. Of course this cannot
773: happen when the fitness function is continuous. Other than this small
774: technical detail, the two cases are treated identically.
775:
776: As FUDS spreads the individuals out across a wide range of fitness
777: values, for small populations the EA may become inefficient as only a
778: few individuals will have relatively high fitness. For problems which
779: are not deceptive this is especially true as there will be little
780: value in having individuals in the population with low to medium
781: fitness. Of course these are not the kinds of problems for which FUDS
782: was designed. In practice we have always used populations of between
783: 250 and 5,000 individuals and have not observed a decline in
784: performance relative to random deletion at the lower end of this
785: range.
786:
787: An alternative implementation that avoids discretization is to choose
788: the two individuals that have the most similar fitness and delete one
789: of them. An efficient implementation keeps a list of the individuals
790: ordered by their fitness along with an ordered list of the distances
791: between the individuals. Then in each cycle one of the two
792: individuals with closest fitness to each other is selected for
793: deletion. Although the performance of this algorithm was better than
794: random deletion, it was not as good as the implementation of FUDS
795: using bins. We conjecture that the reason for this is as follows:
796: When there are just a few very fit individuals in the population it is
797: quite likely that they will be highly related to each other and have
798: very similar fitness. This means that if we delete the individuals
799: with most similar fitness it is likely that many of the very fit
800: individuals will be deleted. However with the bins approach this will
801: not happen as there are typically few individuals in the high fitness
802: bins. Thus, although deleting one of the closest individuals in terms
803: of fitness might preserve diversity well, it also changes the pressure
804: on the population distribution over fitness levels. This small change
805: in distribution dynamics appears to reduce performance in practice.
806:
807:
808: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
809: \section{Properties of FUDS}\label{secPropFUDS}
810: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
811:
812: As FUDS uniformly distributes the population across fitness levels,
813: like FUSS does, many of the key properties of FUSS also carry over to
814: an EA that is using a standard selection scheme (STD) combined with
815: FUDS deletion.
816:
817: %------------------------------%
818: \subsection{No takeover in FUDS}
819: %------------------------------%
820: Under FUDS the takeover of the highest fitness level, or indeed any
821: fitness level, is impossible. This is easy to see because as soon as
822: any fitness level starts to dominate, all of the deletions become
823: focused on this level until it is no longer the most populated fitness
824: level. As a by-product, this also means that individuals on
825: relatively unpopulated fitness levels are preserved.
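
This argument can be made slightly more precise. As a sketch, assume
that each cycle first adds one child and then deletes one individual
from a most populated level, and let $M_t$ denote the size of the
largest level after cycle $t$. Adding the child raises the maximum to
at most $M_t+1$; if it does, the level that received the child is the
unique largest one, so the deletion is taken from exactly this level.
In either case
\beqn
  M_{t+1} \;\leq\; M_t \quad\mbox{for all}\quad t,
\eeqn
i.e.\ under these assumptions no level can ever grow beyond the
largest occupation found in the initial population.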
826:
827: %------------------------------%
828: \subsection{Steady creation of individuals from every fitness level}
829: %------------------------------%
830: Another similarity with FUSS is the steady creation of individuals on
831: many different fitness levels. This occurs because under FUDS some
832: individuals on each fitness level are always kept. This makes it
833: relatively easy for the EA to find its way out of local optima as it
834: keeps on exploring evolutionary paths which do not at first appear to
835: be promising.
836:
837: %------------------------------%
838: \subsection{Robust performance with respect to selection intensity}
839: %------------------------------%
840: Because FUDS is only a deletion scheme, we still need to choose a
841: selection scheme for the EA. Of course this selection scheme may then
842: require us to set a selection intensity parameter. While this is not
843: as desirable as FUSS, which has no such parameter, at least with FUDS
844: we expect the performance of the system to be less sensitive to the
845: correct setting of this parameter. For example, if the selection
846: intensity is set too high the normal problem is that the population
847: rushes into a local optimum too soon and becomes stuck before it has
848: had a chance to properly explore the genotype space for other
849: promising regions. However, as we noted above, with FUDS a total
850: collapse in population diversity is impossible. Thus much higher
851: levels of selection intensity may be used without the risk of
852: premature convergence.
853:
In some situations if very low selection intensity is used along with
855: random deletion, the population tends not to explore the higher areas
856: of the fitness landscape at all. This can be illustrated by a simple
857: example. Consider a population which contains 1,000 individuals.
858: Under random deletion all of these individuals, including the highly
859: fit ones, will have a 1 in 1,000 chance of being deleted in each cycle
860: and so the expected life time of an individual is 1,000 deletion
861: cycles. Thus if a highly fit individual is to contribute a child of
862: the same fitness or higher, it must do so reasonably quickly. However
863: for some optimization problems the probability of a fit individual
864: having such a child when it is selected is very low, so low in fact
865: that it is more likely to be deleted before this happens. As a result
866: the population becomes stuck, unable to find individuals of greater
867: fitness before the fittest individuals are killed off.
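
A back-of-the-envelope model makes this quantitative. The symbols $s$
and $q$ are introduced here for illustration only: assume that in each
cycle a given fit individual is selected with probability $s$, that a
child produced from it is at least as fit with probability $q$, and
that the individual is afterwards deleted with probability $1/n$,
independently in each cycle. Conditioning on the first cycle gives the
probability $p$ of producing at least one such child before being
deleted,
\beqn
  p \;=\; {sq\over sq+{1\over n}-{sq\over n}}
  \;\approx\; {nsq\over 1+nsq}.
\eeqn
For $n=1000$, uniform selection ($s=1/n$) and $q=0.01$ this yields
$p\approx 0.0099$, so the fit lineage is lost about $99\%$ of the
time. Raising the selection probability tenfold ($s=10/n$) only gives
$p\approx 0.091$.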
868:
869: The usual solution to this problem is to increase the selection
870: intensity because then the fit individuals are selected more often and
871: thus are more likely to contribute a child of similar or greater
872: fitness before they are deleted. Another is to change the deletion
873: scheme so that these individuals live longer. This is what happens
874: with FUDS as rare fit individuals are not deleted. Effectively it
875: means that with FUDS we can often use much lower selection intensity
876: without the population becoming stuck.
877:
878: %------------------------------%
879: \subsection{Transformation properties of FUDS}
880: %------------------------------%
881: While with FUDS we have the added complication of having to choose the
882: number of subintervals with which to break up the fitness values, this
883: number is only a function of the population size and distributional
884: characteristics of the problem. Thus any linear transformation of the
885: fitness function has no effect on FUDS. However, non-linear
886: transformations will affect performance.
887:
888: %------------------------------%
889: \subsection{Problem and representation independence}
890: %------------------------------%
891: Because FUDS only requires the fitness of individuals, the method is
892: completely independent of the problem and genotype representation,
893: i.e.\ how the individuals are coded.
894:
895: %------------------------------%
896: \subsection{Simple implementation and low computational cost}
897: %------------------------------%
898: As the algorithm is simple and the fitness function is given as part
899: of the problem specification, FUDS is very easy to implement and
900: requires few computational resources.
901:
902:
903: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
904: \section{A Simple Example}\label{secEx}
905: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
906:
907: In the following, we use a simple example problem to compare the
908: performance of fitness uniform selection (FUSS), random search (RAND)
909: and standard selection (STD), each used both with and without
910: recombination. We also examine the performance of standard selection
911: when used with the fitness uniform deletion scheme (FUDS). We regard
912: this problem as a prototype for deceptive multi-modal functions. The
913: example demonstrates how FUSS and FUDS can be superior to RAND and STD
914: in some situations. More generic situations will be considered in
915: Section \ref{secTree}. An experimental analysis of this problem
916: appears in Section~\ref{secEx2}.
917:
918: %------------------------------%
919: \subsection{Simple 2D example}
920: %------------------------------%
921: Consider individuals $(x,y)\in I:=[0,1]\times[0,1]$,
922: which are tuples of real numbers, each coordinate in the interval $[0,1]$.
923: The example models individuals possessing up to 2 ``features''.
924: Individual $i$ possesses feature $I_1$ if
925: $i\in I_1:=[a,a+\Delta]\times[0,1]$, and feature $I_2$
926: if $i\in I_2:=[0,1]\times[b,b+\Delta]$.
The fitness function $f:I\to\{1,2,3,4\}$ is defined as
928: \beqn
929: f(x,y) = \left\{
930: \begin{array}{l}
931: 1 \quad\mbox{if}\quad (x,y)\in I_1\backslash I_2, \\
932: 2 \quad\mbox{if}\quad (x,y)\in I_2\backslash I_1, \\
933: 3 \quad\mbox{if}\quad (x,y)\not\in I_1\cup I_2, \\
934: 4 \quad\mbox{if}\quad (x,y)\in I_1\cap I_2. \\
935: \end{array}\right.
936: \parbox{2cm}{\hfill \unitlength=0.6mm
937: %\linethickness{0.4pt}
938: \begin{picture}(45,45)
939: \scriptsize
940: \put(5,5){\vector(0,1){40}}
941: \put(5,5){\vector(1,0){40}}
942: \put(20,5){\line(0,1){35}}
943: \put(25,5){\line(0,1){35}}
944: \put(40,5){\line(0,1){35}}
945: \put(5,15){\line(1,0){35}}
946: \put(5,20){\line(1,0){35}}
947: \put(5,40){\line(1,0){35}}
948: \put(22.5,17.5){\makebox(0,0)[cc]{4}}
949: \put(12.5,30){\makebox(0,0)[cc]{3}}
950: \put(22.5,30){\makebox(0,0)[cc]{1}}
951: \put(32.5,30){\makebox(0,0)[cc]{3}}
952: \put(32.5,10){\makebox(0,0)[cc]{3}}
953: \put(12.5,10){\makebox(0,0)[cc]{3}}
954: \put(12.5,17.5){\makebox(0,0)[cc]{2}}
955: \put(32.5,17.5){\makebox(0,0)[cc]{2}}
956: \put(22.5,10){\makebox(0,0)[cc]{1}}
957: \put(44,2.5){\makebox(0,0)[cc]{$x$}}
958: \put(22.5,2.5){\makebox(0,0)[cc]{$\Delta$}}
959: \put(20,3.5){\makebox(0,0)[cc]{$a$}}
960: \put(2.5,17.5){\makebox(0,0)[cc]{$\Delta$}}
961: \put(4,14.5){\makebox(0,0)[cc]{$b$}}
962: \put(40,3){\makebox(0,0)[cc]{1}}
963: \put(3.5,40){\makebox(0,0)[cc]{1}}
964: \put(2.5,44){\makebox(0,0)[cc]{$y$}}
965: \put(22.5,42.5){\makebox(0,0)[cc]{$f(x,y)$}}
966: \end{picture}
967: }
968: \eeqn
969: We assume $\Delta\ll 1$. Individuals with neither of the two features
970: ($i\in I\backslash(I_1\cup I_2)$) have fitness $f=3$. These ``local
971: $f=3$ optima'' occupy most of the individual space $I$, namely a
972: fraction $(1-\Delta)^2$. It is disadvantageous for an individual to
973: possess only one of the two features ($i\in(I_1\backslash I_2)\cup
974: (I_2\backslash I_1)$), since $f=1$ or 2 in this case. In combination
($i\in I_1\cap I_2$), the two features lead to the highest fitness,
976: but the global maximum $f=4$ occupies the smallest fraction $\Delta^2$
977: of the individual space $I$. With a fraction $\Delta(1-\Delta)$, the
978: $f=1/f=2$ minima are in between. The example has sort of an XOR
979: structure, which is hard for many optimizers.
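
As a quick consistency check, the fractions of the four regions add up
to the whole unit square, where each of the two levels $f=1$ and $f=2$
contributes $\Delta(1-\Delta)$:
\beqn
  (1-\Delta)^2 + 2\Delta(1-\Delta) + \Delta^2 \;=\; 1.
\eeqn
For instance, for the illustrative value $\Delta=0.1$ the fractions
are $0.81$ for $f=3$, $0.09$ each for $f=1$ and $f=2$, and $0.01$ for
the global optimum $f=4$.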
980:
981: %------------------------------%
982: \subsection{Random search}
983: %------------------------------%
984: Individuals are created uniformly in the unit square. The ``local
985: optimum'' $f=3$ is easy to ``find'', since it occupies nearly the
986: whole space. The global optimum $f=4$ is difficult to find, since it
987: occupies only $\Delta^2\ll 1$ of the space. The expected time, i.e.\
988: the expected number of individuals created and tested until one with
989: $f=4$ is found, is $T_{RAND}={1\over\Delta^2}$. Here and in the
990: following, the ``time'' $T$ is defined as the expected number of
991: created individuals until the {\it first} optimal individual (with
992: $f=4$) is found. $T$ is neither a takeover time nor the number of
993: generations (we consider steady-state EAs).
994:
995: %------------------------------%
996: \subsection{Random search with crossover}
997: %------------------------------%
998: Let us occasionally perform a recombination of individuals in the
999: current population. We combine the $x$-coordinate of one uniformly
1000: selected individual $i_1$ with the $y$ coordinate of another
1001: individual $i_2$. This crossover operation maintains a uniform
1002: distribution of individuals in $[0,1]^2$. It leads to the global
1003: optimum if $i_1\in I_1$ and $i_2\in I_2$. The probability of
1004: selecting an individual in $I_i$ is
1005: $\Delta(1-\Delta)\approx\Delta$ (we assumed that the global
1006: optimum has not yet been found). Hence, the probability that $I_1$
1007: crosses with $I_2$ is $\Delta^2$. The time to find the global
1008: optimum by random search including crossover is still
1009: $\sim{1\over\Delta^2}$ ($\sim$ denotes asymptotic proportionality).
1010:
1011: %------------------------------%
1012: \subsection{Mutation}
1013: %------------------------------%
1014: The result remains valid (to leading order in ${1\over\Delta}$)
1015: if, instead of a random search, we uniformly select an individual
1016: and mutate it according to some probabilistic, sufficiently mixing
1017: rule, which preserves uniformity in $[0,1]$. One popular such
1018: mutation operator is to use a sufficiently long binary
1019: representation of each coordinate, like in genetic algorithms, and
1020: flip a single bit. For simplicity we assume in the following a
1021: mutation operator which replaces with probability $\odt/\odt$ the
1022: first/second coordinate by a new uniform random number. Other
1023: mutation operators which mutate with probability $\odt/\odt$ the
1024: first/second coordinate, preserve uniformity, are sufficiently
1025: mixing, and leave the other coordinate unchanged (like the
1026: single-bit-flip operator) lead to the same scaling of $T$ with
1027: $\Delta$ (but with different proportionality constants).
1028:
1029: %------------------------------%
1030: \subsection{Standard selection with crossover}
1031: %------------------------------%
1032: The $f=1$ and $f=2$ individuals contain useful building
blocks, which could speed up the search by a suitable selection and
1034: crossover scheme. Unfortunately, the standard selection schemes
1035: favor individuals of higher fitness and will diminish the
1036: $f=1/f=2$ population fraction. The probability of
1037: selecting $f=1/f=2$ individuals is even smaller than in
1038: random search. Hence $T_{STD}\sim{1\over\Delta^2}$. Standard
selection does not improve performance, not even in combination
1040: with crossover, although crossover is well suited to produce the
1041: needed recombination.
1042:
1043: %------------------------------%
1044: \subsection{FUSS}
1045: %------------------------------%
1046: At the beginning, only the $f=3$ level is occupied and
1047: individuals are uniformly selected and mutated. The expected time
1048: until an $f=1$ or $f=2$ individual in $I_1\cup I_2$ is created is
1049: $T_1\approx{1\over \Delta}$ (not ${1\over 2\Delta}$, since only
1050: one coordinate is mutated). From this time on FUSS will select one
1051: half(!) of the time the $f=1/f=2$ individual(s) and only the
1052: remaining half the abundant $f=3$ individuals. When level
1053: $f=1$ {\em and} level $f=2$ are occupied, the selection
1054: probability is ${1\over 3}+{1\over 3}$ for these levels.
1055: With probability $\odt$ the
1056: mutation operator will mutate the $y$ coordinate of an individual
1057: in $I_1$ or the $x$ coordinate of an individual in $I_2$ and
1058: produces a new $f=1/2/4$ individual. The relative probability
1059: of creating an $f=4$ individual is $\Delta$. The expected time
1060: to find this global optimum from the $f=1/f=2$ individuals, hence,
1061: is $T_2=[({1\over 2}...{2\over 3})\times{1\over
1062: 2}\times\Delta]^{-1}$. The total expected time is
1063: $T_{FUSS}\approx T_1+T_2= {4\over\Delta}...{5\over\Delta}\ll
1064: {1\over\Delta^2}\sim T_{STD}$. FUSS is much faster by exploiting
1065: unfit $f=1/f=2$ individuals. This is an example where (local)
minima can help the search. Examples where a low local maximum
1067: can help in finding the global maximum, but where standard
1068: selection sweeps over too quickly to higher but useless local
1069: maxima, can also be constructed.
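
The two ends of the range for $T_{FUSS}$ can be spelled out. If only
one of the levels $f=1$, $f=2$ is occupied, the selection probability
is ${1\over 2}$; if both are occupied, it is ${2\over 3}$:
\beqn
  T_2 = \Big[{1\over 2}\cdot{1\over 2}\cdot\Delta\Big]^{-1}
      = {4\over\Delta}
  \quad\mbox{or}\quad
  T_2 = \Big[{2\over 3}\cdot{1\over 2}\cdot\Delta\Big]^{-1}
      = {3\over\Delta}.
\eeqn
Adding $T_1\approx{1\over\Delta}$ gives a total between
${4\over\Delta}$ and ${5\over\Delta}$. For the illustrative value
$\Delta=0.01$ this amounts to roughly $400$ to $500$ created
individuals, compared with $T_{RAND}={1\over\Delta^2}=10^4$, i.e.\ a
speed-up by a factor of about $20$ to $25$.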
1070:
1071: %------------------------------%
1072: \subsection{FUSS with crossover}
1073: %------------------------------%
1074: The expected time until an $f=1$ individual in $I_1$ and an
1075: $f=2$ individual in $I_2$ is found is $T_1\sim{1\over
1076: \Delta}$, even with crossover. The probability of selecting an
1077: $f=1/f=2$ individual is ${1\over 3}/{1\over 3}$. Thus, the
1078: probability that a crossing operation crosses $I_1$ with $I_2$ is
1079: $({1\over 3})^2$. The expected time to find the global optimum
1080: from the $f=1/f=2$ individuals, hence, is $T_2=9\cdot O(1)$,
1081: where the $O(1)$ factor depends on the frequency of crossover
1082: operations. This is far faster than by STD, even if the
1083: $f=1/f=2$ levels were local maxima, since to get a high standard
1084: selection probability, the level has first to be taken over, which
1085: itself needs some time depending on the population size. In FUSS a
1086: single $f=1$ and a single $f=2$ individual suffice to
1087: guarantee a high selection probability and an effective crossover.
1088: Crossover does not significantly decrease the {\em total} time
$T_{FUSSX}\approx T_1+T_2\sim {1\over \Delta}+9\cdot O(1)$, but for a
1090: suitable 3D generalization we get a large speedup by a factor of
1091: ${1\over\Delta}$.
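
To make the $O(1)$ factor concrete, let $c$ denote the fraction of
created individuals that are produced by crossover ($c$ is a symbol
introduced here for illustration only). Assuming that both parents are
drawn independently by FUSS while the levels $f=1$, $f=2$ and $f=3$
are occupied, and that the $x$ coordinate is always taken from the
first parent, a newly created individual is globally optimal by
crossover with probability ${c\over 9}$, so that
\beqn
  T_2 \;\approx\; {9\over c},
\eeqn
e.g.\ $T_2\approx 18$ for $c=\odt$, independent of $\Delta$.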
1092:
1093: %------------------------------%
1094: \subsection{FUDS with crossover}
1095: %------------------------------%
1096: Assume that initially all of the individuals have $f=3$ and that we
1097: are using random selection. For any mutation the probability of the
1098: child being in $I_1 \cup I_2$ is $\Delta$. Until $I_1 \cup I_2$
1099: becomes quite full FUDS will never delete individuals from these
1100: areas. Furthermore if an individual in $I_1 \cup I_2$ is mutated then
1101: the mutant will also be in $I_1 \cup I_2$ with probability
1102: $\frac{1}{2}( 1 + \Delta) \gg \Delta$. Therefore while most of the
1103: population has $f=3$ we can lower bound the probability of a new child
1104: being in $I_1 \cup I_2$ by $\Delta$. It then follows that if $P$ is
1105: the size of the population we can upper bound the expected time for
1106: $I_1 \cup I_2$ to contain half the total population by $\frac{P}{2}
1107: \frac{1}{\Delta}
1108: \propto \frac{1}{\Delta}$. Once this occurs (and most likely well
1109: before this point) crossover will produce an individual with $f=4$
1110: almost immediately by crossing a member of $I_1$ with a member of
1111: $I_2$. Thus $T_{FUDS} \propto \frac{1}{\Delta} \ll \frac{1}{\Delta^2}
1112: \sim T_{STD}$. This gives FUDS when used with random selection scaling
1113: characteristics which are similar to FUSS. If we use a selection
1114: scheme with higher intensity our bound on the expected time for half
the population to lie in $I_1\cup I_2$ remains unchanged as the bound holds in
1116: the worst case situation where only individuals with $f=3$ are
1117: selected. However higher selection intensity makes the final
1118: crossover required to find an individual with $f=4$ less likely. For
1119: moderate levels of selection intensity this is clearly not a
significant factor and more importantly it is $O(1)$ and independent of
1121: $\Delta$. Thus the order of scaling for $T_{FUDS}$ is just
1122: $\frac{1}{\Delta}$ for this difficult problem, which is the same as
1123: $T_{FUSSX}$.
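
It may help to see when the upper bound above is informative for a
fixed population size. Comparing it with the random search time gives
\beqn
  {P\over 2}\cdot{1\over\Delta} \;<\; {1\over\Delta^2}
  \quad\Longleftrightarrow\quad
  \Delta \;<\; {2\over P}.
\eeqn
For example, with the illustrative values $P=250$ and
$\Delta=10^{-3}$ the bound is $1.25\times 10^5$ created individuals,
compared with ${1\over\Delta^2}=10^6$. Since this is only an upper
bound, and crossover most likely succeeds well before half of the
population lies in $I_1\cup I_2$, the actual search time can be
expected to be considerably smaller.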
1124:
1125: %------------------------------%
1126: \subsection{Simple 3D example}
1127: %------------------------------%
We generalize the 2D example to $D$-dimensional individuals
1129: $\vec x\in[0,1]^D$ and a
1130: fitness function
1131: \beqn
1132: f(\vec x) \;:=\; (D+1)\!\cdot\!\prod_{d=1}^D\chi_d(\vec x)\;
1133: - \max_{1\leq d\leq D} d\!\cdot\!\chi_d(\vec x)\; +D+1,
1134: \eeqn
1135: where $\chi_d(\vec x)$ is the characteristic function of feature
1136: $I_d$
1137: \beqn
1138: \chi_d(\vec x) \;:=\; \left\{
1139: \begin{array}{l}
  1 \quad\mbox{if}\quad a_d\leq x_d\leq a_d+\Delta, \\
1141: 0 \quad\mbox{else.} \\
1142: \end{array}\right.
1143: \eeqn
For $D=2$, $f$ coincides with the 2D example (with $a_1=a$, $a_2=b$, and up to exchanging the labels $f=1$ and $f=2$). For $D=3$,
1145: the fractions of $[0,1]^3$ where $f=1/2/3/4/5$ are approximately
1146: $\Delta^2/\Delta^2/\Delta^2/1/\Delta^3$.
1147: With the same line of reasoning we get the following expected
1148: search times for the global optimum:
1149: \beqn
1150: T_{RAND}\sim T_{STD}\sim {1\over\Delta^3},
1151: \eeqn \beqn
1152: T_{FUSS}\sim {1\over\Delta^2},\quad
1153: T_{FUSSX}\sim T_{FUDS} \sim {1\over\Delta}.
1154: \eeqn
1155: This demonstrates the existence of problems where FUSS is much faster
1156: than RAND and STD, and where crossover can give a further boost to
1157: FUSS, even when it is ineffective in combination with STD.
1158:
1159:
1160: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1161: \section{Fitness-Tree Analysis}\label{secTree}
1162: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1163:
1164: \begin{figure}
1165: \centerline{\includegraphics[width=\columnwidth,height=0.25\textheight]{tree.eps}}
1166: \caption{\label{figtree}Generic 2D fitness landscape with
1167: evolution tree. Each connected slice represents a species. A species
1168: is also symbolized by a node in the slice. The number in a slice and
1169: near a node is the fitness value of the species. If individuals from
1170: one species can evolve to individuals of another species, the nodes are
1171: connected by a solid line. Altogether, they form the fitness tree. The
1172: branching factor $b$ is $2$ and the number of species per fitness
1173: level $s$ is $4$ for intermediate fitness values (3,4,5).}
1174: \end{figure}
1175:
1176: %------------------------------%
\subsection{The fitness tree model}
1178: %------------------------------%
1179: A general, problem independent comparison of the various
1180: optimization algorithms is difficult. We are interested in the
1181: performance for difficult fitness landscapes with many local
1182: optima.
1183:
1184: We only consider mutation; recombination is discussed in the next
1185: section. The evolutionary neighborhood (not to be confused with
1186: $d$-similarity) of an individual $i$ is defined as the set of
1187: individuals that can be created from $i$ by a single
1188: mutation\footnote{We have ``small'' mutations in mind, e.g.\ single
1189: bit flips, not macro mutations, which connect {\em all}
1190: individuals.}. Two individuals $i$ and $j$ with the same fitness are
1191: defined to belong to the same {\em species} if there is a finite
1192: sequence of mutations which transforms $i$ into $j$ {\em and} all
1193: individuals of the sequence also have fitness $f(i)=f(j)$. Each
1194: fitness level is partitioned in this way into disjoint species. We say
1195: a species of fitness $f+\eps$ can {\em evolve} from a species of
1196: fitness $f$, if there is a mutation which transforms an individual
1197: from the latter species to one of the former. Those species are
1198: connected by an edge in Figures \ref{figtree} and
1199: \ref{figtree1d}. A species is said to be {\em promising} if it
1200: {\em can} evolve to the global optimum $f_{max}$.
1201:
1202:
1203: %------------------------------%
1204: \subsection{Additional definitions and simplifying assumptions}
1205: %------------------------------%
1206: \begin{itemize}\parskip=0ex\parsep=0ex\itemsep=0ex
1207: \item[i)] Evolution which skips fitness levels is ignored, and also
1208: devolution to species of lower fitness other than the
1209: primordial species.
1210: \item[ii)] Random individuals have
1211: lowest fitness $f_{min}$ with high probability, and there is
1212: only one species of fitness $f_{min}$.
1213: \item[iii)] There is a fixed branching factor $b$, i.e.\ each species
1214: can evolve into $b$ improved species, or represents a local optimum
1215: from which no further evolution is possible.
1216: \item[iv)] There is a single global optimum
1217: $f_{max}$ (or $b$ optima to be consistent with the previous item).
1218: \item[v)] There are $s$ different species per fitness level (except
1219: near $f_{min}$ and $f_{max}$ where there must be fewer to be consistent
1220: with the previous items).
1221: \item[vi)] The probability $p$ that an individual evolves to a higher
1222: fitness is very small. In most cases a mutation keeps an
1223: individual within its species or devolves it.
1224: \item[vii)] The probability to evolve to one of the offspring species is
1225: uniform, i.e.\ $1/b$ for all offspring species.
1226: \end{itemize}
1227:
1228: We have the feeling that this picture covers the essential
1229: features of fitness landscapes for difficult problems. The
1230: qualitative conclusions we will draw should still hold when some
1231: or all of the additional simplifying assumptions are violated.
1232:
1233: \begin{figure}
1234: \centerline{\includegraphics[width=\columnwidth,height=0.25\textheight]{tree1d.eps}}
1235: \caption{\label{figtree1d}
1236: Generic fitness function with evolution tree. Individuals which
1237: are evolutionary neighbors are connected by a dashed line. They
1238: belong to the species indicated by a node on the dashed
1239: line. A species which can evolve from another is connected to
1240: it by a solid line. The smooth curve
1241: visualizes (somewhat misleading, since the fitness is discrete)
1242: the fitness function with many local maxima. \vspace{1.1em}}
1243: \end{figure}
1244:
1245: %------------------------------%
1246: \subsection{Example}
1247: %------------------------------%
Consider the case of individuals that are real-valued
$D$-dimensional vectors, i.e.\ $I=\SetR^D$. Let the fitness function
1250: $\tilde f$ be continuous and positive with many local maxima,
1251: which tends to zero for large arguments. This covers a large range
1252: of physical optimization problems. Mutation shall be local in
$\SetR^D$, i.e.\ $||i_{original}-i_{mutated}||\ll D$. As FUSS and
the fitness tree model are only defined for discrete fitness
functions, we discretize $\tilde f$ to
$f:=\lfloor{1\over\tilde\eps}\tilde f\rfloor$, which is
acceptable for sufficiently small $\tilde\eps$. Typical fitness
landscapes for $D=2$ and $D=1$, together with their fitness trees, are
1259: depicted in Figures \ref{figtree} and \ref{figtree1d}. Since
1260: mutation is a local operation, each species is a (possibly
1261: multiply punched) connected slice ($D$-dimensional sub-volume) and
1262: evolution can only occur from $f$ to $f+1$ ($\eps=1$). Assumption
1263: (i) is generally satisfied. The special fitness landscapes
1264: depicted in Figures \ref{figtree} and \ref{figtree1d} also satisfy
1265: (ii,iii,iv,v) with $b=2$ and $s=4$.
1266:
1267: %------------------------------%
1268: \subsection{Random walk}
1269: %------------------------------%
1270: Consider a mutation induced random walk of a single individual. Due
1271: to the low evolution probability $p\ll 1$, most of the time will be
1272: spent on individuals of the lowest fitness $f_{min}$. As evolution is
1273: a tree, there is only one evolution sequence which leads to the global
1274: optimum. At each evolution step, the correct offspring species (out of
1275: $b$) has to be evolved. The probability of an evolution step in the
1276: right direction, hence, is $p/b$. $|F|$ evolution steps are necessary
1277: to reach $f_{max}$. Therefore, the expected time to find the global
1278: maximum by random walk is $T_{RW}\approx(b/p)^{|F|}$. Random walk is
very slow; it is exponential in the number of fitness levels $|F|$ with
a very large base $b/p$.
1281:
1282: %------------------------------%
1283: \subsection{FUSS}
1284: %------------------------------%
1285: Assume that $L$ fitness levels from $f_{min}$ to $f$ are occupied.
1286: The probability that FUSS selects an individual of fitness $f$ is
$1/L$. Under the additional assumption that the occupation of species
within one fitness level is approximately uniform most of the time,
the probability of selecting an individual of the promising species,
which can evolve to the global optimum, is $1/s$. The probability of
an evolution step in the right direction is $p/b$ as in the random
walk case. Hence, the total expected time for an evolution in the
right direction is $L\cdot s\cdot b/p$. The total time
$T_{FUSS}\approx\odt|F|^2\cdot s\cdot b/p$ for an evolution from $L=1$
to the global optimum $L=|F|$ is obtained by summation over
$L=1,\ldots,|F|$.
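
Written out, the summation reads as follows; the only step added here
is the standard identity for $\sum_L L$, and the last line drops the
lower-order term in $|F|$:
\bqan
  T_{FUSS} &\approx& \sum_{L=1}^{|F|} L\cdot s\cdot{b\over p} \\
  &=& {|F|(|F|+1)\over 2}\cdot s\cdot{b\over p} \\
  &\approx& \odt|F|^2\cdot s\cdot{b\over p}.
\eqan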
1297:
1298: %------------------------------%
1299: \subsection{FUDS}
1300: %------------------------------%
1301: A similar analysis can be applied to FUDS. Assume again that the $L$
1302: fitness levels from $f_{min}$ to $f$ are occupied and that the
1303: occupation of species within each fitness level is approximately
1304: uniform most of the time. Because FUDS tends to spread the population
1305: out, like FUSS, this assumption is not unreasonable. As FUDS is only
1306: a deletion scheme we must also specify a selection scheme. For our
1307: analysis we will take a very simple elitist selection scheme that half
1308: of the time selects an individual from the highest fitness level, and
1309: the other half of the time selects an individual from a lower level.
It follows then that the probability of selecting a promising species
is $\odt\cdot{1\over s}=1/(2s)$ and the probability that this then results in an
1312: evolutionary step in the right direction is $p/b$. Thus the total
1313: expected time for an evolutionary step in the right direction is $2
1314: \cdot s \cdot b/p$. Therefore by summation the total expected time to
1315: evolve to the global optimum is $T_{FUDS} \approx 2|F| \cdot s \cdot
1316: b/p$. Of course this analysis rests on our choice of selection scheme
1317: and the assumptions about the uniformity of the population that we
1318: have made. When FUDS is used with selection schemes which are very
1319: greedy these uniformity assumptions will likely be violated and less
1320: favorable bounds could result.
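
For comparison with FUSS, the corresponding sum can be sketched in
the same way. Under the elitist selection scheme assumed above, the
expected time per step does not grow with the number $L$ of occupied
levels, so the sum is linear rather than quadratic in $|F|$:
\beqn
  T_{FUDS} \;\approx\; \sum_{L=1}^{|F|} 2\cdot s\cdot{b\over p}
  \;=\; 2|F|\cdot s\cdot{b\over p}.
\eeqn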
1321:
1322: %------------------------------%
1323: \subsection{Standard selection}
1324: %------------------------------%
1325: We assumed a fixed number of $s$ species per fitness level and $0$ or
1326: $b$ offspring species. This implies that only a fraction of $1/b$
1327: species can evolve to higher fitness. We assume that fitness level
1328: $f$ has been taken over, i.e.\ most individuals have fitness $f$. The
1329: probability of evolution is $p$. A significant fraction (for
1330: simplicity we assume most) of the $|P|$ individuals must evolve to the
1331: next fitness level before evolution with a relevant rate can occur to
1332: the next to next level. Hence, the time to take over the next fitness
1333: level is roughly $|P|\cdot b/p$. As there are $|F|$ fitness levels,
1334: the total time is $T_{STD}
1335: \approxgeq |F|\cdot|P|\cdot b/p$.
1336:
1337: %\subsection{The problem of the population size}
1338: We wrote $\approxgeq$ as we have made two significant favorable
1339: assumptions. In order to ensure convergence, the promising species in
1340: the current fitness level has to be occupied. If we assume a uniform
1341: occupation of species within one fitness level, as for FUSS, this
1342: means that all species of the current fitness level have to be
1343: populated. As there are $s$ species, $|P|$ has to be at least $s$,
1344: which can be quite large. On the other hand, STD linearly slows down
1345: with $|P|$, unlike FUSS. Hence, there is a trade-off in the choice of
1346: $|P|$.
1347:
1348: %\subsection{The problem of non-promising takeover}
1349: More serious is the following problem. Assume that the first
1350: individual evolved with fitness $f+\eps$ is one in a non-promising
1351: species $a$. Due to selection pressure it might happen that
1352: species $a$ takes over the whole population before all (or at
1353: least the promising) species with fitness $f+\eps$ can evolve from
1354: the ones of fitness $f$. The probability to find the global
1355: optimum in the worst case scenario, where at each level only one
1356: species is occupied, is $(1/b)^{|F|}$. This is the original
1357: problem of the loss of genetic diversity discussed at the outset,
which led to the invention of FUSS.
1359:
1360: %\subsection{Conventional fix(es)}
1361: Every other fix the authors are aware of only seems to diminish the
1362: problem, but does not solve it. One fix is to repeatedly restart
the EA, but a huge number of $b^{|F|}$ restarts might be
necessary. The time is exponential in $|F|$, as for random walk,
but with a smaller base $b$. The true time is expected to be
1366: somewhere in between $|F|\cdot|P|\cdot b/p$ and this worst
1367: case analysis, although an unfavorable setting may never reach the
1368: global optimum ($T_{STD}=\infty$ in this case).
1369:
1370:
1371: %------------------------------%
1372: \subsection{Performance comparison}
1373: %------------------------------%
1374: The times $T_{FUSS}$, $T_{FUDS}$ and $T_{STD}$ should be regarded, at
1375: best, as rules of thumb, since the derivation was rather heuristic due
1376: to the list of assumptions. The quotient is more reliable:
1377: \beqn
1378: {T_{FUSS}\over T_{STD}} \quad\approxleq\quad
1379: {|F|\!\cdot\!s\over 2|P|} \quad\approxleq\quad
1380: \odt|F| \quad\leq\quad |F|,
1381: \eeqn
1382: and
1383: \beqn
1384: {T_{FUDS}\over T_{STD}} \quad\approx\quad
1385: {\frac{s }{|P|}} \quad\approx\quad 1.
1386: \eeqn
1387:
1388: We will give a more direct argument in Section \ref{secCross} that
1389: the slowdown of FUSS relative to STD is at most $|F|$.
1390:
1391: Finally, a truism has been recovered, namely that an EA can, under
1392: certain circumstances, be much faster than random walk, that is,
1393: $T_{RW}\gg T_{FUSS}, T_{FUDS}, T_{STD}$.
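
To give a rough sense of scale, the estimates can be evaluated for
one purely illustrative parameter set. The values $b=2$ and $s=4$ are
those of Figure \ref{figtree}; $p=10^{-2}$, $|F|=10$ and $|P|=8$ are
assumptions made only for this example:
\bqan
  T_{RW} &\approx& (b/p)^{|F|} = 200^{10} \approx 10^{23}, \\
  T_{FUSS} &\approx& \odt|F|^2\cdot s\cdot b/p = 40\,000, \\
  T_{FUDS} &\approx& 2|F|\cdot s\cdot b/p = 16\,000, \\
  T_{STD} &\approxgeq& |F|\cdot|P|\cdot b/p = 16\,000.
\eqan
For these values $T_{FUSS}/T_{STD}\approxleq 2.5$, which respects the
bound $\odt|F|=5$, while all three schemes are faster than random walk
by many orders of magnitude. Like the estimates themselves, these
numbers should be read as rules of thumb only.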
1394:
1395:
1396: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1397: \section{Scale-Independent Selection and Recombination}\label{secCross}
1398: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1399:
1400: %------------------------------%
1401: \subsection{Worst case analysis}
1402: %------------------------------%
1403: We now want to estimate the maximal possible slowdown of FUSS
1404: compared to STD.
1405: %\subsection{Best case for STD}
1406: Let us assume that all individuals in STD have fitness $f$, and
1407: once one individual with fitness $f+\eps$ has been found,
1408: takeover of level $f+\eps$ is quick. Let us assume that this
1409: quick takeover is actually good (e.g.\ if there are no local maxima).
1410: The selection probability of individuals of same fitness is equal.
1411: %\subsection{Worst case for FUSS}
1412: For FUSS we assume individuals in the range of $f_{min}$ and $f$.
1413: Uniformity is {\em not} necessary. In the worst case, a selection of
1414: an individual of fitness $<f$ never leads to an individual of
1415: fitness $\geq f$, i.e.\ is always useless. The probability of selecting
1416: an individual with fitness $f$ is $\geq{1\over|F|}$.
1417: %\subsection{Comparison}
At least every $|F|$th FUSS selection corresponds to an STD
selection. Hence, we expect a maximal slowdown by a factor of
$|F|$, since FUSS ``simulates'' STD statistically every $|F|$th
selection.
1422: %
1423: It is possible to construct problems where this slowdown occurs
1424: (unimodal function, local mutation $x\to x\pm\eps$, no
1425: crossover). Gradient ascent would be the algorithm of choice in this
1426: case. On the other hand, we have not observed this slowdown in our
1427: simple 2D example and the TSP experiments, where FUSS outperformed STD
1428: in solution quality/time (see the experimental results in
1429: Section~\ref{secEx2}). Since real world problems often lie in between
1430: these extreme cases it is desirable to modify FUSS to cope with simple
1431: problems as well, without destroying its advantages for complex
1432: objective functions.
1433:
1434: %------------------------------%
1435: \subsection{Quadratic slowdown due to recombination}
1436: %------------------------------%
1437: We have seen that $T_{FUSS}\leq|F|\cdot T_{STD}$. In the
1438: presence of recombination, a {\em pair} of individuals has to be
1439: selected. The probability that FUSS selects {\em two} individuals
1440: with fitness $f$ is $\geq{1\over|F|^2}$. Hence, in the worst case,
1441: there could be a slowdown by a factor of $|F|^2$ --- for {\em
1442: independent} selection we expect
1443: $T_{FUSS}\leq|F|^2\cdot T_{STD}$. This potential quadratic
1444: slowdown can be avoided by selecting one fitness value at random,
1445: and then two individuals of this single fitness value. For this
1446: {\em dependent} selection, we expect
1447: $T_{FUSS}\leq|F|\cdot T_{STD}$. On the other hand,
1448: crossing two individuals of different fitness can also be
1449: advantageous, like the crossing of $f=1$ with $f=2$
1450: individuals in the 2D example of Section \ref{secEx}.
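
The two cases can be made explicit for the worst case in which all
$|F|$ fitness levels are occupied. The symbols $P_{indep}$ and
$P_{dep}$ are introduced only for this illustration and denote the
probability that both selected individuals have the top fitness $f$:
\beqn
  P_{indep} = {1\over|F|}\cdot{1\over|F|} = {1\over|F|^2},
  \quad
  P_{dep} = {1\over|F|}\cdot 1 = {1\over|F|}.
\eeqn
For example, $|F|=100$ gives $P_{indep}=10^{-4}$ and $P_{dep}=10^{-2}$.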
1451:
1452: %0012(2ei)
1453: %------------------------------%
1454: \subsection{Scale independent selection}
1455: %------------------------------%
1456: A near optimal compromise is possible: a high selection
1457: probability $p(f)\sim 1$ if $f\approx f_{max}$ and $p(f)\sim
1458: {1\over|F|}$ otherwise. A ``scale
1459: independent'' probability distribution $p(f)\sim{1\over|f_{max}-f|}$
1460: is appropriate for this.
1461: %
1462: We define
1463: \beq\label{ptscale}
1464: p(f) \;:=\; {c\over\ln|F|}\cdot
1465: {1\over {1\over\eps}|f_{max}-f|+1}.
1466: \eeq
1467: The $+1$ in the denominator has been added to regularize the
1468: expression for $f=f_{max}$. The factor $c/\ln|F|$ ensures
1469: correct normalization ($\sum_f p(f)=1$). By using $\ln{b+1\over
1470: a}\leq\sum_{i=a}^b{1\over i}\leq\ln{b\over a-1}$, one can show
1471: that
$
{\ln|F|\over 1+\ln|F|} \leq c \leq 1,
$
i.e.\ $c\to 1$ for $|F|\to\infty$. In the following we assume
1476: $|F|\geq 3$, i.e.\ $c\geq \odt$.
1477: Apart from a minor additional logarithmic suppression of order
1478: $\ln|F|$ we have the desired behavior $p(f)\sim 1$
1479: for $f\approx f_{max}$ and $p(f)\sim {1\over|F|}$ otherwise:
1480: \beqn
1481: p(f_{max}-m\eps) \geq {1\over 2\ln|F|} \cdot
1482: {1\over m+1},
1483: \eeqn
1484: \beqn
1485: p(f) \geq {1\over 2\ln|F|} \cdot
1486: {1\over |F|} \quad\forall\,f
1487: \eeqn
1488: During optimization, the minimal/maximal fitness of an individual in
1489: population $P_t$ is $f_{min/max}^t$. In the definition of $p$ one has
1490: to use $F_t:=\{f_{min}^t,f_{min}^t+\eps,...,f_{max}^t\}$ instead of
1491: $F$, i.e.\ $|F|$ replaced with
$|F_t|={1\over\eps}(f_{max}^t-f_{min}^t)+1\leq|F|$. Hence (\ref{ptscale})
cannot be achieved by a static re-parametrization that replaces fitness $f$
with $g(f)$. Furthermore the important idea of sampling from
1495: a fitness level instead of individuals directly is still
1496: maintained. The only difference now is that the population will no
1497: longer converge to a fitness uniform one but to one with distribution
1498: $p(f)$ which is biased toward higher fitness but still never converges
1499: to a fittest individual. In the worst case, we expect a small slowdown
1500: of the order of $\ln|F|$ as compared to FUSS, as well as compared to
1501: STD.
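
The normalization constant in (\ref{ptscale}) can be made explicit. As
a sketch, write $f=f_{max}-m\eps$ with $m=0,\ldots,|F|-1$ and let
$H_n:=\sum_{i=1}^n{1\over i}$ denote the harmonic number (notation
introduced here):
\beqn
  \sum_f p(f) = {c\over\ln|F|}\sum_{m=0}^{|F|-1}{1\over m+1}
  = {c\,H_{|F|}\over\ln|F|} = 1
\eeqn
hence $c=\ln|F|/H_{|F|}$ and $p(f_{max}-m\eps)={1\over H_{|F|}}\cdot{1\over m+1}$.
For the smallest admitted case $|F|=3$ we get $H_3={11\over 6}$,
$c\approx 0.60$, and selection probabilities ${6\over 11}$,
${3\over 11}$, ${2\over 11}$ for $m=0,1,2$, which sum to $1$.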
1502:
1503: %------------------------------%
1504: \subsection{Scale independent pair selection}
1505: %------------------------------%
1506: It is possible to (nearly) have the best of independent and
1507: dependent selection: a high selection probability $p(f,f')\sim
1508: {1\over|F|}$ if $f\approx f'$ and $p(f,f')\sim {1\over|F|^2}$
1509: otherwise, with uniform marginal $p(f)={1\over|F|}$. The idea
1510: is to use a strongly correlated joint distribution for selecting a
1511: fitness pair. A ``scale independent'' probability distribution
1512: $p(f,f')\sim{1\over|f-f'|}$ is appropriate. We define the joint
1513: probability $\tilde p(f,f')$ of selecting two individuals of
1514: fitness $f$ and $f'$ and the marginal $\tilde p(f)$ as
1515: \begin{equation} \label{ptjoint}
1516: \tilde p(f,f') \;:=\; {1\over 2|F|\ln|F|}\cdot
1517: {1\over {1\over\eps}|f\!-\!f'|+1},
1518: \end{equation}
1519: \[
1520: \tilde p(f) \;:=\; \sum_{f'\in F}\tilde p(f,f')
1521: = \sum_{f'\in F}\tilde p(f',f).
1522: \]
1523:
1524: We assume $|F|\geq 3$ in the following. The $+1$ in the
1525: denominator has been added to regularize the expression for
1526: $f=f'$. The factor $(2|F|\ln|F|)^{-1}$ ensures correct
1527: normalization for $|F|\to\infty$. More precisely, using
1528: $\ln{b+1\over a}\leq\sum_{i=a}^b{1\over i}\leq\ln{b\over a-1}$,
1529: one can show that
1530: \beqn
1531: 1-{\textstyle{1\over\ln|F|}} \;\leq\;
1532: \sum_{f,f'\in F}\tilde p(f,f') \;\leq\; 1,\quad
1533: \odt \;\leq\; |F|\!\cdot\!\tilde p(f) \;\leq\; 1,
1534: \eeqn
1535: i.e.\ $\tilde p$ is not strictly normalized to $1$ and the
1536: marginal $\tilde p(f)$ is only approximately (within a factor of 2)
1537: uniform. The first defect can be corrected by appropriately
1538: increasing the diagonal probabilities $\tilde p(f,f)$. This also
1539: solves the second problem.
1540: \beq\label{pjoint}
1541: p(f,f') \;:=\; \left\{
1542: \begin{array}{ll}
1543: \tilde p(f,f') & \mbox{for}\quad f\neq f' \\
1544: \tilde p(f,f')+[{1\over|F|}-\tilde p(f)] &
                                  \mbox{for}\quad f=f' \\
1546: \end{array}
1547: \right.
1548: \eeq
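
As a small worked example (an illustration added here, with values
rounded to three decimals), take $|F|=3$, so that
$(2|F|\ln|F|)^{-1}\approx 0.152$. With rows and columns ordered by
fitness, (\ref{ptjoint}) gives
\beqn
  \tilde p \;\approx\; \left(
  \begin{array}{ccc}
    0.152 & 0.076 & 0.051 \\
    0.076 & 0.152 & 0.076 \\
    0.051 & 0.076 & 0.152
  \end{array}\right)
\eeqn
with marginals $\tilde p(f)\approx(0.278,\,0.303,\,0.278)$ and total
mass $\approx 0.860$, so $|F|\cdot\tilde p(f)\approx 0.83$ or $0.91$,
within the bounds above. Adding ${1\over|F|}-\tilde p(f)$ to the
diagonal as in (\ref{pjoint}) raises the diagonal entries to
$\approx(0.207,\,0.182,\,0.207)$, and every row then sums to
${1\over 3}$ up to rounding.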
1549:
1550: %------------------------------%
1551: \subsection{Properties of $p(f,f')$}
1552: %------------------------------%
1553: $p$ is normalized to $1$ with uniform marginal
1554: \[
1555: p(f):= \sum_{f'\in F} p(f,f') = {1\over|F|},
1556: \]
1557: \[
1558: \sum_{f,f'\in F} p(f,f') =
1559: \sum_{f\in F} p(f) = 1,
1560: \]
1561: \[
1562: p(f,f')\geq \tilde p(f,f').
1563: \]
1564: Apart from a minor additional logarithmic suppression of order
1565: $\ln|F|$ we have the desired behavior $p(f,f')\sim {1\over|F|}$
1566: for $f\approx f'$ and $p(f,f')\sim {1\over|F|^2}$ otherwise:
1567: \[
1568: p(f,f\pm m\eps) \geq {1\over 2\ln|F|} \cdot
1569: {1\over m+1} \cdot {1\over|F|},
1570: \]
1571: \[
1572: p(f,f') \geq {1\over 2\ln|F|} \cdot
1573: {1\over |F|^2}.
1574: \]
1575: During optimization, the minimal/maximal fitness of an individual in
1576: population $P_t$ is $f_{min/max}^t$. In the definition of $p$ one has
1577: to use $F_t:=\{f_{min}^t,f_{min}^t+\eps,...,f_{max}^t\}$ instead of
1578: $F$, i.e.\ $|F|$ replaced with
1579: $L:=|F_t|={1\over\eps}(f_{max}^t-f_{min}^t)+1\leq|F|$.
1580:
1581: %------------------------------%
1582: \subsection{Scale-Independent Deletion}
1583: %------------------------------%
1584: Just as the selection scheme FUSS has its dual in the deletion scheme
1585: FUDS, we can likewise create the dual of Scale-Independent Selection
1586: in the form of Scale-Independent Deletion. Thus rather than targeting
1587: deletion from the population so that the distribution becomes flat, as
1588: we do with FUDS, we now define a convex curve $g$ which is peaked at
1589: the fittest individual in the population and delete the population
1590: down so that it follows the shape of this curve. This retains some of
1591: the advantages of FUDS, for example the population cannot collapse to
1592: just a few fitness levels, and yet it recognizes that for many
1593: problems it is useful to bias the population distribution toward fit
1594: individuals. Of course such problems are less deceptive than the kind
1595: that FUSS and FUDS are intended for.
1596:
1597: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1598: \section{Continuous Fitness Functions}\label{secCont}
1599: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1600:
1601: %------------------------------%
1602: \subsection{Effective discretization scale}
1603: %------------------------------%
1604: Up to now we have considered a discrete valued fitness function
1605: with values in $F=\{f_{min},f_{min}+\eps,...,f_{max}\}$.
1606: In many practical problems, the fitness function is continuous
1607: valued with $F=[f_{min},f_{max}]$. We generalize FUSS, and some of
1608: the discussion of the previous sections to the continuous case by
1609: replacing the discretization scale $\eps$ by an effective
1610: (time-dependent) discretization scale $\hat\eps$. By construction, FUSS shifts
1611: the population toward a more uniform one. Although the fitness
1612: values are no longer equi-spaced, they still form a discrete set
1613: for finite population $P$. For a fitness uniform distribution, the
1614: average distance between (fitness) neighboring individuals is
1615: ${1\over|P_t|-1}(f^t_{max}-f^t_{min})=:\hat\eps$. We
1616: define $\hat
F_t:=\{f^t_{min},f^t_{min}+\hat\eps,...,f^t_{max}\}$, so that
$|\hat F_t| = {1\over\hat\eps}(f^t_{max}-f^t_{min})+1 =
1619: |P_t|$.
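
For instance, with the illustrative values $|P_t|=11$, $f^t_{min}=0$
and $f^t_{max}=1$ (assumed only for this example) we get
\beqn
  \hat\eps = {1-0\over 11-1} = 0.1, \quad
  \hat F_t = \{0,\,0.1,\,\ldots,\,1\}, \quad
  |\hat F_t| = 11.
\eeqn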
1620:
1621: %------------------------------%
1622: \subsection{FUSS}
1623: %------------------------------%
1624: Fitness uniform selection for a continuous valued function has already
1625: been mentioned in Section \ref{secFuss}. We just take a uniform random
1626: fitness $f$ in the interval
1627: $[f_{min}^t-\odt\hat\eps,f_{max}^t+\odt\hat\eps]$.
1628: Independent and dependent fitness pair selection as described in
1629: the last section works analogously. An $\hat\eps=0$ version of
1630: correlated selection does not exist; a non-zero $\hat\eps$ is
1631: important. A discrete pair $(f,f')$ is drawn with probability
1632: $p(f,f')$ as defined in (\ref{ptjoint}) and (\ref{pjoint}) with
1633: $\eps$ and $F$ replaced by $\hat\eps$ and $\hat F_t$. The
1634: additional suppression $\ln|\hat F|=\ln|P_t|$ is small for all
1635: practically realizable population sizes.
1636: In all cases an individual with fitness nearest to $f$ ($f'$) is
1637: selected from the population $P$ (randomly if there is
1638: more than one nearest individual).
1639:
1640: If we assume a fitness uniform distribution then our worst case bound
1641: of $T_{FUSS}\approxleq\sum_{t=1}^{T_{STD}}|P_t|$ is plausible, since
the probability of selecting the best individual is approximately
$1/|P_t|$. For constant population size we get a bound
1644: $T_{FUSS}\approxleq|P|\cdot T_{STD}$. For the preferred non-deletion
1645: case with population size $|P_t|=t$ the bound gets much worse
1646: $T_{FUSS}\approxleq\odt T_{STD}^2$.
1647: %\subsection{Problems of proportionate selection}
1648: This possible (but not necessary!) slowdown has similarities to
1649: the slowdown problems of proportionate selection in later
1650: optimization stages.
1651: %\subsection{Species definition}
1652: The species definition in Section \ref{secTree} has to be relaxed
1653: by allowing mutation sequences of individuals with
1654: $\hat\eps$-similar fitness.
1655: %\subsection{Larger $\hat\eps$}
1656: Larger choices of $\hat\eps$ may be favorable if the standard
1657: choice causes problems.
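
The two special cases of the bound follow by evaluating the sum; this
short derivation is added for illustration only:
\bqan
  |P_t|=|P|: & & \sum_{t=1}^{T_{STD}}|P| \;=\; |P|\cdot T_{STD}, \\
  |P_t|=t: & & \sum_{t=1}^{T_{STD}} t
  \;=\; \odt T_{STD}(T_{STD}+1) \;\approx\; \odt T_{STD}^2.
\eqan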
1658:
1659: %------------------------------%
1660: \subsection{FUDS}
1661: %------------------------------%
1662: Fitness uniform deletion already requires the range of the fitness
1663: function to be broken up into a finite number of intervals. While for
1664: discrete valued fitness functions the intervals may correspond to the
1665: unique values of the fitness function, this is not a requirement.
1666: Indeed if the population is small and the fitness function has a large
number of possible values then a coarser discretization is
1668: necessary. Continuous valued fitness functions can therefore be
1669: treated in exactly the same way and do not cause any special problems.
1670: In fact they are slightly simpler in that we are now free to choose
1671: the discretization as fine as we like without being limited by the
1672: number of possible fitness values. Of course, like in the discrete
1673: case, we still must choose a discretization which is appropriate given
1674: the size of the population.
1675:
1676:
1677: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1678: \section{The EA Test System}\label{secJfuss}
1679: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1680:
1681: To test FUSS and FUDS we have implemented an EA test system in Java.
1682: The complete source code along with the test problems presented in
1683: this paper and basic usage instructions can be downloaded from
1684: \cite{Legg:website}. The EA model we have chosen for our tests is the
so-called ``steady state'' model as opposed to the more usual
1686: ``generational'' model. In a generational EA in each generation we
1687: select an entirely new population based on the old population. The
1688: old population is then simply discarded. Under the steady state model
1689: that we use, each step of the optimization adds and removes just one
1690: individual at a time. Specifically the process occurs as follows:
1691: Firstly an individual is selected by the
1692: \emph{selection scheme} and then with a certain probability another
1693: individual is also selected and the \emph{crossover operator} is
1694: applied to produce a new individual. Then with another probability a
1695: \emph{mutation operator} is applied to produce the child individual
1696: which is then added to the population. We refer to the probability of
1697: crossing as the \emph{crossover probability} and the probability of
1698: mutating following a crossover as the \emph{mutation probability}. In
1699: the case where no crossover takes place the individual is always
1700: mutated to ensure that we are not simply adding a clone of an existing
1701: individual into the population. Finally, an individual must be
1702: deleted in order to keep the population size constant. This
1703: individual is selected by the \emph{deletion scheme}. The deletion
1704: scheme is important as it has the power to bias the population in a
1705: similar way to the selection scheme.
1706:
1707: Our task in this paper is to experimentally analyze how FUSS performs
1708: relative to other selection schemes and how FUDS performs relative to
1709: other deletion schemes. Because any particular run of a steady state
1710: EA requires both a selection and a deletion scheme to be used, there
1711: are many possible combinations that we could test. We have narrowed
1712: this range of possibilities down to just a few that are commonly used.
1713:
1714: Among the selection schemes, tournament selection is one of the
1715: simplest and most commonly used and we consider it to be roughly
1716: representative of other standard selection schemes which favor the
1717: fitter individuals in the population; indeed in the case of tournament
1718: size 2 it can be shown that tournament selection is equivalent to the
1719: linear ranking selection scheme \cite[Sec.2.2.4]{Hutter:92cfs}. With
1720: tournament selection we randomly pick a group of individuals and then
1721: select the fittest individual from this group. The size of the group
1722: is called the \emph{tournament size} and it is clear that the larger
1723: this group is the more likely we are to select a highly fit individual
1724: from the population. At some point in the future we may implement
other standard selection schemes to broaden our comparison; however, we
1726: expect the performance of these schemes to be at best comparable to
1727: tournament selection when used with a correctly tuned selection
1728: intensity.
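
The effect of the tournament size can be quantified under simplifying
assumptions that are ours and not a description of the test system:
the $k$ tournament members are drawn uniformly at random with
replacement, and ties in fitness are ignored. The tournament winner
then belongs to the fittest fraction $q$ of the population unless all
$k$ draws miss that fraction, i.e.\ with probability
\beqn
  1-(1-q)^k.
\eeqn
For $q=0.1$ this gives $0.19$ for TOUR2 and about $0.65$ for TOUR10,
compared to $0.1$ for uniform random selection.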
1729:
1730: Among the deletion schemes one of the most commonly used in steady
state EAs is random deletion. The rationale for this is that it is
1732: neutral in the sense that it does not skew the distribution of the
1733: population in any way. Thus whether the population tends toward high
1734: or low fitness etc.\ is solely a function of the selection scheme and
1735: its settings. Of course random deletion, unlike FUDS, makes no effort
1736: to preserve diversity in the population as all individuals have an
1737: equal chance of being removed. In this paper we will compare FUDS
against random deletion as this is the standard deletion scheme in
1739: situations where it is difficult or impossible to directly measure the
1740: similarity of individuals based on their genomes.
1741:
1742: \begin{figure*}[t]
1743: \includegraphics[width=0.485\textwidth]{Deceptive-R.eps}
1744: \includegraphics[width=0.01\textwidth]{space.eps}
1745: \includegraphics[width=0.485\textwidth]{Deceptive-F.eps}
1746: \caption{\label{SimpleProb} With random deletion (left graph) FUSS
1747: significantly outperforms TOURx and RAND. By switching to FUDS (right
graph) the performance of TOURx and RAND now scales the same as FUSS.}
1749: \end{figure*}
1750:
1751: When reporting test results we will adopt the following notation:
1752: TOUR2 means tournament selection with a tournament size of 2.
1753: Similarly for TOUR3, TOUR4 and so on. Under random selection, denoted
1754: RAND, all members of the population have an equal probability of being
1755: selected. This is sometimes called uniform selection. When a graph
1756: shows the performance of tournament selection over a range of
1757: tournament sizes we will simply write TOURx. Naturally FUSS indicates
1758: the fitness uniform selection scheme. To indicate the deletion scheme
1759: used we will add either the suffix \mbox{-R} or \mbox{-F} to indicate
1760: random deletion or FUDS respectively. Thus, TOUR10-R is tournament
1761: selection with a tournament size of 10 used with random deletion,
1762: while FUSS-F is FUSS selection used with FUDS deletion.
1763:
1764: The important free parameters to set for each test are the population
1765: size, and the crossover and mutation probabilities. Good values for
1766: the crossover and mutation probabilities depend on the problem and
1767: must be manually tuned based on experience as there are few
1768: theoretical guidelines on how to do this. For some problems
1769: performance can be quite sensitive to these values while for others
1770: they are less important. Our default values are 0.5 for both as this
1771: has often provided us with reasonable performance in the past.
1772:
1773: For each test we ran the system multiple times with the same mutation
1774: and crossover probabilities and the same population size. The only
1775: difference was which selection and deletion schemes were used by the
1776: code. Thus even if our various parameters, mutation operators etc.\
1777: were not optimal for a given problem, the comparison is still fair.
1778: Indeed we often deliberately set the optimization parameters to
1779: non-optimal values in order to compare the robustness of the systems.
1780:
1781: As a steady state optimizer operates on just one individual at a time,
1782: the number of cycles within a given run can be high, perhaps 100,000
1783: or more. In order to make our results more comparable to a
1784: generational optimizer we divide this number by the size of the
1785: population to give the approximate number of generations.
1786: Unfortunately the theoretical understanding of the relationship
1787: between steady state and generational optimizers is not strong. It
1788: has been shown that under the assumption of no crossover the effective
1789: selection intensity using tournament selection with size 2 is
1790: approximately twice as strong under a steady state EA as it is with a
1791: generational EA \cite{Rogers:99}. As far as we are aware a similar
1792: comparison for systems with crossover has not been performed.
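
The conversion from cycles to generations used here can be written
explicitly. If a run performs $t$ steady state cycles with a
population of size $N$ (our notation), the reported number of
generations is approximately
\beqn
  g \approx t/N.
\eeqn
For instance, with the purely illustrative values $t=100{,}000$ and
$N=1000$, a run corresponds to about $100$ generations.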
1793:
1794: Depending on the purpose of a test run, different stopping criteria
1795: were applied. For example, in situations where we wanted to graph how
1796: rapidly different strategies converged with respect to generations, it
1797: made sense to fix the number of generations. In other situations we
1798: wanted to stop a run once the optimizer appeared to have become stuck,
1799: that is, when the maximum fitness had not improved after some
1800: specified number of generations. In any case we explain for each test
1801: the stopping criterion that has been used.
1802:
1803: In order to generate reliable statistics we ran each test multiple
1804: times; typically 30 times but sometimes up to 100 times if the results
1805: were noisy. From these runs we then calculated the mean performance
1806: as well as the sample standard deviation and from this the standard
1807: error in our estimate of the mean. This value was then used to
1808: generate the 95\% confidence intervals which appear as error bars on
1809: the graphs.
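
To make this computation explicit, let $x_1,\ldots,x_n$ denote the
results of the $n$ runs of a test (our notation). Then
\bqan
  \bar x &=& \frac{1}{n}\sum_{i=1}^n x_i, \\
  s^2 &=& \frac{1}{n-1}\sum_{i=1}^n (x_i-\bar x)^2, \\
  \mbox{SE} &=& s/\sqrt{n},
\eqan
and the error bars span $\bar x \pm z\cdot\mbox{SE}$. The multiplier
$z$ is not specified above. Under a normal approximation one would
take $z\approx 1.96$, while a Student $t$ quantile with $n-1$ degrees
of freedom (about $2.05$ for $n=30$) gives slightly wider intervals.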
1810:
1811:
1812: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1813: \section{A Deceptive 2D Problem}\label{secEx2}
1814: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1815:
1816: The first problem we examine is the simple but highly deceptive 2D
1817: optimization problem which was theoretically analyzed in
1818: Section~\ref{secEx}. As in the theoretical analysis, we set up the
1819: mutation operator to randomly replace either the $x$ or $y$ position
1820: of an individual and the crossover to take the $x$ position from one
1821: individual and the $y$ position from another to produce an offspring.
1822: The size of the domain for which the function is maximized is just
1823: $\delta^2$ which is very small for small values of $\delta$, while the
local maximum at fitness level 3 covers most of the space. Clearly the
1825: only way to reach the global maximum is by leaving this local maximum
1826: and exploring the space of individuals with lower fitness values of 1
1827: or 2. Thus, with respect to the mutation and crossover operators we
1828: have defined, this is a deceptive optimization problem as these
1829: partitions mislead the EA \cite{Forrest:93}.
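
Written out, for parents $(x_1,y_1)$ and $(x_2,y_2)$ the two operators
act as
\bqan
  \mbox{mutation:} && (x_1,y_1) \mapsto (u,y_1) \mbox{ or } (x_1,u), \\
  \mbox{crossover:} && (x_1,y_1),(x_2,y_2) \mapsto (x_1,y_2),
\eqan
where $u$ is a fresh random coordinate. We assume here, for
illustration only, that $u$ is drawn uniformly from the coordinate's
domain and that each coordinate is equally likely to be the one
replaced. Two successive mutations, one per coordinate, can therefore
take an individual to any point of the space.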
1830:
1831: For this test we set the maximum population size to 1,000 and made 20
1832: runs for each $\delta$ value. With a steady state EA it is usual to
1833: start with a full population of random individuals. However for this
1834: particular problem we reduced the initial population size down to just
1835: 10 in order to avoid the effect of doing a large random search when we
1836: created the initial population and thereby distorting the scaling.
1837: Usually this might create difficulties due to the poor genetic
1838: diversity in the initial population. However due to the fact that any
1839: individual can mutate to any other in just two steps this is not a
1840: problem in this situation. Initial tests indicated that reducing the
1841: crossover probability from 0.5 to 0.25 improved the performance
1842: slightly and so we have used the latter value.
1843:
1844: The first set of results for the selection schemes used with random
1845: deletion appear in the left graph of Figure~\ref{SimpleProb}. As
1846: expected, higher selection intensity is a significant disadvantage for
1847: this problem. Indeed even with just a tournament size of 3 the number
1848: of generations required to find the maximum became infeasible to
1849: compute for smaller values of $\delta$. Our results confirm the
theoretical scaling orders of $\frac{1}{\delta^2}$ for TOUR2-R, and
$\frac{1}{\delta}$ for FUSS-R, as predicted in Section~\ref{secEx}. Be
1852: aware that this is a log-log scaled graph and so the different slopes
1853: indicate significantly different orders of scaling.
1854:
1855: In the second set of tests we switch from random deletion to FUDS.
1856: These results appear in the right graph of Figure~\ref{SimpleProb}.
1857: We see that with FUDS as the deletion scheme the scaling improves
1858: dramatically for RAND, TOUR2 and TOUR3. Indeed they are now of the
1859: same order $\frac{1}{\delta}$ as FUSS, as predicted in
1860: Section~\ref{secEx}. This shows that for very deceptive problems much
1861: higher levels of selection intensity can be applied when using FUDS
1862: rather than random deletion. The performance of FUSS-R is very
1863: similar to that of FUSS-F. This is not surprising as the population
1864: distribution under FUSS already tends to be approximately uniform
1865: across fitness levels and thus we expect the effect of FUDS to be
1866: quite weak.
1867:
1868: Although this problem was artificially constructed, the results
1869: clearly demonstrate how FUSS and FUDS can dramatically improve
1870: performance in some situations.
1871:
1872:
1873: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1874: \section{Traveling Salesman Problem}\label{secTSP}
1875: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1876:
1877: \begin{figure*}[t]
1878: \includegraphics[width=0.485\textwidth]{DTSPI-300gen-R.eps}
1879: \includegraphics[width=0.01\textwidth]{space.eps}
1880: \includegraphics[width=0.485\textwidth]{DTSPI-300gen-F.eps}
1881: \caption{\label{DTSP-1} TOUR3-R converged too slowly while TOUR12-R
1882: converged prematurely and became stuck. TOUR6-R appears to be about
1883: the correct tournament size for this problem, however it is still
1884: inferior to FUSS-R. With FUDS all of the selection schemes performed
1885: well though FUSS was still the best.}
1886: \end{figure*}
1887:
1888: A well known optimization problem is the so called Traveling Salesman
1889: Problem (TSP). The task is to find the shortest Hamiltonian cycle
1890: (path) in a graph of $N$ vertexes (cities) connected by edges of
1891: certain lengths. There exist highly specialized population based
1892: optimizers which use advanced mutation and crossover operators and are
1893: capable of finding paths less than one percent longer than the optimal
1894: path for up to $10^7$ cities
1895: \cite{Lin:73,Martin:96,Johnson:97,Applegate:00}. As our goal is only
1896: to study the relative performance of selection and deletion schemes,
1897: having a highly refined implementation is not important. Thus the
1898: mutation and crossover operators we used were quite simple: Mutation
1899: was achieved by just switching the position of two of the cities in
1900: the solution, while for crossover we used the partial mapped crossover
1901: technique \cite{Goldberg:85}. Fitness was computed by taking the
1902: reciprocal of the tour length.
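
In symbols (our notation), let a candidate solution be a permutation
$\pi$ of the $N$ cities and let $d(a,b)$ be the length of the edge
between cities $a$ and $b$. The tour length and fitness are then
\bqan
  L(\pi) &=& \sum_{i=1}^{N-1} d(\pi_i,\pi_{i+1}) + d(\pi_N,\pi_1), \\
  f(\pi) &=& 1/L(\pi),
\eqan
and a mutation picks two positions $i \neq j$ and exchanges $\pi_i$
and $\pi_j$. We assume here that the tour is closed by returning to
the first city, consistent with the Hamiltonian cycle formulation.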
1903:
1904: For our first set of tests we used randomly generated TSP problems,
1905: that is, the distance between any two cities was chosen uniformly from
1906: the unit interval $[0,1]$. We chose this as it is known to be a
1907: particularly deceptive form of the TSP problem as the usual triangle
1908: inequality relation does not hold in general. For example, the
1909: distance between cities $A$ and $B$ might be $0.1$, between cities $B$
1910: and $C$ $0.2$, and yet the distance between $A$ and $C$ might be
1911: $0.8$. The problem still has some structure though as efficient
1912: partial solutions tend to be useful building blocks for efficient
1913: complete tours.
1914:
1915: For this test we used random distance TSP problems with 20 cities and
1916: a population size of 1000. We found that changing the crossover and
1917: mutation probabilities did not improve performance and so these have
1918: been left at their default values of 0.5. Our stopping criterion was
1919: simply to let the EA run for 300 generations as this appeared to be
1920: adequate for all of the methods to converge and allowed us to easily
1921: graph performance versus generations.
1922:
1923: The first graph in Figure~\ref{DTSP-1} shows each of the selection
1924: schemes used with random deletion. We see that TOUR3-R has
1925: insufficient selection intensity for adequate convergence while
1926: TOUR12-R quickly converges to a local optimum and then becomes stuck.
1927: TOUR6-R has about the correct level of selection intensity for this
1928: problem and population size. FUSS-R however initially converges as
1929: rapidly as TOUR12-R but avoids becoming stuck in local optima. This
1930: suggests improved population diversity. The performance curve for
1931: FUSS-R is impressive, especially considering that it is parameterless.
1932:
1933: At first it might seem surprising that the maximum fitness with FUSS
1934: climbs very quickly for the first 20 generations, especially
1935: considering that FUSS makes no attempt to increase the average fitness
1936: in the population. However we can explain this very rapid rise in
1937: solution fitness by considering a simple example. Consider a
1938: situation where there is a large number of individuals in a small band
1939: of fitness levels, say 1,000 with fitness values ranging from 50 to
1940: 70. Add to this population one individual with a fitness value of 73.
1941: Thus the total fitness range contains 24 values. Whenever FUSS picks
1942: a random point from 72 to 73 inclusive this single individual with
1943: maximal fitness will be selected. That is, the probability that the
1944: single fittest individual will be selected is 2/24 = 0.083. In
1945: comparison under TOUR12 the probability that the fittest individual is
1946: selected is the same as the probability that it is picked for the
sample of 12 elements used for the tournament, which is approximately
12/1000 = 0.012. Thus the probability of the fittest individual in
1949: the population being selected is higher under FUSS than under TOUR12
1950: and so the maximum fitness would rise quickly to start with.
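
Side by side, the two selection probabilities in this example are
\bqan
  P_{\rm FUSS} &=& 2/24 \approx 0.083, \\
  P_{\rm TOUR12} &\approx& 12/1000 = 0.012,
\eqan
so the single fittest individual is selected roughly seven times more
often under FUSS. This assumes, as in the example, that FUSS draws its
target uniformly from the 24 integer fitness levels and selects the
individual whose fitness is nearest to it.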
1951:
1952: Previously in \cite{Legg:04fussexp} we speculated that this may have
1953: been responsible for performance problems that we had observed with
1954: FUSS in some situations. However further experimentation has shown
that very rapid rises in maximal fitness are quite rare and are also
very short-lived when they do occur, too short to cause any
significant diversity problems in the population. We now believe that
1958: the population distribution is to blame in these situations; something
1959: that we will explore in detail in Section~\ref{secSAT}.
1960:
1961: \begin{figure*}[t]
1962: \includegraphics[width=0.48\textwidth,height=0.30\textheight]{dtpsi20-s40-p250.eps}
1963: \includegraphics[width=0.02\textwidth]{space.eps}
1964: \includegraphics[width=0.48\textwidth,height=0.30\textheight]{dtpsi20-s40-p500.eps}
1965: \includegraphics[width=1.00\textwidth,height=0.02\textheight]{space.eps}
1966: \includegraphics[width=0.48\textwidth,height=0.30\textheight]{dtpsi20-s40-p1k.eps}
1967: \includegraphics[width=0.02\textwidth]{space.eps}
1968: \includegraphics[width=0.48\textwidth,height=0.30\textheight]{dtpsi20-s40-p5k.eps}
1969: \caption{\label{tsp} The performance of TOURx-F is much more stable
1970: than TOURx-R under variation in the selection intensity. Also both
1971: FUSS-R and FUSS-F produce very good results, especially with the
1972: larger populations.}
1973: \end{figure*}
1974:
1975: The second graph in Figure~\ref{DTSP-1} shows the same set of
1976: selection schemes but now using FUDS as the deletion scheme. With
1977: FUDS the performance of all of the selection schemes either stayed the
1978: same or improved. In the case of TOUR3 the improvement was dramatic
1979: and for TOUR12 the improvement was also quite significant. This is
1980: interesting because it shows that with fitness uniform deletion,
1981: performance can improve when the selection intensity is either too
1982: high or too low. That is, when using FUDS the performance of the EA
1983: now appears to be more robust with respect to variation in selection
1984: intensity.
1985:
1986: In the case of TOUR12-F this is evidence of improved population
1987: diversity as the EA is no longer becoming stuck. However for TOUR3-R
1988: the selection intensity is quite low and thus we would expect the
1989: population diversity to be relatively good. Thus the fact that
1990: TOUR3-F was so much better than TOUR3-R suggests that FUDS can have
1991: significant performance benefits that are not related to improved
1992: population diversity.
1993:
1994: Investigating further it seems that this effect is due to the way that
1995: FUDS focuses the deletion on the large mass of individuals which have
1996: an average level of fitness while completely leaving the less common
1997: fit individuals alone. This helps a system with very weak selection
1998: intensity move the mass of the population up through the fitness
1999: space. With higher selection intensity this problem tends not to
2000: occur as individuals in this central mass are less likely to be
2001: selected thus reducing the rate at which new individuals of average
2002: fitness are added to the population.
2003:
2004:
2005: In order to better understand how stable FUDS performance is when used
2006: with different selection intensities we ran another set of tests on
2007: random TSP problems with 20 cities and graphed how performance varied
2008: by tournament size. For these tests we set the EA to stop each run
2009: when no improvement had occurred in 40 generations. We also tested on
2010: a range of population sizes: 250, 500, 1000 and 5000. The results
2011: appear in Figure~\ref{tsp}.
2012:
2013: In these graphs we can now clearly see how the performance of TOURx-R
2014: varies significantly with tournament size. Below the optimal
2015: tournament size performance worsened quickly while above this value it
2016: also worsened, though more slowly. Interestingly, with a population
size of 5000 the optimal tournament size was about 6, while with smaller
populations the optimal value fell to just 4. Presumably this was
2019: partly because smaller populations have lower diversity and thus
2020: cannot withstand as much selection intensity.
2021:
2022: In contrast FUSS-R and FUSS-F appear as horizontal lines as they do
2023: not have a tournament size parameter. We see that they have performed
2024: as well as the optimal performance of TOURx-R without requiring any
2025: tuning. Indeed for larger populations FUSS-R appears to be even
2026: better than the optimally tuned performance of TOURx-R. This is a
2027: very positive result for the parameterless FUSS.
2028:
2029: Comparing FUDS with random deletion we also see impressive results.
2030: For every combination of selection scheme, tournament size and
2031: population size the result with FUDS was better than the corresponding
2032: result with random deletion, and in some cases much better.
2033: Furthermore these graphs clearly display the improved robustness of
2034: tournament selection with FUDS as TOURx-F produced near optimal
2035: results for all tournament sizes. Even with an optimally tuned
2036: tournament size FUDS increased performance, particularly with the
2037: smaller populations. Indeed for each population size tested the worst
2038: performance of TOURx-F was equal to the best performance of
2039: \mbox{TOURx-R}.
2040:
2041: With FUSS there was also a performance advantage when using FUDS,
2042: again more so with the smaller populations. The combination of both
2043: FUSS and FUDS was especially effective as can be seen by the
2044: consistently superior performance of FUSS-F across all of the graphs.
2045:
2046: More tests were run exploring performance with up to 100 cities.
2047: Although the performance of FUDS remained stronger than random
2048: deletion for very low selection intensity, for high selection
2049: intensity the two were equal. We believe that the reason for this is
2050: the following: When the space of potential solutions is very large
2051: finding anything close to a global optimum is practically impossible,
2052: indeed it is difficult to even find the top of a reasonable local
2053: optimum as the space has so many dimensions. In these situations it
2054: is more important to put effort into simply climbing in the space
2055: rather than spreading out and trying to thoroughly explore. Thus
2056: higher selection intensity can be an advantage for large problem
2057: spaces. At any rate, for large problems and with high selection
2058: intensity FUDS did not appear to hinder the performance, while with
2059: low selection intensity it continued to significantly improve it.
2060:
2061: \begin{figure*}[t]
2062: \includegraphics[width=0.485\textwidth]{SCPI42-p250.eps}
2063: \includegraphics[width=0.01\textwidth]{space.eps}
2064: \includegraphics[width=0.485\textwidth]{SCPI42-p500.eps}
2065: \includegraphics[width=1.00\textwidth,height=0.01\textheight]{space.eps}
2066: \includegraphics[width=0.485\textwidth]{SCPI42-p1k.eps}
2067: \includegraphics[width=0.01\textwidth]{space.eps}
2068: \includegraphics[width=0.485\textwidth]{SCPI42-p5k.eps}
2069: \caption{\label{scp-unbal} The performance of FUSS for the two smaller
2070: populations was relatively poor, while for the larger populations it
2071: matched the optimal performance of TOURx-R. FUDS again produced
2072: superior results to random deletion in all situations tested.}
2073: \end{figure*}
2074:
2075: Experiments were also performed using the more efficient ``2-Opt''
2076: mutation operator. As expected, this increased performance and allowed
2077: much higher selection pressure to be used. Of course the problem then
2078: no longer had the kind of deceptive structure that heavily punishes
2079: high selection pressure that we are looking for. Nevertheless, FUDS
2080: continued to significantly boost the performance of tournament
2081: selection, in particular when the tournament size was too small.
2082:
2083:
2084: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2085: \section{Set Covering Problem}\label{secSetCover}
2086: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2087:
2088: The set covering problem (SCP) is a reasonably well known NP-complete
optimization problem with many real world applications. Let $M \in
\{0,1\}^{m \times n}$ be a binary valued matrix with entries $m_{ij}$
and let $c_j > 0$ for $j \in \{1, \ldots, n \}$ be the cost of column
$j$. The goal is to find a minimum cost subset of the columns that
covers every row, that is, every row must contain a 1 in at least one
of the selected columns. Define
$x_j = 1$ if column $j$ is in our solution and 0 otherwise. We can
then express the cost of this solution as $\sum_{j=1}^n c_j x_j$,
which is to be minimized
subject to the condition that $\sum_{j=1}^n m_{ij} x_j \geq 1$ for $i
\in \{1, \ldots, m\}$.
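
In display form, the problem just described is the integer program
\bqan
  \mbox{minimize} && \sum_{j=1}^n c_j x_j \\
  \mbox{subject to} && \sum_{j=1}^n m_{ij} x_j \geq 1, \quad i=1,\ldots,m, \\
  && x_j \in \{0,1\}, \quad j=1,\ldots,n.
\eqan
As a small made-up illustration, take $m=2$, $n=3$, let the rows of
$M$ be $(1,0,1)$ and $(0,1,1)$, and let the costs be $c=(1,1,1.5)$.
Selecting only column 3 covers both rows at cost $1.5$, whereas
columns 1 and 2 together also cover both rows but cost $2$, so
$x=(0,0,1)$ is optimal.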
2097:
2098: Our system of representation, mutation operators and crossover follow
2099: that used by Beasley \cite{Beasley:96} and we compute the fitness by
2100: taking the reciprocal of the cost. The results presented here are
2101: based on the ``scp42'' problem from a standard collection of SCP
2102: problems \cite{Beasley:03}. The results obtained on other problems in
2103: this test set were similar. We found that increasing the crossover
2104: probability and reducing the mutation probability improved
2105: performance, especially when the selection intensity was low. Thus we
2106: have tested the system with a crossover probability of 0.8 and a
2107: mutation probability of 0.2. We performed each test at least 50 times
2108: in order to minimize the error bars. Our stopping criterion was to
terminate each run after no improvement in minimal cost had occurred
for 40 generations. The results for this test appear in
2112: Figure~\ref{scp-unbal}.
2113:
2114: \begin{figure*}[t]
2115: \includegraphics[width=0.485\textwidth]{CNF150-p500.eps}
2116: \includegraphics[width=0.01\textwidth]{space.eps}
2117: \includegraphics[width=0.485\textwidth]{CNF150-p5k.eps}
2118: \caption{\label{cnf-all}With low selection intensity TOURx-F performed
2119: slightly below TOURx-R, but was otherwise comparable. FUSS had
2120: serious difficulties.}
2121: \end{figure*}
2122:
2123:
2124: Similar to the TSP graphs we again see the importance of correctly
2125: tuning the tournament size with TOURx-R. We also see the optimal
range of performance for TOURx-R moving to the right as the population
size increases. This is what we would expect due to the greater
2128: diversity in larger populations. This kind of variability is one of
2129: the reasons why the selection intensity parameter usually has to be
2130: determined by experimentation.
2131:
2132: Unlike with TSP however, the performance of FUSS was less convincing
2133: in these results. With the smaller populations of 250 and 500 FUSS-R
2134: was only better than TOURx-R when the tournament size was very low or
2135: very high. With the larger populations of 1,000 and 5,000 the results
2136: were much better with FUSS-R performing as well as the optimal
2137: performance of TOURx-R. FUSS-F performed better than FUSS-R, in
2138: particular with the smaller populations though this improvement was
2139: still insufficient for it to match the optimal performance of TOURx-R
2140: in these cases. The fact that the performance of FUSS varied by
2141: population size suggests that FUSS might be experiencing some kind of
2142: population diversity problem. We will look more carefully at
2143: diversity issues in the next section.
2144:
2145: With FUDS the results were again very impressive. As with the TSP
tests, for all combinations of selection scheme, tournament size and
2147: population size that we tested, the performance with FUDS was superior
2148: to the corresponding performance with random deletion. This was true
2149: even when the tournament size was optimal. While the performance of
2150: TOURx-F did vary significantly with different tournament sizes, the
2151: results were more robust than TOURx-R, especially with the larger
2152: populations. Indeed for the larger two populations we again have a
2153: situation where the worst performance of TOURx-F is equal to the
2154: optimal performance of TOURx-R.
2155:
2156:
2157: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2158: \section{Maximum CNF3 SAT}\label{secSAT}
2159: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2160:
Maximum CNF3 SAT is a well known NP-hard optimization problem
2162: \cite{Crescenzi:04} that has been extensively studied. A three
2163: literal conjunctive normal form (CNF) logical equation is a boolean
2164: equation that consists of a conjunction of clauses where each clause
2165: contains a disjunction of three literals. So for example, $(a \lor b
2166: \lor \lnot c) \land ( a \lor \lnot e \lor f)$ is a CNF3 expression.
2167: The goal in the maximum CNF3 SAT problem is to find an instantiation
2168: of the variables such that the maximum number of clauses evaluate to
2169: true. Thus for the above equation if $a = F$, $b = T$, $c = T$, $e =
2170: T$, and $f = F$ then just one clause evaluates to true and thus this
2171: instantiation gets a score of one. Achieving significant results in
2172: this area would be difficult and this is not our aim; we are simply
2173: using this problem as a test to compare selection and deletion
2174: schemes.
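
Spelled out, the evaluation of the example instantiation above is
\bqan
  a \lor b \lor \lnot c &=& F \lor T \lor F \;=\; T, \\
  a \lor \lnot e \lor f &=& F \lor F \lor F \;=\; F,
\eqan
giving a score of $1$. Changing only $a$ to $T$ makes both clauses
true, so the maximum score of $2$ is attainable for this small
example.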
2175:
2176: Our test problems have been taken from the SATLIB collection of SAT
2177: benchmark tests \cite{Hoos:00}. The first test was performed on the
2178: full set of 100 instances of randomly generated CNF3 formula with 150
2179: variables and 645 clauses, all of which are known to be satisfiable.
2180: Based on test results the crossover and mutation probabilities were
2181: left at the default values. Our mutation operator simply flips one
2182: boolean variable and the crossover operator forms a new individual by
2183: randomly selecting for each variable which parent's state to take.
Fitness was simply taken to be the number of clauses satisfied. Again
2185: we tested across a range of tournament sizes and population sizes.
2186: The results of these tests appear in Figure~\ref{cnf-all}.
2187:
2188: We have shown only the population sizes of 500 and 5,000 as the other
2189: population sizes tested followed the same pattern. Interestingly for
2190: this problem there was no evidence of better performance with FUDS at
2191: higher selection intensities. Nor for that matter was there the
2192: decline in performance with TOURx-R that we have seen elsewhere.
2193: Indeed with random deletion the selection intensity appeared to have
no impact on performance at all. While maximum CNF3 SAT is an NP-hard
optimization problem, this lack of dependence on our selection
intensity parameter suggests that it may not have the deceptive
2197: structure that FUSS and FUDS are designed for.
2198:
2199: With low selection intensity FUDS caused performance to fall below
2200: that of random deletion; something that we have not seen before.
Because the advantages of FUDS have been more apparent with small
populations in other test problems, we also tested the system with a
2203: population size of only 150. Unfortunately no interesting changes in
2204: behavior were observed.
2205:
2206: \begin{figure*}[t]
2207: \includegraphics[width=0.485\textwidth]{CNF-TOUR4-R-popDist.eps}
2208: \includegraphics[width=0.01\textwidth]{space.eps}
2209: \includegraphics[width=0.485\textwidth]{CNF-TOUR4-B-popDist.eps}
2210: \includegraphics[width=1.00\textwidth,height=0.01\textheight]{space.eps}
2211: \includegraphics[width=0.485\textwidth]{CNF-FUSS-R-popDist.eps}
2212: \includegraphics[width=0.01\textwidth]{space.eps}
2213: \includegraphics[width=0.485\textwidth]{CNF-FUSS-B-popDist.eps}
2214: \caption{\label{cnf-pop} With TOUR4-R the population collapses to a
2215: narrow band of fitness levels while with TOUR4-F the distribution is
2216: flat. Under FUSS the population spreads out in both directions with
2217: FUSS-F in particular giving an extremely uniform distribution.}
2218: \end{figure*}
2219:
2220: While FUDS had minor difficulties, FUSS had serious problems for all
2221: the population sizes that we tested. We suspected that the uniform
2222: nature of the population distribution that should occur with both FUSS
2223: and FUDS might be to blame as we only expect this to be a benefit for
2224: very deceptive problems which are sensitive to the tuning of the
2225: selection intensity parameter. Thus we ran the EA with a population
2226: of 1000 and graphed the population distribution across the number of
2227: clauses satisfied at the end of the run. We stopped each run when the
2228: EA made no progress in 40 generations. The results of this appear in
2229: Figure~\ref{cnf-pop}.
2230:
2231: The first thing to note is that with TOUR4-R the population collapses
2232: to a narrow band of fitness levels, as expected. With TOUR4-F the
2233: distribution is now uniform, though practically none of the population
2234: satisfies fewer than 550 clauses. The reason for this is quite
2235: simple: While FUDS levels the population distribution out, TOUR4 tends
2236: to select the most fit individuals and thus pushes the population to
2237: the right from its starting point. In contrast, FUSS pushes the
2238: population toward currently unoccupied fitness levels. This results
2239: in the population spreading out in both directions and so the number
2240: of individuals with extremely poor fitness is much higher.
2241:
2242: Given that our goal is to find an instantiation that satisfies all 645
2243: clauses, it is questionable whether having a large percentage of the
2244: population unable to satisfy even 600 clauses is of much benefit.
2245: While the total population diversity under FUSS-F might be very high,
2246: perhaps the kind of diversity that matters the most is the diversity
2247: among the relatively fit individuals in the population. This should
2248: be true for all but the most excessively deceptive problems. By
2249: thinly spreading the population across a very wide range of fitness
2250: levels we actually end up with very few individuals with the kind of
2251: diversity that matters. Of course this depends on the nature of the
2252: problem we are trying to solve and the fitness function that we use.
2253:
2254: \begin{figure*}[t]
2255: \includegraphics[width=0.485\textwidth]{CNF-total-diversity-R.eps}
2256: \includegraphics[width=0.01\textwidth]{space.eps}
2257: \includegraphics[width=0.485\textwidth]{CNF-total-diversity-F.eps}
2258: \includegraphics[width=1.00\textwidth,height=0.01\textheight]{space.eps}
2259: \includegraphics[width=0.485\textwidth]{CNF-top-diversity-R.eps}
2260: \includegraphics[width=0.01\textwidth]{space.eps}
2261: \includegraphics[width=0.485\textwidth]{CNF-top-diversity-F.eps}
2262: \caption{\label{cnf-diver} While the total population diversity is
2263: very strong under FUSS, the diversity among fit individuals is weak.
2264: FUDS improves the total population diversity compared to random
2265: deletion, but has little effect on the diversity among the fit
2266: individuals.}
2267: \end{figure*}
2268:
2269: Fortunately with CNF3 SAT we can directly measure population diversity
by taking the average Hamming distance between individuals' genomes.
2271: While this means that the value of the fitness based similarity metric
2272: is questionable for this problem, as more direct methods like crowding
2273: can be applied, it is a useful situation for our analysis as it allows
2274: us to directly measure how effective FUSS and FUDS are at preserving
2275: population diversity. The hope of course is that any positive
2276: benefits that we have seen here will also carry over to problems where
2277: directly measuring the diversity is problematic.
2278:
2279: For the diversity tests we used a population size of 1000 again. For
2280: comparison we used FUSS, TOUR3 and TOUR12 both with random deletion
and with FUDS. In each run we calculated two different statistics:
the average Hamming distance between individuals in the whole
population, and the average Hamming distance between individuals whose
2284: fitness was no more than 20 below the fittest individual in the
2285: population at the time. These two measurements give us the ``total
2286: population diversity'' and ``high fitness diversity'' graphs in
2287: Figure~\ref{cnf-diver}.
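
To make the two statistics concrete, they can be written as follows.
This is only a sketch in notation introduced here for illustration;
in particular, averaging over unordered pairs of distinct individuals
is an assumed normalization. Let the population be
$P=\{x_1,\ldots,x_N\}$ with genomes $x_i\in\{0,1\}^n$, and let
$d_H(x_i,x_j)=\sum_{k=1}^n |x_{i,k}-x_{j,k}|$ be the Hamming
distance. For a subset $S\subseteq P$ with $|S|\geq 2$ define
\beqn
  D(S) \;=\; {2\over |S|(|S|-1)}
  \sum_{x_i,x_j\in S \atop i<j} d_H(x_i,x_j).
\eeqn
The total population diversity is then $D(P)$, and the high fitness
diversity is $D(P_{top})$ with
$P_{top}=\{x\in P: f(x)\geq f_{max}-20\}$, where $f_{max}$ is the
fitness of the fittest individual at the time. For example, the three
genomes $0000$, $0011$ and $0111$ have pairwise distances $2$, $1$
and $3$, so $D=2$. For binary genomes the sum of all pairwise
distances equals $\sum_{k=1}^n c_k(|S|-c_k)$, where $c_k$ is the
number of individuals in $S$ with a one at locus $k$, so $D(S)$ can
be computed in time $O(n|S|)$ rather than $O(n|S|^2)$.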
2288:
2289: We graphed these measurements against the solution cost of the fittest
2290: individual rather than the number of generations. This is only fair
2291: because if good solutions are found very quickly then an equally rapid
2292: decline in diversity is acceptable and to be expected. Indeed it is
2293: trivial to come up with a system which always maintains high
population diversity however long it runs, but is unlikely to find
2295: any good solutions. The results were averaged over all 100 problems in
2296: the test set. Because the best solution found in each run varied we
2297: have only graphed each curve until such a point where fewer than 50\%
2298: of the runs were able to achieve this level of fitness. Thus the
2299: terminal point at the right of each curve is representative of fairly
2300: typical runs rather than just a few exceptional ones that perhaps
2301: found unusually good solutions by chance.
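
One way to state this averaging and cut-off rule precisely is the
following sketch. The symbols $R$, $R_c$ and $D_r(c)$ are introduced
here for illustration only, and we assume that the average is taken
over exactly those runs that reached the given cost. Let $R$ be the
total number of runs, let $R_c$ be the set of runs whose fittest
individual reached solution cost $c$ or better, and let $D_r(c)$ be
the diversity measured in run $r$ when its best cost first reached
$c$. The plotted value is
\beqn
  \bar D(c) \;=\; {1\over |R_c|}\sum_{r\in R_c} D_r(c),
\eeqn
and the curve is drawn only for those $c$ with $|R_c|\geq \odt R$.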
2302:
2303: The top two graphs in Figure~\ref{cnf-diver} show the total population
diversity. As expected the diversity with TOUR3-R and TOUR12-R
declines steadily as finding better solutions becomes increasingly
2306: difficult and the population tends to collapse into a narrow band of
2307: fitness. As we would expect, the total population diversity with
2308: TOUR3-R is higher than with TOUR12-R. While FUSS-R declines initially
it then stabilizes at around 50 before becoming stuck. As the TOUR3-R
and TOUR12-R curves both extend further to the right even though
their total population diversity becomes quite low, this shows that
diversity problems in the population as a whole are not a significant
factor behind the performance problems with FUSS-R.
2314:
2315: The top right graph shows the same selection schemes, but this time
2316: with FUDS. As expected FUDS has significantly improved the total
2317: population diversity with both TOUR3 and TOUR12, while having little
2318: impact on FUSS which already has a relatively flat population
distribution. As the best solutions found by TOUR3-F and TOUR12-F
were no better than those found by TOUR3-R and TOUR12-R, this indicates that
2321: improved total population diversity is not a significant factor in the
performance of the EA for this type of optimization problem. That
FUDS has lifted the total diversity for TOUR3 and TOUR12 so that they
are now above FUSS-F is particularly interesting. This suggests that
while FUSS has high total population diversity, there appear to be
some more subtle effects that are causing the diversity to be lower
than it could be. It may be related to the fact that FUSS sometimes
2328: heavily selects from small groups within the population during the
2329: early stages of the optimization process, as we noted in
2330: Section~\ref{secTSP}. However we are not certain whether this is
2331: occurring in this case.
2332:
2333: On the lower set of graphs we see the diversity among the fitter
2334: individuals in the population; specifically those whose fitness is no
2335: more than 20 below the fittest individual in the population at the
2336: time. On the first graph on the left we see that TOUR3 has
2337: significantly greater diversity than TOUR12 with both deletion
2338: schemes. This is expected as TOUR3 tends to search more evolutionary
2339: paths while TOUR12 just rushes down a few. Disappointingly FUDS does
2340: not appear to have made much difference to the diversity among these
2341: highly fit individuals, though the curves do flatten out a little as
2342: the diversity drops below 30, so perhaps FUDS is having a slight
2343: impact.
2344:
2345: For both FUSS-R and FUSS-F the diversity among the fit individuals was
poor; indeed, it was even worse than TOUR12 for both deletion schemes.
2347: Thus, while the total population diversity with FUSS tends to be high,
2348: the diversity among the fittest individuals in the population can be
2349: quite poor. Furthermore, the curves for high fitness diversity all
2350: end once the diversity drops into the 12 to 17 range. As this pattern
2351: was absent from the graphs of total population diversity, this
2352: indicates that it is indeed the diversity among the relatively fit
2353: individuals in the population that most determines when the EA is
2354: going to become stuck.
2355:
2356: In summary, these results show that while FUSS has been successful in
2357: maximizing total population diversity, for problems such as CNF3 SAT
2358: this is not sufficient. It appears to be more important that the EA
2359: maximizes the diversity among those individuals which have higher
2360: fitness and in this regard FUSS is poor, which leads to poor
2361: performance. This is most likely a characteristic of optimization
2362: problems which, while still difficult, are not as deceptive as SCP or
2363: random TSP.
2364:
2365:
2366: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2367: \section{Conclusions and Future Research Directions}\label{secConc}
2368: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2369:
2370: We have addressed the problem of balancing the selection intensity
2371: in EAs, which determines speed versus quality of a solution. We
2372: invented a new fitness uniform selection scheme FUSS. It generates
2373: a selection pressure toward sparsely populated fitness levels.
2374: This property is unique to FUSS as compared to other selection
2375: schemes (STD).
2376: %
2377: It results in the desired high selection pressure toward higher
2378: fitness if there are only a few fit individuals. The selection
2379: pressure is automatically reduced when the number of fit
2380: individuals increases.
2381: %
2382: We motivated FUSS as a scheme which bounds the number of {\em
2383: similar} individuals in a population. We defined a universal
2384: similarity relation solely depending on the fitness, independent
2385: of the problem structure, representation and EA details.
2386: %
2387: We showed analytically by way of a simple example that FUSS can be
2388: much more effective than STD. A joint pair selection scheme for
2389: recombination has been defined.
2390: %
2391: A heuristic worst case analysis of FUSS compared to STD has been
2392: given. For this, the fitness tree model has been defined, which is an
2393: interesting analytic tool in itself.
2394: %
2395: FUSS solves the problem of population takeover and the resulting
2396: loss of genetic diversity of STD, while still generating enough
2397: selection pressure. It does not help in getting a more uniform
2398: distribution within a fitness level.
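
The automatic adjustment of the selection pressure can be illustrated
with a small calculation. It is only a sketch, under simplifying
assumptions introduced here: the fitness values are integers, all $L$
levels between $f_{min}$ and $f_{max}$ are occupied, and ties within
a level are broken uniformly at random. Recall that FUSS draws a
fitness value uniformly from $[f_{min},f_{max}]$ and selects an
individual with the nearest fitness. If level $f$ contains $n_f$
individuals, a particular individual $x$ with $f(x)=f$ is then
selected with probability
\beqn
  p(x) \;=\; {w_f\over (L-1)\,n_f},
\eeqn
where $w_f=1$ for interior levels and $w_f=\odt$ for the two extreme
levels, which attract samples from one side only. For instance, with
$L=2$, a single individual on the upper level and 99 individuals on
the lower level, the lone fit individual is selected with probability
$1/2$ and each of the others with probability $1/198$. Once the upper
level also holds 99 individuals, every individual is selected with
probability $1/198$ and the preference for the upper level has
disappeared.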
2399:
2400: We have also invented a related system called FUDS which achieves a
2401: similar effect to FUSS except that it works through deletion rather
2402: than through selection. This means that FUDS shares many of the
2403: important characteristics of FUSS including strong total population
2404: diversity and the impossibility of population collapse. We showed
2405: analytically that for a simple deceptive optimization problem the
2406: performance of STD when used with FUDS scales similarly to FUSS.
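
Schematically, and in notation used only for this illustration, let
$n_\ell$ be the number of individuals currently in fitness level
$\ell$. Recall that FUDS deletes from the most heavily populated
level, that is, it removes an individual from a level
\beqn
  \ell^* \;\in\; \mathop{\rm arg\,max}_{\ell}\; n_\ell .
\eeqn
Choosing uniformly at random among tied levels, and among the
individuals within $\ell^*$, is an assumption made here for
concreteness. Under this rule a level with $n_\ell=1$ can lose its
only member only if no level holds more than one individual. Hence,
whenever the population size exceeds the number of occupied levels,
no occupied level is emptied by deletion, and in particular a single
fittest individual is never deleted.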
2407:
2408: A test system has been constructed and used to evaluate the empirical
2409: performance of both FUSS and FUDS on a range of optimization problems
2410: with different population sizes, mutation probabilities and crossover
2411: probabilities. Their performance has been compared to the more
2412: standard methods of tournament selection and random deletion. For the
2413: artificial deceptive 2D optimization problem and random distance
2414: matrix TSP problems both FUSS and FUDS performed extremely well. For
2415: the deceptive 2D problem they dramatically improved the scaling
2416: exponent in the number of generations needed to find the global
2417: optimum. For the TSP problems FUSS-R performed as well as optimally
2418: tuned TOURx-R for all population sizes, and FUDS caused TOURx to
2419: perform near optimally for all tournament sizes and population sizes.
2420:
For SCP problems with small populations the performance of FUSS-R was
only better than TOURx-R when the tournament size was poorly set. For
populations larger than 1,000, however, FUSS-R continued to perform as
well as the optimal results for TOURx-R. FUDS was again consistently
superior, returning better results than random deletion for every
2426: combination of selection scheme, tournament size and population size
2427: tested.
2428:
For CNF3 SAT problems, however, we ran into difficulties. While FUDS
2430: significantly improved the performance of FUSS, it was inferior to
2431: random deletion for low selection intensities. In other cases the
2432: performance was comparable. FUSS however had serious performance
2433: problems. Further investigations revealed that this appears to be due
2434: to the small number of individuals in the population that have
2435: relatively high fitness when using FUSS. We measured the diversity in
2436: the population and found that while the total population diversity
2437: with FUSS was high, the diversity among the fit individuals was
2438: relatively poor. This produced a serious diversity problem in the
2439: population when combined with the fact that there are relatively few
2440: individuals of high fitness when using FUSS.
2441:
As the performance of TOURx-R was not impacted by high selection
intensity on the CNF3 SAT problem, this indicates that the problem
does not have the kind of deceptive nature that we were looking for,
namely one that harshly punishes greedy exploration. Perhaps for such
2446: problems a less extreme approach is called for. For example, rather
2447: than trying to spread the population across all fitness levels
2448: uniformly we should instead control the distribution so that it is
2449: biased toward high fitness but never collapses totally as it does with
2450: TOURx-R.
2451:
2452: We have experimented with a deletion scheme which deletes the
2453: population distribution down to a convex curve peaked at the fittest
2454: individual in the population. This is the deletion equivalent of the
2455: scale independent selection scheme described in
2456: Section~\ref{secCross}. Our results thus far indicate that the
2457: performance is equal or slightly superior to random deletion in all
2458: situations. However the dramatic improvements that FUDS has over
2459: random deletion in some cases are now less significant.
2460:
2461: Another possibility is to manipulate the fitness function to
2462: effectively achieve the same thing. For example, we have found that
2463: by taking the fitness to be the reciprocal of the number of
2464: unsatisfied clauses in the CNF3 SAT problem the performance of FUSS
improves significantly; indeed, it is then comparable to TOURx.
2466: Perhaps however it would be better to avoid these performance tricks
2467: and instead focus on extremely deceptive problems where high selection
2468: intensity is heavily punished, that is, the kinds of problems that
2469: FUSS and FUDS were specifically designed for.
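
To see why the reciprocal transformation biases FUSS toward fit
individuals, consider the following sketch, whose simplifying
assumptions are introduced here for illustration. Let $u(x)\geq 1$ be
the number of unsatisfied clauses, let the transformed fitness be
$f'(x)=1/u(x)$, and suppose that every value $u=1,\ldots,U$ occurs in
the population. FUSS samples uniformly on the $f'$ scale and selects
an individual with the nearest fitness, so an interior level $u$ is
chosen whenever the sample lies between the midpoints to its two
neighboring levels. This interval has width
\beqn
  \odt\Big({1\over u-1}-{1\over u+1}\Big) \;=\; {1\over u^2-1},
  \qquad 1<u<U .
\eeqn
With the number of satisfied clauses as fitness, all interior levels
have the same width and are chosen equally often. With the reciprocal
fitness, and assuming $U>20$, level $u=2$ is chosen $399/3=133$ times
more often than level $u=20$, so selection concentrates on the
individuals that are closest to satisfying all clauses.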
2470:
2471: %------------------------------%
2472: \subsection{Acknowledgments}
2473: %------------------------------%
2474: This work was supported by SNF grants 2100-67712.02 and 200020-107616.
2475:
2476: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2477: % Bibliography %
2478: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2479:
2480: \begin{small}
2481: \begin{thebibliography}{XXXX}
2482:
2483: \bibitem[ACR00]{Applegate:00}
2484: D.~Applegate, W.~Cook, and A.~Rohe.
2485: \newblock Chained {L}in-{K}ernighan for large traveling salesman problems.
2486: \newblock Technical report, Department of Computational and Applied
2487: Mathematics, Rice University, Houston, TX, 2000.
2488:
2489: \bibitem[Bak85]{Baker:85}
2490: J.~E. Baker.
2491: \newblock Adaptive selection methods for genetic algorithms.
2492: \newblock In {\em Proc. 1st International Conference on Genetic Algorithms and
2493: their Applications}, pages 101--111, Pittsburgh, PA, 1985. Lawrence Erlbaum
2494: Associates.
2495:
2496: \bibitem[BC96]{Beasley:96}
2497: J.~Beasley and P.~Chu.
2498: \newblock A genetic algorithm for the set covering problem.
2499: \newblock {\em European Journal of Operational Research}, 94:392--404, 1996.
2500:
2501: \bibitem[Bea03]{Beasley:03}
2502: J.~Beasley.
\newblock {OR}-{L}ibrary.
2504: \newblock {\em mscmga.ms.ic.ac.uk/jeb/orlib/scpinfo.html}, 2003.
2505:
2506: \bibitem[BHS91]{Baeck:91}
2507: T.~B{\"a}ck, F.~Hoffmeister, and H.~P. Schwefel.
2508: \newblock A survey of evolution strategies.
2509: \newblock In {\em Proc. 4th International Conference on Genetic Algorithms},
2510: pages 2--9, San Diego, CA, July 1991. Morgan Kaufmann.
2511:
2512: \bibitem[BT95]{Blickle:95a}
2513: T.~Blickle and L.~Thiele.
2514: \newblock A mathematical analysis of tournament selection.
2515: \newblock In {\em Proc. Sixth International Conference on Genetic Algorithms
2516: ({ICGA}'95)}, pages 9--16, San Francisco, California, 1995. Morgan Kaufmann
2517: Publishers.
2518:
2519: \bibitem[BT97]{Blickle:97}
2520: T.~Blickle and L.~Thiele.
2521: \newblock A comparison of selection schemes used in evolutionary algorithms.
2522: \newblock {\em Evolutionary {C}omputation}, 4(4):361--394, 1997.
2523:
2524: \bibitem[Cav70]{Cavicchio:70}
2525: D.~J. Cavicchio.
2526: \newblock {\em Adaptive search using simulated evolution}.
2527: \newblock PhD thesis, Unpublished doctoral dissertation, University of
2528: Michigan, Ann Arbor, 1970.
2529:
2530: \bibitem[CJ91]{Collins:91}
2531: R.~J. Collins and D.~R. Jefferson.
2532: \newblock Selection in massively parallel genetic algorithms.
2533: \newblock In {\em Proc. Fourth International Conference on Genetic Algorithms},
2534: San Mateo, CA, 1991. Morgan Kaufmann Publishers.
2535:
2536: \bibitem[CK03]{Crescenzi:04}
2537: P.~Crescenzi and V.~Kann.
2538: \newblock A compendium of {NP} optimization problems.
2539: \newblock {\em www.nada.kth.se/$\sim$viggo/problemlist/compendium.html}, 2003.
2540:
2541: \bibitem[EHM99]{Eiben:99}
2542: A.~E. Eiben, R.~Hinterding, and Z.~Michalewicz.
2543: \newblock Parameter control in evolutionary algorithms.
2544: \newblock {\em IEEE Transactions on Evolutionary Computation}, 3(2):124--141,
2545: 1999.
2546:
2547: \bibitem[Esh91]{Eshelman:91}
2548: L.~J. Eshelman.
\newblock The {CHC} adaptive search algorithm: How to have safe search when engaging
  in nontraditional genetic recombination.
\newblock In G.~J.~E. Rawlins, editor, {\em Foundations of genetic
2552: algorithms}, pages 265--283. Morgan Kaufmann, San Mateo, 1991.
2553:
2554: \bibitem[FM93]{Forrest:93}
2555: S.~Forrest and M.~Mitchell.
2556: \newblock What makes a problem hard for a genetic algorithm? {S}ome anomalous
2557: results and their explanation.
2558: \newblock {\em Machine Learning}, 13(2--3):285--319, 1993.
2559:
2560: \bibitem[GA85]{Goldberg:85}
D.~Goldberg and R.~Lingle.
\newblock Alleles, loci, and the traveling salesman problem.
2563: \newblock In {\em Proc. International Conference on Genetic Algorithms and
2564: their Applications}, pages 154--159. Lawrence Erlbaum Associates, 1985.
2565:
2566: \bibitem[GD91]{Goldberg:91}
2567: D.~E. Goldberg and K.~Deb.
2568: \newblock A comparative analysis of selection schemes used in genetic
2569: algorithms.
\newblock In G.~J.~E. Rawlins, editor, {\em Foundations of genetic
2571: algorithms}, pages 69--93. Morgan Kaufmann, San Mateo, 1991.
2572:
2573: \bibitem[Gol89]{Goldberg:89}
2574: D.~E. Goldberg.
2575: \newblock {\em Genetic Algorithms in Search, Optimization, and Machine
2576: Learning}.
2577: \newblock Addison-Wesley, Reading, Mass., 1989.
2578:
2579: \bibitem[GR87]{Goldberg:87}
2580: D.~E. Goldberg and J.~Richardson.
2581: \newblock Genetic algorithms with sharing for multi-modal function
2582: optimization.
2583: \newblock In {\em Proc. 2nd International Conference on Genetic Algorithms and
2584: their Applications}, pages 41--49, Cambridge, MA, July 1987. Lawrence Erlbaum
2585: Associates.
2586:
2587: \bibitem[Her92]{Herdy:92}
2588: M.~Herdy.
2589: \newblock Reproductive isolation as strategy parameter in hierarchically
2590: organized evolution strategies.
2591: \newblock In {\em Parallel problem solving from nature 2}, pages 207--217,
2592: Amsterdam, 1992. North-Holland.
2593:
2594: \bibitem[Hol75]{Holland:75}
2595: John~H. Holland.
\newblock {\em Adaptation in Natural and Artificial Systems}.
2597: \newblock University of Michigan Press, Ann Arbor, MI, 1975.
2598:
2599: \bibitem[HS00]{Hoos:00}
2600: H.~H. Hoos and T.~St{\"u}tzle.
2601: \newblock {SATLIB: An Online Resource for Research on {SAT}}.
2602: \newblock In {\em SAT 2000}, pages 283--292. IOS press, 2000.
2603:
2604: \bibitem[Hut91]{Hutter:92cfs}
2605: M.~Hutter.
2606: \newblock {I}mplementierung eines {K}lassifizierungs-{S}ystems.
2607: \newblock Master's thesis, Theoretische Informatik, TU M{\"u}nchen, 1991.
2608: \newblock 72 pages with C listing, in German,
2609: http://www.idsia.ch/$\sim$marcus/ai/pcfs.htm.
2610:
2611: \bibitem[Hut02]{Hutter:01fuss}
2612: M.~Hutter.
2613: \newblock Fitness uniform selection to preserve genetic diversity.
2614: \newblock In {\em Proc. 2002 Congress on Evolutionary Computation (CEC-2002)},
2615: pages 783--788, Washington D.C, USA, May 2002. IEEE.
2616:
2617: \bibitem[JM97]{Johnson:97}
2618: D.~S. Johnson and A.~McGeoch.
2619: \newblock The traveling salesman problem: {A} case study.
2620: \newblock In E.~H.~L. Aarts and J.~K. Lenstra, editors, {\em Local Search in
2621: Combinatorial Optimization}, Discrete Mathematics and Optimization,
2622: chapter~8, pages 215--310. Wiley-Interscience, Chichester, England, 1997.
2623:
2624: \bibitem[Jon75]{DeJong:75}
2625: {K. de} Jong.
2626: \newblock An analysis of the behavior of a class of genetic adaptive systems.
2627: \newblock {\em Dissertation Abstracts International}, 36(10), 5140B, 1975.
2628:
2629: \bibitem[Leg04]{Legg:website}
2630: S.~Legg.
2631: \newblock Website.
2632: \newblock {\em www.idsia.ch/$\sim$shane}, 2004.
2633:
2634: \bibitem[LH05]{Legg:05fuds}
2635: S.~Legg and M.~Hutter.
2636: \newblock Fitness uniform deletion for robust optimization.
2637: \newblock In {\em Proc. Genetic and Evolutionary Computation Conference
  ({GECCO'05})}, pages 1271--1278, Washington, D.C., 2005. ACM SigEvo.
2639:
2640: \bibitem[LHK04]{Legg:04fussexp}
2641: S.~Legg, M.~Hutter, and A.~Kumar.
2642: \newblock Tournament versus fitness uniform selection.
2643: \newblock In {\em Proc. 2004 Congress on Evolutionary Computation ({CEC'04})},
2644: pages 2144--2151, Portland, OR, 2004. IEEE.
2645:
2646: \bibitem[LK73]{Lin:73}
2647: S.~Lin and B.~W. Kernighan.
2648: \newblock An effective heuristic for the travelling salesman problem.
2649: \newblock {\em Operations Research}, 21:498--516, 1973.
2650:
2651: \bibitem[MO96]{Martin:96}
2652: O.~Martin and S.~Otto.
2653: \newblock Combining simulated annealing with local search heuristics.
2654: \newblock {\em Annals of Operations Research}, 63:57--75, 1996.
2655:
2656: \bibitem[MSV94]{Muehlenbein:94}
2657: Heinz M{\"u}hlenbein and Dirk Schlierkamp-Voosen.
2658: \newblock The science of breeding and its application to the breeder genetic
2659: algorithm ({BGA}).
2660: \newblock {\em Evolutionary {C}omputation}, 1(4):335--360, 1994.
2661:
2662: \bibitem[MT93]{Maza:93}
2663: {M. de la} Maza and B.~Tidor.
2664: \newblock An analysis of selection procedures with particular attention paid to
2665: proportional and {B}oltzmann selection.
2666: \newblock In {\em Proc. 5th International Conference on Genetic Algorithms},
2667: pages 124--131, San Mateo, CA, USA, 1993. Morgan Kaufmann.
2668:
2669: \bibitem[RPB99a]{Rogers:99gd}
2670: A.~Rogers and A.~Pr{\"u}gel-Bennett.
2671: \newblock Genetic drift in genetic algorithm selection schemes.
2672: \newblock {\em IEEE Transactions on Evolutionary Computation}, 3(4):298--303,
2673: 1999.
2674:
2675: \bibitem[RPB99b]{Rogers:99}
2676: A.~Rogers and A.~Pr{\"u}gel-Bennett.
2677: \newblock Modelling the dynamics of a steady-state genetic algorithm.
2678: \newblock In Wolfgang Banzhaf and Colin Reeves, editors, {\em Foundations of
2679: {G}enetic {A}lgorithms 5}, pages 57--68. Morgan Kaufmann, San Francisco, CA,
2680: 1999.
2681:
2682: \bibitem[Rud00]{Rudolph:00}
2683: G{\"u}nter Rudolph.
2684: \newblock On takeover times in spatially structured populations: array and
2685: ring.
2686: \newblock In K.~K. Lai, O.~Katai, M.~Gen, and B.~Lin, editors, {\em Proceedings
2687: of the Second Asia-Pacific Conference on Genetic Algorithms and Applications
2688: ({APGA}'00)}, pages 144--151, Hong Kong, PR China, 2000. Global-Link
2689: Publishing Company.
2690:
2691: \bibitem[SVM94]{Schlierkamp:94}
2692: D.~Schlierkamp-Voosen and H.~M{\"u}hlenbein.
2693: \newblock Strategy adaptation by competing subpopulations.
2694: \newblock In {\em {P}arallel {P}roblem {S}olving from {N}ature -- {PPSN III}},
2695: pages 199--208, Berlin, 1994. Springer.
2696: \newblock {L}ecture {N}otes in {C}omputer {S}cience 866.
2697:
2698: \bibitem[Whi89]{Whitley:89}
2699: D.~Whitley.
2700: \newblock The {G}{E}{N}{I}{T}{O}{R} algorithm and selection pressure: Why
2701: rank-based allocation of reproductive trials is best.
2702: \newblock In {\em Proc. Third International Conference on Genetic Algorithms
2703: ({ICGA}'89)}, pages 116--123, San Mateo, California, 1989. Morgan Kaufmann
2704: Publishers, Inc.
2705:
2706: \end{thebibliography}
2707: \end{small}
2708:
2709: \vspace{5ex}
2710: \begin{wrapfigure}[9]{l}{0.3\columnwidth}
2711: \vspace{-2.5ex}\includegraphics[width=0.3\columnwidth]{hutter.eps}
2712: \end{wrapfigure}
2713: \noindent{\bf Marcus Hutter}
2714: received the M.Sc.\ degree in computer science and the
2715: Ph.D.\ degree in theoretical particle physics from the (Technical)
2716: University, Munich, Germany, in 1992 and 1995, respectively.
2717:
2718: Thereafter, he developed algorithms in a medical software company
2719: for five years. Since 2000, he has published over 35 research papers
2720: while a Researcher with the Dalle Molle Institute for Artificial
2721: Intelligence (IDSIA), Lugano, Switzerland. He is the
2722: author of {\em Universal Artificial Intelligence} (EATCS: Springer,
2723: 2004). His current interests are centered around reinforcement
2724: learning, algorithmic information theory and statistics, universal
2725: induction schemes, adaptive control theory, and related areas.
2726:
2727: \vspace{5ex}
2728: \begin{wrapfigure}[9]{l}{0.3\columnwidth}
2729: \vspace{-2.5ex}\includegraphics[width=0.3\columnwidth]{legg.eps}
2730: \end{wrapfigure}
2731: \noindent{\bf Shane Legg}
2732: received the B.C.M.S.\ degree in mathematical and computer sciences
2733: from the University of Waikato, Hamilton, New Zealand, in 1996 and
2734: the M.Sc.\ degree in mathematics from Auckland University, Auckland,
2735: New Zealand, in 1997. He is currently working towards the Ph.D.\
2736: degree at the Dalle Molle Institute for Artificial Intelligence
2737: (IDSIA), Lugano, Switzerland.
2738:
After receiving the M.Sc.\ degree in 1997, he worked in a
2740: number of companies in New Zealand and the United States mainly
2741: focusing on commercial applications of artificial intelligence. His
2742: research is focused on genetic algorithms, complexity theory, and
2743: theoretical models of artificial intelligence.
2744:
2745: \end{document}
2746: