1: \documentclass[11pt]{article}
2: 
3: \usepackage{epsfig}
4: \usepackage{theorem}
5: \usepackage{enumerate}
6: \usepackage{apa-uiuc}
7: \usepackage{amsmath}
8: \usepackage{amssymb}
9: \usepackage{boxedminipage}
10: \usepackage{subfigure}
11: \usepackage{url}
12: 
13: \topmargin -7mm
14: \oddsidemargin 0mm
15: \evensidemargin 0mm
16: \headsep 0mm
17: \headheight 0mm
18: %\textwidth 6.5in
19: \textwidth 6.6in
20: \textheight 9in
21: \parskip 0.3ex
22: 
23: %\newcommand{\mysection}[1]{\vspace*{-0.75ex}\section{#1}\vspace*{-0.85ex}}
24: %\newcommand{\mysubsection}[1]{\vspace*{-0.5ex}\subsection{#1}\vspace{-0.5ex}}
25: 
26: \hyphenation{cross-over}
27: \hyphenation{re-com-bi-na-tion-based}
28: 
29: \begin{document}
30: \begin{titlepage}
31: \oddsidemargin 6mm
32: \vspace*{2.0in}
33: \vspace*{2ex}
34: \begin{center}
35: {\bf Fitness Inheritance in the \\Bayesian Optimization Algorithm}
36:  
37: \addvspace{0.25in}
38: {\bf Martin Pelikan and Kumara Sastry}\\
39: \addvspace{0.4in}
40: IlliGAL Report No. 2004009 \\
41: February 2004 \\
42:  
43: \vspace*{2.8in}
44: Illinois Genetic Algorithms Laboratory \\
45: University of Illinois at Urbana-Champaign \\
46: 117 Transportation Building \\
47: 104 S. Mathews Avenue \\
48: Urbana, IL 61801 \\
49: Office:  (217) 333-2346\\
50: Fax: (217) 244-5705 \\
51: \end{center}
52: \end{titlepage}
53:  
54: 
55: \title{Fitness Inheritance in the \\Bayesian Optimization Algorithm}
56: 
57: \author{Martin Pelikan\\
58: Dept. of Math. and Computer Science, 320 CCB\\
59: University of Missouri at St. Louis\\
60: 8001 Natural Bridge Rd.,
61: St. Louis, MO 63121\\
62: \url{pelikan@cs.umsl.edu}\\
63: ~\\
64: Kumara Sastry\\
65: Illinois Genetic Algorithms Laboratory, 107 TB\\
66: University of Illinois at Urbana-Champaign\\
67: 104 S. Mathews Ave.,
68: Urbana, IL 61801\\
69: \url{kumara@illigal.ge.uiuc.edu}
70: }
71: 
72: %\institute{~\vspace{2ex}\\
73: %~\vspace{2ex}\\\
74: %~\vspace{2ex}\\\
75: %~\vspace{2ex}\\\
76: %~\vspace{2ex}\\
77: %~\\
78: %~}
79: 
80: \maketitle 
81: 
82: %===========================================================================================================
83: 
84: \begin{abstract}
85: This paper describes how fitness inheritance can be used to estimate fitness for a proportion of newly sampled candidate solutions in the Bayesian optimization algorithm (BOA). The goal of estimating fitness for some candidate solutions is to reduce the number of fitness evaluations for problems where fitness evaluation is expensive. Bayesian networks used in BOA to model promising solutions and generate new ones are extended to allow not only for modeling and sampling candidate solutions, but also for estimating their fitness. The results indicate that fitness inheritance is a promising concept in BOA, because population-sizing requirements for building appropriate models of promising solutions lead to good fitness estimates even if only a small proportion of candidate solutions is evaluated using the actual fitness function. This can reduce the number of actual fitness evaluations by a factor of 30 or more.
86: \end{abstract}
87: 
88: %===========================================================================================================
89: 
90: \section{Introduction}
91: To ensure reliable convergence to a global optimum, genetic and evolutionary algorithms (GEAs) must often maintain a large population of candidate solutions for a number of iterations. However, in many real-world problems, fitness evaluation is computationally expensive and evaluating even moderately sized populations of candidate solutions is intractable. For example, fitness evaluation may include a large finite element analysis, it may consist of a complex traffic simulation, or it may require interaction with a human expert. 
92: 
93: This leads to an interesting question: Would it be possible to make GEAs evolve not only the population of candidate solutions, but also a model of fitness that could be used to evaluate a certain proportion of newly generated candidate solutions (fitness inheritance)? Several studies indicate that the answer is positive. Methods for fitness inheritance were proposed for the simple genetic algorithm (GA)~\cite{SmithR:94} and the univariate marginal distribution algorithm (UMDA)~\cite{Sastry:01}. In both cases, the results were promising and suggested that fitness inheritance can significantly reduce the number of fitness evaluations. 
94: 
95: The purpose of this paper is to propose a method that uses models of promising solutions developed by the Bayesian optimization algorithm (BOA)~\cite{Pelikan:99a,Pelikan:thesis} to model the fitness landscape and estimate fitness of newly generated candidate solutions. Two types of models are considered: (1) traditional Bayesian networks with full conditional probability tables (CPTs) used in BOA and (2) Bayesian networks with local structures used in BOA with decision graphs (dBOA)~\cite{Pelikan:01a*} and the hierarchical BOA (hBOA)~\cite{Pelikan:01*,Pelikan:03b}. Since the model in BOA captures significant nonlinearities in the fitness landscape, using this model as the basis for developing a model of the fitness landscape seems to be a promising approach. Of course, other methods, such as neural networks or various regression models, could be used instead. The proposed method is examined on BOA with decision trees on three example problems: onemax, concatenated traps of order 4, and concatenated traps of order 5. The results indicate that fitness inheritance is beneficial in BOA even if less than $1\%$ of candidate solutions are evaluated using the actual fitness function. Due to the population-sizing requirements for creating a correct model of promising solutions, the number of actual fitness evaluations decreases as the proportion of candidate solutions that inherit fitness increases.
96: 
97: The paper starts by discussing BOA and previous fitness inheritance studies. Section~\ref{section-BOA-inheritance} presents the proposed method for fitness inheritance in BOA. Section~\ref{section-experiments} presents and discusses experimental results. Section~\ref{section-conclusions} summarizes and concludes the paper.
98: 
99: %===========================================================================================================
100: 
101: \section{Bayesian optimization algorithm}
102: Probabilistic model-building genetic algorithms (PMBGAs)~\cite{Pelikan:02} replace traditional variation operators of genetic and evolutionary algorithms~\cite{Holland:75a,Goldberg:89d} by building a probabilistic model of promising solutions and sampling the model to generate new candidate solutions. The Bayesian optimization algorithm (BOA)~\cite{Pelikan:99a} uses Bayesian networks to model promising solutions. 
103: 
104: BOA evolves a population of candidate solutions to the given problem. The first population of candidate solutions is usually generated randomly according to a uniform distribution over all solutions. The population is updated for a number of iterations using two basic operators: (1) selection, and (2) variation. The selection operator selects better solutions at the expense of the worse ones from the current population, yielding a population of promising candidates. The variation operator starts by learning a probabilistic model of the selected solutions that encodes features of these promising solutions and the inherent regularities. Bayesian networks are used to model promising solutions because Bayesian networks are among the most powerful tools for capturing and representing decomposition, which is an inherent feature of most complex real-world systems. The variation operator then proceeds by sampling the probabilistic model to generate new solutions, which are incorporated into the original population. Here, a simple replacement scheme is used where new solutions fully replace the original population. A more detailed description of BOA can be found in~\citeN{Pelikan:thesis}.
105: 
106: The remainder of this section discusses Bayesian networks, which are going to serve as the basis for developing the model of fitness in BOA.
107: 
108: %The remainder of this section describes Bayesian networks that are going to serve as the basis for developing the model of fitness in BOA. It is also discussed how local structures can be used in place of full conditional probability tables to enable more efficient representation of local conditional probability distributions in Bayesian networks.
109: 
110: \subsection{Bayesian Networks}
111: 
112: Bayesian networks (BNs)~\cite{Howard:81,Pearl:88,Buntine:91} are among the most popular graphical models, where statistics, modularity, and graph theory are combined in a practical tool for estimating probability distributions and inference. A Bayesian network is defined by two components: (1) a structure, and (2) parameters. The structure is encoded by a directed acyclic graph with the nodes corresponding to the variables in the modeled data set (in this case, to the positions in solution strings) and the edges corresponding to conditional dependencies. The parameters are represented by a set of conditional probability tables (CPTs) specifying a conditional probability for each variable given any instance of the variables that the variable depends on. 
113: 
114: A Bayesian network encodes a joint probability distribution\index{distribution!joint} given by
115: \begin{equation}
116: \label{eq-joint-distribution}
117: p(X) = \prod_{i=1}^{n}{p(X_i| \Pi_i)},
118: \end{equation}
119: where $X=(X_1,\ldots,X_n)$ is a vector of all the variables in the problem; $\Pi_i$ is the set of parents of $X_i$ (the set of nodes from which there exists an edge to $X_i$); and $p(X_i| \Pi_i)$ is the conditional probability of $X_i$ given its parents $\Pi_i$.
120:                                                                                       
121: A directed edge relates two variables so that, in the encoded distribution, the variable corresponding to the terminal node is conditioned on the variable corresponding to the initial node. If a node has several incoming edges, the corresponding variable is conditioned on all its parents. In addition to encoding dependencies, each Bayesian network encodes a set of independence assumptions: each variable is independent of all its predecessors in the ancestral ordering, given the values of the variable's parents.
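To make Equation~\ref{eq-joint-distribution} concrete, the following Python sketch evaluates the joint probability of a binary string under a Bayesian network whose parameters are stored as full CPTs. The encoding (\texttt{parents}, \texttt{cpt}) and the function name \texttt{joint\_probability} are hypothetical illustrations rather than BOA's actual data structures, and variables are indexed from 0 as is usual in Python.

\begin{verbatim}
from typing import Dict, List, Tuple

# Hypothetical encoding of a binary Bayesian network (not BOA's data structure):
# parents[i] lists the parent indices of X_i, and cpt[i] maps every instance of
# those parents to p(X_i = 1 | parents), i.e. 2^k entries for k parents.
Parents = List[List[int]]
CPT = List[Dict[Tuple[int, ...], float]]


def joint_probability(x: Tuple[int, ...], parents: Parents, cpt: CPT) -> float:
    """Evaluate p(x) as the product of p(x_i | pi_i) over all variables."""
    prob = 1.0
    for i, xi in enumerate(x):
        pi = tuple(x[j] for j in parents[i])  # instance of the parents of X_i
        p_one = cpt[i][pi]
        prob *= p_one if xi == 1 else 1.0 - p_one
    return prob


# Two-variable example X_0 -> X_1 (0-based indices).
parents = [[], [0]]
cpt = [{(): 0.5}, {(0,): 0.2, (1,): 0.9}]
total = sum(joint_probability((a, b), parents, cpt)
            for a in (0, 1) for b in (0, 1))
\end{verbatim}

Summing \texttt{joint\_probability} over all four strings of this example yields 1 (up to rounding), as expected for a valid distribution.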
125: 
126: To learn Bayesian networks, a greedy algorithm is usually used for its efficiency and robustness. The greedy algorithm starts with an empty Bayesian network. Each iteration then adds the edge that improves the quality of the network the most, considering only edges that keep the network acyclic. Network quality can be measured by any popular scoring metric for Bayesian networks, such as the Bayesian Dirichlet metric with likelihood equivalence (BDe)~\cite{Cooper:92,Heckerman+al:94} or the Bayesian information criterion (BIC)~\cite{Schwarz78a}. The learning terminates when no edge addition improves the network quality.
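The greedy procedure can be sketched in Python as follows. As an assumption for illustration, the sketch scores networks with the plain BIC metric for binary variables, considers only edge additions, and rejects every edge that would create a cycle; the names \texttt{node\_bic}, \texttt{reachable}, and \texttt{greedy\_bn} are hypothetical, and the metric is not the combination of BDe and BIC used later in this paper.

\begin{verbatim}
import math
from itertools import product
from typing import List, Sequence, Set, Tuple

Data = Sequence[Sequence[int]]


def node_bic(data: Data, i: int, parents: Sequence[int]) -> float:
    """BIC term of binary node X_i: log-likelihood minus (log N / 2) * #params."""
    score = 0.0
    for pi in product((0, 1), repeat=len(parents)):
        rows = [r for r in data if all(r[p] == v for p, v in zip(parents, pi))]
        ones = sum(r[i] for r in rows)
        for count in (ones, len(rows) - ones):
            if count > 0:
                score += count * math.log(count / len(rows))
    # One free parameter p(X_i = 1 | pi) per parent instance: 2^k in total.
    return score - 0.5 * math.log(len(data)) * 2 ** len(parents)


def reachable(parents: List[Set[int]], src: int, dst: int) -> bool:
    """True if the network contains a directed path from src to dst."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            # The children of node are the variables that list it as a parent.
            stack.extend(c for c, ps in enumerate(parents) if node in ps)
    return False


def greedy_bn(data: Data) -> Set[Tuple[int, int]]:
    """Greedily add the edge (u, v) with the largest BIC gain until none helps."""
    n = len(data[0])
    parents: List[Set[int]] = [set() for _ in range(n)]
    scores = [node_bic(data, i, []) for i in range(n)]
    while True:
        best_gain, best_edge = 0.0, None
        for u, v in product(range(n), repeat=2):
            # Skip self-loops, existing edges, and edges that would close a cycle.
            if u == v or u in parents[v] or reachable(parents, v, u):
                continue
            gain = node_bic(data, v, sorted(parents[v] | {u})) - scores[v]
            if gain > best_gain:
                best_gain, best_edge = gain, (u, v)
        if best_edge is None:
            return {(p, c) for c in range(n) for p in parents[c]}
        u, v = best_edge
        parents[v].add(u)
        scores[v] = node_bic(data, v, sorted(parents[v]))


# X_1 always copies X_0, while X_2 is independent of both.
data = [(a, a, c) for a in (0, 1) for c in (0, 1)] * 25
\end{verbatim}

Because BIC decomposes over the nodes, only the score of the node that receives the new parent must be recomputed after each addition.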
127: 
128: \subsection{Conditional probability tables (CPTs)}
129: Conditional probability tables (CPTs) store conditional probabilities $p(X_i|\Pi_i)$ for each variable $X_i$. The number of conditional probabilities for a variable that is conditioned on $k$ parents grows exponentially with $k$. For binary variables, for instance, the number of conditional probabilities is $2^k$, because there are $2^k$ instances of $k$ parents and it is sufficient to store the probability of the variable being $1$ for each such instance. Figure~\ref{figure-decision-tree-and-graph}a shows an example CPT for $p(X_1|X_2,X_3,X_4)$.
130: 
131: However, the conditional probabilities often contain regularities; for example, several instances of the parents may share the same conditional probability, yet a full CPT must store it separately for each instance. Furthermore, the exponential growth of full CPTs often obstructs the creation of models that are both accurate and efficient. That is why Bayesian networks are often extended with local structures that allow more efficient representation of local conditional probability distributions than full CPTs~\cite{Chickering:97,Friedman:99}. 
132: 
133: \subsection{Decision trees and graphs for conditional probabilities}
134: 
135: %Local structures in Bayesian networks are used to represent local conditional probabilities of each variable more efficiently than traditional CPTs. 
136: Decision trees are among the most flexible and efficient local structures; the conditional probabilities of each variable are stored in a separate decision tree. Each internal (non-leaf) node in the decision tree for $p(X_i|\Pi_i)$ has a variable from $\Pi_i$ associated with it, and the edges connecting the node to its children stand for different values of the variable. For binary variables, there are two edges coming out of each internal node; one edge corresponds to 0, whereas the other edge corresponds to 1. For variables with more than two values, either one edge can be used for each value, or the values can be grouped into several categories with one edge per category. 
137: 
138: Each path in the decision tree for $p(X_i|\Pi_i)$ that starts in the root of the tree and ends in a leaf encodes a set of constraints on the values of variables in $\Pi_i$. Each leaf stores the conditional probability of $X_i=1$ given the condition specified by the path from the root of the tree to the leaf. A decision tree can encode the full conditional probability table for a variable with $k$ parents if it splits into $2^k$ leaves, each corresponding to a unique condition. However, when several instances of the parents share the same probability, a decision tree can represent them with a single leaf, which makes it a more efficient and flexible representation of local conditional distributions. See Figure~\ref{figure-decision-tree-and-graph}b for an example decision tree for the conditional probability table presented earlier.
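A decision tree for $p(X_i|\Pi_i)$ can be stored as a small recursive structure, and looking up a conditional probability amounts to following a single root-to-leaf path. The Python sketch below assumes binary parents; the class \texttt{Node}, the function \texttt{lookup}, and the probabilities in the example tree are hypothetical and do not correspond to Figure~\ref{figure-decision-tree-and-graph}b.

\begin{verbatim}
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Node:
    """Node of a decision tree for p(X_i = 1 | Pi_i) with binary parents.

    Internal nodes set `var` (the tested parent) and both children;
    leaves set only `p_one`.
    """
    var: Optional[int] = None
    zero: Optional["Node"] = None  # followed when the tested parent is 0
    one: Optional["Node"] = None   # followed when the tested parent is 1
    p_one: Optional[float] = None  # p(X_i = 1 | condition on the path)


def lookup(tree: Node, x: Mapping[int, int]) -> float:
    """Return p(X_i = 1 | x) by following the path selected by x."""
    node = tree
    while node.var is not None:
        node = node.one if x[node.var] == 1 else node.zero
    return node.p_one


# Hypothetical tree for p(X_1 | X_2, X_3, X_4): X_4 is never tested, and X_3
# matters only when X_2 = 1, so 3 leaves replace the 8 rows of a full CPT.
tree = Node(var=2,
            zero=Node(p_one=0.3),
            one=Node(var=3, zero=Node(p_one=0.6), one=Node(p_one=0.9)))
\end{verbatim}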
139: 
140: A decision graph allows more edges to terminate in a single node. In other words, internal nodes in the decision tree are allowed to share children and, as a result, each node can have more than one parent. That makes this representation even more flexible. However, our experience indicates that, in BOA, decision graphs usually do not provide better performance than decision trees. See Figure~\ref{figure-decision-tree-and-graph}c for an example decision graph.
141: 
142: To learn Bayesian networks with decision trees, the decision tree for each variable $X_i$ is initialized to an empty tree with a univariate probability of $X_i=1$. In each iteration, every possible split of every leaf of every decision tree is evaluated to determine how much it improves the quality of the current network, and the best split is performed. The learning is finished when no split improves the current network anymore. The quality of each model can be estimated using any popular scoring metric. Here we use a combination of the BDe~\cite{Cooper:92,Heckerman+al:94} and BIC~\cite{Schwarz78a} metrics, where the BDe score is penalized with the number of bits required to encode parameters~\cite{Pelikan:thesis}. For decision graphs, a merge operation is also introduced, which merges two leaves of the same decision graph.
143: 
144: \begin{figure}[t]
145: \begin{center}
146:   \subfigure[Conditional probability table.]{~~~~~~~~~~~~~\epsfig{file=cpt.eps,height=1.6in}~~~~~~~~~~}\hspace{0in}
147: \subfigure[Decision tree.]{\epsfig{file=decision-tree.eps,height=1.6in}}\hspace{0in}
148: \subfigure[Decision graph.]{~~~~~~~\epsfig{file=decision-graph.eps,height=1.6in}}
149: \end{center}
150: \vspace*{-3ex}
151: \caption{A conditional probability table for $p(X_1|X_2,X_3,X_4)$ using traditional representation (a) as well as local structures (b and c).}
152: \vspace*{-2ex}
153: \label{figure-decision-tree-and-graph}
154: \end{figure}
155: 
156: \section{Previous fitness inheritance studies}
157: 
158: %Fitness inheritance is important because  However, for many problems evaluation of candidate solutions is time consuming and it can take minutes, hours, or even days, to evaluate each solution. Since there are many candidate solutions processed by practically any genetic and evolutionary algorithm, it might be a good idea to develop a model of the evaluation function and evaluate a proportion of candidate solutions given this model. 
159: 
160: %Although the idea of modeling the evaluation function or fitness sounds promising and may be the only way to approach many challenging problems, only few researchers published work in this area. 
161: 
162: Despite the importance of fitness inheritance in robust population-based search, surprisingly few studies of fitness inheritance can be found. This section reviews the most important studies.
163: 
164: \subsection{Fitness inheritance in the simple GA}
165: 
166: \citeN{SmithR:94} proposed two approaches to fitness inheritance in the simple GA~\cite{Goldberg:89d}. The first approach is to compute the fitness of an offspring as the average fitness of its parents. The second approach is to consider a weighted average based on how similar the offspring is to each parent. The results indicated that GAs with fitness inheritance outperformed those without inheritance. However, the above study of fitness inheritance did not consider the effects of fitness inheritance on crucial GA parameters such as the population size and the number of generations. As a result, the speed-up achieved by using fitness inheritance could not be estimated properly. 
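Both approaches of \citeN{SmithR:94} can be sketched in a few lines of Python. The similarity measure used for the weights below (the fraction of positions on which the offspring agrees with a parent) is an assumption made for illustration, as are the function names.

\begin{verbatim}
from typing import Sequence


def average_inheritance(f_p1: float, f_p2: float) -> float:
    """Offspring fitness as the plain average of the parents' fitness."""
    return 0.5 * (f_p1 + f_p2)


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions on which two strings agree (assumed measure)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def weighted_inheritance(child: Sequence[int], p1: Sequence[int],
                         p2: Sequence[int], f_p1: float, f_p2: float) -> float:
    """Parents' fitness weighted by their similarity to the offspring."""
    w1, w2 = similarity(child, p1), similarity(child, p2)
    if w1 + w2 == 0:  # child shares no bit with either parent
        return average_inheritance(f_p1, f_p2)
    return (w1 * f_p1 + w2 * f_p2) / (w1 + w2)
\end{verbatim}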
167: 
168: \citeN{Zheng:97} used the aforementioned fitness inheritance model in the simple GA for the design of vector quantization codebooks. 
169: 
170: \subsection{Fitness inheritance in PMBGAs}
171: \citeN{Sastry:01} considered the univariate marginal distribution algorithm (UMDA), which is one of the simplest PMBGAs. Using fitness inheritance in UMDA introduces new challenges, because UMDA does not use two-parent recombination, and it is therefore difficult to find a direct correspondence between parents and their offspring. Instead, Sastry et al. extended the probabilistic model to allow for estimating the fitness of newly sampled candidate solutions. 
172: 
173: UMDA models the population of promising solutions after selection using the probability vector, which stores the probability of a 1 at each position. These probabilities are then used to sample new candidate solutions. To incorporate fitness inheritance, the probability vector $p=(p_1, p_2, \ldots, p_n)$ is extended to include two additional statistics $\bar{f}(X_i=0)$ and $\bar{f}(X_i=1)$ for each string position $i$. The term $\bar{f}(X_i=0)$ denotes the average fitness of all solutions where the $i$th bit is $0$; analogously, the term $\bar{f}(X_i=1)$ denotes the average fitness of solutions with the $i$th bit equal to $1$. The fitness of each new solution can then be estimated as
174: \begin{equation}
175: \label{eq-umda-estimate}
176: f_{est}(X_1, X_2, \ldots, X_n) = \bar{f} + \sum_{i=1}^n \left( \bar{f}(X_i) - \bar{f} \right),
177: \end{equation}
178: where $\bar{f}$ is the average fitness of all solutions used to estimate the fitness.
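Equation~\ref{eq-umda-estimate} can be sketched in Python as follows. The names \texttt{umda\_statistics} and \texttt{umda\_estimate} are hypothetical, and falling back to $\bar{f}$ when a value never occurs at a position is an assumption not taken from the original study.

\begin{verbatim}
from typing import List, Sequence, Tuple


def umda_statistics(pop: Sequence[Sequence[int]], fit: Sequence[float]
                    ) -> Tuple[float, List[Tuple[float, float]]]:
    """Return f_bar and, per position i, (f_bar(X_i = 0), f_bar(X_i = 1))."""
    f_bar = sum(fit) / len(fit)
    stats = []
    for i in range(len(pop[0])):
        means = []
        for v in (0, 1):
            vals = [f for x, f in zip(pop, fit) if x[i] == v]
            # Assumption: use f_bar if value v never occurs at position i.
            means.append(sum(vals) / len(vals) if vals else f_bar)
        stats.append((means[0], means[1]))
    return f_bar, stats


def umda_estimate(x: Sequence[int], f_bar: float,
                  stats: List[Tuple[float, float]]) -> float:
    """f_est(x) = f_bar + sum_i (f_bar(X_i = x_i) - f_bar)."""
    return f_bar + sum(stats[i][xi] - f_bar for i, xi in enumerate(x))
\end{verbatim}

On onemax, for example, computing the statistics from all $2^n$ strings makes the estimate exact, because the fitness contribution of each bit does not depend on the other bits.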
179: 
180: \shortciteN{Sastry:01} also developed a theory for fitness inheritance in UMDA on onemax that estimates the number of actual fitness evaluations when a given proportion of candidate solutions inherits fitness, whereas the remaining candidate solutions are evaluated using the actual fitness function. The basic idea is to adapt the population-sizing and time-to-convergence models to UMDA with fitness inheritance, and to relate these quantities to their counterparts in standard UMDA. If the optimal population size is used in both cases, Sastry et al. showed that only about $20\%$ of evaluations can be saved. However, if the same population size is used in both cases, the number of evaluations decreases by a factor of more than three.
181: 
182: %===========================================================================================================
183: 
184: \section{Modeling fitness in BOA}
185: \label{section-BOA-inheritance}
186: 
187: This section describes how the fitness model is built and updated using Bayesian networks, and how new candidate solutions can be evaluated using the model. Both Bayesian networks with full CPTs and those with local structures are discussed. The section also discusses where the statistics needed to build an accurate fitness model can be obtained.
188: 
189: \subsection{Modeling fitness using Bayesian networks}
190: 
191: In UMDA, each probability of a $1$ in the probability vector is coupled with the average fitness of a 0 and a 1 at the corresponding position. Analogously, Bayesian networks can be extended to incorporate the average fitness of a 0 and a 1 for each statistic stored by the model. 
192: 
193: In BOA, for every variable $X_i$ and each possible value $x_i$ of $X_i$, an average fitness of solutions with $X_i=x_i$ must be stored for each instance $\pi_i$ of $X_i$'s parents $\Pi_i$. In the binary case, each row in the conditional probability table is thus extended by two additional entries. Figure~\ref{figure-decision-tree-and-graph-with-inheritance}a shows an example conditional probability table extended with fitness information based on the conditional probability table presented in Figure~\ref{figure-decision-tree-and-graph}a. The fitness can then be estimated as
194: \begin{equation}
195: f_{est}(X_1, X_2, \ldots, X_n) = \bar{f} + \sum_{i=1}^n \left( \bar{f}(X_i|\Pi_i) - \bar{f}(\Pi_i) \right),
196: \end{equation}
197: where $\bar{f}(X_i|\Pi_i)$ denotes the average fitness of solutions with the given values of $X_i$ and $\Pi_i$, and $\bar{f}(\Pi_i)$ is the average fitness of all solutions with the given values of $\Pi_i$. Clearly,
198: \begin{equation}
199: \bar{f}(\Pi_i) = \sum_{X_i} p(X_i|\Pi_i) \bar{f}(X_i|\Pi_i).
200: \end{equation}
201: %where the sum runs over all possible values of $X_i$.
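The extended CPTs and the estimate above can be sketched in Python as follows. Each CPT row, identified by an instance $\pi_i$ of $\Pi_i$, stores the mean fitness and the number of solutions for $X_i=0$ and $X_i=1$, and $p(X_i|\Pi_i)$ in the formula for $\bar{f}(\Pi_i)$ is estimated from the same counts. The function names are hypothetical, and skipping terms for which no data exists is an assumption.

\begin{verbatim}
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# stats[i][(pi, v)] = (mean fitness, count) of the solutions whose parents
# of X_i take the instance pi and whose X_i equals v.
Stats = List[Dict[Tuple[Tuple[int, ...], int], Tuple[float, int]]]


def bn_fitness_statistics(pop: Sequence[Sequence[int]], fit: Sequence[float],
                          parents: Sequence[Sequence[int]]
                          ) -> Tuple[float, Stats]:
    """Extend every CPT row (instance pi of Pi_i) with f_bar(X_i = v | pi)."""
    f_bar = sum(fit) / len(fit)
    stats: Stats = []
    for i, pa in enumerate(parents):
        sums: Dict = defaultdict(float)
        counts: Dict = defaultdict(int)
        for x, f in zip(pop, fit):
            key = (tuple(x[j] for j in pa), x[i])
            sums[key] += f
            counts[key] += 1
        stats.append({k: (sums[k] / c, c) for k, c in counts.items()})
    return f_bar, stats


def bn_estimate(x: Sequence[int], f_bar: float, stats: Stats,
                parents: Sequence[Sequence[int]]) -> float:
    """f_est = f_bar + sum_i (f_bar(x_i | pi_i) - f_bar(pi_i))."""
    est = f_bar
    for i, pa in enumerate(parents):
        pi = tuple(x[j] for j in pa)
        row = {v: stats[i][(pi, v)] for v in (0, 1) if (pi, v) in stats[i]}
        if x[i] not in row:
            continue  # no matching evaluated solution: skip term (assumption)
        total = sum(c for _, c in row.values())
        # f_bar(pi) = sum_v p(v | pi) f_bar(v | pi), p estimated from counts.
        f_pi = sum(mean * c for mean, c in row.values()) / total
        est += row[x[i]][0] - f_pi
    return est
\end{verbatim}

For instance, with the four 2-bit strings and fitness 1 when both bits are equal (0 otherwise), the network in which the second bit depends on the first (\texttt{parents = [[], [0]]}) reproduces the fitness of every string exactly, whereas a network without edges estimates $0.5$ for all of them, because the two bits interact.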
202: 
203: \subsection{Modeling fitness using Bayesian networks with decision graphs}
204: 
205: A method similar to the one for full CPTs can be used to incorporate fitness information into Bayesian networks with decision trees or graphs. For each value of the variable, the average fitness must be stored in every leaf of a decision tree or graph. Figure~\ref{figure-decision-tree-and-graph-with-inheritance} shows an example decision tree and graph extended with fitness information based on the decision tree and graph presented earlier in Figure~\ref{figure-decision-tree-and-graph}. The fitness averages in each leaf are restricted to solutions that satisfy the condition specified by the path from the root of the tree to the leaf. 
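The following Python sketch extends the leaves of a decision tree in this way. Each leaf accumulates, for $X_i=0$ and $X_i=1$, the number of solutions that reach the leaf and the sum of their fitness, so that the term $\bar{f}(X_i|\Pi_i)-\bar{f}(\Pi_i)$ is computed only from solutions that satisfy the path condition. The classes and functions are hypothetical, and returning 0 for a value with no data is an assumption.

\begin{verbatim}
from dataclasses import dataclass, field
from typing import List, Sequence, Union


@dataclass
class Leaf:
    """Leaf for X_i extended with fitness data for X_i = 0 and X_i = 1."""
    count: List[int] = field(default_factory=lambda: [0, 0])
    fsum: List[float] = field(default_factory=lambda: [0.0, 0.0])

    def add(self, xi: int, f: float) -> None:
        """Record a solution that reaches this leaf with X_i = xi."""
        self.count[xi] += 1
        self.fsum[xi] += f

    def contribution(self, xi: int) -> float:
        """f_bar(X_i = xi | leaf) - f_bar(leaf); 0.0 without data (assumption)."""
        if self.count[xi] == 0:
            return 0.0
        leaf_mean = sum(self.fsum) / sum(self.count)
        return self.fsum[xi] / self.count[xi] - leaf_mean


@dataclass
class Split:
    """Internal node testing the parent variable `var`."""
    var: int
    zero: "Tree"
    one: "Tree"


Tree = Union[Leaf, Split]


def find_leaf(tree: Tree, x: Sequence[int]) -> Leaf:
    """Follow the path selected by the parent values in x."""
    while isinstance(tree, Split):
        tree = tree.one if x[tree.var] == 1 else tree.zero
    return tree
\end{verbatim}

Because leaves are ordinary objects, a decision graph can be represented with the same classes by letting several \texttt{Split} nodes share one \texttt{Leaf} instance.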
206: 
207: \begin{figure}[t]
208: \begin{center}
209:   \subfigure[Conditional probability table.]{~~~~\epsfig{file=cpt_ext2.eps,height=1.6in}~~~}
210: \subfigure[Decision tree.]{~~\epsfig{file=decision-tree_ext.eps,height=1.6in}~}\hspace{0in}
211: \subfigure[Decision graph.]{~~~~~\epsfig{file=decision-graph_ext.eps,height=1.6in}~~~~}
212: \end{center}
213: \vspace*{-3ex}
214: \caption{Fitness inheritance in a conditional probability table for $p(X_1|X_2,X_3,X_4)$ (a) and its representation using local structures (b and c).}
215: \label{figure-decision-tree-and-graph-with-inheritance}
216: \vspace*{-2ex}
217: \end{figure}
218: 
219: \subsection{Where to inherit fitness from?}
220: 
221: One question remains: Where should the information for computing the fitness-inheritance statistics come from? More specifically, for each instance $x_i$ of $X_i$ and each instance $\pi_i$ of $X_i$'s parents $\Pi_i$, we must compute the average fitness of all solutions with $X_i=x_i$ and $\Pi_i=\pi_i$. Here we use two sources for computing the fitness-inheritance statistics:
222: 
223: \vspace{-1.95ex}
224: \begin{enumerate}
225: \item Selected parents that were evaluated using the actual fitness function, and 
226: %(other parents are excluded), and 
227: \item the offspring that were evaluated using the actual fitness function.
228: \end{enumerate}
229: 
230: The reason for restricting the computation of fitness-inheritance statistics to selected parents and offspring is that the probabilistic model used as the basis for selecting relevant statistics represents nonlinearities in the population of parents and the population of offspring. Since it is desirable to maximize the amount of data available for learning, it seems natural to use both these populations to compute the fitness-inheritance statistics. The reason for restricting the input to solutions that were evaluated using the actual fitness function is that the fitness of the remaining solutions is only an estimate, and its errors could mislead fitness inheritance and propagate through generations. Both using only solutions evaluated with the actual fitness function and incorporating the offspring in the fitness-inheritance statistics differ from previous fitness inheritance studies~\cite{SmithR:94,Sastry:01}.
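The data flow described above can be sketched for one generation as follows. The names \texttt{Candidate}, \texttt{evaluate\_offspring}, and \texttt{build\_estimator} are hypothetical; \texttt{build\_estimator} stands for any procedure that computes the fitness-inheritance statistics from a training set, and choosing each inheriting offspring independently with probability \texttt{p\_inherit} is an assumption.

\begin{verbatim}
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Solution = Tuple[int, ...]


@dataclass
class Candidate:
    x: Solution
    fitness: float
    evaluated: bool  # True if the actual fitness function was used


def evaluate_offspring(
        selected: Sequence[Candidate], children: Sequence[Solution],
        p_inherit: float, true_fitness: Callable[[Solution], float],
        build_estimator: Callable[[List[Solution], List[float]],
                                  Callable[[Solution], float]],
        rng: random.Random) -> List[Candidate]:
    # Each child inherits fitness with probability p_inherit (assumed scheme).
    inherit = [rng.random() < p_inherit for _ in children]
    # Children that do not inherit are evaluated with the actual fitness.
    evaluated = [Candidate(x, true_fitness(x), True)
                 for x, inh in zip(children, inherit) if not inh]
    # Statistics use only actually evaluated parents and offspring.
    pool = [c for c in selected if c.evaluated] + evaluated
    estimate = build_estimator([c.x for c in pool], [c.fitness for c in pool])
    # The remaining children receive estimated fitness.
    inherited = [Candidate(x, estimate(x), False)
                 for x, inh in zip(children, inherit) if inh]
    return evaluated + inherited
\end{verbatim}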
231: 
232: %===========================================================================================================
233: 
234: \section{Experiments}
235: \label{section-experiments}
236: 
237: This section describes experiments and provides experimental results. Test problems are described first. Next, experimental results are presented and discussed.
238: 
239: \subsection{Onemax}
240: Onemax is a simple linear function that computes the sum of bits in the input binary string:
241: \begin{equation}
242: f_{onemax}(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n X_i,
243: \end{equation}
244: where $(X_1, X_2, \ldots, X_n)$ denotes the input binary string of $n$ bits. In onemax, the fitness contribution of each bit is independent of its context. That is why the simple model used in UMDA, which considers each variable independently of other variables, suffices and yields convergence to the optimum in about $O(n\log n)$ evaluations. Nonetheless, any other model of bounded complexity should also work well, and practically any crossover operator used in standard GAs should also suffice. 
245: 
246: In the model of fitness developed by BOA, the average fitness of solutions with a $1$ in any leaf should exceed the average fitness of solutions with a $0$ in the same leaf by approximately $1$; if both values are equally frequent in the leaf, a $1$ contributes approximately $+0.5$ and a $0$ approximately $-0.5$ relative to the leaf average. As a result, solutions are penalized for $0$s and rewarded for $1$s. The average fitness itself varies throughout the run. This paper considers onemax of $n=50$ bits.
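This behavior can be checked by exact enumeration. The Python sketch below computes, over all $2^n$ strings (that is, a uniform population), the average onemax fitness of strings with a given bit set to 0 or 1, relative to the overall average; the function names are hypothetical.

\begin{verbatim}
from itertools import product
from typing import Sequence, Tuple


def onemax(x: Sequence[int]) -> int:
    """Number of ones in the binary string x."""
    return sum(x)


def centered_bit_contributions(n: int, i: int) -> Tuple[float, float]:
    """Over all 2^n strings, return (f_bar(X_i=0) - f_bar, f_bar(X_i=1) - f_bar)."""
    pop = list(product((0, 1), repeat=n))
    f_bar = sum(onemax(x) for x in pop) / len(pop)
    means = []
    for v in (0, 1):
        vals = [onemax(x) for x in pop if x[i] == v]
        means.append(sum(vals) / len(vals))
    return means[0] - f_bar, means[1] - f_bar
\end{verbatim}

For $n=6$, the function returns $(-0.5, 0.5)$ for every position.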
247: 
248: \subsection{Concatenated 4-bit trap}
249: 
250: In concatenated 4-bit traps~\cite{Ackley:87b,Deb:94b}, the input string is first partitioned into independent groups of $4$ bits each. The partitioning is unknown to the algorithm, but it does not change during the run. A 4-bit trap function is applied to each group of 4 bits, and the contributions of all traps are added together to form the fitness. Each 4-bit trap is defined as follows:
251: \begin{equation}
252: trap_4(u) = 
253: \left\{
254: \begin{array}{ll}
255: 4 & \mbox{~~if $u=4$} \\
256: 3-u & \mbox{~~otherwise}
257: \end{array}
258: \right.,
259: \end{equation}
260: where $u$ is the number of $1$s in the input string of $4$ bits. An important feature of traps is that in each of the 4-bit traps, all 4 bits must be treated together, because all statistics of lower order lead the algorithm away from the optimum. That is why most crossover operators, as well as the model used in UMDA, fail to solve this problem in less than an exponential number of evaluations, which is just as bad as blind search. 
261: 
262: %If we restrict ourselves to leaves that encode three copies of a 1 in a trap corresponding to $X_i$, then $\bar{f}(X_i=0)\leq\bar{f}(X_i=1)$ should hold. If we consider leaves that encode at least one copy of a 0 in a trap corresponding to $X_i$, then $\bar{f}(X_i=0)\geq\bar{f}(X_i=1)$ should hold. Otherwise, the model does not encode appropriate dependencies and it is unlikely that a solution will be found. 
263: Unlike in onemax, $\bar{f}(X_i=0)$ and $\bar{f}(X_i=1)$ depend on the state of the search because the distribution of contexts of each bit changes over time and bits in a trap are not independent. The context of each leaf also determines whether  $\bar{f}(X_i=0)<\bar{f}(X_i=1)$ or $\bar{f}(X_i=0)>\bar{f}(X_i=1)$ in the leaf. This paper considers a trap consisting of 10 copies of the 4-bit trap, where the total number of bits is $n=40$. 
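The following Python sketch implements concatenated traps of order $k$ ($k=4$ gives the function above, and $k=5$ gives the trap defined in the next subsection) and computes, by enumerating all $2^k$ strings of a single trap, the average fitness given the value of one bit. Assigning consecutive bits to each trap is a simplification for illustration, since the algorithm itself does not know the partitioning; the function names are hypothetical.

\begin{verbatim}
from itertools import product
from typing import Sequence, Tuple


def trap(u: int, k: int) -> int:
    """Order-k trap of u ones: k if u == k, otherwise k - 1 - u."""
    return k if u == k else k - 1 - u


def concatenated_trap(x: Sequence[int], k: int) -> int:
    """Sum of order-k traps over consecutive, non-overlapping groups of k bits."""
    if len(x) % k != 0:
        raise ValueError("string length must be a multiple of k")
    return sum(trap(sum(x[s:s + k]), k) for s in range(0, len(x), k))


def order1_means(k: int) -> Tuple[float, float]:
    """Mean fitness of one k-bit trap given its first bit, over all 2^k strings."""
    pop = list(product((0, 1), repeat=k))
    means = []
    for v in (0, 1):
        vals = [trap(sum(x), k) for x in pop if x[0] == v]
        means.append(sum(vals) / len(vals))
    return means[0], means[1]
\end{verbatim}

For $k=4$, \texttt{order1\_means} returns $(1.5, 1.125)$: in a uniform population, strings with a 0 in a given position are on average fitter than strings with a 1, which is why order-1 statistics mislead the search. For $k=5$, it returns $(2.0, 1.375)$.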
264: 
265: \subsection{Concatenated 5-bit trap}
266: Concatenated traps of order 5 can be defined analogously to traps of order 4, but instead of dealing with groups of 4 bits, groups of 5 bits are considered. The contribution of each group of $5$ bits is computed as
267: \begin{equation}
268: trap_5(u) = 
269: \left\{
270: \begin{array}{ll}
271: 5 & \mbox{~~if $u=5$} \\
272: 4-u & \mbox{~~otherwise}
273: \end{array}
274: \right.,
275: \end{equation}
276: where $u$ is the number of $1$s in the input string of $5$ bits. Traps of order 5 also necessitate that all bits in each group are treated together, because statistics of lower order are misleading. 
277: 
278: %Similarly as for traps of order 5, in leaves that encode four copies of a 1 in a trap corresponding to $X_i$, $\bar{f}(X_i=0)\leq\bar{f}(X_i=1)$ should hold. On the other hand, for leaves that encode at least one copy of a 0 in a trap corresponding to $X_i$, $\bar{f}(X_i=0)\geq\bar{f}(X_i=1)$ should hold. Otherwise, the model does not encode appropriate dependencies and it is unlikely that a solution will be found. Unlike in onemax, the actual values of $\bar{f}(X_i=0)$ and $\bar{f}(X_i=1)$ depend on the state of the search because the distribution of contexts of each bit changes over time and bits in a trap are not independent. 
279: Average fitness values $\bar{f}(X_i)$ depend on the context in the same way as for traps of order 4, and they thus follow similar dynamics. This paper considers a trap consisting of 10 copies of the 5-bit trap, where the total number of bits is $n=50$. 
280: 
281: \subsection{Experimental results}
282: 
283: On each test problem, the following fitness inheritance proportions were considered: $0$ to $0.9$ with step $0.1$, $0.91$ to $0.99$ with step $0.01$, and $0.991$ to $0.999$ with step $0.001$. For each test problem and fitness inheritance proportion, 30 independent experiments were performed. Each experiment consisted of 10 independent runs with the minimum population size that ensured convergence to a solution within $10\%$ of the optimum (i.e., with at least $90\%$ correct bits) in all 10 runs. For each experiment, the bisection method was used to determine the minimum population size, and the number of evaluations (excluding the evaluations done using the model of fitness) was recorded. The average over all 300 runs (10 runs in each of the 30 experiments) was then computed and displayed as a function of the proportion of candidate solutions for which fitness was estimated using the fitness model. The speed-up, defined as the factor by which the number of actual fitness evaluations decreases compared to the case with no inheritance, is also shown.
284: %Each point thus represents an average of $300$ runs that found a solution that is at most $10\%$ from the optimum.
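The experimental protocol can be sketched in Python as follows. The sketch assumes that success is monotone in the population size; \texttt{succeeds(N)} is a hypothetical callback that would perform the 10 independent runs with population size $N$ and report whether all of them reached the target. The initial size, the doubling phase, and the relative precision \texttt{rel\_tol} at which bisection stops are illustrative assumptions, not the settings used in the experiments.

\begin{verbatim}
from typing import Callable, List


def inheritance_proportions() -> List[float]:
    """Proportions 0..0.9 (step 0.1), 0.91..0.99 (0.01), 0.991..0.999 (0.001)."""
    return ([i / 10 for i in range(10)]
            + [i / 100 for i in range(91, 100)]
            + [i / 1000 for i in range(991, 1000)])


def min_population_size(succeeds: Callable[[int], bool], n_start: int = 10,
                        rel_tol: float = 0.1) -> int:
    """Smallest population size N with succeeds(N), up to relative rel_tol.

    Assumes succeeds is monotone in N and eventually True (otherwise the
    doubling phase never ends).
    """
    high = n_start
    while not succeeds(high):  # doubling phase: find a size that succeeds
        high *= 2
    low = high // 2  # known to fail after doubling, assumed to fail otherwise
    while high - low > max(1, rel_tol * high):
        mid = (low + high) // 2
        if succeeds(mid):
            high = mid
        else:
            low = mid
    return high


def speed_up(evals_without: float, evals_with: float) -> float:
    """Factor by which the number of actual fitness evaluations decreases."""
    return evals_without / evals_with
\end{verbatim}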
285: 
286: The results on onemax, traps of order 4, and traps of order 5 are shown in Figures~\ref{figure-results-onemax}, \ref{figure-results-trap4}, and~\ref{figure-results-trap5}. In all experiments, the number of actual fitness evaluations decreases with the inheritance proportion, and it reaches its minimum when more than $99\%$ of candidate solutions inherit fitness. Therefore, if only the actual fitness evaluations are considered, evaluating less than $1\%$ of candidate solutions with the actual fitness function seems to be beneficial. The number of actual fitness evaluations can be decreased by a factor of more than 31 for onemax, 32 for the trap of order 4, and 53 for the trap of order 5. Although the actual savings depend on the problem considered, fitness inheritance can be expected to significantly reduce the number of fitness evaluations on many problems, because deceptive problems of bounded difficulty such as traps bound the difficulty of a large class of important problems.
287: 
288: Counting only the actual fitness evaluations ignores the time complexity of selection, model construction, generation of new candidate solutions, and fitness estimation. Combining these costs with the cost of an actual fitness evaluation makes it possible to compute the optimal proportion of candidate solutions to evaluate using fitness inheritance. Nonetheless, the results presented in this paper clearly indicate that using fitness inheritance in BOA can reduce the number of solutions that must be evaluated using the actual fitness function by a factor of 30 or more. Consequently, if fitness evaluation is the bottleneck, fitness inheritance in BOA offers considerable room for improvement.
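The computation of the optimal proportion can be sketched as follows, under a deliberately simple cost model that is an assumption of this sketch rather than a result of the paper: the total cost is the number of actual fitness evaluations times the cost \texttt{cost\_actual} of one actual evaluation, plus the total number of candidate solutions generated times a per-candidate overhead \texttt{cost\_overhead} that lumps together selection, model building, sampling, and fitness estimation. The measured counts for each tested proportion are inputs; the function name and parameters are hypothetical.

```python
from typing import Sequence


def optimal_inheritance_proportion(proportions: Sequence[float],
                                   actual_evals: Sequence[float],
                                   total_candidates: Sequence[float],
                                   cost_actual: float,
                                   cost_overhead: float) -> float:
    """Return the tested proportion with the lowest assumed total cost.

    Assumed cost model (hypothetical):
        cost(p) = actual_evals(p) * cost_actual
                  + total_candidates(p) * cost_overhead
    """
    if not (len(proportions) == len(actual_evals) == len(total_candidates)):
        raise ValueError("all input sequences must have the same length")
    if not proportions:
        raise ValueError("at least one proportion is required")
    # Total cost of each tested proportion under the assumed model.
    costs = [e * cost_actual + n * cost_overhead
             for e, n in zip(actual_evals, total_candidates)]
    # Index of the cheapest proportion (first one on ties).
    best = min(range(len(costs)), key=costs.__getitem__)
    return proportions[best]
```

Because larger proportions trade actual evaluations for more generated candidates, raising \texttt{cost\_actual} relative to \texttt{cost\_overhead} tends to move the minimum toward larger inheritance proportions.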
289: 
290: \begin{figure}[t]
291: \begin{center}
292: \subfigure[Number of evaluations.]{\epsfig{file=fi-onemax-50-2.eps,width=2.25in}}
293: \hspace{2ex}
294: \subfigure[Speed-up.]{\epsfig{file=speedup-onemax.eps,width=2.25in}}
295: \end{center}
296: %\vspace*{-5ex}
297: \caption{Results on a 50-bit onemax.}
298: \label{figure-results-onemax}
299: %\vspace*{-3ex}
300: \end{figure}
301: 
302: \begin{figure}[t]
303: \begin{center}
304: \subfigure[Number of evaluations.]{\epsfig{file=fi-trap4-40-2.eps,width=2.25in}}
305: \hspace{2ex}
306: \subfigure[Speed-up.]{\epsfig{file=speedup-trap4.eps,width=2.25in}}
307: \end{center}
308: %\vspace*{-5ex}
309: \caption{Results on a concatenated trap consisting of 10 traps of order 4.}
310: \label{figure-results-trap4}
311: %\vspace*{-3ex}
312: \end{figure}
313: 
314: \begin{figure}[t]
315: \begin{center}
316: \subfigure[Number of evaluations.]{\epsfig{file=fi-trap5-50-2.eps,width=2.25in}}
317: \hspace{2ex}
318: \subfigure[Speed-up.]{\epsfig{file=speedup-trap5.eps,width=2.25in}}
319: \end{center}
320: %\vspace*{-5ex}
321: \caption{Results on a concatenated trap consisting of 10 traps of order 5.}
322: \label{figure-results-trap5}
323: %\vspace*{-3ex}
324: \end{figure}
325: 
326: %\begin{figure*}[t]
327: %\begin{center}
328: %\subfigure[A 40-bit trap of order 4.]
329: %{\epsfig{file=fi-trap4-40-2.eps,width=2.05in}~~~}
330: %\subfigure[A 50-bit trap of order 5.]
331: %{~~~\epsfig{file=fi-trap5-50-2.eps,width=2.05in}}
332: %\end{center}
333: %\vspace*{-5ex}
334: %\caption{Results on concatenated traps of order 4 and 5.}
335: %\label{figure-results-traps}
336: %\vspace*{-3ex}
337: %\end{figure*}
338: 
339: %\begin{figure}[t]
340: %\begin{center}
341: %\epsfig{file=fi-trap4-40-2.eps,width=2.05in}
342: %\end{center}
343: %\caption{Results on a 40-bit concatenated trap of order 4.}
344: %\label{figure-results-trap4}
345: %\end{figure}
346: 
347: %\begin{figure}[t]
348: %\begin{center}
349: %\epsfig{file=fi-trap5-50-2.eps,width=2.05in}
350: %\end{center}
351: %\caption{Results on a 50-bit concatenated trap of order 5.}
352: %\label{figure-results-trap5}
353: %\end{figure}
354: 
355: %===========================================================================================================
356: 
357: \section{Summary and conclusions}
358: \label{section-conclusions}
359: \vspace*{-1ex}
360: Fitness inheritance enables genetic and evolutionary algorithms to evaluate only a certain proportion of candidate solutions using the actual fitness function, while the fitness of the remaining solutions is estimated using a model of the fitness landscape that is updated on the fly. Fitness models that can be updated and used efficiently can significantly speed up the solution of problems where fitness evaluation is computationally expensive.
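The mechanism described above can be illustrated with a minimal Python sketch. It is not the fitness model used with BOA in this paper: as a simplifying assumption, the hypothetical class \texttt{UnivariateFitnessModel} ignores the dependencies captured by the Bayesian network and estimates the fitness of a binary string as the average fitness of all actually evaluated solutions plus, for each bit, the deviation of the average fitness of evaluated solutions sharing that bit value from the overall average. The function \texttt{evaluate\_population} and its parameter \texttt{p\_inherit} (the inheritance proportion) are also illustrative names.

```python
import random
from typing import Callable, List, Sequence, Tuple


class UnivariateFitnessModel:
    """Hypothetical additive fitness model over binary strings.

    estimate(x) = mean + sum_i (mean fitness of evaluated solutions
                                with bit i equal to x[i] - mean)
    Only solutions evaluated with the actual fitness update the model.
    """

    def __init__(self, n_bits: int) -> None:
        self.total = 0.0   # sum of actual fitness values
        self.count = 0     # number of actual evaluations seen
        # sums[i][v], counts[i][v]: statistics for solutions with bit i == v
        self.sums = [[0.0, 0.0] for _ in range(n_bits)]
        self.counts = [[0, 0] for _ in range(n_bits)]

    def update(self, x: Sequence[int], f: float) -> None:
        self.total += f
        self.count += 1
        for i, v in enumerate(x):
            self.sums[i][v] += f
            self.counts[i][v] += 1

    def estimate(self, x: Sequence[int]) -> float:
        mean = self.total / self.count
        est = mean
        for i, v in enumerate(x):
            if self.counts[i][v] > 0:  # unseen bit values contribute nothing
                est += self.sums[i][v] / self.counts[i][v] - mean
        return est


def evaluate_population(population: List[Sequence[int]],
                        fitness: Callable[[Sequence[int]], float],
                        model: UnivariateFitnessModel,
                        p_inherit: float,
                        rng: random.Random) -> Tuple[List[float], int]:
    """Inherit fitness with probability p_inherit, otherwise evaluate it.

    Returns the fitness values and the number of actual evaluations.
    """
    values: List[float] = []
    actual = 0
    for x in population:
        if model.count > 0 and rng.random() < p_inherit:
            values.append(model.estimate(x))   # estimated (inherited) fitness
        else:
            f = fitness(x)                     # actual, expensive evaluation
            model.update(x, f)                 # model is updated on the fly
            actual += 1
            values.append(f)
    return values, actual
```

An additive model of this kind suits onemax, whose bits contribute independently, but it cannot represent the interactions among the bits of each trap; capturing such interactions is what the dependencies in BOA's probabilistic model are for.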
361: 
362: This paper showed that while fitness inheritance yields only moderate speed-ups of about $20\%$ in simple GAs and UMDA, its benefits in BOA are more significant. Due to the rather large population-sizing requirements for creating an adequate probabilistic model of promising solutions in BOA, the number of actual fitness evaluations decreases even when fewer than $1\%$ of candidate solutions are evaluated using the actual fitness function and the fitness of the remaining solutions is estimated using the fitness model. This is an important result, because BOA and other advanced PMBGAs often require large populations, and evaluating large populations can become intractable for problems with computationally expensive fitness evaluation.
363: 
364: Increasing the proportion of candidate solutions evaluated using the fitness model increases the population-sizing requirements, so the optimal inheritance proportion depends on the complexity of building and sampling the model of promising solutions as well as on the complexity of evaluating solutions using the actual fitness function. The good news is that the more computationally expensive the actual fitness function, the higher the proportion of candidate solutions that can profitably be evaluated using the fitness model instead.
365: 
366: An important topic for future work is to incorporate fitness inheritance in the presence of niching, which can lead to the accumulation of candidate solutions whose fitness is overestimated. Resolving this problem would enable the use of fitness inheritance in the hierarchical BOA (hBOA)~\cite{Pelikan:01*,Pelikan:03b}, which combines BOA with local structures and niching. Another important topic is to extend the theoretical work on fitness inheritance in UMDA to BOA and other competent GAs. Finally, the proposed fitness inheritance model should be applied to challenging real-world problems with expensive fitness evaluation.
367: 
368: \vspace*{-1ex}
369: \section*{Acknowledgments}
370: \vspace*{-1ex}
371: The authors would like to thank David E. Goldberg for discussions and comments. Part of this work was supported by the Research Award at the University of Missouri at St. Louis and the Research Board at the University of Missouri. Most experiments were done on the Asgard cluster at the Institute of Theoretical Physics at the Swiss Federal Institute of Technology (ETH) Z\"{u}rich. The hBOA software, used by Pelikan, was developed by Martin Pelikan and David E. Goldberg at the University of Illinois at Urbana-Champaign.
372: 
373: \vspace*{-2ex}
374: \bibliographystyle{apa-uiuc}
375: \bibliography{mybib}
376: 
377: \end{document}
378: