q-bio0511051/main.tex
1: \documentclass[11pt]{article}
2: \usepackage{epsfig,geometry} % see geometry.pdf on how to lay out the page. There's lots.
3: \geometry{a4paper} % or letter or a5paper or ... etc
4: 
5: % \geometry{landscape} % rotated page geometry
6: 
7: % See the ``Article customise'' template for some common customisations
8: 
9: \title{Quasispecies and recombination}
10: \author{Martin Nilsson Jacobi\footnote{{\tt mjacobi@chalmers.se}} and Mats Nordahl \\
11: 		Chalmers University of Technology\\
12: 		Gothenburg, Sweden.}
13: 
14: %%% BEGIN DOCUMENT
15: \begin{document}
16: 
17: \maketitle
18: 
19: \begin{abstract}
20: 
21: Recombination is introduced into Eigen's
22: theory of quasispecies evolution. A comparison of numerical simulations
23: of the rate equations in the
24: non-recombining and recombining cases shows that
25: recombination has a strong
26: effect on the error threshold and, for a wide range of mutation rates,
27: gives rise to two stable fixed points in the dynamics. This bi-stability
28: results in the existence of two error thresholds. We prove that, 
29: under some assumptions on the fitness landscape but for general crossover probability, 
30: a fixed point localized about the sequence with superior fitness is globally
31: stable for low mutation rates.  
32: 
33: \end{abstract}
34: 
35: \section{Introduction}
36: 
37: \label{introduction}
38: 
39: The quasispecies concept was introduced by Eigen in 1971~\cite{Eigen71}
40: to describe populations of self-replicating molecules.
41: A quasispecies is an equilibrium distribution of closely related gene 
42: sequences, localized in sequence space around one or a few sequences 
43: of high fitness. The quasispecies model can be viewed as a simple
44: framework that contains all the basic ingredients of Darwinian evolution. 
45: In particular, it captures the critical relation
46: between mutation rate and information transmission~\cite{Eigen71,Eigen77}.
47: The behavior of these systems has been extensively studied, 
48: see for instance~\cite{Eigen71,Eigen77,Schuster86,Schuster85,Swetina88}. 
49: Quasispecies have also been fruitfully studied using concepts and
50: techniques from statistical physics, see, e.g.,
51: \cite{Leuthausser86,Tarazona92,AF98}.
52: 
53: In the quasispecies model, the population dynamics is described
54: on the gene level, and a fitness landscape~\cite{Wright} is used to 
55: define the degree of adaptation directly from the gene sequence. 
56: A considerable amount of work has gone into defining models of
57: rugged landscapes and analyzing their consequences for the
58: evolutionary dynamics (e.g.~\cite{Kauffman87,Palmer91,Fontana93,Macken91,Stadler95a}).
59: 
60: In this paper we introduce recombination into the quasispecies
61: model. With some exceptions (see, e.g., \cite{Boerlijst,OH98,Stadler96,Feldman}) 
62: previous work on quasispecies has only considered 
63: non-recombining populations where variation is created only by
64: mutation. However, most species in nature use crossover during replication, at least to some degree, which makes
65: this an important case to study. 
66: Besides applications to evolutionary biology,
67: developing an understanding of the dynamics of systems under
68: recombination is also important for gaining theoretical
69: insights into the behavior of genetic algorithms \cite{Holland75} in 
70: combinatorial optimization problems. 
71: 
72: Recombination introduces a non-linearity in the rate equations,
73: which in general results in the appearance of two stable fixed points.
74: For a wide range of mutation rates this divides the space of initial 
75: distributions into two regions: one where the population converges to
76: a distribution localized around the 
77: genome with highest fitness, and another where it converges to 
78: an approximately uniform distribution. 
79: This behavior is qualitatively different from that 
80: of non-recombining populations. Another interesting observation 
81: is the shift in the error threshold. 
82: 
83: The main contribution of the paper is a proof that, for a class of
84: fitness landscapes (see Section~\ref{singlefix} for details),
85: and independent of the crossover probability, there exists exactly
86: one globally stable fixed point in the limit of zero mutation rate. The single peaked fitness landscape
87: is a special case that belongs to this class.
88: 
89: The rest of this paper is organized as follows:
90: 
91: Section~\ref{quasi} gives a short review of quasispecies evolving 
92: under mutation only, for comparison with the recombination case.
93: In section~\ref{recombination}, we introduce the rate equations for quasispecies
94: with mutation and recombination, and formulate a condition for 
95: the equilibrium distribution
96: as a generalized non-linear eigenvalue problem.
97:  
98: Section~\ref{num} contains results from numerical simulations of the rate 
99: equations for a recombining population. We demonstrate how the equilibrium distribution changes
100: with mutation rate for different initial distributions. 
101: As in the non-recombining case, a phase transition from a localized to
102: a uniform distribution occurs
103: when the mutation rate is increased. The dependence of the phase
104: transition point on the initial distribution is investigated.
105: 
106: In section~\ref{singlefix} we prove that, under some assumptions on the fitness landscape
107: but without constraint on the crossover,
108: when the mutation rate is low enough all initial distributions converge
109: to a fixed point localized around the genome with highest fitness. 
110: Finally, section~\ref{discussion} contains a discussion and conclusions.
111: 
112: \section{Quasispecies }
113: 
114: \label{quasi}
115: 
116: In this section we give a short review of relevant results
117: for quasispecies with non-recombining replication~\cite{Eigen71,Eigen77},
118: to allow us to compare with the results when recombination is included.
119: In the model, a self-replicating molecule is represented by a sequence of  
120: bases $s_k$, $\left( s_1 s_2 \cdots s_n \right)$. The bases are assumed 
121: to be binary $\{ 0, 1 \}$, 
122: and all sequences have equal length $n$. A genome is then
123: given by a binary string $\left( 011001 \cdots \right)$, which also 
124: can be represented by an integer $k$ ($0 \leq k < 2^n$). 
125: The space of all gene sequences in the model is called 
126: sequence space~\cite{Maynard70}. A quasispecies is defined as a
127: distribution of sequences localized in sequence space. 
128: 
129: Selection in the quasispecies
130: model is expressed in terms of a fitness landscape,
131: which is a function of the phenotype and the environment.
132: The environment describes direct interactions with other organisms 
133: as well as the physical environment. 
134: In the quasispecies model we assume that the phenotype is directly 
135: determined by the genotype. There is no direct interaction between 
136: individuals in the population, only indirect competition for resources.
137: The fitness landscape can then be expressed as a function of the genotype only.
138: In the following, we only consider a simple landscape
139: with a single sequence of high fitness $A_0$, called the master sequence,
140: and with all other sequences $i$ having equal fitness $A_i < A_0$. 
141:  
142: Mutations are described by $Q_k ^l$,
143: the probability that replication of genome $l$  gives genome
144: $k$ as offspring. If the mutation rate per base, $p_m= 1 - q$, 
145: where $q$ is the copying accuracy per base, is assumed to be
146: constant in time and independent of position in the genome,
147: we obtain
148: \begin{eqnarray}
149:     Q_k ^i & = & p_m ^{h_{k i}} q ^{n - h_{k i}} = q ^n 
150:     \left( \frac{1-q}{q} \right) ^{h_{k i}} \label{eq1}
151: \end{eqnarray}
152: where $h_{k i}$ is the Hamming distance between genomes
153: $k$ and $i$. 
154: 
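As a quick consistency check, which is not part of the original derivation, one can verify that Eq.~(\ref{eq1}) defines a properly normalized probability distribution over the offspring. Exactly $\left( \begin{array}{c} n \\ h \end{array} \right)$ genomes lie at Hamming distance $h$ from genome $i$, so that
\begin{eqnarray}
    \sum _k Q_k ^i & = & \sum _{h=0} ^{n} \left( \begin{array}{c} n \\ h \end{array} \right)
    p_m ^{h} q ^{n-h} = \left( p_m + q \right) ^n = 1 \nonumber
\end{eqnarray}
In particular, $Q_i ^i = q^n$ is the probability of an error-free copy.
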
155: The rate equations that describe the dynamics of the population
156: are then given by (where $x_k$ denotes the relative concentration
157: of species $k$): 
158: \begin{eqnarray}
159:      \dot{x} _k & = & \sum _l Q_k ^l A_l x_l - e x_k 
160:              \label{eq2}
161: \end{eqnarray}
162: where $e  =  \sum _l A_l x_l$.
163: The second term ensures 
164: the total normalization of the population ($\sum _l x_l = 1$).
165: 
166: These differential equations can be solved analytically~\cite{Jones,Thomson}.
167: Equation (\ref{eq2}) can be made linear through a change of variables 
168: and we can then use standard techniques to find $x_k$.  If all the elements
169: of the matrix $Q_k ^l$ are strictly positive, $x_k$
170: always converges to a unique stable fixed point~\cite{Bellman},
171: given by the eigenvector
172: corresponding to the largest eigenvalue $\lambda = e$ of the matrix
173: $Q_k ^l A_l$.
174: 
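One standard change of variables that linearizes Eq.~(\ref{eq2}) is sketched here. The symbol $z_k$ is introduced only for this illustration, and the cited references may use a different but equivalent form. Setting
\begin{eqnarray}
    z_k (t) & = & x_k (t) \exp \left( \int _0 ^t e(\tau) \, d\tau \right) \nonumber
\end{eqnarray}
and using Eq.~(\ref{eq2}) gives
\begin{eqnarray}
    \dot{z} _k & = & \left( \dot{x} _k + e x_k \right) \exp \left( \int _0 ^t e(\tau) \, d\tau \right)
                 = \sum _l Q_k ^l A_l z_l \nonumber
\end{eqnarray}
which is linear with constant coefficients. Since $\sum _l x_l = 1$, the concentrations are recovered as $x_k = z_k / \sum _l z_l$.
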
175: For a landscape where the fitness only depends on the Hamming distance
176: from the master sequence, we can divide sequence space into error classes 
177: containing sequences with the same number of ones. 
178: The effective dimension of the system of equations
179: (\ref{eq2}) can then be reduced from $2^n$ to $n+1$ by summing over
180: error classes. In this way we obtain the new equations
181: 
182: \begin{eqnarray}
183:      \dot{x}_K & = & \sum _L \tilde{Q}_K ^L A_L x_L - E x_K
184:     \label{eq4}
185: \end{eqnarray}
186: where the indices $K$ and $L$ denote error classes, and
187: $\tilde{Q}_K ^L$ describes mutation probabilities between 
188: error classes rather than sequences.
189: 
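For binary sequences the class-to-class mutation probabilities can be written in closed form. The expression below is a standard combinatorial result stated for convenience rather than taken from the text, and the summation index $a$ (the number of ones mutated to zeros) is notation introduced here. A sequence in error class $L$ ends up in class $K$ if $a$ of its $L$ ones and $K-L+a$ of its $n-L$ zeros are flipped, so that
\begin{eqnarray}
    \tilde{Q}_K ^L & = & \sum _{a = \max(0, L-K)} ^{\min(L, n-K)}
    \left( \begin{array}{c} L \\ a \end{array} \right)
    \left( \begin{array}{c} n-L \\ K-L+a \end{array} \right)
    p_m ^{K-L+2a} q ^{n-(K-L+2a)} \nonumber
\end{eqnarray}
As for the full matrix, $\sum _K \tilde{Q}_K ^L = 1$ for every $L$.
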
190: We now consider a fitness landscape with $A_0 = 10$, and
191: $A_L = 1$ for all $ L \neq 0$. The sequences are indexed by their 
192: Hamming distance from the master sequence. The equilibrium distributions
193: corresponding to different mutation rates, $p_m$,
194: are shown in figure~\ref{plotmut50}. There is a sharp 
195: transition between a state where the population is localized around
196: the master sequence $x_0$ and a state where the
197: distribution is approximately binomial. This is the error
198: catastrophe (or error threshold) of Eigen and coworkers.\\
199: \\
200: Fig.~\ref{plotmut50} here.
201: \\
202: 
203: 
204: The error catastrophe occurs
205: approximately when $q ^n A _0 / A_i = 1$, or
206: when the selective advantage of the master sequence, $A_0 / A_i$, is
207: compensated by the finite probability $q^n < 1$ for the master 
208: sequence to replicate to itself.  
209: 
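As an illustration of this estimate, take $A_0 / A_i = 10$ as in the landscape above and assume, purely for the sake of the example, a genome length of $n = 50$ (the length used in the simulations is not stated in this section). Writing $p_m ^{*}$ for the threshold mutation rate, a symbol introduced here, the condition $q ^n A_0 / A_i = 1$ gives
\begin{eqnarray}
    p_m ^{*} & = & 1 - \left( \frac{A_i}{A_0} \right) ^{1/n} = 1 - 10 ^{-1/50} \approx 0.045 \nonumber
\end{eqnarray}
which is close to the small-$p_m$ approximation $\ln ( A_0 / A_i ) / n \approx 0.046$.
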
210: This observation is important for theories of prebiotic evolution 
211: of life. When polynucleotides replicate without replicase enzymes, the copying
212: fidelity is unlikely to exceed 0.99, which means that $n$ cannot be larger than
213: 100~\cite{Eigen71}. This is much smaller than coding regions for replicase enzymes, 
214: which are needed to increase the copying fidelity. This contradiction is
215: often called Eigen's paradox. There have been several different attempts to resolve this
216: problem, such as hyper-cycles~\cite{Eigen77}.
217: 
218: 
219: In the following sections we consider quasispecies where both recombination 
220: and mutation can occur during
221: replication. The introduction of recombination will cause major changes
222: in the population dynamics. As an example, we observe that the rate 
223: equations have multiple stable fixed points. The error threshold is
224: also significantly shifted.
225: 
226: \section{Recombination}
227: 
228: \label{recombination}
229: 
230: The crossover operator, $T_k ^{l m}$, denotes the
231: probability that parents $l$ and $m$ give rise to the offspring $k$ in one
232: recombination event~\cite{Boerlijst,Stadler96}. 
233: The crossover operator $T_k ^{l m}$ depends on the 
234: crossover probability $p_c \in [ 0 , 0.5 ]$, i.e., the probability per base pair
235: for the reading process to switch from one parent to the other.
236: As an example, $p_c = 0.5$ (uniform crossover) means that each position in the
237: genome is chosen with equal probability from each parent. Another
238: extreme case is $p_c = 0$ which means that the offspring inherits all
239: its genome from a single randomly chosen parent.
240: 
241: The crossover operator has the following properties
242: 
243: \begin{eqnarray}
244:   &&  0 \leq  T_k ^{l m} \leq 1 \label{eq5} \\
245:   &&  \sum _k T_k ^{l m} =  1 \:\:\:\:  \forall l,m \label{eq6}  \label{eq7} 
246: \end{eqnarray}
247: For uniform crossover we can write $T_k^{l m}$ explicitly as 
248: \begin{eqnarray}
249:         T_k^{l m} & = & \left\{ \begin{array}{lcl} 2^{- h_{l m}} & \mbox{if} & O(k,l,m) = 1 \\
250:                                                 0 & \mbox{if} & O(k,l,m) = 0 \end{array} \right.
251: \end{eqnarray}
252: where $O(k,l,m) = 1$ if at each position where the parent genomes $l$ and $m$ are identical,
253: the same base also appears in the child genome $k$, else $O(k,l,m) = 0$. New genes can only be created by mutations.
254: 
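The normalization (\ref{eq6}) can be checked directly for uniform crossover; this short verification is added for illustration. The parents differ at exactly $h_{l m}$ positions. At each of these the child may carry either base, while all remaining positions are fixed. Hence exactly $2^{h_{l m}}$ genomes $k$ satisfy $O(k,l,m) = 1$, and
\begin{eqnarray}
    \sum _k T_k ^{l m} & = & 2^{h_{l m}} \cdot 2^{- h_{l m}} = 1 \nonumber
\end{eqnarray}
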
255: The most realistic and interesting population dynamics involves both recombination
256: and mutations. In our model we have only recombining individuals and the point
257: mutations will come in as limited reading accuracy in the crossover process. We have
258:  chosen to let the number of offspring depend on both parents.
259: The rate equations for a population of sequences which both recombine
260: and mutate are then given by
261: 
262: \begin{eqnarray}
263:      \dot{x}_k & = & \sum _{l m} V _k ^{l m} A_l x_l A_m x_m - c x_k \label{eq8}
264: \end{eqnarray}
265: where $V _k ^{l m} = \sum _i Q_k ^i T_i ^{l m}$ and $c = \left( \sum _l A_l x_l\right)^2$ (which
266: is used to normalize the total growth as before).
267: 
268: The rate equations in the case of recombination are in general much harder
269: to analyze than in the case of pure mutations.  The crossover
270: operator acts on pairs of sequences, which gives rise to a non-linearity in 
271: the growth term. We are mainly interested
272: in the equilibrium distribution, i.e., the concentration of sequences after long time.
273: In the pure mutation case the stable equilibrium distribution could be calculated
274:  by solving a standard eigenvalue problem. When recombination is used
275: the fixed points of the rate equations (\ref{eq8}), $\vec{y}$, are solutions to
276: the generalized eigenvalue problem:
277: 
278: \begin{eqnarray}
279:     \sum _{l m} V _k ^{l m} A_l y_l A_m y_m & = & \lambda y_k \;\;\; \forall k \label{eq9}
280: \end{eqnarray}
281: 
282: All normalized ($\sum _l y_l = 1$) solutions 
283: to (\ref{eq9})  are also fixed points to the rate equations, since summing over $k$ gives the 
284: relation $\lambda = \left( \sum _l A_l y_l \right) ^2 = c$. There may however exist solutions to equation
285: (\ref{eq9}) which cannot be normalized to a vector of concentrations, since all elements
286: must be non-negative.
287: 
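The summation step can be spelled out as follows; this is an expanded version of the argument above and uses only the normalizations of $Q$ and $T$. Since
$\sum _k V_k ^{l m} = \sum _i \left( \sum _k Q_k ^i \right) T_i ^{l m} = \sum _i T_i ^{l m} = 1$,
summing Eq.~(\ref{eq9}) over $k$ yields
\begin{eqnarray}
    \sum _{l m} A_l y_l A_m y_m & = & \left( \sum _l A_l y_l \right) ^2 = \lambda \sum _k y_k = \lambda \nonumber
\end{eqnarray}
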
288: In general there exists more than one solution to (\ref{eq9}) which can be normalized
289: to a concentration vector. It turns out that these multiple fixed points can be stable,
290: see section~\ref{num}.
291: One of the most important differences between the non-recombining and the recombining case is 
292: in fact the uniqueness of the equilibrium distribution. 
293: As we will see in section~\ref{num} the equilibrium distribution of the rate 
294: equation (\ref{eq8})
295: depends on the initial distribution
296: (as was previously observed in other models, e.g.~\cite{Feldman}). 
297: This behavior is very different 
298: from the pure mutation
299: case, where all initial distributions converge to a unique stable fixed point,
300: as discussed in section~\ref{quasi}.
301: 
302: However, in Section~\ref{singlefix} we present a proof that in the zero
303: mutation rate limit, the only globally stable fixed point corresponds
304: to a population totally localized on the fitness peak.
305: 
306: 
307: The dimension of  sequence space scales exponentially with the number of bases
308: in the genome.  In the non-recombining case we saw
309:  that the degrees
310: of freedom in the rate equations (\ref{eq2}) could be reduced from $2^n$ to $n+1$
311: by dividing the sequences into
312: error classes. This symmetry is in general broken by recombination (see
313: figure~\ref{brokensym}). 
314: The only non-trivial case in which the rate equation
315: (\ref{eq8}) preserves the symmetry between the error classes is $p_c = 0.5$
316: (uniform crossover). In this case we can write the reduced rate equations as
317: 
318: \begin{eqnarray}
319:      \dot{x}_K & = & \sum _{L,\:M} \tilde{V} _K ^{L M} A_L x_L A_M x_M - C x_K \label{eq18}
320: \end{eqnarray}
321: where we use the same notation as in equation (\ref{eq4}).
322: For $p_c=0.5$ and $p_m = 0$ the transition probabilities between error-classes 
323: $\tilde{V}_K ^{LM}$ are given by 
324: \begin{eqnarray}
325:       \tilde{V}_K^{L M} & = & \frac{\sum_{d=|M - L|}^{M+L+2\min(n-L-M,0)}
326:        \left( \begin{array}{c} L \\ 
327:                 \min(l,m) - \frac{2 d - | l-m |}{2} 
328:                 \end{array}\right)}
329:         {\left( \begin{array}{c} n \\ M \end{array} \right)}
330: \end{eqnarray}
331: 
332: 
333: In the more realistic case when $p_c < 0.5$, we either have to be satisfied with rather
334: small genome sizes or need to use some approximation method.\\
335: \\
336: Fig.~\ref{brokensym} here.
337: \\
338: 
339: \section{Numerical Results}
340: \label{num}
341: 
342: Fig.~\ref{numplot1} here.
343: \\
344: 
345: In this section we present results from computer simulations of the rate 
346: equations (\ref{eq8}). We concentrate on the asymptotic behavior as time goes
347: to infinity, and do not consider detailed  dynamics of the transients.
348: Equilibrium distributions are obtained by a straightforward simulation
349:  of the differential equations. All the simulations 
350: in this section  use uniform crossover ($p_c = 0.5$), 
351: which preserves the error class symmetry. 
352: 
353: We now consider a fitness landscape with an isolated peak ($A_0 = 10$, and $A_L = 1$ 
354: $\forall L \neq 0$). The equilibrium distributions for recombining and non-recombining populations 
355: are presented in figure~\ref{numplot1}, where the initial distribution is
356: binomial over the error classes. The phase transition between the localized and 
357: non-localized state is extremely sharp in the recombination case. The phase transition
358: occurs at a mutation rate which is orders of magnitude lower 
359: than in the non-recombining population.      
360: 
361: Figure~\ref{numplot2} shows the equilibrium distribution of recombination dynamics 
362: with the same fitness landscape as  figure~\ref{numplot1}; the only difference 
363: is the initial distribution which is completely localized to the master sequence
364: ($x_0 = 1$, and $x_K = 0$ $\forall K \neq 0$). We see that the equilibrium distributions
365: depend strongly on the initial distributions. The error threshold 
366: is still lower than in the pure mutation case; however, the difference is
367:  much smaller. In general, recombination
368: in single peak fitness landscapes tends to mix the gene sequences and push the
369: population above the error threshold.\\
370: \\
371: Fig.~\ref{numplot2} here.
372: \\
373: 
374: Figures~\ref{numplot25} and~\ref{numplot3} show how the equilibrium distributions and the
375: phase transition point vary with the initial distribution. The initial distributions are given by
376: 
377: \begin{eqnarray}
378: x_k (s ) & = & \frac{ 2^{-s \cdot k} \left( \begin{array}{c} N \\ k \end{array} \right)}
379:                 {\left( 1 + 2^{-s} \right) ^N}
380: \label{init}
381: \end{eqnarray} 
382: This gives a uniform distribution over sequences (binomial over the error classes) for $s =0$ and
383: a distribution concentrated to the master-sequence for large $s$.
384: The graphs in figure~\ref{dist} show the initial 
385: distributions for some discrete 
386:  parameter values, $s = 0 , 1 , \cdots ,5$. Figure~\ref{numplot25} shows that there are two different
387: regions in the space of initial distributions, converging to two different fixed points.
388: In one corner of this space 
389: all the genomes are master-sequences. If the concentration vector starts out far from this corner
390: it will not converge into the corner unless the mutation rate is extremely low 
391: (as illustrated in figure~\ref{numplot1} 
392: or by the case of $s \in [ 0,1 ] $ in figure~\ref{numplot3}). If the initial distribution 
393: starts near the corner it will converge 
394: into the corner for much larger mutation rates (see figure~\ref{numplot2} or the region
395: $s \in [ 3 , 5 ] $ in 
396: figure~\ref{numplot3}). Figure~\ref{numplot3} shows the 
397: location of the phase transition point for different
398: initial distributions defined by equation~\ref{init}. This phase diagram shows how the border between the
399: two regions in figure~\ref{numplot25} changes with mutation rate. A change of $p_m$ from $9 \cdot 10^{-6}$ to
400: $0.055$, changes the border from $s =1$ to $3$. When the mutation rate is too low or too high only
401: one region exists corresponding to a single stable fixed-point. 
402: 
403: That there is an upper bound on the mutation rate where a stable localized fixed point ceases to exist
404: is obvious. The existence of a lower bound, below which all initial distributions converge to a 
405: localized distribution, is however non-trivial. This lower bound always exists and we will
406: present a proof of this in section~\ref{singlefix}.\\
407: \\
408: Fig.~\ref{dist} here.\\
409: Fig.~\ref{numplot25} here.\\
410: Fig.~\ref{numplot3} here.\\
411: 
412: The main conclusion to be drawn from these numerical simulations is that, for a wide
413: range of mutation rates, one finds a coexistence of two different equilibrium distributions
414: to the rate equations involving both recombination and point mutations. Which of these 
415: fixed points the population will converge to depends on the initial distribution. This means that
416: the space of initial distributions consists of two regions, with the border between these regions
417: depending on the mutation rate. The whole range of mutation rates where a localized fixed point
418: exists is however lower than the phase transition point in the non-recombining case. This shows that
419: a recombining population is more sensitive to mutation than a non-recombining one on a
420: single peak landscape. Similar conclusions have been reached in a simpler model by
421: Bergman and Feldman~\cite{Feldman}. Similar results have also been shown in other work, 
422: see e.g.,~\cite{Boerlijst}.
423: 
424: \section{Existence of a single fixed point at zero mutation rate}
425: 
426: \label{singlefix}
427: 
428: In this section we investigate the behavior of the rate equations when $p_m \rightarrow 0^+$.
429: In section~\ref{num} it was shown numerically that at very low mutation rates, all initial distributions converge 
430: to a highly localized equilibrium distribution. Here we show that this region always 
431: exists for fitness landscapes fulfilling certain assumptions, to be specified below.
432: 
433: The idea behind the proof is to study the dynamics of a single position (locus) in the genome and sum
434: over all possibilities at the other positions. Let $S ^{ ( N, n , i)}_{\alpha}$ denote the set of all genomes of length
435: $N$ that contain the sequence $\alpha$ starting at position $i$, with $1 \leq i \leq N-n+1$, where $\alpha$ is an index coding for genomes
436: of length $n$. For
437: example, $S^{ ( 10,2,1 )}_3$ is the set of all genomes of length $10$ that start with $( 1 1 )$. We also
438: introduce the notation $x^{(N)}_k$, where $N$ simply indicates the genome length and 
439: affects decoding of the index $k$. We can now write the rate equations~(\ref{eq8}) as
440: 
441: \begin{eqnarray}
442:         \dot{x}^{(N)}_k & = & \sum _{l , m} V_k^{(N) l m} A_l x^{(N)}_l A_m x^{(N)}_m - 
443:                                \left( \sum _l A_l x_l ^{(N)} \right) ^2 x^{(N)}_k  
444: \end{eqnarray}
445: The crossover operator has the following property
446: 
447: \begin{eqnarray}
448:         \sum _{k \in S^{(N,n,i)}_{\alpha} } T_k ^{l m} & = & T_{\alpha}^{\beta \gamma} \mbox{ for } l \in S^{(N,n,i)}_{\beta} , 
449:                                                          m \in S^{(N,n,i)}_{\gamma}, \forall i
450: \end{eqnarray}
451: where no assumption on the crossover probability is made.
452: 
453: Since the point mutation operator $Q^{(N) l}_k$ has the same property, so will the combined operator
454: $V^{(N) l m}_k$. We can now use this property and sum the rate equations over all sequences in $S^{(N, 1,i)}_{\alpha}$
455: 
456: \begin{eqnarray}
457:         \sum _{k \in S^{(N , 1,i)}_{\alpha}} \dot{x}^{(N)}_k & = & \sum _{k \in S^{(N , 1,i)}_{\alpha}} \left(
458:                     \sum_{l , m} V_k^{(N) l m} A_l x^{(N)}_l A_m x^{(N)}_m \right. \\ 
459:                   & & \left. -  \left( \sum _l A_l x_l ^{(N)} \right) ^2 x^{(N)}_k \right) \Rightarrow \nonumber \\
460:         \dot{x}^{(1)}_{\alpha} & = & \sum _{\beta , \gamma} V_{\alpha}^{(1) \beta \gamma} \sum _{l \in S^{(N , 1,i)}_{\beta}} A_l x^{(N)} _l 
461:             \sum _{m \in S^{(N , 1,i)}_{\gamma}} A_m x^{(N)} _m \\
462:                      & & - \left( \sum _{\beta} \sum _{l \in S^{(N , 1,i)}_{\beta}} A_l 
463:             x^{(N)} _l \right) ^2 x_{\alpha}^{(1)} \label{one_loci_eq}
464: \end{eqnarray}
465: The following, compact, notation is now introduced:
466: 
467: \begin{eqnarray}
468:         \sum _{ l \in S_{\beta}^{(N , 1,i)}} A_l x^{(N)}_l & = & \left\{ \begin{array}{lcl}  \Delta ^{(i)}_0 & \mbox{ if } & \beta = 0 \\
469:                                    \Delta ^{(i)} _1 & \mbox{ if } & \beta = 1 \end{array} \right.
470: \end{eqnarray}
471: Eq.~\ref{one_loci_eq} simplifies to
472: 
473: \begin{eqnarray}
474:         \dot{x}_0^{(1)} & = & q \left( \Delta ^{(i)} _0 \right) ^2 + \Delta ^{(i)} _0 \Delta ^{(i)} _1 + (1-q) \left( \Delta ^{(i)} _1 \right) ^2 -  
475: 		\left( \Delta ^{(i)} _0 + \Delta ^{(i)} _1 \right) ^2 x_0 ^{(1)} \\
476:         x_1^{(1)} & = & 1 - x_0 ^{(1)}
477: \end{eqnarray}
478: which, in the limit $q \rightarrow 1^-$, simplify to
479: 
480: \begin{eqnarray}
481:         \dot{x}_0 ^{(1)} & = & \left( \Delta ^{(i)} _0 + \Delta ^{(i)} _1 \right) \left( \Delta ^{(i)} _0 x_1^{(1)} - \Delta ^{(i)} _1 x_0 ^{(1)} \right) \nonumber \\
482: 	x_1 ^{(1)} & = & 1 - x_0 ^{(1)} 
483: 	\label{part_result}
484: \end{eqnarray}
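The limit can be verified term by term; the intermediate algebra is written out here for convenience. Setting $q = 1$ and using $x_0 ^{(1)} + x_1 ^{(1)} = 1$,
\begin{eqnarray}
        \dot{x}_0 ^{(1)} & = & \Delta ^{(i)} _0 \left( \Delta ^{(i)} _0 + \Delta ^{(i)} _1 \right)
                - \left( \Delta ^{(i)} _0 + \Delta ^{(i)} _1 \right) ^2 x_0 ^{(1)} \nonumber \\
        & = & \left( \Delta ^{(i)} _0 + \Delta ^{(i)} _1 \right)
                \left( \Delta ^{(i)} _0 \left( 1 - x_0 ^{(1)} \right) - \Delta ^{(i)} _1 x_0 ^{(1)} \right) \nonumber \\
        & = & \left( \Delta ^{(i)} _0 + \Delta ^{(i)} _1 \right)
                \left( \Delta ^{(i)} _0 x_1 ^{(1)} - \Delta ^{(i)} _1 x_0 ^{(1)} \right) \nonumber
\end{eqnarray}
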
485: To continue, the following assumption on the fitness landscape is needed:
486: 
487: \begin{eqnarray}
488: 	A _l & \leq & A_m \hspace{0.3cm} \mbox{if} \hspace{0.1cm} l \in S ^{(N,1,i)} _1 ,  m \in S ^{(N,1,i)} _0
489: \label{assumption1}
490: \end{eqnarray}
491: We further assume that there exists a gene sequence $M \in S ^{(N,1,i)} _0$ such that
492: 
493: \begin{eqnarray}
494: 	A _l & < & A_M \hspace{0.2cm} \forall l \in S ^{(N,1,i)} _1
495: \label{assumption2}
496: \end{eqnarray}
497: These two assumptions mean that no sequence with a zero at position $i$ has a fitness
498: inferior to that of any sequence with a one at this position, and that there exists
499: at least one sequence with a zero at position $i$ with strictly larger
500: fitness than all sequences with a one at this position. Under these assumptions,
501: the following inequalities hold
502: 
503: \begin{eqnarray}
504: 	\Delta ^{(i)} _0 & \geq & \Delta ^{(i)} _{0, min} x^{(1)} _0 \nonumber \\
505: 	\Delta ^{(i)} _1 & \leq & \Delta ^{(i)} _{1, max} x^{(1)} _1 
506: \label{estimate}
507: \end{eqnarray}
508: where $\Delta ^{(i)} _{0, min}$ ($ \Delta ^{(i)} _{1, max}$) denotes the minimum (maximum)
509: fitness of the sequences with a $0$ ($1$) at position $i$. We further note that at least one of the
510: inequalities in Eq.~\ref{estimate} is strict unless $x _{M} ^{(N)} =0$ for all $M$ fulfilling 
511: Eq.~\ref{assumption2}. Eq.~\ref{estimate} implies the following estimate
512: 
513: \begin{eqnarray}
514: 	\dot{x}_0 ^{(1)} & \geq & \left( \Delta ^{(i)} _{0, min} -
515: 		\Delta ^{(i)} _{1, max} \right) x^{(1)} _1 x^{(1)} _0
516: \label{result}
517: \end{eqnarray}
518: with equality if and only if  $x _{M} ^{(N)} =0$ for all $M$ fulfilling 
519: Eq.~\ref{assumption2} or $ x^{(1)} _1 =0$ or $x^{(1)} _0 =0$. Note however 
520: that $x_0^{(1)}=0$, $x_1^{(1)}=1$ is an (unstable) fixed point, since without mutations
521: no new genes can be created.
522: 
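The step from Eq.~\ref{estimate} to Eq.~\ref{result} can be made explicit as follows; this is a reading of the argument rather than a derivation given in the text. Inserting Eq.~\ref{estimate} into Eq.~\ref{part_result} gives
\begin{eqnarray}
	\dot{x}_0 ^{(1)} & \geq & \left( \Delta ^{(i)} _0 + \Delta ^{(i)} _1 \right)
		\left( \Delta ^{(i)} _{0, min} - \Delta ^{(i)} _{1, max} \right) x^{(1)} _1 x^{(1)} _0 \nonumber
\end{eqnarray}
The prefactor $\Delta ^{(i)} _0 + \Delta ^{(i)} _1 = \sum _l A_l x_l ^{(N)}$ is the mean fitness of the population, which is strictly positive, and the second factor is non-negative by Eq.~\ref{assumption1}. Eq.~\ref{result} follows as written when the mean fitness is at least one, as for the landscape of section~\ref{num} where all $A_L \geq 1$. Otherwise the bound holds up to this positive factor, which does not change the sign of $\dot{x}_0 ^{(1)}$.
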
523: 
524: From Eq.~\ref{result} it is clear that the rate equations will converge to a state
525: where all sequences have a zero at position $i$.
526: The fixed point $x_0^{(1)}=0$, by contrast, is unstable, and it clearly ceases to exist when
527: the mutation rate is non-zero.
528: 
529: We conclude that all sequences with a one at position $i$ will vanish
530: in the long-time limit, and can therefore be discarded. We can then search for
531: a new position such that the remaining half of the fitness landscape
532: satisfies the assumptions in
533: Eqs.~\ref{assumption1} and~\ref{assumption2}. If this can be repeated (possibly interchanging
534: the zero and one as being superior, since this choice is arbitrary) until the last
535: position, we conclude that the rate equations converge to a state completely
536: dominated by genomes with the same sequence (which necessarily is a global optimum).
537: Loosely, we may describe such fitness landscapes as having a natural ordering of the
538: importance of its loci. One example of a fitness landscape fulfilling these requirements
539: is a single peaked fitness landscape, describing a degenerate case where the 
540: positions can be chosen arbitrarily.
541: 
542: 
543: \section{Conclusions and discussion}
544: \label{discussion}
545: 
546: We have studied Eigen's quasispecies model extended
547: to include crossover as well as mutations.
548: The numerical simulations of section~\ref{num} show that there are significant changes 
549: in the dynamics of the rate equations because of the non-linearity arising from
550: the introduction of crossover. For a wide range of mutation rates, 
551: two simultaneous stable fixed points
552: exist. One fixed point is concentrated around the master sequence while the other describes 
553: a uniform distribution. For extremely low and rather high mutation frequencies 
554: there is only
555: a single fixed point, corresponding to the localized distribution and the 
556: uniform one, respectively.
557: The mutation frequency at the point where the localized fixed point ceases to 
558: exist is still lower than the error threshold without recombination.
559: 
560: In this paper we prove that, for a class of fitness landscapes having a hierarchical 
561: ordering of the loci in the genome (see Section~\ref{singlefix} for details),
562: a single globally stable fixed point exists in the limit of zero mutation rate.
563: Since the proof is valid for all crossover probabilities, the only natural 
564: generalization is to expand the class of fitness landscapes. A possible
565: generalization of the technique in Section~\ref{singlefix} could be to prove that,
566: within a larger class of
567: fitness landscapes, for any point in time, i.e., for any distribution $\vec{x}^{(N)}$,
568: we can always find a position $i$ such that Eq.~\ref{result} is fulfilled.
569: The position $i$ would then depend on the distribution (which changes in time),
570: not only on the fitness landscape, as is the case in our proof. Technically, however,
571: this generalization is non-trivial, since the change of position with the
572: distribution makes it complicated to argue that all loci in the global fixed point
573: will dominate completely in the infinite time limit.
574: 
575: \bibliographystyle{unsrt}
576: 
577: \bibliography{evolution}
578: 
579: %\begin{thebibliography}{10}
580: 
581: %\bibitem{Eigen71}
582: %M. Eigen, Naturwissenschaften {\bf 58},  465  (1971).
583: 
584: %\bibitem{Eigen77}
585: %M. Eigen and P. Schuster, Naturwissenschaften {\bf 64},  541  (1977).
586: 
587: %\bibitem{Schuster86}
588: %P. Schuster and P.F. Stadler, Physica D {\bf 16}, 100  (1986).
589: 
590: %\bibitem{Schuster85}
591: %P. Schuster and K. Sigmund, Ber. Bunsenges. Phys. Chem. {\bf 89},  668  (1985).
592: 
593: %\bibitem{Swetina88}
594: %J. Swetina and P. Schuster, Bull. Mat. Biol. {\bf 50}, 635, (1988).
595: 
596: %\bibitem{Leuthausser86}
597: %I. Leuth\"ausser, J. Chem. Phys. {\bf 84},  1884  (1986).
598: 
599: %\bibitem{Tarazona92}
600: %P. Tarazona, Phys. Rev. A {\bf 45},  6038  (1992).
601: 
602: %
603: %\bibitem{AF98}
604: %D. Alves and J. Fontanari, Phys. Rev. E. {\bf 57},  7008  (1998).
605: 
606: %\bibitem{Wright} 
607: %S. Wright, Proceedings of the Sixth International Congress on Genetics,  {\bf 1}, 356, (1932). 
608: 
609: %\bibitem{Kauffman87}
610: %S.A. Kauffman and S. Levin, J. Theo. Biol. {\bf 128}, 11, (1987).
611: 
612: %\bibitem{Palmer91}
613: %R. Palmer, {\em "Molecular Evolution on Rugged Landscapes: Proteins, RNA and the Immune System} 
614: %edited by A.S. Perelson and S.A. Kauffman (Addison Wesley, Redwood City, 1991), p. 3.
615: 
616: %\bibitem{Fontana93}
617: %W. Fontana, P.F. Stadler, E.G. Bornberg-Bauer, T. Griesmacher, I.L. Hofacker, M. Tacker, 
618: %P. Tarazona, E.D. Weinberger and P. Schuster, Phys. Rev. E. {\bf 47}, 2083, (1993).
619: 
620: %\bibitem{Macken91}
621: %C.A. Macken and A.S. Perelson, SIAM J Appl Math. {\bf 51}, 6191, (1991).
622: % 
623: %\bibitem{Stadler95a}
624: %P.F. Stadler, J. Math. Chem. {\bf 20}, 1, (1996).
625: 
626: %\bibitem{Charlesworth}
627: %B. Charlesworth, Genet. Res. {\bf 55}, 199-221 (1990)
628: 
629: %\bibitem{Boerlijst}
630: %M. Boerlijst, S. Bonhoeffer, and M. Nowak, Proc. R. Soc. Lond. B {\bf 263},
631: %  1577  (1996). 
632: 
633: %\bibitem{OH98}
634: %G. Ochoa and G. Harvey, {\em Foundations of Genetic Algorithms (FOGA-5)}, edited by W. Banzhaf 
635: %and C. Reeves (Morgan Kaufmann, San Francisco, 1998).
636: 
637: %\bibitem{Stadler96}
638: %P.F. Stadler and G.P. Wagner, Evol. Comp. {\bf 5}, 241, (1997).
639: 
640: %\bibitem{Feldman}
641: %A. Bergman and M.W. Feldman, Physica D. {\bf 56}, 57, (1992).
642: 
643: %\bibitem{Monroe}
644: %S. Monroe and M. Schlesinger, Proc Natl Acad Sci USA {\bf 80}, 3279-3283, (1983).
645: 
646: %\bibitem{Li}
647: %T. Li and J.Y. Zhang, Journal of Virology, 2000, {\bf 74}, 16, 7646-7650, (2000). 
648: 
649: %\bibitem{Holland75}
650: %J. Holland, {\em Adaptation In Natural and Artificial Systems}, (The University of Michigan Press, 1975).
651: 
652: %\bibitem{Maynard70}
653: %J. Maynard Smith, Nature, {\bf 225}, 563, (1970).
654: 
655: %\bibitem{Jones}
656: %B.L. Jones, R.H. Enns and S.S. Rangnekar, Bull. Math. Biol. {\bf 38}, 15, (1976).
657: 
658: %\bibitem{Thomson}
659: %C.J. Thomson and J.L. McBride, Math. Biosci. {\bf 21}, 127, (1974).
660: 
661: %\bibitem{Bellman}
662: %R. Bellman, {\em Introduction to Matrix Analysis}, (McGraw-Hill, New York, 1970).
663: 
664: %\bibitem{MaynardEvolSex}
665: %J. Maynard Smith, {\em The Evolution of Sex}, (Cambridge University Press, 1978).
666: 
667: 
668: %\bibitem{Kondrashov88}
669: %A.S. Kondrashov, Nature, {\bf 336}, 435, (1988).
670: 
671: %\end{thebibliography}
672: 
673: \newpage
674: 
675: \begin{figure}[h]
676: \centering
677: \leavevmode
678: \epsfxsize = .75 \columnwidth
679: \epsfbox{plotmut50.eps}
680: \caption{The relative equilibrium concentrations of the 51 different error classes
681: for sequences of length 50 for different mutation rates. The fitness landscape 
682: has a single peak
683: $A_0 = 10$, and $A_L = 1$ $ \forall L \neq 0$. The error catastrophe occurs around
$p_m \approx 0.045$.}
685: \label{plotmut50}
686: 
687: \end{figure}
688: 
689: \newpage
690: 
691: \begin{figure}[h]
692: \centering
693: \leavevmode
694: \epsfxsize = .75 \columnwidth
695: \epsfbox{errorsym.eps}
696: 
697: \caption{The equilibrium distribution for the concentration of genomes
698: at different mutation rates. The genomes have 
699: length 4 and the crossover probability $p_c$ is $0.1$. There is a small difference 
700: in concentration between genomes in the same error class. Genomes
701: 1 and 4 have the same concentration due to the mirror symmetry in the binary strings. 
702: The symmetry breaking tends to increase with genome length.}
703: \label{brokensym}
704: \end{figure}
705: 
706: \newpage
707: 
708: \begin{figure}
709: \centering
710: \leavevmode
711: \epsfxsize = .75 \columnwidth
712: \epsfbox{plotrec25binom.eps}
713: 
714: \centering
715: \leavevmode
716: \epsfxsize = .75 \columnwidth
717: \epsfbox{plotmut25.eps}
718: 
719: \caption{The equilibrium distributions for recombination (upper graph) and pure
720: mutation (lower graph) dynamics, when the initial distribution is binomial
between the error classes. The gene sequences have length 25 and the fitness landscape
722: has an isolated peak ($A_0 = 10$, and $A_L = 1$ $\forall L \neq 0$).}
723: \label{numplot1}
724: \end{figure}
725: 
726: \newpage
727: 
728: \begin{figure}[h]
729: \centering
730: \leavevmode
731: \epsfxsize = .75 \columnwidth
732: \epsfbox{plotrec25mas.eps}
733: \caption{The equilibrium distribution for a recombining population when
the initial distribution is concentrated on the master sequence, 
735: $x_0 = 1$, and $x_K = 0$ $\forall K \neq 0$.  The gene sequences have length 
736: 25 and the fitness landscape has an isolated peak ($A_0 = 10$, and $A_L = 1$ 
737: $\forall L \neq 0$).}
738: \label{numplot2}
739: \end{figure}
740: 
741: \newpage
742: 
743: \begin{figure}[h]
744: \centering
745: \leavevmode
746: \epsfxsize = .75 \columnwidth
747: \epsfbox{distributions.eps}
748: 
\caption{Initial distributions for different values of the parameter $s$.}
750: \label{dist}
751: \end{figure}
752: 
753: \newpage
754: 
755: \begin{figure}[h]
756: \centering
757: \leavevmode
758: \epsfxsize = .75 \columnwidth
759: \epsfbox{eqdist.eps}
760: 
\caption{Equilibrium distributions for different values of the parameter $s$. The copying fidelity
is held constant at $q = 0.97$. Note that there are only two different equilibrium distributions.}
763: \label{numplot25}
764: \end{figure}
765: 
766: \newpage
767: 
768: \begin{figure}[h]
769: \centering
770: \leavevmode
771: \epsfxsize = .75 \columnwidth
772: \epsfbox{phasediagram.eps}
\caption{The copying fidelity at the phase transition for different initial distributions
$x_k(s)$ (as defined in Eq.~\ref{init}). The gene
775: sequence has length 25 and the fitness landscape has an isolated peak ($A_0 = 10$, and $A_L = 1$ 
776: $\forall L \neq 0$).}
777: \label{numplot3}
778: \end{figure}
779: 
780: 
781: 
782: \end{document}
783: 
784: 
785: 
786: