1: \documentclass[11pt]{article}

2: \usepackage{epsfig,geometry} % see geometry.pdf on how to lay out the page. There's lots.

3: \geometry{a4paper} % or letter or a5paper or ... etc

4:

5: % \geometry{landscape} % rotated page geometry

6:

7: % See the ``Article customise'' template for some common customisations

8:

9: \title{Quasispecies and recombination}

10: \author{Martin Nilsson Jacobi\footnote{{\tt mjacobi@chalmers.se}} and Mats Nordahl \\

11: 		Chalmers University of Technology\\

12: 		Gothenburg, Sweden.}

13:

14: %%% BEGIN DOCUMENT

15: \begin{document}

16:

17: \maketitle

18:

19: \begin{abstract}

20:

21: Recombination is introduced into Eigen's

22: theory of quasispecies evolution. Comparing numerical simulations

23: of the rate equations in the

24: non-recombining and recombining cases shows that

25: recombination has a strong

26: effect on the error threshold and, for a wide range of mutation rates,

27: gives rise to two stable fixed points in the dynamics. This bi-stability

28: results in the existence of two error thresholds. We prove that,

29: under some assumptions on the fitness landscape but for general crossover probability,

30: a fixed point localized about the sequence with superior fitness is globally

31: stable for low mutation rates.

32:

33: \end{abstract}

34:

35: \section{Introduction}

36:

37: \label{introduction}

38:

39: The quasispecies concept was introduced by Eigen in 1971~\cite{Eigen71}

40: to describe populations of self-replicating molecules.

41: A quasispecies is an equilibrium distribution of closely related gene

42: sequences, localized in sequence space around one or a few sequences

43: of high fitness. The quasispecies model can be viewed as a simple

44: framework that contains all the basic ingredients of Darwinian evolution.

45: In particular, it captures the critical relation

46: between mutation rate and information transmission~\cite{Eigen71,Eigen77}.

47: The behavior of these systems has been extensively studied,

48: see for instance~\cite{Eigen71,Eigen77,Schuster86,Schuster85,Swetina88}.

49: Quasispecies have also been fruitfully studied using concepts and

50: techniques from statistical physics, see, e.g.,

51: \cite{Leuthausser86,Tarazona92,AF98}.

52:

53: In the quasispecies model, the population dynamics is described

54: on the gene level, and a fitness landscape~\cite{Wright} is used to

55: define the degree of adaptation directly from the gene sequence.

56: A considerable amount of work has gone into defining models of

57: rugged landscapes and analyzing their consequences for the

58: evolutionary dynamics (e.g.~\cite{Kauffman87,Palmer91,Fontana93,Macken91,Stadler95a}).

59:

60: In this paper we introduce recombination into the quasispecies

61: model. With some exceptions (see, e.g., \cite{Boerlijst,OH98,Stadler96,Feldman})

62: previous work on quasispecies has only considered

63: non-recombining populations where variation is created only by

64: mutation. However, most species in nature use crossover during replication, at least to some degree, which makes

65: this an important case to study.

66: Besides applications to evolutionary biology,

67: developing an understanding of the dynamics of systems under

68: recombination is also important for gaining theoretical

69: insights into the behavior of genetic algorithms \cite{Holland75} in

70: combinatorial optimization problems.

71:

72: Recombination introduces a non-linearity in the rate equations,

73: which in general results in the appearance of two stable fixed points.

74: For a wide range of mutation rates this divides the space of initial

75: distributions into two regions: one where the population converges to

76: a distribution localized around the

77: genome with highest fitness, and another where it converges to

78: an approximately uniform distribution.

79: This behavior is qualitatively different from that

80: of non-recombining populations. Another interesting observation

81: is the shift in the error threshold.

82:

83: The main contribution of the paper is a proof that, for a class of

84: fitness landscapes (see Section~\ref{singlefix} for details),

85: independent of the crossover probability, there exists exactly

86: one globally stable fixed point in the limit of low mutation rate. The single peaked fitness landscape

87: is a special case that belongs to this class.

88:

89: The rest of this paper is organized as follows:

90:

91: Section~\ref{quasi} gives a short review of quasispecies evolving

92: under mutation only, for comparison with the recombination case.

93: In section~\ref{recombination}, we introduce the rate equations for quasispecies

94: with mutation and recombination, and formulate a condition for

95: the equilibrium distribution

96: as a generalized non-linear eigenvalue problem.

97:

98: Section~\ref{num} contains results from numerical simulations of the rate

99: equations for a recombining population. We demonstrate how the equilibrium distribution changes

100: with mutation rate for different initial distributions.

101: As in the non-recombining case, a phase transition from a localized to

102: a uniform distribution occurs

103: when the mutation rate is increased. The dependence of the phase

104: transition point on the initial distribution is investigated.

105:

106: In section~\ref{singlefix} we prove that, under some assumptions on the fitness landscape

107: but without constraint on the crossover,

108: when the mutation rate is low enough all initial distributions converge

109: to a fixed point localized around the genome with highest fitness.

110: Finally, section~\ref{discussion} contains a discussion and conclusions.

111:

112: \section{Quasispecies}

113:

114: \label{quasi}

115:

116: In this section we give a short review of relevant results

117: for quasispecies with non-recombining replication~\cite{Eigen71,Eigen77},

118: to allow us to compare with the results when recombination is included.

119: In the model, a self-replicating molecule is represented by a sequence of

120: bases $s_k$, $\left( s_1 s_2 \cdots s_n \right)$. The bases are assumed

121: to be binary $\{ 0, 1 \}$,

122: and all sequences have equal length $n$. A genome is then

123: given by a binary string $\left( 011001 \cdots \right)$, which also

124: can be represented by an integer $k$ ($0 \leq k < 2^n$).

125: The space of all gene sequences in the model is called

126: sequence space~\cite{Maynard70}. A quasispecies is defined as a

127: distribution of sequences localized in sequence space.

128:

129: Selection in the quasispecies

130: model is expressed in terms of a fitness landscape,

131: which is a function of the phenotype and the environment.

132: The environment describes direct interactions with other organisms

133: as well as the physical environment.

134: In the quasispecies model we assume that the phenotype is directly

135: determined by the genotype. There is no direct interaction between

136: individuals in the population, only indirect competition for resources.

137: The fitness landscape can then be expressed as a function of the genotype only.

138: In the following, we only consider a simple landscape

139: with a single sequence of high fitness $A_0$, called the master sequence,

140: and with all other sequences $i$ having equal fitness $A_i < A_0$.

141:

142: Mutations are described by $Q_k ^l$,

143: the probability that replication of genome $l$  gives genome

144: $k$ as offspring. If the mutation rate per base, $p_m= 1 - q$,

145: where $q$ is the copying accuracy per base, is assumed to be

146: constant in time and independent of position in the genome,

147: we obtain

148: \begin{eqnarray}

149:     Q_k ^i & = & p_m ^{h_{k i}} q ^{n - h_{k i}} = q ^n

150:     \left( \frac{1-q}{q} \right) ^{h_{k i}} \label{eq1}

151: \end{eqnarray}

152: where $h_{k i}$ is the Hamming distance between genomes

153: $k$ and $i$.
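
As a quick consistency check (a worked step that is not spelled out above), note that there are $\left( \begin{array}{c} n \\ h \end{array} \right)$ genomes at Hamming distance $h$ from a given genome $i$, so Eq.~(\ref{eq1}) sums to one over all possible offspring:
\begin{eqnarray*}
    \sum _k Q_k ^i & = & \sum _{h=0}^{n} \left( \begin{array}{c} n \\ h \end{array} \right)
    p_m ^{h} q ^{n - h} = \left( p_m + q \right) ^n = 1 .
\end{eqnarray*}
For instance, with $n = 2$ the genome $(00)$ produces the offspring $(00)$, $(01)$, $(10)$ and $(11)$ with probabilities $q^2$, $q p_m$, $q p_m$ and $p_m^2$, respectively.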

154:

155: The rate equations that describe the dynamics of the population

156: are then given by (where $x_k$ denotes the relative concentration

157: of species $k$):

158: \begin{eqnarray}

159:      \dot{x} _k & = & \sum _l Q_k ^l A_l x_l - e x_k

160:              \label{eq2}

161: \end{eqnarray}

162: where $e  =  \sum _l A_l x_l$.

163: The second term ensures

164: the total normalization of the population ($\sum _l x_l = 1$).

165:

166: These differential equations can be solved analytically~\cite{Jones,Thomson}.

167: Equation (\ref{eq2}) can be made linear through a change of variables

168: and we can then use standard techniques to find $x_k$.  If all the elements

169: of the matrix $Q_k ^l$ are strictly positive, $x_k$

170: always converges to a unique stable fixed point~\cite{Bellman},

171: given by the eigenvector

172: corresponding to the largest eigenvalue $\lambda = e$ of the matrix

173: $Q_k ^l A_l$.
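
One standard choice for the change of variables can be sketched as follows. The auxiliary variable $z_k$ is introduced here only for illustration and is not part of the notation used elsewhere in the paper. Define
\begin{eqnarray*}
     z_k (t) & = & x_k (t) \exp \left( \int _0 ^t e ( \tau ) \, d \tau \right) .
\end{eqnarray*}
Differentiating and inserting Eq.~(\ref{eq2}), the non-linear term $e x_k$ cancels and we are left with the linear system
\begin{eqnarray*}
     \dot{z} _k & = & \sum _l Q_k ^l A_l z_l ,
\end{eqnarray*}
from which the concentrations are recovered as $x_k = z_k / \sum _l z_l$, since $\sum _l x_l = 1$.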

174:

175: For a landscape where the fitness only depends on the Hamming distance

176: from the master sequence, we can divide sequence space into error classes

177: containing sequences with the same number of ones.

178: The effective dimension of the system of equations

179: (\ref{eq2}) can then be reduced from $2^n$ to $n+1$ by summing over

180: error classes. In this way we obtain the new equations

181:

182: \begin{eqnarray}

183:      \dot{x}_K & = & \sum _L \tilde{Q}_K ^L A_L x_L - E x_K

184:     \label{eq4}

185: \end{eqnarray}

186: where the indices $K$ and $L$ denote error classes, and

187: $\tilde{Q}_K ^L$ describes mutation probabilities between

188: error classes rather than sequences.
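
For reference, an explicit form of $\tilde{Q}_K ^L$ can be written down by elementary counting. The following is a sketch under two assumptions made here: the master sequence is the all-zero string, so that error class $K$ consists of the sequences with $K$ ones, and $x_K$ denotes the total concentration in class $K$. A sequence in class $L$ ends up in class $K$ if $j$ of its $L$ ones and $K - L + j$ of its $n - L$ zeros are miscopied, which gives
\begin{eqnarray*}
     \tilde{Q}_K ^L & = & \sum _{j} \left( \begin{array}{c} L \\ j \end{array} \right)
     \left( \begin{array}{c} n - L \\ K - L + j \end{array} \right)
     p_m ^{K - L + 2 j} q ^{n - (K - L + 2 j)} ,
\end{eqnarray*}
where the sum runs over all $j$ for which both binomial coefficients are defined. For $n = 1$ this reduces to $\tilde{Q}_0^0 = \tilde{Q}_1^1 = q$ and $\tilde{Q}_1^0 = \tilde{Q}_0^1 = p_m$, as expected.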

189:

190: We now consider a fitness landscape with $A_0 = 10$, and

191: $A_L = 1$ for all $ L \neq 0$. The sequences are indexed by their

192: Hamming distance from the master sequence. The equilibrium distributions

193: corresponding to different mutation rates, $p_m$,

194: are shown in figure~\ref{plotmut50}. There is a sharp

195: transition between a state where the population is localized around

196: the master sequence $x_0$ and a state where the

197: distribution is approximately binomial. This is the error

198: catastrophe (or error threshold) of Eigen and coworkers.\\

199: \\

200: Fig.~\ref{plotmut50} here.

201: \\

202:

203:

204: The error catastrophe occurs

205: approximately when $q ^n A _0 / A_i = 1$, or

206: when the selective advantage of the master sequence, $A_0 / A_i$, is

207: compensated by the finite probability $q^n < 1$ for the master

208: sequence to replicate to itself.
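
As a numerical illustration of this condition, take the landscape above, $A_0 = 10$ and $A_i = 1$, and assume a genome length of $n = 50$ (a value chosen here for the example only). The threshold condition $q^n = A_i / A_0$ then gives
\begin{eqnarray*}
    q & = & 10 ^{-1/50} \approx 0.955 , \;\;\; p_m = 1 - q \approx 0.045 .
\end{eqnarray*}
More generally, for small $p_m$ we have $q^n \approx e^{- n p_m}$, so the threshold is located approximately at $p_m \approx \ln ( A_0 / A_i ) / n$.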

209:

210: This observation is important for theories of prebiotic evolution

211: of life. When polynucleotides replicate without replicase enzymes, the copying

212: fidelity is unlikely to exceed 0.99, which means that $n$ cannot be larger than

213: 100~\cite{Eigen71}. This is much smaller than coding regions for replicase enzymes,

214: which are needed to increase the copying fidelity. This contradiction is

215: often called Eigen's paradox. There have been several different attempts to resolve this

216: problem, such as hyper-cycles~\cite{Eigen77}.

217:

218:

219: In the following sections we consider quasispecies where both recombination

220: and mutation can occur during

221: replication. The introduction of recombination will cause major changes

222: in the population dynamics. As an example, we observe that the rate

223: equations have multiple stable fixed points. The error threshold is

224: also significantly shifted.

225:

226: \section{Recombination}

227:

228: \label{recombination}

229:

230: The crossover operator, $T_k ^{l m}$, denotes the

231: probability that parents $l$ and $m$ give rise to the offspring $k$ in one

232: recombination event~\cite{Boerlijst,Stadler96}.

233: The crossover operator $T_k ^{l m}$ depends on the

234: crossover probability $p_c \in [ 0 , 0.5 ]$, i.e., the probability per base pair

235: for the reading process to switch from one parent to the other.

236: As an example, $p_c = 0.5$ (uniform crossover) means that each position in the

237: genome is chosen with equal probability from each parent. Another

238: extreme case is $p_c = 0$ which means that the offspring inherits all

239: its genome from a single randomly chosen parent.

240:

241: The crossover operator has the following properties

242:

243: \begin{eqnarray}

244:   &&  0 \leq  T_k ^{l m} \leq 1 \label{eq5} \\

245:   &&  \sum _k T_k ^{l m} =  1 \:\:\:\:  \forall l,m \label{eq6}  \label{eq7}

246: \end{eqnarray}

247: For uniform crossover we can write $T_k^{l m}$ explicitly as

248: \begin{eqnarray}

249:         T_k^{l m} & = & \left\{ \begin{array}{lcl} 2^{- h_{l m}} & \mbox{if} & O(k,l,m) = 1 \\

250:                                                 0 & \mbox{if} & O(k,l,m) = 0 \end{array} \right.

251: \end{eqnarray}

252: where $O(k,l,m) = 1$ if at each position where the parent genomes $l$ and $m$ are identical,

253: the same base also appears in the child genome $k$, else $O(k,l,m) = 0$. New genes can only be created by mutations.
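
To make the definition concrete, consider a small worked example (the parent genomes are chosen here purely for illustration). Take $n = 4$, $l = (0011)$ and $m = (0101)$. The parents agree at the first and last positions and differ at the two middle ones, so $h_{l m} = 2$. The offspring with $O(k,l,m) = 1$ are exactly those that keep the first base $0$ and the last base $1$:
\begin{eqnarray*}
        T_k^{l m} & = & 2^{-2} = \frac{1}{4} \;\;\; \mbox{for } k \in \{ (0001), (0011), (0101), (0111) \} ,
\end{eqnarray*}
and $T_k^{l m} = 0$ for the remaining twelve genomes, so that $\sum _k T_k^{l m} = 1$ in agreement with Eq.~(\ref{eq6}).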

254:

255: The most realistic and interesting population dynamics involves both recombination

256: and mutations. In our model we have only recombining individuals and the point

257: mutations will come in as limited reading accuracy in the crossover process. We have

258:  chosen to let the number of offspring depend on both parents.

259: The rate equations for a population of sequences which both recombine

260: and mutate are then given by

261:

262: \begin{eqnarray}

263:      \dot{x}_k & = & \sum _{l m} V _k ^{l m} A_l x_l A_m x_m - c x_k \label{eq8}

264: \end{eqnarray}

265: where $V _k ^{l m} = \sum _i Q_k ^i T_i ^{l m}$ and $c = \left( \sum _l A_l x_l\right)^2$ (which

266: is used to normalize the total growth as before).

267:

268: The rate equations in the case of recombination are in general much harder

269: to analyze than in the case of pure mutations.  The crossover

270: operator acts on pairs of sequences, which gives rise to a non-linearity in

271: the growth term. We are mainly interested

272: in the equilibrium distribution, i.e., the concentration of sequences after long time.

273: In the pure mutation case the stable equilibrium distribution could be calculated

274:  by solving a standard eigenvalue problem. When recombination is used

275: the fixed points of the rate equations (\ref{eq8}), $\vec{y}$, are solutions to

276: the generalized eigenvalue problem:

277:

278: \begin{eqnarray}

279:     \sum _{l m} V _k ^{l m} A_l y_l A_m y_m & = & \lambda y_k \;\;\; \forall k \label{eq9}

280: \end{eqnarray}

281:

282: All normalized ($\sum _l y_l = 1$) solutions

283: to (\ref{eq9}) are also fixed points of the rate equations, since summing over $k$ gives the

284: relation $\lambda = \left( \sum _l A_l y_l \right) ^2 = c$. There may however exist solutions to equation

285: (\ref{eq9}) which cannot be normalized to a vector of concentrations, since all elements

286: must be non-negative.
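
The relation $\lambda = c$ can be checked in one line. Both $Q$ and $T$ are normalized over offspring, so $\sum _k V_k ^{l m} = \sum _i \left( \sum _k Q_k ^i \right) T_i ^{l m} = \sum _i T_i ^{l m} = 1$. Summing Eq.~(\ref{eq9}) over $k$ for a normalized $\vec{y}$ then gives
\begin{eqnarray*}
    \lambda \sum _k y_k = \lambda & = & \sum _{l m} \left( \sum _k V _k ^{l m} \right) A_l y_l A_m y_m
    = \left( \sum _l A_l y_l \right) ^2 .
\end{eqnarray*}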

287:

288: In general there exists more than one solution to (\ref{eq9}) which can be normalized

289: to a concentration vector. It turns out that these multiple fixed points can be stable,

290: see section~\ref{num}.

291: One of the most important differences between the non-recombining and the recombining case is

292: in fact the uniqueness of the equilibrium distribution.

293: As we will see in section~\ref{num} the equilibrium distribution of the rate

294: equation (\ref{eq8})

295: depends on the initial distribution

296: (as was previously observed in other models, e.g.~\cite{Feldman}).

297: This behavior is very different

298: from the pure mutation

299: case, where all initial distributions converge to a unique stable fixed point,

300: as discussed in section~\ref{quasi}.

301:

302: However, in Section~\ref{singlefix} we present a proof that in the zero

303: mutation rate limit, the only globally stable fixed point corresponds

304: to a population totally localized on the fitness peak.

305:

306:

307: The dimension of  sequence space scales exponentially with the number of bases

308: in the genome.  In the non-recombining case we saw

309:  that the degrees

310: of freedom in the rate equations (\ref{eq2}) could be reduced from $2^n$ to $n+1$

311: by dividing the sequences into

312: error classes. This symmetry is in general broken by recombination (see

313: figure~\ref{brokensym}).

314: The only nontrivial case in which the rate equation

315: (\ref{eq8}) preserves the symmetry between the error classes is $p_c = 0.5$

316: (uniform crossover). In this case we can write the reduced rate equations as

317:

318: \begin{eqnarray}

319:      \dot{x}_K & = & \sum _{L,\:M} \tilde{V} _K ^{L M} A_L x_L A_M x_M - C x_K \label{eq18}

320: \end{eqnarray}

321: where we use the same notation as in equation (\ref{eq4}).

322: For $p_c=0.5$ and $p_m = 0$ the transition probabilities between error-classes

323: $\tilde{V}_K ^{LM}$ are given by

324: \begin{eqnarray}

325:       \tilde{V}_K^{L M} & = & \frac{\sum_{d=|M - L|}^{M+L+2\min(n-L-M,0)}

326:        \left( \begin{array}{c} L \\

327:                 \min(L,M) - \frac{2 d - | L-M |}{2}

328:                 \end{array}\right)}

329:         {\left( \begin{array}{c} n \\ M \end{array} \right)}

330: \end{eqnarray}
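
For orientation, class-level crossover probabilities of this kind can be assembled from two elementary counting steps. The following is a sketch, not necessarily term-by-term identical to the expression above. It assumes that the population is uniformly distributed within each error class, and it uses an overlap variable $a$, introduced here for illustration, for the number of positions at which both parents carry a one. For a parent with $L$ ones and a randomly chosen partner with $M$ ones, $a$ is hypergeometrically distributed, and the parents differ at $d = L + M - 2a$ positions. Under uniform crossover each of these $d$ positions contributes a one with probability $1/2$, so for $p_m = 0$
\begin{eqnarray*}
      \tilde{V}_K^{L M} & = & \sum _{a}
      \frac{ \left( \begin{array}{c} L \\ a \end{array} \right)
             \left( \begin{array}{c} n - L \\ M - a \end{array} \right) }
           { \left( \begin{array}{c} n \\ M \end{array} \right) } \;
      \left( \begin{array}{c} L + M - 2a \\ K - a \end{array} \right) 2^{-(L + M - 2a)} ,
\end{eqnarray*}
where the sum runs over all $a$ for which the binomial coefficients are defined. The range $|M - L| \leq d \leq M + L + 2 \min (n - L - M , 0)$ of the corresponding Hamming distances agrees with the summation limits above. For $n = 1$, $L = 0$ and $M = 1$ the expression gives $\tilde{V}_0^{0 1} = \tilde{V}_1^{0 1} = 1/2$.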

331:

332:

333: In the more realistic case when $p_c < 0.5$, we either have to be satisfied with rather

334: small genome sizes or need to use some approximation method.\\

335: \\

336: Fig.~\ref{brokensym} here.

337: \\

338:

339: \section{Numerical Results}

340: \label{num}

341:

342: Fig.~\ref{numplot1} here.

343: \\

344:

345: In this section we present results from computer simulations of the rate

346: equations (\ref{eq8}). We concentrate on the asymptotic behavior as time goes

347: to infinity, and do not consider detailed  dynamics of the transients.

348: Equilibrium distributions are obtained by a straightforward simulation

349:  of the differential equations. All the simulations

350: in this section  use uniform crossover ($p_c = 0.5$),

351: which preserves the error class symmetry.

352:

353: We now consider a fitness landscape with an isolated peak ($A_0 = 10$, and $A_L = 1$

354: $\forall L \neq 0$). The equilibrium distributions for recombining and non-recombining populations

355: are presented in figure~\ref{numplot1}, where the initial distribution is

356: binomial over the error classes. The phase transition between the localized and

357: non-localized state is extremely sharp in the recombination case. The phase transition

358: occurs at a mutation rate which is orders of magnitude lower

359: than in the non-recombining population.

360:

361: Figure~\ref{numplot2} shows the equilibrium distribution of recombination dynamics

362: with the same fitness landscape as  figure~\ref{numplot1}; the only difference

363: is the initial distribution which is completely localized to the master sequence

364: ($x_0 = 1$, and $x_K = 0$ $\forall K \neq 0$). We see that the equilibrium distributions

365: depend strongly on the initial distributions. The error threshold

366: is still lower than in the pure mutation case; however, the difference is

367:  much smaller. In general recombination

368: in single peak fitness landscapes tends to mix the gene sequences and push the

369: population above the error threshold.\\

370: \\

371: Fig.~\ref{numplot2} here.

372: \\

373:

374: Figures~\ref{numplot25} and~\ref{numplot3} show how the equilibrium distributions and the

375: phase transition point vary with the initial distribution. The initial distributions are given by

376:

377: \begin{eqnarray}

378: x_k (s ) & = & \frac{ 2^{-s \cdot k} \left( \begin{array}{c} N \\ k \end{array} \right)}

379:                 {\left( 1 + 2^{-s} \right) ^N}

380: \label{init}

381: \end{eqnarray}

382: This gives a distribution that is uniform over sequences (binomial over error classes) for $s =0$ and

383: a distribution concentrated on the master sequence for large $s$.
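
Both limits, and the normalization, follow directly from the binomial theorem; the short check below is added for convenience. Summing the numerator of Eq.~(\ref{init}) over the error classes gives
\begin{eqnarray*}
\sum _{k=0}^{N} 2^{-s \cdot k} \left( \begin{array}{c} N \\ k \end{array} \right) & = & \left( 1 + 2^{-s} \right) ^N ,
\end{eqnarray*}
so that $\sum _k x_k (s) = 1$ for every $s$. For $s = 0$ we obtain $x_k (0) = 2^{-N} \left( \begin{array}{c} N \\ k \end{array} \right)$, which is the binomial distribution over error classes that results when all $2^N$ sequences are equally frequent. The weight on the master sequence is $x_0 (s) = \left( 1 + 2^{-s} \right) ^{-N}$, which tends to one as $s \rightarrow \infty$.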

384: The graphs in figure~\ref{dist} show the initial

385: distributions for some discrete

386:  parameter values, $s = 0 , 1 , \cdots ,5$. Figure~\ref{numplot25} shows that there are two different

387: regions in the space of initial distributions, converging to two different fixed points.

388: In one corner of this space

389: all the genomes are master-sequences. If the concentration vector starts out far from this corner

390: it will not converge into the corner unless the mutation rate is extremely low

391: (as illustrated in figure~\ref{numplot1}

392: or by the case of $s \in [ 0,1 ] $ in figure~\ref{numplot3}). If the initial distribution

393: starts near the corner it will converge

394: into the corner for much larger mutation rates (see figure~\ref{numplot2} or the region

395: $s \in [ 3 , 5 ] $ in

396: figure~\ref{numplot3}). Figure~\ref{numplot3} shows the

397: location of the phase transition point for different

398: initial distributions defined by equation~\ref{init}. This phase diagram shows how the border between the

399: two regions in figure~\ref{numplot25} changes with mutation rate. A change of $p_m$ from $9 \cdot 10^{-6}$ to

400: $0.055$, changes the border from $s =1$ to $3$. When the mutation rate is too low or too high only

401: one region exists corresponding to a single stable fixed-point.

402:

403: That there is an upper bound on the mutation rate where a stable localized fixed point ceases to exist

404: is obvious. The existence of a lower bound, below which all initial distributions converge to a

405: localized distribution, is however non-trivial. For the class of fitness landscapes specified in

406: section~\ref{singlefix} this lower bound always exists, and we present a proof there.\\

407: \\

408: Fig.~\ref{dist} here.\\

409: Fig.~\ref{numplot25} here.\\

410: Fig.~\ref{numplot3} here.\\

411:

412: The main conclusion to be drawn from these numerical simulations is that, for a wide

413: range of mutation rates, one finds a coexistence of two different equilibrium distributions

414: to the rate equations involving both recombination and point mutations. Which of these

415: fixed points the population will converge to depends on the initial distribution. This means that

416: the space of initial distributions consists of two regions, with the border between these regions

417: depending on the mutation rate. The whole range of mutation rates where a localized fixed point

418: exists is however lower than the phase transition point in the non-recombining case. This shows that

419: a recombining population is more sensitive to mutation than a non-recombining one on a

420: single peak landscape. Similar conclusions have been reached in a simpler model by

421: Bergman and Feldman~\cite{Feldman}. Similar results have also been shown in other work,

422: see e.g.,~\cite{Boerlijst}.

423:

424: \section{Existence of a single fixed point at zero mutation rate}

425:

426: \label{singlefix}

427:

428: In this section we investigate the behavior of the rate equations when $p_m \rightarrow 0^+$.

429: In section~\ref{num} it was shown numerically that at very low mutation rates, all initial distributions converge

430: to a highly localized equilibrium distribution. Here we show that this region always

431: exists for fitness landscapes fulfilling certain assumptions, to be specified below.

432:

433: The idea behind the proof is to study the dynamics of one position (locus) in the genome and sum

434: over all possibilities at the other positions. Let $S ^{ ( N, n , i)}_{\alpha}$ denote the set of all genomes of length

435: $N$ that contain the sequence $\alpha$ starting at position $i$, $1 \leq i \leq N-n+1$, where $\alpha$ is an index coding for genomes

436: of length $n$. For

437: example, $S^{ ( 10,2,1 )}_3$ is the set of all genomes of length $10$ that start with $( 1 1 )$. We also

438: introduce the notation $x^{(N)}_k$, where $N$ simply indicates the genome length and

439: affects decoding of the index $k$. We can now write the rate equations~(\ref{eq8}) as

440:

441: \begin{eqnarray}

442:         \dot{x}^{(N)}_k & = & \sum _{l , m} V_k^{(N) l m} A_l x^{(N)}_l A_m x^{(N)}_m -

443:                                \left( \sum _l A_l x_l ^{(N)} \right) ^2 x^{(N)}_k

444: \end{eqnarray}

445: The crossover operator has the following property

446:

447: \begin{eqnarray}

448:         \sum _{k \in S^{(N,n,i)}_{\alpha} } T_k ^{l m} & = & T_{\alpha}^{\beta \gamma} \mbox{ for } l \in S^{(N,n,i)}_{\beta} ,

449:                                                          m \in S^{(N,n,i)}_{\gamma}, \forall i

450: \end{eqnarray}

451: where no assumption on the crossover probability is made.

452:

453: Since the point mutation operator $Q^{(N) l}_k$ has the same property, so will the combined operator

454: $V^{(N) l m}_k$. We can now use this property and sum the rate equations over all sequences in $S^{(N, 1,i)}_{\alpha}$

455:

456: \begin{eqnarray}

457:         \sum _{k \in S^{(N , 1,i)}_{\alpha}} \dot{x}^{(N)}_k & = & \sum _{k \in S^{(N , 1,i)}_{\alpha}} \left(

458:                     \sum_{l , m} V_k^{(N) l m} A_l x^{(N)}_l A_m x^{(N)}_m \right. \\

459:                   & & \left. -  \left( \sum _l A_l x_l ^{(N)} \right) ^2 x^{(N)}_k \right) \Rightarrow \nonumber \\

460:         \dot{x}^{(1)}_{\alpha} & = & \sum _{\beta , \gamma} V_{\alpha}^{(1) \beta \gamma} \sum _{l \in S^{(N , 1,i)}_{\beta}} A_l x^{(N)} _l

461:             \sum _{m \in S^{(N , 1,i)}_{\gamma}} A_m x^{(N)} _m \\

462:                      & & - \left( \sum _{\beta} \sum _{l \in S^{(N , 1,i)}_{\beta}} A_l

463:             x^{(N)} _l \right) ^2 x_{\alpha}^{(1)} \label{one_loci_eq}

464: \end{eqnarray}

465: The following, compact, notation is now introduced:

466:

467: \begin{eqnarray}

468:         \sum _{ l \in S_{\beta}^{(N , 1,i)}} A_l x^{(N)}_l & = & \left\{ \begin{array}{lcl}  \Delta ^{(i)}_0 & \mbox{ if } & \beta = 0 \\

469:                                    \Delta ^{(i)} _1 & \mbox{ if } & \beta = 1 \end{array} \right.

470: \end{eqnarray}

471: Eq.~\ref{one_loci_eq} simplifies to

472:

473: \begin{eqnarray}

474:         \dot{x}_0^{(1)} & = & q \left( \Delta ^{(i)} _0 \right) ^2 + \Delta ^{(i)} _0 \Delta ^{(i)} _1 + (1-q) \left( \Delta ^{(i)} _1 \right) ^2 -

475: 		\left( \Delta ^{(i)} _0 + \Delta ^{(i)} _1 \right) ^2 x_0 ^{(1)} \\

476:         x_1^{(1)} & = & 1 - x_0 ^{(1)}

477: \end{eqnarray}

478: which, in the limit $q \rightarrow 1^-$, simplify to

479:

480: \begin{eqnarray}

481:         \dot{x}_0 ^{(1)} & = & \left( \Delta ^{(i)} _0 + \Delta ^{(i)} _1 \right) \left( \Delta ^{(i)} _0 x_1^{(1)} - \Delta ^{(i)} _1 x_0 ^{(1)} \right) \nonumber \\

482: 	x_1 ^{(1)} & = & 1 - x_0 ^{(1)}

483: 	\label{part_result}

484: \end{eqnarray}
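
The coefficients in these equations can be traced back to the single-locus operator; the following is a sketch of that intermediate step. For a single position, crossover gives $T_0^{0 0} = 1$, $T_0^{0 1} = T_0^{1 0} = 1/2$ and $T_0^{1 1} = 0$ for any $p_c$, and a base is copied correctly with probability $q$, so that
\begin{eqnarray*}
        V_0^{(1) 0 0} = q , \;\;\;
        V_0^{(1) 0 1} = V_0^{(1) 1 0} & = & \frac{q}{2} + \frac{1-q}{2} = \frac{1}{2} , \;\;\;
        V_0^{(1) 1 1} = 1 - q .
\end{eqnarray*}
Inserting these values into Eq.~\ref{one_loci_eq} gives the growth term $q \left( \Delta ^{(i)} _0 \right) ^2 + \Delta ^{(i)} _0 \Delta ^{(i)} _1 + (1-q) \left( \Delta ^{(i)} _1 \right) ^2$. In the limit $q \rightarrow 1^-$ this becomes $\Delta ^{(i)} _0 \left( \Delta ^{(i)} _0 + \Delta ^{(i)} _1 \right)$, and using $x_0^{(1)} + x_1^{(1)} = 1$ we find
\begin{eqnarray*}
        \Delta ^{(i)} _0 \left( \Delta ^{(i)} _0 + \Delta ^{(i)} _1 \right) -
        \left( \Delta ^{(i)} _0 + \Delta ^{(i)} _1 \right) ^2 x_0 ^{(1)}
        & = & \left( \Delta ^{(i)} _0 + \Delta ^{(i)} _1 \right)
        \left( \Delta ^{(i)} _0 x_1^{(1)} - \Delta ^{(i)} _1 x_0 ^{(1)} \right) .
\end{eqnarray*}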

485: To continue the following assumption on the fitness landscape is needed:

486:

487: \begin{eqnarray}

488: 	A _l & \leq & A_m \hspace{0.3cm} \mbox{if} \hspace{0.1cm} l \in S ^{(N,1,i)} _1 ,  m \in S ^{(N,1,i)} _0

489: \label{assumption1}

490: \end{eqnarray}

491: We further assume that there exists a gene sequence $M \in S ^{(N,1,i)} _0$ such that

492:

493: \begin{eqnarray}

494: 	A _l & < & A_M \hspace{0.2cm} \forall l \in S ^{(N,1,i)} _1

495: \label{assumption2}

496: \end{eqnarray}

497: These two assumptions mean that no sequences with a zero at position $i$ have a fitness

498: inferior to any sequence with a one at this position, and that there exists

499: at least one sequence with a zero at position $i$ with strictly larger

500: fitness than the sequences with a one at this position. Under these assumptions,

501: the following inequalities hold

502:

503: \begin{eqnarray}

504: 	\Delta ^{(i)} _0 & \geq & \Delta ^{(i)} _{0, min} x^{(1)} _0 \nonumber \\

505: 	\Delta ^{(i)} _1 & \leq & \Delta ^{(i)} _{1, max} x^{(1)} _1

506: \label{estimate}

507: \end{eqnarray}

508: where $\Delta ^{(i)} _{0, min}$ ($ \Delta ^{(i)} _{1, max}$) denotes the minimum (maximum)

509: fitness of the sequences with a $0$ ($1$) at position $i$. We further note that at least one of the

510: inequalities in Eq.~\ref{estimate} is strict unless $x _{M} ^{(N)} =0$ for all $M$ fulfilling

511: Eq.~\ref{assumption2}. Eq.~\ref{estimate} implies the following estimate

512:

513: \begin{eqnarray}

514: 	\dot{x}_0 ^{(1)} & \geq & \left( \Delta ^{(i)} _{0, min} -

515: 		\Delta ^{(i)} _{1, max} \right) x^{(1)} _1 x^{(1)} _0

516: \label{result}

517: \end{eqnarray}

518: with equality if and only if  $x _{M} ^{(N)} =0$ for all $M$ fulfilling

519: Eq.~\ref{assumption2} or $ x^{(1)} _1 =0$ or $x^{(1)} _0 =0$. Note however

520: that $x_0^{(1)}=0$, $x_1^{(1)}=1$ is an (unstable) fixed-point since no mutations

521: implies no inventions of new genes.
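
One way to see how the estimate arises is to insert Eq.~\ref{estimate} into the bracket of Eq.~\ref{part_result}. The step below is a sketch of this reasoning rather than a quotation of the original derivation:
\begin{eqnarray*}
	\Delta ^{(i)} _0 x_1^{(1)} - \Delta ^{(i)} _1 x_0 ^{(1)} & \geq &
	\Delta ^{(i)} _{0, min} x^{(1)} _0 x^{(1)} _1 - \Delta ^{(i)} _{1, max} x^{(1)} _1 x^{(1)} _0
	= \left( \Delta ^{(i)} _{0, min} - \Delta ^{(i)} _{1, max} \right) x^{(1)} _1 x^{(1)} _0 .
\end{eqnarray*}
The remaining prefactor in Eq.~\ref{part_result}, $\Delta ^{(i)} _0 + \Delta ^{(i)} _1 = \sum _l A_l x_l^{(N)}$, is the mean fitness of the population. It is strictly positive for positive fitness values, so it rescales the bound but does not change its sign. Since Eq.~\ref{assumption1} implies $\Delta ^{(i)} _{0, min} \geq \Delta ^{(i)} _{1, max}$, the right-hand side is non-negative and $x_0^{(1)}$ cannot decrease.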

522:

523:

524: From Eq.~\ref{result} it is clear that the rate equations will converge to a state

525: where all sequences have a zero at position $i$.

526:   The fixed point at $x_0^{(1)}=0$ is unstable, and it is clear that it ceases to exist when

527: the mutation rate is non-zero.

528:

529: We conclude that all sequences with a one at position $i$ will vanish

530: in the long-time limit, and can therefore be discarded. We can then search for

531: a new position such that the remaining half of the fitness landscape

532: satisfies the assumptions in

533: Eq.~\ref{assumption1} and~\ref{assumption2}. If this can be repeated (possibly interchanging

534: the zero and one as being superior, since this choice is arbitrary) until the last

535: position, we conclude that the rate equations converge to a state completely

536: dominated by genomes with the same sequence (which necessarily is a global optimum).

537: Loosely, we may describe such fitness landscapes as having a natural ordering of the

538: importance of their loci. One example of a fitness landscape fulfilling these requirements

539: is a single peaked fitness landscape, describing a degenerate case where the

540: positions can be chosen arbitrarily.

541:

542:

543: \section{Conclusions and discussion}

544: \label{discussion}

545:

546: We have studied Eigen's quasispecies model extended

547: to include crossover as well as mutations.

548: The numerical simulations of section~\ref{num} show that there are significant changes

549: in the dynamics of the rate equations because of the non-linearity arising from

550: the introduction of crossover. For a wide range of mutation rates,

551: two simultaneous stable fixed points

552: exist. One fixed point is concentrated around the master sequence while the other describes

553: a uniform distribution. For extremely low and rather high mutation frequencies

554: there is only

555: a single fixed point, corresponding to the localized distribution and the

556: uniform one, respectively.

557: The mutation frequency at the point where the localized fixed point ceases to

558: exist is still lower than the error threshold without recombination.

559:

560: In this paper we prove that, for a class of fitness landscapes having a hierarchical

561: ordering of the loci in the genome (see Section~\ref{singlefix} for details),

562: a single globally stable fixed point exists in the limit of zero mutation rate.

563: Since the proof is valid for all crossover probabilities, the only natural

564: generalization is to expand the class of fitness landscapes. A possible

565: generalization of the technique in Section~\ref{singlefix} could be to prove that,

566: within a larger class of

567: fitness landscapes, for any point in time, i.e., for any distribution $\vec{x}^{(N)}$,

568: we can always find a position $i$ such that Eq.~\ref{result} is fulfilled.

569: The position $i$ would now depend on the distribution (which changes in time),

570: not only the fitness landscape which is the case in our proof. Technically however,

571: this generalization is non-trivial since the changing of position with the

572: distribution makes it complicated to argue that all loci of the global fixed point

573: will dominate completely in the infinite time limit.

574:

575: \bibliographystyle{unsrt}

576:

577: \bibliography{evolution}

578:

579: %\begin{thebibliography}{10}

580:

581: %\bibitem{Eigen71}

582: %M. Eigen, Naturwissenschaften {\bf 58},  465  (1971).

583:

584: %\bibitem{Eigen77}

585: %M. Eigen and P. Schuster, Naturwissenschaften {\bf 64},  541  (1977).

586:

587: %\bibitem{Schuster86}

588: %P. Schuster and P.F. Stadler, Physica D {\bf 16}, 100  (1986).

589:

590: %\bibitem{Schuster85}

591: %P. Schuster and K. Sigmund, Ber. Bunsenges. Phys. Chem. {\bf 89},  668  (1985).

592:

593: %\bibitem{Swetina88}

594: %J. Swetina and P. Schuster, Bull. Mat. Biol. {\bf 50}, 635, (1988).

595:

596: %\bibitem{Leuthausser86}

597: %I. Leuth\"ausser, J. Chem. Phys. {\bf 84},  1884  (1986).

598:

599: %\bibitem{Tarazona92}

600: %P. Tarazona, Phys. Rev. A {\bf 45},  6038  (1992).

601:

602: %

603: %\bibitem{AF98}

604: %D. Alves and J. Fontanari, Phys. Rev. E. {\bf 57},  7008  (1998).

605:

606: %\bibitem{Wright}

607: %S. Wright, Proceedings of the Sixth International Congress on Genetics,  {\bf 1}, 356, (1932).

608:

609: %\bibitem{Kauffman87}

610: %S.A. Kauffman and S. Levin, J. Theo. Biol. {\bf 128}, 11, (1987).

611:

612: %\bibitem{Palmer91}

613: %R. Palmer, {\em "Molecular Evolution on Rugged Landscapes: Proteins, RNA and the Immune System}

614: %edited by A.S. Perelson and S.A. Kauffman (Addison Wesley, Redwood City, 1991), p. 3.

615:

616: %\bibitem{Fontana93}

617: %W. Fontana, P.F. Stadler, E.G. Bornberg-Bauer, T. Griesmacher, I.L. Hofacker, M. Tacker,

618: %P. Tarazona, E.D. Weinberger and P. Schuster, Phys. Rev. E. {\bf 47}, 2083, (1993).

619:

620: %\bibitem{Macken91}

621: %C.A. Macken and A.S. Perelson, SIAM J Appl Math. {\bf 51}, 6191, (1991).

622: %

623: %\bibitem{Stadler95a}

624: %P.F. Stadler, J. Math. Chem. {\bf 20}, 1, (1996).

625:

626: %\bibitem{Charlesworth}

627: %B. Charlesworth, Genet. Res. {\bf 55}, 199-221 (1990)

628:

629: %\bibitem{Boerlijst}

630: %M. Boerlijst, S. Bonhoeffer, and M. Nowak, Proc. R. Soc. Lond. B {\bf 263},

631: %  1577  (1996).

632:

633: %\bibitem{OH98}

634: %G. Ochoa and G. Harvey, {\em Foundations of Genetic Algorithms (FOGA-5)}, edited by W. Banzhaf

635: %and C. Reeves (Morgan Kaufmann, San Francisco, 1998).

636:

637: %\bibitem{Stadler96}

638: %P.F. Stadler and G.P. Wagner, Evol. Comp. {\bf 5}, 241, (1997).

639:

640: %\bibitem{Feldman}

641: %A. Bergman and M.W. Feldman, Physica D. {\bf 56}, 57, (1992).

642:

643: %\bibitem{Monroe}

644: %S. Monroe and M. Schlesinger, Proc Natl Acad Sci USA {\bf 80}, 3279-3283, (1983).

645:

646: %\bibitem{Li}

647: %T. Li and J.Y. Zhang, Journal of Virology, 2000, {\bf 74}, 16, 7646-7650, (2000).

648:

649: %\bibitem{Holland75}

650: %J. Holland, {\em Adaptation In Natural and Artificial Systems}, (The University of Michigan Press, 1975).

651:

652: %\bibitem{Maynard70}

653: %J. Maynard Smith, Nature, {\bf 225}, 563, (1970).

654:

655: %\bibitem{Jones}

656: %B.L. Jones, R.H. Enns and S.S. Rangnekar, Bull. Math. Biol. {\bf 38}, 15, (1976).

657:

658: %\bibitem{Thomson}

659: %C.J. Thomson and J.L. McBride, Math. Biosci. {\bf 21}, 127, (1974).

660:

661: %\bibitem{Bellman}

662: %R. Bellman, {\em Introduction to Matrix Analysis}, (McGraw-Hill, New York, 1970).

663:

664: %\bibitem{MaynardEvolSex}

665: %J. Maynard Smith, {\em The Evolution of Sex}, (Cambridge University Press, 1978).

666:

667:

668: %\bibitem{Kondrashov88}

669: %A.S. Kondrashov, Nature, {\bf 336}, 435, (1988).

670:

671: %\end{thebibliography}

672:

673: \newpage

674:

675: \begin{figure}[h]

676: \centering

677: \leavevmode

678: \epsfxsize = .75 \columnwidth

679: \epsfbox{plotmut50.eps}

680: \caption{The relative equilibrium concentrations of the 51 different error classes

681: for sequences of length 50 for different mutation rates. The fitness landscape

682: has a single peak

683: $A_0 = 10$, and $A_L = 1$ $ \forall L \neq 0$. The error catastrophe occurs around

684: $p_m \approx 0.045$.}

685: \label{plotmut50}

686:

687: \end{figure}

688:

689: \newpage

690:

691: \begin{figure}[h]

692: \centering

693: \leavevmode

694: \epsfxsize = .75 \columnwidth

695: \epsfbox{errorsym.eps}

696:

697: \caption{The equilibrium distribution for the concentration of genomes

698: at different mutation rates. The genomes have

699: length 4 and the crossover probability $p_c$ is $0.1$. There is a small difference

700: in concentration between genomes in the same error class. Genomes

701: 1 and 4 have the same concentration due to the mirror symmetry in the binary strings.

702: The symmetry breaking tends to increase with genome length.}

703: \label{brokensym}

704: \end{figure}

705:

706: \newpage

707:

708: \begin{figure}

709: \centering

710: \leavevmode

711: \epsfxsize = .75 \columnwidth

712: \epsfbox{plotrec25binom.eps}

713:

714: \centering

715: \leavevmode

716: \epsfxsize = .75 \columnwidth

717: \epsfbox{plotmut25.eps}

718:

719: \caption{The equilibrium distributions for recombination (upper graph) and pure

720: mutation (lower graph) dynamics, when the initial distribution is binomial

721: over the error classes. The gene sequences have length 25 and the fitness landscape

722: has an isolated peak ($A_0 = 10$, and $A_L = 1$ $\forall L \neq 0$).}

723: \label{numplot1}

724: \end{figure}

725:

726: \newpage

727:

728: \begin{figure}[h]

729: \centering

730: \leavevmode

731: \epsfxsize = .75 \columnwidth

732: \epsfbox{plotrec25mas.eps}

733: \caption{The equilibrium distribution for a recombining population when

734: the initial distribution is concentrated on the master sequence,

735: $x_0 = 1$, and $x_K = 0$ $\forall K \neq 0$.  The gene sequences have length

736: 25 and the fitness landscape has an isolated peak ($A_0 = 10$, and $A_L = 1$

737: $\forall L \neq 0$).}

738: \label{numplot2}

739: \end{figure}

740:

741: \newpage

742:

743: \begin{figure}[h]

744: \centering

745: \leavevmode

746: \epsfxsize = .75 \columnwidth

747: \epsfbox{distributions.eps}

748:

749: \caption{Initial distributions for different values of the parameter $s $.}

750: \label{dist}

751: \end{figure}

752:

753: \newpage

754:

755: \begin{figure}[h]

756: \centering

757: \leavevmode

758: \epsfxsize = .75 \columnwidth

759: \epsfbox{eqdist.eps}

760:

761: \caption{Equilibrium distributions for different values of the parameter $s $. The copying fidelity

762: is held fixed at $q = 0.97$. Note that there are only two different equilibrium distributions.}

763: \label{numplot25}

764: \end{figure}

765:

766: \newpage

767:

768: \begin{figure}[h]

769: \centering

770: \leavevmode

771: \epsfxsize = .75 \columnwidth

772: \epsfbox{phasediagram.eps}

773: \caption{The copying fidelity at the phase-transition for different initial distributions

774: $x_k (s )$ (as defined in Eq.~\ref{init}). The gene

775: sequence has length 25 and the fitness landscape has an isolated peak ($A_0 = 10$, and $A_L = 1$

776: $\forall L \neq 0$).}

777: \label{numplot3}

778: \end{figure}

779:

780:

781:

782: \end{document}

783:

784:

785:

786: