
1: \documentclass{article}

2: \usepackage{a4,amsmath,epsfig,amssymb}

3: %\newcommand{\bm}[1]{\mbox{\boldmath $#1$}}

4: \newcommand{\bm}[1]{\boldsymbol #1}

5: \newcommand{\zmpf}[1]{\mbox{\hspace{#1em}}}

6: \newcommand{\Id}{\mbox{$\,$\rm 1\zmpf{-0.62}{\small 1}}}

7: \newcommand{\RR}{\mathbb R}

8: \newcommand{\TT}{\mathbb T}

9: \newcommand{\CC}{\mathbb C}

10: \newcommand{\ZZ}{\mathbb Z}

11: \newcommand{\QQ}{\mathbb Q}

12:

13: \begin{document}

14:

15: \title{Four-state quantum chain\\ as a model of sequence evolution}

16: \author{{\sc Joachim Hermisson$^{1,2}$, Holger Wagner$^{3}$ and

17: Michael Baake$^{1}$}

18: \\[2mm]

19: ${}^{1}$Institut f\"ur Theoretische Physik, Universit\"at

20: T\"ubingen,\\ Auf der Morgenstelle 14, 72076 T\"ubingen, Germany\\

21: ${}^{2}$Institut f\"ur Theorie der Kondensierten Materie,\\

22: Universit\"at Karlsruhe, 76128 Karlsruhe, Germany\\

23: ${}^{3}$Max-Planck-Institut f\"ur Biophysikalische Chemie,\\

24: Am Fa{\ss}berg 11, 37077 G\"ottingen, Germany}

25: \maketitle

26: \begin{abstract}

27: A variety of selection-mutation models for DNA (or RNA) sequences,

28: well known in molecular evolution, can be translated into a model of coupled

29: Ising quantum chains. This correspondence is used to investigate the

30: genetic variability and error threshold behaviour as a function of the possible

31: fitness landscapes. In contrast to the two-state models treated

32: hitherto, the model explicitly takes the four-state nature of the

33: nucleotide alphabet into account and allows for the distinction of

34: mutation rates for the different base substitutions, as given by

35: standard mutation schemes of molecular phylogeny. As a consequence of

36: this refined treatment, new phase diagrams for the error threshold

37: behaviour are obtained, with appearance of a novel phase in which the

38: nucleotide ordering of the wildtype sequence is only partially conserved.

39: Explicit analytic and numeric results are presented for evolution

40: dynamics and equilibrium behaviour in a number of accessible

41: situations, such as quadratic fitness landscapes and the Kimura

42: 2 parameter mutation scheme.

43: \end{abstract}

44:

45: \section{Introduction}

46:

47: One prominent phenomenon in the theory of molecular evolution that has

48: also attracted considerable attention in statistical physics is the

49: so-called {\em error threshold}. It describes the breakdown of

50: genetic order in mutation-selection models for mutation rates

51: surpassing a certain critical value. The prototype model for the

52: description of the error threshold is Eigen's quasispecies model

53: in sequence space \cite{E,ECS} (which is effectively equivalent to a

54: coupled mutation-selection model in population genetics, cf.~\cite{CK}),

55: originally designed for the description of prebiotic RNA

56: evolution. However, the threshold is supposed to be a phenomenon that

57: should occur in a rather general class of mutation-selection models.

58:

59: In order to set up a mutation-selection model that is tractable by

60: analytical (or at least numerical) methods, severe simplifications

61: of the original biological situation seem to be indispensable.

62: Analytical approaches are generally restricted to the treatment

63: of infinitely large populations and rather simple fitness functions,

64: such as the sharply peaked landscape of Eigen's original model.

65: Another common approximation, also used in previous studies of the

66: quasispecies model, amounts to the simplified

67: representation of genotypes as binary strings. In the context of

68: molecular evolutionary theory, this may be thought of as representing

69: DNA or RNA strands by sequences of {\em purines} and {\em pyrimidines},

70: hence with only two states per site, neglecting the fact that genetic

71: information is really given by a four-letter alphabet. In this

72: article, we present a four-state mutation-selection model

73: which is capable of describing the full nucleotide alphabet and

74: incorporates the standard mutation schemes of molecular phylogeny.

75: In particular, we discuss in detail the phase diagrams, which

76: are more polymorphic than for the two-state model. This shows that,

77: for a full understanding of the error threshold behaviour in

78: molecular evolution, investigations cannot be restricted entirely

79: to the study of two-state models.

80:

81: One important step towards an understanding of the

82: threshold phenomenon has been its identification with an equilibrium phase

83: transition in physics by the translation of a time-discrete version of

84: the quasispecies model into the transfer matrix of an anisotropic

85: two-dimensional Ising model \cite{Leut}. This equivalence was further

86: exploited to study various aspects of the error threshold

87: with methods from statistical physics \cite{Leut2,Tara,FPS,FP,MT}.

88: It turns out, however, that the anisotropy of that model is not so

89: easy to handle and the analysis of the relevant biological quantities

90: (which correspond to certain surface properties of the Ising model)

91: remains an involved problem. Due to the complications of the model,

92: almost all results obtained so far are approximate or numerical. The

93: only exact result for the {\em sharply peaked landscape} \cite{Gal}

94: has been worked out via a different analogy to a model of directed

95: polymers, using the specific properties of that very special fitness

96: landscape.

97:

98: An alternative approach to the analysis of mutation-selection models

99: and the error threshold which avoids some of the problems of the

100: anisotropic Ising model has been brought up in \cite{BBW,WBG}.

101: Here, the starting point on the biological side is a slightly changed

102: model which describes the evolution of a population with overlapping

103: generations in continuous time. It turns out that, after a

104: reformulation in tensor products, the two-state version of this model

105: is equivalent to the Hamiltonian of an Ising quantum chain. Thereby,

106: the change to continuous time in the biological description

107: corresponds to the anisotropic limit that connects the

108: two-dimensional Ising model and the quantum chain in physics

109: (cf.~\cite{Kogut}). The quantum chain model is technically easier to

110: handle, and exact results for two non-trivial fitness landscapes,

111: namely Onsager's landscape and the quadratic fitness function, have

112: been worked out \cite{BBW,WBG}.

113:

114: Accordingly, we extend this latter approach to a full four-state model

115: in this study. The quantum chain analogy allows us to use well-known methods

116: from statistical mechanics for the solution of the model, so that we do not

117: have to dwell on technical details here. For an extended presentation

118: of methods (with regard to the two-state model) using techniques from

119: rigorous mean field theory, we refer to \cite{Wag,WBG}. The main focus

120: is instead on the discussion of the threshold behaviour and in

121: particular the increased complexity of the phase diagram due to the

122: consideration of the four-state nature of biological information and

123: the refined schemes of molecular mutation rates.

124:

125: In the following section, we start with a presentation of the biological

126: foundations of our model. Only thereafter, we will introduce the quantum

127: chain model in Section 3. In Section 4, analytical and numerical

128: results are presented for a number of specific four-state models

129: with permutation invariant fitness landscapes. The properties of

130: finite sequences and the evolution dynamics are also studied.

131: We close with a summary of our results and a discussion of open

132: problems in Section 5.

133:

134: \section{Biological foundations}

135:

136: Genetic information is coded in DNA (and RNA) molecules. These are

137: heteropolymers of four units (nucleotides) which differ in a specific

138: base. The essential aspect of a DNA sequence is captured in

139: a string over a four-letter alphabet

140: \begin{equation}

141: {\bm \sigma} \in V \equiv V_1 \times V_2 \times \dots \times V_N \;;\quad

142: V_i = \{A,C,G,T\}

143: \end{equation}

144: where each letter represents a particular base: $A$ and $G$  for

145: adenine and guanine (the purines), $C$ and $T$ for cytosine and thymine

146: (the pyrimidines). In RNA sequences, $T$ is replaced by $U$ for uracil.

147: We will therefore treat the $4^N$ different sequences of a fixed,

148: finite length $N$ as our genotypes (which may be thought of as coding

149: for something, such as a virus or an enzyme). Disregarding

150: environmental effects, we may identify a collection of genotypes with

151: a {\em population} of haploid `individuals'. Evolution then describes

152: the change of the population composition in time.

153:

154: A standard model for the evolution of an infinite, asexually

155: reproducing population under the basic forces of mutation and

156: selection which works in continuous time is given by the following

157: system of non-linear differential equations \cite{CK}

158: \begin{equation} \label{paramuse}

159: \dot{p}_{\bm{\sigma}}^{}(t) =

160: \big( r_{\bm{\sigma}}^{} - \bar{r}(t)\big) p_{\bm{\sigma}}^{}(t)

161: + \sum_{\bm{\sigma'}} m_{\bm{\sigma}\bm{\sigma'}} p_{\bm{\sigma'}}(t)\;.

162: \end{equation}

163: Here, $p_{\bm{\sigma}}^{}(t)$ denotes the relative frequency of genotype

164: ${\bm \sigma}$ at time $t$ with corresponding Malthusian fitness

165: (replication rate minus death rate) $r_{\bm \sigma}^{}$, and

166: \begin{equation}

167: \bar{r}(t) = \sum_{\bm{\sigma}} r_{\bm{\sigma}} p_{\bm{\sigma}}(t)

168: \end{equation}

169: is the {\em mean fitness} of the population. It is the origin of the

170: non-linearity in (\ref{paramuse}). Finally,

171: $m_{{\bm \sigma}{\bm \sigma'}}$ is the (time independent) rate at which

172: ${\bm \sigma'}$ mutates to ${\bm \sigma}$. This framework has

173: originally been defined in classical population genetics \cite{CK}. In

174: the sequence space context, it has been introduced in \cite{B} and has been

175: called the {\it para-muse} ({\em pa}rallel {\em mu}tation-{\em se}lection)

176: model, since it assumes mutation and selection to act independently

177: and in parallel at each instant of time.

178: The model ignores recombination and genetic drift due to finite

179: population size. Both assumptions can be considered as fairly reasonable

180: at least in the context of the evolution of viruses or bacteria where

181: populations can be huge and recombination is absent, or the

182: nucleotides are tightly linked. In the following subsections, the

183: basic processes of mutation and selection shall be described in some detail.

184:

185: \subsection{Mutation}

186:

187: We take mutation as a point process acting independently on

188: all sites, ignoring more complicated mechanisms, such as

189: insertions or deletions. Molecular mutation rates shall be chosen

190: according to the following scheme, known as the {\em Kimura 3 ST

191: model} in molecular phylogeny \cite{Li,SOWH}:

192: \begin{figure}[ht]

193: \centerline{\epsfysize=27mm \epsfbox{mutation.eps}}

194: \caption{Molecular mutation scheme according to the Kimura 3 ST model.}

195: \label{mutfig}

196: \end{figure}

197:

198: This general setup contains a number of simpler models,

199: which treat mutation at different levels of sophistication. In the

200: simplest approach, the mutation rates between all four nucleotides

201: are assumed to be equal $(\mu_1 = \mu_2 = \mu_3)$. This is the

202: so-called {\em Jukes-Cantor mutation scheme}. While this simple

203: frame already seems to be sufficient for a number of applications,

204: measurements reveal that there are indeed pronounced differences in

205: the mutation rates that should be accounted for in more realistic

206: models. In particular, the {\em transitions}, i.e.\ mutations within the purines

207: (A,G) or within the pyrimidines (C,T), are much more frequent than the

208: purine--pyrimidine mutations, which are called {\em transversions}. This

209: may range up to relative differences of

210: $\mu_1 \approx \mu_3 \simeq \mu_2/2$ in the

211: nucleus and $\mu_1 \approx \mu_3 \simeq \mu_2/40$ in mitochondrial

212: DNA \cite{Li}. A mutation scheme with $\mu_2 > \mu_1 = \mu_3$ is known as the

213: {\em Kimura 2 parameter model}. The full {\em Kimura 3 ST} scheme,

214: finally, also accounts for the small difference between $\mu_1$ and

215: $\mu_3$, such that $\mu_2 > \mu_1 > \mu_3$.

216:

217: Implementing this mutation model into the evolution equation

218: (\ref{paramuse}), we obtain the following mutation rates between

219: genotypes ($i \in \{1,2,3\}$)

220: \begin{equation} \label{mss}

221: m_{{\bm \sigma}{\bm \sigma'}} = \left\{

222: \begin{array}{rl}

223: \mu_i, \quad & d_i({\bm \sigma},{\bm \sigma'})

224: = d_{{\bm \sigma}{\bm \sigma'}} = 1

225: \\

226: -N \sum_i \mu_i,\quad   & {\bm \sigma} = {\bm \sigma'}

227: \\

228: 0,\quad       & d_{{\bm \sigma}{\bm \sigma'}} > 1

229: \end{array} \right. \;.

230: \end{equation}

231: Here,

232: \begin{eqnarray} \nonumber

233: d_1({\bm \sigma},{\bm \sigma'}) & = &

234: \#_{A \rightleftarrows C}({\bm \sigma},{\bm \sigma'})

235: + \#_{G \rightleftarrows T}({\bm \sigma},{\bm \sigma'})

236: \\ \label{Hamming}

237: d_2({\bm \sigma},{\bm \sigma'}) & = &

238: \#_{A \rightleftarrows G} ({\bm \sigma},{\bm \sigma'})

239: + \#_{C \rightleftarrows T}({\bm \sigma},{\bm \sigma'})

240: \\ \nonumber

241: d_3({\bm \sigma},{\bm \sigma'}) & = &

242: \# _{A \rightleftarrows T}({\bm \sigma},{\bm \sigma'})

243: + \#_{C \rightleftarrows G}({\bm \sigma},{\bm \sigma'})

244: \end{eqnarray}

245: are restricted Hamming distances between ${\bm \sigma}$ and ${\bm \sigma'}$.

246: In (\ref{Hamming}), $\#_{X \rightleftarrows Y}({\bm \sigma},{\bm \sigma'})$

247: counts the positions at which $X$ and $Y$ are exchanged in $\bm{\sigma}$ and

248: $\bm{\sigma}'$. Finally,

249: \begin{equation}

250: d_{{\bm \sigma}{\bm \sigma'}} = d_1({\bm \sigma},{\bm \sigma'})

251:  + d_2({\bm \sigma},{\bm \sigma'}) + d_3({\bm \sigma},{\bm \sigma'})

252: \end{equation}

253: is the total Hamming

254: distance. Note that the choice of the diagonal term

255: $m_{{\bm \sigma}{\bm \sigma}}$ in (\ref{mss}) just accounts for

256: probability conservation ($\sum_{\bm{\sigma}}

257: \dot{p}_{\bm{\sigma}} = 0$) in the mutation part of the

258: evolution equation (\ref{paramuse}).
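
As a small illustration (our own worked example, not taken from the cited
references), consider a single site, $N=1$. Ordering the genotypes as
$(A,C,G,T)$, the rates (\ref{mss}) combine into the symmetric matrix
\begin{equation*}
(m_{\sigma\sigma'}) =
\begin{pmatrix}
-\mu & \mu_1 & \mu_2 & \mu_3 \\
\mu_1 & -\mu & \mu_3 & \mu_2 \\
\mu_2 & \mu_3 & -\mu & \mu_1 \\
\mu_3 & \mu_2 & \mu_1 & -\mu
\end{pmatrix}, \qquad \mu := \mu_1 + \mu_2 + \mu_3 \;,
\end{equation*}
where the abbreviation $\mu$ is introduced only for this example. Every
column sums to zero, which is the probability conservation noted above.
For general $N$, each genotype has $3N$ neighbours at total Hamming distance
one, $N$ of each type, so the off-diagonal entries of every column add up
to $N \sum_i \mu_i$ and are balanced by the diagonal term.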

259:

260: \subsection{Selection and fitness landscape}

261:

262: Whereas the mutational part of the dynamics is fairly well understood

263: at least on the microscopic (molecular) level, the relation of

264: genotype and fitness, which defines the respective selective success,

265: is notoriously complex.

266: Following the standard notion in molecular evolution, we define the

267: {\em fitness function} (or {\em fitness landscape})

268: \begin{equation}

269: f: \bm{\sigma} \mapsto r_{\bm{\sigma}}

270: \end{equation}

271: as a mapping from the configuration space $V= \{A,C,G,T\}^N$ into the

272: real numbers, assigning a reproduction rate (Malthusian fitness value)

273: $r_{\bm{\sigma}}$ to each

274: genotype. Implicitly, the fitness function incorporates all the

275: complicated interactions between the sites. These interactions

276: are typically long-ranged (since RNA strands or proteins fold in three

277: dimensions), highly correlated, and give rise to rather rugged landscapes.

278: Especially in the context of RNA evolution, the construction and

279: characterization of fitness landscapes has motivated numerous studies,

280: see e.g.\ \cite{Sta} for a review.

281:

282: Below we will show how the evolution equation (\ref{paramuse}), with

283: an arbitrary choice of the fitness function, can be adapted to the

284: methods from statistical physics by a reformulation in a quantum

285: chain framework. As an application, we then present a study (including

286: analytical and numerical results) for specific examples from the class

287: of permutation invariant fitness functions. Here, due to equivalence of

288: all sites, the fitness of a given genotype is solely a function of

289: its restricted Hamming distances from the so-called {\em wildtype} sequence

290: with optimal fitness which we choose as the reference genotype.

291: This particularly simple class of fitness

292: landscapes is widely used, as a canonical first approximation,

293: especially in {\em multilocus theory}. Also in the context of sequence

294: space evolution, fitness functions of this type

295: have been used in a number of studies on the two-state model

296: \cite{OB,Leut2,Tara,BBW,WBG}. To implement the approach in our

297: four-state model, we fix an arbitrary sequence, denoted by

298: $\bm{\sigma}_{++}$, as

299: the wildtype. We will only consider directional selection here towards a

300: unique genotype with optimal fitness. The fitness of any other

301: sequence is then determined by the restricted Hamming distances

302: $d_i$ relative to $\bm{\sigma}_{++}$.

303: Permutation invariance with respect to the position in the sequence

304: thus leads to a drastic reduction of dimensions. For the four-state

305: model, the effective configuration

306: space forms a tetrahedron in 3d (see Fig.~\ref{select}) and is

307: conveniently represented in Cartesian coordinates which we

308: shall call (following \cite{BBW}) the {\em surplus components}:

309: \begin{eqnarray}\nonumber

310: s_1(\bm{\sigma}) &=& 1 - \frac{2}{N}

311: \Big(d_1(\bm{\sigma},\bm{\sigma}_{++})+d_3(\bm{\sigma},\bm{\sigma}_{++})\Big)\;;

312: \\ \label{surplus}

313: s_2(\bm{\sigma}) &=& 1 - \frac{2}{N}

314: \Big(d_2(\bm{\sigma},\bm{\sigma}_{++})+d_3(\bm{\sigma},\bm{\sigma}_{++})\Big)\;;

315: \\ \nonumber

316: s_3(\bm{\sigma}) &=& 1 - \frac{2}{N}

317: \Big(d_1(\bm{\sigma},\bm{\sigma}_{++})+d_2(\bm{\sigma},\bm{\sigma}_{++})\Big)\;.

318: \end{eqnarray}

319: \begin{figure}[t]

320: \centerline{\epsfysize=50mm \epsfbox{select2.eps}}

321: \caption{Permutation invariant configuration space of the four-state

322:   model in surplus coordinates.}

323: \label{select}

324: \end{figure}

325: With this choice, any unstructured random sequence has coordinates

326: $s_i \equiv 0$ (with probability 1 in the limit $N\to \infty$).

327: Any positive value of a surplus component, on the other hand, signals a

328: non-trivial overlap of the sequence with the wildtype $\bm{\sigma}_{++}$.

329: In particular, $s_1$ measures the surplus of sites with purines or pyrimidines

330: as given in $\bm{\sigma}_{++}$ over the purine--pyrimidine mutated sites.
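
For orientation, here is a short worked example (the numbers are chosen by us
purely for illustration). A sequence of length $N=8$ with $d_1=1$, $d_2=2$
and $d_3=1$ relative to the wildtype has, by (\ref{surplus}),
\begin{equation*}
s_1 = 1 - \tfrac{2}{8}(1+1) = \tfrac{1}{2}\;, \quad
s_2 = 1 - \tfrac{2}{8}(2+1) = \tfrac{1}{4}\;, \quad
s_3 = 1 - \tfrac{2}{8}(1+2) = \tfrac{1}{4}\;.
\end{equation*}
The wildtype itself sits at $(s_1,s_2,s_3) = (1,1,1)$, whereas sequences in
which every site carries the same type of substitution ($d_1=N$, $d_2=N$ or
$d_3=N$) sit at $(-1,1,-1)$, $(1,-1,-1)$ and $(-1,-1,1)$, respectively. These
four points are the corners of the tetrahedron spanned by all admissible
$(d_1,d_2,d_3)$. A sequence with $d_1=d_2=d_3=N/4$ has $s_i=0$, in line with
the statement on random sequences.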

331:

332: Within this frame, a natural class of permutation invariant fitness

333: functions is

334: \begin{equation} \label{fit}

335: f: \bm{\sigma} \mapsto

336: r_{\bm{\sigma}} = N \sum_{i=1}^3 \left[\alpha_i^{} s_i(\bm{\sigma}) +

337: \frac{\gamma_i^{}}{2} s_i^2(\bm{\sigma}) \right]

338: \end{equation}

339: which includes the following special cases

340: \begin{itemize}

341: \item

342: Setting $\alpha_i > 0$ and $\gamma_{i} = 0$, we obtain the purely additive

343: {\em Fujiyama landscape} without genetic interactions. Here, every

344: mutation relative to the wildtype has a fixed deleterious effect,

345: independent of any other mutation that may be present in the sequence.

346: The additive landscape is a canonical zeroth-order approximation, ignoring

347: any kind of genetic interactions. In the context of sequence

348: evolution, this fitness function has been discussed e.g.~in \cite{OB,BBW}.

349: \item

350: With the choice $\alpha_i \ge - \gamma_{i} > 0$, the model

351: corresponds to a concave quadratic fitness function

352: (with directional selection) as it is frequently met

353: in multilocus theory. Due to the gene interactions, existing mutations

354: tend to aggravate further ones, which is called {\em positive epistasis}.

355: \item

356: For $\alpha_i \ge 0$ and $\gamma_i > 0$, we finally obtain a convex fitness

357: function for directional selection with long-range gene interactions and

358: {\em negative epistasis} (existing mutations tend to alleviate further

359: ones). Since we want to have $\bm{\sigma}_{++}$ as unique wildtype

360: sequence and a fitness function which is monotonic in the surplus

361: components, we restrict $f$ to the octant $s_i \ge 0$ and (smoothly)

362: truncate the fitness function by introduction of a step function

363: $\Theta(s_i)$ whenever frequencies of genotypes with $s_i < 0$ are

364: non-zero:

365: \begin{equation} \label{fit2}

366: \tilde{f}: \bm{\sigma} \mapsto

367: r_{\bm{\sigma}} = N \sum_{i=1}^3

368: \left[\left(\alpha_i^{} s_i(\bm{\sigma}) +

369: \frac{\gamma_i^{}}{2} s_i^2(\bm{\sigma}) \right)\Theta(s_i) \right]\;.

370: \end{equation}

371: \end{itemize}

372: The variables $\alpha_i$ and $\gamma_i$ may further be used to

373: distinguish between the effects of the different types of mutations

374: (as defined in Fig.~\ref{mutfig}) on the fitness. In this article,

375: we will present explicit results for the two following cases:

376: \begin{enumerate}

377: \item

378: For the simplest choice, $\alpha_1=\alpha_2=\alpha_3$ and

379: $\gamma_1=\gamma_2=\gamma_3$, any mutation away from the wildtype has

380: the same effect. Together with the Jukes-Cantor mutation scheme,

381: symmetry here leads to equal values of the surplus components in the

382: mutation--selection equilibrium. The model may thus also be thought

383: of as a two-state model, where any site is only regarded as occupied

384: either with a {\em wildtype} or with a {\em mutant} nucleotide.

385: In contrast to the simple two-state model of \cite{BBW}, however,

386: there is an effectively asymmetric mutation rate between wildtype

387: and mutant in the case considered here.

388: \item

389: In a more refined model, we distinguish between transitions and

390: transversions. In the mutational part, this is done by applying the

391: Kimura 2 parameter mutation scheme. In the fitness function, we take

392: into account that the deleterious effects of the transversions often

393: dominate over those of the transitions: $\alpha_1 > \alpha_{2,3}$

394: and/or $\gamma_1 > \gamma_{2,3}$.

395: \end{enumerate}
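
To make the role of the parameters concrete, the following short calculation
(our own illustration, derived only from (\ref{surplus}) and (\ref{fit}))
evaluates the additive case $\gamma_i = 0$. Writing $d_i$ for the restricted
Hamming distances to the wildtype and inserting the surplus components gives
\begin{equation*}
r_{\bm{\sigma}} = N(\alpha_1+\alpha_2+\alpha_3)
- 2(\alpha_1+\alpha_3)\, d_1 - 2(\alpha_2+\alpha_3)\, d_2
- 2(\alpha_1+\alpha_2)\, d_3 \;.
\end{equation*}
Each substitution of type 1, 2 or 3 thus lowers the fitness by
$2(\alpha_1+\alpha_3)$, $2(\alpha_2+\alpha_3)$ or $2(\alpha_1+\alpha_2)$,
respectively. For the symmetric choice $\alpha_i = \alpha$, this reduces to
$r_{\bm{\sigma}} = \alpha\,(3N - 4\, d_{\bm{\sigma}\bm{\sigma}_{++}})$, a
function of the total Hamming distance only. For $\alpha_1 > \alpha_2 =
\alpha_3$, both transversion types cost $2(\alpha_1+\alpha_2)$, which exceeds
the cost $4\alpha_2$ of a transition.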

396:

397:

398: \section{Quantum chain model}

399:

400: \subsection{Symmetries}

401:

402: Since mutation is a random process that is independent of

403: the fitness values of the genotypes involved, the molecular mutation

404: scheme consequently makes no reference to fitness concepts like the

405: {\em wildtype}. Biological observables measurable from sequence data,

406: such as the surplus components (\ref{surplus}), and also the fitness

407: functions as defined in (\ref{fit}) or (\ref{fit2}), on the other

408: hand, are defined relative to the wildtype sequence. In order to set

409: up these concepts in a common framework, it is convenient to

410: reformulate also the mutational part of the evolution equation in

411: coordinates relative to the wildtype. This may always be done

412: due to certain symmetries inherent in the mutation scheme of

413: Fig.~\ref{mutfig}.

414:

415: The basic symmetry of the mutation scheme, if all three mutation rates

416: $\mu_1, \mu_2, \mu_3$ are pairwise different, is $C_2 \times C_2$

417: (Klein's 4-group), generated by two involutions. If we write the

418: operations in standard permutation notation, we can take as generators

419: the transformations

420: \begin{equation}

421: \begin{pmatrix}

422: A&C&G&T \\ C&A&T&G

423: \end{pmatrix} \quad \text{and} \quad

424: \begin{pmatrix}

425: A&C&G&T \\ G&T&A&C

426: \end{pmatrix}\;,

427: \end{equation}

428: both being the product of two transpositions. This symmetry may

429: now be exploited for a redefinition of the mutation scheme in

430: wildtype coordinates. To this end, we fix, for every site of the

431: wildtype sequence, the element of the 4-group (in the above

432: representation) with the letter of the wildtype nucleotide in the

433: first position (e.g.\ the string $(T,G,C,A)$ for wildtype nucleotide

434: $T$). An alternative representation of the configuration space in wildtype

435: coordinates as

436: \begin{equation}

437: {\bm \sigma} \in V^\pm \equiv V_1^\pm \times V_2^\pm

438:  \times \dots \times V_N^\pm \;;\quad

439: V_i^\pm = \{++,-+,+-,--\}

440: \end{equation}

441: is now given by the mapping, on each site, of the string of

442: labels $(++,-+,+-,--)$ to the symmetry element of the 4-group defined

443: above. With this notation, the three types of mutations included in the

444: Kimura 3 ST scheme simply switch the signs of the labels:

445: $\pm\pm \to \mp\pm$ at rate $\mu_1$, $\pm\pm \to \pm\mp$ at rate

446: $\mu_2$, and $\pm\pm \to \mp\mp$ at rate $\mu_3$.
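
The assignment can be tabulated explicitly. The following table is a summary
worked out by us from the two generators given above, with the substitution
types taken from (\ref{Hamming}); it lists, for each wildtype nucleotide, the
nucleotide carried by each label.
\begin{center}
\begin{tabular}{c|cccc}
wildtype & $++$ & $-+$ & $+-$ & $--$ \\ \hline
$A$ & $A$ & $C$ & $G$ & $T$ \\
$C$ & $C$ & $A$ & $T$ & $G$ \\
$G$ & $G$ & $T$ & $A$ & $C$ \\
$T$ & $T$ & $G$ & $C$ & $A$
\end{tabular}
\end{center}
In every row, flipping the first sign exchanges $A \rightleftarrows C$ or
$G \rightleftarrows T$ (rate $\mu_1$), flipping the second sign exchanges
$A \rightleftarrows G$ or $C \rightleftarrows T$ (rate $\mu_2$), and flipping
both signs exchanges $A \rightleftarrows T$ or $C \rightleftarrows G$
(rate $\mu_3$).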

447:

448: Higher symmetries of the mutation model are obtained if mutation rates are

449: equal. For the Kimura 2 parameter scheme, $\mu_1 = \mu_3 \neq \mu_2$,

450: the operation

451: \begin{equation}

452: A \to C \to G \to T \to A \; = \;

453: \begin{pmatrix}

454: A&C&G&T \\ C&G&T&A

455: \end{pmatrix}

456: \end{equation}

457: is also a symmetry and generates a cyclic group $C_4$. Together with

458: the previous $C_2 \times C_2$, this generates a dihedral group, $D_4$,

459: with 8 elements. Finally, if $\mu_1 = \mu_2 = \mu_3$, we additionally

460: get the simple transposition $A \leftrightarrow C$

461: and have the full permutation group $S_4$ as symmetry. Note that

462: $S_4$, which corresponds to the full tetrahedral group with 24

463: elements, is also the symmetry group of the configuration space of

464: permutation invariant configurations visualized in

465: Fig.~\ref{select}. The {\em global} symmetry (with the same

466: transformation acting at each site simultaneously) of our class of

467: mutation-selection models with fitness functions according to

468: (\ref{fit}) is therefore always a subgroup of $S_4$.

469: In particular, the symmetric fitness model with $\alpha_1 = \alpha_2 =

470: \alpha_3$, $\gamma_1 = \gamma_2 = \gamma_3$, and Jukes-Cantor mutation

471: scheme possesses $C_{3v}$ symmetry, or the full tetrahedral symmetry if the

472: linear part in the fitness function vanishes ($\alpha_i = 0$).

473: The transition-transversion model finally, with $\alpha_1 >

474: \alpha_2 = \alpha_3$, or $\gamma_1 > \gamma_2 = \gamma_3$, and Kimura 2

475: parameter mutation has simple $C_2$ symmetry, or $D_4$ symmetry if

476: $\alpha_i \equiv 0$. In the latter case, the combination of

477: $\gamma_2=\gamma_3$ with $\mu_1=\mu_3$ is necessary, not a

478: misprint. Other combinations with global $D_4$ symmetry are $(\gamma_1

479: = \gamma_3; \mu_2=\mu_3)$ and $(\gamma_1=\gamma_2; \mu_1=\mu_2)$.
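
The condition $\mu_1 = \mu_3$ for the cyclic symmetry of the mutation scheme
can be checked directly; the following bookkeeping is our own illustration.
Under $A \to C \to G \to T \to A$, the unordered pairs of nucleotides are
mapped as
\begin{eqnarray*}
\{A,C\} \mapsto \{C,G\}\;, \quad \{G,T\} \mapsto \{T,A\}
& : & \text{type 1} \to \text{type 3}\;, \\
\{A,G\} \mapsto \{C,T\}\;, \quad \{C,T\} \mapsto \{G,A\}
& : & \text{type 2} \to \text{type 2}\;, \\
\{A,T\} \mapsto \{C,A\}\;, \quad \{C,G\} \mapsto \{G,T\}
& : & \text{type 3} \to \text{type 1}\;,
\end{eqnarray*}
where type $i$ refers to the substitutions counted by $d_i$ in
(\ref{Hamming}). The rates are therefore preserved precisely when
$\mu_1 = \mu_3$, while $\mu_2$ remains unconstrained.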

480:

481: \subsection{Construction}

482:

483: With the above preparations, we may now follow the lines of

484: \cite{BBW,WBG} where the two-state model is treated.

485:

486: In a first step, we represent the $4^N$-dimensional vector space in

487: which we describe the

488: genotype frequencies as the $N$-fold tensor product space

489: $W = \otimes_{j=1}^N W_j$. Hereby, the configuration space $V^\pm$ is

490: canonically embedded in $W$ by the mapping of the elements of

491: $V_i^\pm$ onto the basis vectors

492: $\{e_{j}^{++}, e_{j}^{-+}, e_{j}^{+-}, e_{j}^{--}\}$ of $W_j \simeq \RR^4$.

493: Since the nonlinear part in the differential

494: equations (\ref{paramuse}) only amounts to normalization of the

495: frequencies, a transformation to so-called

496: {\em absolute frequencies} \cite{TM,BBW}

497: \begin{equation}

498: z_{\bm \sigma}^{}(t) = p_{\bm \sigma}^{}(t) \exp\Big( \sum_{\bm \sigma'}

499: r_{\bm \sigma'}^{} \int_0^t p_{\bm \sigma'}^{}(\tau) \,d\tau \Big)

500: \end{equation}

501: then reduces the system to the linear equation

502: \begin{equation} \label{LGS}

503: \dot{z}_{\bm \sigma}^{}(t) = \big({\cal M} + {\cal R}\big)

504: z_{\bm \sigma}^{}(t)

505: \end{equation}

506: where the mutation and reproduction matrices, ${\cal M} =

507: (m_{\bm\sigma \bm\sigma'})$ and ${\cal R} = \text{diag}(r_{\bm\sigma}^{})$,

508: may now be conveniently represented in the frequency space $W$. Defining

509: \begin{equation}

510: \sigma_j^{(\alpha,\beta)} := \left(\otimes^{j-1} \Id_4 \right) \otimes

511: \left(\sigma^\alpha \otimes \sigma^\beta \right)

512: \otimes \left(\otimes^{N-j} \Id_4\right)

513: \end{equation}

514: where $\sigma^\alpha$, $\alpha \in \{0,x,z\}$, are the real Pauli matrices and

515: $\sigma^0 \equiv \Id_2$, we find

516: \begin{equation}

517: {\cal M} = \sum_{j=1}^N \left[ \mu_1 \sigma_j^{(x,0)} + \mu_2

518: \sigma_j^{(0,x)} + \mu_3 \sigma_j^{(x,x)} - (\mu_1+\mu_2+\mu_3) \Id\right]

519: \end{equation}

520: for the mutation matrix. The reproduction matrix ${\cal R}$ is, for a

521: general fitness landscape, an element of the algebra generated by

522: $\sigma_j^{(z,0)}$ and $\sigma_j^{(0,z)}$, $1\le j\le N$,

523: \begin{equation}

524: {\cal R} = r_0 \Id + \sum_{k,\ell = 1}^N

525: \sum_{[j_1^{} \dots j_k^{}]} \sum_{[j_1^{} \dots j_\ell^{}]}

526: \varepsilon_{[j_1^{} \dots j_k^{}],[j_1^{} \dots j_\ell^{}]}^{}

527: \prod_{m=1}^k \sigma_{j_m^{}}^{(z,0)} \prod_{n=1}^\ell

528: \sigma_{j_n^{}}^{(0,z)},

529: \end{equation}

530: where $[j_1^{} \dots j_k^{}]$ is an ordered $k$-tuple in $\{1,\dots,N\}$.

531: Now, from a physical point of view, ${\cal H} = {\cal M} + {\cal R}$

532: is (up to a global minus sign) the Hamiltonian of two coupled Ising

533: quantum chains in a tunable transverse magnetic field (the mutation)

534: and general spin-interactions within the chains.
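
For a single site, the action of these operators can be spelled out on the
basis vectors. The following is a sketch under two assumptions of ours: the
first tensor factor acts on the first sign of the label, and $\sigma^z$ has
eigenvalue $+1$ on the state labelled $+$. With $a,b \in \{+,-\}$ read as
$\pm 1$, and dropping the site index,
\begin{eqnarray*}
\sigma^{(x,0)} e^{ab} = e^{(-a)b}\;, \quad
\sigma^{(0,x)} e^{ab} = e^{a(-b)}\;, \quad
\sigma^{(x,x)} e^{ab} = e^{(-a)(-b)}\;, \\
\sigma^{(z,0)} e^{ab} = a\, e^{ab}\;, \quad
\sigma^{(0,z)} e^{ab} = b\, e^{ab}\;, \quad
\sigma^{(z,z)} e^{ab} = ab\, e^{ab}\;.
\end{eqnarray*}
In the basis order $(e^{++}, e^{-+}, e^{+-}, e^{--})$ and for $N=1$, the
mutation matrix then reads
\begin{equation*}
{\cal M} =
\begin{pmatrix}
-\mu & \mu_1 & \mu_2 & \mu_3 \\
\mu_1 & -\mu & \mu_3 & \mu_2 \\
\mu_2 & \mu_3 & -\mu & \mu_1 \\
\mu_3 & \mu_2 & \mu_1 & -\mu
\end{pmatrix}, \qquad \mu := \mu_1+\mu_2+\mu_3 \;,
\end{equation*}
in agreement with the rates (\ref{mss}) expressed in wildtype coordinates.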

535:

536: Translated to our quantum chain model, the fitness function of the

537: permutation invariant landscape defined in (\ref{fit}) results in a

538: (longitudinal) magnetic field and a mean field spin-interaction. We find

539: ${\cal R } = {\cal R}_\alpha + {\cal R}_\gamma$, where

540: \begin{equation}

541: {\cal R}_\alpha = \sum_{j=1}^N \left[\alpha_1 \sigma_j^{(z,0)}

542: + \alpha_2 \sigma_j^{(0,z)} + \alpha_3 \sigma_j^{(z,z)} \right]

543: \end{equation}

544: and

545: \begin{equation} \label{rgamma}

546: {\cal R}_\gamma = \frac{1}{2N} \sum_{j,k = 1}^N \left[ \gamma_1

547: \sigma_j^{(z,0)}\sigma_k^{(z,0)} + \gamma_2 \sigma_j^{(0,z)}\sigma_k^{(0,z)} +

548: \gamma_3 \sigma_j^{(z,z)}\sigma_k^{(z,z)} \right] \;.

549: \end{equation}

550: Let us stress that, in contrast to most physical applications, the mean

551: field model is a much more natural approach in the biological

552: context where interactions are typically long-range. So, it is a

553: legitimate model here, not an inevitable approximation.
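
The correspondence with (\ref{fit}) can be made explicit as follows. This is a
sketch in notation introduced only here, again assuming that $\sigma^z$ has
eigenvalue $+1$ on the state labelled $+$. Define the operators
\begin{equation*}
S_1 := \frac{1}{N} \sum_{j=1}^N \sigma_j^{(z,0)}\;, \quad
S_2 := \frac{1}{N} \sum_{j=1}^N \sigma_j^{(0,z)}\;, \quad
S_3 := \frac{1}{N} \sum_{j=1}^N \sigma_j^{(z,z)}\;.
\end{equation*}
On the basis vector belonging to a genotype $\bm{\sigma}$, the operator
$\sigma_j^{(z,0)}$ contributes $-1$ exactly at the sites labelled $-+$ or
$--$, of which there are $d_1 + d_3$, so that $S_1$ has eigenvalue
$1 - \frac{2}{N}(d_1+d_3) = s_1(\bm{\sigma})$. In the same way, $S_2$ and
$S_3$ have eigenvalues $s_2(\bm{\sigma})$ and $s_3(\bm{\sigma})$. Hence
\begin{equation*}
{\cal R}_\alpha + {\cal R}_\gamma
= N \sum_{i=1}^3 \left[ \alpha_i S_i + \frac{\gamma_i}{2} S_i^2 \right]
\end{equation*}
is diagonal with entries $r_{\bm{\sigma}}$ as given in (\ref{fit}).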

554:

555:

556: \subsection{Biological and physical observables} \label{bpo}

557:

558: In this subsection, we relate the quantities of biological interest,

559: mean and variance of the surplus components and the fitness, to the

560: physical observables. In what follows, we assume the occurring limits

561: to exist.

562:

563: \paragraph{Genotype composition}

564: According to (\ref{LGS}), the Hamiltonian of the quantum chain determines the

565: time evolution of our population of genotypes in an environment that does not

566: constrain the population size. For any genotype-independent

567: regulation of the population size, the relative genotype frequencies

568: are found by {\em statistical} normalization. We therefore define the

569: vector of the genotype composition $|\bm{p}(t) \rangle$ and the

570: equilibrium composition $|0\rangle$ as

571: \begin{equation}

572: |\bm{p}(t) \rangle =

573: \frac{\exp(t{\cal H})

574: |\bm{p}_0\rangle} {\langle \Omega|\exp(t{\cal H})|\bm{p}_0\rangle}

575: \quad ; \quad

576: |0\rangle := \lim_{t\to \infty} |\bm{p}(t) \rangle

577: \end{equation}

578: where $|\bm{p}_0\rangle$ is the initial composition and

579: $4^{-N}|\Omega\rangle$ is the equidistribution of genotypes.

580: Note that the {\em equilibrium composition} of the genotype population

581: just corresponds to the {\em ground state} of the quantum chain on

582: the physical side (with a different `biological' normalization

583: $\langle \Omega|0\rangle = 1$).

584:

585:

586: \paragraph{Fitness} The {\em density of the mean fitness} (or mean

587: fitness per site) of the population is given by the expression

588: \begin{equation}

589: w(t) := N^{-1} \bar{r}(t) =

590: N^{-1} \langle\Omega|{\cal R}|\bm{p}(t)\rangle \;.

591: \end{equation}

592: Since

593: \begin{equation}

594: w := \lim_{t \to \infty} w(t) = N^{-1} \langle \Omega| {\cal R} | 0

595: \rangle = N^{-1} \frac{\langle 0| {\cal H} |0\rangle}{\langle 0| 0

596:   \rangle}

597: \end{equation}

598: the {\em equilibrium} mean fitness (per site) is just given by the

599: (unique) largest eigenvalue of ${\cal H}$ (divided by $N$), corresponding to

600: $|0\rangle$. For an unconstrained population, $w$ also determines the

601: growth rate in the long-time limit. In the physical picture,

602: $(-w)$ is obviously just the {\em ground state energy} (per spin).
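The last equality in the expression for $w$ can be made explicit in one line. As a minimal sketch, write $\lambda_{\max}$ (a symbol introduced here for illustration) for the largest eigenvalue of ${\cal H}$, and assume that ${\cal M}$ is symmetric, so that ${\cal M}|\Omega\rangle = 0$ also implies $\langle\Omega|{\cal M} = 0$. Then
\begin{equation*}
\langle\Omega|{\cal R}|0\rangle = \langle\Omega|({\cal R}+{\cal M})|0\rangle
= \langle\Omega|{\cal H}|0\rangle = \lambda_{\max}\langle\Omega|0\rangle
= \lambda_{\max} \;,
\end{equation*}
so that $w = \lambda_{\max}/N$ in the biological normalization $\langle\Omega|0\rangle = 1$.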

603:

604: Using ${\cal M} |\Omega\rangle = 0$, we derive for the time evolution

605: of the mean fitness

606: \begin{equation} \label{zeit}

607: \dot{w}(t) = V_r(t) + N^{-1}

608: \langle \Omega| [{\cal R},{\cal M}] | \bm{p}(t) \rangle

609: \end{equation}

610: where $V_r(t)$ is the {\em variance of fitness} (per site),

611: \begin{equation}

612: V_r(t) = \frac{1}{N}\left(\langle \Omega|{\cal R}^2|\bm{p}(t)\rangle

613: - \langle \Omega|{\cal R}|\bm{p}(t)\rangle^2 \right)\;.

614: \end{equation}

615: In the absence of mutation, (\ref{zeit}) is of course just a special case

616: of Fisher's ``Fundamental Theorem of Natural Selection'' \cite{Fish} which

617: states that the rate of increase in fitness is equal to the genetic

618: variance in fitness. For the mutation-selection models considered

619: here, the relation has the following intuitive interpretation:

620: The change in mean fitness is driven by two independent forces. The

621: first one stems from the change of genotype frequencies due to

622: selection and is proportional to the variance of fitness values

623: present in the population. Since variances are positive, it always

624: tends to increase fitness. The second term on the right hand side of

625: (\ref{zeit}) typically decreases fitness. It measures the population

626: mean of the change in fitness at time $t$ due to the action of mutation.

627: In mutation-selection equilibrium, both terms balance, and the entire

628: residual variance is due to mutation.
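For completeness, the relation (\ref{zeit}) can be retraced in a few lines. This is a sketch which assumes a symmetric ${\cal M}$, so that $\langle\Omega|{\cal M} = 0$, and uses $\bar{r}(t) = \langle\Omega|{\cal R}|\bm{p}(t)\rangle$. Differentiating the normalized composition gives
\begin{equation*}
\frac{\text{d}}{\text{d}t}|\bm{p}(t)\rangle =
\big({\cal H} - \bar{r}(t)\big)|\bm{p}(t)\rangle \;,
\end{equation*}
since $\langle\Omega|{\cal H}|\bm{p}(t)\rangle = \bar{r}(t)$. Hence
\begin{align*}
N\dot{w}(t) &= \langle\Omega|{\cal R}({\cal R}+{\cal M})|\bm{p}(t)\rangle
 - \bar{r}(t)^2 \\
&= \langle\Omega|{\cal R}^2|\bm{p}(t)\rangle - \bar{r}(t)^2
 + \langle\Omega|[{\cal R},{\cal M}]|\bm{p}(t)\rangle \;,
\end{align*}
where $\langle\Omega|{\cal M}{\cal R} = 0$ was used to complete the commutator. Dividing by $N$ reproduces (\ref{zeit}).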

629:

630: \paragraph{Surplus} Another quantity that characterizes the genetic

631: order of the population, as it may be measured from sequence data, is

632: the {\em mean surplus}. We define, following and generalizing \cite{BBW},

633: \begin{equation}

634: u_i(t) = \sum_{\bm{\sigma}} s_i(\bm{\sigma}) p_{\bm{\sigma}}^{}(t)

635: \quad ; \quad

636: u_i = \lim_{t \to \infty} u_i(t) \;.

637: \end{equation}

638: In particular,

639: \begin{equation}

640: \#_m(t) := \frac{1}{4} \big(3 - (u_1(t)+u_2(t)+u_3(t))\big)

641: \end{equation}

642: measures the mean number of mutations per site relative to the wildtype while

643: \begin{equation}

644: \#_{tr}(t) := \frac{1}{2} \big( 1 - u_1(t) \big)

645: \end{equation}

646: denotes the mean number of transversions alone.
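Both prefactors follow from writing site indicators in terms of the $\sigma^z$-type operators. As a sketch, assume that the wildtype letter at each site is the state $++$, on which all three operators take the value $+1$. Then
\begin{equation*}
\frac{1}{4}\big(1 + \sigma_j^{(z,0)} + \sigma_j^{(0,z)} + \sigma_j^{(z,z)}\big)
\end{equation*}
equals $1$ on $++$ and $0$ on $+-$, $-+$ and $--$, so its complement
$\frac{1}{4}\big(3 - \sigma_j^{(z,0)} - \sigma_j^{(0,z)} - \sigma_j^{(z,z)}\big)$
is the indicator of a mutated site. Likewise,
$\frac{1}{2}\big(1 - \sigma_j^{(z,0)}\big)$ equals $1$ exactly on the states
$-+$ and $--$, in which the first spin differs from the wildtype. Averaging over the sites and over the population yields the two expressions above.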

647: As a {\em biological order parameter}, the mean surplus plays a r{\^o}le

648: similar to that of the physical magnetization. However, as already

649: noted in \cite{BBW2}, both quantities are quite distinct and in many

650: cases not even easily related. In the language of the quantum chain,

651: the equilibrium mean surplus may be derived as

652: \begin{equation}

653: u_1 = \frac{\langle \Omega|\sum_i\sigma_i^{(z,0)}|0\rangle}{N}

654: \quad ;\quad

655: u_2 = \frac{\langle \Omega|\sum_i\sigma_i^{(0,z)}|0\rangle}{N}

656: \quad ;\quad

657: u_3 = \frac{\langle \Omega|\sum_i\sigma_i^{(z,z)}|0\rangle}{N}

658: \; ,

659: \end{equation}

660: whereas the three-component magnetization is defined as the ground

661: state expectation value

662: \begin{equation}

663: m_1 = \frac{\langle 0|\sum_i\sigma_i^{(z,0)}|0\rangle}

664: {N \langle 0|0\rangle} \quad;\quad

665: m_2 = \frac{\langle 0|\sum_i\sigma_i^{(0,z)}|0\rangle}

666: {N \langle 0|0\rangle} \quad ;\quad

667: m_3 = \frac{\langle 0|\sum_i\sigma_i^{(z,z)}|0\rangle}

668: {N \langle 0|0\rangle} \; .

669: \end{equation}

670: As we will show below, magnetization and surplus can show rather

671: different behaviour especially near phase transitions. The biological

672: and physical phase diagrams, however, coincide if phase transitions

673: (or error thresholds) are defined as nonanalyticity points of the

674: ground state energy (or mean fitness) $w$ in the thermodynamic limit

675: (cf.~the discussion in Section 5).

676:

677: \section{Results}

678:

679: \subsection{Fujiyama model}

680:

681: As in the two-letter case \cite{BBW}, the quantum chain model

682: decomposes into non-interacting one-site Hamiltonians for the

683: additive landscape. The mean fitness and its variance are linear

684: functions in the surplus components. In particular, we obtain from

685: (\ref{zeit})

686: \begin{equation}

687: V_r(t) = \dot{w}(t) + 2\big(

688: (\mu_1 +\mu_3) \alpha_1 u_1(t)

689: + (\mu_2 +\mu_3) \alpha_2 u_2(t) + (\mu_1 +\mu_2) \alpha_3 u_3(t)\big)

690: \;.

691: \end{equation}
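The coefficients in this relation can be traced back to the commutator in (\ref{zeit}). The following is a sketch under the assumption (not restated in this section) that the mutation operator has the form
\begin{equation*}
{\cal M} = \sum_{j=1}^N \left[\mu_1\big(\sigma_j^{(x,0)}-1\big)
+ \mu_2\big(\sigma_j^{(0,x)}-1\big) + \mu_3\big(\sigma_j^{(x,x)}-1\big)\right] ,
\end{equation*}
which is consistent with ${\cal M}|\Omega\rangle = 0$. Since
$\langle\Omega|\sigma_j^{(x,0)} = \langle\Omega|\sigma_j^{(0,x)}
= \langle\Omega|\sigma_j^{(x,x)} = \langle\Omega|$, and since
$\sigma_j^{(z,0)}$ anticommutes with $\sigma_j^{(x,0)}$ and $\sigma_j^{(x,x)}$
but commutes with $\sigma_j^{(0,x)}$, one finds
\begin{align*}
\langle\Omega|\sigma_j^{(z,0)}{\cal M} &= -2(\mu_1+\mu_3)\,\langle\Omega|\sigma_j^{(z,0)} \;, \\
\langle\Omega|\sigma_j^{(0,z)}{\cal M} &= -2(\mu_2+\mu_3)\,\langle\Omega|\sigma_j^{(0,z)} \;, \\
\langle\Omega|\sigma_j^{(z,z)}{\cal M} &= -2(\mu_1+\mu_2)\,\langle\Omega|\sigma_j^{(z,z)} \;.
\end{align*}
With $\langle\Omega|{\cal M} = 0$, the commutator term in (\ref{zeit}) for ${\cal R} = {\cal R}_\alpha$ therefore equals
$-2\big((\mu_1+\mu_3)\alpha_1 u_1(t) + (\mu_2+\mu_3)\alpha_2 u_2(t)
+ (\mu_1+\mu_2)\alpha_3 u_3(t)\big)$, which gives the relation above.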

692: For Jukes-Cantor mutation, $\mu_1 = \mu_2 = \mu_3 \equiv \mu$, this reduces to

693: \begin{equation}

694: V_r(t) = \left(4 \mu + \frac{\text{d}}{\text{d}t}\right) w(t)

695: \end{equation}

696: and $V_r$ is proportional to the mean fitness in the

697: mutation--selection equilibrium. Exact results are easily

698: found from the solution of the four-dimensional eigenvalue problem of

699: the one-site Hamiltonian. We only give the expression for the mean

700: fitness in the symmetric case, $\alpha_1 = \alpha_2 = \alpha_3 \equiv \alpha$

701: with Jukes-Cantor mutation scheme ($\mu_1 = \mu_2 = \mu_3 \equiv \mu$):

702: \begin{equation}

703: w(t) =

704: \frac{\exp[2t(\alpha+\mu)]\cosh[2tQ]\left(\alpha-2\mu+2Q\tanh[2tQ]\right)

705: -\alpha-4\mu}{1+\exp[2t(\alpha+\mu)]\cosh[2tQ]}

706: \end{equation}

707: where

708: \begin{equation}

709: Q = \sqrt{\mu^2+\alpha^2 -\alpha\mu}

710: \end{equation}

711: and the equidistribution of genotypes is chosen as starting configuration.
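The spectrum behind these expressions can be written down explicitly. As a sketch, assume the one-site mutation operator
$\mu\big(\sigma^{(x,0)}+\sigma^{(0,x)}+\sigma^{(x,x)}-3\big)$, which is the form consistent with ${\cal M}|\Omega\rangle = 0$. In the basis $(++,+-,-+,--)$, the symmetric one-site Hamiltonian (denoted $h$ here for illustration) is
\begin{equation*}
h = \begin{pmatrix}
3\alpha-3\mu & \mu & \mu & \mu \\
\mu & -\alpha-3\mu & \mu & \mu \\
\mu & \mu & -\alpha-3\mu & \mu \\
\mu & \mu & \mu & -\alpha-3\mu
\end{pmatrix} .
\end{equation*}
Vectors supported on the three mutant states with components summing to zero are eigenvectors with the twofold eigenvalue $-(\alpha+4\mu)$. On the remaining two-dimensional subspace, spanned by the wildtype state and the uniform mutant vector, $h$ has trace $2\alpha-4\mu$ and eigenvalues
\begin{equation*}
\lambda_\pm = \alpha - 2\mu \pm \sqrt{(2\alpha-\mu)^2 + 3\mu^2}
= \alpha - 2\mu \pm 2Q \;.
\end{equation*}
The largest eigenvalue $\lambda_+ = \alpha - 2\mu + 2Q$ is the equilibrium mean fitness per site, in agreement with the $t\to\infty$ limit of $w(t)$. For $\mu = 0$ and $\alpha > 0$ it reduces to $3\alpha$, the fitness of the wildtype.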

712:

713: Means and variances of the fitness and the surplus in

714: mutation--selection balance are shown in Fig.~\ref{finite} below.

715: A plot of the time evolution of fitness is given in Fig.~\ref{time2}.

716: There is clearly no phase transition (resp.~no {\em error threshold}

717: behaviour) for the additive Fujiyama landscape, as expected in view of

718: the complete absence of interactions (resp.\ epistasis).

719:

720:

721: %\begin{equation}

722: %w = \alpha \left(2\sqrt{\left(\frac{\mu}{\alpha}\right)^2 -

723: %\frac{\mu}{\alpha} + 1} - 2\frac{\mu}{\alpha} +1\right)

724: %\end{equation}

725:

726:

727: \subsection{Quadratic fitness model: Equilibrium results}

728:

729: In contrast to the additive case, no simple relation between surplus

730: and fitness is known in the case of the quadratic landscape as

731: long as $t$ or $N$ are kept finite. However, due to the permutation

732: invariance of the Hamiltonian, the individual fitness--surplus

733: relation (\ref{fit}) is recovered in the thermodynamic limit

734: for the corresponding mean values of the equilibrium population.

735: We obtain in analogy to \cite{BBW2}:

736: \begin{equation} \label{surrel}

737: w = \lim_{t \to \infty} w(t) = \sum_{i=1}^3 \left(\alpha_i u_i

738: + \frac{\gamma_i}{2} u_i^2 \right)

739: \end{equation}

740: and, from (\ref{zeit}), for the equilibrium variance of fitness per site

741: \begin{multline} \label{variance}

742: V_r = \lim_{t \to \infty} V_r(t) =

743: 2(\mu_1+\mu_3)\left(\alpha_1 u_1 + \gamma_1 u_1^2\right) +

744: \\

745: 2(\mu_2+\mu_3)\left(\alpha_2 u_2 + \gamma_2 u_2^2\right) +

746: 2(\mu_1+\mu_2)\left(\alpha_3 u_3 + \gamma_3 u_3^2\right)\;.

747: \end{multline}
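The quadratic terms in (\ref{variance}) arise in the same way as the linear ones. The following sketch assumes a mutation operator of the form
${\cal M} = \sum_j [\mu_1(\sigma_j^{(x,0)}-1) + \mu_2(\sigma_j^{(0,x)}-1)
+ \mu_3(\sigma_j^{(x,x)}-1)]$
and assumes that correlations between sites factorize in the thermodynamic limit,
$N^{-2}\sum_{j,k}\langle\Omega|\sigma_j^{(z,0)}\sigma_k^{(z,0)}|0\rangle \to u_1^2$.
For $j \neq k$, mutation can flip either of the two factors, so that
\begin{equation*}
\langle\Omega|\sigma_j^{(z,0)}\sigma_k^{(z,0)}{\cal M}
= -4(\mu_1+\mu_3)\,\langle\Omega|\sigma_j^{(z,0)}\sigma_k^{(z,0)} \;,
\end{equation*}
while the terms with $j = k$ are constants and do not contribute. In equilibrium, $\dot{w} = 0$ and (\ref{zeit}) gives
$V_r = -N^{-1}\langle\Omega|[{\cal R},{\cal M}]|0\rangle$. The
$\gamma_1$ part of ${\cal R}_\gamma$ thus contributes
\begin{equation*}
\frac{2\gamma_1(\mu_1+\mu_3)}{N^2} \sum_{j\neq k}
\langle\Omega|\sigma_j^{(z,0)}\sigma_k^{(z,0)}|0\rangle
\;\longrightarrow\; 2(\mu_1+\mu_3)\gamma_1 u_1^2 \;,
\end{equation*}
and analogously for the other two components.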

748: %\begin{equation}

749: %\textswab{h}= \mu_1\sigma^{(x,0)}+\mu_2 \sigma^{(0,x)} +\mu_3 \sigma^{(x,x)} +%\gamma_1 m_1 \sigma^{(z,0)} + \gamma_2 m_2 \sigma^{(0,z)} + \gamma_3 m_3

750: %\sigma^{(z,z)}

751: %\end{equation}

752: The key to the solution in the thermodynamic limit is now the minimum

753: principle of the physical free energy which translates to a maximum

754: principle for the equilibrium mean fitness. Maximizing

755: \begin{equation}

756: \langle \bm{x} | {\cal M} + {\cal R} | \bm{x} \rangle -

757: w \big(\langle \bm{x} |\bm{x}  \rangle -1\big)

758: \end{equation}

759: with respect to $w$ and $\bm{x}$, we obtain, taking permutation symmetry of

760: $\bm{x}$ into account, the following variational expression for $w$:

761: \begin{equation} \label{fitm}

762: \begin{aligned}

763: w(\bm{\alpha},\bm{\mu}&,\bm{\gamma}) \;\; =

764: \sup_{m_1,m_2,m_3} \bigg[\alpha_1 m_1 +

765: \alpha_2 m_2 + \alpha_3 m_3 + \frac{\gamma_1}{2} m_1^2 +

766: \frac{\gamma_2}{2} m_2^2 + \frac{\gamma_3}{2} m_3^2 +

767: \\

768: &\frac{\mu_1}{2}

769: \left(\sqrt{(1+m_2)^2-(m_1+m_3)^2}+\sqrt{(1-m_2)^2-(m_1-m_3)^2}-2\right)+

770: \\

771: &\frac{\mu_2}{2}

772: \left(\sqrt{(1+m_1)^2-(m_2+m_3)^2}+\sqrt{(1-m_1)^2-(m_2-m_3)^2}-2\right)+

773: \\

774: &\frac{\mu_3}{2}

775: \left(\sqrt{(1+m_3)^2-(m_1+m_2)^2}+\sqrt{(1-m_3)^2-(m_1-m_2)^2}-2

776: \right)\bigg]

777: \end{aligned}

778: \end{equation}

779: where $m_i \in [-1,1]$ are the components of the physical

780: magnetization. Let us stress that, from the biological point of view,

781: the translation to the physical framework seems a necessary technical

782: step since we do not know of any variational principle for the

783: biological model which works directly in $L^1$. We now take a closer

784: look at two special cases.

785:

786: \paragraph{Symmetric fitness model} For the symmetric

787: {\em wildtype--mutant} model with $\alpha_i \equiv \alpha$,

788: $\gamma_i \equiv \gamma$ and Jukes-Cantor mutation rate $\mu$,

789: all components of the order parameters are equal,

790: $m_i \equiv m$ and $u_i \equiv u$, respectively.

791: Here, the variational expression (\ref{fitm}) for $w$ leads to the

792: following self-consistency condition for $m$:

793: \begin{equation} \label{sc}

794: m = \frac{1}{3}\left[ 1 + \frac{2(\alpha + \gamma m) - \mu}

795: {\sqrt{(\alpha + \gamma m)^2 - \mu(\alpha + \gamma m) + \mu^2}}\right]\;.

796: \end{equation}

797: This is a quartic equation in $m$ and can be solved using the

798: standard formulas. However, since the explicit solution is rather

799: lengthy, we do not include it here, but give a qualitative

800: discussion instead.
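The origin of (\ref{sc}) can be sketched as follows; the function $\phi$ and the abbreviation $a$ are notation introduced here for illustration. Setting $m_i \equiv m$, $\alpha_i \equiv \alpha$, $\gamma_i \equiv \gamma$ and $\mu_i \equiv \mu$ in (\ref{fitm}), the expression in brackets becomes
\begin{equation*}
\phi(m) = 3\alpha m + \frac{3\gamma}{2} m^2 + \frac{3\mu}{2}
\left(\sqrt{(1+3m)(1-m)} - 1 - m\right) , \qquad
-\tfrac{1}{3} \le m \le 1 \;.
\end{equation*}
The stationarity condition $\phi'(m) = 0$ reads, with $a := \alpha + \gamma m$,
\begin{equation*}
\frac{2a}{\mu} = 1 - \frac{1-3m}{\sqrt{(1+3m)(1-m)}} \;.
\end{equation*}
This is equivalent to (\ref{sc}): from $3m - 1 = (2a-\mu)/\sqrt{a^2-a\mu+\mu^2}$ and
$4(a^2 - a\mu + \mu^2) = (2a-\mu)^2 + 3\mu^2$ one obtains
$(1+3m)(1-m) = \mu^2/(a^2-a\mu+\mu^2)$, and hence
$(1-3m)/\sqrt{(1+3m)(1-m)} = 1 - 2a/\mu$.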

801:

802: Obviously, the relation has a unique real solution for any $\alpha$ and

803: $\mu$ whenever $\gamma$ is {\em negative}. As in the case of the

804: two-state model, we thus obtain no phase transition for positive

805: epistasis. In the following, we therefore concentrate our discussion

806: on positive $\gamma$ (or negative epistasis). Note that, for

807: calculations in the thermodynamic limit, the fitness function $f$

808: (\ref{fit}), and hence the reproduction matrix ${\cal R}_\gamma$

809: (\ref{rgamma}), can always be used instead of the truncated form $\tilde{f}$

810: (\ref{fit2}), since the frequencies of genotypes with negative surplus

811: vanish. For $\alpha_i \equiv 0$, this is due to spontaneous breaking of

812: the extra $C_2 \times C_2$ symmetry of

813: ${\cal H} = {\cal M} + {\cal R}_\gamma$.

814:

815: In contrast to the two-state model, where a phase transition in the

816: thermodynamic limit is only found for zero external field, it turns

817: out that the present model has phase transitions for a whole range of

818: the linear fitness parameter $\alpha$ when epistasis is negative:

819: For $\tilde{\alpha} := \alpha/\gamma$ in the interval

820: \begin{equation}

821: 0 \le \tilde{\alpha} < \frac{1}{3}

822: \left(\sqrt{\frac{4}{3}}-1\right) \simeq 0.0515668

823: \end{equation}

824: we find a first order phase transition of the system at

825: \begin{equation}

826: \tilde{\mu} := \frac{\mu}{\gamma} = \tilde{\mu}_c = \frac{2}{3}

827:  + 2 \tilde{\alpha}

828: \end{equation}

829: with a finite jump in the magnetization from $m_+$ to $m_-$ where

830: \begin{equation}

831: m_\pm = \frac{1}{3}\left(1 \pm

832: \sqrt{1 - 27 \tilde{\alpha}^2 - 18\tilde{\alpha}}\right)\;.

833: \end{equation}
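As a quick consistency check of these formulas, consider $\tilde{\alpha} = 0$, where $\tilde{\mu}_c = 2/3$, $m_+ = 2/3$ and $m_- = 0$. For $m = 2/3$ one has $\gamma m = \mu_c$, so the fraction in (\ref{sc}) equals $1$ and the right hand side is $2/3$; for $m = 0$ the fraction equals $-1$ and the right hand side vanishes. Both values therefore solve (\ref{sc}). Moreover, the expression in brackets in (\ref{fitm}) vanishes at $m = 0$, while at $m = 2/3$ it takes the value
\begin{equation*}
\frac{3\gamma}{2}\cdot\frac{4}{9} + \frac{3\mu_c}{2}
\left(\sqrt{3\cdot\tfrac{1}{3}} - 1 - \tfrac{2}{3}\right)
= \frac{2\gamma}{3} - \mu_c = 0 \;,
\end{equation*}
so that the two branches are degenerate precisely at $\tilde{\mu}_c$. At the upper end of the interval, $\tilde{\alpha} = \frac{1}{3}\big(\sqrt{4/3}-1\big)$, the square root in $m_\pm$ vanishes and the jump closes at $m_+ = m_- = 1/3$.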

834: From $m$ we derive the mean fitness $w$ using (\ref{fitm}), from $w$

835: we obtain the surplus $u$ via (\ref{surrel}) and, finally, the variance of

836: the fitness $V_r = 12\mu(\alpha u +\gamma u^2)$.

837: Looking at the surplus $u$, we also find a phase transition at

838: $\tilde{\mu}= \tilde{\mu}_c$. Like $m$, it vanishes in the disordered

839: phase for $\alpha = 0$. Note, however, that the surplus is continuous at a

840: phase transition: $w$ is continuous, and $u$ is tied to $w$ by the relation (\ref{surrel}).

841: In \cite{BBW2} it has been shown that these differences of the

842: biological and physical order parameters arise with the change from classical

843: to quantum mechanical probabilities (resp.\ the change from $L^1$ to $L^2$)

844: in translating the biological model into the physical one. We remark

845: that a different, discontinuous behaviour of the biological order

846: parameter at a (physical) first order transition has been observed for

847: the sharply peaked landscape in Eigen's quasispecies model \cite{FP}.

848: Mean fitness and its variance, magnetization, and surplus for different

849: values of $\alpha$ are shown below in Fig.~\ref{JC}.

850:

851: \begin{figure}[th]

852: \centerline{\epsfxsize=65mm \epsfysize=55mm \epsfbox{symfit.ps}

853: \epsfxsize=65mm  \epsfysize=55mm \epsfbox{symvarfit.ps}}

854: \centerline{\epsfxsize=65mm \epsfysize=55mm \epsfbox{symsur.ps}

855: \epsfxsize=65mm  \epsfysize=55mm \epsfbox{symmag.ps}}

856: \caption{Mean fitness and its variance, surplus and magnetization in

857:   the symmetric fitness model for various linear parts of the fitness

858:   function in the infinite sites limit.}

859: \label{JC}

860: \end{figure}

861:

862:

863:

864: \paragraph{Transition--transversion model} In our second example, we

865: wish to distinguish mutations between like and unlike nucleotides. In

866: a first step, we retain the symmetric fitness landscape

867: $\gamma_1 = \gamma_2 = \gamma_3 \equiv \gamma$ (for simplicity

868: with vanishing linear part $\alpha = 0$), but let the relative

869: frequencies of transitions and transversions differ by assuming the

870: {\em Kimura 2 parameter} mutation scheme,

871: $\mu_1 = \mu_3 \equiv \mu \neq \mu_2$.

872:

873: \begin{figure}[ht]

874: \centerline{\epsfysize=60mm \epsfbox{nor1.ps}}

875: \caption{Phase diagram of the transition--transversion model with

876: symmetric fitness landscape and Kimura 2 parameter mutation

877: scheme. Solid and dotted lines correspond to first and second order

878: phase transitions, respectively. The dashed line indicates the

879: Jukes-Cantor mutation scheme.}

880: \label{pd1}

881: \end{figure}

882: In the extended parameter space of the reduced mutation rates

883: $\tilde{\mu} = \mu/\gamma$; $\tilde{\mu}_2 = \mu_2/\gamma$, we now

884: obtain a phase diagram with {\em three} distinct phases

885: (see Fig.~\ref{pd1}).

886: \begin{itemize}

887: \item

888: For $\tilde{\mu}$ and $\tilde{\mu}_2$ sufficiently small,

889: all three surplus components

890: are positive, indicating genetic order with respect to the entire

891: 4-letter alphabet of the nucleotides: {\em ACGT phase}.

892: \item

893: If we increase the mutation rate $\tilde{\mu}_2$ for low $\tilde{\mu}$,

894: the system crosses over to a phase which no longer distinguishes

895: between the different kinds of purines (A,G) and pyrimidines (C,T), but

896: is still ordered with respect to transversions. This is the limiting

897: case described by the two-state model. We call this the {\em PP phase}.

898: \item

899: For higher mutation rates $\tilde{\mu},\tilde{\mu}_2$, we finally enter a

900: completely {\em disordered phase} with vanishing fitness and surplus.

901: \end{itemize}

902: In a second step, we now also let the mutation effects of transitions

903: and transversions differ and assume a fitness landscape

904: with $\gamma_2 = \gamma_3 \equiv \gamma$, but $\gamma_1 \neq \gamma$

905: in general. The changes in the phase diagram for increasing

906: $\tilde{\gamma}_1 = \gamma_1/\gamma$ are shown in Fig.~\ref{pd2}.

907: The phase transitions between the three phases may be first or second

908: order. In general, we obtain the following phase space structure:

909: \begin{itemize}

910: \item

911: Phase transitions between the disordered and PP phase are second order and

912: located on the line $\tilde{\mu} = \tilde{\gamma}_1/2$. This phase

913: transition corresponds to the one also seen in the two-state model \cite{BBW}.

914: \item

915: The phase transition line between the ACGT and PP phases in

916: general changes from first to second order with increasing

917: mutation rate $\tilde\mu_2$ (see Figs.~\ref{pd1}, \ref{pd2}).

918: For the second order transitions we derive, on

919: expanding (\ref{fitm}) to lowest order in $m_2 = m_3$,

920: \begin{equation}

921: \mu = \frac{\gamma_1}{\gamma_1 + 2\gamma}

922: \sqrt{(\gamma_1 + \mu_2)(2\gamma-\mu_2)} \;.

923: \end{equation}

924: Numerically, we find that the first order transitions are

925: located on a straight

926: line up to $\tilde{\mu} = \tilde{\gamma}_1/2$ where the PP phase

927: changes into the disordered phase. The $\tilde{\mu}_2$-interval of

928: first-order transitions decreases for increasing $\tilde{\gamma}_1$.

929: For $\tilde{\gamma}_1 \gtrapprox 8.45$, all phase transitions

930: between the ACGT and PP phases are second order.

931: \item

932: Finally, for $\tilde{\gamma}_1 \le 4$, there are direct first order

933: phase transitions between the ACGT phase and the disordered phase

934: (for $\tilde{\mu}_2$ sufficiently small). For higher values of

935: $\tilde{\gamma}_1$, these two phases are separated by the PP phase.

936: \end{itemize}

937:

938: \begin{figure}[ht]

939: \centerline{\epsfxsize=43mm\epsfysize=35mm \epsfbox{pdg2.ps}

940: \epsfxsize=43mm\epsfysize=35mm \epsfbox{pdg4.ps}

941: \epsfxsize=43mm\epsfysize=35mm \epsfbox{pdg10.ps}}

942: \caption{Phase diagrams for anisotropic fitness landscapes $\gamma_1 >

943: \gamma_2 = \gamma_3 \equiv \gamma$ and Kimura 2 parameter mutation

944: scheme. Solid and dotted lines correspond to first and second order

945: phase transitions, respectively.}

946: \label{pd2}

947: \end{figure}

948: As for the symmetric fitness function discussed above, there are no

949: compact analytic expressions for the fitness or the surplus in the

950: ACGT phase. In the PP phase, however, the following values for

951: the mean fitness and the non-zero components of the mean surplus and the

952: magnetization are found:

953: \begin{equation}

954: w = \frac{\gamma_1}{2} \left(1 - \frac{2\mu}{\gamma_1}\right)^2  \quad ; \quad

955: u_1 = 1 - \frac{2\mu}{\gamma_1} \quad ; \quad

956: m_1 = \sqrt{1- \left(\frac{2\mu}{\gamma_1}\right)^2}\;.

957: \end{equation}

958: The variance in fitness per site, finally, is proportional to the mean

959: fitness in the PP phase: $V_r = 8 \mu w$. Note that all these

960: expressions are independent of the transition rate $\mu_2$ and

961: directly comparable to the results of the two-state model

962: \cite{BBW,WBG} by identifying $\{++,+-\}$ with `$+$' and $\{-+,--\}$

963: with `$-$'.
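These PP phase values can be recovered directly from (\ref{fitm}). As a sketch, assume that the supremum is attained at $m_2 = m_3 = 0$, and take $\alpha_i = 0$ and $\mu_1 = \mu_3 \equiv \mu$. The $\mu_2$ term then vanishes identically, since $(1+m_1) + (1-m_1) - 2 = 0$, and the expression in brackets reduces to
\begin{equation*}
\frac{\gamma_1}{2} m_1^2 + 2\mu\left(\sqrt{1-m_1^2} - 1\right) .
\end{equation*}
Stationarity with respect to $m_1 \neq 0$ gives $\sqrt{1-m_1^2} = 2\mu/\gamma_1$, which is the quoted magnetization, and inserting this value yields
\begin{equation*}
w = \frac{\gamma_1}{2} - 2\mu + \frac{2\mu^2}{\gamma_1}
= \frac{\gamma_1}{2}\left(1 - \frac{2\mu}{\gamma_1}\right)^2 .
\end{equation*}
The surplus then follows from (\ref{surrel}), $w = \gamma_1 u_1^2/2$, and the variance from (\ref{variance}),
$V_r = 2(\mu_1+\mu_3)\gamma_1 u_1^2 = 8\mu w$. The solution exists only for $2\mu < \gamma_1$, in line with the transition to the disordered phase at $\tilde{\mu} = \tilde{\gamma}_1/2$.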

964:

965:

966: \subsection{Quadratic fitness model: Finite sequence length} \label{fs}

967:

968: For the Fujiyama model with independent sites, all the quantities

969: calculated here, means and variances per site in infinite populations,

970: are independent of the assumed length $N$ of the sequences.

971: This is no longer the case for models including epistasis. In this

972: subsection, we therefore present a quick numerical investigation of the

973: symmetric fitness model

974: for finite system sizes and compare the results with those in the

975: thermodynamic limit. Since the frequencies of genotypes with negative

976: values of the surplus no longer vanish for finite sequences, we use

977: the truncated fitness function (\ref{fit2}), with $\gamma_i \equiv

978: \gamma > 0$ and $\alpha_i = 0$ for our calculations.

979:

980: All results are obtained by direct numerical solution of the eigenvalue

981: problem in the $[(N+1)(N+2)(N+3)/6]$-dimensional vector space of

982: permutation invariant population vectors. Numerically precise

983: calculations have been performed up to $N = 60$ (39711-dim.); the results

984: are shown in Fig.~\ref{finite}. It is seen that the mean surplus and

985: the mean and the variance of the fitness rapidly approach the limiting

986: curves and behave qualitatively differently from the Fujiyama model

987: even for very small system sizes. We also show the finite-size

988: behaviour of the variance of the surplus $V_s$. Since this quantity

989: vanishes as $1/N$, it is not obtainable from the leading order terms

990: in the thermodynamic

991: limit. In our finite size calculations, we rescale $V_s$ with the

992: sequence length to obtain comparable results. Whereas $V_s$ is

993: monotonically increasing for the additive model (where $N V_s = 1-

994: u^2$), it runs through a maximum for quadratic fitness. Note that this

995: maximum, in contrast to the variance of fitness, is located directly

996: at the error threshold. The behaviour is qualitatively similar to the

997: two-state model \cite{Oli}.

998:

999: \begin{figure}[ht]

1000: \centerline{\epsfxsize=65mm \epsfysize=55mm \epsfbox{fit2.ps}

1001: \epsfxsize=65mm  \epsfysize=55mm \epsfbox{varfit.ps}}

1002: \centerline{\epsfxsize=65mm \epsfysize=55mm \epsfbox{sur.ps}

1003: \epsfxsize=65mm  \epsfysize=55mm \epsfbox{varsur.ps}}

1004: \caption{Equilibrium behaviour of fitness and surplus of the symmetric

1005:   fitness model with finite sequence length. Results for the Fujiyama

1006:   model with scaling $\alpha = \gamma/2$ are also shown.}

1007: \label{finite}

1008: \end{figure}

1009: Since there has been some discussion recently on the correct scaling

1010: of fitness values and mutation rates with the length of the sequence

1011: (cf.~\cite{FP,BG}), let us finally remark that the finite size results in

1012: this and the next section show that our choice, keeping fitness and

1013: mutation rate {\em per site} fixed, is adequate for all quantities

1014: considered here.

1015:

1016:

1017: \subsection{Quadratic fitness model: Time evolution}

1018:

1019: Originally, the error threshold has been defined as an equilibrium

1020: phenomenon (cf.~\cite{ECS,BG}): For special forms of the fitness

1021: landscape, there is a finite critical value $\mu_c$ of the mutation

1022: rate beyond which genetic order is no longer maintained by selection.

1023: For the four-state model with quadratic fitness, this situation has been

1024: discussed above.

1025: However, for a suitable fitness function, the threshold

1026: is not necessarily connected with high mutation rates.

1027: In this subsection,

1028: we consider the relaxation of a non-equilibrium population to

1029: mutation-selection balance. It turns out that, depending on the

1030: starting configuration, an even stronger threshold effect may be

1031: observed in the time evolution of the fitness and the surplus for

1032: all mutation rates below the critical equilibrium value.

1033:

1034: \paragraph{Zero-mutation limit of the transition-transversion model}

1035: The essence of the threshold phenomenon in the time evolution is

1036: already contained in the selection dynamics alone. In a first step, we

1037: therefore disregard mutation altogether by working in the

1038: zero-mutation limit. Obviously, we then deal with a classical

1039: mean-field model on the physical side. As our starting configuration,

1040: we choose the completely unstructured population with an equidistribution of

1041: genotypes $|\bm{p}_0\rangle = 4^{-N}|\Omega\rangle$.

1042: In this particular situation, some analytical progress is also

1043: possible. Since

1044: \begin{equation}

1045: \langle \hat{C} \rangle(t) =

1046: \frac{\langle \Omega|\hat{C} \exp(t {\cal

1047:     R})|\Omega\rangle}{\langle \Omega|\exp(t {\cal R})|\Omega\rangle}

1048:  = \frac{\text{tr}(\hat{C} \exp(t {\cal R}))}{\text{tr}(\exp(t {\cal R}))}

1049: \end{equation}

1050: for any element $\hat{C}$ of the algebra generated by

1051: $\{\sigma_i^{(z,0)},\sigma_i^{(0,z)}\}$, the biological and physical

1052: pictures coincide in this case. Using the fitness function

1053: of the transition-transversion model with

1054: $\gamma_2 = \gamma_3 \equiv \gamma > 0$, we obtain the

1055: following implicit equations for the time evolution of the surplus

1056: components:

1057: \begin{eqnarray}

1058: u &=& \frac{\sinh(2\gamma t u)} {\cosh(2\gamma t u) +

1059: \exp[ -2\gamma_1 t(2u\coth(2\gamma t u) -1)]}

1060: \\[1mm]

1061: u_1 &=& \frac{\cosh[\gamma t Q(u_1)] - \exp(-2\gamma_1 t u_1)}

1062: {\cosh[\gamma t Q(u_1)] + \exp(-2\gamma_1 t u_1)}

1063: \end{eqnarray}

1064: where

1065: \begin{equation}

1066: Q(u_1) = \sqrt{(1+u_1)^2-\exp(4\gamma_1 t u_1)(1-u_1)^2}\;.

1067: \end{equation}

1068: The resulting dynamical phase diagram is shown in Fig.~\ref{time1}.

1069: As in the equilibrium situation, there are three phases.

1070: Depending on the ratio $\tilde{\gamma}_1 = \gamma_1/\gamma$, the

1071: system directly crosses to an ordered phase after a sharply defined

1072: waiting time $t_c$, or performs two consecutive transitions, entering

1073: the PP phase in the first one.

1074:

1075: \begin{figure}[ht]

1076: \centerline{\epsfxsize=65mm \epsfysize=55mm \epsfbox{zeitpd.ps}

1077: \epsfxsize=65mm \epsfysize=55mm \epsfbox{surzg2.ps}}

1078: \caption{Dynamical phase diagram of the transition-transversion model

1079:   for vanishing mutation starting from the equidistribution. (Solid:

1080:   first order; dashed: second order transition). Right: Time

1081:   evolution of the surplus components for $\tilde{\gamma}_1 = 2$.}

1082: \label{time1}

1083: \end{figure}

1084: As in the equilibrium phase diagram, the dynamical transitions may

1085: be of first or second order.

1086: \begin{itemize}

1087: \item

1088: Second order transitions are located at

1089: $\tilde{t} = \gamma t = 1$ for $\tilde{\gamma}_1 \le 1/4$ and at

1090: $\tilde{t} = 1/\tilde{\gamma}_1$ for the transition from the

1091: disordered phase to the PP phase. The transition from the PP phase to

1092: the ACGT phase is second order above $\tilde{\gamma}_1 \approx 1.9009$

1093: and implicitly given through $2\tilde{t}_c = 1 +

1094: \exp[2\tilde{\gamma}_1(\tilde{t}_c - 1)]$. A similar second order

1095: transition (with a one-component order parameter) has also been

1096: observed in the two-state model \cite{Wag,WBG}.

1097: \item

1098: In an interval around the symmetry point $\gamma_1 = \gamma$, the

1099: system possesses a first order transition (in the sense that there is a

1100: finite jump in the magnetization). Note that, in contrast to the

1101: equilibrium

1102: case, also the surplus and even the mean fitness are discontinuous on

1103: this line, giving rise to a rather pronounced threshold effect in the

1104: evolution dynamics (cf.\ the solid line in Fig.~\ref{time2}

1105: for $\tilde{\gamma}_1 = 1$).

1106: \end{itemize}
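The location of the second order lines out of the disordered phase can be made plausible by a small-surplus expansion. The following is a heuristic sketch, not the derivation behind the implicit equations above; the function $\phi_t$ and the entropy $S$ are notation introduced here. Without mutation and starting from the equidistribution, genotypes with surplus $\bm{s} = (s_1,s_2,s_3)$ carry, for large $N$, a total weight proportional to $\exp[N\phi_t(\bm{s})]$ with
\begin{equation*}
\phi_t(\bm{s}) = \frac{t}{2}\sum_{i=1}^3 \gamma_i s_i^2 + S(\bm{s}) \;, \qquad
S(\bm{s}) = -\sum_{a,b=\pm 1} p_{ab} \ln p_{ab} \;, \qquad
p_{ab} = \frac{1}{4}\big(1 + a s_1 + b s_2 + ab\, s_3\big) .
\end{equation*}
Expanding the entropy around $\bm{s} = 0$ gives
\begin{equation*}
\phi_t(\bm{s}) = \ln 4 + \frac{1}{2}\sum_{i=1}^3 (t\gamma_i - 1)\, s_i^2
+ {\cal O}(|\bm{s}|^3) \;,
\end{equation*}
so the unstructured state loses local stability in the component $i$ at $t\gamma_i = 1$. With $\gamma_2 = \gamma_3 \equiv \gamma$, this corresponds to $\tilde{t} = 1$ for the components $s_2, s_3$ and to $\tilde{t} = 1/\tilde{\gamma}_1$ for the component $s_1$, i.e.\ for the transition into the PP phase. Whether the transition is actually continuous is decided by the higher order terms, which this sketch does not address.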

1107: As for the equilibrium values, we also consider the effect of finite

1108: sequence lengths on the time evolution. Again, calculations are

1109: performed by direct diagonalization of the symmetric fitness model

1110: ($\tilde{\gamma}_1 = 1$). Fig.~\ref{time2} shows how the jump

1111: discontinuity in the mean fitness (internal energy) and the

1112: delta function singularity in the variance of the fitness (specific heat)

1113: are approached by the finite systems. A threshold phenomenon is absent

1114: in the time evolution of the Fujiyama model which is also shown

1115: in Fig.~\ref{time2}.

1116: \begin{figure}[ht]

1117: \centerline{\epsfxsize=65mm \epsfysize=55mm \epsfbox{fitzh0.ps}

1118: \epsfxsize=65mm  \epsfysize=55mm \epsfbox{varfitzh0.ps}}

1119: \caption{Time evolution of the equidistribution of genotypes

1120:   in the zero mutation-limit of the symmetric fitness model for different

1121:   sequence lengths.}

1122: \label{time2}

1123: \end{figure}

1124:

1125:

1126:

1127:

1128: %\begin{figure}[ht]

1129: %\centerline{\epsfxsize=60mm \epsfysize=50mm \epsfbox{zvarfit1.ps}

1130: %\epsfxsize=60mm  \epsfysize=50mm \epsfbox{zvarfit2.ps}}

1131: %\caption{Time evolution of the symetric fitness model for different

1132: %  starting configurations.}

1133: %\label{time1}

1134: %\end{figure}

1135:

1136:

1137: \paragraph{Finite mutation rates and different starting configurations}

1138: In a last step, we now discuss the influence of the mutation rate and

1139: the starting configuration on the evolution dynamics. Consider first the

1140: time evolution of the equidistribution of genotypes

1141: $4^{-N}|\Omega\rangle$. Although no analytical results are available here,

1142: we may give the following intuitive argument that there is a phase

1143: transition at finite $t = t_c$ for any mutation rate below the

1144: critical equilibrium mutation rate $\mu_c$: Since mutation alone tries to

1145: keep the population in the equidistribution, the evolution

1146: dynamics will be slowed down by mutation for small $t$. In particular,

1147: mean fitness and surplus will remain zero on a finite interval at

1148: least up to the threshold value of the corresponding model with

1149: vanishing mutation. On the other hand, the limiting values of $w$ and

1150: $u$ are finite for $\mu < \mu_c$, giving rise to a non-analytical

1151: point of $w(t)$ and $u(t)$ at some finite $t = t_c$. As shown in the

1152: upper graph of Fig.~\ref{time3}, this behaviour is clearly visible in

1153: numerical results for finite sequence sizes.

1154: \begin{figure}[ht]

1155: \centerline{\epsfxsize=130mm \epsfysize=45mm \epsfbox{zvarfit1a.ps}}

1156: \vspace*{-12mm}

1157: \centerline{\epsfxsize=130mm  \epsfysize=90mm \epsfbox{zvarfit2b.ps}}

1158: \caption{Time evolution of the variance of the fitness in the symmetric

1159: fitness model with sequence length $N=60$. Results are shown for

1160: varying mutation rates and two different starting configurations.}

1161: \label{time3}

1162: \end{figure}

1163:

1164: In order to contrast the time evolution of the unstructured population with

1165: an equidistribution of genotypes as starting configuration, we have

1166: also performed calculations for the opposite case of a population with

1167: initially homogeneous phenotypes. Here, at $t=0$, any ``individual''

1168: in the population has the same value $s_i = 0$ for the three surplus

1169: components. The result (for finite sequence length $N=60$) is shown

1170: in the lower graph of Fig.~\ref{time3}. As for the

1171: equidistribution of

1172: genotypes, there is a clear threshold effect in the time evolution for

1173: any finite value $0<\mu<\mu_c$ of the mutation rate. The transition

1174: appears to be particularly sharp for small mutation rates. In contrast to the

1175: unstructured case, the critical waiting time $t_c$ for the transition

1176: is no longer monotonically increasing with the mutation rate $\mu$, but

1177: shows two distinct regimes. For mutation rates near the equilibrium

1178: threshold value $\mu_c$, the situation is similar to the unstructured

1179: case: Here, single mutants with higher fitness appear in the

1180: population after a short while. Due to the continuing mutation

1181: pressure, however, a certain time is needed for these fitter

1182: individuals to grow to a finite proportion and to dominate the mean

1183: values in the infinite population. For small $\mu$, on the other hand,

1184: the critical waiting time $t_c$ is dominated by the time needed for

1185: mutation to explore the configuration space and to generate

1186: individuals with higher fitness at a sufficient rate.

1187:

1188:

1189: \section{Discussion}

1190:

1191: When in \cite{BBW} a class of models for sequence space evolution was

1192: introduced, using the framework of Ising quantum chains, the calculations

1193: started with four major simplifications of the biological situation.

1194: These are the consideration of a two-state model, the assumption of an

1195: infinite sequence length, the use of simplistic fitness landscapes,

1196: and the restriction to infinite population sizes. In this paper, we

1197: have addressed the first two of these simplifying assumptions.

1198: In addition, an extended discussion of the evolution dynamics of these

1199: models has also been presented. In the following paragraphs, we give

1200: a summary of our findings and an outlook on the remaining open problems.

1201:

1202: \paragraph{Two-state versus four-state models.}

1203: The main concern of this contribution is the generalization of the

1204: modelling framework, introduced in \cite{BBW}, to four states

1205: (corresponding to the four nucleotides) on each site. The

1206: generalization presented makes use of the $C_2 \times C_2$ symmetry

1207: inherent in the {\em Kimura 3 ST} mutation scheme. On the `physical

1208: side' this leads to a model of two coupled Ising quantum chains

1209: (rather than to a four-state Potts model). Compared with the two-state

1210: model, the extension can be thought of as consisting of two steps. In

1211: a first step, we represent the four states on each site by the spin

1212: values of two spins in decoupled chains. Note that already in this

1213: simplified model three phases occur in the phase diagram since the

1214: transition lines of the two decoupled chains will not in general

1215: coincide. The second step consists of the introduction of

1216: a more realistic mutation scheme which also changes the configuration

1217: space topology and the corresponding use of a refined fitness landscape.

1218: Both these extensions lead to a coupling of the chains, and an even

1219: richer phase space structure is found, including first-order transitions.

1220: As may be seen from the introduction of a small linear field term into the

1221: fitness function in subsection 4.2, this change of the transition to

1222: first order leads to an increased robustness of the threshold

1223: phenomena with respect to symmetry-breaking perturbations.
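
As an illustration of the first step, the four nucleotides can be labelled by pairs of Ising spins. The following assignment is one possible choice, made here only for concreteness; the symbols $\sigma$ and $\tau$ for the two spins are assumptions of this sketch, and the labelling may differ from the one used in the construction of the model above:
\begin{equation*}
\begin{array}{c|cccc}
 & \mathrm{A} & \mathrm{G} & \mathrm{C} & \mathrm{T} \\ \hline
\sigma & +1 & -1 & +1 & -1 \\
\tau & +1 & +1 & -1 & -1
\end{array}
\end{equation*}
With this choice, flipping $\sigma$ alone exchanges A$\leftrightarrow$G and C$\leftrightarrow$T (the transitions), flipping $\tau$ alone exchanges A$\leftrightarrow$C and G$\leftrightarrow$T, and flipping both exchanges A$\leftrightarrow$T and G$\leftrightarrow$C (the two kinds of transversions). Together with the identity, these three operations form the group $C_2 \times C_2$, and each of them can carry its own rate, as in the {\em Kimura 3 ST} scheme.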

1224:

1225: \paragraph{Finite sequence length.}

1226: Typical sequence lengths of enzymes or viruses are of the order $10^3$

1227: -- $10^4$. While these numbers are certainly far off the typical sizes of

1228: macroscopic systems in physics, they are, in principle, large enough

1229: to successfully suppress $1/N$-corrections. However, especially models

1230: with simple fitness landscapes describe -- at best -- the evolution

1231: dynamics in a very restricted configuration space of particularly

1232: `important' sites, disregarding neutral or altogether lethal

1233: mutations. In view of this fact, consideration of finite sequence

1234: lengths is indispensable and calculations in the thermodynamic

1235: limit even seem to be questionable at first sight. In order to clarify

1236: the usefulness of infinite-size methods in this context, we performed

1237: a number of numerical calculations for finite sequence lengths. The

1238: results are quite encouraging. As shown in subsection \ref{fs}, the

1239: characteristic properties of the thermodynamic limit are well visible

1240: even for tiny sequence sizes, such as $N = 10$, and the approximation

1241: is already quantitatively reasonable for sequences of length $60$.

1242:

1243: \paragraph{The fitness landscape.}

1244: The construction of a tractable fitness landscape which nevertheless

1245: comprises the relevant biology is certainly the major task for all

1246: these models. In this contribution, in order to obtain at least some

1247: analytical

1248: results, we have chosen a fitness function from the smooth end of the

1249: landscape zoo. Due to its permutation invariance, the quadratic

1250: fitness function effectively disregards any local variance in

1251: the interaction between sites, but only considers the average epistatic

1252: effect. As such, it is in many respects certainly no more than a

1253: toy-model for evolution. However, the assumption of permutation

1254: invariance of the sites is quite common in evolutionary biology and

1255: comprises a large number of standard models for evolution, such as the

1256: quadratic optimum model or Eigen's original sharply peaked landscape.

1257: The results show that the essential structure responsible

1258: for characteristic effects such as the error threshold is already

1259: contained in this simplified framework and may

1260: also serve as a reference for future work on fitness functions

1261: with increased ruggedness, such as the NK-landscape hierarchy \cite{KL}.

1262: Here, we expect the results for the quadratic fitness model to be

1263: qualitatively stable at least under certain forms of mild ruggedness,

1264: such as the introduction of site-randomness in the fields and

1265: interactions \cite{DK}. Pronounced changes, on the other hand, should

1266: be expected when spin-glass effects come into play.

1267:

1268: \paragraph{Finite population size.}

1269: In going from the deterministic limit to the evolution of finite

1270: populations, the ordinary differential equation (\ref{paramuse}) has

1271: to be replaced by the master equation of a stochastic process which is

1272: no longer covered by the theoretical framework presented in this

1273: article. Due to the complexity of the stochastic equations, analytical results

1274: seem to be out of reach at present for all but the simplest selection

1275: schemes. Monte-Carlo simulations, however, should be possible and

1276: could considerably add to theoretical insight here.

1277:

1278: Although the general picture of the deterministic case should persist

1279: at least for sufficiently large populations, the study of finite

1280: population effects is certainly of importance.

1281: For related models, such as the quasispecies model with the

1282: {\em single peaked} landscape, it has been found \cite{NS}

1283: that the deterministic

1284: results can be interpreted as the time averages of the stochastic

1285: process for mutation rates outside a certain interval around an error

1286: transition. Directly at the threshold, however, large fluctuations and

1287: a jump in the long-time averages appear in the stochastic system at a critical

1288: mutation rate which seems to be lower by an amount roughly

1289: proportional to $1/\sqrt{N}$ (with $N$ the population size in \cite{NS}) in comparison with the deterministic case.

1290: Mainly because of these expected finite population effects we have

1291: restricted discussions in this article entirely to the phase space

1292: structure of the models and the order of the phase transitions. Any further

1293: details of the transitions, even critical exponents, will presumably

1294: never be visible in real biological systems and thus seem to be

1295: of limited relevance in this context.

1296:

1297: Let us finally remark that, although biological populations are

1298: certainly finite, the consideration of the infinite population limit

1299: is not (only) a technical necessity, but also of direct importance for the

1300: study of the error threshold. That is so because this effect, in distinction

1301: to the phenomenon of Muller's ratchet, is {\em by definition} not due to

1302: genetic drift, but solely due to the form of the fitness function. It

1303: therefore always has to be shown that the threshold effect persists even

1304: for infinitely large population sizes.

1305:

1306:

1307: \paragraph{Error threshold behaviour.}

1308:

1309: Since there are several, sometimes conflicting, definitions of

1310: the error threshold in the literature (cf.\ the discussion in \cite{BG}),

1311: let us start this paragraph with a few clarifying remarks. In this

1312: article,

1313: following \cite{BG}, we use the notion of the error threshold as

1314: equivalent to phase transitions. As such, a clear-cut mathematical

1315: definition (as non-analytical points in the mean fitness) is possible

1316: only in the infinite sites (or thermodynamic) limit. However, since

1317: the thermodynamic limit can be considered as an excellent

1318: approximation already for rather small systems, the infinite system

1319: property gives a valid explanation for prominent features which are

1320: observable for finite sequences as well. In our study, we have always

1321: considered sequences of a fixed length and have treated the mutation

1322: rate per site as the variable driving the transition. In comparing

1323: systems of different length, we have scaled the variables such that a

1324: well-defined limit is approached as $N \to \infty$. In particular, the

1325: `critical' mutation rate per site in a finite system quickly converges

1326: to the limiting value $\tilde{\mu}_c$.

1327: Originally, the threshold has been viewed as a limiting factor on

1328: the sequence length \cite{E}. This, however, should not be confusing:

1329: We switch to this latter picture simply by letting the reduced

1330: mutation rate depend linearly on the sequence length,

1331: $\tilde{\mu} \sim N$, and obtain a critical length

1332: $N_c \sim \tilde{\mu}_c$ (for sufficiently large sequences).
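
The change of picture can be spelled out in one line. As a sketch, write the assumed linear dependence as $\tilde{\mu} = c\,N$, where the constant $c > 0$ is introduced here only for illustration. The condition for the ordered phase then reads
\begin{equation*}
\tilde{\mu} < \tilde{\mu}_c \quad \Longleftrightarrow \quad N < N_c := \frac{\tilde{\mu}_c}{c} ,
\end{equation*}
so that, at fixed $c$, the critical length is proportional to $\tilde{\mu}_c$.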

1333:

1334: Our results on the error threshold phenomenon fit previous ones for

1335: the two-state case and related models in that negative epistasis is

1336: needed to observe a transition (cf.\ \cite{W,BG}).

1337: Contrary to the two-state case, the threshold corresponds to a

1338: first-order transition for certain parameter ranges and persists for

1339: a sufficiently small linear part in the fitness function. Both the

1340: equilibrium and the dynamical phase diagram of the

1341: transition-transversion model (with $\alpha_i = 0$),

1342: possess two ordered phases characterized by non-zero values of one or

1343: all three components of the surplus order-parameter and the disordered

1344: phase with zero surplus where selection ceases to operate. The

1345: threshold effect appears to be especially sharp in the evolution

1346: dynamics, where a jump in the mean surplus and fitness and a delta

1347: singularity in the variance of fitness occur.

1348:

1349: Besides the threshold effect, however, other properties of

1350: mutation-selection models may be studied within the framework

1351: presented. After all, exclusive concentration on phase

1352: transitions is perhaps too much a physicist's point of view on these

1353: systems. The relations between surplus, mutation rate and the variance of

1354: fitness (\ref{zeit}), (\ref{variance}), for example, are valid for the entire

1355: time evolution and arbitrary mutation rates. Depending on the fitness

1356: function applied, they may give rise to characteristic features also

1357: far off the transition point. This is particularly explicit for the

1358: equilibrium variance of fitness which runs through a pronounced

1359: maximum for fitness functions with negative epistasis at a mutation

1360: rate much smaller than the threshold value.

1361:

1362: \section*{Acknowledgments}

1363:

1364: It is our pleasure to thank Ellen Baake and Oliver Redner for numerous

1365: discussions and comments on the manuscript. Financial support from the

1366: German Science Foundation (DFG) is gratefully acknowledged.

1367:


1380:

1381:

1382:

1383:

1384:

1385: \begin{thebibliography}{99}

1386: \bibitem{B}

1387: E.\ Baake,

1388:    Diploid models on sequence space,

1389:    {\it J.\ Biol.\ Syst.\/} {\bf 3} (1995) 343--9.

1390: \bibitem{BBW}

1391: E.\ Baake, M.\ Baake and H.\ Wagner,

1392:    Ising quantum chain is equivalent to a model of biological evolution,

1393:    {\it Phys.\ Rev.\ Lett.\/} {\bf 78} (1997) 559--62; Erratum:

1394:    {\it Phys.\ Rev.\ Lett.\/} {\bf 79} (1997) 1782.

1395: \bibitem{BBW2}

1396: E.\ Baake, M.\ Baake and H.\ Wagner,

1397:    Quantum mechanics versus classical propability in biological evolution,

1398:    {\it Phys.\ Rev.\/} {\bf E57} (1998) 1191--2.

1399: \bibitem{BG}

1400: E.\ Baake and W.\ Gabriel,

1401:    Biological evolution through mutation, selection, and drift: An introductory

1402:    review,

1403:    {\it Ann.\ Rev.\ Comput.\ Phys.\/} {\bf 7}

1404:    ({\em in press}, cond-mat/9907372).

1405: \bibitem{CK}

1406: J.\ Crow and M.\ Kimura,

1407:    {\em An Introduction to Population Genetics Theory}, Harper \& Row

1408:    (New York 1970).

1409: \bibitem{DK}

1410: N.G.~Duffield and R.~K\"uhn,

1411:    The thermodynamics of site-random mean-field quantum spin systems,

1412:    {\it J.\ Phys.\/} {\bf A22} (1989) 4643--58.

1413: \bibitem{E}

1414: M.\ Eigen,

1415:    Selforganization of matter and the evolution of biological

1416:    macromolecules,

1417:    {\it Naturwiss.\/} {\bf 58} (1971) 465--523.

1418: \bibitem{ECS}

1419: M.\ Eigen, J.\ McCaskill and P.\ Schuster,

1420:    The molecular quasi-species,

1421:    {\it Adv.\ Chem.\ Phys.\/} {\bf 75} (1989) 149--263.

1422: \bibitem{Fish}

1423: R.A.~Fisher,

1424:    {\em The Genetical Theory of Natural Selection}, Clarendon Press

1425:    (Oxford 1930).

1426: \bibitem{FP}

1427: S.~Franz and L.~Peliti,

1428:    Error threshold in simple landscapes,

1429:    {\it J.~Phys.\/} {\bf A26} (1993) 4481--7.

1430: \bibitem{FPS}

1431: S.~Franz, L.~Peliti, and M.~Sellitto,

1432:    An evolutionary version of the random energy model,

1433:    {\it J.\ Phys.\/} {\bf A26} (1993) L1195--9.

1434: \bibitem{Gal}

1435: S.~Galluccio,

1436:    Exact solution of the quasispecies model in a sharply-peaked

1437:    landscape,

1438:    {\it Phys.\ Rev.\/} {\bf E56} (1997) 4526--39.

1439: \bibitem{KL}

1440: S.A.~Kauffman and S.A.~Levin,

1441:    Towards a general theory of adaptive walks on rugged landscapes,

1442:    {\it J.\ Theor.\ Biol.\/} {\bf 128} (1987) 11--45.

1443: \bibitem{Kogut}

1444: J.~Kogut,

1445:    An introduction to lattice gauge theory and spin systems,

1446:    {\it Rev.\ Mod.\ Phys.\/} {\bf 51} (1979) 656--713.

1447: \bibitem{Leut}

1448: I.\ Leuth\"ausser,

1449:    An exact correspondence between Eigen's evolution model and a

1450:    two-dimensional Ising system,

1451:    {\it J.\ Chem.\ Phys.\/} {\bf 84} (1986) 1884--5.

1452: \bibitem{Leut2}

1453: I.~Leuth\"ausser,

1454:    Statistical mechanics of Eigen's evolution model,

1455:    {\it J.~Stat.~Phys.\/} {\bf 48} (1987) 343--60.

1456: \bibitem{Li}

1457: W.-H.\ Li,

1458:    {\it Molecular Evolution}, Sinauer (Sunderland, 1997).

1459: \bibitem{MT}

1460: K.~Malarz and D.~Tiggemann,

1461:    Dynamics in Eigen's evolution model,

1462:    {\it Int.\ J.\ Mod.\ Phys.\/} {\bf C9} (1997) 481--90.

1463: \bibitem{NS}

1464: M.~Nowak and P.~Schuster,

1465:    Error thresholds of replication in finite populations. Mutation

1466:    frequencies and the onset of Muller's ratchet,

1467:    {\it J.\ Theor.\ Biol.\/} {\bf 137} (1989) 375--95.

1468: \bibitem{OB}

1469: P.~O'Brien,

1470:    A genetic model with mutation and selection,

1471:    {\em Math.~Biosci.\/} {\bf 73} (1985) 239--51.

1472: \bibitem{Oli}

1473: O.\ Redner,

1474:    {\em private communication} (1999).

1475: \bibitem{Sta}

1476: P.\ Stadler,

1477:    Landscapes and their correlation functions,

1478:    {\em J.\ Math.\ Chem.\/} {\bf 20} (1996) 1--45.

1479: \bibitem{SOWH}

1480: D.\ Swofford, G.\ Olsen, P.\ Waddell and D.\ Hillis,

1481:    Phylogenetic inference, in: M.\ Hillis, C.\ Moritz and E.\ Mable (Eds.):

1482:    {\em Molecular Systematics}, Sinauer (Sunderland, 1995), pp.\ 407--517.

1483: \bibitem{Tara}

1484: P.~Tarazona,

1485:    Error thresholds for molecular quasispecies as phase

1486:    transitions: From simple landscapes to spin-glass models,

1487:    {\it Phys.\ Rev.\/} {\bf A45} (1992) 6038--50.

1488: \bibitem{TM}

1489: C.J.\ Thompson and J.L.\ McBride,

1490:    On Eigen's theory of the self-organization of matter and the evolution of

1491:    biological macromolecules,

1492:    {\it Math.\ Biosci.\/} {\bf 21} (1974) 127--42.

1493: \bibitem{Wag}

1494: H.\ Wagner,

1495:    {\em Biologische Sequenzraummodelle und Statistische Mechanik},

1496:    PhD thesis, University of T\"ubingen, Dissertations Druck

1497:    (Darmstadt 1998).

1498: \bibitem{WBG}

1499: H.\ Wagner, E.\ Baake and T.\ Gerisch,

1500:    Ising quantum chain and sequence evolution,

1501:    {\it J.\ Stat.\ Phys.\/} {\bf 92} (1998) 1017--52.

1502: \bibitem{W}

1503: T.~Wiehe,

1504:    Model dependency of error thresholds: the role of the fitness

1505:    functions and contrasts between the finite and infinite sites

1506:    models,

1507:    {\it Genet.\ Res.\ Camb.\/} {\bf 69} (1997) 127--36.

1508: \end{thebibliography}

1509: \end{document}

1510:

1511:

1512:

1513: