1: \documentclass{article}
2: \usepackage{a4,amsmath,epsfig,amssymb}
3: %\newcommand{\bm}[1]{\mbox{\boldmath $#1$}}
4: \newcommand{\bm}[1]{\boldsymbol #1}
5: \newcommand{\zmpf}[1]{\mbox{\hspace{#1em}}}
6: \newcommand{\Id}{\mbox{$\,$\rm 1\zmpf{-0.62}{\small 1}}}
7: \newcommand{\RR}{\mathbb R}
8: \newcommand{\TT}{\mathbb T}
9: \newcommand{\CC}{\mathbb C}
10: \newcommand{\ZZ}{\mathbb Z}
11: \newcommand{\QQ}{\mathbb Q}
12:
13: \begin{document}
14:
15: \title{Four-state quantum chain\\ as a model of sequence evolution}
16: \author{{\sc Joachim Hermisson$^{1,2}$, Holger Wagner$^{3}$ and
17: Michael Baake$^{1}$}
18: \\[2mm]
19: ${}^{1}$Institut f\"ur Theoretische Physik, Universit\"at
20: T\"ubingen,\\ Auf der Morgenstelle 14, 72076 T\"ubingen, Germany\\
21: ${}^{2}$Institut f\"ur Theorie der Kondensierten Materie,\\
22: Universit\"at Karlsruhe, 76128 Karlsruhe, Germany\\
23: ${}^{3}$Max-Planck-Institut f\"ur Biophysikalische Chemie,\\
24: Am Fa{\ss}berg 11, 37077 G\"ottingen, Germany}
25: \maketitle
26: \begin{abstract}
27: A variety of selection-mutation models for DNA (or RNA) sequences,
28: well known in molecular evolution, can be translated into a model of coupled
Ising quantum chains. This correspondence is used to investigate the
genetic variability and the error threshold behaviour as a function of
the fitness landscape. In contrast to the two-state models treated
hitherto, the model explicitly takes the four-state nature of the
nucleotide alphabet into account and allows for the distinction of
34: mutation rates for the different base substitutions, as given by
35: standard mutation schemes of molecular phylogeny. As a consequence of
36: this refined treatment, new phase diagrams for the error threshold
37: behaviour are obtained, with appearance of a novel phase in which the
38: nucleotide ordering of the wildtype sequence is only partially conserved.
39: Explicit analytic and numeric results are presented for evolution
40: dynamics and equilibrium behaviour in a number of accessible
41: situations, such as quadratic fitness landscapes and the Kimura
42: 2 parameter mutation scheme.
43: \end{abstract}
44:
45: \section{Introduction}
46:
47: One prominent phenomenon in the theory of molecular evolution that has
48: also attracted considerable attention in statistical physics is the
49: so-called {\em error threshold}. It describes the breakdown of
50: genetic order in mutation-selection models for mutation rates
51: surpassing a certain critical value. The prototype model for the
52: description of the error threshold is Eigen's quasispecies model
in sequence space \cite{E,ECS} (which is effectively equivalent to a
coupled mutation-selection model in population genetics, cf.~\cite{CK}),
originally designed for the description of prebiotic RNA
evolution. However, the threshold is expected to occur in a rather
general class of mutation-selection models.
58:
59: In order to set up a mutation-selection model that is tractable by
60: analytical (or at least numerical) methods, severe simplifications
61: of the original biological situation seem to be indispensable.
Analytical approaches are generally restricted to the treatment
of infinitely large populations and rather simple fitness functions,
such as the sharply peaked landscape of Eigen's original model.
Another common approximation, also used in previous studies of the
quasispecies model, amounts to the simplified
representation of genotypes as binary strings. In the context of
molecular evolutionary theory, this may be thought of as representing
DNA or RNA strands by sequences of {\em purines} and {\em pyrimidines},
hence with only two states per site, neglecting the fact that genetic
information is really given by a four-letter alphabet. In this
article, we present a four-state mutation-selection model
which is capable of describing the full nucleotide alphabet and
incorporates the standard mutation schemes of molecular phylogeny.
In particular, we discuss in detail the phase diagrams, which
are more polymorphic than for the two-state model. This shows that,
for a full understanding of the error threshold behaviour in
molecular evolution, investigations cannot be restricted entirely
to the study of two-state models.
80:
81: One important step towards an understanding of the
82: threshold phenomenon has been its identification with an equilibrium phase
83: transition in physics by the translation of a time-discrete version of
84: the quasispecies model into the transfer matrix of an anisotropic
85: two-dimensional Ising model \cite{Leut}. This equivalence was further
86: exploited to study various aspects of the error threshold
87: with methods from statistical physics \cite{Leut2,Tara,FPS,FP,MT}.
88: It turns out, however, that the anisotropy of that model is not so
89: easy to handle and the analysis of the relevant biological quantities
90: (which correspond to certain surface properties of the Ising model)
remains an involved problem. Due to these complications,
almost all results obtained so far are approximate or numerical. The
only exact result, for the {\em sharply peaked landscape} \cite{Gal},
has been worked out via a different analogy to a model of directed
polymers, using the specific properties of that very special fitness
landscape.
97:
98: An alternative approach to the analysis of mutation-selection models
and the error threshold, which avoids some of the problems of the
anisotropic Ising model, has been proposed in \cite{BBW,WBG}.
101: Here, the starting point on the biological side is a slightly changed
102: model which describes the evolution of a population with overlapping
103: generations in continuous time. It turns out that, after a
104: reformulation in tensor products, the two-state version of this model
105: is equivalent to the Hamiltonian of an Ising quantum chain. Thereby,
106: the change to continuous time in the biological description
107: corresponds to the anisotropic limit that connects the
108: two-dimensional Ising model and the quantum chain in physics
109: (cf.~\cite{Kogut}). The quantum chain model is technically easier to
110: handle, and exact results for two non-trivial fitness landscapes,
111: namely Onsager's landscape and the quadratic fitness function, have
112: been worked out \cite{BBW,WBG}.
113:
Accordingly, we extend this latter approach to a full four-state model
in this study. The quantum chain analogy allows us to use well-known methods
116: from statistical mechanics for the solution of the model, so that we do not
117: have to dwell on technical details here. For an extended presentation
118: of methods (with regard to the two-state model) using techniques from
119: rigorous mean field theory, we refer to \cite{Wag,WBG}. The main focus
120: is instead on the discussion of the threshold behaviour and in
121: particular the increased complexity of the phase diagram due to the
122: consideration of the four-state nature of biological information and
123: the refined schemes of molecular mutation rates.
124:
125: In the following section, we start with a presentation of the biological
foundations of our model. The quantum chain model is then introduced
in Section 3. In Section 4, analytical and numerical
128: results are presented for a number of specific four-state models
129: with permutation invariant fitness landscapes. Also the properties of
130: finite sequences and the evolution dynamics will be studied.
131: We close with a summary of our results and a discussion of open
132: problems in Section 5.
133:
134: \section{Biological foundations}
135:
136: Genetic information is coded in DNA (and RNA) molecules. These are
137: heteropolymers of four units (nucleotides) which differ in a specific
138: base. The essential aspect of a DNA sequence is captured in
139: a string over a four-letter alphabet
140: \begin{equation}
141: {\bm \sigma} \in V \equiv V_1 \times V_2 \times \dots \times V_N \;;\quad
142: V_i = \{A,C,G,T\}
143: \end{equation}
144: where each letter represents a particular base: $A$ and $G$ for
adenine and guanine (the purines), $C$ and $T$ for cytosine and thymine
(the pyrimidines). In RNA sequences, $T$ is replaced by $U$ for uracil.
We will treat the $4^N$ different sequences of a fixed,
148: finite length $N$ as our genotypes (which may be thought of as coding
149: for something, such as a virus or an enzyme). Disregarding
150: environmental effects, we may identify a collection of genotypes with
151: a {\em population} of haploid `individuals'. Evolution then describes
152: the change of the population composition in time.
153:
154: A standard model for the evolution of an infinite, asexually
155: reproducing population under the basic forces of mutation and
156: selection which works in continuous time is given by the following
157: system of non-linear differential equations \cite{CK}
158: \begin{equation} \label{paramuse}
159: \dot{p}_{\bm{\sigma}}^{}(t) =
160: \big( r_{\bm{\sigma}}^{} - \bar{r}(t)\big) p_{\bm{\sigma}}^{}(t)
161: + \sum_{\bm{\sigma'}} m_{\bm{\sigma}\bm{\sigma'}} p_{\bm{\sigma'}}(t)\;.
162: \end{equation}
163: Here, $p_{\bm{\sigma}}^{}(t)$ denotes the relative frequency of genotype
164: ${\bm \sigma}$ at time $t$ with corresponding Malthusian fitness
165: (replication rate minus death rate) $r_{\bm \sigma}^{}$, and
166: \begin{equation}
167: \bar{r}(t) = \sum_{\bm{\sigma}} r_{\bm{\sigma}} p_{\bm{\sigma}}(t)
168: \end{equation}
169: is the {\em mean fitness} of the population. It is the origin of the
170: non-linearity in (\ref{paramuse}). Finally,
171: $m_{{\bm \sigma}{\bm \sigma'}}$ is the (time independent) rate at which
172: ${\bm \sigma'}$ mutates to ${\bm \sigma}$. This framework has
173: originally been defined in classical population genetics \cite{CK}. In
174: the sequence space context, it has been introduced in \cite{B} and has been
175: called the {\it para-muse} ({\em pa}rallel {\em mu}tation-{\em se}lection)
176: model, since it assumes mutation and selection to act independently
177: and in parallel at each instant of time.
178: The model ignores recombination and genetic drift due to finite
population size. Both simplifications can be considered fairly reasonable
180: at least in the context of the evolution of viruses or bacteria where
181: populations can be huge and recombination is absent, or the
182: nucleotides are tightly linked. In the following subsections, the
183: basic processes of mutation and selection shall be described in some detail.
184:
185: \subsection{Mutation}
186:
187: We take mutation as a point process acting independently on
188: all sites, ignoring more complicated mechanisms, such as
insertions or deletions. Molecular mutation rates are chosen
according to the scheme shown in Fig.~\ref{mutfig}, known as the {\em Kimura 3 ST
model} in molecular phylogeny \cite{Li,SOWH}.
192: \begin{figure}[ht]
193: \centerline{\epsfysize=27mm \epsfbox{mutation.eps}}
194: \caption{Molecular mutation scheme according to the Kimura 3 ST model.}
195: \label{mutfig}
196: \end{figure}
197:
Within this general setup, a number of simpler models are contained,
which treat mutation at different levels of sophistication. In the
simplest approach, the mutation rates between all four nucleotides
are assumed to be equal $(\mu_1 = \mu_2 = \mu_3)$. This is the
so-called {\em Jukes-Cantor mutation scheme}. While this simple
frame already seems to be sufficient for a number of applications,
measurements reveal pronounced differences in
the mutation rates that should be accounted for in more realistic
models. In particular, the {\em transitions} between the two purines
(A,G) and between the two pyrimidines (C,T) are much more frequent than the
purine--pyrimidine mutations, which are called {\em transversions}. The
relative differences may range up to
$\mu_1 \approx \mu_3 \simeq \mu_2/2$ in nuclear DNA
and $\mu_1 \approx \mu_3 \simeq \mu_2/40$ in mitochondrial
DNA \cite{Li}. A mutation scheme with $\mu_2 > \mu_1 = \mu_3$ is known as the
213: {\em Kimura 2 parameter model}. The full {\em Kimura 3 ST} scheme,
214: finally, also accounts for the small difference between $\mu_1$ and
215: $\mu_3$, such that $\mu_2 > \mu_1 > \mu_3$.
216:
217: Implementing this mutation model into the evolution equation
218: (\ref{paramuse}), we obtain the following mutation rates between
219: genotypes ($i \in \{1,2,3\}$)
220: \begin{equation} \label{mss}
221: m_{{\bm \sigma}{\bm \sigma'}} = \left\{
222: \begin{array}{rl}
223: \mu_i, \quad & d_i({\bm \sigma},{\bm \sigma'})
224: = d_{{\bm \sigma}{\bm \sigma'}} = 1
225: \\
226: -N \sum_i \mu_i,\quad & {\bm \sigma} = {\bm \sigma'}
227: \\
228: 0,\quad & d_{{\bm \sigma}{\bm \sigma'}} > 1
229: \end{array} \right. \;.
230: \end{equation}
231: Here,
232: \begin{eqnarray} \nonumber
233: d_1({\bm \sigma},{\bm \sigma'}) & = &
234: \#_{A \rightleftarrows C}({\bm \sigma},{\bm \sigma'})
235: + \#_{G \rightleftarrows T}({\bm \sigma},{\bm \sigma'})
236: \\ \label{Hamming}
237: d_2({\bm \sigma},{\bm \sigma'}) & = &
238: \#_{A \rightleftarrows G} ({\bm \sigma},{\bm \sigma'})
239: + \#_{C \rightleftarrows T}({\bm \sigma},{\bm \sigma'})
240: \\ \nonumber
241: d_3({\bm \sigma},{\bm \sigma'}) & = &
242: \# _{A \rightleftarrows T}({\bm \sigma},{\bm \sigma'})
243: + \#_{C \rightleftarrows G}({\bm \sigma},{\bm \sigma'})
244: \end{eqnarray}
245: are restricted Hamming distances between ${\bm \sigma}$ and ${\bm \sigma'}$.
In (\ref{Hamming}), $\#_{X \rightleftarrows Y}({\bm \sigma},{\bm \sigma'})$
counts the sites at which one of the two sequences ${\bm \sigma}$, ${\bm \sigma'}$
carries $X$ and the other one carries $Y$. Finally,
249: \begin{equation}
250: d_{{\bm \sigma}{\bm \sigma'}} = d_1({\bm \sigma},{\bm \sigma'})
251: + d_2({\bm \sigma},{\bm \sigma'}) + d_3({\bm \sigma},{\bm \sigma'})
252: \end{equation}
253: is the total Hamming
254: distance. Note that the choice of the diagonal term
255: $m_{{\bm \sigma}{\bm \sigma}}$ in (\ref{mss}) just accounts for
256: probability conservation ($\sum_{\bm{\sigma}}
257: \dot{p}_{\bm{\sigma}} = 0$) in the mutation part of the
258: evolution equation (\ref{paramuse}).
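As a minimal sketch (in Python, not part of the original analysis), the mutation matrix (\ref{mss}) can be built by brute force for very short sequences by enumerating all $4^N$ genotypes and computing the restricted Hamming distances (\ref{Hamming}). The function names and the rates $(\mu_1,\mu_2,\mu_3) = (0.1, 0.4, 0.05)$ are hypothetical choices for illustration; the final check confirms that every column of $(m_{\bm\sigma\bm\sigma'})$ sums to zero, which is the probability conservation enforced by the diagonal term.

```python
from itertools import product

BASES = "ACGT"
# Mutation class of an unordered pair of distinct bases, following the
# restricted Hamming distances d_1, d_2, d_3 (stored 0-based).
CLASS = {frozenset("AC"): 0, frozenset("GT"): 0,   # rate mu_1
         frozenset("AG"): 1, frozenset("CT"): 1,   # rate mu_2 (transitions)
         frozenset("AT"): 2, frozenset("CG"): 2}   # rate mu_3


def restricted_distances(s, t):
    """Return (d_1, d_2, d_3) between two sequences of equal length."""
    d = [0, 0, 0]
    for x, y in zip(s, t):
        if x != y:
            d[CLASS[frozenset((x, y))]] += 1
    return tuple(d)


def mutation_matrix(N, mu):
    """Dense matrix of eq. (mss); M[a][b] is the rate seqs[b] -> seqs[a]."""
    seqs = ["".join(p) for p in product(BASES, repeat=N)]
    M = [[0.0] * len(seqs) for _ in seqs]
    for a, s in enumerate(seqs):
        for b, t in enumerate(seqs):
            d = restricted_distances(s, t)
            if s == t:
                M[a][b] = -N * sum(mu)        # diagonal term
            elif sum(d) == 1:
                M[a][b] = mu[d.index(1)]      # single point mutation
            # total Hamming distance > 1: rate stays 0
    return seqs, M


# hypothetical example rates (mu_1, mu_2, mu_3) for N = 2
seqs, M = mutation_matrix(2, (0.1, 0.4, 0.05))
col_sums = [sum(M[a][b] for a in range(len(seqs))) for b in range(len(seqs))]
print(max(abs(c) for c in col_sums))  # zero up to rounding: probability conservation
```

Dense enumeration scales as $16^N$ and is only meant as a cross-check for tiny $N$.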
259:
260: \subsection{Selection and fitness landscape}
261:
262: Whereas the mutational part of the dynamics is fairly well understood
263: at least on the microscopic (molecular) level, the relation of
264: genotype and fitness, which defines the respective selective success,
265: is notoriously complex.
266: Following the standard notion in molecular evolution, we define the
267: {\em fitness function} (or {\em fitness landscape})
268: \begin{equation}
269: f: \bm{\sigma} \mapsto r_{\bm{\sigma}}
270: \end{equation}
271: as a mapping from the configuration space $V= \{A,C,G,T\}^N$ into the
272: real numbers, assigning a reproduction rate (Malthusian fitness value)
273: $r_{\bm{\sigma}}$ to each
274: genotype. Implicitly, the fitness function incorporates all the
275: complicated interactions between the sites. These interactions
276: are typically long-ranged (since RNA strands or proteins fold in three
277: dimensions), highly correlated, and give rise to rather rugged landscapes.
278: Especially in the context of RNA evolution, the construction and
279: characterization of fitness landscapes has motivated numerous studies,
280: see e.g.\ \cite{Sta} for a review.
281:
282: Below we will show how the evolution equation (\ref{paramuse}), with
283: an arbitrary choice of the fitness function, can be adapted to the
284: methods from statistical physics by a reformulation in a quantum
285: chain framework. As an application, we then present a study (including
286: analytical and numerical results) for specific examples from the class
of permutation invariant fitness functions. Here, due to the equivalence of
all sites, the fitness of a given genotype is solely a function of
its restricted Hamming distances from the so-called {\em wildtype} sequence
with optimal fitness, which we choose as the reference genotype.
291: This particularly simple class of fitness
292: landscapes is widely used, as a canonical first approximation,
293: especially in {\em multilocus theory}. Also in the context of sequence
294: space evolution, fitness functions of this type
295: have been used in a number of studies on the two-state model
296: \cite{OB,Leut2,Tara,BBW,WBG}. To implement the approach in our
297: four-state model, we fix an arbitrary sequence, denoted by
298: $\bm{\sigma}_{++}$, as
the wildtype. We only consider directional selection towards a
unique genotype with optimal fitness. The fitness of any other
sequence is then determined by the restricted Hamming distances
$d_i$ relative to $\bm{\sigma}_{++}$.
Permutation invariance with respect to the position in the sequence
thus drastically reduces the dimension of the problem. For the four-state
305: model, the effective configuration
306: space forms a tetrahedron in 3d (see Fig.~\ref{select}) and is
307: conveniently represented in Cartesian coordinates which we
308: shall call (following \cite{BBW}) the {\em surplus components}:
309: \begin{eqnarray}\nonumber
310: s_1(\bm{\sigma}) &=& 1 - \frac{2}{N}
311: \Big(d_1(\bm{\sigma},\bm{\sigma}_{++})+d_3(\bm{\sigma},\bm{\sigma}_{++})\Big)\;;
312: \\ \label{surplus}
313: s_2(\bm{\sigma}) &=& 1 - \frac{2}{N}
314: \Big(d_2(\bm{\sigma},\bm{\sigma}_{++})+d_3(\bm{\sigma},\bm{\sigma}_{++})\Big)\;;
315: \\ \nonumber
316: s_3(\bm{\sigma}) &=& 1 - \frac{2}{N}
317: \Big(d_1(\bm{\sigma},\bm{\sigma}_{++})+d_2(\bm{\sigma},\bm{\sigma}_{++})\Big)\;.
318: \end{eqnarray}
319: \begin{figure}[t]
320: \centerline{\epsfysize=50mm \epsfbox{select2.eps}}
321: \caption{Permutation invariant configuration space of the four-state
322: model in surplus coordinates.}
323: \label{select}
324: \end{figure}
With this choice, the surplus components $s_i$ of any unstructured random
sequence vanish (with probability 1 in the limit $N\to \infty$).
Any positive value of a surplus component, on the other hand, signals a
non-trivial overlap of the sequence with the wildtype $\bm{\sigma}_{++}$.
In particular, $s_1$ measures the surplus of sites carrying a purine or a
pyrimidine as given in $\bm{\sigma}_{++}$ over the sites with a
purine--pyrimidine mutation (transversion).
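A minimal Python sketch of the surplus components (\ref{surplus}), assuming sequences are given as strings over $\{A,C,G,T\}$; the function names and the random test sequences are illustrative only. It shows the effect of a single mutation of each class and that a random sequence of large length has all $s_i$ close to zero.

```python
import random

# Mutation class (0, 1, 2 <-> d_1, d_2, d_3) of an unordered base pair.
CLASS = {frozenset("AC"): 0, frozenset("GT"): 0,
         frozenset("AG"): 1, frozenset("CT"): 1,
         frozenset("AT"): 2, frozenset("CG"): 2}


def restricted_distances(seq, wildtype):
    """Return [d_1, d_2, d_3] of seq relative to the wildtype."""
    d = [0, 0, 0]
    for x, y in zip(seq, wildtype):
        if x != y:
            d[CLASS[frozenset((x, y))]] += 1
    return d


def surplus(seq, wildtype):
    """Surplus components (s_1, s_2, s_3) of eq. (surplus)."""
    N = len(wildtype)
    d1, d2, d3 = restricted_distances(seq, wildtype)
    return (1 - 2 * (d1 + d3) / N,
            1 - 2 * (d2 + d3) / N,
            1 - 2 * (d1 + d2) / N)


print(surplus("ACGT", "ACGT"))   # wildtype itself: (1.0, 1.0, 1.0)
print(surplus("CCGT", "ACGT"))   # one A -> C mutation (class 1): (0.5, 1.0, 0.5)

rng = random.Random(1)
N = 100_000
wt = "".join(rng.choice("ACGT") for _ in range(N))
rand_seq = "".join(rng.choice("ACGT") for _ in range(N))
print(surplus(rand_seq, wt))     # each component close to 0 for large N
```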
331:
332: Within this frame, a natural class of permutation invariant fitness
333: functions is
334: \begin{equation} \label{fit}
335: f: \bm{\sigma} \mapsto
336: r_{\bm{\sigma}} = N \sum_{i=1}^3 \left[\alpha_i^{} s_i(\bm{\sigma}) +
337: \frac{\gamma_i^{}}{2} s_i^2(\bm{\sigma}) \right]
338: \end{equation}
339: which includes the following special cases
340: \begin{itemize}
341: \item
342: Setting $\alpha_i > 0$ and $\gamma_{i} = 0$, we obtain the purely additive
343: {\em Fujiyama landscape} without genetic interactions. Here, every
344: mutation relative to the wildtype has a fixed deleterious effect,
345: independent of any other mutation that may be present in the sequence.
346: The additive landscape is a canonical zeroth-order approximation, ignoring
347: any kind of genetic interactions. In the context of sequence
348: evolution, this fitness function has been discussed e.g.~in \cite{OB,BBW}.
349: \item
350: With the choice $\alpha_i \ge - \gamma_{i} > 0$, the model
351: corresponds to a concave quadratic fitness function
352: (with directional selection) as it is frequently met
in multilocus theory. Due to the gene interactions, existing mutations
tend to aggravate the effect of further ones, which is called {\em positive epistasis}.
355: \item
356: For $\alpha_i \ge 0$ and $\gamma_i > 0$, we finally obtain a convex fitness
357: function for directional selection with long-range gene interactions and
{\em negative epistasis} (existing mutations tend to alleviate the effect of
further ones). Since we want to have $\bm{\sigma}_{++}$ as unique wildtype
sequence and a fitness function which is monotonic in the surplus
components, we restrict $f$ to the octant $s_i \ge 0$ and (continuously)
truncate the fitness function by introducing a step function
$\Theta(s_i)$ whenever frequencies of genotypes with $s_i < 0$ are
non-zero:
365: \begin{equation} \label{fit2}
366: \tilde{f}: \bm{\sigma} \mapsto
367: r_{\bm{\sigma}} = N \sum_{i=1}^3
368: \left[\left(\alpha_i^{} s_i(\bm{\sigma}) +
369: \frac{\gamma_i^{}}{2} s_i^2(\bm{\sigma}) \right)\Theta(s_i) \right]\;.
370: \end{equation}
371: \end{itemize}
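The fitness functions (\ref{fit}) and (\ref{fit2}) can be sketched in a few lines of Python; the function name and the parameter values below are hypothetical. The usage lines take a concave landscape in the first surplus component only ($\alpha_1 = -\gamma_1 = 1$, all other parameters zero) and show that successive decrements of $s_1$ by $2/N$ cost increasingly more fitness, which is the positive epistasis described above.

```python
def fitness(s, alpha, gamma, N, truncate=False):
    """Malthusian fitness of eq. (fit); with truncate=True, of eq. (fit2).

    s, alpha, gamma are triples (s_1, s_2, s_3), (alpha_1, ...), (gamma_1, ...).
    """
    r = 0.0
    for s_i, a_i, g_i in zip(s, alpha, gamma):
        term = a_i * s_i + 0.5 * g_i * s_i ** 2
        if truncate and s_i < 0:
            term = 0.0          # step function Theta(s_i) of eq. (fit2)
        r += term
    return N * r


N = 10
# Concave case alpha_1 = -gamma_1 = 1 (other components switched off):
# fitness after successive decrements of s_1 by 2/N.
f = [fitness((s1, 1.0, 1.0), (1, 0, 0), (-1, 0, 0), N) for s1 in (1.0, 0.8, 0.6)]
print(f[0] - f[1], f[1] - f[2])  # the second loss is larger: positive epistasis
```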
372: The variables $\alpha_i$ and $\gamma_i$ may further be used to
373: distinguish between the effects of the different types of mutations
(as defined in Fig.~\ref{mutfig}) on the fitness. In this article,
we will present explicit results for the following two cases:
376: \begin{enumerate}
377: \item
378: For the simplest choice, $\alpha_1=\alpha_2=\alpha_3$ and
379: $\gamma_1=\gamma_2=\gamma_3$, any mutation away from the wildtype has
380: the same effect. Together with the Jukes-Cantor mutation scheme,
381: symmetry here leads to equal values of the surplus components in the
382: mutation--selection equilibrium. The model may thus also be thought
383: of as a two-state model, where any site is only regarded as occupied
384: either with a {\em wildtype} or with a {\em mutant} nucleotide.
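In contrast to the simple two-state model of \cite{BBW}, however,
the effective mutation rates between wildtype and mutant are asymmetric
in the case considered here: a wildtype nucleotide mutates at total rate
$3\mu$ (with $\mu = \mu_1 = \mu_2 = \mu_3$), whereas a given mutant
nucleotide reverts to the wildtype one at rate $\mu$ only.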
388: \item
389: In a more refined model, we distinguish between transitions and
390: transversions. In the mutational part, this is done by applying the
391: Kimura 2 parameter mutation scheme. In the fitness function, we take
392: into account that the deleterious effects of the transversions often
393: dominate over those of the transitions: $\alpha_1 > \alpha_{2,3}$
394: and/or $\gamma_1 > \gamma_{2,3}$.
395: \end{enumerate}
396:
397:
398: \section{Quantum chain model}
399:
400: \subsection{Symmetries}
401:
402: Since mutation is a random process that is independent of
403: the fitness values of the genotypes involved, the molecular mutation
scheme makes no reference to fitness concepts like the
405: {\em wildtype}. Biological observables measurable from sequence data,
406: such as the surplus components (\ref{surplus}), and also the fitness
407: functions as defined in (\ref{fit}) or (\ref{fit2}), on the other
408: hand, are defined relative to the wildtype sequence. In order to set
409: up these concepts in a common framework, it is convenient to
410: reformulate also the mutational part of the evolution equation in
411: coordinates relative to the wildtype. This may always be done
412: due to certain symmetries inherent in the mutation scheme of
413: Fig.~\ref{mutfig}.
414:
415: The basic symmetry of the mutation scheme, if all three mutation rates
416: $\mu_1, \mu_2, \mu_3$ are pairwise different, is $C_2 \times C_2$
417: (Klein's 4-group), generated by two involutions. If we write the
418: operations in standard permutation notation, we can take as generators
419: the transformations
420: \begin{equation}
421: \begin{pmatrix}
422: A&C&G&T \\ C&A&T&G
423: \end{pmatrix} \quad \text{and} \quad
424: \begin{pmatrix}
425: A&C&G&T \\ G&T&A&C
426: \end{pmatrix}\;,
427: \end{equation}
428: both being the product of two transpositions. This symmetry may
429: now be exploited for a redefinition of the mutation scheme in
430: wildtype coordinates. To this end, we fix, for every site of the
431: wildtype sequence, the element of the 4-group (in the above
432: representation) with the letter of the wildtype nucleotide in the
first position (e.g.\ the string $(T,G,C,A)$ for wildtype nucleotide
434: $T$). An alternative representation of the configuration space in wildtype
435: coordinates as
436: \begin{equation}
437: {\bm \sigma} \in V^\pm \equiv V_1^\pm \times V_2^\pm
438: \times \dots \times V_N^\pm \;;\quad
439: V_i^\pm = \{++,-+,+-,--\}
440: \end{equation}
is now given by mapping, on each site, the string of
labels $(++,-+,+-,--)$ onto the 4-group element fixed
above. With this notation, the three types of mutations included in the
444: Kimura 3 ST scheme simply switch the signs of the labels:
445: $\pm\pm \to \mp\pm$ at rate $\mu_1$, $\pm\pm \to \pm\mp$ at rate
446: $\mu_2$, and $\pm\pm \to \mp\mp$ at rate $\mu_3$.
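The sign-flip rule can be checked exhaustively with a short Python sketch: for each wildtype nucleotide we pick the 4-group element starting with that letter, label the four nucleotides by $(++,-+,+-,--)$, and verify that every mutation of class $\mu_1$, $\mu_2$, $\mu_3$ flips the first, the second, or both signs, respectively. The table and function names are illustrative, not part of the original text.

```python
KLEIN = ["ACGT", "CATG", "GTAC", "TGCA"]   # the 4-group in permutation notation
LABELS = ["++", "-+", "+-", "--"]
CLASS = {frozenset("AC"): 1, frozenset("GT"): 1,   # rate mu_1
         frozenset("AG"): 2, frozenset("CT"): 2,   # rate mu_2
         frozenset("AT"): 3, frozenset("CG"): 3}   # rate mu_3


def wildtype_label(x, w):
    """Label of nucleotide x at a site whose wildtype nucleotide is w."""
    row = next(g for g in KLEIN if g[0] == w)   # group element starting with w
    return LABELS[row.index(x)]


def flipped_signs(a, b):
    """Positions (0 = first, 1 = second) at which two labels differ."""
    return tuple(k for k in range(2) if a[k] != b[k])


EXPECTED = {1: (0,), 2: (1,), 3: (0, 1)}   # mu_1, mu_2, mu_3
for w in "ACGT":
    for x in "ACGT":
        for y in "ACGT":
            if x != y:
                flips = flipped_signs(wildtype_label(x, w), wildtype_label(y, w))
                assert flips == EXPECTED[CLASS[frozenset((x, y))]]
print("sign-flip rule holds for every wildtype nucleotide")
```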
447:
448: Higher symmetries of the mutation model are obtained if mutation rates are
449: equal. For the Kimura 2 parameter scheme, $\mu_1 = \mu_3 \neq \mu_2$,
450: the operation
451: \begin{equation}
452: A \to C \to G \to T \to A \; = \;
453: \begin{pmatrix}
454: A&C&G&T \\ C&G&T&A
455: \end{pmatrix}
456: \end{equation}
457: is also a symmetry and generates a cyclic group $C_4$. Together with
458: the previous $C_2 \times C_2$, this generates a dihedral group, $D_4$,
459: with 8 elements. Finally, if $\mu_1 = \mu_2 = \mu_3$, we additionally
460: get the simple transposition $A \leftrightarrow C$
461: and have the full permutation group $S_4$ as symmetry. Note that
462: $S_4$, which corresponds to the full tetrahedral group with 24
463: elements, is also the symmetry group of the configuration space of
464: permutation invariant configurations visualized in
465: Fig.~\ref{select}. The {\em global} symmetry (with the same
466: transformation acting at each site simultaneously) of our class of
467: mutation-selection models with fitness functions according to
468: (\ref{fit}) is therefore always a subgroup of $S_4$.
469: In particular, the symmetric fitness model with $\alpha_1 = \alpha_2 =
470: \alpha_3$, $\gamma_1 = \gamma_2 = \gamma_3$, and Jukes-Cantor mutation
471: scheme possesses $C_{3v}$ symmetry, or the full tetrahedral symmetry if the
472: linear part in the fitness function vanishes ($\alpha_i = 0$).
473: The transition-transversion model finally, with $\alpha_1 >
474: \alpha_2 = \alpha_3$, or $\gamma_1 > \gamma_2 = \gamma_3$, and Kimura 2
475: parameter mutation has simple $C_2$ symmetry, or $D_4$ symmetry if
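$\alpha_i \equiv 0$. In the latter case, it is indeed the combination of
$\gamma_2=\gamma_3$ with $\mu_1=\mu_3$ that is required: in wildtype
coordinates, the $C_4$ generator acts on the labels as
$++ \to -+ \to +- \to -- \to ++$ and hence maps
$(s_1,s_2,s_3) \mapsto (-s_1, s_3, -s_2)$, while it interchanges the
mutation classes with rates $\mu_1$ and $\mu_3$. Other combinations with
global $D_4$ symmetry are $(\gamma_1 = \gamma_3; \mu_2=\mu_3)$ and
$(\gamma_1=\gamma_2; \mu_1=\mu_2)$.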
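The group orders quoted above can be reproduced by brute force. The Python sketch below (function names are illustrative; rate values are arbitrary placeholders that are only compared for equality) tests each of the 24 permutations of $\{A,C,G,T\}$ for invariance of all mutation rates. It finds Klein's 4-group for pairwise different rates, a group of order 8 containing the cycle $A \to C \to G \to T \to A$ for $\mu_1 = \mu_3 \neq \mu_2$, and all of $S_4$ for the Jukes-Cantor scheme.

```python
from itertools import combinations, permutations

BASES = "ACGT"
CLASS = {frozenset("AC"): 0, frozenset("GT"): 0,   # mu_1
         frozenset("AG"): 1, frozenset("CT"): 1,   # mu_2
         frozenset("AT"): 2, frozenset("CG"): 2}   # mu_3


def rate(x, y, mu):
    """Mutation rate between distinct bases x and y."""
    return mu[CLASS[frozenset((x, y))]]


def symmetry_group(mu):
    """Permutations (bottom row of permutation notation) preserving all rates."""
    group = []
    for image in permutations(BASES):
        pi = dict(zip(BASES, image))
        if all(rate(pi[x], pi[y], mu) == rate(x, y, mu)
               for x, y in combinations(BASES, 2)):
            group.append("".join(image))
    return group


print(len(symmetry_group((1.0, 2.0, 3.0))))   # Kimura 3 ST: 4
print(len(symmetry_group((1.0, 2.0, 1.0))))   # Kimura 2 parameter: 8
print(len(symmetry_group((1.0, 1.0, 1.0))))   # Jukes-Cantor: 24
print("CGTA" in symmetry_group((1.0, 2.0, 1.0)))  # cycle A->C->G->T->A: True
```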
480:
481: \subsection{Construction}
482:
483: With the above preparations, we may now follow the lines of
484: \cite{BBW,WBG} where the two-state model is treated.
485:
486: In a first step, we represent the $4^N$-dimensional vector space in
487: which we describe the
488: genotype frequencies as the $N$-fold tensor product space
$W = \otimes_{j=1}^N W_j$. Here, the configuration space $V^\pm$ is
490: canonically embedded in $W$ by the mapping of the elements of
491: $V_i^\pm$ onto the basis vectors
492: $\{e_{j}^{++}, e_{j}^{-+}, e_{j}^{+-}, e_{j}^{--}\}$ of $W_j \simeq \RR^4$.
493: Since the nonlinear part in the differential
494: equations (\ref{paramuse}) only amounts to normalization of the
495: frequencies, a transformation to so-called
496: {\em absolute frequencies} \cite{TM,BBW}
497: \begin{equation}
498: z_{\bm \sigma}^{}(t) = p_{\bm \sigma}^{}(t) \exp\Big( \sum_{\bm \sigma'}
499: r_{\bm \sigma'}^{} \int_0^t p_{\bm \sigma'}^{}(\tau) \,d\tau \Big)
500: \end{equation}
501: then reduces the system to the linear equation
\begin{equation} \label{LGS}
\dot{z}_{\bm \sigma}^{}(t) = \sum_{\bm \sigma'}
\big({\cal M} + {\cal R}\big)_{\bm \sigma \bm \sigma'}^{} z_{\bm \sigma'}^{}(t)
\end{equation}
506: where the mutation and reproduction matrices, ${\cal M} =
507: (m_{\bm\sigma \bm\sigma'})$ and ${\cal R} = \text{diag}(r_{\bm\sigma}^{})$,
508: may now be conveniently represented in the frequency space $W$. Defining
509: \begin{equation}
510: \sigma_j^{(\alpha,\beta)} := \left(\otimes^{j-1} \Id_4 \right) \otimes
511: \left(\sigma^\alpha \otimes \sigma^\beta \right)
\otimes \left(\otimes^{N-j} \Id_4\right)
513: \end{equation}
514: where $\sigma^\alpha$, $\alpha \in \{0,x,z\}$, are the real Pauli matrices and
515: $\sigma^0 \equiv \Id_2$, we find
516: \begin{equation}
517: {\cal M} = \sum_{j=1}^N \left[ \mu_1 \sigma_j^{(x,0)} + \mu_2
518: \sigma_j^{(0,x)} + \mu_3 \sigma_j^{(x,x)} - (\mu_1+\mu_2+\mu_3) \Id\right]
519: \end{equation}
520: for the mutation matrix. The reproduction matrix ${\cal R}$ is, for a
521: general fitness landscape, an element of the algebra generated by
522: $\sigma_j^{(z,0)}$ and $\sigma_j^{(0,z)}$, $1\le j\le N$,
\begin{equation}
{\cal R} = r_0 \Id + \sum_{\substack{k,\ell = 0\\ k+\ell \ge 1}}^N
\sum_{[j_1^{} \dots j_k^{}]} \sum_{[i_1^{} \dots i_\ell^{}]}
\varepsilon_{[j_1^{} \dots j_k^{}],[i_1^{} \dots i_\ell^{}]}^{}
\prod_{m=1}^k \sigma_{j_m^{}}^{(z,0)} \prod_{n=1}^\ell
\sigma_{i_n^{}}^{(0,z)},
\end{equation}
where $[j_1^{} \dots j_k^{}]$ is an ordered $k$-tuple
$1 \le j_1^{} < \dots < j_k^{} \le N$, likewise for
$[i_1^{} \dots i_\ell^{}]$, and empty products are read as $\Id$.
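The tensor-product form of the mutation matrix ${\cal M}$ given above can be checked numerically with NumPy for small $N$. In this sketch (helper names are hypothetical) the local basis of $W_j$ follows the ordering of \texttt{numpy.kron}, i.e.\ index $0,1,2,3$ corresponds to $++,+-,-+,--$, which is only a relabelling of the basis used in the text; the rates are illustrative.

```python
from functools import reduce

import numpy as np

SX = np.array([[0.0, 1.0], [1.0, 0.0]])
I2, I4 = np.eye(2), np.eye(4)


def site_op(a, b, j, N):
    """sigma_j^{(a,b)}: kron(a, b) at site j (1-based), Id_4 on all other sites."""
    factors = [I4] * (j - 1) + [np.kron(a, b)] + [I4] * (N - j)
    return reduce(np.kron, factors)


def mutation_operator(N, mu):
    """M = sum_j [mu_1 s^(x,0) + mu_2 s^(0,x) + mu_3 s^(x,x) - (mu_1+mu_2+mu_3) Id]."""
    mu1, mu2, mu3 = mu
    dim = 4 ** N
    M = np.zeros((dim, dim))
    for j in range(1, N + 1):
        M += (mu1 * site_op(SX, I2, j, N) + mu2 * site_op(I2, SX, j, N)
              + mu3 * site_op(SX, SX, j, N))
    M -= N * (mu1 + mu2 + mu3) * np.eye(dim)
    return M


N, mu = 3, (0.1, 0.4, 0.05)     # hypothetical example values
M = mutation_operator(N, mu)
print(M.shape)                                  # (64, 64)
print(np.allclose(M, M.T))                      # symmetric: True
print(np.allclose(M.sum(axis=0), 0.0))          # columns sum to zero: True
print(np.allclose(np.diag(M), -N * sum(mu)))    # diagonal as in eq. (mss): True
```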
531: Now, from a physical point of view, ${\cal H} = {\cal M} + {\cal R}$
532: is (up to a global minus sign) the Hamiltonian of two coupled Ising
533: quantum chains in a tunable transverse magnetic field (the mutation)
and general spin interactions within and between the chains.
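Since ${\cal H} = {\cal M} + {\cal R}$ generates the linear dynamics (\ref{LGS}) of the absolute frequencies, normalising $\exp(t{\cal H})\,\bm p_0$ should reproduce the solution of the nonlinear equation (\ref{paramuse}). The NumPy sketch below checks this in the smallest case $N=1$ (four genotypes), comparing a Runge-Kutta integration of (\ref{paramuse}) with the normalised linear solution; the rates, fitness values and function names are hypothetical, and the local basis is ordered $++,+-,-+,--$ as produced by \texttt{numpy.kron}.

```python
import numpy as np

SX = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)
mu1, mu2, mu3 = 0.1, 0.4, 0.05               # hypothetical rates
# Mutation matrix for N = 1 (local basis ordered ++, +-, -+, --)
M = (mu1 * np.kron(SX, I2) + mu2 * np.kron(I2, SX) + mu3 * np.kron(SX, SX)
     - (mu1 + mu2 + mu3) * np.eye(4))
r = np.array([1.0, 0.3, 0.2, -0.5])          # hypothetical fitness values r_sigma
H = M + np.diag(r)


def rhs(p):
    """Right-hand side of the nonlinear equation (paramuse)."""
    return (r - r @ p) * p + M @ p


def rk4(p, t_end, steps=2000):
    """Classical fourth-order Runge-Kutta integration of (paramuse)."""
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(p)
        k2 = rhs(p + h / 2 * k1)
        k3 = rhs(p + h / 2 * k2)
        k4 = rhs(p + h * k3)
        p = p + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return p


def linear_solution(p0, t):
    """exp(tH) p0 via the eigendecomposition of the symmetric H, then normalised."""
    lam, U = np.linalg.eigh(H)
    z = U @ (np.exp(t * lam) * (U.T @ p0))
    return z / z.sum()


p0 = np.array([0.0, 0.0, 0.0, 1.0])
t = 3.0
print(np.allclose(rk4(p0, t), linear_solution(p0, t), atol=1e-8))  # True
```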
535:
536: Translated to our quantum chain model, the fitness function of the
537: permutation invariant landscape defined in (\ref{fit}) results in a
538: (longitudinal) magnetic field and a mean field spin-interaction. We find
539: ${\cal R } = {\cal R}_\alpha + {\cal R}_\gamma$, where
540: \begin{equation}
541: {\cal R}_\alpha = \sum_{j=1}^N \left[\alpha_1 \sigma_j^{(z,0)}
542: + \alpha_2 \sigma_j^{(0,z)} + \alpha_3 \sigma_j^{(z,z)} \right]
543: \end{equation}
544: and
545: \begin{equation} \label{rgamma}
546: {\cal R}_\gamma = \frac{1}{2N} \sum_{j,k = 1}^N \left[ \gamma_1
547: \sigma_j^{(z,0)}\sigma_k^{(z,0)} + \gamma_2 \sigma_j^{(0,z)}\sigma_k^{(0,z)} +
548: \gamma_3 \sigma_j^{(z,z)}\sigma_k^{(z,z)} \right]
549: \end{equation}
Let us stress that, in contrast to most physical applications, the mean
field model is a natural approach in the biological
context, where interactions are typically long-ranged. It is therefore a
legitimate model in its own right here, rather than merely an approximation.
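As a consistency check between the two pictures, the NumPy sketch below builds ${\cal R} = {\cal R}_\alpha + {\cal R}_\gamma$ from the $\sigma^{(z,0)}$, $\sigma^{(0,z)}$ and $\sigma^{(z,z)}$ terms and compares its diagonal with the fitness function (\ref{fit}) evaluated on the surplus components of each basis state. Parameter values and helper names are hypothetical; the local basis index $0,1,2,3$ corresponds to $++,+-,-+,--$ as produced by \texttt{numpy.kron}.

```python
from functools import reduce

import numpy as np

SZ = np.array([[1.0, 0.0], [0.0, -1.0]])
I2, I4 = np.eye(2), np.eye(4)


def site_op(a, b, j, N):
    """sigma_j^{(a,b)}: kron(a, b) at site j (1-based), Id_4 elsewhere."""
    factors = [I4] * (j - 1) + [np.kron(a, b)] + [I4] * (N - j)
    return reduce(np.kron, factors)


def reproduction_operator(N, alpha, gamma):
    """R = R_alpha + R_gamma of the permutation invariant landscape (fit)."""
    R = np.zeros((4 ** N, 4 ** N))
    for (a, b), a_i, g_i in zip([(SZ, I2), (I2, SZ), (SZ, SZ)], alpha, gamma):
        S = sum(site_op(a, b, j, N) for j in range(1, N + 1))  # = N * s_i
        R += a_i * S + g_i / (2 * N) * S @ S
    return R


def surplus_of_basis_state(k, N):
    """(s_1, s_2, s_3) of basis vector k; site 1 is the most significant digit."""
    digits = [(k // 4 ** (N - 1 - j)) % 4 for j in range(N)]
    first = [1 - 2 * (d // 2) for d in digits]    # eigenvalue of sigma^z (x) Id
    second = [1 - 2 * (d % 2) for d in digits]   # eigenvalue of Id (x) sigma^z
    return (sum(first) / N, sum(second) / N,
            sum(f * s for f, s in zip(first, second)) / N)


N, alpha, gamma = 3, (1.0, 0.5, 0.2), (-0.3, 0.4, 0.1)   # hypothetical values
R = reproduction_operator(N, alpha, gamma)
fit = [N * sum(a * s + 0.5 * g * s ** 2
               for a, g, s in zip(alpha, gamma, surplus_of_basis_state(k, N)))
       for k in range(4 ** N)]
print(np.allclose(np.diag(R), fit))            # True
print(np.allclose(R, np.diag(np.diag(R))))     # R is diagonal: True
```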
554:
555:
556: \subsection{Biological and physical observables} \label{bpo}
557:
558: In this subsection, we relate the quantities of biological interest,
559: mean and variance of the surplus components and the fitness, to the
physical observables. In what follows, we assume that all limits occurring
below exist.
562:
563: \paragraph{Genotype composition}
564: According to (\ref{LGS}), the Hamiltonian of the quantum chain determines the
565: time evolution of our population of genotypes in an environment that does not
566: constrain the population size. For any genotype-independent
567: regulation of the population size, the relative genotype frequencies
568: are found by {\em statistical} normalization. We therefore define the
569: vector of the genotype composition $|\bm{p}(t) \rangle$ and the
570: equilibrium composition $|0\rangle$ as
571: \begin{equation}
572: |\bm{p}(t) \rangle =
573: \frac{\exp(t{\cal H})
574: |\bm{p}_0\rangle} {\langle \Omega|\exp(t{\cal H})|\bm{p}_0\rangle}
575: \quad ; \quad
576: |0\rangle := \lim_{t\to \infty} |\bm{p}(t) \rangle
577: \end{equation}
578: where $|\bm{p}_0\rangle$ is the initial composition and
579: $4^{-N}|\Omega\rangle$ is the equidistribution of genotypes.
580: Note that the {\em equilibrium composition} of the genotype population
581: just corresponds to the {\em ground state} of the quantum chain on
582: the physical side (with a different `biological' normalization
583: $\langle \Omega|0\rangle = 1$).
584:
585:
586: \paragraph{Fitness} The {\em density of the mean fitness} (or mean
587: fitness per site) of the population is given by the expression
588: \begin{equation}
589: w(t) := N^{-1} \bar{r}(t) =
590: N^{-1} \langle\Omega|{\cal R}|\bm{p}(t)\rangle \;.
591: \end{equation}
592: Since
593: \begin{equation}
594: w := \lim_{t \to \infty} w(t) = N^{-1} \langle \Omega| {\cal R} | 0
595: \rangle = N^{-1} \frac{\langle 0| {\cal H} |0\rangle}{\langle 0| 0
596: \rangle}
597: \end{equation}
the {\em equilibrium} mean fitness (per site) is given by the
(unique) largest eigenvalue of ${\cal H}$, whose eigenvector is
$|0\rangle$, divided by $N$. For an unconstrained population, $w$ also
determines the growth rate in the long-time limit. In the physical picture,
$(-w)$ is just the {\em ground state energy} (per spin).
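
The identification of $w$ with the largest eigenvalue per site can be checked numerically on a short additive chain. The sketch below assumes the symmetric Fujiyama landscape with Jukes-Cantor mutation, builds ${\cal H}$ for $N=2$ as a sum of identical one-site terms, and compares $N^{-1}\langle\Omega|{\cal R}|0\rangle$ with the largest eigenvalue divided by $N$ and with $\alpha - 2\mu + 2\sqrt{\mu^2+\alpha^2-\alpha\mu}$, the largest eigenvalue of the $4\times 4$ one-site problem for this landscape. Function names and parameter values are illustrative assumptions.

```python
import numpy as np

I2 = np.eye(2)
SX = np.array([[0.0, 1.0], [1.0, 0.0]])
SZ = np.diag([1.0, -1.0])

def site_ops(alpha, mu):
    """One-site reproduction R1 and Kimura 3ST mutation M1 (4x4, assumed form)."""
    z = [np.kron(SZ, I2), np.kron(I2, SZ), np.kron(SZ, SZ)]
    x = [np.kron(SX, I2), np.kron(I2, SX), np.kron(SX, SX)]
    R1 = sum(a * zi for a, zi in zip(alpha, z))
    M1 = sum(m * (xi - np.eye(4)) for m, xi in zip(mu, x))
    return R1, M1

def chain_sum(op, N):
    """sum_j op_j on an N-site chain (no interactions: additive landscape)."""
    d = op.shape[0]
    return sum(np.kron(np.kron(np.eye(d**j), op), np.eye(d**(N - j - 1)))
               for j in range(N))

alpha, mu, N = 1.0, 0.4, 2
R1, M1 = site_ops([alpha] * 3, [mu] * 3)    # symmetric Fujiyama, Jukes-Cantor
R, H = chain_sum(R1, N), chain_sum(M1 + R1, N)
evals, evecs = np.linalg.eigh(H)
ground = evecs[:, -1] / evecs[:, -1].sum()  # <Omega|0> = 1
w_bio = R.diagonal() @ ground / N           # N^{-1} <Omega|R|0>
w_phys = evals[-1] / N                      # largest eigenvalue per site
Q = np.sqrt(mu**2 + alpha**2 - alpha * mu)
print(w_bio, w_phys, alpha - 2 * mu + 2 * Q)
```

The first two numbers agree because $\langle\Omega|{\cal M} = 0$, so that $\langle\Omega|{\cal R}|0\rangle = \langle\Omega|{\cal H}|0\rangle$.
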
603:
604: Using ${\cal M} |\Omega\rangle = 0$, we derive for the time evolution
605: of the mean fitness
606: \begin{equation} \label{zeit}
607: \dot{w}(t) = V_r(t) + N^{-1}
608: \langle \Omega| [{\cal R},{\cal M}] | \bm{p}(t) \rangle
609: \end{equation}
610: where $V_r(t)$ is the {\em variance of fitness} (per site),
611: \begin{equation}
612: V_r(t) = \frac{1}{N}\left(\langle \Omega|{\cal R}^2|\bm{p}(t)\rangle
613: - \langle \Omega|{\cal R}|\bm{p}(t)\rangle^2 \right)\;.
614: \end{equation}
615: In the absence of mutation, (\ref{zeit}) is of course just a special case
616: of Fisher's ``Fundamental Theorem of Natural Selection'' \cite{Fish} which
617: states that the rate of increase in fitness is equal to the genetic
618: variance in fitness. For the mutation-selection models considered
619: here, the relation has the following intuitive interpretation:
620: The change in mean fitness is driven by two independent forces. The
621: first one stems from the change of genotype frequencies due to
622: selection and is proportional to the variance of fitness values
present in the population. Since variances are non-negative, it always
tends to increase fitness. The second term on the right-hand side of
(\ref{zeit}) typically decreases fitness. It measures the population
mean of the change in fitness at time $t$ due to the action of mutation.
In mutation--selection equilibrium, both terms balance, so that the entire
residual variance is maintained by mutation.
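
Relation (\ref{zeit}) holds for any diagonal ${\cal R}$ once $\langle\Omega|{\cal M} = 0$, which makes it easy to check numerically. The sketch below does so on the full $4^N$-dimensional genotype space for $N=3$, comparing a central-difference estimate of $\dot w$ with $V_r + N^{-1}\langle\Omega|[{\cal R},{\cal M}]|\bm{p}(t)\rangle$. The quadratic fitness $r = \sum_i(\alpha_i S_i + \gamma_i S_i^2/2N)$, with $S_i$ the summed surplus components, and all parameter values are assumptions chosen only for illustration.

```python
import numpy as np

I2 = np.eye(2)
SX = np.array([[0.0, 1.0], [1.0, 0.0]])
SZ = np.diag([1.0, -1.0])
Z4 = [np.kron(SZ, I2), np.kron(I2, SZ), np.kron(SZ, SZ)]
X4 = [np.kron(SX, I2), np.kron(I2, SX), np.kron(SX, SX)]

def at_site(op4, j, N):
    """Embed a 4x4 one-site operator at site j of an N-site chain."""
    return np.kron(np.kron(np.eye(4**j), op4), np.eye(4**(N - j - 1)))

N = 3                                            # 4**3 = 64 genotypes
alpha, gamma, mu = [0.2, 0.1, 0.0], [1.0, 0.8, 0.8], [0.3, 0.5, 0.3]
S = [sum(at_site(z, j, N) for j in range(N)) for z in Z4]   # diagonal S_i
# Hypothetical quadratic fitness r = sum_i (alpha_i S_i + gamma_i S_i^2 / (2N)).
R = sum(a * Si + g * Si @ Si / (2 * N) for a, g, Si in zip(alpha, gamma, S))
M = sum(m * (at_site(x, j, N) - np.eye(4**N))
        for m, x in zip(mu, X4) for j in range(N))
evals, evecs = np.linalg.eigh(M + R)
omega = np.ones(4**N)                            # <Omega|
p0 = omega / 4**N                                # equidistribution

def p(t):
    v = evecs @ (np.exp(t * evals) * (evecs.T @ p0))
    return v / (omega @ v)

def w(t):
    return omega @ R @ p(t) / N

t, h = 0.7, 1e-5
w_dot = (w(t + h) - w(t - h)) / (2 * h)          # central difference
V_r = (omega @ R @ R @ p(t) - (omega @ R @ p(t))**2) / N
mutation_term = omega @ (R @ M - M @ R) @ p(t) / N
print(w_dot, V_r + mutation_term)
```

Setting all $\mu_i = 0$ removes the mutation term, and $\dot w = V_r \ge 0$ then reproduces Fisher's theorem.
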
629:
630: \paragraph{Surplus} Another quantity that characterizes the genetic
631: order of the population, as it may be measured from sequence data, is
632: the {\em mean surplus}. We define, following and generalizing \cite{BBW},
633: \begin{equation}
634: u_i(t) = \sum_{\bm{\sigma}} s_i(\bm{\sigma}) p_{\bm{\sigma}}^{}(t)
635: \quad ; \quad
636: u_i = \lim_{t \to \infty} u_i(t) \;.
637: \end{equation}
638: In particular,
639: \begin{equation}
640: \#_m(t) := \frac{1}{4} \big(3 - (u_1(t)+u_2(t)+u_3(t))\big)
641: \end{equation}
642: measures the mean number of mutations per site relative to the wildtype while
643: \begin{equation}
644: \#_{tr}(t) := \frac{1}{2} \big( 1 - u_1(t) \big)
645: \end{equation}
646: denotes the mean number of transversions alone.
647: As a {\em biological order parameter}, the mean surplus plays a
648: similar r{\^o}le as the physical magnetization. However, as already
649: noted in \cite{BBW2}, both quantities are quite distinct and in many
650: cases not even easily related. In the language of the quantum chain,
651: the equilibrium mean surplus may be derived as
652: \begin{equation}
653: u_1 = \frac{\langle \Omega|\sum_i\sigma_i^{(z,0)}|0\rangle}{N}
654: \quad ;\quad
655: u_2 = \frac{\langle \Omega|\sum_i\sigma_i^{(0,z)}|0\rangle}{N}
656: \quad ;\quad
657: u_3 = \frac{\langle \Omega|\sum_i\sigma_i^{(z,z)}|0\rangle}{N}
658: \; ,
659: \end{equation}
660: whereas the three-component magnetization is defined as the ground
661: state expectation value
662: \begin{equation}
663: m_1 = \frac{\langle 0|\sum_i\sigma_i^{(z,0)}|0\rangle}
664: {N \langle 0|0\rangle} \quad;\quad
665: m_2 = \frac{\langle 0|\sum_i\sigma_i^{(0,z)}|0\rangle}
666: {N \langle 0|0\rangle} \quad ;\quad
667: m_3 = \frac{\langle 0|\sum_i\sigma_i^{(z,z)}|0\rangle}
668: {N \langle 0|0\rangle} \; .
669: \end{equation}
670: As we will show below, magnetization and surplus can show rather
671: different behaviour especially near phase transitions. The biological
672: and physical phase diagrams, however, coincide if phase transitions
673: (or error thresholds) are defined as nonanalyticity points of the
674: ground state energy (or mean fitness) $w$ in the thermodynamic limit
675: (cf.~the discussion in Section 5).
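
The surplus-based counts and the two notions of order can be compared on a single site. The sketch below assumes that $++$ is the wildtype letter and that the first spin, $\sigma^{(z,0)}$, distinguishes purines from pyrimidines (so that $\#_{tr} = (1-u_1)/2$ counts transversions); the assignment of the remaining letters is left open. \texttt{mutation\_counts} evaluates $\#_m$ and $\#_{tr}$ from letter frequencies, and the hypothetical helper \texttt{surplus\_and\_magnetization} uses an assumed one-site Hamiltonian of Kimura 3ST form to weight the ground state linearly ($L^1$, surplus) and quadratically ($L^2$, magnetization).

```python
import numpy as np

I2 = np.eye(2)
SX = np.array([[0.0, 1.0], [1.0, 0.0]])
SZ = np.diag([1.0, -1.0])
Z4 = [np.kron(SZ, I2), np.kron(I2, SZ), np.kron(SZ, SZ)]
X4 = [np.kron(SX, I2), np.kron(I2, SX), np.kron(SX, SX)]
S = np.array([z.diagonal() for z in Z4])   # rows s_1, s_2, s_3 of (++, +-, -+, --)

def mutation_counts(p):
    """(#_m, #_tr) for one-site letter frequencies p over (++, +-, -+, --)."""
    u = S @ p
    return (3 - u.sum()) / 4, (1 - u[0]) / 2

def surplus_and_magnetization(alpha, mu):
    """N = 1: surplus u_i (linear weights) vs magnetization m_i (quadratic weights)."""
    R = sum(a * z for a, z in zip(alpha, Z4))
    M = sum(k * (x - np.eye(4)) for k, x in zip(mu, X4))
    x0 = np.linalg.eigh(M + R)[1][:, -1]
    x0 = x0 / x0.sum()                     # biological normalization <Omega|0> = 1
    return S @ x0, S @ x0**2 / (x0 @ x0)   # <Omega|s_i|0>,  <0|s_i|0>/<0|0>

print(mutation_counts(np.array([0.7, 0.1, 0.15, 0.05])))
u, m = surplus_and_magnetization([0.5, 0.5, 0.5], [0.4, 0.4, 0.4])
print(u, m)
```

Directly from the definitions, $\#_m = 1 - p_{++}$ and $\#_{tr} = p_{-+} + p_{--}$; for the symmetric parameters chosen here, the magnetization exceeds the surplus in every component.
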
676:
677: \section{Results}
678:
679: \subsection{Fujiyama model}
680:
681: As in the two-letter case \cite{BBW}, the quantum chain model
682: decomposes into non-interacting one-site Hamiltonians for the
683: additive landscape. The mean fitness and its variance are linear
functions of the surplus components. In particular, we obtain from
685: (\ref{zeit})
686: \begin{equation}
687: V_r(t) = \dot{w}(t) + 2\big(
688: (\mu_1 +\mu_3) \alpha_1 u_1(t)
689: + (\mu_2 +\mu_3) \alpha_2 u_2(t) + (\mu_1 +\mu_2) \alpha_3 u_3(t)\big)
690: \;.
691: \end{equation}
692: For Jukes-Cantor mutation, $\mu_1 = \mu_2 = \mu_3 \equiv \mu$, this reduces to
693: \begin{equation}
694: V_r(t) = \left(4 \mu + \frac{\text{d}}{\text{d}t}\right) w(t)
695: \end{equation}
696: and $V_r$ is proportional to the mean fitness in the
697: mutation--selection equilibrium. Exact results are easily
698: found from the solution of the four-dimensional eigenvalue problem of
699: the one-site Hamiltonian. We only give the expression for the mean
700: fitness in the symmetric case, $\alpha_1 = \alpha_2 = \alpha_3 \equiv \alpha$
701: with Jukes-Cantor mutation scheme ($\mu_1 = \mu_2 = \mu_3 \equiv \mu$):
\begin{equation}
w(t) = \alpha - 2\mu + 2Q\,
\frac{2Q\tanh(2tQ) + 2\mu - \alpha}{2Q + (2\mu - \alpha)\tanh(2tQ)}
\end{equation}
where
\begin{equation}
Q = \sqrt{\mu^2+\alpha^2 -\alpha\mu}
\end{equation}
and the equidistribution of genotypes is chosen as starting configuration.
This expression satisfies $w(0) = 0$ and approaches the equilibrium value
$w = \alpha - 2\mu + 2Q$ for $t \to \infty$.
712:
713: Means and variances of the fitness and the surplus in
714: mutation--selection balance are shown in Fig.~\ref{finite} below.
715: A plot of the time evolution of fitness is given in Fig.~\ref{time2}.
There is clearly no phase transition (i.e., no {\em error threshold}
behaviour) for the additive Fujiyama landscape, as expected in view of
the complete absence of interactions (i.e., of epistasis).
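
The one-site solution can be cross-checked numerically. Because the sites are independent, a single site suffices. The sketch below assumes the one-site operators of the symmetric landscape with Jukes-Cantor mutation and the equidistribution as start; it checks that $w(0)=0$, that $V_r = (4\mu + \frac{\text{d}}{\text{d}t})w$ holds up to finite-difference error, and that $w(t)$ approaches $\alpha - 2\mu + 2Q$. Parameter values are illustrative.

```python
import numpy as np

I2 = np.eye(2)
SX = np.array([[0.0, 1.0], [1.0, 0.0]])
SZ = np.diag([1.0, -1.0])
Z4 = [np.kron(SZ, I2), np.kron(I2, SZ), np.kron(SZ, SZ)]
X4 = [np.kron(SX, I2), np.kron(I2, SX), np.kron(SX, SX)]

alpha, mu = 1.0, 0.4                    # symmetric Fujiyama, Jukes-Cantor
R = alpha * sum(Z4)                     # one-site fitness (diagonal)
M = mu * sum(x - np.eye(4) for x in X4)
evals, evecs = np.linalg.eigh(M + R)
r = R.diagonal()
p0 = np.full(4, 0.25)                   # equidistribution

def p(t):
    v = evecs @ (np.exp(t * (evals - evals.max())) * (evecs.T @ p0))
    return v / v.sum()

def w(t):                               # mean fitness per site (N = 1)
    return r @ p(t)

def V_r(t):                             # variance of fitness per site
    return r**2 @ p(t) - w(t)**2

Q = np.sqrt(mu**2 + alpha**2 - alpha * mu)
h = 1e-5
for t in (0.0, 0.3, 1.0, 5.0):
    w_dot = (w(t + h) - w(t - h)) / (2 * h)
    # last column ~ 0 up to finite-difference error
    print(t, w(t), V_r(t) - (4 * mu * w(t) + w_dot))
print(w(60.0), alpha - 2 * mu + 2 * Q)
```
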
719:
720:
721: %\begin{equation}
722: %w = \alpha \left(2\sqrt{\left(\frac{\mu}{\alpha}\right)^2 -
723: %\frac{\mu}{\alpha} + 1} - 2\frac{\mu}{\alpha} +1\right)
724: %\end{equation}
725:
726:
727: \subsection{Quadratic fitness model: Equilibrium results}
728:
729: In contrast to the additive case, no simple relation between surplus
730: and fitness is known in the case of the quadratic landscape as
long as $t$ or $N$ is kept finite. However, due to the permutation
732: invariance of the Hamiltonian, the individual fitness--surplus
733: relation (\ref{fit}) is recovered in the thermodynamic limit
734: for the corresponding mean values of the equilibrium population.
735: We obtain in analogy to \cite{BBW2}:
736: \begin{equation} \label{surrel}
737: w = \lim_{t \to \infty} w(t) = \sum_{i=1}^3 \left(\alpha_i u_i
738: + \frac{\gamma_i}{2} u_i^2 \right)
739: \end{equation}
740: and, from (\ref{zeit}), for the equilibrium variance of fitness per site
741: \begin{multline} \label{variance}
742: V_r = \lim_{t \to \infty} V_r(t) =
743: 2(\mu_1+\mu_3)\left(\alpha_1 u_1 + \gamma_1 u_1^2\right) +
744: \\
745: 2(\mu_2+\mu_3)\left(\alpha_2 u_2 + \gamma_2 u_2^2\right) +
746: 2(\mu_1+\mu_2)\left(\alpha_3 u_3 + \gamma_3 u_3^2\right)\;.
747: \end{multline}
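
Relations (\ref{surrel}) and (\ref{variance}) are simple enough to encode directly. In the sketch below, the pairing of rates reflects that $s_1$ is flipped by the $\mu_1$ and $\mu_3$ processes, $s_2$ by $\mu_2$ and $\mu_3$, and $s_3$ by $\mu_1$ and $\mu_2$; the function name and sample values are hypothetical. The checks reproduce two special cases quoted later in this section: $V_r = 12\mu(\alpha u + \gamma u^2)$ in the symmetric model and $V_r = 8\mu w$ in the PP phase.

```python
def fitness_and_variance(u, alpha, gamma, mu):
    """Equilibrium w (surrel) and V_r (variance) from the mean surplus u = (u1, u2, u3)."""
    mu1, mu2, mu3 = mu
    rates = (mu1 + mu3, mu2 + mu3, mu1 + mu2)   # total rates flipping s_1, s_2, s_3
    w = sum(a * x + g / 2 * x**2 for a, g, x in zip(alpha, gamma, u))
    V_r = sum(2 * k * (a * x + g * x**2)
              for k, a, g, x in zip(rates, alpha, gamma, u))
    return w, V_r

# Symmetric model with Jukes-Cantor mutation: V_r = 12 mu (alpha u + gamma u^2).
a, g, rate, u = 0.05, 1.0, 0.5, 0.6
print(fitness_and_variance([u] * 3, [a] * 3, [g] * 3, [rate] * 3),
      12 * rate * (a * u + g * u**2))
```

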
748: %\begin{equation}
749: %\textswab{h}= \mu_1\sigma^{(x,0)}+\mu_2 \sigma^{(0,x)} +\mu_3 \sigma^{(x,x)} +%\gamma_1 m_1 \sigma^{(z,0)} + \gamma_2 m_2 \sigma^{(0,z)} + \gamma_3 m_3
750: %\sigma^{(z,z)}
751: %\end{equation}
752: The key to the solution in the thermodynamic limit is now the minimum
753: principle of the physical free energy which translates to a maximum
754: principle for the equilibrium mean fitness. Maximizing
755: \begin{equation}
756: \langle \bm{x} | {\cal M} + {\cal R} | \bm{x} \rangle -
757: w \big(\langle \bm{x} |\bm{x} \rangle -1\big)
758: \end{equation}
759: with respect to $w$ and $\bm{x}$, we obtain, taking permutation symmetry of
760: $\bm{x}$ into account, the following variational expression for $w$:
\begin{equation} \label{fitm}
\begin{split}
w(\bm{\alpha},\bm{\mu},\bm{\gamma}) &= \sup_{m_1,m_2,m_3} \bigg[\alpha_1 m_1 +
\alpha_2 m_2 + \alpha_3 m_3 + \frac{\gamma_1}{2} m_1^2 +
\frac{\gamma_2}{2} m_2^2 + \frac{\gamma_3}{2} m_3^2
\\
&\quad + \frac{\mu_1}{2}
\left(\sqrt{(1+m_2)^2-(m_1+m_3)^2}+\sqrt{(1-m_2)^2-(m_1-m_3)^2}-2\right)
\\
&\quad + \frac{\mu_2}{2}
\left(\sqrt{(1+m_1)^2-(m_2+m_3)^2}+\sqrt{(1-m_1)^2-(m_2-m_3)^2}-2\right)
\\
&\quad + \frac{\mu_3}{2}
\left(\sqrt{(1+m_3)^2-(m_1+m_2)^2}+\sqrt{(1-m_3)^2-(m_1-m_2)^2}-2
\right)\bigg]
\end{split}
\end{equation}
779: where $m_i \in [-1,1]$ are the components of the physical
780: magnetization. Let us stress that, from the biological point of view,
781: the translation to the physical framework seems a necessary technical
782: step since we do not know of any variational principle for the
783: biological model which works directly in $L^1$. We now take a closer
784: look at two special cases.
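
For concreteness, the bracket in (\ref{fitm}) can be evaluated directly. The sketch below is our own transcription; it maximizes only along the symmetric line $m_1=m_2=m_3$ by a simple grid search, which suits the symmetric model but is not a general-purpose optimizer. Two checks are included: for $\gamma = 0$ the supremum reproduces the exact one-site Fujiyama value $\alpha - 2\mu + 2\sqrt{\mu^2+\alpha^2-\alpha\mu}$, and for $\alpha = 0$, $\gamma = 1$ the mean fitness drops to zero between $\mu = 0.6$ and $\mu = 0.7$, bracketing the first-order transition at $\tilde{\mu}_c = 2/3$. Parameter values are illustrative.

```python
import numpy as np

def bracket(m1, m2, m3, alpha, gamma, mu):
    """The expression maximized in (fitm); arguments may be NumPy arrays."""
    def root(a, b):
        # sqrt(a^2 - b^2); clipping guards against round-off at the domain boundary
        return np.sqrt(np.clip(a**2 - b**2, 0.0, None))
    value = (alpha[0] * m1 + alpha[1] * m2 + alpha[2] * m3
             + (gamma[0] * m1**2 + gamma[1] * m2**2 + gamma[2] * m3**2) / 2)
    value = value + mu[0] / 2 * (root(1 + m2, m1 + m3) + root(1 - m2, m1 - m3) - 2)
    value = value + mu[1] / 2 * (root(1 + m1, m2 + m3) + root(1 - m1, m2 - m3) - 2)
    value = value + mu[2] / 2 * (root(1 + m3, m1 + m2) + root(1 - m3, m1 - m2) - 2)
    return value

def w_symmetric(alpha, gamma, mu, n=30001):
    """Grid search over m1 = m2 = m3 = m in [-1/3, 1], where all roots are real."""
    m = np.linspace(-1 / 3, 1, n)
    return bracket(m, m, m, [alpha] * 3, [gamma] * 3, [mu] * 3).max()

alpha, mu = 1.0, 0.4                       # gamma = 0: Fujiyama, Jukes-Cantor
Q = np.sqrt(mu**2 + alpha**2 - alpha * mu)
print(w_symmetric(alpha, 0.0, mu), alpha - 2 * mu + 2 * Q)
print(w_symmetric(0.0, 1.0, 0.6), w_symmetric(0.0, 1.0, 0.7))
```
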
785:
786: \paragraph{Symmetric fitness model} For the symmetric
787: {\em wildtype--mutant} model with $\alpha_i \equiv \alpha$,
788: $\gamma_i \equiv \gamma$ and Jukes-Cantor mutation rate $\mu$,
789: all components of the order parameters are equal,
790: $m_i \equiv m$ and $u_i \equiv u$, respectively.
791: Here, the variational expression (\ref{fitm}) for $w$ leads to the
792: following self-consistency condition for $m$:
793: \begin{equation} \label{sc}
794: m = \frac{1}{3}\left[ 1 + \frac{2(\alpha + \gamma m) - \mu}
795: {\sqrt{(\alpha + \gamma m)^2 - \mu(\alpha + \gamma m) + \mu^2}}\right]\;.
796: \end{equation}
After squaring, this becomes a quartic equation in $m$, which can be solved
using the standard formulas. However, since the explicit solution is rather
lengthy, we do not include it here, but give a qualitative
discussion instead.
801:
Whenever $\gamma$ is {\em negative}, the right-hand side of (\ref{sc}) is a
decreasing function of $m$, so that the relation has a unique solution for
any $\alpha$ and $\mu$. As in the case of the
two-state model, we thus obtain no phase transition for positive
epistasis. In the following, we therefore concentrate our discussion
on positive $\gamma$ (or negative epistasis). Note that, for
calculations in the thermodynamic limit, the fitness function $f$
(\ref{fit}), and hence the reproduction matrix ${\cal R}_\gamma$
(\ref{rgamma}), can always be used instead of the truncated form $\tilde{f}$
(\ref{fit2}), since the frequencies of genotypes with negative surplus
vanish. For $\alpha_i \equiv 0$, this is due to spontaneous breaking of
the extra $C_2 \times C_2$ symmetry of
${\cal H} = {\cal M} + {\cal R}_\gamma$.
814:
815: In contrast to the two-state model, where a phase transition in the
816: thermodynamic limit is only found for zero external field, it turns
817: out that the present model has phase transitions for a whole range of
818: the linear fitness parameter $\alpha$ when epistasis is negative:
819: For $\tilde{\alpha} := \alpha/\gamma$ in the interval
820: \begin{equation}
821: 0 \le \tilde{\alpha} < \frac{1}{3}
822: \left(\sqrt{\frac{4}{3}}-1\right) \simeq 0.0515668
823: \end{equation}
824: we find a first order phase transition of the system at
825: \begin{equation}
826: \tilde{\mu} := \frac{\mu}{\gamma} = \tilde{\mu}_c = \frac{2}{3}
827: + 2 \tilde{\alpha}
828: \end{equation}
829: with a finite jump in the magnetization from $m_+$ to $m_-$ where
830: \begin{equation}
831: m_\pm = \frac{1}{3}\left(1 \pm
832: \sqrt{1 - 27 \tilde{\alpha}^2 - 18\tilde{\alpha}}\right)\;.
833: \end{equation}
834: From $m$ we derive the mean fitness $w$ using (\ref{fitm}), from $w$
835: we obtain the surplus $u$ via (\ref{surrel}) and, finally, the variance of
836: the fitness $V_r = 12\mu(\alpha u +\gamma u^2)$.
Looking at the surplus $u$, we also find a phase transition at
$\tilde{\mu}= \tilde{\mu}_c$. Like $m$, it vanishes in the disordered
phase for $\alpha = 0$. Note, however, that the surplus is continuous at
the phase transition: $w$ is continuous, and by (\ref{surrel}) it is a
strictly increasing function of $u \ge 0$ (for $\alpha \ge 0$). In
\cite{BBW2} it has been shown that these differences between the
biological and physical order parameters arise from the change from classical
to quantum mechanical probabilities (i.e., the change from $L^1$ to $L^2$)
in translating the biological model into the physical one. We remark
845: that a different, discontinuous behaviour of the biological order
846: parameter at a (physical) first order transition has been observed for
847: the sharply peaked landscape in Eigen's quasispecies model \cite{FP}.
848: Mean fitness and its variance, magnetization, and surplus for different
849: values of $\alpha$ are shown below in Fig.~\ref{JC}.
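
The closed-form statements above can be verified with a few lines of arithmetic. Working in units $\gamma = 1$ (so $\tilde{\alpha} = \alpha$ and $\tilde{\mu} = \mu$), the sketch below checks that at $\tilde{\mu}_c = 2/3 + 2\tilde{\alpha}$ both $m_+$ and $m_-$ satisfy the self-consistency condition (\ref{sc}) and give the same value of the bracket in (\ref{fitm}) on the symmetric line, as required at a first-order transition, and that the discriminant in $m_\pm$ vanishes at the upper end of the $\tilde{\alpha}$ interval. The value $\tilde{\alpha} = 0.02$ is an arbitrary choice inside that interval.

```python
import numpy as np

def rhs(m, a_t, mu_t):
    """Right-hand side of (sc) in units gamma = 1 (a_t = alpha/gamma, mu_t = mu/gamma)."""
    h = a_t + m
    return (1 + (2 * h - mu_t) / np.sqrt(h**2 - mu_t * h + mu_t**2)) / 3

def g(m, a_t, mu_t):
    """Bracket of (fitm) on the line m1 = m2 = m3 = m, divided by 3 (gamma = 1)."""
    return a_t * m + m**2 / 2 + mu_t / 2 * (np.sqrt((1 - m) * (1 + 3 * m)) - 1 - m)

a_t = 0.02                                 # any value in [0, 0.0515668)
mu_c = 2 / 3 + 2 * a_t
s = np.sqrt(1 - 27 * a_t**2 - 18 * a_t)
m_plus, m_minus = (1 + s) / 3, (1 - s) / 3
print(m_plus - rhs(m_plus, a_t, mu_c), m_minus - rhs(m_minus, a_t, mu_c))
print(g(m_plus, a_t, mu_c) - g(m_minus, a_t, mu_c))

a_max = (np.sqrt(4 / 3) - 1) / 3           # upper end of the alpha-tilde interval
print(a_max, 1 - 27 * a_max**2 - 18 * a_max)
```

Analytically, at $\tilde{\mu}_c$ the square root in (\ref{sc}) equals $2/3$ (in units of $\gamma$) for both $m_+$ and $m_-$, which is why both roots solve the condition.
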
850:
851: \begin{figure}[th]
852: \centerline{\epsfxsize=65mm \epsfysize=55mm \epsfbox{symfit.ps}
853: \epsfxsize=65mm \epsfysize=55mm \epsfbox{symvarfit.ps}}
854: \centerline{\epsfxsize=65mm \epsfysize=55mm \epsfbox{symsur.ps}
855: \epsfxsize=65mm \epsfysize=55mm \epsfbox{symmag.ps}}
856: \caption{Mean fitness and its variance, surplus and magnetization in
857: the symmetric fitness model for various linear parts of the fitness
858: function in the infinite sites limit.}
859: \label{JC}
860: \end{figure}
861:
862:
863:
\paragraph{Transition--transversion model} In our second example, we
wish to distinguish between transitions (mutations between like
nucleotides, i.e., purine--purine or pyrimidine--pyrimidine) and
transversions (mutations between unlike ones). In
866: a first step, we retain the symmetric fitness landscape
867: $\gamma_1 = \gamma_2 = \gamma_3 \equiv \gamma$ (for simplicity
868: with vanishing linear part $\alpha = 0$), but let the relative
869: frequencies of transitions and transversions differ by assuming the
870: {\em Kimura 2 parameter} mutation scheme,
871: $\mu_1 = \mu_3 \equiv \mu \neq \mu_2$.
872:
873: \begin{figure}[ht]
874: \centerline{\epsfysize=60mm \epsfbox{nor1.ps}}
875: \caption{Phase diagram of the transition--transversion model with
symmetric fitness landscape and Kimura 2 parameter mutation
877: scheme. Solid and dotted lines correspond to first and second order
878: phase transitions, respectively. The dashed line indicates the
879: Jukes-Cantor mutation scheme.}
880: \label{pd1}
881: \end{figure}
882: In the extended parameter space of the reduced mutation rates
883: $\tilde{\mu} = \mu/\gamma$; $\tilde{\mu}_2 = \mu_2/\gamma$, we now
884: obtain a phase diagram with {\em three} distinct phases
885: (see Fig.~\ref{pd1}).
886: \begin{itemize}
887: \item
888: For $\tilde{\mu}$ and $\tilde{\mu}_2$ sufficiently small,
889: all three surplus components
890: are positive, indicating genetic order with respect to the entire
891: 4-letter alphabet of the nucleotides: {\em ACGT phase}.
892: \item
If we increase the mutation rate $\tilde{\mu}_2$ for low $\tilde{\mu}$,
the system crosses over to a phase which no longer distinguishes
between the different kinds of purines (A,G) and pyrimidines (C,T), but
is still ordered with respect to transversions. This is the limiting
case described by the two-state model. We call this the {\em PP phase}.
898: \item
899: For higher mutation rates $\tilde{\mu},\tilde{\mu}_2$, we finally enter a
900: completely {\em disordered phase} with vanishing fitness and surplus.
901: \end{itemize}
In a second step, we now also let the fitness effects of transitions
903: and transversions differ and assume a fitness landscape
904: with $\gamma_2 = \gamma_3 \equiv \gamma$, but $\gamma_1 \neq \gamma$
905: in general. The changes in the phase diagram for increasing
906: $\tilde{\gamma}_1 = \gamma_1/\gamma$ are shown in Fig.~\ref{pd2}.
907: The phase transitions between the three phases may be first or second
908: order. In general, we obtain the following phase space structure:
909: \begin{itemize}
910: \item
911: Phase transitions between the disordered and PP phase are second order and
912: located on the line $\tilde{\mu} = \tilde{\gamma}_1/2$. This phase
913: transition corresponds to the one also seen in the two-state model \cite{BBW}.
914: \item
915: The phase transition line between the ACGT and PP phases in
916: general changes from first to second order with increasing
917: mutation rate $\tilde\mu_2$ (see Figs.~\ref{pd1}, \ref{pd2}).
918: For the second order transitions we derive, on
919: expanding (\ref{fitm}) to lowest order in $m_2 = m_3$,
920: \begin{equation}
921: \mu = \frac{\gamma_1}{\gamma_1 + 2\gamma}
922: \sqrt{(\gamma_1 + \mu_2)(2\gamma-\mu_2)} \;.
923: \end{equation}
924: Numerically, we find that the first order transitions are
925: located on a straight
926: line up to $\tilde{\mu} = \tilde{\gamma}_1/2$ where the PP phase
927: changes into the disordered phase. The $\tilde{\mu}_2$-interval of
928: first-order transitions decreases for increasing $\tilde{\gamma}_1$.
929: For $\tilde{\gamma}_1 \gtrapprox 8.45$, all phase transitions
930: between the ACGT and PP phases are second order.
931: \item
932: Finally, for $\tilde{\gamma}_1 \le 4$, there are direct first order
933: phase transitions between the ACGT phase and the disordered phase
934: (for $\tilde{\mu}_2$ sufficiently small). For higher values of
935: $\tilde{\gamma}_1$, these two phases are separated by the PP phase.
936: \end{itemize}
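
The second-order ACGT--PP line can be checked against the expansion from which it is derived. Along $m_2 = m_3 = \varepsilon$ the bracket in (\ref{fitm}) is even in $\varepsilon$, so the mixed derivative with respect to $m_1$ and $\varepsilon$ vanishes at $\varepsilon = 0$, and the PP solution $m_1 = \sqrt{1-(2\mu/\gamma_1)^2}$ loses stability where its curvature in $\varepsilon$ changes sign. The sketch below (our own, with $\alpha = 0$, $\mu_1 = \mu_3 = \mu$, $\gamma_2 = \gamma_3 = \gamma$ and the sample values $\tilde{\gamma}_1 = 4$, $\tilde{\mu}_2 = 0.5$) estimates that curvature by a central difference; it comes out positive below the line, zero on it and negative above it.

```python
import numpy as np

def bracket_pp(m1, eps, gamma1, gamma, mu, mu2):
    """Bracket of (fitm) for alpha = 0, mu1 = mu3 = mu, gamma2 = gamma3 = gamma,
    restricted to m2 = m3 = eps."""
    a = np.sqrt((1 - m1) * (1 + m1 + 2 * eps))   # sqrt((1+eps)^2 - (m1+eps)^2)
    b = np.sqrt((1 - m1) * (1 + m1 - 2 * eps))   # sqrt((1-eps)^2 - (m1-eps)^2)
    c = np.sqrt((1 + m1)**2 - 4 * eps**2)        # sqrt((1+m1)^2 - (m2+m3)^2)
    return (gamma1 / 2 * m1**2 + gamma * eps**2
            + mu * (a + b - 2)                   # mu1 and mu3 terms coincide here
            + mu2 / 2 * (c + (1 - m1) - 2))

def curvature_at_pp(gamma1, gamma, mu, mu2, h=1e-4):
    """Second eps-derivative of the bracket at the PP solution (central difference)."""
    m1 = np.sqrt(1 - (2 * mu / gamma1)**2)
    def f(eps):
        return bracket_pp(m1, eps, gamma1, gamma, mu, mu2)
    return (f(h) - 2 * f(0.0) + f(-h)) / h**2

gamma1, gamma, mu2 = 4.0, 1.0, 0.5
mu_line = gamma1 / (gamma1 + 2 * gamma) * np.sqrt((gamma1 + mu2) * (2 * gamma - mu2))
for mu in (0.9 * mu_line, mu_line, 1.05 * mu_line):
    print(mu, curvature_at_pp(gamma1, gamma, mu, mu2))
```
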
937:
938: \begin{figure}[ht]
939: \centerline{\epsfxsize=43mm\epsfysize=35mm \epsfbox{pdg2.ps}
940: \epsfxsize=43mm\epsfysize=35mm \epsfbox{pdg4.ps}
941: \epsfxsize=43mm\epsfysize=35mm \epsfbox{pdg10.ps}}
942: \caption{Phase diagrams for anisotropic fitness landscapes $\gamma_1 >
943: \gamma_2 = \gamma_3 \equiv \gamma$ and Kimura 2 parameter mutation
944: scheme. Solid and dotted lines correspond to first and second order
945: phase transitions, respectively.}
946: \label{pd2}
947: \end{figure}
948: As for the symmetric fitness function discussed above, there are no
949: compact analytic expressions for the fitness or the surplus in the
950: ACGT phase. In the PP phase, however, the following values for
951: the mean fitness and the non-zero components of the mean surplus and the
952: magnetization are found:
953: \begin{equation}
954: w = \frac{\gamma_1}{2} \left(1 - \frac{2\mu}{\gamma_1}\right)^2 \quad ; \quad
955: u_1 = 1 - \frac{2\mu}{\gamma_1} \quad ; \quad
956: m_1 = \sqrt{1- \left(\frac{2\mu}{\gamma_1}\right)^2}\;.
957: \end{equation}
958: The variance in fitness per site, finally, is proportional to the mean
959: fitness in the PP phase: $V_r = 8 \mu w$. Note that all these
960: expressions are independent of the transition rate $\mu_2$ and
961: directly comparable to the results of the two-state model
\cite{BBW,WBG} by identifying $\{++,+-\}$ with `$+$' and $\{-+,--\}$
963: with `$-$'.
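
These PP-phase expressions follow from (\ref{fitm}) restricted to $m_2=m_3=0$, where the $\mu_2$ term drops out and the bracket reduces to $\frac{\gamma_1}{2}m_1^2 + 2\mu\big(\sqrt{1-m_1^2}-1\big)$. The sketch below maximizes this one-dimensional function on a grid for the sample values $\gamma_1 = 2$, $\mu = 0.5$ (assumptions for illustration) and compares the result with the closed forms, including $w = \frac{\gamma_1}{2}u_1^2$ from (\ref{surrel}) and $V_r = 8\mu w$.

```python
import numpy as np

gamma1, mu = 2.0, 0.5                     # sample values with mu < gamma1 / 2
m1 = np.linspace(0.0, 1.0, 200001)
# Bracket of (fitm) for alpha = 0, mu1 = mu3 = mu, m2 = m3 = 0 (mu2 term vanishes).
values = gamma1 / 2 * m1**2 + 2 * mu * (np.sqrt(1 - m1**2) - 1)
k = values.argmax()

u1 = 1 - 2 * mu / gamma1                  # PP-phase surplus
w_pp = gamma1 / 2 * (1 - 2 * mu / gamma1)**2
m1_pp = np.sqrt(1 - (2 * mu / gamma1)**2)
V_r = 2 * (mu + mu) * gamma1 * u1**2      # (variance) with u2 = u3 = 0, alpha = 0
print(values[k], w_pp, m1[k], m1_pp)
print(gamma1 / 2 * u1**2, V_r, 8 * mu * w_pp)
```
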
964:
965:
966: \subsection{Quadratic fitness model: Finite sequence length} \label{fs}
967:
968: For the Fujiyama model with independent sites, all the quantities
969: calculated here, means and variances per site in infinite populations,
970: are independent of the assumed length $N$ of the sequences.
971: This is no longer the case for models including epistasis. In this
972: subsection, we therefore present a quick numerical investigation of the
973: symmetric fitness model
974: for finite system sizes and compare the results with those in the
975: thermodynamic limit. Since the frequencies of genotypes with negative
976: values of the surplus no longer vanish for finite sequences, we use
977: the truncated fitness function (\ref{fit2}), with $\gamma_i \equiv
978: \gamma > 0$ and $\alpha_i = 0$ for our calculations.
979:
980: All results are obtained by direct numerical solution of the eigenvalue
981: problem in the $[(N+1)(N+2)(N+3)/6]$-dimensional vector space of
982: permutation invariant population vectors. Numerically precise
983: calculations have been performed up to $N = 60$ (39711-dim.), the results
984: are shown in Fig.~\ref{finite}. It is seen that the mean surplus and
985: the mean and the variance of the fitness rapidly approach the limiting
curves and behave qualitatively differently from the Fujiyama model
987: even for very small system sizes. We also show the finite-size
988: behaviour of the variance of the surplus $V_s$. Since this quantity
989: vanishes as $1/N$, it is not obtainable from the leading order terms
990: in the thermodynamic
991: limit. In our finite size calculations, we rescale $V_s$ with the
992: sequence length to obtain comparable results. Whereas $V_s$ is
monotonically increasing for the additive model (where $N V_s = 1-
994: u^2$), it runs through a maximum for quadratic fitness. Note that this
995: maximum, in contrast to the variance of fitness, is located directly
996: at the error threshold. The behaviour is qualitatively similar to the
997: two-state model \cite{Oli}.
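
The dimension quoted above is the number of occupation vectors $(n_{++},n_{+-},n_{-+},n_{--})$ with non-negative integer entries summing to $N$, i.e.\ $\binom{N+3}{3}$. A brute-force count confirms this, including the value $39711$ for $N = 60$.

```python
from itertools import product
from math import comb

def n_classes(N):
    """Number of (n_++, n_+-, n_-+, n_--) >= 0 with n_++ + n_+- + n_-+ + n_-- = N."""
    # n_-- = N - (a + b + c) is fixed once the other three are chosen
    return sum(1 for a, b, c in product(range(N + 1), repeat=3) if a + b + c <= N)

for N in (1, 2, 10, 60):
    print(N, n_classes(N), (N + 1) * (N + 2) * (N + 3) // 6, comb(N + 3, 3))
```
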
998:
999: \begin{figure}[ht]
1000: \centerline{\epsfxsize=65mm \epsfysize=55mm \epsfbox{fit2.ps}
1001: \epsfxsize=65mm \epsfysize=55mm \epsfbox{varfit.ps}}
1002: \centerline{\epsfxsize=65mm \epsfysize=55mm \epsfbox{sur.ps}
1003: \epsfxsize=65mm \epsfysize=55mm \epsfbox{varsur.ps}}
1004: \caption{Equilibrium behaviour of fitness and surplus of the symmetric
1005: fitness model with finite sequence length. Results for the Fujiyama
1006: model with scaling $\alpha = \gamma/2$ are also shown.}
1007: \label{finite}
1008: \end{figure}
1009: Since there has been some discussion recently on the correct scaling
of fitness values and mutation rates with the length of the sequence
(cf.~\cite{FP,BG}), let us finally remark that the finite size results in
this and the next subsection show that our choice, keeping fitness and
1013: mutation rate {\em per site} fixed, is adequate for all quantities
1014: considered here.
1015:
1016:
1017: \subsection{Quadratic fitness model: Time evolution}
1018:
Originally, the error threshold was defined as an equilibrium
phenomenon (cf.~\cite{ECS,BG}): For special forms of the fitness
1021: landscape, there is a finite critical value $\mu_c$ of the mutation
1022: rate beyond which genetic order is no longer maintained by selection.
1023: For the four-state model with quadratic fitness, this situation has been
1024: discussed above.
1025: However, for a suitable fitness function, the threshold
1026: is not necessarily connected with high mutation rates.
1027: In this subsection,
1028: we consider the relaxation of a non-equilibrium population to
1029: mutation-selection balance. It turns out that, depending on the
1030: starting configuration, an even stronger threshold effect may be
1031: observed in the time evolution of the fitness and the surplus for
1032: all mutation rates below the critical equilibrium value.
1033:
1034: \paragraph{Zero-mutation limit of the transition-transversion model}
1035: The essence of the threshold phenomenon in the time evolution is
1036: already contained in the selection dynamics alone. In a first step, we
1037: therefore disregard mutation altogether by working in the
1038: zero-mutation limit. Obviously, we then deal with a classical
1039: mean-field model on the physical side. As our starting configuration,
1040: we choose the completely unstructured population with an equidistribution of
1041: genotypes $|\bm{p}_0\rangle = 4^{-N}|\Omega\rangle$.
1042: In this particular situation, some progress is possible also
1043: analytically. Noting that
1044: \begin{equation}
1045: \langle \hat{C} \rangle(t) =
1046: \frac{\langle \Omega|\hat{C} \exp(t {\cal
1047: R})|\Omega\rangle}{\langle \Omega|\exp(t {\cal R})|\Omega\rangle}
1048: = \frac{\text{tr}(\hat{C} \exp(t {\cal R}))}{\text{tr}(\exp(t {\cal R}))}
1049: \end{equation}
1050: for any element $\hat{C}$ of the algebra generated by
1051: $\{\sigma_i^{(z,0)},\sigma_i^{(0,z)}\}$, the biological and physical
1052: pictures coincide in this case. Using the fitness function
1053: of the transition-transversion model with
1054: $\gamma_2 = \gamma_3 \equiv \gamma > 0$, we obtain the
1055: following implicit equations for the time evolution of the surplus
1056: components:
1057: \begin{eqnarray}
1058: u &=& \frac{\sinh(2\gamma t u)} {\cosh(2\gamma t u) +
1059: \exp[ -2\gamma_1 t(2u\coth(2\gamma t u) -1)]}
1060: \\[1mm]
1061: u_1 &=& \frac{\cosh[\gamma t Q(u_1)] - \exp(-2\gamma_1 t u_1)}
1062: {\cosh[\gamma t Q(u_1)] + \exp(-2\gamma_1 t u_1)}
1063: \end{eqnarray}
1064: where
1065: \begin{equation}
1066: Q(u_1) = \sqrt{(1+u_1)^2-\exp(4\gamma_1 t u_1)(1-u_1)^2}\;.
1067: \end{equation}
1068: The resulting dynamical phase diagram is shown in Fig.~\ref{time1}.
1069: As in the equilibrium situation, there are three phases.
1070: Depending on the ratio $\tilde{\gamma}_1 = \gamma_1/\gamma$, the
1071: system directly crosses to an ordered phase after a sharply defined
1072: waiting time $t_c$, or performs two consecutive transitions, entering
1073: the PP phase in the first one.
1074:
1075: \begin{figure}[ht]
1076: \centerline{\epsfxsize=65mm \epsfysize=55mm \epsfbox{zeitpd.ps}
1077: \epsfxsize=65mm \epsfysize=55mm \epsfbox{surzg2.ps}}
1078: \caption{Dynamical phase diagram of the transition-transversion model
1079: for vanishing mutation starting from the equidistribution. (Solid:
1080: first order; dashed: second order transition). Right: Time
1081: evolution of the surplus components for $\tilde{\gamma}_1 = 2$.}
1082: \label{time1}
1083: \end{figure}
1084: As in the equilibrium phase diagram, the dynamical transitions may
1085: be of first or second order.
1086: \begin{itemize}
1087: \item
1088: Second order transitions are located at
$\tilde{t} = \gamma t = 1$ for $\tilde{\gamma}_1 \le 1/4$ and at
1090: $\tilde{t} = 1/\tilde{\gamma}_1$ for the transition from the
1091: disordered phase to the PP phase. The transition from the PP phase to
1092: the ACGT phase is second order above $\tilde{\gamma}_1 \approx 1.9009$
1093: and implicitly given through $2\tilde{t}_c = 1 +
1094: \exp[2\tilde{\gamma}_1(\tilde{t}_c - 1)]$. A similar second order
1095: transition (with a one-component order parameter) has also been
1096: observed in the two-state model \cite{Wag,WBG}.
1097: \item
1098: In an interval around the symmetry point $\gamma_1 = \gamma$, the
1099: system possesses a first order transition (in the sense that there is a
finite jump in the magnetization). Note that, in contrast to the
equilibrium case, also the surplus and even the mean fitness are
discontinuous on this line, giving rise to a rather pronounced threshold
effect in the evolution dynamics (cf.\ the solid line in Fig.~\ref{time2}
for $\tilde{\gamma}_1 = 1$).
1106: \end{itemize}
1107: As for the equilibrium values, we also consider the effect of finite
1108: sequence lengths on the time evolution. Again, calculations are
1109: performed by direct diagonalization of the symmetric fitness model
($\tilde{\gamma}_1 = 1$). Fig.~\ref{time2} shows how the jump
discontinuity in the mean fitness (internal energy) and the
delta function singularity in the variance of the fitness (specific heat)
are approached by the finite systems. No threshold phenomenon occurs
in the time evolution of the Fujiyama model, which is also shown
in Fig.~\ref{time2}.
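
In the zero-mutation limit, the trace formula above reduces the finite-$N$ time evolution from the equidistribution to a weighted sum over the $(N+1)(N+2)(N+3)/6$ permutation classes. The sketch below uses this for the symmetric model with $\alpha_i = 0$, but it assumes the {\em untruncated} quadratic fitness $r = (\gamma/2N)\sum_i S_i^2$ rather than the truncated form (\ref{fit2}) used for the figures, so its numbers are only qualitatively comparable with Fig.~\ref{time2}. Multinomial multiplicities are handled in logarithmic form to avoid overflow; the function names are hypothetical.

```python
import numpy as np
from math import lgamma

S_TABLE = np.array([[1, 1, -1, -1],    # s_1 for (++, +-, -+, --)
                    [1, -1, 1, -1],    # s_2
                    [1, -1, -1, 1]])   # s_3

def occupation_classes(N):
    """Occupation vectors (n_++, n_+-, n_-+, n_--) with sum N and log multiplicities."""
    occ = np.array([(a, b, c, N - a - b - c)
                    for a in range(N + 1)
                    for b in range(N + 1 - a)
                    for c in range(N + 1 - a - b)])
    log_mult = np.array([lgamma(N + 1) - sum(lgamma(k + 1) for k in row)
                         for row in occ.tolist()])
    return occ, log_mult

def mean_fitness_zero_mutation(N, gamma, times):
    """w(t) per site from the equidistribution without mutation (trace formula)."""
    occ, log_mult = occupation_classes(N)
    S = occ @ S_TABLE.T                           # surplus sums S_i, shape (K, 3)
    r = gamma / (2 * N) * (S**2).sum(axis=1)      # hypothetical untruncated fitness
    w = []
    for t in times:
        x = log_mult + t * r                      # log of multiplicity * exp(t r)
        weight = np.exp(x - x.max())              # rescale before exponentiating
        w.append(weight @ r / weight.sum() / N)
    return np.array(w)

times = np.linspace(0.0, 10.0, 201)
for N in (10, 30, 60):
    w = mean_fitness_zero_mutation(N, 1.0, times)
    print(N, w[0], w[100], w[-1])
```

Because $\dot w = V_r \ge 0$ without mutation, $w(t)$ increases monotonically, here from $3\gamma/(2N)$ towards $3\gamma/2$.
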
1116: \begin{figure}[ht]
1117: \centerline{\epsfxsize=65mm \epsfysize=55mm \epsfbox{fitzh0.ps}
1118: \epsfxsize=65mm \epsfysize=55mm \epsfbox{varfitzh0.ps}}
1119: \caption{Time evolution of the equidistribution of genotypes
in the zero-mutation limit of the symmetric fitness model for different
1121: sequence lengths.}
1122: \label{time2}
1123: \end{figure}
1124:
1125:
1126:
1127:
1128: %\begin{figure}[ht]
1129: %\centerline{\epsfxsize=60mm \epsfysize=50mm \epsfbox{zvarfit1.ps}
1130: %\epsfxsize=60mm \epsfysize=50mm \epsfbox{zvarfit2.ps}}
1131: %\caption{Time evolution of the symetric fitness model for different
1132: % starting configurations.}
1133: %\label{time1}
1134: %\end{figure}
1135:
1136:
1137: \paragraph{Finite mutation rates and different starting configurations}
1138: In a last step, we now discuss the influence of the mutation rate and
the starting configuration on the evolution dynamics. Consider first the
time evolution of the equidistribution of genotypes
$4^{-N}|\Omega\rangle$. Although no analytical results are available here,
we may give the following intuitive argument that there is a phase
transition at finite $t = t_c$ for any mutation rate below the
critical equilibrium mutation rate $\mu_c$: Since mutation alone would
keep the population in the equidistribution (its own stationary state),
the evolution dynamics will be slowed down by mutation for small $t$.
In particular, mean fitness and surplus will remain zero on a finite
interval at least up to the threshold value of the corresponding model
with vanishing mutation. On the other hand, the limiting values of $w$ and
$u$ are non-zero for $\mu < \mu_c$, giving rise to a non-analytical
point of $w(t)$ and $u(t)$ at some finite $t = t_c$. As shown in the
upper graph of Fig.~\ref{time3}, this behaviour is clearly visible in
numerical results for finite sequence lengths.
1154: \begin{figure}[ht]
1155: \centerline{\epsfxsize=130mm \epsfysize=45mm \epsfbox{zvarfit1a.ps}}
1156: \vspace*{-12mm}
1157: \centerline{\epsfxsize=130mm \epsfysize=90mm \epsfbox{zvarfit2b.ps}}
1158: \caption{Time evolution of the variance of the fitness in the symmetric
1159: fitness model with sequence length $N=60$. Results are shown for
1160: varying mutation rates and two different starting configurations.}
1161: \label{time3}
1162: \end{figure}
1163:
1164: In order to contrast the time evolution of the unstructured population with
1165: an equidistribution of genotypes as starting configuration, we have
1166: also performed calculations for the opposite case of a population with
initially homogeneous phenotypes. Here, at $t=0$, every ``individual''
in the population has the same value $s_i = 0$ for the three surplus
components. The result (for finite sequence length $N=60$) is shown
in the lower graph of Fig.~\ref{time3}. As for the
equidistribution of
genotypes, there is a clear threshold effect in the time evolution for
any finite value $0<\mu<\mu_c$ of the mutation rate. The transition
appears to be particularly sharp for small mutation rates. In contrast to the
unstructured case, the critical waiting time $t_c$ for the transition
is no longer monotonically increasing with the mutation rate $\mu$, but
falls into two regimes: For mutation rates near the equilibrium
1178: threshold value $\mu_c$, the situation is similar to the unstructured
1179: case: Here, single mutants with higher fitness appear in the
1180: population after a short while. Due to the continuing mutation
1181: pressure, however, a certain time is needed for these fitter
1182: individuals to grow to a finite proportion and to dominate the mean
1183: values in the infinite population. For small $\mu$, on the other hand,
1184: the critical waiting time $t_c$ is dominated by the time needed for
1185: mutation to explore the configuration space and to generate
1186: individuals with higher fitness at a sufficient rate.
1187:
1188:
1189: \section{Discussion}
1190:
When a class of models for sequence space evolution was introduced in
\cite{BBW} within the framework of Ising quantum chains, the calculations
rested on four major simplifications of the biological situation:
a two-state model, an infinite sequence length, simplistic fitness
landscapes, and infinite population size. In this paper, we
have addressed the first two of these simplifying assumptions.
In addition, we have presented an extended discussion of the evolution
dynamics of these models. In the following paragraphs, we give
a summary of our findings and an outlook on the remaining open problems.
1201:
1202: \paragraph{Two-state versus four-state models.}
1203: The main concern of this contribution is the generalization of the
1204: modelling framework, introduced in \cite{BBW}, to four states
1205: (corresponding to the four nucleotides) on each site. The
1206: generalization presented makes use of the $C_2 \times C_2$ symmetry
1207: inherent in the {\em Kimura 3 ST} mutation scheme. On the `physical
1208: side' this leads to a model of two coupled Ising quantum chains
1209: (rather than to a four-state Potts model). Compared with the two-state
1210: model, the extension can be thought of as consisting of two steps. In
1211: a first step, we represent the four states on each site by the spin
1212: values of two spins in decoupled chains. Note that already in this
1213: simplified model three phases occur in the phase diagram since the
1214: transition lines of the two decoupled chains will not in general
coincide. The second step consists of introducing
a more realistic mutation scheme, which also changes the topology of the
configuration space, together with a correspondingly refined fitness
landscape. Both extensions couple the two chains, and an even
richer phase structure emerges, including first-order transitions.
As the introduction of a small linear field term into the
fitness function in subsection 4.2 shows, this change of the transition
to first order makes the threshold phenomena more robust
with respect to symmetry-breaking perturbations.
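
The two-spin representation can be checked on a single site. The
following Python sketch is only an illustration under stated assumptions:
the assignment of nucleotides to spin pairs and the rate names $a$, $b$,
$c$ are hypothetical and need not coincide with the conventions of the
main text. The function \texttt{k3st\_from\_bases} builds the Kimura 3ST
generator base by base, and \texttt{k3st\_from\_spins} writes it as a sum
of commuting spin-flip operators, one for each non-trivial element of
$C_2\times C_2$.

```python
import numpy as np

# Hypothetical encoding: nucleotide -> spin pair (s1, s2), one spin per chain.
# s1 = +1 for purines (A, G), -1 for pyrimidines (C, T); s2 separates the two
# bases within each class. With index 2*i1 + i2 (i = 0 for +1, i = 1 for -1)
# the order A, G, C, T matches the ordering used by np.kron.
BASES = ["A", "G", "C", "T"]

X = np.array([[0.0, 1.0], [1.0, 0.0]])  # Pauli sigma^x: flips a single spin
I2 = np.eye(2)


def k3st_from_bases(a, b, c):
    """Kimura 3ST generator written base by base (columns sum to zero).

    Hypothetical rate names: a for transitions (A<->G, C<->T),
    b for the transversions A<->C, G<->T, and c for A<->T, G<->C.
    """
    rate = {frozenset("AG"): a, frozenset("CT"): a,
            frozenset("AC"): b, frozenset("GT"): b,
            frozenset("AT"): c, frozenset("GC"): c}
    q = np.zeros((4, 4))
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                q[i, j] = rate[frozenset((x, y))]
    q -= np.diag(q.sum(axis=0))  # total outflow a + b + c on the diagonal
    return q


def k3st_from_spins(a, b, c):
    """The same generator as a sum of commuting spin flips (C2 x C2)."""
    return (a * np.kron(I2, X)      # transitions: flip s2 only
            + b * np.kron(X, I2)    # first transversion class: flip s1 only
            + c * np.kron(X, X)     # second transversion class: flip both
            - (a + b + c) * np.eye(4))


if __name__ == "__main__":
    q = k3st_from_bases(1.0, 0.3, 0.2)
    print(np.allclose(q, k3st_from_spins(1.0, 0.3, 0.2)))
```

In this encoding, setting $c=0$ leaves $a(\Id\otimes\sigma^x)+b(\sigma^x\otimes\Id)$
minus a constant, i.e.\ independent flips on the two chains, whereas the
term $c\,\sigma^x\otimes\sigma^x$ acts on both spins at once and therefore
couples them.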
1224:
1225: \paragraph{Finite sequence length.}
Typical sequence lengths of enzymes or viruses are of the order of
$10^3$--$10^4$. While these numbers are certainly far from the typical
sizes of macroscopic systems in physics, they are, in principle, large
enough to suppress $1/N$-corrections successfully. However, models
with simple fitness landscapes in particular describe, at best, the
evolution dynamics in a very restricted configuration space of particularly
`important' sites, disregarding neutral or altogether lethal
mutations. In view of this, the consideration of finite sequence
lengths is indispensable, and calculations in the thermodynamic
limit may even seem questionable at first sight. In order to clarify
the usefulness of infinite-size methods in this context, we performed
a number of numerical calculations for finite sequence lengths. The
results are quite encouraging. As shown in subsection \ref{fs}, the
characteristic properties of the thermodynamic limit are clearly visible
even for very short sequences, such as $N = 10$, and the approximation
is already quantitatively reasonable for sequences of length $N = 60$.
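
The size of finite-$N$ corrections can be illustrated in a deliberately
simplified setting. The sketch below is not the four-state computation of
subsection \ref{fs}; it assumes a two-state model with $N$ sites, mutation
rate $\mu$ per site and the permutation-invariant quadratic fitness
$N\gamma s^2/2$ (with $s$ the surplus), so that the $N+1$ mutant classes
suffice. The leading eigenvalue of the class generator equals the
equilibrium mean fitness $\bar r$; per site, it is compared with the
limiting value $g(\mu)=(\gamma-\mu)^2/(2\gamma)$ for $\mu<\gamma$ and $g=0$
otherwise. A short variational argument with spin coherent states gives
$0<\bar r/N-g\le 2(g+\mu)/N$ for $\mu,\gamma>0$, so the deviation decays
like $1/N$. The function names \texttt{mean\_fitness} and
\texttt{mean\_field\_limit} are hypothetical.

```python
import numpy as np


def mean_fitness(N, mu, gamma):
    """Equilibrium mean fitness (leading eigenvalue) of a two-state model
    with N sites, per-site mutation rate mu and assumed quadratic fitness
    r_k = N * gamma * s_k**2 / 2, where s_k = 1 - 2k/N for k mutated sites.

    The tridiagonal mutant-class generator (rates mu*(N-k) for k -> k+1,
    mu*k for k -> k-1, outflow N*mu) is symmetrised by a diagonal
    similarity transform, which leaves its eigenvalues unchanged.
    """
    k = np.arange(N + 1)
    s = 1.0 - 2.0 * k / N
    diag = N * gamma * s**2 / 2.0 - N * mu
    off = mu * np.sqrt((k[:-1] + 1.0) * (N - k[:-1]))
    h = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(h)[-1]  # eigenvalues are returned ascending


def mean_field_limit(mu, gamma):
    """Mean fitness per site for N -> infinity (threshold at mu = gamma)."""
    return (gamma - mu) ** 2 / (2.0 * gamma) if mu < gamma else 0.0


if __name__ == "__main__":
    mu, gamma = 0.5, 1.0
    for N in (10, 60, 200):
        print(N, mean_fitness(N, mu, gamma) / N, mean_field_limit(mu, gamma))
```

Since the fitness depends only on the class index $k$, replacing the
quadratic term by another permutation-invariant fitness function only
changes the diagonal of the matrix.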
1242:
1243: \paragraph{The fitness landscape.}
1244: The construction of a tractable fitness landscape which nevertheless
1245: comprises the relevant biology is certainly the major task for all
these models. In this contribution, in order to obtain at least some
analytical results, we have chosen a fitness function from the smooth end
of the landscape zoo. Owing to its permutation invariance, the quadratic
fitness function disregards any site-to-site variation in
the interactions and only accounts for their average epistatic
effect. As such, it is in many respects certainly no more than a
toy model for evolution. However, the assumption of permutation
invariance of the sites is quite common in evolutionary biology and
comprises a large number of standard models for evolution, such as the
quadratic optimum model or Eigen's original sharply peaked landscape.
The results show that the essential structure responsible
for characteristic effects such as the error threshold is already
contained in this simplified framework, which may therefore
also serve as a reference for future work on fitness functions
with increased ruggedness, such as the NK-landscape hierarchy \cite{KL}.
For such landscapes, we expect the results for the quadratic fitness
model to be qualitatively stable at least under certain forms of mild
ruggedness, such as the introduction of site-randomness in the fields and
interactions \cite{DK}. Pronounced changes, on the other hand, are to
be expected when spin-glass effects come into play.
1267:
1268: \paragraph{Finite population size.}
1269: In going from the deterministic limit to the evolution of finite
1270: populations, the ordinary differential equation (\ref{paramuse}) has
1271: to be replaced by the master equation of a stochastic process which is
1272: no longer covered by the theoretical framework presented in this
article. Owing to the complexity of the stochastic equations, analytical
results seem to be out of reach at present for all but the simplest
selection schemes. Monte-Carlo simulations, however, should be feasible and
could add considerably to the theoretical insight here.
1277:
1278: Although the general picture of the deterministic case should persist
1279: at least for sufficiently large populations, the study of finite
1280: population effects is certainly of importance.
For related models, such as the quasispecies model with the
{\em single peaked} landscape, it has been found \cite{NS}
that the deterministic
results can be interpreted as the time averages of the stochastic
process for mutation rates outside a certain interval around the error
transition. Directly at the threshold, however, the stochastic system
shows large fluctuations and a jump in the long-time averages at a
critical mutation rate which appears to be lower than in the deterministic
case by an amount roughly proportional to $1/\sqrt{N}$, where $N$ here
denotes the population size.
Mainly because of these expected finite population effects, we have
restricted the discussion in this article entirely to the phase
structure of the models and the order of the phase transitions. Further
details of the transitions, such as critical exponents, will presumably
never be visible in real biological systems and thus seem to be
of limited relevance in this context.
1296:
1297: Let us finally remark that, although biological populations are
1298: certainly finite, the consideration of the infinite population limit
is not (only) a technical necessity, but also of direct importance for the
study of the error threshold. This is because this effect, in contrast
to the phenomenon of Muller's ratchet, is {\em by definition} not due to
genetic drift, but solely due to the form of the fitness function. It
must therefore always be shown that the threshold effect persists even
for infinitely large populations.
1305:
1306:
1307: \paragraph{Error threshold behaviour.}
1308:
Since several, sometimes conflicting, definitions of the error threshold
exist in the literature (cf.\ the discussion in \cite{BG}),
let us start this paragraph with a few clarifying remarks. In this
article, following \cite{BG}, we identify the error threshold with a
phase transition. As such, a clear-cut mathematical
definition (as a non-analytic point of the mean fitness) is possible
only in the infinite sites (or thermodynamic) limit. However, since
the thermodynamic limit can be considered as an excellent
approximation already for rather small systems, the infinite system
property gives a valid explanation for prominent features which are
observable for finite sequences as well. In our study, we have always
considered sequences of a fixed length and have treated the mutation
rate per site as the variable driving the transition. In comparing
systems of different length, we have scaled the variables such that a
well-defined limit is approached as $N \to \infty$. In particular, the
`critical' mutation rate per site in a finite system quickly converges
to the limiting value $\tilde{\mu}_c$.
Originally, the threshold was viewed as a limiting factor on
the sequence length \cite{E}. This should not cause confusion:
we switch to this latter picture simply by letting the reduced
mutation rate depend linearly on the sequence length,
$\tilde{\mu} \sim N$, and obtain a critical length
$N_c \sim \tilde{\mu}_c$ (for sufficiently long sequences).
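
To make the notion of the threshold as a non-analytic point of the mean
fitness concrete, it may help to consider the simplest two-state analogue,
which is an illustrative assumption rather than one of the four-state
models studied above: $N$ sites, mutation rate $\mu$ per site, and the
symmetric quadratic fitness $N\gamma s^2/2$ with $\gamma>0$, where $s$ is
the surplus and $\bar r$ (notation assumed here) the equilibrium mean
fitness. For $N\to\infty$, the mean fitness per site follows from a
maximum principle over the surplus (a mean-field, Rayleigh--Ritz type
bound that becomes exact in this limit), which can be evaluated in
closed form:
\begin{align*}
g(\mu) := \lim_{N\to\infty}\frac{\bar r}{N}
 &= \max_{s\in[-1,1]}\Big[\frac{\gamma}{2}\,s^2
    + \mu\big(\sqrt{1-s^2}-1\big)\Big] \\
 &= \begin{cases}
      \dfrac{(\gamma-\mu)^2}{2\gamma}, & 0\le\mu<\gamma
        \quad\big(s^2 = 1-\mu^2/\gamma^2\big),\\[2mm]
      0, & \mu\ge\gamma \quad (s=0).
    \end{cases}
\end{align*}
Here $g$ and $g'$ are continuous at $\mu_c=\gamma$, whereas $g''$ jumps
from $1/\gamma$ to $0$: in this simplified setting the threshold is a
continuous (second-order) transition, in contrast to the first-order
transitions found for certain parameter ranges of the four-state model.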
1333:
Our results on the error threshold phenomenon agree with previous ones for
the two-state case and related models in that negative epistasis is
needed to observe a transition (cf.\ \cite{W,BG}).
In contrast to the two-state case, the threshold corresponds to a
first-order transition for certain parameter ranges and persists for
a sufficiently small linear part in the fitness function. Both the
equilibrium and the dynamical phase diagram of the
transition-transversion model (with $\alpha_i = 0$)
possess two ordered phases, characterized by non-zero values of one or
all three components of the surplus order parameter, as well as a
disordered phase with zero surplus, in which selection ceases to operate.
The threshold effect appears to be especially sharp in the evolution
dynamics, where a jump in the mean surplus and fitness and a delta
singularity in the variance of fitness occur.
1348:
1349: Besides the threshold effect, however, other properties of
1350: mutation-selection models may be studied within the framework
1351: presented. After all, exclusive concentration on phase
1352: transitions is perhaps too much a physicist's point of view on these
systems. The relations (\ref{zeit}) and (\ref{variance}) between surplus,
mutation rate and the variance of fitness, for example, are valid for the
entire time evolution and for arbitrary mutation rates. Depending on the
fitness function applied, they may give rise to characteristic features
also far from the transition point. This is particularly explicit for the
equilibrium variance of fitness, which, for fitness functions with
negative epistasis, passes through a pronounced maximum at a mutation
rate much smaller than the threshold value.
1361:
1362: \section*{Acknowledgments}
1363:
1364: It is our pleasure to thank Ellen Baake and Oliver Redner for numerous
1365: discussions and comments on the manuscript. Financial support from the
1366: German Science Foundation (DFG) is gratefully acknowledged.
1367:
1380:
1381:
1382:
1383:
1384:
1385: \begin{thebibliography}{99}
1386: \bibitem{B}
1387: E.\ Baake,
1388: Diploid models on sequence space,
1389: {\it J.\ Biol.\ Syst.\/} {\bf 3} (1995) 343--9.
1390: \bibitem{BBW}
1391: E.\ Baake, M.\ Baake and H.\ Wagner,
1392: Ising quantum chain is equivalent to a model of biological evolution,
1393: {\it Phys.\ Rev.\ Lett.\/} {\bf 78} (1997) 559--62; Erratum:
1394: {\it Phys.\ Rev.\ Lett.\/} {\bf 79} (1997) 1782.
1395: \bibitem{BBW2}
1396: E.\ Baake, M.\ Baake and H.\ Wagner,
Quantum mechanics versus classical probability in biological evolution,
1398: {\it Phys.\ Rev.\/} {\bf E57} (1998) 1191--2.
1399: \bibitem{BG}
1400: E.\ Baake and W.\ Gabriel,
1401: Biological evolution through mutation, selection, and drift: An introductory
1402: review,
1403: {\it Ann.\ Rev.\ Comput.\ Phys.\/} {\bf 7}
1404: ({\em in press}, cond-mat/9907372).
1405: \bibitem{CK}
1406: J.\ Crow and M.\ Kimura,
1407: {\em An Introduction to Population Genetics Theory}, Harper \& Row
1408: (New York 1970).
1409: \bibitem{DK}
1410: N.G.~Duffield and R.~K\"uhn,
1411: The thermodynamics of site-random mean-field quantum spin systems,
1412: {\it J.\ Phys.\/} {\bf A22} (1989) 4643--58.
1413: \bibitem{E}
1414: M.\ Eigen,
1415: Selforganization of matter and the evolution of biological
1416: macromolecules,
1417: {\it Naturwiss.\/} {\bf 58} (1971) 465--523.
1418: \bibitem{ECS}
1419: M.\ Eigen, J.\ McCaskill and P.\ Schuster,
1420: The molecular quasi-species,
{\it Adv.\ Chem.\ Phys.\/} {\bf 75} (1989) 149--263.
1422: \bibitem{Fish}
1423: R.A.~Fisher,
1424: {\em The Genetical Theory of Natural Selection}, Clarendon Press
1425: (Oxford 1930).
1426: \bibitem{FP}
1427: S.~Franz and L.~Peliti,
1428: Error threshold in simple landscapes,
1429: {\it J.~Phys.\/} {\bf A26} (1993) 4481--7.
1430: \bibitem{FPS}
1431: S.~Franz, L.~Peliti, and M.~Sellitto,
1432: An evolutionary version of the random energy model,
1433: {\it J.\ Phys.\/} {\bf A26} (1993) L1195--9.
1434: \bibitem{Gal}
1435: S.~Galluccio,
1436: Exact solution of the quasispecies model in a sharply-peaked
1437: landscape,
1438: {\it Phys.\ Rev.\/} {\bf E56} (1997) 4526--39.
1439: \bibitem{KL}
S.A.~Kauffman and S.A.~Levin,
1441: Towards a general theory of adaptive walks on rugged landscapes,
1442: {\it J.\ Theor.\ Biol.\/} {\bf 128} (1987) 11--45.
1443: \bibitem{Kogut}
1444: J.~Kogut,
1445: An introduction to lattice gauge theory and spin systems,
{\it Rev.\ Mod.\ Phys.\/} {\bf 51} (1979) 659--713.
1447: \bibitem{Leut}
1448: I.\ Leuth\"ausser,
1449: An exact correspondence between Eigen's evolution model and a
1450: two-dimensional Ising system,
1451: {\it J.\ Chem.\ Phys.\/} {\bf 84} (1986) 1884--5.
1452: \bibitem{Leut2}
1453: I.~Leuth\"ausser,
1454: Statistical mechanics of Eigen's evolution model,
1455: {\it J.~Stat.~Phys.\/} {\bf 48} (1987) 343--60.
1456: \bibitem{Li}
1457: W.-H.\ Li,
1458: {\it Molecular Evolution}, Sinauer (Sunderland, 1997).
1459: \bibitem{MT}
1460: K.~Malarz and D.~Tiggemann,
1461: Dynamics in Eigen's evolution model,
1462: {\it Int.\ J.\ Mod.\ Phys.\/} {\bf C9} (1997) 481--90.
1463: \bibitem{NS}
1464: M.~Nowak and P.~Schuster,
1465: Error thresholds of replication in finite populations. Mutation
frequencies and the onset of Muller's ratchet,
1467: {\it J.\ Theor.\ Biol.\/} {\bf 137} (1989) 375--95.
1468: \bibitem{OB}
1469: P.~O'Brien,
1470: A genetic model with mutation and selection,
{\it Math.~Biosci.\/} {\bf 73} (1985) 239--51.
1472: \bibitem{Oli}
1473: O.\ Redner,
1474: {\em private communication} (1999).
1475: \bibitem{Sta}
1476: P.\ Stadler,
1477: Landscapes and their correlation functions,
{\it J.\ Math.\ Chem.\/} {\bf 20} (1996) 1--45.
1479: \bibitem{SOWH}
1480: D.\ Swofford, G.\ Olsen, P.\ Waddell and D.\ Hillis,
Phylogenetic inference, in: D.\ Hillis, C.\ Moritz and B.\ Mable (Eds.):
1482: {\em Molecular Systematics}, Sinauer (Sunderland, 1995), pp.\ 407--517.
1483: \bibitem{Tara}
1484: P.~Tarazona,
1485: Error thresholds for molecular quasispecies as phase
1486: transitions: From simple landscapes to spin-glass models,
1487: {\it Phys.\ Rev.\/} {\bf A45} (1992) 6038--50.
1488: \bibitem{TM}
C.J.\ Thompson and J.L.\ McBride,
1490: On Eigen's theory of the self-organization of matter and the evolution of
1491: biological macromolecules,
1492: {\it Math.\ Biosci.\/} {\bf 21} (1974) 127--42.
1493: \bibitem{Wag}
1494: H.\ Wagner,
1495: {\em Biologische Sequenzraummodelle und Statistische Mechanik},
1496: PhD thesis, University of T\"ubingen, Dissertations Druck
1497: (Darmstadt 1998).
1498: \bibitem{WBG}
1499: H.\ Wagner, E.\ Baake and T.\ Gerisch,
Ising quantum chain and sequence evolution,
1501: {\it J.\ Stat.\ Phys.\/} {\bf 92} (1998) 1017--52.
1502: \bibitem{W}
1503: T.~Wiehe,
1504: Model dependency of error thresholds: the role of the fitness
1505: functions and contrasts between the finite and infinite sites
1506: models,
1507: {\it Genet.\ Res.\ Camb.\/} {\bf 69} (1997) 127--36.
1508: \end{thebibliography}
1509: \end{document}
1510:
1511:
1512:
1513: