cond-mat0008123/4z.tex
1: \documentclass{article}
2: \usepackage{a4,amsmath,epsfig,amssymb}
3: %\newcommand{\bm}[1]{\mbox{\boldmath $#1$}}
4: \newcommand{\bm}[1]{\boldsymbol #1}
5: \newcommand{\zmpf}[1]{\mbox{\hspace{#1em}}}
6: \newcommand{\Id}{\mbox{$\,$\rm 1\zmpf{-0.62}{\small 1}}}
7: \newcommand{\RR}{\mathbb R}
8: \newcommand{\TT}{\mathbb T}
9: \newcommand{\CC}{\mathbb C}
10: \newcommand{\ZZ}{\mathbb Z}
11: \newcommand{\QQ}{\mathbb Q}
12: 
13: \begin{document}
14: 
15: \title{Four-state quantum chain\\ as a model of sequence evolution}
16: \author{{\sc Joachim Hermisson$^{1,2}$, Holger Wagner$^{3}$ and 
17: Michael Baake$^{1}$} 
18: \\[2mm]
19: ${}^{1}$Institut f\"ur Theoretische Physik, Universit\"at
20: T\"ubingen,\\ Auf der Morgenstelle 14, 72076 T\"ubingen, Germany\\
21: ${}^{2}$Institut f\"ur Theorie der Kondensierten Materie,\\ 
22: Universit\"at Karlsruhe, 76128 Karlsruhe, Germany\\
23: ${}^{3}$Max-Planck-Institut f\"ur Biophysikalische Chemie,\\
24: Am Fa{\ss}berg 11, 37077 G\"ottingen, Germany}
25: \maketitle
26: \begin{abstract}
27: A variety of selection-mutation models for DNA (or RNA) sequences, 
28: well known in molecular evolution, can be translated into a model of coupled
29: Ising quantum chains. This correspondence is used to investigate the 
30: genetic variability and error threshold behaviour as a function of the 
31: fitness landscape. In contrast to the two-state models treated
32: hitherto, the model explicitly takes the four-state nature of the
33: nucleotide alphabet into account and allows for the distinction of
34: mutation rates for the different base substitutions, as given by
35: standard mutation schemes of molecular phylogeny. As a consequence of
36: this refined treatment, new phase diagrams for the error threshold
37: behaviour are obtained, with appearance of a novel phase in which the
38: nucleotide ordering of the wildtype sequence is only partially conserved.
39: Explicit analytic and numeric results are presented for evolution
40: dynamics and equilibrium behaviour in a number of accessible
41: situations, such as quadratic fitness landscapes and the Kimura 
42: 2 parameter mutation scheme. 
43: \end{abstract}
44: 
45: \section{Introduction}
46: 
47: One prominent phenomenon in the theory of molecular evolution that has 
48: also attracted considerable attention in statistical physics is the 
49: so-called {\em error threshold}. It describes the breakdown of 
50: genetic order in mutation-selection models for mutation rates 
51: surpassing a certain critical value. The prototype model for the
52: description of the error threshold is Eigen's quasispecies model 
53: in sequence space \cite{E,ECS} (which is effectively equivalent to a 
54: coupled mutation-selection model in population genetics, cf.~\cite{CK}),
55: originally designed for the description of prebiotic RNA
56: evolution. However, the threshold is supposed to be a phenomenon that 
57: should occur in a rather general class of mutation-selection models.
58: 
59: In order to set up a mutation-selection model that is tractable by 
60: analytical (or at least numerical) methods, severe simplifications 
61: of the original biological situation seem to be indispensable. 
62: Analytical approaches generally have to be restricted to the treatment 
63: of infinitely large populations and rather simple fitness functions,
64: such as the sharply peaked landscape of Eigen's original model.
65: Another common approximation, also used in previous studies of the 
66: quasispecies model, amounts to the simplified 
67: representation of genotypes as binary strings. In the context of
68: molecular evolutionary theory, this may be thought of as representing 
69: DNA or RNA strands by sequences of {\em purines} and {\em pyrimidines},
70: hence with only two states per site, neglecting the fact that genetic 
71: information is really given by a four-letter alphabet. In this
72: article, we present a four-state mutation-selection model
73: which is capable of describing the full nucleotide alphabet and
74: incorporates the standard mutation schemes of molecular phylogeny. 
75: In particular, we discuss in detail the phase diagrams, which 
76: are richer than those of the two-state model. This shows that, 
77: for a full understanding of the error threshold behaviour in 
78: molecular evolution, investigations cannot be restricted entirely 
79: to the study of two-state models.
80: 
81: One important step towards an understanding of the
82: threshold phenomenon has been its identification with an equilibrium phase
83: transition in physics by the translation of a time-discrete version of
84: the quasispecies model into the transfer matrix of an anisotropic 
85: two-dimensional Ising model \cite{Leut}. This equivalence was further
86: exploited to study various aspects of the error threshold
87: with methods from statistical physics \cite{Leut2,Tara,FPS,FP,MT}. 
88: It turns out, however, that the anisotropy of that model is not so
89: easy to handle and the analysis of the relevant biological quantities 
90: (which correspond to certain surface properties of the Ising model)
91: remains an involved problem. Due to the complications of the model,
92: almost all results obtained so far are approximate or numerical. The 
93: only exact result for the {\em sharply peaked landscape} \cite{Gal}
94: has been worked out via a different analogy to a model of directed
95: polymers, using the specific properties of that very special fitness 
96: landscape.
97: 
98: An alternative approach to the analysis of mutation-selection models
99: and the error threshold which avoids some of the problems of the
100: anisotropic Ising model has been brought up in \cite{BBW,WBG}.   
101: Here, the starting point on the biological side is a slightly changed 
102: model which describes the evolution of a population with overlapping 
103: generations in continuous time. It turns out that, after a
104: reformulation in tensor products, the two-state version of this model 
105: is equivalent to the Hamiltonian of an Ising quantum chain. Thereby, 
106: the change to continuous time in the biological description
107: corresponds to the anisotropic limit that connects the 
108: two-dimensional Ising model and the quantum chain in physics 
109: (cf.~\cite{Kogut}). The quantum chain model is technically easier to 
110: handle, and exact results for two non-trivial fitness landscapes, 
111: namely Onsager's landscape and the quadratic fitness function, have 
112: been worked out \cite{BBW,WBG}. 
113: 
114: Accordingly, we extend this latter approach to a full four-state model
115: in this study. The quantum chain analogy allows us to use well-known methods 
116: from statistical mechanics for the solution of the model, so that we do not
117: have to dwell on technical details here. For an extended presentation 
118: of methods (with regard to the two-state model) using techniques from
119: rigorous mean field theory, we refer to \cite{Wag,WBG}. The main focus 
120: is instead on the discussion of the threshold behaviour and in
121: particular the increased complexity of the phase diagram due to the 
122: consideration of the four-state nature of biological information and 
123: the refined schemes of molecular mutation rates. 
124: 
125: In the following section, we start with a presentation of the biological 
126: foundations of our model. Only thereafter do we introduce the quantum
127: chain model in Section 3. In Section 4, analytical and numerical
128: results are presented for a number of specific four-state models 
129: with permutation invariant fitness landscapes. The properties of
130: finite sequences and the evolution dynamics will also be studied. 
131: We close with a summary of our results and a discussion of open
132: problems in Section 5.
133: 
134: \section{Biological foundations}
135: 
136: Genetic information is coded in DNA (and RNA) molecules. These are 
137: heteropolymers of four units (nucleotides) which differ in a specific
138: base. The essential aspect of a DNA sequence is captured in
139: a string over a four-letter alphabet
140: \begin{equation}
141: {\bm \sigma} \in V \equiv V_1 \times V_2 \times \dots \times V_N \;;\quad
142: V_i = \{A,C,G,T\}
143: \end{equation}
144: where each letter represents a particular base: $A$ and $G$  for
145: adenine and guanine (the purines), $C$ and $T$ for cytosine and thymine 
146: (the pyrimidines). In RNA sequences, $T$ is replaced by $U$ for uracil.
147: We will therefore treat the $4^N$ different sequences of a fixed, 
148: finite length $N$ as our genotypes (which may be thought of as coding
149: for something, such as a virus or an enzyme). Disregarding
150: environmental effects, we may identify a collection of genotypes with
151: a {\em population} of haploid `individuals'. Evolution then describes
152: the change of the population composition in time. 
153: 
154: A standard model for the evolution of an infinite, asexually 
155: reproducing population under the basic forces of mutation and
156: selection which works in continuous time is given by the following 
157: system of non-linear differential equations \cite{CK}
158: \begin{equation} \label{paramuse}
159: \dot{p}_{\bm{\sigma}}^{}(t) = 
160: \big( r_{\bm{\sigma}}^{} - \bar{r}(t)\big) p_{\bm{\sigma}}^{}(t)
161: + \sum_{\bm{\sigma'}} m_{\bm{\sigma}\bm{\sigma'}} p_{\bm{\sigma'}}(t)\;.
162: \end{equation}
163: Here, $p_{\bm{\sigma}}^{}(t)$ denotes the relative frequency of genotype 
164: ${\bm \sigma}$ at time $t$ with corresponding Malthusian fitness
165: (replication rate minus death rate) $r_{\bm \sigma}^{}$, and
166: \begin{equation}
167: \bar{r}(t) = \sum_{\bm{\sigma}} r_{\bm{\sigma}} p_{\bm{\sigma}}(t)
168: \end{equation}
169: is the {\em mean fitness} of the population. It is the origin of the
170: non-linearity in (\ref{paramuse}). Finally, 
171: $m_{{\bm \sigma}{\bm \sigma'}}$ is the (time independent) rate at which 
172: ${\bm \sigma'}$ mutates to ${\bm \sigma}$. This framework has
173: originally been defined in classical population genetics \cite{CK}. In 
174: the sequence space context, it has been introduced in \cite{B} and has been
175: called the {\it para-muse} ({\em pa}rallel {\em mu}tation-{\em se}lection) 
176: model, since it assumes mutation and selection to act independently
177: and in parallel at each instant of time. 
178: The model ignores recombination and genetic drift due to finite
179: population size. Both assumptions can be considered as fairly reasonable
180: at least in the context of the evolution of viruses or bacteria where 
181: populations can be huge and recombination is absent, or the
182: nucleotides are tightly linked. In the following subsections, the
183: basic processes of mutation and selection shall be described in some detail.
184: 
185: \subsection{Mutation}
186: 
187: We take mutation as a point process acting independently on
188: all sites, ignoring more complicated mechanisms, such as
189: insertions or deletions. Molecular mutation rates shall be chosen 
190: according to the following scheme, known as the {\em Kimura 3 ST
191: model} in molecular phylogeny \cite{Li,SOWH}: 
192: \begin{figure}[ht]
193: \centerline{\epsfysize=27mm \epsfbox{mutation.eps}}
194: \caption{Molecular mutation scheme according to the Kimura 3 ST model.}
195: \label{mutfig}
196: \end{figure}
197: 
198: Within this general setup, a number of simpler models are contained,
199: which treat mutation at different levels of sophistication. In the 
200: simplest approach, the mutation rates between all four nucleotides 
201: are assumed to be equal $(\mu_1 = \mu_2 = \mu_3)$. This is the 
202: so-called {\em Jukes-Cantor mutation scheme}. While this simple 
203: frame already seems to be sufficient for a number of applications, 
204: measurements reveal that there are indeed pronounced differences in 
205: the mutation rates that should be accounted for in more realistic
206: models. In particular, the {\em transitions} between the two purines 
207: (A,G) and the two pyrimidines (C,T) are much more frequent than the 
208: purine--pyrimidine mutations, which are called {\em transversions}. This 
209: may range up to relative differences of 
210: $\mu_1 \approx \mu_3 \simeq \mu_2/2$ in the 
211: nucleus and $\mu_1 \approx \mu_3 \simeq \mu_2/40$ in mitochondrial
212: DNA \cite{Li}. A mutation scheme with $\mu_2 > \mu_1 = \mu_3$ is known as the 
213: {\em Kimura 2 parameter model}. The full {\em Kimura 3 ST} scheme,
214: finally, also accounts for the small difference between $\mu_1$ and
215: $\mu_3$, such that $\mu_2 > \mu_1 > \mu_3$.
216: 
217: Implementing this mutation model into the evolution equation
218: (\ref{paramuse}), we obtain the following mutation rates between
219: genotypes ($i \in \{1,2,3\}$)
220: \begin{equation} \label{mss}
221: m_{{\bm \sigma}{\bm \sigma'}} = \left\{ 
222: \begin{array}{rl} 
223: \mu_i, \quad & d_i({\bm \sigma},{\bm \sigma'})
224: = d_{{\bm \sigma}{\bm \sigma'}} = 1
225: \\
226: -N \sum_i \mu_i,\quad   & {\bm \sigma} = {\bm \sigma'}
227: \\
228: 0,\quad       & d_{{\bm \sigma}{\bm \sigma'}} > 1
229: \end{array} \right. \;.
230: \end{equation}
231: Here,
232: \begin{eqnarray} \nonumber
233: d_1({\bm \sigma},{\bm \sigma'}) & = & 
234: \#_{A \rightleftarrows C}({\bm \sigma},{\bm \sigma'}) 
235: + \#_{G \rightleftarrows T}({\bm \sigma},{\bm \sigma'})
236: \\ \label{Hamming}
237: d_2({\bm \sigma},{\bm \sigma'}) & = & 
238: \#_{A \rightleftarrows G} ({\bm \sigma},{\bm \sigma'})
239: + \#_{C \rightleftarrows T}({\bm \sigma},{\bm \sigma'})
240: \\ \nonumber
241: d_3({\bm \sigma},{\bm \sigma'}) & = & 
242: \# _{A \rightleftarrows T}({\bm \sigma},{\bm \sigma'}) 
243: + \#_{C \rightleftarrows G}({\bm \sigma},{\bm \sigma'})
244: \end{eqnarray}
245: are restricted Hamming distances between ${\bm \sigma}$ and ${\bm \sigma'}$.
246: In (\ref{Hamming}), $\#_{X \rightleftarrows Y}({\bm \sigma},{\bm \sigma'})$ 
247: counts the positions at which $X$ and $Y$ are exchanged in $\bm{\sigma}$ and
248: $\bm{\sigma}'$. Finally,
249: \begin{equation}
250: d_{{\bm \sigma}{\bm \sigma'}} = d_1({\bm \sigma},{\bm \sigma'})
251:  + d_2({\bm \sigma},{\bm \sigma'}) + d_3({\bm \sigma},{\bm \sigma'})
252: \end{equation}
253: is the total Hamming 
254: distance. Note that the choice of the diagonal term 
255: $m_{{\bm \sigma}{\bm \sigma}}$ in (\ref{mss}) just accounts for 
256: probability conservation ($\sum_{\bm{\sigma}} 
257: \dot{p}_{\bm{\sigma}} = 0$) in the mutation part of the 
258: evolution equation (\ref{paramuse}).
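
As a quick consistency check (a sketch in the notation of (\ref{mss}), added
for illustration), note that every sequence $\bm{\sigma'}$ has exactly $N$
neighbours at restricted distance $d_i = 1$ for each $i \in \{1,2,3\}$, so that
\[
\sum_{\bm{\sigma}} m_{\bm{\sigma}\bm{\sigma'}}
= N \sum_{i=1}^3 \mu_i - N \sum_{i=1}^3 \mu_i = 0 \;.
\]
For a single site ($N=1$), with the states ordered as $(A,C,G,T)$ and the
shorthand $\mu_{\rm tot} := \mu_1 + \mu_2 + \mu_3$ (introduced only for this
remark), the mutation matrix reads
\[
\begin{pmatrix}
-\mu_{\rm tot} & \mu_1 & \mu_2 & \mu_3 \\
\mu_1 & -\mu_{\rm tot} & \mu_3 & \mu_2 \\
\mu_2 & \mu_3 & -\mu_{\rm tot} & \mu_1 \\
\mu_3 & \mu_2 & \mu_1 & -\mu_{\rm tot}
\end{pmatrix} ,
\]
whose columns indeed sum to zero.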
259: 
260: \subsection{Selection and fitness landscape} 
261: 
262: Whereas the mutational part of the dynamics is fairly well understood 
263: at least on the microscopic (molecular) level, the relation of 
264: genotype and fitness, which defines the respective selective success, 
265: is notoriously complex.  
266: Following the standard notion in molecular evolution, we define the 
267: {\em fitness function} (or {\em fitness landscape})
268: \begin{equation}
269: f: \bm{\sigma} \mapsto r_{\bm{\sigma}} 
270: \end{equation}
271: as a mapping from the configuration space $V= \{A,C,G,T\}^N$ into the 
272: real numbers, assigning a reproduction rate (Malthusian fitness value) 
273: $r_{\bm{\sigma}}$ to each 
274: genotype. Implicitly, the fitness function incorporates all the
275: complicated interactions between the sites. These interactions
276: are typically long-ranged (since RNA strands or proteins fold in three 
277: dimensions), highly correlated, and give rise to rather rugged landscapes. 
278: Especially in the context of RNA evolution, the construction and
279: characterization of fitness landscapes has motivated numerous studies,
280: see e.g.\ \cite{Sta} for a review. 
281: 
282: Below we will show how the evolution equation (\ref{paramuse}), with
283: an arbitrary choice of the fitness function, can be adapted to the
284: methods from statistical physics by a reformulation in a quantum 
285: chain framework. As an application, we then present a study (including
286: analytical and numerical results) for specific examples from the class
287: of permutation invariant fitness functions. Here, due to equivalence of
288: all sites, the fitness of a given genotype is solely a function of 
289: its restricted Hamming distances from the so-called {\em wildtype} sequence
290: with optimal fitness which we choose as the reference genotype. 
291: This particularly simple class of fitness 
292: landscapes is widely used, as a canonical first approximation, 
293: especially in {\em multilocus theory}. Also in the context of sequence 
294: space evolution, fitness functions of this type
295: have been used in a number of studies on the two-state model 
296: \cite{OB,Leut2,Tara,BBW,WBG}. To implement the approach in our
297: four-state model, we fix an arbitrary sequence, denoted by 
298: $\bm{\sigma}_{++}$, as
299: the wildtype. We will only consider directional selection here towards a
300: unique genotype with optimal fitness. The fitness of any other
301: sequence is then determined by the restricted Hamming distances 
302: $d_i$ relative to $\bm{\sigma}_{++}$. 
303: Permutation invariance with respect to the position in the sequence
304: thus leads to a drastic reduction of dimensions. For the four-state
305: model, the effective configuration
306: space forms a tetrahedron in 3d (see Fig.~\ref{select}) and is 
307: conveniently represented in Cartesian coordinates which we 
308: shall call (following \cite{BBW}) the {\em surplus components}:
309: \begin{eqnarray}\nonumber
310: s_1(\bm{\sigma}) &=& 1 - \frac{2}{N}
311: \Big(d_1(\bm{\sigma},\bm{\sigma}_{++})+d_3(\bm{\sigma},\bm{\sigma}_{++})\Big)\;;
312: \\ \label{surplus}
313: s_2(\bm{\sigma}) &=& 1 - \frac{2}{N}
314: \Big(d_2(\bm{\sigma},\bm{\sigma}_{++})+d_3(\bm{\sigma},\bm{\sigma}_{++})\Big)\;;
315: \\ \nonumber
316: s_3(\bm{\sigma}) &=& 1 - \frac{2}{N}
317: \Big(d_1(\bm{\sigma},\bm{\sigma}_{++})+d_2(\bm{\sigma},\bm{\sigma}_{++})\Big)\;.
318: \end{eqnarray} 
319: \begin{figure}[t]
320: \centerline{\epsfysize=50mm \epsfbox{select2.eps}}
321: \caption{Permutation invariant configuration space of the four-state
322:   model in surplus coordinates.}
323: \label{select}
324: \end{figure}
325: With this choice, any unstructured random sequence has coordinates 
326: $s_i \equiv 0$ (with probability 1 in the limit $N\to \infty$).
327: Any positive value of a surplus component, on the other hand, signals a 
328: non-trivial overlap of the sequence with the wildtype $\bm{\sigma}_{++}$. 
329: In particular, $s_1$ measures the surplus of sites with purines or pyrimidines 
330: as given in $\bm{\sigma}_{++}$ over the purine--pyrimidine mutated sites.
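
To make the geometry explicit, (\ref{surplus}) can be inverted. The following
is a short worked calculation added for illustration; the symbols
$x_i := d_i(\bm{\sigma},\bm{\sigma}_{++})/N$ and $x_0 := 1 - x_1 - x_2 - x_3$
(the fraction of sites that still carry the wildtype nucleotide) are shorthand
introduced only here. One finds
\begin{eqnarray*}
x_0 &=& \tfrac{1}{4}\,(1 + s_1 + s_2 + s_3) \;, \\
x_1 &=& \tfrac{1}{4}\,(1 - s_1 + s_2 - s_3) \;, \\
x_2 &=& \tfrac{1}{4}\,(1 + s_1 - s_2 - s_3) \;, \\
x_3 &=& \tfrac{1}{4}\,(1 - s_1 - s_2 + s_3) \;.
\end{eqnarray*}
The four corners of the tetrahedron are therefore $(s_1,s_2,s_3) = (1,1,1)$
for the wildtype ($x_0 = 1$), $(-1,1,-1)$ for $x_1 = 1$, $(1,-1,-1)$ for
$x_2 = 1$, and $(-1,-1,1)$ for $x_3 = 1$, while equal fractions
$x_0 = x_1 = x_2 = x_3 = 1/4$ give $s_i = 0$, as expected for a random sequence.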
331: 
332: Within this frame, a natural class of permutation invariant fitness 
333: functions is
334: \begin{equation} \label{fit}
335: f: \bm{\sigma} \mapsto
336: r_{\bm{\sigma}} = N \sum_{i=1}^3 \left[\alpha_i^{} s_i(\bm{\sigma}) + 
337: \frac{\gamma_i^{}}{2} s_i^2(\bm{\sigma}) \right]
338: \end{equation}
339: which includes the following special cases
340: \begin{itemize}
341: \item 
342: Setting $\alpha_i > 0$ and $\gamma_{i} = 0$, we obtain the purely additive 
343: {\em Fujiyama landscape} without genetic interactions. Here, every 
344: mutation relative to the wildtype has a fixed deleterious effect, 
345: independent of any other mutation that may be present in the sequence. 
346: The additive landscape is a canonical zeroth-order approximation, ignoring
347: any kind of genetic interactions. In the context of sequence
348: evolution, this fitness function has been discussed e.g.~in \cite{OB,BBW}. 
349: \item 
350: With the choice $\alpha_i \ge - \gamma_{i} > 0$, the model 
351: corresponds to a concave quadratic fitness function 
352: (with directional selection) as it is frequently met
353: in multilocus theory. Due to the gene interactions, existing mutations 
354: tend to aggravate further ones, which is called {\em positive epistasis}. 
355: \item
356: For $\alpha_i \ge 0$ and $\gamma_i > 0$, we finally obtain a convex fitness 
357: function for directional selection with long-range gene interactions and
358: {\em negative epistasis} (existing mutations tend to alleviate further 
359: ones). Since we want to have $\bm{\sigma}_{++}$ as unique wildtype
360: sequence and a fitness function which is monotonic in the surplus
361: components, we restrict $f$ to the octant $s_i \ge 0$ and (smoothly) 
362: truncate the fitness function by introduction of a step function 
363: $\Theta(s_i)$ whenever frequencies of genotypes with $s_i < 0$ are 
364: non-zero:
365: \begin{equation} \label{fit2}
366: \tilde{f}: \bm{\sigma} \mapsto
367: r_{\bm{\sigma}} = N \sum_{i=1}^3 
368: \left[\left(\alpha_i^{} s_i(\bm{\sigma}) + 
369: \frac{\gamma_i^{}}{2} s_i^2(\bm{\sigma}) \right)\Theta(s_i) \right]\;.
370: \end{equation}
371: \end{itemize}
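For orientation, the effect of a single mutation on (\ref{fit}) can be worked
out directly (a sketch under the assumption that the mutated site still
carries the wildtype nucleotide). A mutation of type $i$ raises $d_i$ by one
and, by (\ref{surplus}), lowers two of the three surplus components by $2/N$.
In the additive case $\gamma_i = 0$, the resulting fitness changes are
\[
\Delta r = -2(\alpha_1 + \alpha_3) \;, \quad
\Delta r = -2(\alpha_2 + \alpha_3) \;, \quad
\Delta r = -2(\alpha_1 + \alpha_2)
\]
for mutations of type 1, 2, and 3, respectively, independently of the rest of
the sequence. For $\gamma_i \neq 0$, each affected component $s_i$ contributes
the additional term
\[
\frac{N \gamma_i}{2} \left[ \Big(s_i - \frac{2}{N}\Big)^2 - s_i^2 \right]
= -2 \gamma_i s_i + \frac{2 \gamma_i}{N} \;,
\]
so that the cost of a mutation depends on the mutations already present.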
372: The variables $\alpha_i$ and $\gamma_i$ may further be used to 
373: distinguish between the effects of the different types of mutations 
374: (as defined in Fig.~\ref{mutfig}) on the fitness. In this article, 
375: we will present explicit results for the two following cases:
376: \begin{enumerate}
377: \item 
378: For the simplest choice, $\alpha_1=\alpha_2=\alpha_3$ and 
379: $\gamma_1=\gamma_2=\gamma_3$, any mutation away from the wildtype has 
380: the same effect. Together with the Jukes-Cantor mutation scheme,
381: symmetry here leads to equal values of the surplus components in the 
382: mutation--selection equilibrium. The model may thus also be thought
383: of as a two-state model, where any site is only regarded as occupied 
384: either with a {\em wildtype} or with a {\em mutant} nucleotide. 
385: In contrast to the simple two-state model of \cite{BBW}, however, 
386: there is an effectively asymmetric mutation rate between wildtype 
387: and mutant in the case considered here.
388: \item
389: In a more refined model, we distinguish between transitions and
390: transversions. In the mutational part, this is done by applying the 
391: Kimura 2 parameter mutation scheme. In the fitness function, we take 
392: into account that the deleterious effects of the transversions often 
393: dominate over those of the transitions: $\alpha_1 > \alpha_{2,3}$ 
394: and/or $\gamma_1 > \gamma_{2,3}$. 
395: \end{enumerate}
396: 
397: 
398: \section{Quantum chain model}
399: 
400: \subsection{Symmetries}
401: 
402: Since mutation is a random process that is independent of 
403: the fitness values of the genotypes involved, the molecular mutation
404: scheme consequently makes no reference to fitness concepts like the
405: {\em wildtype}. Biological observables measurable from sequence data,
406: such as the surplus components (\ref{surplus}), and also the fitness 
407: functions as defined in (\ref{fit}) or (\ref{fit2}), on the other
408: hand, are defined relative to the wildtype sequence. In order to set 
409: up these concepts in a common framework, it is convenient to
410: reformulate also the mutational part of the evolution equation in
411: coordinates relative to the wildtype. This may always be done  
412: due to certain symmetries inherent in the mutation scheme of 
413: Fig.~\ref{mutfig}.
414: 
415: The basic symmetry of the mutation scheme, if all three mutation rates 
416: $\mu_1, \mu_2, \mu_3$ are pairwise different, is $C_2 \times C_2$ 
417: (Klein's 4-group), generated by two involutions. If we write the 
418: operations in standard permutation notation, we can take as generators 
419: the transformations
420: \begin{equation}
421: \begin{pmatrix}
422: A&C&G&T \\ C&A&T&G
423: \end{pmatrix} \quad \text{and} \quad
424: \begin{pmatrix}
425: A&C&G&T \\ G&T&A&C
426: \end{pmatrix}\;,
427: \end{equation}
428: both being the product of two transpositions. This symmetry may 
429: now be exploited for a redefinition of the mutation scheme in 
430: wildtype coordinates. To this end, we fix, for every site of the
431: wildtype sequence, the element of the 4-group (in the above 
432: representation) with the letter of the wildtype nucleotide in the
433: first position (e.g.\ the string $(T,G,C,A)$ for wildtype nucleotide
434: $T$). An alternative representation of the configuration space in wildtype
435: coordinates as
436: \begin{equation}
437: {\bm \sigma} \in V^\pm \equiv V_1^\pm \times V_2^\pm
438:  \times \dots \times V_N^\pm \;;\quad
439: V_i^\pm = \{++,-+,+-,--\}
440: \end{equation}
441: is now given by the mapping, on each site, of the string of 
442: labels $(++,-+,+-,--)$ to the symmetry element of the 4-group defined 
443: above. With this notation, the three types of mutations included in the 
444: Kimura 3 ST scheme simply switch the signs of the labels: 
445: $\pm\pm \to \mp\pm$ at rate $\mu_1$, $\pm\pm \to \pm\mp$ at rate 
446: $\mu_2$, and $\pm\pm \to \mp\mp$ at rate $\mu_3$.
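
For illustration, applying the four group elements (the identity, the two
generators, and their product) to each possible wildtype letter gives the
following site-wise dictionary. It is a worked instance of the mapping just
described, added here as a reading aid:
\begin{center}
\begin{tabular}{c|cccc}
wildtype & $++$ & $-+$ & $+-$ & $--$ \\ \hline
$A$ & $A$ & $C$ & $G$ & $T$ \\
$C$ & $C$ & $A$ & $T$ & $G$ \\
$G$ & $G$ & $T$ & $A$ & $C$ \\
$T$ & $T$ & $G$ & $C$ & $A$
\end{tabular}
\end{center}
In each row, the entry under $-+$ differs from the wildtype by a substitution
of type 1 ($A \rightleftarrows C$ or $G \rightleftarrows T$), the entry under
$+-$ by a substitution of type 2 ($A \rightleftarrows G$ or
$C \rightleftarrows T$), and the entry under $--$ by a substitution of type 3
($A \rightleftarrows T$ or $C \rightleftarrows G$), in line with the rates
$\mu_1$, $\mu_2$, and $\mu_3$ quoted above.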
447: 
448: Higher symmetries of the mutation model are obtained if mutation rates are
449: equal. For the Kimura 2 parameter scheme, $\mu_1 = \mu_3 \neq \mu_2$,
450: the operation 
451: \begin{equation}
452: A \to C \to G \to T \to A \; = \; 
453: \begin{pmatrix}
454: A&C&G&T \\ C&G&T&A
455: \end{pmatrix}
456: \end{equation}
457: is also a symmetry and generates a cyclic group $C_4$. Together with
458: the previous $C_2 \times C_2$, this generates a dihedral group, $D_4$,
459: with 8 elements. Finally, if $\mu_1 = \mu_2 = \mu_3$, we additionally
460: get the simple transposition $A \leftrightarrow C$
461: and have the full permutation group $S_4$ as symmetry. Note that
462: $S_4$, which corresponds to the full tetrahedral group with 24
463: elements, is also the symmetry group of the configuration space of
464: permutation invariant configurations visualized in
465: Fig.~\ref{select}. The {\em global} symmetry (with the same
466: transformation acting at each site simultaneously) of our class of
467: mutation-selection models with fitness functions according to
468: (\ref{fit}) is therefore always a subgroup of $S_4$. 
469: In particular, the symmetric fitness model with $\alpha_1 = \alpha_2 =
470: \alpha_3$, $\gamma_1 = \gamma_2 = \gamma_3$, and Jukes-Cantor mutation
471: scheme possesses $C_{3v}$ symmetry, or the full tetrahedral symmetry if the
472: linear part in the fitness function vanishes ($\alpha_i = 0$).
473: The transition-transversion model finally, with $\alpha_1 >
474: \alpha_2 = \alpha_3$, or $\gamma_1 > \gamma_2 = \gamma_3$, and Kimura 2
475: parameter mutation has simple $C_2$ symmetry, or $D_4$ symmetry if
476: $\alpha_i \equiv 0$. In the latter case, the combination of
477: $\gamma_2=\gamma_3$ with $\mu_1=\mu_3$ is necessary, not a
478: misprint. Other combinations with global $D_4$ symmetry are $(\gamma_1
479: = \gamma_3; \mu_2=\mu_3)$ and $(\gamma_1=\gamma_2; \mu_1=\mu_2)$. 
480: 
481: \subsection{Construction}
482: 
483: With the above preparations, we may now follow the lines of
484: \cite{BBW,WBG} where the two-state model is treated.
485: 
486: In a first step, we represent the $4^N$-dimensional vector space in
487: which we describe the
488: genotype frequencies as the $N$-fold tensor product space
489: $W = \otimes_{j=1}^N W_j$. Hereby, the configuration space $V^\pm$ is 
490: canonically embedded in $W$ by the mapping of the elements of 
491: $V_i^\pm$ onto the basis vectors 
492: $\{e_{j}^{++}, e_{j}^{-+}, e_{j}^{+-}, e_{j}^{--}\}$ of $W_j \simeq \RR^4$.
493: Since the nonlinear part in the differential 
494: equations (\ref{paramuse}) only amounts to normalization of the 
495: frequencies, a transformation to so-called
496: {\em absolute frequencies} \cite{TM,BBW}
497: \begin{equation}
498: z_{\bm \sigma}^{}(t) = p_{\bm \sigma}^{}(t) \exp\Big( \sum_{\bm \sigma'} 
499: r_{\bm \sigma'}^{} \int_0^t p_{\bm \sigma'}^{}(\tau) \,d\tau \Big)  
500: \end{equation}
501: then reduces the system to the linear equation
502: \begin{equation} \label{LGS}
503: \dot{z}_{\bm \sigma}^{}(t) = \big({\cal M} + {\cal R}\big) 
504: z_{\bm \sigma}^{}(t) 
505: \end{equation}
506: where the mutation and reproduction matrices, ${\cal M} = 
507: (m_{\bm\sigma \bm\sigma'})$ and ${\cal R} = \text{diag}(r_{\bm\sigma}^{})$, 
508: may now be conveniently represented in the frequency space $W$. Defining
509: \begin{equation}
510: \sigma_j^{(\alpha,\beta)} := \left(\otimes^{j-1} \Id_4 \right) \otimes 
511: \left(\sigma^\alpha \otimes \sigma^\beta \right)
512: \otimes \left(\otimes^{N-j} \Id_4\right)
513: \end{equation}
514: where $\sigma^\alpha$, $\alpha \in \{0,x,z\}$, are the real Pauli matrices and 
515: $\sigma^0 \equiv \Id_2$, we find
516: \begin{equation}
517: {\cal M} = \sum_{j=1}^N \left[ \mu_1 \sigma_j^{(x,0)} + \mu_2 
518: \sigma_j^{(0,x)} + \mu_3 \sigma_j^{(x,x)} - (\mu_1+\mu_2+\mu_3) \Id\right]
519: \end{equation}
520: for the mutation matrix. The reproduction matrix ${\cal R}$ is, for a 
521: general fitness landscape, an element of the algebra generated by
522: $\sigma_j^{(z,0)}$ and $\sigma_j^{(0,z)}$, $1\le j\le N$, 
523: \begin{equation}
524: {\cal R} = r_0 \Id + \sum_{k,\ell = 1}^N
525: \sum_{[j_1^{} \dots j_k^{}]} \sum_{[j_1^{} \dots j_\ell^{}]} 
526: \varepsilon_{[j_1^{} \dots j_k^{}],[j_1^{} \dots j_\ell^{}]}^{}
527: \prod_{m=1}^k \sigma_{j_m^{}}^{(z,0)} \prod_{n=1}^\ell
528: \sigma_{j_n^{}}^{(0,z)},
529: \end{equation}
530: where $[j_1^{} \dots j_k^{}]$ is an ordered $k$-tuple in $\{1,\dots,N\}$.
531: Now, from a physical point of view, ${\cal H} = {\cal M} + {\cal R}$
532: is (up to a global minus sign) the Hamiltonian of two coupled Ising 
533: quantum chains in a tunable transverse magnetic field (the mutation)
534: and general spin-interactions within the chains. 
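
As a small worked example (a sketch for a single site, $N=1$, without
selection), consider the spectrum of the mutation matrix. On the basis vectors
$e^{ab}$ with $a,b \in \{+,-\}$, the operator $\sigma^{(x,0)}$ flips the first
sign and $\sigma^{(0,x)}$ flips the second one. These two operators commute,
and $\sigma^{(x,x)}$ is their product, so ${\cal M}$ is diagonal in their
common eigenbasis. Denoting the eigenvalues of $\sigma^{(x,0)}$ and
$\sigma^{(0,x)}$ by $x_1, x_2 \in \{+1,-1\}$ (notation introduced only here),
the eigenvalues of the single-site mutation matrix are
\[
\lambda(x_1,x_2) = \mu_1 x_1 + \mu_2 x_2 + \mu_3 x_1 x_2
- (\mu_1 + \mu_2 + \mu_3) \;,
\]
that is,
\[
\lambda(+,+) = 0 \;, \quad
\lambda(-,+) = -2(\mu_1 + \mu_3) \;, \quad
\lambda(+,-) = -2(\mu_2 + \mu_3) \;, \quad
\lambda(-,-) = -2(\mu_1 + \mu_2) \;.
\]
The zero eigenvalue belongs to the equidistribution, and the remaining three
give the rates at which a single site relaxes towards it under mutation alone.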
535: 
536: Translated to our quantum chain model, the fitness function of the
537: permutation invariant landscape defined in (\ref{fit}) results in a 
538: (longitudinal) magnetic field and a mean field spin-interaction. We find 
539: ${\cal R } = {\cal R}_\alpha + {\cal R}_\gamma$, where
540: \begin{equation}
541: {\cal R}_\alpha = \sum_{j=1}^N \left[\alpha_1 \sigma_j^{(z,0)} 
542: + \alpha_2 \sigma_j^{(0,z)} + \alpha_3 \sigma_j^{(z,z)} \right]
543: \end{equation}
544: and
545: \begin{equation} \label{rgamma}
546: {\cal R}_\gamma = \frac{1}{2N} \sum_{j,k = 1}^N \left[ \gamma_1
547: \sigma_j^{(z,0)}\sigma_k^{(z,0)} + \gamma_2 \sigma_j^{(0,z)}\sigma_k^{(0,z)} +
548: \gamma_3 \sigma_j^{(z,z)}\sigma_k^{(z,z)} \right] \;.
549: \end{equation}
550: Let us stress that, in contrast to most physical applications, the mean 
551: field model is a much more natural approach in the biological 
552: context where interactions are typically long-range. So, it is a
553: legitimate model here, not an inevitable approximation.
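
The correspondence with (\ref{fit}) can be made explicit as follows (a compact
reformulation added for illustration; the operators $S_i$ are shorthand not
used elsewhere, and we assume the convention that $\sigma^z$ has eigenvalue
$+1$ on the `$+$' state). Define
\[
S_1 := \frac{1}{N} \sum_{j=1}^N \sigma_j^{(z,0)} \;, \quad
S_2 := \frac{1}{N} \sum_{j=1}^N \sigma_j^{(0,z)} \;, \quad
S_3 := \frac{1}{N} \sum_{j=1}^N \sigma_j^{(z,z)} \;.
\]
These operators are diagonal in the genotype basis. On the basis vector
belonging to $\bm{\sigma}$, $S_1$ counts the sites with first label `$+$'
minus those with first label `$-$', divided by $N$, which is
$1 - \frac{2}{N}(d_1 + d_3) = s_1(\bm{\sigma})$; likewise, $S_2$ and $S_3$
have eigenvalues $s_2(\bm{\sigma})$ and $s_3(\bm{\sigma})$. Hence
\[
{\cal R}_\alpha + {\cal R}_\gamma
= N \sum_{i=1}^3 \left[ \alpha_i S_i + \frac{\gamma_i}{2} S_i^2 \right] ,
\]
which reproduces (\ref{fit}) eigenvalue by eigenvalue.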
554: 
555: 
556: \subsection{Biological and physical observables} \label{bpo}
557: 
558: In this subsection, we relate the quantities of biological interest,
559: mean and variance of the surplus components and the fitness, to the 
560: physical observables. In what follows, we assume the occurring limits
561: to exist. 
562: 
563: \paragraph{Genotype composition}
564: According to (\ref{LGS}), the Hamiltonian of the quantum chain determines the
565: time evolution of our population of genotypes in an environment that does not 
566: constrain the population size. For any genotype-independent 
567: regulation of the population size, the relative genotype frequencies
568: are found by {\em statistical} normalization. We therefore define the
569: vector of the genotype composition $|\bm{p}(t) \rangle$ and the 
570: equilibrium composition $|0\rangle$ as    
571: \begin{equation}
572: |\bm{p}(t) \rangle = 
573: \frac{\exp(t{\cal H})
574: |\bm{p}_0\rangle} {\langle \Omega|\exp(t{\cal H})|\bm{p}_0\rangle}
575: \quad ; \quad
576: |0\rangle := \lim_{t\to \infty} |\bm{p}(t) \rangle 
577: \end{equation}
578: where $|\bm{p}_0\rangle$ is the initial composition and
579: $4^{-N}|\Omega\rangle$ is the equidistribution of genotypes.
580: Note that the {\em equilibrium composition} of the genotype population
581: just corresponds to the {\em ground state} of the quantum chain on 
582: the physical side (with a different `biological' normalization 
583: $\langle \Omega|0\rangle = 1$).
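
The statistical normalization can be illustrated on a toy example. The
following is a minimal sketch for a single site ($N=1$), not the
computation used for the results of this paper. It assumes the one-site
mutation matrix
$\mu_1(\sigma^{(x,0)}-1)+\mu_2(\sigma^{(0,x)}-1)+\mu_3(\sigma^{(x,x)}-1)$
and an additive fitness; the function names and parameter values are
purely illustrative.

```python
import numpy as np

# One-site (N = 1) toy version; basis order (++, +-, -+, --).
I2 = np.eye(2)
SX = np.array([[0.0, 1.0], [1.0, 0.0]])
SZ = np.diag([1.0, -1.0])
ONE = np.eye(4)

def one_site_hamiltonian(alpha, mu):
    """Return H = M + R for a single site (the form of M is an assumption)."""
    flips = (np.kron(SX, I2), np.kron(I2, SX), np.kron(SX, SX))
    signs = (np.kron(SZ, I2), np.kron(I2, SZ), np.kron(SZ, SZ))
    M = sum(m * (S - ONE) for m, S in zip(mu, flips))
    R = sum(a * S for a, S in zip(alpha, signs))
    return M + R

def composition(H, p0, t):
    """exp(tH) p0 with statistical (L^1) normalization."""
    evals, evecs = np.linalg.eigh(H)            # H is real symmetric
    growth = np.exp(t * (evals - evals.max()))  # common factor cancels below
    p = evecs @ (growth * (evecs.T @ p0))
    return p / p.sum()

H = one_site_hamiltonian(alpha=(1.0, 1.0, 1.0), mu=(0.3, 0.3, 0.3))
p0 = np.full(4, 0.25)                           # equidistribution
p_late = composition(H, p0, t=50.0)

# Eigenvector of the largest eigenvalue, normalized to <Omega|0> = 1.
evals, evecs = np.linalg.eigh(H)
equilibrium = evecs[:, -1] / evecs[:, -1].sum()
```

In this sketch, \texttt{p\_late} and \texttt{equilibrium} should agree up
to rounding, since the composition relaxes to the normalized eigenvector of
the largest eigenvalue.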
584: 
585: 
586: \paragraph{Fitness} The {\em density of the mean fitness} (or mean
587: fitness per site) of the population is given by the expression  
588: \begin{equation}
589: w(t) := N^{-1} \bar{r}(t) =  
590: N^{-1} \langle\Omega|{\cal R}|\bm{p}(t)\rangle \;.
591: \end{equation}
592: Since
593: \begin{equation}
594: w := \lim_{t \to \infty} w(t) = N^{-1} \langle \Omega| {\cal R} | 0
595: \rangle = N^{-1} \frac{\langle 0| {\cal H} |0\rangle}{\langle 0| 0
596:   \rangle}
597: \end{equation}
the {\em equilibrium} mean fitness (per site) is just given by the 
(unique) largest eigenvalue of ${\cal H}$, corresponding to
$|0\rangle$, divided by $N$. For an unconstrained population, $w$ also
determines the growth rate in the long-time limit. In the physical picture, 
$(-w)$ is just the {\em ground state energy} (per spin). 
603: 
604: Using ${\cal M} |\Omega\rangle = 0$, we derive for the time evolution
605: of the mean fitness
606: \begin{equation} \label{zeit}
607: \dot{w}(t) = V_r(t) + N^{-1} 
608: \langle \Omega| [{\cal R},{\cal M}] | \bm{p}(t) \rangle
609: \end{equation}
610: where $V_r(t)$ is the {\em variance of fitness} (per site),
611: \begin{equation}
612: V_r(t) = \frac{1}{N}\left(\langle \Omega|{\cal R}^2|\bm{p}(t)\rangle
613: - \langle \Omega|{\cal R}|\bm{p}(t)\rangle^2 \right)\;.
614: \end{equation}
615: In the absence of mutation, (\ref{zeit}) is of course just a special case
616: of Fisher's ``Fundamental Theorem of Natural Selection'' \cite{Fish} which
617: states that the rate of increase in fitness is equal to the genetic
618: variance in fitness. For the mutation-selection models considered
619: here, the relation has the following intuitive interpretation:
620: The change in mean fitness is driven by two independent forces. The
621: first one stems from the change of genotype frequencies due to
622: selection and is proportional to the variance of fitness values
623: present in the population. Since variances are positive, it always
624: tends to increase fitness. The second term on the right hand side of 
625: (\ref{zeit}) typically decreases fitness. It measures the population
626: mean of the change in fitness at time $t$ due to the action of mutation.
627: In mutation-selection equilibrium, both terms balance, and the entire 
628: residual variance is due to mutation.  
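
Relation (\ref{zeit}) can be checked numerically. The sketch below does so
for a single site ($N=1$) by comparing a central finite difference of
$w(t)$ with the right-hand side. The explicit one-site form of ${\cal M}$,
the parameter values, the initial composition and all names are assumptions
made for illustration only.

```python
import numpy as np

I2 = np.eye(2)
SX = np.array([[0.0, 1.0], [1.0, 0.0]])
SZ = np.diag([1.0, -1.0])
ONE = np.eye(4)
OMEGA = np.ones(4)

def build(alpha, mu):
    """One-site mutation matrix M (assumed form) and reproduction matrix R."""
    flips = (np.kron(SX, I2), np.kron(I2, SX), np.kron(SX, SX))
    signs = (np.kron(SZ, I2), np.kron(I2, SZ), np.kron(SZ, SZ))
    M = sum(m * (S - ONE) for m, S in zip(mu, flips))
    R = sum(a * S for a, S in zip(alpha, signs))
    return M, R

def p_of_t(H, p0, t):
    """Normalized composition exp(tH) p0 / <Omega|exp(tH) p0>."""
    evals, evecs = np.linalg.eigh(H)
    p = evecs @ (np.exp(t * evals) * (evecs.T @ p0))
    return p / (OMEGA @ p)

def mean_fitness(R, p):
    return OMEGA @ R @ p

M, R = build(alpha=(1.0, 0.5, 0.2), mu=(0.3, 0.1, 0.2))
H = M + R
p0 = np.array([0.1, 0.2, 0.3, 0.4])
t, h = 0.7, 1e-5

p = p_of_t(H, p0, t)
w = mean_fitness(R, p)
variance = OMEGA @ R @ R @ p - w**2              # V_r(t) for N = 1
mutation_term = OMEGA @ (R @ M - M @ R) @ p      # <Omega|[R, M]|p(t)>

# Central difference approximation of dw/dt.
lhs = (mean_fitness(R, p_of_t(H, p0, t + h))
       - mean_fitness(R, p_of_t(H, p0, t - h))) / (2 * h)
rhs = variance + mutation_term
```

Setting all mutation rates to zero removes \texttt{mutation\_term}, and the
check reduces to Fisher's theorem.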
629: 
630: \paragraph{Surplus} Another quantity that characterizes the genetic 
631: order of the population, as it may be measured from sequence data, is 
632: the {\em mean surplus}. We define, following and generalizing \cite{BBW},
633: \begin{equation}
634: u_i(t) = \sum_{\bm{\sigma}} s_i(\bm{\sigma}) p_{\bm{\sigma}}^{}(t)
635: \quad ; \quad
636: u_i = \lim_{t \to \infty} u_i(t) \;. 
637: \end{equation} 
638: In particular, 
639: \begin{equation}
640: \#_m(t) := \frac{1}{4} \big(3 - (u_1(t)+u_2(t)+u_3(t))\big)
641: \end{equation}
642: measures the mean number of mutations per site relative to the wildtype while
643: \begin{equation}
644: \#_{tr}(t) := \frac{1}{2} \big( 1 - u_1(t) \big) 
645: \end{equation}
646: denotes the mean number of transversions alone.
As a {\em biological order parameter}, the mean surplus plays a 
r{\^o}le similar to that of the physical magnetization. However, as already 
noted in \cite{BBW2}, the two quantities are quite distinct and in many 
cases not even easily related. In the language of the quantum chain, 
651: the equilibrium mean surplus may be derived as
652: \begin{equation}
653: u_1 = \frac{\langle \Omega|\sum_i\sigma_i^{(z,0)}|0\rangle}{N}
654: \quad ;\quad 
655: u_2 = \frac{\langle \Omega|\sum_i\sigma_i^{(0,z)}|0\rangle}{N}
656: \quad ;\quad 
657: u_3 = \frac{\langle \Omega|\sum_i\sigma_i^{(z,z)}|0\rangle}{N}
658: \; , 
659: \end{equation}
660: whereas the three-component magnetization is defined as the ground 
661: state expectation value
662: \begin{equation}
663: m_1 = \frac{\langle 0|\sum_i\sigma_i^{(z,0)}|0\rangle}
664: {N \langle 0|0\rangle} \quad;\quad 
665: m_2 = \frac{\langle 0|\sum_i\sigma_i^{(0,z)}|0\rangle}
666: {N \langle 0|0\rangle} \quad ;\quad
667: m_3 = \frac{\langle 0|\sum_i\sigma_i^{(z,z)}|0\rangle}
668: {N \langle 0|0\rangle} \; .
669: \end{equation}
670: As we will show below, magnetization and surplus can show rather
671: different behaviour especially near phase transitions. The biological 
672: and physical phase diagrams, however, coincide if phase transitions
673: (or error thresholds) are defined as nonanalyticity points of the 
674: ground state energy (or mean fitness) $w$ in the thermodynamic limit
675: (cf.~the discussion in Section 5). 
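
The difference between the two normalizations is already visible for a
single site. The following sketch (an illustration only, with an assumed
one-site mutation matrix, symmetric parameters and hypothetical names)
computes $u_1$ and $m_1$ from the same eigenvector, once normalized in
$L^1$ and once in $L^2$.

```python
import numpy as np

I2 = np.eye(2)
SX = np.array([[0.0, 1.0], [1.0, 0.0]])
SZ = np.diag([1.0, -1.0])
ONE = np.eye(4)
SIGMA_Z0 = np.kron(SZ, I2)

def order_parameters(alpha, mu):
    """Surplus u_1, magnetization m_1 and top eigenvalue for N = 1."""
    flips = (np.kron(SX, I2), np.kron(I2, SX), np.kron(SX, SX))
    signs = (SIGMA_Z0, np.kron(I2, SZ), np.kron(SZ, SZ))
    H = sum(mu * (S - ONE) for S in flips) + sum(alpha * S for S in signs)
    evals, evecs = np.linalg.eigh(H)
    x = evecs[:, -1]                  # ground state with <x|x> = 1
    p = x / x.sum()                   # same vector with <Omega|p> = 1
    u1 = np.ones(4) @ SIGMA_Z0 @ p    # classical (L^1) average
    m1 = x @ SIGMA_Z0 @ x             # quantum (L^2) expectation value
    return u1, m1, evals[-1]

u1, m1, top = order_parameters(alpha=1.0, mu=0.5)
u1_pure, m1_pure, _ = order_parameters(alpha=1.0, mu=0.0)
```

For vanishing mutation both quantities equal one, whereas for $\mu > 0$ the
magnetization exceeds the surplus in this example.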
676: 
677: \section{Results}
678: 
679: \subsection{Fujiyama model}
680: 
681: As in the two-letter case \cite{BBW}, the quantum chain model 
682: decomposes into non-interacting one-site Hamiltonians for the 
683: additive landscape. The mean fitness and its variance are linear
684: functions in the surplus components. In particular, we obtain from
685: (\ref{zeit})
686: \begin{equation}
687: V_r(t) = \dot{w}(t) + 2\big(
688: (\mu_1 +\mu_3) \alpha_1 u_1(t)
689: + (\mu_2 +\mu_3) \alpha_2 u_2(t) + (\mu_1 +\mu_2) \alpha_3 u_3(t)\big)
690: \;.
691: \end{equation}
692: For Jukes-Cantor mutation, $\mu_1 = \mu_2 = \mu_3 \equiv \mu$, this reduces to
693: \begin{equation}
694: V_r(t) = \left(4 \mu + \frac{\text{d}}{\text{d}t}\right) w(t) 
695: \end{equation}
696: and $V_r$ is proportional to the mean fitness in the
697: mutation--selection equilibrium. Exact results are easily
698: found from the solution of the four-dimensional eigenvalue problem of
699: the one-site Hamiltonian. We only give the expression for the mean
700: fitness in the symmetric case, $\alpha_1 = \alpha_2 = \alpha_3 \equiv \alpha$
701: with Jukes-Cantor mutation scheme ($\mu_1 = \mu_2 = \mu_3 \equiv \mu$):
702: \begin{equation}
w(t) = 
\frac{3\alpha^2\tanh[2tQ]}
{2Q-(\alpha-2\mu)\tanh[2tQ]}
706: \end{equation}
707: where
708: \begin{equation}
709: Q = \sqrt{\mu^2+\alpha^2 -\alpha\mu} 
710: \end{equation}
711: and the equidistribution of genotypes is chosen as starting configuration.
712: 
713: Means and variances of the fitness and the surplus in
714: mutation--selection balance are shown in Fig.~\ref{finite} below. 
715: A plot of the time evolution of fitness is given in Fig.~\ref{time2}. 
716: There is clearly no phase transition (resp.~no {\em error threshold} 
717: behaviour) for the additive Fujiyama landscape, as expected in view of
718: the complete absence of interactions (resp.\ epistasis).
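
The equilibrium value in the symmetric Jukes-Cantor case can be traced back
to a two-dimensional problem. As a sketch, assume the one-site mutation
matrix $\mu(\sigma^{(x,0)}+\sigma^{(0,x)}+\sigma^{(x,x)}-3)$ and let $a$
and $b$ denote the (unnormalized) weights of the wildtype and of each of
the three mutant states; these symbols are introduced here for illustration
only.

```latex
\begin{align*}
\dot{a} &= 3(\alpha-\mu)\,a + 3\mu\,b \,, \qquad
\dot{b} = \mu\,a - (\alpha+\mu)\,b \,, \\
0 &= \lambda^2 - 2(\alpha-2\mu)\,\lambda - 3\alpha^2
\;\;\Longrightarrow\;\;
\lambda_\pm = \alpha - 2\mu \pm 2Q \,, \\
w &= \lambda_+ = \alpha - 2\mu + 2Q \,, \qquad
w \to 3\alpha \;\; (\mu \to 0) \,, \qquad
w \simeq \frac{3\alpha^2}{4\mu} \;\; (\mu \gg \alpha) \,.
\end{align*}
```

The equilibrium fitness thus decays smoothly with increasing mutation rate,
in line with the absence of an error threshold.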
719: 
720: 
721: %\begin{equation}
722: %w = \alpha \left(2\sqrt{\left(\frac{\mu}{\alpha}\right)^2 - 
723: %\frac{\mu}{\alpha} + 1} - 2\frac{\mu}{\alpha} +1\right) 
724: %\end{equation}
725: 
726: 
727: \subsection{Quadratic fitness model: Equilibrium results}
728: 
729: In contrast to the additive case, no simple relation between surplus
730: and fitness is known in the case of the quadratic landscape as
731: long as $t$ or $N$ are kept finite. However, due to the permutation
732: invariance of the Hamiltonian, the individual fitness--surplus
733: relation (\ref{fit}) is recovered in the thermodynamic limit
734: for the corresponding mean values of the equilibrium population. 
735: We obtain in analogy to \cite{BBW2}:
736: \begin{equation} \label{surrel}
737: w = \lim_{t \to \infty} w(t) = \sum_{i=1}^3 \left(\alpha_i u_i 
738: + \frac{\gamma_i}{2} u_i^2 \right)
739: \end{equation}
740: and, from (\ref{zeit}), for the equilibrium variance of fitness per site
741: \begin{multline} \label{variance}
742: V_r = \lim_{t \to \infty} V_r(t) = 
743: 2(\mu_1+\mu_3)\left(\alpha_1 u_1 + \gamma_1 u_1^2\right) +
744: \\
745: 2(\mu_2+\mu_3)\left(\alpha_2 u_2 + \gamma_2 u_2^2\right) + 
746: 2(\mu_1+\mu_2)\left(\alpha_3 u_3 + \gamma_3 u_3^2\right)\;.
747: \end{multline}
748: %\begin{equation}
749: %\textswab{h}= \mu_1\sigma^{(x,0)}+\mu_2 \sigma^{(0,x)} +\mu_3 \sigma^{(x,x)} +%\gamma_1 m_1 \sigma^{(z,0)} + \gamma_2 m_2 \sigma^{(0,z)} + \gamma_3 m_3 
750: %\sigma^{(z,z)}
751: %\end{equation}
752: The key to the solution in the thermodynamic limit is now the minimum
753: principle of the physical free energy which translates to a maximum
754: principle for the equilibrium mean fitness. Maximizing
755: \begin{equation}
756: \langle \bm{x} | {\cal M} + {\cal R} | \bm{x} \rangle - 
757: w \big(\langle \bm{x} |\bm{x}  \rangle -1\big)
758: \end{equation}
759: with respect to $w$ and $\bm{x}$, we obtain, taking permutation symmetry of 
760: $\bm{x}$ into account, the following variational expression for $w$:
\begin{align} \nonumber
w(\bm{\alpha},\bm{\mu}&,\bm{\gamma}) \;\; = 
\sup_{m_1,m_2,m_3} \bigg[\alpha_1 m_1 + 
\alpha_2 m_2 + \alpha_3 m_3 + \frac{\gamma_1}{2} m_1^2 +
\frac{\gamma_2}{2} m_2^2 + \frac{\gamma_3}{2} m_3^2 +
\\ \nonumber
&\frac{\mu_1}{2}
\left(\sqrt{(1+m_2)^2-(m_1+m_3)^2}+\sqrt{(1-m_2)^2-(m_1-m_3)^2}-2\right)+
\\ \nonumber
&\frac{\mu_2}{2}
\left(\sqrt{(1+m_1)^2-(m_2+m_3)^2}+\sqrt{(1-m_1)^2-(m_2-m_3)^2}-2\right)+
\\ \label{fitm}
&\frac{\mu_3}{2}
\left(\sqrt{(1+m_3)^2-(m_1+m_2)^2}+\sqrt{(1-m_3)^2-(m_1-m_2)^2}-2
\right)\bigg]
\end{align}
779: where $m_i \in [-1,1]$ are the components of the physical
780: magnetization. Let us stress that, from the biological point of view, 
781: the translation to the physical framework seems a necessary technical 
782: step since we do not know of any variational principle for the 
783: biological model which works directly in $L^1$. We now take a closer 
784: look at two special cases.
785: 
786: \paragraph{Symmetric fitness model} For the symmetric 
787: {\em wildtype--mutant} model with $\alpha_i \equiv \alpha$, 
788: $\gamma_i \equiv \gamma$ and Jukes-Cantor mutation rate $\mu$, 
789: all components of the order parameters are equal, 
790: $m_i \equiv m$ and $u_i \equiv u$, respectively. 
791: Here, the variational expression (\ref{fitm}) for $w$ leads to the 
792: following self-consistency condition for $m$:
793: \begin{equation} \label{sc}
794: m = \frac{1}{3}\left[ 1 + \frac{2(\alpha + \gamma m) - \mu}
795: {\sqrt{(\alpha + \gamma m)^2 - \mu(\alpha + \gamma m) + \mu^2}}\right]\;.
796: \end{equation}
This is a quartic equation in $m$ and can be solved using the
standard formulas. However, since the explicit solution is rather
lengthy, we do not include it here, but give a qualitative
discussion instead.
801: 
Obviously, the relation has a unique real solution for any $\alpha$ and
$\mu$ whenever $\gamma$ is {\em negative}. As in the case of the 
two-state model, we thus obtain no phase transition for positive 
epistasis. In the following, we therefore concentrate our discussion 
on positive $\gamma$ (or negative epistasis). Note that, for
calculations in the thermodynamic limit, the fitness function $f$
(\ref{fit}), and hence the reproduction matrix ${\cal R}_\gamma$
(\ref{rgamma}), can always be used instead of the truncated form $\tilde{f}$
(\ref{fit2}), since the frequencies of genotypes with negative surplus 
vanish. For $\alpha_i \equiv 0$, this is due to spontaneous breaking of 
the extra $C_2 \times C_2$ symmetry of 
${\cal H} = {\cal M} + {\cal R}_\gamma$. 
814: 
815: In contrast to the two-state model, where a phase transition in the
816: thermodynamic limit is only found for zero external field, it turns
817: out that the present model has phase transitions for a whole range of
818: the linear fitness parameter $\alpha$ when epistasis is negative: 
819: For $\tilde{\alpha} := \alpha/\gamma$ in the interval
820: \begin{equation}
821: 0 \le \tilde{\alpha} < \frac{1}{3}
822: \left(\sqrt{\frac{4}{3}}-1\right) \simeq 0.0515668
823: \end{equation}
824: we find a first order phase transition of the system at
825: \begin{equation}
826: \tilde{\mu} := \frac{\mu}{\gamma} = \tilde{\mu}_c = \frac{2}{3}
827:  + 2 \tilde{\alpha}
828: \end{equation}
829: with a finite jump in the magnetization from $m_+$ to $m_-$ where
830: \begin{equation}
831: m_\pm = \frac{1}{3}\left(1 \pm
832: \sqrt{1 - 27 \tilde{\alpha}^2 - 18\tilde{\alpha}}\right)\;.
833: \end{equation}
834: From $m$ we derive the mean fitness $w$ using (\ref{fitm}), from $w$
835: we obtain the surplus $u$ via (\ref{surrel}) and, finally, the variance of
836: the fitness $V_r = 12\mu(\alpha u +\gamma u^2)$.
Looking at the surplus $u$, we also find a phase transition at 
$\tilde{\mu}= \tilde{\mu}_c$. Like $m$, it vanishes in the disordered
phase for $\alpha = 0$. Note, however, that the surplus is continuous at
the phase transition: $w$ is continuous, and $u$ is determined by $w$
through the relation (\ref{surrel}). In \cite{BBW2} it has been shown that
these differences between the biological and physical order parameters
arise from the change from classical to quantum mechanical probabilities
(resp.\ the change from $L^1$ to $L^2$) 
in translating the biological model into the physical one. We remark
845: that a different, discontinuous behaviour of the biological order 
846: parameter at a (physical) first order transition has been observed for 
847: the sharply peaked landscape in Eigen's quasispecies model \cite{FP}.
848: Mean fitness and its variance, magnetization, and surplus for different
849: values of $\alpha$ are shown below in Fig.~\ref{JC}.  
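
The equilibrium quantities of the symmetric model can be obtained
numerically from the variational expression. The sketch below evaluates
(\ref{fitm}) for $m_i \equiv m$, $\mu_i \equiv \mu$ on a grid and inverts
(\ref{surrel}) for the surplus. The grid search, the restriction to
$-1/3 \le m \le 1$ (where the square root is real) and the function names
are choices made for this illustration and assume $\gamma > 0$.

```python
import numpy as np

def functional(m, alpha, gamma, mu):
    """Symmetric case of the variational expression (m_i = m, mu_i = mu)."""
    root = np.sqrt(np.maximum((1 + 3 * m) * (1 - m), 0.0))
    return 3 * alpha * m + 1.5 * gamma * m**2 + 1.5 * mu * (root - 1 - m)

def equilibrium(alpha, gamma, mu, points=200001):
    """Return mean fitness w, magnetization m and surplus u (gamma > 0)."""
    m_grid = np.linspace(-1 / 3, 1.0, points)
    values = functional(m_grid, alpha, gamma, mu)
    best = int(np.argmax(values))
    w, m = values[best], m_grid[best]
    # Invert w = 3 (alpha u + gamma u^2 / 2) for the non-negative root.
    u = (-alpha + np.sqrt(alpha**2 + 2 * gamma * max(w, 0.0) / 3)) / gamma
    return w, m, u

# Zero linear part: ordered below and disordered above mu / gamma = 2/3.
w_ord, m_ord, u_ord = equilibrium(0.0, 1.0, 0.6)
w_dis, m_dis, u_dis = equilibrium(0.0, 1.0, 0.7)

# Small linear part: jump in m close to mu_c = 2/3 + 2 alpha / gamma.
mu_c = 2 / 3 + 2 * 0.02
m_below = equilibrium(0.02, 1.0, mu_c - 0.005)[1]
m_above = equilibrium(0.02, 1.0, mu_c + 0.005)[1]
```

A finer grid or a root search on (\ref{sc}) would give higher accuracy; the
grid is used here only because it finds the global maximum without further
assumptions.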
850: 
851: \begin{figure}[th]
852: \centerline{\epsfxsize=65mm \epsfysize=55mm \epsfbox{symfit.ps} 
853: \epsfxsize=65mm  \epsfysize=55mm \epsfbox{symvarfit.ps}}
854: \centerline{\epsfxsize=65mm \epsfysize=55mm \epsfbox{symsur.ps} 
855: \epsfxsize=65mm  \epsfysize=55mm \epsfbox{symmag.ps}}
856: \caption{Mean fitness and its variance, surplus and magnetization in
857:   the symmetric fitness model for various linear parts of the fitness
858:   function in the infinite sites limit.}
859: \label{JC}
860: \end{figure}
861: 
862: 
863: 
864: \paragraph{Transition--transversion model} In our second example, we
865: wish to distinguish mutations between like and unlike nucleotides. In 
866: a first step, we retain the symmetric fitness landscape 
867: $\gamma_1 = \gamma_2 = \gamma_3 \equiv \gamma$ (for simplicity
868: with vanishing linear part $\alpha = 0$), but let the relative
869: frequencies of transitions and transversions differ by assuming the 
870: {\em Kimura 2 parameter} mutation scheme, 
871: $\mu_1 = \mu_3 \equiv \mu \neq \mu_2$. 
872: 
873: \begin{figure}[ht]
874: \centerline{\epsfysize=60mm \epsfbox{nor1.ps}}
\caption{Phase diagram of the transition--transversion model with
symmetric fitness landscape and Kimura 2 parameter mutation
877: scheme. Solid and dotted lines correspond to first and second order
878: phase transitions, respectively. The dashed line indicates the  
879: Jukes-Cantor mutation scheme.}
880: \label{pd1}
881: \end{figure}
882: In the extended parameter space of the reduced mutation rates
883: $\tilde{\mu} = \mu/\gamma$; $\tilde{\mu}_2 = \mu_2/\gamma$, we now 
884: obtain a phase diagram with {\em three} distinct phases 
885: (see Fig.~\ref{pd1}).
886: \begin{itemize}
887: \item
888: For $\tilde{\mu}$ and $\tilde{\mu}_2$ sufficiently small, 
889: all three surplus components 
890: are positive, indicating genetic order with respect to the entire 
891: 4-letter alphabet of the nucleotides: {\em ACGT phase}.
892: \item
If we increase the mutation rate $\tilde{\mu}_2$ for low $\tilde{\mu}$, 
the system crosses over to a phase which no longer distinguishes 
between the different kinds of purines (A,G) and pyrimidines (C,T), but 
is still ordered with respect to transversions. This is the limiting
case described by the two-state model. We call this the {\em PP phase}.
898: \item
899: For higher mutation rates $\tilde{\mu},\tilde{\mu}_2$, we finally enter a
900: completely {\em disordered phase} with vanishing fitness and surplus.
901: \end{itemize}
902: In a second step, we now also let the mutation effects of transitions
903: and transversions differ and assume a fitness landscape
904: with $\gamma_2 = \gamma_3 \equiv \gamma$, but $\gamma_1 \neq \gamma$
905: in general. The changes in the phase diagram for increasing 
906: $\tilde{\gamma}_1 = \gamma_1/\gamma$ are shown in Fig.~\ref{pd2}. 
907: The phase transitions between the three phases may be first or second 
908: order. In general, we obtain the following phase space structure:
909: \begin{itemize}
910: \item
911: Phase transitions between the disordered and PP phase are second order and
912: located on the line $\tilde{\mu} = \tilde{\gamma}_1/2$. This phase
913: transition corresponds to the one also seen in the two-state model \cite{BBW}.
914: \item
915: The phase transition line between the ACGT and PP phases in 
916: general changes from first to second order with increasing 
917: mutation rate $\tilde\mu_2$ (see Figs.~\ref{pd1}, \ref{pd2}). 
918: For the second order transitions we derive, on 
919: expanding (\ref{fitm}) to lowest order in $m_2 = m_3$,
920: \begin{equation}
921: \mu = \frac{\gamma_1}{\gamma_1 + 2\gamma}
922: \sqrt{(\gamma_1 + \mu_2)(2\gamma-\mu_2)} \;.
923: \end{equation}
924: Numerically, we find that the first order transitions are 
925: located on a straight 
926: line up to $\tilde{\mu} = \tilde{\gamma}_1/2$ where the PP phase
927: changes into the disordered phase. The $\tilde{\mu}_2$-interval of
928: first-order transitions decreases for increasing $\tilde{\gamma}_1$. 
929: For $\tilde{\gamma}_1 \gtrapprox 8.45$, all phase transitions 
930: between the ACGT and PP phases are second order.
931: \item
932: Finally, for $\tilde{\gamma}_1 \le 4$, there are direct first order 
933: phase transitions between the ACGT phase and the disordered phase 
934: (for $\tilde{\mu}_2$ sufficiently small). For higher values of 
935: $\tilde{\gamma}_1$, these two phases are separated by the PP phase.
936: \end{itemize}
937: 
938: \begin{figure}[ht]
939: \centerline{\epsfxsize=43mm\epsfysize=35mm \epsfbox{pdg2.ps}
940: \epsfxsize=43mm\epsfysize=35mm \epsfbox{pdg4.ps}
941: \epsfxsize=43mm\epsfysize=35mm \epsfbox{pdg10.ps}}
942: \caption{Phase diagrams for anisotropic fitness landscapes $\gamma_1 > 
943: \gamma_2 = \gamma_3 \equiv \gamma$ and Kimura 2 parameter mutation
944: scheme. Solid and dotted lines correspond to first and second order
945: phase transitions, respectively.}
946: \label{pd2}
947: \end{figure}
948: As for the symmetric fitness function discussed above, there are no
949: compact analytic expressions for the fitness or the surplus in the
950: ACGT phase. In the PP phase, however, the following values for 
951: the mean fitness and the non-zero components of the mean surplus and the 
952: magnetization are found: 
953: \begin{equation}
954: w = \frac{\gamma_1}{2} \left(1 - \frac{2\mu}{\gamma_1}\right)^2  \quad ; \quad 
955: u_1 = 1 - \frac{2\mu}{\gamma_1} \quad ; \quad 
956: m_1 = \sqrt{1- \left(\frac{2\mu}{\gamma_1}\right)^2}\;. 
957: \end{equation}
958: The variance in fitness per site, finally, is proportional to the mean
959: fitness in the PP phase: $V_r = 8 \mu w$. Note that all these
960: expressions are independent of the transition rate $\mu_2$ and
961: directly comparable to the results of the two-state model
\cite{BBW,WBG} by identifying $\{++,+-\}$ with `$+$' and $\{-+,--\}$
963: with `$-$'. 
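
The PP phase values can be read off from (\ref{fitm}). As a sketch of the
steps, set $\alpha_i = 0$, $m_2 = m_3 = 0$ and $\mu_1 = \mu_3 = \mu$; the
$\mu_2$ term then vanishes identically, and (\ref{surrel}) and
(\ref{variance}) give the surplus and the variance.

```latex
\begin{align*}
w &= \sup_{m_1}\Big[\frac{\gamma_1}{2}\,m_1^2 
     + 2\mu\Big(\sqrt{1-m_1^2}-1\Big)\Big] \,, \\
0 &= \gamma_1 m_1 - \frac{2\mu\, m_1}{\sqrt{1-m_1^2}}
\;\;\Longrightarrow\;\;
\sqrt{1-m_1^2} = \frac{2\mu}{\gamma_1} \qquad (2\mu < \gamma_1) \,, \\
w &= \frac{\gamma_1}{2}\Big(1-\frac{4\mu^2}{\gamma_1^2}\Big)
     + 2\mu\Big(\frac{2\mu}{\gamma_1}-1\Big)
   = \frac{\gamma_1}{2}\Big(1-\frac{2\mu}{\gamma_1}\Big)^2 \,, \\
u_1 &= \sqrt{2w/\gamma_1} = 1-\frac{2\mu}{\gamma_1} \,, \qquad
V_r = 2(\mu_1+\mu_3)\,\gamma_1 u_1^2 = 4\mu\,\gamma_1 u_1^2 = 8\mu w \,.
\end{align*}
```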
964: 
965: 
966: \subsection{Quadratic fitness model: Finite sequence length} \label{fs}
967: 
968: For the Fujiyama model with independent sites, all the quantities
969: calculated here, means and variances per site in infinite populations, 
970: are independent of the assumed length $N$ of the sequences.
971: This is no longer the case for models including epistasis. In this 
972: subsection, we therefore present a quick numerical investigation of the 
973: symmetric fitness model
974: for finite system sizes and compare the results with those in the 
975: thermodynamic limit. Since the frequencies of genotypes with negative 
976: values of the surplus no longer vanish for finite sequences, we use
977: the truncated fitness function (\ref{fit2}), with $\gamma_i \equiv
978: \gamma > 0$ and $\alpha_i = 0$ for our calculations.
979: 
980: All results are obtained by direct numerical solution of the eigenvalue 
981: problem in the $[(N+1)(N+2)(N+3)/6]$-dimensional vector space of 
permutation invariant population vectors. Numerically precise
calculations have been performed up to $N = 60$ (39711-dim.); the results
are shown in Fig.~\ref{finite}. It is seen that the mean surplus and
the mean and the variance of the fitness rapidly approach the limiting
curves and behave qualitatively differently from the Fujiyama model
even for very small system sizes. We also show the finite-size
988: behaviour of the variance of the surplus $V_s$. Since this quantity 
989: vanishes as $1/N$, it is not obtainable from the leading order terms 
990: in the thermodynamic 
991: limit. In our finite size calculations, we rescale $V_s$ with the 
sequence length to obtain comparable results. Whereas $V_s$ is
monotonically increasing for the additive model (where $N V_s = 1-
u^2$), it runs through a maximum for quadratic fitness. Note that this 
995: maximum, in contrast to the variance of fitness, is located directly
996: at the error threshold. The behaviour is qualitatively similar to the
997: two-state model \cite{Oli}.
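
The dimension quoted above is the number of ways to distribute $N$ sites
over the four nucleotide states. A short sketch (with illustrative function
names) that counts and enumerates these occupation numbers:

```python
from math import comb

def invariant_dimension(n_sites, n_states=4):
    """Number of occupation-number vectors, i.e. (N+1)(N+2)(N+3)/6 for 4 states."""
    return comb(n_sites + n_states - 1, n_states - 1)

def occupation_numbers(n_sites):
    """All (n_pp, n_pm, n_mp, n_mm) with non-negative entries summing to N."""
    return [(a, b, c, n_sites - a - b - c)
            for a in range(n_sites + 1)
            for b in range(n_sites + 1 - a)
            for c in range(n_sites + 1 - a - b)]

dim_60 = invariant_dimension(60)
states_10 = occupation_numbers(10)
```

Each occupation-number vector labels one basis vector of the permutation
invariant subspace, so the list can serve as an index set for the reduced
eigenvalue problem.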
998: 
999: \begin{figure}[ht]
1000: \centerline{\epsfxsize=65mm \epsfysize=55mm \epsfbox{fit2.ps} 
1001: \epsfxsize=65mm  \epsfysize=55mm \epsfbox{varfit.ps}}
1002: \centerline{\epsfxsize=65mm \epsfysize=55mm \epsfbox{sur.ps} 
1003: \epsfxsize=65mm  \epsfysize=55mm \epsfbox{varsur.ps}}
1004: \caption{Equilibrium behaviour of fitness and surplus of the symmetric
1005:   fitness model with finite sequence length. Results for the Fujiyama
1006:   model with scaling $\alpha = \gamma/2$ are also shown.}
1007: \label{finite}
1008: \end{figure}
1009: Since there has been some discussion recently on the correct scaling 
of fitness values and mutation rates with the length of the sequence
(cf.~\cite{FP,BG}), let us finally remark that the finite size results in
1012: this and the next section show that our choice, keeping fitness and
1013: mutation rate {\em per site} fixed, is adequate for all quantities
1014: considered here. 
1015: 
1016: 
1017: \subsection{Quadratic fitness model: Time evolution}
1018: 
Originally, the error threshold was defined as an equilibrium 
phenomenon (cf.~\cite{ECS,BG}): For special forms of the fitness
1021: landscape, there is a finite critical value $\mu_c$ of the mutation 
1022: rate beyond which genetic order is no longer maintained by selection. 
1023: For the four-state model with quadratic fitness, this situation has been 
1024: discussed above.
1025: However, for a suitable fitness function, the threshold
1026: is not necessarily connected with high mutation rates. 
1027: In this subsection, 
1028: we consider the relaxation of a non-equilibrium population to 
1029: mutation-selection balance. It turns out that, depending on the
1030: starting configuration, an even stronger threshold effect may be 
1031: observed in the time evolution of the fitness and the surplus for
1032: all mutation rates below the critical equilibrium value.
1033: 
1034: \paragraph{Zero-mutation limit of the transition-transversion model}
1035: The essence of the threshold phenomenon in the time evolution is
1036: already contained in the selection dynamics alone. In a first step, we
1037: therefore disregard mutation altogether by working in the
1038: zero-mutation limit. Obviously, we then deal with a classical
1039: mean-field model on the physical side. As our starting configuration, 
1040: we choose the completely unstructured population with an equidistribution of
1041: genotypes $|\bm{p}_0\rangle = 4^{-N}|\Omega\rangle$. 
1042: In this particular situation, some progress is possible also
1043: analytically. Noting that
1044: \begin{equation}
1045: \langle \hat{C} \rangle(t) = 
1046: \frac{\langle \Omega|\hat{C} \exp(t {\cal
1047:     R})|\Omega\rangle}{\langle \Omega|\exp(t {\cal R})|\Omega\rangle}
1048:  = \frac{\text{tr}(\hat{C} \exp(t {\cal R}))}{\text{tr}(\exp(t {\cal R}))}
1049: \end{equation}
1050: for any element $\hat{C}$ of the algebra generated by
1051: $\{\sigma_i^{(z,0)},\sigma_i^{(0,z)}\}$, the biological and physical
1052: pictures coincide in this case. Using the fitness function
1053: of the transition-transversion model with 
1054: $\gamma_2 = \gamma_3 \equiv \gamma > 0$, we obtain the
1055: following implicit equations for the time evolution of the surplus 
1056: components:
1057: \begin{eqnarray}
1058: u &=& \frac{\sinh(2\gamma t u)} {\cosh(2\gamma t u) + 
1059: \exp[ -2\gamma_1 t(2u\coth(2\gamma t u) -1)]}
1060: \\[1mm]
1061: u_1 &=& \frac{\cosh[\gamma t Q(u_1)] - \exp(-2\gamma_1 t u_1)} 
1062: {\cosh[\gamma t Q(u_1)] + \exp(-2\gamma_1 t u_1)}
1063: \end{eqnarray}
1064: where
1065: \begin{equation}
1066: Q(u_1) = \sqrt{(1+u_1)^2-\exp(4\gamma_1 t u_1)(1-u_1)^2}\;.
1067: \end{equation}
1068: The resulting dynamical phase diagram is shown in Fig.~\ref{time1}. 
1069: As in the equilibrium situation, there are three phases.
1070: Depending on the ratio $\tilde{\gamma}_1 = \gamma_1/\gamma$, the 
1071: system directly crosses to an ordered phase after a sharply defined 
1072: waiting time $t_c$, or performs two consecutive transitions, entering 
1073: the PP phase in the first one.  
1074: 
1075: \begin{figure}[ht]
1076: \centerline{\epsfxsize=65mm \epsfysize=55mm \epsfbox{zeitpd.ps}
1077: \epsfxsize=65mm \epsfysize=55mm \epsfbox{surzg2.ps}}
1078: \caption{Dynamical phase diagram of the transition-transversion model
1079:   for vanishing mutation starting from the equidistribution. (Solid: 
1080:   first order; dashed: second order transition). Right: Time 
1081:   evolution of the surplus components for $\tilde{\gamma}_1 = 2$.}
1082: \label{time1}
1083: \end{figure}
1084: As in the equilibrium phase diagram, the dynamical transitions may 
1085: be of first or second order. 
1086: \begin{itemize}
1087: \item
1088: Second order transitions are located at
$\tilde{t} = \gamma t = 1$ for $\tilde{\gamma}_1 \le 1/4$ and at 
1090: $\tilde{t} = 1/\tilde{\gamma}_1$ for the transition from the
1091: disordered phase to the PP phase. The transition from the PP phase to
1092: the ACGT phase is second order above $\tilde{\gamma}_1 \approx 1.9009$
1093: and implicitly given through $2\tilde{t}_c = 1 +
1094: \exp[2\tilde{\gamma}_1(\tilde{t}_c - 1)]$. A similar second order 
1095: transition (with a one-component order parameter) has also been
1096: observed in the two-state model \cite{Wag,WBG}. 
1097: \item
1098: In an interval around the symmetry point $\gamma_1 = \gamma$, the
system possesses a first order transition (in the sense that there is a
finite jump in the magnetization). Note that, in contrast to the 
equilibrium case, the surplus and even the mean fitness are also
discontinuous on this line, giving rise to a rather pronounced threshold
effect in the evolution dynamics (cf.\ the solid line in Fig.~\ref{time2} 
for $\tilde{\gamma}_1 = 1$). 
1106: \end{itemize}
1107: As for the equilibrium values, we also consider the effect of finite
1108: sequence lengths on the time evolution. Again, calculations are
1109: performed by direct diagonalization of the symmetric fitness model 
($\tilde{\gamma}_1 = 1$). Fig.~\ref{time2} shows how the jump 
discontinuity in the mean fitness (internal energy) and the 
1112: delta function singularity in the variance of the fitness (specific heat)
1113: are approached by the finite systems. A threshold phenomenon is absent 
1114: in the time evolution of the Fujiyama model which is also shown 
1115: in Fig.~\ref{time2}.
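
For vanishing mutation, the finite-size time evolution can also be sketched
without any diagonalization, since ${\cal R}$ is diagonal in the genotype
basis. The example below sums over occupation numbers with multinomial
weights, starting from the equidistribution. It uses the untruncated
symmetric quadratic fitness (\ref{rgamma}) with $\alpha_i = 0$, which is an
assumption of this illustration, and all names are hypothetical.

```python
import numpy as np
from math import lgamma

def fitness_curve(n_sites, gamma, times):
    """Mean fitness per site w(t) for zero mutation, equidistributed start."""
    N = n_sites
    log_mult, r = [], []
    for a in range(N + 1):                  # a, b, c, d count ++, +-, -+, --
        for b in range(N + 1 - a):
            for c in range(N + 1 - a - b):
                d = N - a - b - c
                s1 = (a + b - c - d) / N    # surplus components of a genotype
                s2 = (a - b + c - d) / N
                s3 = (a - b - c + d) / N
                log_mult.append(lgamma(N + 1) - lgamma(a + 1) - lgamma(b + 1)
                                - lgamma(c + 1) - lgamma(d + 1))
                r.append(0.5 * N * gamma * (s1 * s1 + s2 * s2 + s3 * s3))
    log_mult, r = np.array(log_mult), np.array(r)
    curve = []
    for t in times:
        log_w = log_mult + t * r            # log of multiplicity * exp(t r)
        weights = np.exp(log_w - log_w.max())
        curve.append((weights @ r) / weights.sum() / N)
    return np.array(curve)

times = np.linspace(0.0, 10.0, 101)
w_t = fitness_curve(n_sites=12, gamma=1.0, times=times)
```

In this sketch, $w(0) = 3\gamma/(2N)$ is a pure finite-size effect, and
$w(t)$ increases monotonically towards $3\gamma/2$, as required by Fisher's
theorem in the absence of mutation.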
1116: \begin{figure}[ht]
1117: \centerline{\epsfxsize=65mm \epsfysize=55mm \epsfbox{fitzh0.ps} 
1118: \epsfxsize=65mm  \epsfysize=55mm \epsfbox{varfitzh0.ps}}
1119: \caption{Time evolution of the equidistribution of genotypes
1120:   in the zero mutation-limit of the symmetric fitness model for different
1121:   sequence lengths.}
1122: \label{time2}
1123: \end{figure}
1124: 
1125: 
1126: 
1127: 
1128: %\begin{figure}[ht]
1129: %\centerline{\epsfxsize=60mm \epsfysize=50mm \epsfbox{zvarfit1.ps} 
1130: %\epsfxsize=60mm  \epsfysize=50mm \epsfbox{zvarfit2.ps}}
1131: %\caption{Time evolution of the symetric fitness model for different
1132: %  starting configurations.}
1133: %\label{time1}
1134: %\end{figure}
1135: 
1136: 
1137: \paragraph{Finite mutation rates and different starting configurations}
1138: In a last step, we now discuss the influence of the mutation rate and
1139: the starting configuration on the evolution dynamics. Consider first the
time evolution of the equidistribution of genotypes
1141: $4^{-N}|\Omega\rangle$. Although no analytical results are available here, 
1142: we may give the following intuitive argument that there is a phase 
1143: transition at finite $t = t_c$ for any mutation rate below the
1144: critical equilibrium mutation rate $\mu_c$: Since mutation alone tries to
1145: keep the population in the equilibrium distribution, the evolution
1146: dynamics will be slowed down by mutation for small $t$. In particular,
1147: mean fitness and surplus will remain zero on a finite interval at
1148: least up to the threshold value of the corresponding model with
1149: vanishing mutation. On the other hand, the limiting values of $w$ and
1150: $u$ are finite for $\mu < \mu_c$, giving rise to a non-analytical
1151: point of $w(t)$ and $u(t)$ at some finite $t = t_c$. As shown in the
1152: upper graph of Fig.~\ref{time3}, this behaviour is clearly visible in 
1153: numerical results for finite sequence sizes. 
1154: \begin{figure}[ht]
1155: \centerline{\epsfxsize=130mm \epsfysize=45mm \epsfbox{zvarfit1a.ps}}
1156: \vspace*{-12mm}
1157: \centerline{\epsfxsize=130mm  \epsfysize=90mm \epsfbox{zvarfit2b.ps}}
1158: \caption{Time evolution of the variance of the fitness in the symmetric 
1159: fitness model with sequence length $N=60$. Results are shown for
1160: varying mutation rates and two different starting configurations.}
1161: \label{time3}
1162: \end{figure}
1163: 
1164: In order to contrast the time evolution of the unstructured population with
1165: an equidistribution of genotypes as starting configuration, we have
1166: also performed calculations for the opposite case of a population with
initially homogeneous phenotypes. Here, at $t=0$, any ``individual'' 
in the population has the same value $s_i = 0$ for the three surplus 
components. The result (for finite sequence length $N=60$) is shown 
in the lower graph of Fig.~\ref{time3}. As for the
1171: equidistribution of
1172: genotypes, there is a clear threshold effect in the time evolution for 
1173: any finite value $0<\mu<\mu_c$ of the mutation rate. The transition 
1174: appears to be particularly sharp for small mutation rates. In contrast to the
unstructured case, the critical waiting time $t_c$ for the transition
no longer increases monotonically with the mutation rate $\mu$, but
falls into two regimes. For mutation rates near the equilibrium 
threshold value $\mu_c$, the situation is similar to the unstructured
case: Here, single mutants with higher fitness appear in the
1180: population after a short while. Due to the continuing mutation
1181: pressure, however, a certain time is needed for these fitter
1182: individuals to grow to a finite proportion and to dominate the mean
1183: values in the infinite population. For small $\mu$, on the other hand, 
1184: the critical waiting time $t_c$ is dominated by the time needed for
1185: mutation to explore the configuration space and to generate 
1186: individuals with higher fitness at a sufficient rate. 
1187: 
1188: 
1189: \section{Discussion}
1190: 
1191: When in \cite{BBW} a class of models for sequence space evolution was 
1192: introduced, using the framework of Ising quantum chains, the calculations 
1193: started with four major simplifications of the biological situation. 
1194: These are the consideration of a two-state model, the assumption of an
1195: infinite sequence length, the use of simplistic fitness landscapes,
1196: and the restriction to infinite population sizes. In this paper, we
1197: have looked at the first two of these simplifying assumptions. 
1198: In addition, an extended discussion of the evolution dynamics of these 
1199: models has also been presented. In the following paragraphs, we give 
1200: a summary of our findings and an outlook on the remaining open problems.
1201: 
1202: \paragraph{Two-state versus four-state models.}
1203: The main concern of this contribution is the generalization of the
1204: modelling framework, introduced in \cite{BBW}, to four states
1205: (corresponding to the four nucleotides) on each site. The
1206: generalization presented makes use of the $C_2 \times C_2$ symmetry
1207: inherent in the {\em Kimura 3 ST} mutation scheme. On the `physical 
1208: side' this leads to a model of two coupled Ising quantum chains
1209: (rather than to a four-state Potts model). Compared with the two-state
1210: model, the extension can be thought of as consisting of two steps. In
1211: a first step, we represent the four states on each site by the spin
1212: values of two spins in decoupled chains. Note that already in this
1213: simplified model three phases occur in the phase diagram since the
1214: transition lines of the two decoupled chains will not in general 
1215: coincide. The second step consists of the introduction of 
1216: a more realistic mutation scheme which also changes the configuration 
1217: space topology and the corresponding use of a refined fitness landscape.
1218: Both these extensions lead to a coupling of the chains, and an even
1219: richer phase space structure is found, including first-order transitions.
1220: As may be seen from the introduction of a small linear field term into the
1221: fitness function in subsection 4.2, this change of the transition to
1222: first order leads to an increased robustness of the threshold
1223: phenomena with respect to symmetry-breaking perturbations.
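
As an illustration of the first step, consider one possible labelling of the
nucleotides by pairs of Ising spins $(\sigma,\tau) \in \{+1,-1\}^2$. The
particular assignment below is an assumption made here for concreteness; the
convention used in the main text may differ by a relabelling.
\begin{equation*}
\mathrm{A} \leftrightarrow (+,+)\,, \quad
\mathrm{G} \leftrightarrow (+,-)\,, \quad
\mathrm{C} \leftrightarrow (-,+)\,, \quad
\mathrm{T} \leftrightarrow (-,-)\,.
\end{equation*}
With this choice, a transition ($\mathrm{A} \leftrightarrow \mathrm{G}$,
$\mathrm{C} \leftrightarrow \mathrm{T}$) flips $\tau$ only, the transversions
$\mathrm{A} \leftrightarrow \mathrm{C}$ and $\mathrm{G} \leftrightarrow \mathrm{T}$
flip $\sigma$ only, and the transversions
$\mathrm{A} \leftrightarrow \mathrm{T}$ and $\mathrm{G} \leftrightarrow \mathrm{C}$
flip both spins. The three substitution classes of the {\em Kimura 3 ST}
scheme thus correspond to the three non-trivial elements of $C_2 \times C_2$.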
1224: 
1225: \paragraph{Finite sequence length.}
1226: Typical sequence lengths of enzymes or viruses are of the order $10^3$
1227: -- $10^4$. While these numbers are certainly far off the typical sizes of
1228: macroscopic systems in physics, they are, in principle, large enough
1229: to successfully suppress $1/N$-corrections. However, especially models 
1230: with simple fitness landscapes describe -- at best -- the evolution 
1231: dynamics in a very restricted configuration space of particularly 
1232: `important' sites, disregarding neutral or altogether lethal
1233: mutations. In view of this fact, consideration of finite sequence 
1234: lengths is indispensable and calculations in the thermodynamic
1235: limit even seem to be questionable at first sight. In order to clarify
1236: the usefulness of infinite-size methods in this context, we performed
1237: a number of numerical calculations for finite sequence lengths. The
1238: results are quite encouraging. As shown in subsection \ref{fs}, the 
1239: characteristic properties of the thermodynamic limit are well visible 
1240: even for tiny sequence sizes, such as $N = 10$, and the approximation 
1241: is already quantitatively reasonable for sequences of length $60$.   
1242: 
1243: \paragraph{The fitness landscape.}
1244: The construction of a tractable fitness landscape which nevertheless 
1245: comprises the relevant biology is certainly the major task for all
1246: these models. In this contribution, in order to obtain at least some 
1247: analytical 
1248: results, we have chosen a fitness function from the smooth end of the 
1249: landscape zoo. Due to its permutation invariance, the quadratic
1250: fitness function effectively disregards any local variance in
1251: the interaction between sites, but only considers the average epistatic
1252: effect. As such, it is in many respects certainly no more than a 
1253: toy model for evolution. However, the assumption of permutation
1254: invariance of the sites is quite common in evolutionary biology and
1255: comprises a large number of standard models for evolution, such as the
1256: quadratic optimum model or Eigen's original sharply peaked landscape.
1257: The results show that the essential structure responsible
1258: for characteristic effects such as the error threshold is already
1259: contained in this simplified framework and may 
1260: also serve as a reference for future work on fitness functions
1261: with increased ruggedness, such as the NK-landscape hierarchy \cite{KL}.
1262: Here, we expect the results for the quadratic fitness model to be
1263: qualitatively stable at least under certain forms of mild ruggedness,
1264: such as the introduction of site-randomness in the fields and
1265: interactions \cite{DK}. Pronounced changes, on the other hand, should
1266: be expected when spin-glass effects come into play.
1267: 
1268: \paragraph{Finite population size.}
1269: In going from the deterministic limit to the evolution of finite 
1270: populations, the ordinary differential equation (\ref{paramuse}) has 
1271: to be replaced by the master equation of a stochastic process which is
1272: no longer covered by the theoretical framework presented in this
1273: article. Due to the complexity of the stochastic equations, analytical results
1274: seem to be out of reach at present for all but the simplest selection
1275: schemes. Monte-Carlo simulations, however, should be possible and
1276: could considerably add to theoretical insight here.
1277: 
1278: Although the general picture of the deterministic case should persist
1279: at least for sufficiently large populations, the study of finite
1280: population effects is certainly of importance.
1281: For related models, such as the quasispecies model with the
1282: {\em single peaked} landscape, it has been found \cite{NS}
1283: that the deterministic 
1284: results can be interpreted as the time averages of the stochastic
1285: process for mutation rates outside a certain interval around an error 
1286: transition. Directly at the threshold, however, large fluctuations and 
1287: a jump in the long-time averages appear in the stochastic system at a critical
1288: mutation rate which seems to be lower by an amount roughly
1289: proportional to $1/\sqrt{N}$ in comparison with the deterministic case.
1290: Mainly because of these expected finite population effects we have
1291: restricted discussions in this article entirely to the phase space 
1292: structure of the models and the order of the phase transitions. Any further 
1293: details of the transitions, even critical exponents, will presumably 
1294: never be visible in real biological systems and thus seem to be
1295: of limited relevance in this context.
1296: 
1297: Let us finally remark that, although biological populations are
1298: certainly finite, the consideration of the infinite population limit
1299: is not (only) a technical necessity, but also of direct importance for the 
1300: study of the error threshold. That is so because this effect, in distinction
1301: to the phenomenon of Muller's ratchet, is {\em by definition} not due to 
1302: genetic drift, but solely due to the form of the fitness function. It
1303: must therefore always be shown that the threshold effect persists even
1304: for infinitely large population sizes. 
1305: 
1306: 
1307: \paragraph{Error threshold behaviour.} 
1308: 
1309: Since there are several, sometimes conflicting, definitions of
1310: the error threshold in the literature (cf.\ the discussion in \cite{BG}), 
1311: let us start this paragraph with a few clarifying remarks. In this 
1312: article,
1313: following \cite{BG}, we use the notion of the error threshold as
1314: equivalent to phase transitions. As such, a clear-cut mathematical
1315: definition (as non-analytical points in the mean fitness) is possible
1316: only in the infinite sites (or thermodynamic) limit. However, since
1317: the thermodynamic limit can be considered as an excellent
1318: approximation already for rather small systems, the infinite system
1319: property gives a valid explanation for prominent features which are
1320: observable for finite sequences as well. In our study, we have always
1321: considered sequences of a fixed length and have treated the mutation
1322: rate per site as the variable driving the transition. In comparing
1323: systems of different length, we have scaled the variables such that a
1324: well-defined limit is approached as $N \to \infty$. In particular, the 
1325: `critical' mutation rate per site in a finite system quickly converges
1326: to the limiting value $\tilde{\mu}_c$. 
1327: Originally, the threshold was viewed as a limiting factor on 
1328: the sequence length \cite{E}. This, however, should cause no confusion: 
1329: We switch to this latter picture simply by letting the reduced 
1330: mutation rate depend linearly on the sequence length, 
1331: $\tilde{\mu} \sim N$, and obtain a critical length 
1332: $N_c \sim \tilde{\mu}_c$ (for sufficiently large sequences).
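
The change of variables can be made explicit in a minimal sketch. Suppose the
reduced mutation rate grows as $\tilde{\mu} = \kappa N$, where the
proportionality constant $\kappa > 0$ is introduced here purely for
illustration. For sufficiently large sequences, where the finite-size critical
rate is close to $\tilde{\mu}_c$, the threshold condition
$\tilde{\mu} = \tilde{\mu}_c$ then reads
\begin{equation*}
\kappa N_c = \tilde{\mu}_c
\quad \Longleftrightarrow \quad
N_c = \frac{\tilde{\mu}_c}{\kappa} \,,
\end{equation*}
so that sequences with $N < N_c$ stay below the threshold, while longer
sequences lie beyond it.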
1333: 
1334: Our results on the error threshold phenomenon fit previous ones for
1335: the two-state case and related models in that negative epistasis is
1336: needed to observe a transition (cf.\ \cite{W,BG}).
1337: Contrary to the two-state case, the threshold corresponds to a
1338: first-order transition for certain parameter ranges and persists for
1339: a sufficiently small linear part in the fitness function. Both the
1340: equilibrium and the dynamical phase diagram of the 
1341: transition-transversion model (with $\alpha_i = 0$), 
1342: possess two ordered phases characterized by non-zero values of one or 
1343: all three components of the surplus order-parameter and the disordered 
1344: phase with zero surplus where selection ceases to operate. The
1345: threshold effect appears to be especially sharp in the evolution
1346: dynamics, where a jump in the mean surplus and fitness and a delta
1347: singularity in the variance of fitness occur.
1348: 
1349: Besides the threshold effect, however, other properties of
1350: mutation-selection models may be studied within the framework
1351: presented. After all, exclusive concentration on phase
1352: transitions is perhaps too much a physicist's point of view on these
1353: systems. The relations between surplus, mutation rate and the variance of
1354: fitness (\ref{zeit}), (\ref{variance}), for example, are valid for the entire
1355: time evolution and arbitrary mutation rates. Depending on the fitness 
1356: function applied, they may give rise to characteristic features also 
1357: far off the transition point. This is particularly explicit for the
1358: equilibrium variance of fitness which runs through a pronounced
1359: maximum for fitness functions with negative epistasis at a mutation
1360: rate much smaller than the threshold value.
1361: 
1362: \section*{Acknowledgments}
1363: 
1364: It is our pleasure to thank Ellen Baake and Oliver Redner for numerous
1365: discussions and comments on the manuscript. Financial support from the
1366: German Science Foundation (DFG) is gratefully acknowledged.
1367: 
1380: 
1381: 
1382: 
1383: 
1384: 
1385: \begin{thebibliography}{99}
1386: \bibitem{B}
1387: E.\ Baake,
1388:    Diploid models on sequence space,
1389:    {\it J.\ Biol.\ Syst.\/} {\bf 3} (1995) 343--9.
1390: \bibitem{BBW} 
1391: E.\ Baake, M.\ Baake and H.\ Wagner,
1392:    Ising quantum chain is equivalent to a model of biological evolution,
1393:    {\it Phys.\ Rev.\ Lett.\/} {\bf 78} (1997) 559--62; Erratum: 
1394:    {\it Phys.\ Rev.\ Lett.\/} {\bf 79} (1997) 1782.
1395: \bibitem{BBW2}
1396: E.\ Baake, M.\ Baake and H.\ Wagner, 
1397:    Quantum mechanics versus classical probability in biological evolution,
1398:    {\it Phys.\ Rev.\/} {\bf E57} (1998) 1191--2.
1399: \bibitem{BG}
1400: E.\ Baake and W.\ Gabriel,
1401:    Biological evolution through mutation, selection, and drift: An introductory
1402:    review,
1403:    {\it Ann.\ Rev.\ Comput.\ Phys.\/} {\bf 7} 
1404:    ({\em in press}, cond-mat/9907372).
1405: \bibitem{CK}
1406: J.\ Crow and M.\ Kimura,
1407:    {\em An Introduction to Population Genetics Theory}, Harper \& Row
1408:    (New York 1970).
1409: \bibitem{DK}
1410: N.G.~Duffield and R.~K\"uhn,
1411:    The thermodynamics of site-random mean-field quantum spin systems,
1412:    {\it J.\ Phys.\/} {\bf A22} (1989) 4643--58.
1413: \bibitem{E}
1414: M.\ Eigen,
1415:    Selforganization of matter and the evolution of biological
1416:    macromolecules,
1417:    {\it Naturwiss.\/} {\bf 58} (1971) 465--523.
1418: \bibitem{ECS}
1419: M.\ Eigen, J.\ McCaskill and P.\ Schuster,
1420:    The molecular quasi-species,
1421:    {\it Adv.\ Chem.\ Phys.\/} {\bf 75} (1989) 149--263.
1422: \bibitem{Fish}
1423: R.A.~Fisher,
1424:    {\em The Genetical Theory of Natural Selection}, Clarendon Press
1425:    (Oxford 1930).
1426: \bibitem{FP}
1427: S.~Franz and L.~Peliti,
1428:    Error threshold in simple landscapes,
1429:    {\it J.~Phys.\/} {\bf A26} (1993) 4481--7.
1430: \bibitem{FPS}
1431: S.~Franz, L.~Peliti, and M.~Sellitto,
1432:    An evolutionary version of the random energy model,
1433:    {\it J.\ Phys.\/} {\bf A26} (1993) L1195--9.
1434: \bibitem{Gal}
1435: S.~Galluccio,
1436:    Exact solution of the quasispecies model in a sharply-peaked
1437:    landscape,
1438:    {\it Phys.\ Rev.\/} {\bf E56} (1997) 4526--39. 
1439: \bibitem{KL}
1440: S.A.~Kauffman and S.A.~Levin,
1441:    Towards a general theory of adaptive walks on rugged landscapes,
1442:    {\it J.\ Theor.\ Biol.\/} {\bf 128} (1987) 11--45.
1443: \bibitem{Kogut}
1444: J.~Kogut,
1445:    An introduction to lattice gauge theory and spin systems,
1446:    {\it Rev.\ Mod.\ Phys.\/} {\bf 51} (1979) 656--713.
1447: \bibitem{Leut}
1448: I.\ Leuth\"ausser,
1449:    An exact correspondence between Eigen's evolution model and a 
1450:    two-dimensional Ising system,
1451:    {\it J.\ Chem.\ Phys.\/} {\bf 84} (1986) 1884--5.
1452: \bibitem{Leut2}
1453: I.~Leuth\"ausser,
1454:    Statistical mechanics of Eigen's evolution model,
1455:    {\it J.~Stat.~Phys.\/} {\bf 48} (1987) 343--60.
1456: \bibitem{Li}
1457: W.-H.\ Li,
1458:    {\it Molecular Evolution}, Sinauer (Sunderland, 1997).
1459: \bibitem{MT}
1460: K.~Malarz and D.~Tiggemann,
1461:    Dynamics in Eigen's evolution model,
1462:    {\it Int.\ J.\ Mod.\ Phys.\/} {\bf C9} (1998) 481--90.
1463: \bibitem{NS}
1464: M.~Nowak and P.~Schuster,
1465:    Error thresholds of replication in finite populations. Mutation
1466:    frequencies and the onset of Muller's ratchet,
1467:    {\it J.\ Theor.\ Biol.\/} {\bf 137} (1989) 375--95.
1468: \bibitem{OB}
1469: P.~O'Brien,
1470:    A genetic model with mutation and selection,
1471:    {\em Math.~Biosci.\/} {\bf 73} (1985) 239--51.
1472: \bibitem{Oli}
1473: O.\ Redner,
1474:    {\em private communication} (1999).
1475: \bibitem{Sta}
1476: P.\ Stadler,
1477:    Landscapes and their correlation functions,
1478:    {\em J.\ Math.\ Chem.\/} {\bf 20} (1996) 1--45.
1479: \bibitem{SOWH}
1480: D.\ Swofford, G.\ Olsen, P.\ Waddell and D.\ Hillis,
1481:    Phylogenetic inference, in: M.\ Hillis, C.\ Moritz and E.\ Mable (Eds.): 
1482:    {\em Molecular Systematics}, Sinauer (Sunderland, 1995), pp.\ 407--517.
1483: \bibitem{Tara}
1484: P.~Tarazona,
1485:    Error thresholds for molecular quasispecies as phase
1486:    transitions: From simple landscapes to spin-glass models,
1487:    {\it Phys.\ Rev.\/} {\bf A45} (1992) 6038--50.
1488: \bibitem{TM}
1489: C.J.\ Thompson and J.L.\ McBride,
1490:    On Eigen's theory of the self-organization of matter and the evolution of 
1491:    biological macromolecules,
1492:    {\it Math.\ Biosci.\/} {\bf 21} (1974) 127--42.
1493: \bibitem{Wag}
1494: H.\ Wagner,
1495:    {\em Biologische Sequenzraummodelle und Statistische Mechanik},
1496:    PhD thesis, University of T\"ubingen, Dissertations Druck
1497:    (Darmstadt 1998).
1498: \bibitem{WBG}
1499: H.\ Wagner, E.\ Baake and T.\ Gerisch,
1500:    Ising quantum chain and sequence evolution,
1501:    {\it J.\ Stat.\ Phys.\/} {\bf 92} (1998) 1017--52.
1502: \bibitem{W}
1503: T.~Wiehe,
1504:    Model dependency of error thresholds: the role of the fitness
1505:    functions and contrasts between the finite and infinite sites
1506:    models,
1507:    {\it Genet.\ Res.\ Camb.\/} {\bf 69} (1997) 127--36.
1508: \end{thebibliography}
1509: \end{document}
1510: 
1511: 
1512: 
1513: