% 0501:q-bio0501028/final.tex

1: \documentstyle[prl,aps,epsfig]{revtex}

2: \renewcommand{\huge}{\large}

3: \renewcommand{\LARGE}{\large}

4: \renewcommand{\Large}{\large}

5: \thispagestyle{empty}

6: \begin{document}

7: \def\be{\begin{equation}}

8: \def\ee{\end{equation}}

9: \def\bea{\begin{eqnarray}}

10: \def\eea{\end{eqnarray}}

11: \def\bml{\begin{mathletters}}

12: \def\eml{\end{mathletters}}

13: \def\l{\label}

14: \def\b{\bullet}

15: \def\eqn#1{(~\ref{eq:#1}~)}

16: \def\no{\nonumber}

17: \def\av#1{{\langle  #1 \rangle}}

18:

19: %=============================================================================

20: %=============================================================================

21: \title{Evolutionary trajectories in rugged fitness landscapes}

22:

23: \author{Kavita Jain and Joachim Krug}

24: \address{Institut

25: f\"ur Theoretische Physik, Universit\"at zu K\"oln, Z\"ulpicher Strasse 77,

26: 50937 K\"oln, Germany}

27: \maketitle

28: \widetext

29:

30: \begin{abstract}

31: We consider the evolutionary trajectories traced out by an infinite population

32: undergoing mutation-selection dynamics in static, uncorrelated random

33: fitness landscapes. Starting from the population that consists of a single

34: genotype, the most populated genotype \textit{jumps} from a

35: local fitness maximum to another and eventually reaches the global maximum.

36: We use a strong selection limit, which reduces the dynamics beyond the

37: first time step to the competition between independent mutant subpopulations,

38: to study the dynamics of this model and of a simpler one-dimensional model

39: which ignores the geometry of the sequence space.

40: We find that the fit genotypes that appear along a trajectory are a subset

41: of suitably defined fitness \textit{records}, and exploit several results

42: from the record theory for non-identically distributed random variables.

43: The genotypes that contribute to the trajectory are those records that are

44: not \textit{bypassed} by superior records arising further away from the

45: initial population. Several conjectures concerning the statistics of

46: bypassing are extracted from numerical simulations. In particular, for the

47: one-dimensional model, we propose a simple relation between the bypassing

48: probability and the dynamic exponent which describes the scaling of the

49: typical evolution time with genome size. The latter can

50: be determined exactly in terms of the extremal properties of the

51: fitness distribution.

52: \vskip0.5cm

53: \noindent PACS numbers: {87.10.+e, 87.23.Kg, 05.40.-a}

54: \end{abstract}

55:

56: %=============================================================================

57: %INTRODUCTION

58: %=============================================================================

59: \section{Introduction}

60: \l{intro}

61:

62:

63: The episodic nature of biological evolution has provided motivation for much

64: work on the modeling of evolutionary dynamics in the statistical physics

65: community \cite{Sneppen95,Sibani98}; see

66: \cite{Peliti97,Baake99,Drossel01} for reviews. Evolution displays

67: a \textit{punctuated} pattern, with epochs of no or slow change

68: interspersed with

69: bursts of (relatively) rapid activity, on various levels ranging from

70: the fossil record \cite{Eldredge89,Gould93} to experiments with microbial

71: populations \cite{Lenski94,Elena96,Burch99,Elena03}. Punctuated behavior is

72: also seen in simulations of \textit{in vitro} evolution

73: of RNA molecules \cite{Schuster02}, optimization algorithms

74: \cite{Rujan88,vanNimwegen97}, and artificial life \cite{Adami95}.

75: It has been recognized for a long time that one possible

76: scenario that is consistent with punctuated dynamics is the evolution

77: of a population in a static fitness landscape with many peaks.

78: In this picture, the two modes of evolution correspond to the extended

79: periods of residence of the population at a local fitness maximum

80: and the exploration of a new, higher lying peak, respectively,

81: which entails the rapid crossing of a valley of lower fitness

82: \cite{Newman85,Lande85}.

83: In essence, this is the phenomenon of quantum evolution described by

84: the paleontologist G.G. Simpson sixty

85: years ago \cite{Simpson44}, and more recently referred to in

86: macroevolutionary theory

87: as punctuated gradualism \cite{Eldredge89}.

88: If the population is large, so that the distribution of individuals over the

89: various phenotypes or

90: genotypes can be modeled by a continuous field, the transition between

91: fitness peaks described above displays analogies with physical processes

92: such as quantum tunneling \cite{Ebeling84}, variable range hopping

93: \cite{Zhang86} or noise-driven barrier crossing.

94:

95: Since this metastable behavior seems ubiquitous in nature, it may be

96: worthwhile to study a simple model where this can be

97: analysed in detail. A convenient mathematical framework

98: to address this issue is the quasispecies model which was originally

99: introduced to describe large populations of self-replicating

100: macromolecules \cite{Eigen71,Eigen89}. The quasispecies model is a

101: mutation-selection model whose steady state has been studied in great detail.

102: For various choices of fitness landscapes, it exhibits

103: the phenomenon of error threshold in which beyond a

104: critical mutation rate, the population delocalises over the whole sequence

105: space.

106:

107: In this paper, we address the question of punctuated \textit{dynamics} in the

108: quasispecies model. To avoid complications associated

109: with the error threshold phenomenon \cite{Baake99,Eigen89}, we work

110: in a strong selection limit inspired by the zero temperature limit

111: of the statistical mechanics of disordered systems \cite{Krug02}

112: (for a different kind of strong selection

113: limit used in population genetics see e.g. \cite{Woodcock96}).

114: In this limit, the location of the population in the space of genotypes

115: can be identified with the most populated genotype at all times,

116: and the evolutionary trajectories can be represented in a particularly

117: transparent manner \cite{Krug03}.

118: Since the population is located at a single genotype at any time, the

119: evolutionary trajectory changes in a stepwise manner. To generate an

120: evolutionary

121: trajectory, a localized population is placed at a randomly chosen point

122: in the space of all possible genotypes. Because the population is infinite, in the

123: next time step, all genotypes get occupied with nonzero population.

124: As we describe later in detail, it turns out that

125: the fit genotypes typically receive a small initial

126: population. Strong selection then reduces the problem to competition

127: between these fit subpopulations struggling to overcome the poor

128: initial conditions.

129:

130:

131: In the next section, we describe

132: the quasispecies model and derive the reduced representation

133: that arises in the strong selection limit. The bulk of the

134: paper is then devoted to the analysis of the strong selection dynamics, which

135: turns out to be rather closely related to the mathematical theory of records

136: \cite{Glick78,Nevzorov87,Arnold98,Nevzorov01}; a relation between

137: biological evolution and record statistics has been proposed

138: previously \cite{Sibani98,Kauffman87}, see also \cite{Krug04}.

139:

140: %=============================================================================

141: %Model

142: %=============================================================================

143: \section{Quasispecies dynamics in the strong selection limit}

144: \l{model}

145:

146: The quasispecies model is defined on the space of genotypes represented by

147: sequences $\sigma \equiv \{ \sigma_{1},...,\sigma_{N} \} $,

148: where each of the $N$ letters $\sigma_{i}$ is taken from an alphabet of size

149: $\ell \geq 2$. The number of individuals of genotype $\sigma$

150: at time $t$ is represented

151: by a real variable $Z(\sigma,t)$ that obeys the discrete time

152: evolution equation

153: \be

154: Z(\sigma,t+1)= \sum_{{\sigma}^{\prime}}

155: p(\sigma^{\prime} \rightarrow \sigma) \; W(\sigma^{\prime})

156: \; Z(\sigma^{\prime},t).

157: \l{full}

158: \ee

159: Here the fitness $W(\sigma)$ is defined \cite{Peliti97} as

160: the expected number of offspring produced by an

161: individual carrying sequence $\sigma$,

162: and $p(\sigma^{\prime} \rightarrow \sigma)$ is the probability that

163: genotype $\sigma$ is created as offspring of genotype $\sigma'$ due to

164: copying errors in the genome. A simple choice for the latter

165: corresponds to independent point mutations occurring with

166: probability $\mu$ per letter per generation, so that

167: $p(\sigma^{\prime} \rightarrow \sigma)=

168: (\mu/(\ell-1))^{d(\sigma^{\prime},\sigma)}

169: (1-\mu)^{N-d(\sigma^{\prime},\sigma)}$,

170: where

171: \be

172: d(\sigma^{\prime},\sigma) = \sum_{i=1}^N (1 - \delta_{\sigma_i,\sigma_i'})

173: \;\;, \l{Hamming}

174: \ee

175: is the Hamming distance between sequences $\sigma^{\prime}$ and $\sigma$.

176:

177: A constraint of fixed population size could be enforced by dividing

178: the right hand side of (\ref{full}) by the population averaged fitness

179: $\langle W \rangle$. This introduces an (inessential)

180: nonlinearity into the problem \cite{Peliti97}, which is why we prefer to work

181: with the

182: unnormalized population variables $Z(\sigma,t)$.

183: It is clear that within the quasispecies framework, which requires an

184: infinite population from the outset, the actual population size cannot

185: play any role.

186:

187: To derive the strong selection limit \cite{Krug02}, we

188: write $Z(\sigma,t)=e^{\kappa E(\sigma,t)}$,

189: $W(\sigma)=e^{\kappa F(\sigma)}$

190: and $\mu=e^{-\kappa}$ where $\kappa$ is the inverse selective

191: temperature \cite{Peliti97,Franz97}

192: and $E(\sigma,t)$ and $F(\sigma)$ are logarithmic population and

193: fitness variables, respectively. Throughout this paper we take

194: the fitness landscape to be completely uncorrelated, which implies that

195: the fitnesses $F(\sigma)$ are independent, identically

196: distributed (i.i.d.) quenched random variables chosen

197: from a distribution $p(F)$

198: with support on the interval $[F_{\mathrm{min}},F_{\mathrm{max}}]$.

199: The strong selection limit corresponds to $\kappa \rightarrow \infty$,

200: which yields

201: \be

202: E(\sigma, t+1)= \mbox{max}_{\sigma^\prime} \left[ E(\sigma^\prime,t)+

203: F(\sigma^\prime)-d(\sigma,\sigma^\prime) \right] \l{fullss} \;\;.

204: \ee

205: Starting with an initial condition

206: $Z(\sigma,0)=\delta_{\sigma,\sigma^{(0)}}$ in which only a

207: single, randomly chosen

208: sequence $\sigma^{(0)}$ has nonzero population, at the next time step each

209: sequence gets seeded with the logarithmic population

210: \be

211: \l{sslimit2}

212: E(\sigma,1) = F(\sigma^{(0)}) - d(\sigma,\sigma^{(0)}).

213: \ee

214: In terms of the original variables $Z(\sigma,t)$, this

215: corresponds to a population distribution that decays exponentially

216: with increasing Hamming distance from $\sigma^{(0)}$.

217: Remarkably, it turns out that for the subsequent time evolution the

218: mutations are unimportant for genotypes with

219: high fitness. Since we are primarily interested in such genotypes, the

220: dynamics of the model can be approximated by allowing each genotype

221: to reproduce with its intrinsic rate $F(\sigma)$ for $t > 1$.

222: In terms of the logarithmic variables, this leads to

223: \be

224: E(\sigma,t) = E(\sigma,1)+(t-1) F(\sigma). \l{sslimit1}

225: \ee

226: Thus, (\ref{fullss}) reduces to a problem of non-interacting sequences whose

227: logarithmic populations grow linearly in time.

228: The approximation of ignoring mutations after the seeding

229: stage was tested numerically \cite{Krug03} and found to be in

230: very good agreement with the full model described by (\ref{fullss}).
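
The step from (\ref{full}) to (\ref{fullss}) can be sketched as follows; this is a heuristic outline in the notation introduced above, assuming that the logarithmic variables $E$ and $F$ stay finite as $\kappa \rightarrow \infty$, and not a rigorous derivation. Inserting $Z=e^{\kappa E}$, $W=e^{\kappa F}$ and $\mu=e^{-\kappa}$ into (\ref{full}) gives
\[
e^{\kappa E(\sigma,t+1)} = \sum_{\sigma^{\prime}}
(\ell-1)^{-d(\sigma^{\prime},\sigma)} \;
\left( 1-e^{-\kappa} \right)^{N-d(\sigma^{\prime},\sigma)} \;
e^{\kappa \left[ E(\sigma^{\prime},t)+F(\sigma^{\prime})-d(\sigma^{\prime},\sigma) \right]} \;\;.
\]
For fixed $N$ and $\ell$, the first prefactor does not depend on $\kappa$ and the second tends to unity. Taking the logarithm, dividing by $\kappa$, and using that a finite sum of exponentials is dominated by its largest term,
\[
\lim_{\kappa \rightarrow \infty} \frac{1}{\kappa} \ln \sum_{\sigma^{\prime}}
c_{\sigma^{\prime}} \; e^{\kappa A_{\sigma^{\prime}}} = \mbox{max}_{\sigma^{\prime}} \; A_{\sigma^{\prime}}
\]
for positive weights $c_{\sigma^{\prime}}$ that remain bounded and bounded away from zero, one recovers (\ref{fullss}) with $A_{\sigma^{\prime}} = E(\sigma^{\prime},t)+F(\sigma^{\prime})-d(\sigma,\sigma^{\prime})$.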

231:

232: A further simplification can be made by noting that

233: due to (\ref{sslimit2}), the population $E(\sigma,1)$

234: is the same for all the sequences that lie

235: on a shell of constant Hamming distance $d(\sigma^{(0)},\sigma)$ away from the

236: sequence $\sigma^{(0)}$. Since we will be interested in the sequence with

237: the largest

238: population at any instant, only the sequence with the largest

239: fitness within a shell needs to be considered. Labeling a shell by its

240: Hamming distance $k=0,1,...,N$ from the  sequence $\sigma^{(0)}$, we arrive

241: at the {\it shell model} \cite{Krug03}

242: in which the population $E(k,t)$ of the fittest sequence

243: in the shell $k$ with a total number of $\alpha_{k}$ sequences obeys

244: \be

245: \l{lineq}

246: E(k,t)=-k + F(k) t \l{shell}.

247: \ee

248: Here $F(k)$ is the maximum of $\alpha_k$

249: i.i.d. random variables drawn from the fitness

250: distribution $p(F)$; hence the $F(k)$ are independent

251: but non-identically distributed random variables.

252: To arrive at the simple form (\ref{shell}), we have redefined the

253: population as $E(k,t)+F(k)-F(0)$.

254: The number of sequences in shell $k$ is given by the expression

255: \be

256: \l{alphak}

257: \alpha_{k}= {N \choose k} \; (\ell-1)^{k},

258: \ee

259: as can be seen by noting that

260: there are ${N \choose k}$ ways of choosing $k$ letters at which a

261: sequence $\sigma$ differs from $\sigma^{(0)}$, and each

262: of these $k$ letters can take $\ell-1$ values.

263: For large $N$, the majority of sequences is contained in a belt of width

264: $\sim \sqrt{N}$

265: around the distance $k_{\mathrm{max}} = N (\ell -1)/\ell$

266: where (\ref{alphak}) is peaked.
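
The location and width of the peak of (\ref{alphak}) follow from an elementary observation, spelled out here for convenience. Since $\sum_{k=0}^{N} \alpha_{k} = \ell^{N}$, the normalized shell sizes
\[
\frac{\alpha_{k}}{\ell^{N}} = {N \choose k} \left( \frac{\ell-1}{\ell} \right)^{k}
\left( \frac{1}{\ell} \right)^{N-k}
\]
form a binomial distribution with success probability $(\ell-1)/\ell$. Its mean is $N(\ell-1)/\ell = k_{\mathrm{max}}$ and its variance is $N(\ell-1)/\ell^{2}$, so that the standard deviation $\sqrt{N(\ell-1)}/\ell$ scales as $\sqrt{N}$. For binary sequences ($\ell=2$), this gives $k_{\mathrm{max}}=N/2$ and a standard deviation of $\sqrt{N}/2$.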

267:

268: We will also consider an {\it i.i.d. model} for which the population $E(k,t)$

269: evolves according to (\ref{shell}) but the fitnesses $F(k)$ are i.i.d.

270: random variables. This choice corresponds to a one-dimensional sequence space

271: and we will see that our results are sensitive to the geometry of the

272: sequence space. The i.i.d. model is closely related to the

273: zero temperature limit of the problem of

274: directed polymers in the presence of columnar defects studied in

275: \cite{Krug93} and a version of the parabolic

276: Anderson model \cite{Gaertner04}.

277:

278: Figure \ref{lines} illustrates the geometric picture of the evolutionary

279: process (\ref{shell})

280: that emerges after these simplifications and approximations.

281: At a given time $t$, the most populated genotype

282: $k^\ast$ for which $E(k^\ast,t) = \max_k \{ E(k,t) \}$ leads until it is

283: overtaken by a sequence $k'^\ast$ and so on.

284: While the last leader is obviously located at the global fitness maximum, the

285: identity of the previous leaders is non-deterministic.

286: The inset of

287: Fig.~\ref{lines} shows the punctuated evolution of the leading

288: sequence $k^{\ast}$.

289: Our main interest in the present work is in the statistical properties

290: of these leadership changes or evolutionary jumps.

291: Initially, the sequence $\sigma^{(0)}$ with population $E(0,t)$ leads.

292: A shell $k'$ can overtake the currently leading shell

293: $k < k'$ at the crossing time $T(k',k)$ given by

294: \be

295: T(k',k)= \frac{k'- k} {F(k')-F(k)} \;\;, \l{Tkkprime}

296: \ee

297: which is positive provided $F(k') > F(k)$. In

298: Fig.~\ref{lines}, only the shells $k=1, 2, 6, 8, 15$

299: can overtake $k=0$ (and only $k=2, 6, 8, 15$ can overtake $k=1$, ...)

300: since $F(15) > F(8) > F(6) > F(2) > F(1) > F(0)$. Such a set of fitnesses in

301: which each value is progressively

302: larger than all the previous ones defines a sequence of {\it{records}}.

303: However, in order

304: to appear in the evolutionary trajectory, it is not sufficient to be a record.

305: As shown in Fig.~\ref{lines}, the shell $k=2$ is

306: bypassed by shell $k=8$ since

307: $T(8,1)=\mbox{min}_{k > 1} T(k,1)$. In general, if the current

308: leader is in shell $k$, then the next leader is in the shell $k'$ for which

309: the crossing time (\ref{Tkkprime}) is \textit{minimized}

310: \cite{Rujan88}. Thus, in principle, the properties of leadership

311: changes can be formulated in terms of the extremal statistics of the

312: matrix of crossing times. However, as will be explained in more detail in

313: Sect.~\ref{mft}, this minimization problem is too cumbersome

314: to handle analytically and is further complicated by the presence of

315: strong correlations among the crossing times.
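
As a small illustration of bypassing, consider four shells with the purely hypothetical fitness values $F(0)=0$, $F(1)=0.5$, $F(2)=0.6$ and $F(3)=1$; these numbers are chosen for convenience and are not taken from Fig.~\ref{lines}. All four values are records. From (\ref{Tkkprime}), the crossing times with the initial leader are
\[
T(1,0)=\frac{1}{0.5}=2 \;\;,\;\; T(2,0)=\frac{2}{0.6}=\frac{10}{3} \;\;,\;\; T(3,0)=\frac{3}{1}=3 \;\;,
\]
so that shell $1$ takes the lead at $t=2$. Starting from shell $1$, one finds
\[
T(2,1)=\frac{1}{0.1}=10 \;\;,\;\; T(3,1)=\frac{2}{0.5}=4 \;\;,
\]
and shell $3$ overtakes at $t=4$, where $E(1,4)=E(3,4)=1$ while $E(2,4)=0.4$. Since $F(3) > F(2)$, shell $2$ can never catch up afterwards: it is a record that is bypassed, and the trajectory in this example is $0 \rightarrow 1 \rightarrow 3$.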

316:

317: The nontrivial dynamics of the change in leadership described above is

318: due to the competition between the initial population at a sequence and its

319: fitness. As discussed above, a potential leader must be a record and hence

320: must occur at a large Hamming distance away from the sequence $\sigma^{(0)}$

321: but the initial population at such a sequence is small

322: due to (\ref{sslimit2}). Thus, the disadvantage due to a poor initial condition

323: may be overcome by a better fitness. Further, since

324: the fitness of the leader is correlated with its Hamming distance from the

325: sequence $\sigma^{(0)}$, the fitness $F(k^*)$ also shows punctuated evolution.

326:

327: The rest of the paper is organised as follows.

328: In Sections~\ref{record} and \ref{jump}, we define records and jumps precisely

329: and study some of their statistical properties, both for the shell model

330: and for the i.i.d. model. In Section~\ref{approach}, we

331: describe the dynamics of the approach to the global maximum of the fitness

332: landscape. A conjecture

333: concerning the probability of bypassing in the i.i.d. model,

334: based on the results obtained in Sections~\ref{jump} and \ref{approach}

335: along with numerical evidence, is presented in Section~\ref{beta-z}.

336: In Section~\ref{mft}, we introduce and discuss a further simplification of the

337: problem described by (\ref{shell}), which highlights the strong

338: interdependence of the crossing times.

339: Finally, we summarise our results and conclude

340: with a brief discussion of the possible biological relevance of this work

341: in Section~\ref{conclude}.

342:

343: %=============================================================================

344: %RECORDS

345: %=============================================================================

346: \section{Record statistics in sequence space}

347: \l{record}

348:

349: In a sequence $\{ X_{k} \}$ of random variables, an upper (or lower)

350: record is said to occur at $m$ if $X_{m} > X_{k}$ (or $X_{m} < X_{k}$)

351: for all $k < m$ \cite{Glick78,Nevzorov87,Arnold98,Nevzorov01}. Since only the upper

352: records are pertinent to our study, we shall henceforth refer to them as

353: records. In the following subsections, we study some characteristics of

354: these records; in particular, we find the mean number of records and

355: the typical

356: spacing between them. The record statistics of i.i.d. variables is

357: well studied

358: and we briefly review some of the known results as they are useful for

359: later discussion. We study the record statistics in the shell model, for which

360: the random variables are not identically distributed, in some detail.

361: An appealing general feature of record statistics is that the results

362: are independent of the underlying probability distribution $p(F)$,

363: which therefore does not need to be specified in this section.

364:

365: %=============================================================================

366: \subsection{Number of records}

367:

368: In this subsection, we calculate the average number

369: ${\cal{R}}_{\mathrm{shell}}$ of

370: records in the shell model.

371: For i.i.d. random variables, it is well known that the average number of records

372: among $N$ variables is ${\cal{R}}_{\mathrm{iid}} \approx \ln N$ for large $N$. To see

373: this, note that the probability

374: $\tilde{P}_{\mathrm{iid}}(k)$ that the $k$'th random variable is a record is equal to the

375: probability

376: that it is the largest among the first $k$ random variables.

377: Since the location of the global

378: maximum is uniformly distributed for i.i.d. variables, it follows that

379: $\tilde{P}_{\mathrm{iid}}(k)=1/k$ for any choice of $p(F)$.

380: Summing $\tilde{P}_{\mathrm{iid}}(k)$ up to $k=N$ yields the average number

381: ${\cal{R}}_{\mathrm{iid}}$ of records to be $\ln N$. In fact, it

382: is known that the probability distribution of the number of records is

383: a Poisson distribution with mean $\ln N$ \cite{Sibani98}.
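
For reference, the sum behind this estimate can be written out explicitly; the expressions below are standard results for harmonic sums and are stated here as a brief aside. With the variables labeled $k=1,...,N$,
\[
{\cal{R}}_{\mathrm{iid}} = \sum_{k=1}^{N} \frac{1}{k} = \ln N + \gamma + {\cal{O}}(1/N) \;\;,
\]
where $\gamma \approx 0.5772$ is Euler's constant. Using the independence of the record events for i.i.d. variables, the variance of the number of records is
\[
\sum_{k=1}^{N} \frac{1}{k} \left( 1-\frac{1}{k} \right) = \ln N + \gamma - \frac{\pi^{2}}{6} + {\cal{O}}(1/N) \;\;,
\]
which agrees with the mean to leading order, as expected for a Poisson distribution. For instance, $N=1000$ gives ${\cal{R}}_{\mathrm{iid}} \approx 7.49$.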

384:

385: The record statistics for non-i.i.d. variables is much less studied.

386: A class of models in which records are obtained from a sequence $\{ X_{k} \}$

387: where each $X_{k}$ itself is the maximum of a set of $\alpha_{k}$

388: i.i.d. random variables was considered by Nevzorov \cite{Nevzorov84}.

389: Such models have been used, for instance, in an (unsuccessful) attempt to

390: account for the frequent occurrence of Olympic records due to an

391: increasing world

392: population \cite{Yang75}. The i.i.d. model is a special case

393: corresponding to $\alpha_{k}=1$ for all $k$ and

394: the shell model is obtained by using $\alpha_k$ given by (\ref{alphak}).

395:

396:

397: To analyze the Nevzorov model,

398: we define a binary random variable $Y_{k}$ which takes value $1$

399: if a record occurs in the $k$'th set and $0$ otherwise. Since the distribution

400: $p_{k}(F)$ of the maximum of $\alpha_{k}$ i.i.d. random variables is given by \cite{David70}

401: \be

402: \l{pk}

403: p_{k}(F)= \alpha_{k} \; p(F) \; \left( \int^{F}_{F_{\mathrm{min}}} p(x) dx \right)^{\alpha_{k}-1},

404: \ee

405: we have

406: \be

407: \mbox{Prob}(Y_{k}=1)=\int_{F_{\mathrm{min}}}^{F_{\mathrm{max}}} dF \;

408: \prod_{i=0}^{k-1} q_{i}(F) \;\; p_{k}(F)=

409: \frac{\alpha_{k}}{(\alpha_{0}+...+\alpha_{k})} \;\;,\l{PYk1}

410: \ee

411: where $q_{k}(F)$ is the cumulative distribution corresponding to $p_{k}(F)$.

412: The above equation expresses the fact that the $k$'th record value is a

413: global maximum amongst $\sum_{j=0}^{k} \alpha_{j}$ variables

414: and there are $\alpha_{k}$ ways in which it can occur in the $k$'th set.

415: Further, the joint probability $\mbox{Prob}(Y_{k_{1}}=1,Y_{k_{2}}=1)$ for

416: $k_{1} < k_{2}$ is given by

417: \be

418: \mbox{Prob}(Y_{k_{1}}=1,Y_{k_{2}}=1)= \int_{F_{\mathrm{min}}}^{F_{\mathrm{max}}} dF \;

419: \prod_{i=0}^{k_{1}-1} q_{i}(F) \;\;  p_{k_{1}}(F)

420: \int_{F}^{F_{\mathrm{max}}} dG \;\prod_{j=k_{1}+1}^{k_{2}-1} q_{j}(G) \;\;

421: p_{k_{2}}(G)=

422: \frac{\alpha_{k_{1}} \; \alpha_{k_{2}}}{(\alpha_{0}+...+\alpha_{k_{1}}) \;

423: (\alpha_{0}+...+\alpha_{k_{2}})} \;\;. \l{PYk12}

424: \ee

425: In a similar manner, it can be shown that

426: $\mbox{Prob}(Y_{k_{1}}=1,...,Y_{k_{m}}=1) =

427: \prod_{j=1}^{m} \mbox{Prob}(Y_{k_{j}}=1)$ for any $m \ge 2$.

428: Thus, the $Y_{k}$'s are

429: independent, non-identically distributed variables

430: \cite{Nevzorov87,Arnold98,Nevzorov01}.
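
The integral in (\ref{PYk1}) can be evaluated in one line, which also makes the independence of $p(F)$ explicit. Writing $Q(F)=\int_{F_{\mathrm{min}}}^{F} p(x) dx$ for the cumulative distribution of $p(F)$ and $S_{k}=\alpha_{0}+...+\alpha_{k}$ (both symbols are shorthand introduced only for this remark), we have $q_{i}(F)=Q(F)^{\alpha_{i}}$ and, from (\ref{pk}), $p_{k}(F)=\alpha_{k} \; p(F) \; Q(F)^{\alpha_{k}-1}$. The substitution $u=Q(F)$ then gives
\[
\mbox{Prob}(Y_{k}=1)=\alpha_{k} \int_{F_{\mathrm{min}}}^{F_{\mathrm{max}}} dF \; p(F) \; Q(F)^{S_{k}-1}
= \alpha_{k} \int_{0}^{1} du \; u^{S_{k}-1} = \frac{\alpha_{k}}{S_{k}} \;\;.
\]
For $\alpha_{k}=1$ and shells labeled from $k=0$, this reduces to $1/(k+1)$, which is the i.i.d. result $1/k$ when the variables are labeled from $k=1$.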

431:

432: For the shell model, due to (\ref{alphak}) and (\ref{PYk1}), the probability

433: $\tilde{P}_{\mathrm{shell}}(k,N)$ that a record occurs in the $k$'th shell is given by

434: \be

435: \tilde{P}_{\mathrm{shell}}(k,N)=\mbox{Prob}(Y_{k}=1) \approx 1- \left( \frac{1}{\ell-1} \right)

436: \frac{a}{1-a} \;\;\;\;,\;\;\;\; a= \frac{k}{N} < \frac{\ell-1}{\ell} =

437: \frac{k_{\mathrm{max}}}{N} \;\;,

438: \l{Ptildeshell}

439: \ee

440: where we have used that

441: ${N \choose k-m} / {N \choose k} \approx [a/(1-a)]^{m}$ for $k, N \gg 1$ with

442: $k/N$ fixed. Since it is easier to break records in the beginning, the

443: probability to find a record is near unity for

444: $k \ll N$. However, it vanishes beyond $k_{\mathrm{max}}$ because

445: the global maximum typically occurs in the shell $k_{\mathrm{max}}$.

446: The average

447: number ${\cal{R}}_{\mathrm{shell}}$ of records can be obtained by simply

448: integrating $\tilde{P}_{\mathrm{shell}}(k,N)$ over $k$ and we find

449: \be

450: {\cal{R}}_{\mathrm{shell}} \approx \frac{(\ell-\ln\ell-1)}{\ell-1} \; N \;\;.

451: \ee

452: Thus, ${\cal{R}}_{\mathrm{shell}}$

453: increases with $\ell$ and as $\ell \rightarrow \infty$,

454: ${\cal{R}}_{\mathrm{shell}} \rightarrow N$ because the number of sequences in each shell

455: becomes infinite.

456: Since the $Y_{k}$'s are independent random variables with finite mean

457: and variance, the number of records satisfies the central limit theorem

458: and becomes Gaussian for large $N$. Specifically, for large $N$,

459: the variance of the number of records is

460: \be

461: \l{variance} {\cal{V}}_{\mathrm{shell}} = \sum_{k=0}^N

462: \mbox{Prob}(Y_k = 1)[1 - \mbox{Prob}(Y_k = 1)]

463: \approx

464: \frac{N}{\ell - 1}

465: \left( \frac{\ell + 1}{\ell - 1} \ln \ell - 2 \right) \;\;,

466: \ee

467: which is maximal for $\ell = 5$ and vanishes for large $\ell$.

468: The ratio ${\cal{V}}_{\mathrm{shell}}/{\cal{R}}_{\mathrm{shell}}$ is always

469: considerably smaller

470: than unity, which implies that the distribution is much sharper than the

471: log-Poisson distribution obtained in the i.i.d. case.
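
The steps leading to (\ref{Ptildeshell}) and to the mean number of records can be made explicit as follows; the abbreviation $x$ is introduced only for this remark. With $x = a/[(\ell-1)(1-a)]$, the ratio of shell sizes is $\alpha_{k-m}/\alpha_{k} \approx x^{m}$, so that for $x < 1$
\[
\frac{\alpha_{0}+...+\alpha_{k}}{\alpha_{k}} \approx \sum_{m=0}^{\infty} x^{m} = \frac{1}{1-x}
\;\;\;\; \mbox{and hence} \;\;\;\; \mbox{Prob}(Y_{k}=1) \approx 1-x \;\;.
\]
Replacing the sum over $k$ by an integral over $a=k/N$ up to $k_{\mathrm{max}}/N$ then gives
\[
\frac{{\cal{R}}_{\mathrm{shell}}}{N} \approx \int_{0}^{(\ell-1)/\ell} da
\left[ 1-\frac{1}{\ell-1} \; \frac{a}{1-a} \right]
= \frac{\ell-1}{\ell} - \frac{1}{\ell-1} \left( \ln \ell - \frac{\ell-1}{\ell} \right)
= 1-\frac{\ln \ell}{\ell-1} \;\;.
\]
For binary sequences ($\ell=2$), this yields ${\cal{R}}_{\mathrm{shell}} \approx (1-\ln 2) N \approx 0.307 \, N$, while (\ref{variance}) gives ${\cal{V}}_{\mathrm{shell}} \approx (3 \ln 2 -2) N \approx 0.079 \, N$, so that ${\cal{V}}_{\mathrm{shell}}/{\cal{R}}_{\mathrm{shell}} \approx 0.26$.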

472:

473: %=============================================================================

474: \subsection{Inter-record spacings}

475: \l{recordspace}

476:

477: For the discussion of the spacings between records, it is convenient

478: to label the records ``backwards'' in time, with $r_1$ denoting

479: the position of the last record (i.e., the global maximum), $r_2 < r_1$

480: the penultimate record, and so on.

481: In this way the pathologies associated with the fact that

482: the expected waiting time for the next record to occur is

483: infinite in the i.i.d. case \cite{Glick78} can be avoided.

484: The probability $\tilde{P}(r_{j})$ that the $j$'th record

485: occurs at location $r_{j}$

486: can be found for the Nevzorov model in a manner analogous to

487: (\ref{PYk12}). We obtain

488: \be

489: \tilde{P}(r_{j})=

490: \sum_{r_{1},..., r_{j-1}} \;

491: \prod_{k=0,...,j-1} \frac{\alpha_{r_{k+1}}}

492: {\alpha_{0}+...+\alpha_{{r_{k}-1}}}  \;\;,\l{recloc}

493: \ee

494: with $N \geq r_{1} >...> r_{j-1} > r_j$ and $r_{0}=N+1$. The factors on the

495: right hand side of the above equation simply reflect the fact that

496: the record at location $r_{k+1}$ remains the maximum

497: until the next record occurs at $r_{k} > r_{k+1}$.

498:

499: For the i.i.d. model, using $\alpha_{k}=1$ for all $k$ in (\ref{recloc}),

500: the average location $\av{r_{j}}=\sum r_{j} \tilde{P}(r_{j})$ can be

501: calculated. One finds that the average inter-record distance

502: $\tilde{\Delta}_{\mathrm{iid}}(j)=\av{r_{j}}-\av{r_{j+1}}$

503: between the $j$'th and $(j+1)$'th record behaves as \cite{Arnold98,Nevzorov01}

504: \be

505: \tilde{\Delta}_{\mathrm{iid}}(j) \approx \frac{N}{2} \left( \frac{1}{2} \right)^{j}

506: \;\;,\;\;j=1,2,... \;\;. \l{interreciid}

507: \ee

508: A simple argument, useful for later discussion, can also be employed

509: to obtain the above equation.

510: The record labeled by $j=1$, being the global maximum, is equally

511: likely to occur

512: anywhere between $1$ and $N$. Thus, on average, it is located at $N/2$.

513: Similarly, the record labeled by $j=2$ is a global maximum in the

514: range $[1,r_1)$ with

515: uniformly distributed location, which gives the average location

516: $\av{r_{2}}=(1/2) \av{r_1} = N/4$. Repeating this argument, we obtain the result in

517: (\ref{interreciid}).

518:

519: For the shell model, since the most likely position of the

520: global maximum is in the shell with the largest number $\alpha_k$ of

521: sequences, we have $\av{r_{1}} = k_{\mathrm{max}}$.

522: For the sake of simplicity, we consider binary sequences ($\ell=2$)

523: in the following but the scaling behavior obtained below holds

524: for $\ell > 2$ as well. We find

525: that for $j \ge 2$, the average location $\av{r_{j}}$ of the

526: $j$'th record is given by

527: \be

528: \av{r_{j}}=\av{r_{1}} -\frac{1}{\pi} \sqrt{\frac{N}{2}}

529: \left( \frac{2}{\sqrt{\pi}} \right)^{j-2}

530: \int_{-\infty}^{\infty} dx_{j-1} \;

531: \frac{e^{-x_{j-1}^{2}}}{\mbox{erfc}(x_{j-1})} \; \int_{x_{j-1}}^{\infty}

532: dx_{j-2} \; \frac{e^{-x_{j-2}^{2}}}{\mbox{erfc}(x_{j-2})}  \; ... \;

533: \int_{x_{2}}^{\infty}

534: dx_{1} \; \frac{e^{-2 x_{1}^{2}}}{\mbox{erfc}(x_{1})}  \;\;,\;\;j \ge 2 \;\;,

535: \l{rj}

536: \ee

537: where we have used Eqs.(\ref{cnr})-(\ref{avgcnr}) and

538: performed a Gaussian integral. The average location $\av{r_{2}}$

539: of the second record is given by

540: $\av{r_{2}}=\av{r_{1}}- 2.0064 \sqrt{N}/ (\pi \sqrt{2})$. Thus, the second

541: record (and in fact, $j$'th record for $j$ of order unity) can be found

542: within ${\cal{O}}(\sqrt{N})$ distance of the global maximum since

543: $\alpha_{k}$ has width $\sim \sqrt{N}$ about $k_{\mathrm{max}}$.

544: For $j > 2$,

545: after repeated integration by parts, (\ref{rj}) can be rewritten as

546: \be

547: \av{r_{j}}=\av{r_{1}}-\frac{1}{\pi} \sqrt{\frac{N}{2}} \sum_{m=1}^{j-2}

548: \frac{(\ln 2)^{j-2-m}}{(j-2-m)!} \;\; G(m) \;\;,\;\;j > 2 \;\;,\l{sspacing}

549: \ee

550: where

551: \be

552: G(m)= (-1)^{m} \int_{-\infty}^{\infty} dx \;

553: \frac{e^{-2 x^{2}}}{\mbox{erfc}(x)} \;

554: \frac{\left( \ln {\mbox{erfc}(x)} \right)^{m}-

555: \left( \ln {2} \right)^{m}}{m!} \;\;.

556: \ee

557: As outlined in Appendix~\ref{appendix1}, the integral $G(m)$ can be

558: estimated by the saddle point method and we find

559: $G(m) \approx \sqrt{ \pi m/2}$ for large $m$. Since the leading contribution

560: to the sum in (\ref{sspacing}) comes from the $m=j-2$ term, we have

561: \be

562: \tilde{\Delta}_{{\mathrm{shell}}}(j) \approx

563: \sqrt {\frac{N}{4 \pi j}} \;\;,\;\; j \gg 1 \;\;. \l{interrecshell}

564: \ee

565: Thus, while the inter-record spacing decays exponentially with $j$ for

566: the i.i.d. model, it falls as a power law for the shell model.

567: The spacing between the first few records [$j = {\cal{O}}(1)$]

568: is of order $\sqrt{N}$, while for the bulk of the records with $j =

569: {\cal{O}}(N)$ the spacing is of order unity; this is consistent with

570: the vanishing of the record occurrence probability (\ref{Ptildeshell})

571: near $k=k_{\mathrm{max}}$.

572:

573:

574: %=============================================================================

575: %JUMPS

576: %=============================================================================

577: \section{Jump statistics}

578: \l{jump}

579:

580: As we described in Sect.~\ref{model}, in our model, evolutionary jumps

581: are a subset of records and if a jump occurs at

582: $k^{\prime}$, the next jump is said to occur

583: at $k > k^{\prime}$ if (i) $F(k)$ is a record and (ii) the overtaking time

584: $T(k, k^{\prime})= \mbox{min}_{j > k^{\prime}}

585: \{ T(j, k^{\prime}) \}$. By convention, the first jump and the first record

586: occur at $k=0$. Due to the second condition, some of the records can get

587: bypassed and fail to appear in the set of jumps.

588: In this section, we find the mean number of jumps and the

589: inter-jump spacing for both the i.i.d. and the shell models.

590: In contrast to the properties of records, the statistics of jumps

591: depends explicitly on the fitness distribution $p(F)$ and we will consider

592: distributions for which the tail behavior corresponds to the

593: three universality classes of

594: standard extreme value theory \cite{Sornette00} and which also

595: appear in a very similar form in the theory of records

596: \cite{Nevzorov87,Arnold98,Nevzorov01}.

597:

598: %=============================================================================

599: \subsection{Mean number of jumps}

600: \l{meanjump}

601:

602: We begin by discussing numerical results for the average number of

603: jumps in the i.i.d. model.

604: It was found in \cite{Krug03} that the average number ${\cal{J}}_{\mathrm{iid}}$

605: of jumps grows as $\beta \ln N$ where the

606: prefactor $\beta < 1$, and was conjectured to be

607: \be

608: \beta \approx \cases { 1/2   & {$\;,\;\;p(F) \sim e^{-F}$} \cr

609: (\delta-1)/(2 \delta -1) & {$\;,\;\;p(F) \sim F^{-1-\delta}, \;\; \delta \geq 1$}  \cr

610: (2 + \nu)/(3 + 2 \nu)  & {$\;,\;\;p(F) \sim (F_{\mathrm{max}}-F)^{\nu}, \;\; \nu > -1$}

611: } \;\;. \l{beta1d}

612: \ee

613: Figure~\ref{iidjump} shows ${\cal{J}}_{\mathrm{iid}}$ increasing linearly with

614: $\ln N$ and slope $\beta$ for

615: some distributions in accordance with (\ref{beta1d}).

616: Thus, in the i.i.d. model

617: the jumps can be viewed as ``diluted'' records, in the sense that the mean

618: numbers of

619: records and jumps differ only by a prefactor and

620: the probability $P_{\mathrm{iid}}(k,N)$ that a jump occurs at $k$

621: is given by $\beta /k$. In this picture, the probability for a given

622: record to be bypassed is simply $1-\beta$. However, bypassing is not

623: completely random, as the variance of the number of jumps is found

624: consistently to be smaller than the mean. This implies a certain amount

625: of ``anti-bunching'' among the jumps, which can also be detected

626: by a direct measurement of the correlation function

627: $C_{\mathrm{iid}}(k_{1},k_{2})=P_{\mathrm{iid}}(k_{1},k_{2})-

628: P_{\mathrm{iid}}(k_{1}) P_{\mathrm{iid}}(k_{2})$

629: where $P_{\mathrm{iid}}(k_{1},k_{2})$ is the joint probability of

630: having jumps at both $k_{1}$ and $k_{2}$, as shown in the inset

631: of Fig.~\ref{rspace}.

632: Some further discussion of the conjecture (\ref{beta1d}) will be provided

633: in Sect.~\ref{beta-z}.

634:

635: Estimates for the mean number of jumps in the shell model were also

636: given in \cite{Krug03}, but the range of sequence lengths was too limited

637: to allow for a definite statement. Here we present the results of our

638: simulations for large values of $N$ obtained using the approximation

639: described below. Since the shell fitness $F(k)$ is the

640: maximum of $\alpha_k$ i.i.d. random variables drawn from the distribution

641: $p(F)$, it can be obtained from a uniform random variable

642: $u$ using the relation

643: \be

644: \l{uniform}

645: \int_{F_{\mathrm{min}}}^{F(k)} p(x) dx =u^{1/\alpha_{k}}.

646: \ee

647: However, the binomial coefficient ${N \choose k}$ increases

648: exponentially with $N$ for

649: large $k$ and the distribution of $u^{1/\alpha_{k}}$ approaches a

650: delta-function centred at unity for $k \gg 1$ making it difficult to determine

651: the distribution $p_{k}(F)$ of $F(k)$ accurately when $N$ is large.

652: For the exponential fitness distribution, the relation (\ref{uniform})

653: becomes

654: \be

655: \l{Fexp}

656: F(k)=-\ln (1- e^{\ln u/\alpha_{k}}) \simeq - \ln [\ln(1/u)]+\ln(\alpha_{k}).

657: \ee

658: Since the last expression only involves the logarithms of binomial

659: coefficients, $F(k)$ can be easily generated up to large values of $N$.

660: While a similar approximation can be employed for other

661: distributions with unbounded tails, we have not been able

662: to obtain reliable results for bounded distributions.
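
As a brief check of the approximation in (\ref{Fexp}), the following sketch assumes $\ln(1/u) \ll \alpha_k$ (our reading of the large-$\alpha_k$ limit, not a condition stated for the simulations). Expanding the exponential to first order gives
\[
1 - e^{\ln u/\alpha_k} \simeq \frac{\ln (1/u)}{\alpha_k}
\quad \Rightarrow \quad
F(k) \simeq -\ln \left[ \frac{\ln(1/u)}{\alpha_k} \right]
= -\ln[\ln(1/u)] + \ln(\alpha_k) \;\;.
\]
The first term is a standard Gumbel variable, and only $\ln(\alpha_k)$, never $\alpha_k$ itself, has to be evaluated.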

663:

664: In Fig.~\ref{sjump}, we show simulation results for the

665: probability $P_{\mathrm{shell}}(k,N)$ that a jump occurs in

666: the $k$'th shell, for a binary alphabet

667: ($\ell=2$) and two different values of $N$. The data collapses

668: onto the scaling form

669: \be

670: P_{\mathrm{shell}}(k,N) \approx  N^{-1/2} f(k/N) \;\;\;,\;\;

671: x = \frac{k}{N} < \frac{\ell-1}{\ell} = \frac{k_{\mathrm{max}}}{N} \;\;,

672: \l{scaling_jumps}

673: \ee

674: where the scaling function behaves as $f(x) \sim x^{-1/2}$ for small $x$.

675: This has the interesting consequence that

676: $P_{\mathrm{shell}} \sim 1/\sqrt{k}$ is independent

677: of $N$, and hence the number of jumps up to shell $k$ grows as $\sqrt{k}$

678: for $k \ll k_{\mathrm{max}}$.

679: The scaling function appears to be independent of the alphabet size $\ell$,

680: which therefore only changes the cutoff at $k = k_{\mathrm{max}}$ \cite{Krug04}.

681: Thus, the average number ${\cal{J}}_{\mathrm{shell}}$ of jumps obtained by

682: integrating $P_{\mathrm{shell}}(k,N)$ over $k$ grows with $N$ as

683: \be

684: {\cal{J}}_{\mathrm{shell}} \sim \sqrt{\frac{(\ell -1) N}{\ell}} \;\;\;.

685: \ee

686: Our simulations indicate that the above dependence of

687: ${\cal{J}}_{\mathrm{shell}}$ on $N$ is also true for

688: normal-distributed fitness and we expect it to hold

689: for all distributions decaying more rapidly than

690: any power law \cite{Krug04}. Further, the distribution of the number of jumps

691: is a Gaussian (as in the case of records) with both mean and variance

692: scaling as $\sqrt{N}$.
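
The $\sqrt{N}$ growth of ${\cal{J}}_{\mathrm{shell}}$ can be made explicit by a rough estimate. We assume, purely for illustration, that the small-$x$ form $f(x) \approx c \, x^{-1/2}$, with an unspecified constant $c$, holds over the whole range $0 < x < (\ell-1)/\ell$:
\[
{\cal{J}}_{\mathrm{shell}} \approx \int_0^{k_{\mathrm{max}}} dk \; N^{-1/2} f(k/N)
= \sqrt{N} \int_0^{(\ell-1)/\ell} dx \; f(x)
\approx 2 c \sqrt{\frac{(\ell-1) N}{\ell}} \;\;.
\]
Deviations of $f$ from the pure power law would affect the prefactor (and possibly its dependence on $\ell$) but not the $\sqrt{N}$ scaling, as long as $f$ remains integrable.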

693:

694: For the power law case, we find that ${\cal{J}}_{\mathrm{shell}} \to 1$ for

695: large $N$.

696: As we shall see in Section~\ref{approach}, the globally fittest sequence

697: takes over the leadership in a time of order unity in this case,

698: which explains the above behavior of ${\cal{J}}_{\mathrm{shell}}$.

699: In summary, unlike the i.i.d. model, most of the records are

700: bypassed in the shell model both for exponential and power law distributions.

701:

702:

703: %=============================================================================

704: \subsection{Inter-jump spacings}

705: \l{inter-jump}

706:

707: We now turn to a discussion of the inter-jump spacings

708: $\Delta(j)$

709: defined in analogy to the inter-record spacings.

710: We denote by $s_j$ the position of the $j$'th jump, with

711: $j=1$ referring to the last jump, which is also the last

712: record (the global maximum), and define the inter-jump spacing

713: as $\Delta(j) = \av{s_{j}} - \av{s_{j+1}}$ where $\av{s_{j}}$ is the average

714: location of the $j$'th jump.

715: For the i.i.d. model, an approximate calculation of

716: $\Delta_{\mathrm{iid}}(j)$ can be carried out by assuming jumps to be

717: randomly diluted records.

718: The position $s_j$ of the $j$'th jump equals the position

719: $r_k$ of the $k$'th record, where $k \geq j$ because of the possibility of

720: bypassing. If the record $k+1$ is not bypassed, then $s_{j+1} = r_{k+1}$

721: and $s_{j+1} = (1/2) s_j$ on average due to the argument

722: given after (\ref{interreciid}) for the inter-record spacings.

723: In the diluted record picture,

724: this is true with probability $\beta$. With probability $\beta (1 - \beta)$,

725: record $k+1$ is bypassed and $s_{j+1} = r_{k+2}$ so that

726: $s_{j+1} = (1/4) s_j$ on the average. Similarly, with

727: probability $\beta (1 - \beta)^l$, $l$ records are bypassed

728: and the ratio $s_{j+1}/s_j = (1/2)^{l+1}$ on average. Summing over

729: all possibilities, we find that $\av{s_{j+1}} = b \av{s_j}$ with

730: \be

731: \l{bgeom}

732: b = \sum_{l = 0}^\infty 2^{-(l+1)} \beta (1 - \beta)^l = \frac{\beta}{1 + \beta}.

733: \ee

734: Taking into account that the sum over all inter-jump spacings adds up to

735: $\av{s_1} = \av{r_1} = N/2$, we obtain

736: \be

737: \Delta_{\mathrm{iid}}(j) \approx \frac{N}{2} (1- b) b^{j-1}

738: \;\;,\;\;j=1,2,... \;\;.\l{interjumpiid}

739: \ee

740: The numerical data shown in Fig.~\ref{rspace} supports the general

741: form of this expression, but with the coefficient

742: $b \approx \beta/2$ rather than $\beta/(1+\beta)$. This is another

743: indication of correlations that are not accounted for in the random

744: dilution picture.
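
For reference, the geometric series in (\ref{bgeom}) can be summed explicitly. The numerical values below take $\beta = 1/2$, the exponential case of (\ref{beta1d}), purely as an illustration:
\[
b = \frac{\beta}{2} \sum_{l=0}^{\infty} \left( \frac{1-\beta}{2} \right)^{l}
= \frac{\beta/2}{1 - (1-\beta)/2} = \frac{\beta}{1+\beta} \;\;,
\qquad
\beta = \frac{1}{2}: \;\; \frac{\beta}{1+\beta} = \frac{1}{3} \;\;,\;\; \frac{\beta}{2} = \frac{1}{4} \;\;.
\]
The random dilution estimate thus exceeds the numerically observed value $b \approx \beta/2$ by a factor $2/(1+\beta)$.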

745:

746: For the shell model, as shown in Fig.~\ref{sjump}, the data

747: for $\Delta_{\mathrm{shell}}(j)$ for various $N$ collapses onto a

748: monotonically decreasing curve if we assume $\Delta_{\mathrm{shell}}(j)$

749: to be of the scaling form

750: \be

751: \Delta_{\mathrm{shell}}(j) \approx

752: \sqrt{N} \; h(j/\sqrt{N}) \;\;. \l{interjumpshell}

753: \ee

754: The scaling with $\sqrt{N}$ follows naturally from the scaling form

755: (\ref{scaling_jumps}) for the jump occurrence probability. The latter

756: implies that the density of jumps on the $k$-axis is of order $1/\sqrt{N}$

757: for finite $k/N$, hence the spacing is of order $\sqrt{N}$, and the

758: argument of the scaling function is $j/\sqrt{N}$ because the total number

759: of jumps is also of order $\sqrt{N}$. However, the data at large $j$ do not

760: obey the scaling form (\ref{interjumpshell}): there the rescaled spacing $\Delta_{\mathrm{shell}}(j)/\sqrt{N}$

761: approaches zero with increasing $N$. Thus, while for finite $k/N$ the jumps

762: are roughly equally spaced with spacing of order $\sqrt{N}$, the spacing is of

763: ${\cal{O}} (N^{s})$ with $s < 1/2$ for $k/N \rightarrow 0$.

764:

765: %=============================================================================

766: %APPROACH

767: %=============================================================================

768:

769: \section{Approach to the global fitness maximum}

770: \l{approach}

771:

772: So far, we have discussed the statistics on the $k^{\ast}$ axis of the

773: inset of Fig.~\ref{lines} and now we focus on the temporal statistics.

774: As explained in Section~\ref{model}, a fit sequence $k^{\ast}$ leads until

775: it is

776: overtaken by an even fitter one, and eventually the globally fittest sequence

777: emerges as the leader at a typical time $T^{\ast}$.

778: In this section, we find this typical time $T^{\ast}$ required

779: for the population to reach the global maximum of the fitness landscape

780: and the distribution of the evolution times to jump from one local maximum to

781: another.

782:

783: %=============================================================================

784: \subsection{Dynamic scaling}

785:

786: The location $k^{\ast}(t)$ of the most populated sequence at time $t$ for

787: which the logarithmic population is

788: $E({k^{\ast}},t)=\mbox{max}_{k} \{ E(k,t) \}$ increases with time till

789: the global maximum is reached. In Appendix~\ref{appendix2}, the

790: distribution $P_t(k^{\ast})$ is explicitly calculated for the

791: i.i.d. model in the limit of infinite genome size ($N \to \infty$).

792: This distribution is usually of the scaling form

793: \be

794: \l{Ptscale}

795: P_t(k^\ast) = t^{-1/z} \Phi(k^\ast/t^{1/z}) \;, \l{Ptkast}

796: \ee

797: where the scaling function $\Phi$ depends on the underlying fitness

798: distribution

799: and the dynamic exponent $z$ is given by

800: \be

801: z= \cases { 1   & {$\;,\;\;p(F) \sim e^{-F}$} \cr

802:   (2 + \nu)/(1 + \nu)  & {$\;,\;\;p(F) \sim (F_{\mathrm{max}}-F)^{\nu}, \;\; \nu > -1$} \cr

803:      (\delta-1)/\delta & {$\;,\;\;p(F) \sim F^{-1-\delta}, \;\; \delta \geq 1$}

804: } \;\;, \l{1dz}

805: \ee

806: for the three classes of fitness distributions introduced in Sect.~\ref{jump}.

807: The corresponding behavior $\overline{k^{\ast}(t)} \sim t^{1/z}$

808: of the mean location has been derived previously using Flory-type arguments

809: \cite{Zhang86,Krug93} and can also be seen using (\ref{Ptkast}). As we have

810: already discussed, the global maximum is reached

811: when $\overline{k^{\ast}(t)} \approx N/2$, which defines a total evolution

812: time $T^\ast \sim N^z$. One thus expects a scaling form

813: \be

814: \overline{k^{\ast} (t,N)} \approx t^{1/z} \varphi (t/T^{\ast}) \;\;,

815: \ee

816: where the scaling function $\varphi(x)$ is a constant for $x \ll 1$ and

817: decays as $x^{-1/z}$ for $x \gg 1$ \cite{remark}.

818:

819: An alternative approach \cite{Krug02,Krug03} to estimating

820: the typical time $T^{\ast}$ required by the population

821: to reach the global maximum starts from the observation that

822: $T^\ast$ is of the order of the time $T_1$ at which

823: the globally fittest sequence at typical location $r_{1} = s_1$ and fitness

824: $F(r_1)$

825: overtakes the penultimate leader with respective quantities $s_2$ and

826: $F(s_2)$ (see Sect.~\ref{inter-jump}). From (\ref{Tkkprime}), we have

827: \be

828: T_1 =  \frac{s_1 - s_2}{F(s_1)-F(s_2)} \;\;,

829: \l{Tast}

830: \ee

831: where the inter-jump spacing in the numerator is given by

832: (\ref{interjumpiid}).

833: The estimation of the denominator involves a subtlety -- in previous

834: works \cite{Krug02,Krug03}, it was assumed that $F(s_1) - F(s_2)$ is of the

835: order of the \textit{fitness gap} $\epsilon$, which was defined as

836: the difference between the values of the global maximum

837: and the second largest fitness of the fitness landscape.

838: Because the second largest fitness does not necessarily appear in the

839: record sequence,

840: $\epsilon$ is only a lower bound on $F(r_1) - F(r_2)$, which in turn is clearly

841: a lower bound on the fitness difference of interest,

842: $\epsilon \leq F(r_1) - F(r_2) \leq F(s_1) - F(s_2)$. However, our

843: explicit calculations show that $F(r_1) - F(r_2)$ is of the

844: same order as $\epsilon$; moreover, at least for the i.i.d. model, we know

845: that at most a few

846: records are bypassed between $s_1$ and $s_2$, and hence the assumption

847: that $F(s_1) - F(s_2) \sim \epsilon$ seems justified.

848:

849: The calculation of the distribution of fitness gap $\epsilon$ is a

850: standard exercise in

851: extreme value statistics \cite{David70}.

852: In a system with a total number $S$ of sequences, the typical value of the

853: fitness gap increases as $S^{1/\delta}$ for the unbounded power

854: law distribution, decreases as $S^{-1/(1+\nu)}$ for the bounded distribution,

855: and is of order unity for the exponential distribution.

856: For the i.i.d. model, using

857: $s_{1} - s_{2} \sim N$ [see (\ref{interjumpiid})] and $S=N$ in

858: (\ref{Tast}), we recover $T_1 \sim T^{\ast} \sim N^{z}$ with $z$ given in

859: (\ref{1dz}). The other fitness distributions can be treated in a similar

860: manner.
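
Written out for the three classes, with $s_1 - s_2 \sim N$ and the typical gap $\epsilon$ evaluated at $S = N$, the estimate (\ref{Tast}) reads as follows (a sketch at the level of scaling only, with all prefactors ignored):
\[
T_1 \sim \frac{N}{\epsilon} \sim \cases { N & {$\;,\;\;\epsilon \sim 1$} \cr
N^{1 + 1/(1+\nu)} = N^{(2+\nu)/(1+\nu)} & {$\;,\;\;\epsilon \sim N^{-1/(1+\nu)}$} \cr
N^{1 - 1/\delta} = N^{(\delta-1)/\delta} & {$\;,\;\;\epsilon \sim N^{1/\delta}$}
} \;\;,
\]
in agreement with the three entries of (\ref{1dz}).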

861:

862: For the shell model, $S=\ell^{N}$ and due to (\ref{interjumpshell}), the

863: numerator is of order $\sqrt{N}$ so that $T^{\ast} \sim \sqrt{N}$ for

864: the case of exponentially distributed fitness. Presumably, the time $T_{j}$

865: at which the $j$'th jump occurs is also of order $\sqrt{N}$ for

866: $j \sim {\cal{O}}(1)$. Since the total number of jumps is of the same order,

867: it follows that initially there are many, quick jumps followed by few

868: jumps that take ${\cal{O}} (\sqrt{N})$ time.

869: This result agrees qualitatively with that seen

870: in experiments (discussed later) concerning the pace of evolution which is

871: initially rapid and later slows down considerably.

872: For the bounded distributions, $T^{\ast}$ increases exponentially with $N$

873: whereas it \textit{decreases} exponentially for the

874: fat-tailed power law distributions. The latter result implies

875: that, for large $N$, the global maximum

876: takes over in a single time step, which explains why the mean number of

877: jumps tends to unity  for the power

878: law distributions (see Sect.~\ref{meanjump}).
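
The same scaling estimate can be sketched for the shell model by inserting $S = \ell^{N}$ into the typical fitness gap. For illustration we assume the algebraic numerator $\sqrt{N}$ in all three cases, although it is established above only for the exponential distribution; the exponential factors dominate in any case:
\[
T^{\ast} \sim \frac{\sqrt{N}}{\epsilon} \sim \cases { \sqrt{N} & {$\;,\;\;\epsilon \sim 1$} \cr
\sqrt{N} \; \ell^{N/(1+\nu)} & {$\;,\;\;\epsilon \sim \ell^{-N/(1+\nu)}$} \cr
\sqrt{N} \; \ell^{-N/\delta} & {$\;,\;\;\epsilon \sim \ell^{N/\delta}$}
} \;\;.
\]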

879:

880:

881: %=============================================================================

882: \subsection{Universal tails of the evolution time distribution}

883:

884:

885: In the last subsection, we found the typical time $T^{\ast}$ to

886: reach the global maximum and now we

887: consider the distribution $P(T_{1},N)$ of the time $T_{1}$

888: at which the final jump occurs. For the i.i.d. model, since the

889: typical $T_{1}$ also grows

890: as $N^{z}$, we may expect the normalised distribution

891: $P(T_{1},N)$ to be of the scaling form

892: $P(T_{1},N) \approx N^{-z} g_{1}(T_{1}/N^{z})$.

893: In general, for $j \sim {\cal{O}}(1)$, the distribution $P(T_{j},N)$ of the time

894: $T_{j}$ at which the $j$'th jump occurs is of the scaling form

895: \be

896: \l{Tjscale}

897: P(T_{j},N) \approx N^{-z} g_{j}(T_{j}/N^{z}) \;\;.

898: \ee

899: Although the dynamic exponent

900: $z$ depends on the underlying fitness distribution, we shall now show

901: that the tail of the

902: distribution $P(T_{j},N)$ is universal. The events contributing to

903: large $T_{1}$ are the ones for which $F(s_1) - F(s_2)$ is small;

904: in these cases we expect the general bound $F(s_1) - F(s_2) \geq F(r_1) - F(r_2)$

905: to be saturated, i.e. the second record is not bypassed and $s_2 = r_2$.

906: Thus, using (\ref{Tast}), we obtain

907: \be

908: \l{PT1}

909: P(T_{1},N) \approx \left| \frac{d \epsilon_1}{d T_1} \right|

910: \mbox{Prob}(\epsilon_1 = \Delta_{\mathrm{iid}}(1)/T_1) \approx

911: \frac{\Delta_{\mathrm{iid}}(1)}{T_{1}^2}  \;

912: \mbox{Prob}(\epsilon_{1}=0) \;\;,

913: \ee

914: for large $T_1$,

915: where $\epsilon_{1}=F(r_{1})-F(r_{2})$.

916: The probability distribution of $\epsilon_1$ can be obtained along

917: the lines of the derivation of the distribution of the fitness gap

918: $\epsilon$ in \cite{Krug02}, and it is found that

919: the probability for a near-vanishing difference between two successive record

920: values is nonzero for any $p(F)$.

921: We conclude that $P(T_{1},N)$ has a

922: power law tail with

923: exponent $-2$ for any underlying fitness distribution \cite{Krug03}.

924: This is an example of the generation of a power law through a change

925: of variables (from $\epsilon_1$ to $T_1 \sim 1/\epsilon_1$) as described

926: in \cite{Sornette00}.

927:

928: Similarly, the events contributing to the tail of $P(T_{2},N)$

929: are the ones in which the record at $r_{2}$ is the penultimate leader and

930: the record at $r_{3}$ is the leader before it. Thus, we demand that

931: neither of these two records be bypassed; in particular,

932: $r_2$ should not be bypassed, which requires

933: that $T(r_{1}, r_{2}) > T(r_{2},r_{3})$.

934: This condition can be written as

935: $\epsilon_{1} < C \epsilon_{2}$, where

936: $\epsilon_2 = F(r_{2})-F(r_{3})$ and $C$ is the average value

937: of $(r_1-r_2)/(r_2-r_3)$, a number of order unity. Thus we obtain

938: \be

939: P(T_{2},N) \approx  \frac{\Delta_{\mathrm{iid}}(2)}{T_{2}^2}

940: \int_{0}^{C \Delta_{\mathrm{iid}}(2)/T_2} d \epsilon_{1} \;

941: \mbox{Prob}(\epsilon_{1}, \epsilon_{2},N)  \;

942: \approx \frac{C \Delta_{\mathrm{iid}}(2)^2}{T_2^3}

943: \mbox{Prob}(\epsilon_{1}=0, \epsilon_{2}=0,N).

944: \ee

945: Since Prob$(\epsilon_{1}=0, \epsilon_{2}=0,N)$ can be shown to be

946: nonzero, we conclude

947: that $P(T_{2},N) \sim T_2^{-3}$ for large $T_2$.

948: Extending the above arguments in a similar fashion to the next evolution

949: times, we find that the scaling functions in (\ref{Tjscale}) behave as

950: $g_{j}(x) \sim x^{-1-j}$ for $x \gg 1$.

951: Interestingly, this implies that the expected time $\av{T_j}$ is finite

952: for $j \geq 2$ and infinite for $j = 1$.

953: In Fig.~\ref{evol}, the prediction

954: $P(T_{j},N) \sim T_{j}^{-1-j}$ for $j=1,2,3$ is compared with data

955: obtained using Monte Carlo simulations for the i.i.d. model.

956: The behavior of the universal tails of $P(T_{j})$ discussed above

957: is true for the shell model as well.
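
The statement about $\av{T_j}$ follows from a one-line estimate, assuming that the tail $P(T_{j},N) \sim T_{j}^{-1-j}$ holds beyond some cutoff $T_0$ of order $N^{z}$ (the cutoff $T_0$ is introduced here only for illustration):
\[
\int_{T_0}^{\infty} dT \; T \cdot T^{-1-j} = \int_{T_0}^{\infty} dT \; T^{-j}
= \cases { \infty & {$\;,\;\;j=1$} \cr
T_0^{1-j}/(j-1) & {$\;,\;\;j \geq 2$}
} \;\;,
\]
where the divergence for $j=1$ is logarithmic.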

958:

959:

960: %=============================================================================

961: %BETA-Z

962: %=============================================================================

963: \section{Bypassing probability and dynamic exponent: a conjecture}

964: \l{beta-z}

965:

966: As we have seen in the previous sections, the jump statistics are not

967: analytically tractable due to the constraint of minimal overtaking time.

968: For the i.i.d. model, the jumps differ from the records only up to a prefactor

969: $\beta$ conjectured to be given by  (\ref{beta1d}).

970: Comparing the expressions in Eqs.(\ref{beta1d}) and (\ref{1dz}), we observe

971: that

972: the bypassing probability $1- \beta$ appears to be related to the

973: dynamic exponent $z$ by the following universal relation

974: \be

975: \beta = (1-\beta) \; z \;\;.

976: \l{relation}

977: \ee

978: A derivation of this relation (which eludes us so far) would constitute a

979: proof of the conjecture (\ref{beta1d}).
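
As a consistency check, (\ref{relation}) can be solved for $\beta$ and evaluated with the exponents listed in (\ref{1dz}):
\[
\beta = \frac{z}{1+z} = \cases { 1/2 & {$\;,\;\;z = 1$} \cr
(2 + \nu)/(3 + 2 \nu) & {$\;,\;\;z = (2+\nu)/(1+\nu)$} \cr
(\delta-1)/(2 \delta -1) & {$\;,\;\;z = (\delta-1)/\delta$}
} \;\;,
\]
which reproduces the three entries of (\ref{beta1d}). Equivalently, $\beta/z = 1 - \beta$.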

980:

981: Interestingly, the relation (\ref{relation}) can be interpreted in terms of a

982: kind

983: of \textit{duality} between the $k$- and the $t$-axis of the inset of

984: Fig.~\ref{lines}. So far we have identified each jump with the position on the

985: $E$-axis at which the line that takes over the leadership at the jump

986: originates; but we may just as well identify the jump with the

987: corresponding crossing time $T(k,k^{\prime})$ at which the leadership

988: shifts from $k'$ to $k$.

989: Clearly, there is a one-to-one correspondence between the jumps defined on

990: the two axes, and the average number ${\cal{J}}_{\mathrm{iid}}^{\prime}$ of jumps

991: on the time axis is equal to ${\cal{J}}_{\mathrm{iid}}$ discussed in earlier

992: sections.

993: Thus, ${\cal{J}}_{\mathrm{iid}}^{\prime}={\cal{J}}_{\mathrm{iid}}

994: \approx \beta \ln {N}$ and

995: since the

996: typical time $T^{\ast}$ to reach the global maximum scales as $N^{z}$, we have

997: ${\cal{J}}_{\mathrm{iid}}^{\prime} \approx \beta^{\prime}

998: \ln {T^{\ast}}$ with $\beta^{\prime}= 1- \beta$ due to the conjecture

999: (\ref{relation}).

1000: This leads us to expect that the probability

1001: $P_{\mathrm{iid}}(t,N)$ that a jump occurs at time $t$ decays as

1002: $\beta^{\prime}/t$ for $t \ll N^{z}$ and as

1003: $1/t^{2}$ for $t \gg N^{z}$; the latter behavior is the universal

1004: $t^{-2}$ tail explained in Section~{\ref{approach}}.

1005: We conclude that

1006: the jump probabilities along the $k$- and the $t$-axes should

1007: add up to a universal function,

1008: {\it i.e.}

1009: \be

1010: P_{\mathrm{iid}}(t=X,N)+P_{\mathrm{iid}}(k=X,N) = 1/X \;\;,

1011: \ee

1012: for any choice of fitness distribution.

1013: The numerical evidence supporting this claim is shown in Fig.~\ref{sum}.

1014: Furthermore, in analogy to the jump spacing $\Delta_{\mathrm{iid}}(j)$ along

1015: the $k$-axis,

1016: one can also consider the quantity

1017: $\Delta_{\mathrm{iid}}^{\prime}(j) = \langle T_j \rangle -

1018: \langle T_{j+1} \rangle$

1019: which is the spacing

1020: between the successive jumps on the time-axis. Replacing $N$ by $T^{\ast}$

1021: and $b = \beta/2$ by $\beta'/2 = (1-\beta)/2$ in (\ref{interjumpiid}), we

1022: expect

1023: \be

1024: \Delta_{\mathrm{iid}}^{\prime}(j) \sim N^{z}

1025: \left( \frac{1-\beta}{2} \right)^{j-1} \;\;,\;\;j=1,2,...  \;\;. \l{cnjctre}

1026: \ee

1027: Numerical results consistent with this expression

1028: are shown in the inset of Fig.~\ref{sum} for some distributions.

1029: The deviations seen in the data for the first jump ($j=1$) reflect the

1030: fact that,

1031: because of the $1/T_1^2$-tail derived in Eq.~(\ref{PT1}), the average of $T_1$

1032: is not defined, so the sample average grows with the number of disorder realizations.

1033:

1034: %=============================================================================

1035: %RECORD TIMES

1036: %=============================================================================

1037: \section{A Model Based on Record Times}

1038: \l{mft}

1039:

1040: In this section, we introduce a further simplification of the i.i.d. model.

1041: As we have discussed already, a sequence $k$ may occur in the set of

1042: jumps only if $F(k)$ is a record. Thus, it is sufficient to consider

1043: only the subset of sequences whose fitness is a record.

1044: Here it is convenient to label the records forward in time, so we

1045: denote by $R_j$ the location of the $j$'th record with $j=1$ labeling

1046: the first record ($R_1 = 1$ by convention), $R_2 > R_1$ the second record,

1047: and so on.  Note that

1048: there are two sources of randomness in the problem -- one arising from the

1049: record locations $R_j$ and the other due to record

1050: values $F(R_j)$. For exponentially distributed fitness, it is known that

1051: the differences between

1052: successive record values are independent and exponentially distributed

1053: random variables \cite{Tata69}. Thus, the fitness of two

1054: successive records differs by unity on average.

1055: These considerations allow us to

1056: eliminate the randomness associated with the record values by replacing

1057: the i.i.d. model of (\ref{shell}) with exponentially distributed fitness by

1058: a simpler model for which the population evolves as

1059: \be

1060: \tilde E(j,t)=-R_{j}+ j t.

1061: \label{mfmodel}

1062: \ee

1063: As in the original i.i.d. model, we find numerically that the

1064: average number $\tilde {\cal{J}}$ of jumps grows logarithmically with $N$

1065: with a prefactor $\tilde \beta \approx 0.63$ (see Fig.~\ref{mftjump}).

1066: This is distinctly different from the value $\beta \approx 1/2$ found for

1067: the i.i.d. model with exponential fitness distribution, indicating that the

1068: randomness in the record values is relevant. Somewhat surprisingly, in

1069: contrast to the conjectured values of $\beta$ for the full i.i.d. problem

1070: given in (\ref{beta1d}), $\tilde \beta$ does not

1071: seem to be a simple rational number.

1072:

1073: Our primary motivation for introducing this simplified model is to gain further

1074: insight into the mathematical structure of bypassing.

1075: For the model defined by (\ref{mfmodel}),

1076: the crossing time $\tilde T(j,j^{\prime})$ at which the line associated

1077: with the $j$'th record overtakes that associated with record $j'$ is given by

1078: \be

1079: \label{mftimes}

1080: \tilde T(j,j')= \frac{R_{j}-R_{j'}}{j-j'}.

1081: \ee

1082: Then the probability $\tilde \beta_{2}$ that the second record is not

1083: bypassed can be written as

1084: \bea

1085: \tilde \beta_{2} &=&

1086: \mbox{Prob} \left[ R_{2}-1= \mbox{min} \left( R_{2}-1, \frac{R_{3}-1}{2},

1087: \frac{R_{4}-1}{3},...\right) \right].

1088: \eea

1089: The evaluation of the

1090: condition on the right-hand side using the joint

1091: probability distribution for the record

1092: times \cite{Nevzorov87,Arnold98,Nevzorov01}

1093: \be

1094: \label{jointRn}

1095: \mbox{Prob}(R_2,R_3,...,R_n) =

1096: \frac{1}{(R_2 - 1)(R_3 - 1)(R_4 - 1)...(R_n - 1)R_n}

1097: \ee

1098: is clearly a difficult task. An upper bound on $\tilde \beta_2$ is obtained by

1099: requiring that the record at $R_2$ is not bypassed by the one at $R_3$, i.e.

1100: that $\tilde T(2,1) < \tilde T(3,1)$. This gives

1101: \be

1102: \label{b2bound}

1103: \tilde \beta_2 \leq

1104: \mbox{Prob}(R_{3} > 2 R_{2} - 1)= \sum_{R_{2}=2}^{\infty}

1105: \frac{1}{R_{2}-1} \;

1106: \frac{1}{2 R_{2}-1} = 2 \; (1- \ln 2) \approx 0.613706 \;\;.

1107: \ee

1108: In our simulations on a large system, we find

1109: $\tilde \beta_{2} \approx 0.600786$, showing that bypassing of $R_2$

1110: by the records beyond $R_3$ is rather unimportant.
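
The sum in (\ref{b2bound}) can be evaluated by partial fractions. The summand is our reading of (\ref{jointRn}): the second record time has $\mbox{Prob}(R_2) = 1/[(R_2-1) R_2]$, and no further record occurs up to position $2 R_2 - 1$ with probability $R_2/(2 R_2 - 1)$. With $m = R_2 - 1$ (a summation index introduced here for convenience),
\[
\sum_{m=1}^{\infty} \frac{1}{m (2m+1)}
= \sum_{m=1}^{\infty} \left( \frac{1}{m} - \frac{2}{2m+1} \right)
= 2 \sum_{m=1}^{\infty} \left( \frac{1}{2m} - \frac{1}{2m+1} \right)
= 2 \; (1 - \ln 2) \;\;,
\]
where the last step uses $\ln 2 = 1 - 1/2 + 1/3 - 1/4 + \ldots$.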

1111:

1112: Consider next the behavior of the crossing times (\ref{mftimes}) when $j$ and

1113: $j'$ are large. Williams \cite{Williams73} has shown that the sequence of

1114: record times

1115: can be generated from the recursion relation

1116: \cite{Glick78,Nevzorov87,Arnold98,Nevzorov01}

1117: \be

1118: \label{Williams}

1119: R_{j+1} =  [ e^{X_j} R_{j}] + 1,

1120: \ee

1121: where the $X_j$ are independent, exponentially distributed random variables

1122: with mean one,

1123: and $[ a ]$ is the integer part of $a$. For large $j$, the integer constraint

1124: can be ignored, and hence

1125: \be

1126: \label{Tijlarge}

1127: \tilde T(j,j') \approx \frac{R_{j'}}{j - j'}

1128: \left[ \exp \left(\sum_{i=j'}^{j-1} X_i \right) - 1

1129: \right].

1130: \ee

1131: Recalling that

1132: the choice of the next non-bypassed record involves finding the minimum among

1133: all crossing times $\tilde T(j,j')$ with $j > j'$, we see that the current

1134: location $R_{j'}$ cancels in the comparison between two such crossing times.

1135: The problem thus acquires translation invariance in the

1136: record space, in the sense that the position of the next non-bypassed

1137: record depends only on $j - j'$ and on the random

1138: variables $X_i$ associated with the records between $j'$ and $j$. It is

1139: therefore plausible

1140: that the bypassing probability tends to a constant for large $j$, and one is

1141: tempted

1142: to describe the process by a Markov chain on the

1143: set of records with the transition probability

1144: \be

1145: \label{Pjk}

1146: P_{j',j} = \mbox{Prob}[ \tilde T(j,j') = \min_{n > j'} \tilde T(n,j')].

1147: \ee

1148: This is the conditional probability that the next jump occurs at $j$, given

1149: that

1150: the preceding jump was at $j'$, averaged over all realizations of record times.

1151: Using the representation (\ref{Tijlarge}), the $P_{j',j}$ are manifestly

1152: translationally

1153: invariant for large $j,j'$, depending only on $j-j'$.

1154: Even in the asymptotic limit in which the expression (\ref{Tijlarge}) can be

1155: used,

1156: the evaluation of (\ref{Pjk}) is cumbersome, but an analytic

1157: upper bound on $ P_{j,j+1}$ can be obtained along the lines of

1158: (\ref{b2bound}).

1159: In the limit of large $j$, we can write

1160: \be

1161: \label{Pjj+1bound}

1162: P_{j,j+1} \leq \mbox{Prob}[\tilde T(j+1,j) < \tilde T(j+2,j)]

1163: = \mbox{Prob}[e^{X_{j+1}} + e^{-X_j} > 2] = \ln 2,

1164: \ee

1165: where $X_j$ and $X_{j+1}$ are the independent exponential random variables

1166: used in the

1167: representation (\ref{Williams}). The numerical evaluation of (\ref{Pjk}) yields

1168: $P_{j,j+1} \approx 0.669$, $P_{j,j+2} \approx 0.225$ and

1169: $P_{j,j+3} \approx 0.075$,

1170: indicating a roughly exponential decay of the transition probability.

1171: From the $P_{j',j}$, the mean density of jumps (non-bypassed records) can be

1172: computed according to

1173: \be

1174: \label{betamf2}

1175: \tilde \beta_{\mathrm{Markov}} =

1176: \left( \sum_{n=1}^{\infty} n P_{j,j+n} \right)^{-1} \approx 0.676,

1177: \ee

1178: which is significantly larger than the direct numerical estimate

1179: $\tilde \beta \approx 0.63$

1180: (see Fig.~\ref{mftjump}). This shows that the transition probability

1181: (\ref{Pjk}) is

1182: \textit{not} an exact representation of the process. The reason is that

1183: (\ref{Pjk})

1184: is an \textit{annealed} average, whereas in the full problem the record times

1185: (or, equivalently, the exponential random variables in (\ref{Williams}))

1186: must be

1187: treated as \textit{quenched}: Minimizing the crossing times $\tilde T(j,j')$

1188: for a given $j'$ involves, in principle, \textit{all} $j > j'$, and

1189: the \textit{same} set of

1190: random variables is used every time this minimization is repeated for

1191: different $j'$.
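
The value $\ln 2$ in the bound (\ref{Pjj+1bound}) can be verified directly. Ignoring the integer part in (\ref{Williams}), we have $R_{j+1} \approx e^{X_j} R_j$ and $R_{j+2} \approx e^{X_j + X_{j+1}} R_j$, so that the condition $\tilde T(j+1,j) < \tilde T(j+2,j)$ becomes $2 (e^{X_j} - 1) < e^{X_j + X_{j+1}} - 1$, i.e. $e^{X_{j+1}} + e^{-X_j} > 2$. Writing $U = e^{-X_j}$ and $V = e^{-X_{j+1}}$, which are independent and uniformly distributed on $(0,1)$ (the symbols $U$ and $V$ are introduced here for convenience), we obtain
\[
\mbox{Prob} \left[ \frac{1}{V} + U > 2 \right]
= \mbox{Prob} \left[ V < \frac{1}{2-U} \right]
= \int_0^1 \frac{dU}{2-U} = \ln 2 \approx 0.693 \;\;,
\]
which lies slightly above the numerical value $P_{j,j+1} \approx 0.669$, as a bound should.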

1192:

1193: We thus have to conclude that the range of attainable analytic results, even

1194: for the

1195: simplified problem (\ref{mfmodel}), is very limited. An extension of the bound

1196: (\ref{b2bound}) to the full i.i.d. model should be feasible using the

1197: representation

1198: of the joint distribution of record times and record values through a Markov

1199: chain \cite{Arnold98,Nevzorov01}; however, as such a bound is unlikely to

1200: provide much insight

1201: into the conjectured relation (\ref{relation}), we have not pursued this

1202: approach.

1203:

1204: %=============================================================================

1205: %CONCLUSION

1206: %=============================================================================

1207: \section{Conclusions}

1208: \l{conclude}

1209:

1210: In this article, we characterized the evolutionary trajectories

1211: traced out by a quasispecies population in an uncorrelated rugged

1212: fitness landscape. These trajectories approach the global fitness maximum

1213: through a sequence of \textit{jumps}

1214: which mark a change in the identity of the most populated genotype.

1215: The statistics of these evolutionary jumps

1216: were studied mainly numerically. However, useful insights were provided by

1217: a study of \textit{record} statistics which could be handled analytically.

1218: It was found that the jump statistics are qualitatively similar to records,

1219: but there are quantitative differences because, as shown in

1220: Fig.~\ref{lines},

1221: a record-breaking genotype can be \textit{bypassed}

1222: by a superior one before it can acquire dominance

1223: in the population (i.e., qualify as a jump).

1224:

1225: The statistics of records and jumps depend strongly on the

1226: geometry of the space of genotypes. The natural setting for genotype

1227: evolution is the Hamming space of sequences of fixed length $N$. However,

1228: computational effort could be greatly reduced by lumping together

1229: the sequences within a shell of constant Hamming distance with respect to

1230: the initial population \cite{Krug03}. Complementary to this shell model, we

1231: also considered

1232: a model of i.i.d. shell fitnesses, which corresponds effectively to a

1233: one-dimensional sequence space. While for the i.i.d. model, the average

1234: number of jumps differs from the number of records only through the

1235: prefactor $\beta$ of the logarithm of $N$, for the shell model

1236: the ratio of the two numbers

1237: ${\cal{J}}_{\mathrm{shell}}/{\cal{R}}_{\mathrm{shell}} \rightarrow 0$ as

1238: $N \rightarrow \infty$.

1239: For fat-tailed fitness distributions the evolutionary

1240: trajectories in the shell model may even degenerate, in the sense that the

1241: global fitness maximum is reached in a single step. For distributions

1242: decaying faster than a power law, like the exponential and normal

1243: distributions,

1244: we find numerically that ${\cal{J}}_{\mathrm{shell}} \sim \sqrt{N}$;

1245: an analytic understanding of this result would be very desirable.

1246:

1247: A universal feature of the

1248: evolutionary trajectories, which is independent of the geometry of genotype

1249: space as well as of the fitness distribution, is the hierarchy of power

1250: law tails

1251: for the distributions of the times at which the jumps occur. In particular,

1252: the $T^{-2}$-tail for the total evolution time implies that the average

1253: of $T$ is infinite. The dependence of the

1254: \textit{typical} evolution time on the size of the sequence space

1255: can be characterized by a dynamic exponent $z$, which was obtained

1256: exactly in terms of the extremal properties of the fitness distribution.

1257: On a mathematical level, perhaps the most intriguing result of this work

1258: is the conjectured relation (\ref{relation}) between  $\beta$ and $z$ for the

1259: i.i.d. model. It would be extremely interesting to understand how such a

1260: simple relation arises from the properties of the matrix of crossing

1261: times (\ref{Tkkprime}); however,

1262: in view of the difficulties encountered even in the analysis of the simplified

1263: problem (\ref{mfmodel}), we do not see at present how further progress in this

1264: direction can be achieved.

1265:

1266: In this article, we worked in the limit of infinite population and strong

1267: selection. However, we expect our results to

1268: hold, at least qualitatively, for finite selective temperature as well. The

1269: jumps appear instantaneous in the strong selection limit;

1270: at finite selective temperature

1271: they take a finite amount of time in which the peak

1272: associated with the new leader catches up with the currently dominating

1273: peak and the population distribution briefly becomes bimodal

1274: \cite{Rujan88,Ebeling84,Krug93}.

1275: Although the quasispecies theory, which works in the infinite population

1276: limit, is inadequate

1277: to address the fluctuation effects that become important

1278: when small mutant populations cross a fitness valley \cite{vanNimwegen00},

1279: for the related problem of episodic behavior in evolutionary

1280: computation \cite{vanNimwegen97}, some of the finite

1281: population behavior has been understood on the basis of the infinite population

1282: limit. Thus we expect our investigation to give some insight into the

1283: behavior of finite populations in rugged fitness landscapes

1284: \cite{Campos02a,Campos02b} as well.

1285:

1286: We close with some remarks on the applicability of the present work to the

1287: evolution of

1288: biological populations. A realization of asexual mutation-selection dynamics

1289: in a

1290: static fitness landscape that is believed to be quite rugged is provided by

1291: the long-term experiments on populations of \textit{Escherichia coli} carried

1292: out by Lenski and collaborators \cite{Lenski94,Elena96,Elena03}. These

1293: experiments

1294: show evidence for punctuated behavior both in the fitness and in the

1295: morphological

1296: features (such as cell size) of the evolving populations, which is

1297: attributed to the

1298: emergence and fixation of beneficial mutations. Fixation implies that a

1299: mutation which

1300: is initially present only in a single individual is inherited by a growing

1301: fraction

1302: of the population and eventually acquires dominance.

1303:

1304: At least on a qualitative level,

1305: this process corresponds in our model to the one in which a subpopulation

1306: residing in a distant shell

1307: of sequence space takes over the leadership of the population,

1308: as described by (\ref{lineq}) and illustrated in Fig.~\ref{lines}.

1309: The bypassing of one subpopulation by another is analogous to the phenomenon of

1310: \textit{clonal interference}, in which one beneficial mutation is superseded by

1311: another one before reaching fixation \cite{Gerrish98}.

1312: The key difference between our model and the behavior of real asexual

1313: evolution is that in the latter case beneficial mutants arise through the

1314: stochastic search of a finite

1315: population in an immensely large sequence space, while in the quasispecies

1316: model \textit{all possible mutants are present}

1317: (in extremely small numbers) \textit{after the first time

1318: step}. It is, therefore, mandatory to include the

1319: stochastic finite population dynamics

1320: in the model. Nevertheless, some features of the competition between

1321: the mutant populations (however they may have arisen) could

1322: well survive in a more complete, realistic treatment.

1323:

1324: \section*{Acknowledgments}

1325:

1326: We are grateful to A. Engel, L. Peliti, P. Eichelsbacher, W. Kirsch,

1327: T. Kriecherbauer and N. Kumar for useful discussions. This work was

1328: supported by DFG within

1329: SFB/TR 12 \textit{Symmetries and Universality in Mesoscopic Systems.}

1330:

1331:

1332: %=============================================================================

1333: %APPENDIX

1334: %=============================================================================

1335: \appendix

1336:

1337: \section{Inter-record spacing for the shell model}

1338: \l{appendix1}

1339: Using Stirling's formula $r ! \approx \sqrt{ 2 \pi r} (r/e)^{r}$ in

1340: the binomial coefficient ${N \choose r}$ for $r, N \gg 1$ with $r/N$ fixed and

1341: expanding ${N \choose r}$ about its maximum at $r=N/2$, we have

1342: \be

1343: \frac{1}{2^{N}} \; {N \choose r} \approx \sqrt{\frac{2}{\pi N}} \;

1344: \mbox{exp} \left[ -\frac{(N/2-r)^{2}}{N/2} \right] \;\;. \l{cnr}

1345: \ee

1346: This expression can be used to show that

1347: \bea

1348: \frac{1}{2^{N}} \; \sum_{r=0}^{y}{N \choose r} & \approx &

1349: \frac{1}{2} \mbox{erfc}(\alpha) \l{sumcnr} \\

1350: \frac{1}{2^{N}} \; \sum_{r=0}^{y} r {N \choose r} & \approx & \frac{N}{4}

1351: \left( \mbox{erfc}( \alpha) - \sqrt{\frac{2}{\pi N}} \mbox{exp}

1352: (- \alpha^{2}) \right) \;\;,\l{avgcnr}

1353: \eea

1354: by approximating the sum on the left hand side by an integral. In

1355: the above expressions,

1356: $\alpha = (N/2-y)/\sqrt{N/2}$ and

1357: $\mbox{erfc}(x) = (2/\sqrt{\pi}) \int_{x}^{\infty} dt \; e^{-t^{2}}$ is

1358: the complementary error function. One can derive (\ref{rj}) for the

1359: average location $\av{r_{j}}=\sum r_{j} \tilde{P}(r_{j})$ where

1360: $\tilde{P}(r_{j})$ is given by (\ref{recloc}) using

1361: Eqs.~(\ref{cnr})--(\ref{avgcnr}) and replacing the sums by integrals.
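
The step from (\ref{cnr}) to (\ref{sumcnr}) and (\ref{avgcnr}) can be sketched as follows. The sketch assumes that the sum may be replaced by an integral whose lower limit is extended to $-\infty$, and the variable $s=(N/2-r)/\sqrt{N/2}$ is introduced here only for illustration. With $dr = -\sqrt{N/2} \; ds$ one finds
\be
\frac{1}{2^{N}} \sum_{r=0}^{y} {N \choose r} \approx
\sqrt{\frac{2}{\pi N}} \int_{-\infty}^{y} dr \;
\mbox{exp} \left[ -\frac{(N/2-r)^{2}}{N/2} \right]
= \frac{1}{\sqrt{\pi}} \int_{\alpha}^{\infty} ds \; e^{-s^{2}}
= \frac{1}{2} \mbox{erfc}(\alpha) \;\;.
\ee
Inserting the additional factor $r = N/2 - \sqrt{N/2} \; s$ and using
$\int_{\alpha}^{\infty} ds \; s \, e^{-s^{2}} = e^{-\alpha^{2}}/2$ gives
\be
\frac{1}{2^{N}} \sum_{r=0}^{y} r {N \choose r} \approx
\frac{N}{4} \mbox{erfc}(\alpha) - \sqrt{\frac{N}{8 \pi}} \; e^{-\alpha^{2}} \;\;,
\ee
which agrees with (\ref{avgcnr}) after factoring out $N/4$.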

1362:

1363: The integral $G(m)$ in Section~\ref{recordspace} can be estimated by the saddle

1364: point method as follows. We have

1365: \be

1366: G(m) = (-1)^{m} \int_{-\infty}^{\infty} dx \;

1367: \frac{e^{-2 x^{2}}}{\mbox{erfc}(x)} \;

1368: \frac{\left(\ln {\mbox{erfc(x)}}\right)^{m}- \left( \ln {2} \right)^{m}}{m!}

1369: \approx  (-1)^{m}

1370: \int_{-\infty}^{\infty} dx \; \frac{e^{-{\cal{G}}(x)}}{m!} \;\;,

1371: \ee

1372: where ${\cal{G}}(x) \approx x^{2} - \ln x \; (2 m + 1+ x^{-2})$ for

1373: large $m$. Here we have used

1374: that $\mbox{erfc}(x) \approx e^{-x^{2}}/ (\sqrt{\pi} x) $ for $x \gg 1$.

1375: Expanding ${\cal{G}}(x)$ about its minimum $x_{0} \approx m- \ln \sqrt{m}$

1376: up to $O(x-x_{0})^{2}$ and doing the Gaussian

1377: integral, we obtain $G(m) \approx \sqrt{ \pi m/2}$ for large $m$.

1378:

1379: %=============================================================================

1380: \section{Distribution of the most populated sequence}

1381: \l{appendix2}

1382:

1383: Our goal is to compute the probability $P_t(k^\ast)$ that the

1384: sequence $k^{\ast}$ has

1385: the maximum population at time $t$ in the i.i.d. model.

1386: This distribution is given by

1387: \be

1388: P_t(k^\ast) = \int_{E_{\mathrm{min}}(k^\ast,t)}^{E_{\mathrm{max}}(k^\ast,t)}

1389: dE \;\; p_t^{(k^\ast)}(E)  \;\;

1390: \prod_{k \neq k^{\ast}}

1391: q_t^{(k)}(E) \;\;,

1392: \ee

1393: where

1394: \be

1395: p_t^{(k)}(E) = t^{-1} p[(E+k)/t]

1396: \ee

1397: is the distribution of $E(k,t)$ obtained from the fitness distribution

1398: $p(F)$ via the variable transformation (\ref{lineq}). The

1399: limits of the support of $p_t^{(k)}(E)$ are

1400: $E_{\mathrm{min}}(k,t) = -k+ F_{\mathrm{min}} t$

1401: and $E_{\mathrm{max}}(k,t) = -k+ F_{\mathrm{max}} t $, and

1402: $q_t^{(k)}(E)= \int_{E_{\mathrm{min}}}^{E} dE' \; p_t^{(k)}(E') $

1403: is the corresponding cumulative distribution for $E > E_{\mathrm{min}}$

1404: and zero otherwise. In the following, we show

1405: that the distribution $P_t(k^\ast)$ is of the scaling form

1406: $P_t(k^\ast) \approx t^{-1/z} \; \Phi (k^{\ast}/{t^{1/z}})$

1407: for various choices of fitness distribution.

1408:

1409: (i) $\;$ \underline{$p(F)=e^{-F}$:}

1410: \bea

1411: P_t(k^{\ast}) = \frac{1}{t}

1412: \int_{0}^{\infty} dx \; e^{-(x+k^{\ast})/t}

1413: \prod_{k \neq k^{\ast}} (1-e^{-(x+k)/t}) \approx \frac{1}{t} e^{-k^{\ast}/t}  \;\;,

1414: \eea

1415: where the last expression is obtained by exponentiating the product and

1416: evaluating the resulting sum as an integral for an infinite system.

1417: Thus in this case $z = 1$ and the scaling function is given by

1418: $\Phi_1(y) = e^{-y}$.
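
A sketch of the intermediate step, assuming $t \gg 1$ so that the sum over $k$ can be replaced by an integral (the variable $u$ is introduced here only for illustration): exponentiating the product gives
\be
\prod_{k \neq k^{\ast}} \left( 1-e^{-(x+k)/t} \right) \approx
\mbox{exp} \left[ - \int_{0}^{\infty} dk \; e^{-(x+k)/t} \right]
= \mbox{exp} \left[ - t \, e^{-x/t} \right] \;\;,
\ee
and the substitution $u = t \, e^{-x/t}$ then yields
\be
P_t(k^{\ast}) \approx \frac{1}{t} \; e^{-k^{\ast}/t} \int_{0}^{t} du \; e^{-u}
= \frac{1}{t} \; e^{-k^{\ast}/t} \left( 1 - e^{-t} \right) \;\;,
\ee
which reduces to the expression quoted above up to a correction of order $e^{-t}$.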

1419:

1420: (ii) $\;$ \underline{$p(F)=(1+\nu) (1-F)^{\nu}$, $\nu > -1$:}

1421: \bea

1422: P_t(k^\ast) &=& \frac{1+\nu}{t} \int_{0}^{t-k^{\ast}} dx \;

1423: \left( 1- \frac{x+k^{\ast}}{t} \right)^{\nu}

1424: \prod_{k \neq k^{\ast}}

1425: \left[ 1-\left( 1- \frac{x+k}{t} \right)^{1+\nu} \right] \\

1426:  & \approx &  \frac{1+\nu}{t} \int_{0}^{t-k^{\ast}} dx \;

1427: \left( 1- \frac{x+k^{\ast}}{t} \right)^{\nu}

1428: \mbox{exp} \left[ - \frac{t}{2+\nu}

1429: \left( 1- \frac{x}{t} \right)^{2+\nu} \right] \\

1430: & \approx & \frac{1}{t^{1/z}} \; \Phi_{2} \left( y= \frac{k^{\ast}}{t^{1/z}}

1431: \right) \;\;,

1432: \eea

1433: where $z=(2+\nu)/(1+\nu)$ and the scaling function $\Phi_{2}(y)$ is given by

1434: \be

1435: \Phi_{2}(y) \approx  \frac{1+\nu}{(2+\nu)^{1/z}}

1436: \int_{y^{2+\nu}/(2+\nu)}^{\infty} dx \;

1437: e^{-x} x^{-1/z} \left( y+ ((2+\nu) x)^{1/(2+\nu)} \right)^{\nu} \;\;.

1438: \ee
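
The exponential factor in the second line of the expression for $P_t(k^\ast)$ can be understood by exponentiating the product and replacing the sum over $k$ by an integral. This is a sketch that assumes $t \gg 1$ and that the individual factors are close to unity; the variable $u$ is introduced only for illustration:
\be
\sum_{k} \left( 1- \frac{x+k}{t} \right)^{1+\nu} \approx
\int_{0}^{t-x} dk \; \left( 1- \frac{x+k}{t} \right)^{1+\nu}
= t \int_{0}^{1-x/t} du \; u^{1+\nu}
= \frac{t}{2+\nu} \left( 1- \frac{x}{t} \right)^{2+\nu} \;\;.
\ee
The exponent is of order unity when $t-x \sim t^{(1+\nu)/(2+\nu)}$, which identifies the scale $t^{1/z}$ with $z=(2+\nu)/(1+\nu)$.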

1439:

1440:

1441: (iii) $\;$ \underline{$p(F)= \delta \; F^{-1-\delta}$, $\delta > 1$:}

1442: \bea

1443: P_t(k^\ast) &=& \frac{\delta}{t} \int_{t}^{\infty} dx \;

1444: \left( \frac{t}{x+k^{\ast}} \right)^{1+\delta}

1445: \prod_{k \neq k^{\ast}} \left[ 1- \left( \frac{t}{x+k} \right)^\delta

1446: \right] \\

1447: & \approx & \frac{1}{t^{1/z}} \; \Phi_{3} \left( y= \frac{k^{\ast}}{t^{1/z}}

1448: \right) \;\;,

1449: \eea

1450: where $z=(\delta-1)/\delta$ and the scaling function $\Phi_{3}(y)$

1451: is given by

1452: \be

1453: \l{f3}

1454: \Phi_{3}(y) \approx \delta (\delta -1)^{1/(\delta -1)}

1455: \int_{0}^{\infty} dx

1456: \frac{e^{-x} x^{1/(\delta-1)}}{(1+((\delta-1) x)^{1/(\delta-1)}y)^{1+\delta}}

1457: \;\;.

1458: \ee
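
The origin of the scale $t^{1/z}$ in this case can be sketched in the same way, assuming that the product may be exponentiated and the sum over $k$ replaced by an integral for an infinite system:
\be
\sum_{k} \left( \frac{t}{x+k} \right)^{\delta} \approx
t^{\delta} \int_{0}^{\infty} \frac{dk}{(x+k)^{\delta}}
= \frac{t^{\delta} \, x^{1-\delta}}{\delta-1} \;\;,
\ee
which converges only for $\delta > 1$. The exponent is of order unity when $x \sim t^{\delta/(\delta-1)}$, consistent with $1/z = \delta/(\delta-1)$.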

1459:

1460: %=============================================================================

1461: %BIBLIOGRAPHY

1462: %=============================================================================

1463:

1464: \begin{thebibliography}{999}

1465:

1466: \bibitem{Sneppen95} K. Sneppen, P. Bak, H. Flyvbjerg and M. H. Jensen,

1467: Proc. Natl. Acad. Sci. USA \textbf{92}, 5209 (1995).

1468:

1469: \bibitem{Sibani98} P. Sibani, M. Brandt and P. Alstr\o m, Int. J. Mod.

1470: Phys. B {\bf 12}, 361 (1998).

1471:

1472: \bibitem{Peliti97} L. Peliti, {\tt{cond-mat/9712027}}.

1473:

1474: \bibitem{Baake99} E. Baake and W. Gabriel,

1475: in {\it Annual Reviews of Computational Physics VII},

1476: ed. by  D. Stauffer (World Scientific, Singapore, 2000), p. 203.

1477:

1478: \bibitem{Drossel01} B. Drossel, Adv. Phys. \textbf{50}, 209 (2001).

1479:

1480: \bibitem{Eldredge89} N. Eldredge, \textit{Macroevolutionary Dynamics}

1481: (McGraw-Hill, New York 1989).

1482:

1483: \bibitem{Gould93} S.J. Gould and N. Eldredge, Nature {\bf 366}, 223 (1993).

1484:

1485:

1486: \bibitem{Lenski94} R.E. Lenski and M. Travisano,

1487: Proc. Natl. Acad. Sci. USA \textbf{91}, 6808 (1994).

1488:

1489: \bibitem{Elena96} S.F. Elena, V.S. Cooper, and R.E. Lenski, Science

1490: \textbf{272}, 1802 (1996).

1491:

1492: \bibitem{Burch99} C.L. Burch and L. Chao, Genetics \textbf{151}, 921 (1999).

1493:

1494: \bibitem{Elena03} S.F. Elena and R.E. Lenski,

1495: Nature Reviews Genetics \textbf{4}, 457 (2003).

1496:

1497: \bibitem{Schuster02} P. Schuster, in \textit{Biological Evolution and

1498: Statistical Physics}, eds. M. L\"assig and A. Valleriani

1499: (Springer, Berlin, 2002), p. 55.

1500:

1501: \bibitem{Rujan88} P. Ruj\'an, Z. Phys. B \textbf{73}, 391 (1988).

1502:

1503: \bibitem{vanNimwegen97} E. van Nimwegen, J.P. Crutchfield, and M. Mitchell,

1504: Phys. Lett. A \textbf{229}, 144 (1997);

1505: Theor. Comp. Sci. \textbf{229}, 41 (1999).

1506:

1507: \bibitem{Adami95} C. Adami, Phys. Lett. A \textbf{203}, 29 (1995).

1508:

1509: \bibitem{Newman85} C.M. Newman, J.E. Cohen and C. Kipnis, Nature

1510: {\bf 315}, 400 (1985).

1511:

1512: \bibitem{Lande85} R. Lande,

1513: Proc. Natl. Acad. Sci. USA \textbf{82}, 7641 (1985).

1514:

1515: \bibitem{Simpson44} G.G. Simpson, \textit{Tempo and Mode in Evolution}

1516: (Columbia University Press, New York 1944).

1517:

1518: \bibitem{Ebeling84} W. Ebeling, A. Engel, B. Esser and R. Feistel,

1519: J. Stat. Phys. {\bf 37}, 369 (1984).

1520:

1521: \bibitem{Zhang86} Y. C. Zhang, Phys. Rev. Lett. {\bf 56}, 2113 (1986).

1522:

1523: \bibitem{Eigen71} M. Eigen, Naturwissenschaften {\bf 58}, 465 (1971).

1524:

1525: \bibitem{Eigen89}

1526: M. Eigen, J. McCaskill, and P. Schuster, Adv. Chem. Phys. {\bf 75}, 149 (1989).

1527:

1528: \bibitem{Krug02} J. Krug in \textit{Biological Evolution and

1529: Statistical Physics},

1530: eds. M. L\"assig and A. Valleriani (Springer, Berlin, 2002), p. 205.

1531:

1532: \bibitem{Woodcock96} G. Woodcock and P.G. Higgs,

1533: J. theor. Biol. \textbf{179}, 61 (1996).

1534:

1535: \bibitem{Krug03} J. Krug and C. Karl, Physica A {\bf 318}, 137 (2003).

1536:

1537: \bibitem{Glick78} N. Glick, Amer. Math. Monthly {\bf 85}, 2 (1978).

1538:

1539: \bibitem{Nevzorov87} V.B. Nevzorov, Theory Probab. Appl.,

1540: {\bf 32}, 201 (1987).

1541:

1542: \bibitem{Arnold98} B.C. Arnold, N. Balakrishnan and

1543: H.N. Nagaraja, \textit{Records}

1544: (Wiley, New York 1998).

1545:

1546: \bibitem{Nevzorov01} V.B. Nevzorov,

1547: \textit{Records: Mathematical Theory} (American Mathematical

1548: Society, Providence, 2001).

1549:

1550: \bibitem{Kauffman87} S.A. Kauffman and S. Levin, J. theor. Biol.

1551: \textbf{128}, 11 (1987).

1552:

1553: \bibitem{Krug04} J. Krug and K. Jain in {\it Proceedings of 8th

1554: ICCMSP/Marrakech}, to be published in Physica A, {{\tt q-bio.PE/0409019}}.

1555:

1556: \bibitem{Franz97} S. Franz and L. Peliti, J. Phys. A \textbf{30}, 4481 (1997).

1557:

1558: \bibitem{Krug93} J. Krug and T. Halpin-Healy, J. Phys. I

1559: France \textbf{3}, 2179 (1993).

1560:

1561: \bibitem{Gaertner04} J. G\"artner and W. K\"onig, in

1562: J.-D. Deuschel and A. Greven (Eds.), \textit{Interacting Stochastic Systems}

1563: (Springer, Berlin 2005), p. 153.

1564:

1565: \bibitem{Nevzorov84} V. B. Nevzorov, Theory Probab. Appl.,

1566: {\bf 29}, 845 (1984).

1567:

1568: \bibitem{Yang75} M. C. K. Yang, J. Appl. Prob. {\bf 12}, 148 (1975).

1569:

1570: \bibitem{David70} H.A. David, \textit{Order Statistics}

1571: (Wiley, New York, 1970).

1572:

1573: \bibitem{Sornette00} D. Sornette,

1574: \textit{Critical Phenomena in Natural Sciences} (Springer, Berlin 2000).

1575:

1576: \bibitem{remark} This scaling form is not true for

1577: power law distributed

1578: fitness with $1 < \delta \leq 2$, because the scaling function $\Phi$ in

1579: (\ref{Ptscale}) does not possess a first moment in this case (see (\ref{f3})).

1580: Nevertheless, the time scale

1581: $T^{\ast} \sim N^{z}$ with $z$ given in (\ref{1dz}) for all $\delta > 1$.

1582:

1583: \bibitem{Tata69} M. N. Tata, Z. Wahrsch. Verw. Geb. {\bf 12}, 9 (1969).

1584:

1585:

1586: \bibitem{Williams73} D. Williams, Bull. London Math. Soc. {\bf 5}, 235 (1973).

1587:

1588: \bibitem{vanNimwegen00} E. van Nimwegen and J.P. Crutchfield,

1589: Bull. Math. Biol. \textbf{62}, 799 (2000).

1590:

1591: \bibitem{Campos02a} P.R.A. Campos, C. Adami and C.O. Wilke,

1592: Physica A \textbf{304}, 495 (2002).

1593:

1594: \bibitem{Campos02b} P.R.A. Campos and V.M. de Oliveira,

1595: Int. J. Mod. Phys. C \textbf{13}, 1003 (2002).

1596:

1597:

1598: \bibitem{Gerrish98} P.J. Gerrish and R.E. Lenski,

1599: Genetica \textbf{102/103}, 127 (1998).

1600:

1601: \end{thebibliography}

1602:

1603: %=============================================================================

1604: %FIGURES

1605: %=============================================================================

1606: \begin{figure}

1607: \begin{center}

1608: \psfig{figure=lines.ps,width=9cm,angle=270}

1609: \caption{

1610: Time evolution of the logarithmic population variable $E(k,t) = - k +

1611: F(k)t$. The fitnesses $F(k)$ of the sequences corresponding to thin green

1612: lines are not records whereas those corresponding to bold red lines

1613: are. The lines appearing in the upper envelope define the most populated

1614: sequence $k^\ast$; an evolutionary jump occurs when two such lines cross.

1615: Note that the sequences $k=2$ and $k=6$, although constituting fitness

1616: records, are bypassed as they do not satisfy the minimum crossing time

1617: constraint. The inset shows the punctuated evolution of the most

1618: populated sequence $k^\ast$ as a function of time.}

1619: \label{lines}

1620: \end{center}

1621: \end{figure}

1622:

1623: \begin{figure}

1624: \begin{center}

1625: \psfig{figure=iidjump.ps,width=9cm,angle=270}

1626: \caption{Mean number ${\cal{J}}_{\mathrm{iid}}$ of jumps for

1627: the i.i.d. model for

1628: various fitness distributions. The lines are the best fits to the functional

1629: form ${\cal{J}}_{\mathrm{iid}}= \beta \ln N + \mbox{constant}$.}

1630: \label{iidjump}

1631: \end{center}

1632: \end{figure}

1633:

1634:

1635: \begin{figure}

1636: \begin{center}

1637: \psfig{figure=sjump.ps,width=9cm,angle=270}

1638: \caption{Data collapse for the scaled jump distribution

1639: $\sqrt{N} P_{\mathrm{shell}}(k,N)$ vs. $k/N$ for

1640: the shell model with $\ell=2$ and $p(F)=e^{-F}$. Inset:

1641: Scaled inter-jump spacing $\Delta_{\mathrm{shell}}(j)/\sqrt{N}$ vs.

1642: $j/\sqrt{N}$ to show that jumps are roughly equally spaced in the shell model.}

1643: \label{sjump}

1644: \end{center}

1645: \end{figure}

1646:

1647: \begin{figure}

1648: \begin{center}

1649: \psfig{figure=rspace.ps,width=9cm,angle=270}

1650: \caption{Semi-log plot for inter-jump spacing $\Delta_{\mathrm{iid}}(j)$

1651: for the i.i.d. model in accordance with (\ref{interjumpiid}) with

1652: slope $b=\beta/2$.

1653: Inset: Correlation $C_{\mathrm{iid}}(k_{1},k_{2})$

1654: vs. $k_{2}-k_{1}$ for two fixed values of $k_{1}$

1655: to show correlations between jumps in the i.i.d. model.}

1656: \label{rspace}

1657: \end{center}

1658: \end{figure}

1659:

1660: \begin{figure}

1661: \begin{center}

1662: \psfig{figure=evol.ps,width=9cm,angle=270}

1663: \caption{Log-log plot for the ratios $P(T_{2})/P(T_{1})$ and

1664: $P(T_{3})/P(T_{2})$ of evolution time distributions

1665: for the i.i.d. model with $p(F)=e^{-F}$. The ratios decay as

1666: $1/T$ for large $T$, as the line with

1667: slope $-1$ indicates.}

1668: \label{evol}

1669: \end{center}

1670: \end{figure}

1671:

1672: \begin{figure}

1673: \begin{center}

1674: \psfig{figure=sum.ps,width=9cm,angle=270}

1675: \caption{Distribution

1676: $P_{\mathrm{iid}}(X,N)=P_{\mathrm{iid}}(t=X,N)+P_{\mathrm{iid}}(k=X,N)$ vs.

1677: $X$ for various distributions, illustrating the

1678: duality between the jump processes along the $k$- and $t$-axes. The

1679: solid line has slope $-1$. Inset: Semi-log plot for the inter-jump spacing

1680: $\Delta_{\mathrm{iid}}^{\prime}(j)$ along the time axis as a function of

1681: $j$ for the i.i.d. model. The slope $(1-\beta)/2$ is consistent with the

1682: conjecture (\ref{cnjctre}).}

1683: \label{sum}

1684: \end{center}

1685: \end{figure}

1686:

1687: \begin{figure}

1688: \begin{center}

1689: \psfig{figure=mftjump.ps,width=9cm,angle=270}

1690: \caption{Average number $\tilde {\cal{J}}$ of jumps

1691: vs. $N$ for the record time model defined by (\ref{mfmodel}). The solid line

1692: $0.63 \ln N-0.13$ is the best fit.}

1693: \label{mftjump}

1694: \end{center}

1695: \end{figure}

1696:

1697: %=============================================================================

1698: %=============================================================================

1699:

1700: \end{document}

1701: