1: \documentstyle[12pt,amssymb,psfig]{article}

2: \setlength{\textwidth}{6.0in}

3: \setlength{\textheight}{8.0in}

4: \setlength{\oddsidemargin}{0.3in}

5: \renewcommand{\theequation}{\arabic{equation}}

6: \newcommand{\be}{\begin{eqnarray}}

7: \newcommand{\ee}{\end{eqnarray}}

8: \newcommand{\tierra}{{\sf tierra}}

9: \newcommand\ie {{\it i.e.\ }}

10: \newcommand\eg {{\it e.g.\  }}

11: \newcommand\etc{{\it etc.\  }}

12: \newcommand\cf {{\it cf.\   }}

13: \renewcommand{\simeq}{\stackrel\sim=}

14: \newcommand{\la}{\langle}

15: \newcommand{\ra}{\rangle}

16: %

17: \newcommand{\myhead}{\sf J. Chu and C. Adami}

18: \begin{document}

19: %\thepage

20: \pagestyle{myheadings}

21: \markboth{\myhead}{\myhead}

22: \begin{titlepage}

23: \vskip 7cm

24: \centerline{\Large\bf A Simple Explanation for}

25: \centerline{\Large\bf Taxon Abundance Patterns}

26: \vskip 0.5in

27: \centerline{Johan Chu and Christoph Adami}

28: \vskip 0.25in

29: \centerline{\it W.K.\ Kellogg Radiation Laboratory 106-38}

30: \centerline{\it California Institute of Technology, Pasadena, CA 91125}

31: \vskip 3.5 cm

32: \noindent{\it Classification:}\\

33: \noindent{\it  Biological Sciences (Evolution), Physical

34:   Sciences (Applied Mathematics)}

35: \vskip 2cm

36: \noindent Corresponding author:

37: \vskip 0.25in

38: Dr. Chris Adami

39:

40: E-mail: adami@krl.caltech.edu

41:

42: Phone: (+) 626 395-4256

43:

44: Fax: (+) 626 564-8708

45:

46:

47: \end{titlepage}

48: \setcounter{page}{2}

49: %\setlength{\baselineskip}{25pt}

50: \begin{abstract}

51:   For taxonomic levels higher than

52:   species, the abundance distributions of number of subtaxa per taxon

53:   tend to approximate power laws,  but often show strong deviations

54:   from such a law.  Previously, these deviations were attributed

55:   to finite-time effects in a continuous time branching process at the

56:   generic level. Instead, we describe here a simple discrete branching

57:   process which generates the observed distributions and find that the

58:   distribution's deviation from power-law form is not caused by

59:   disequilibration, but rather that it is time-independent and

60:   determined by the evolutionary properties of the taxa of interest.

61:   Our model predicts---with no free parameters---the rank-frequency

62:   distribution of number of families in fossil marine animal orders

63:   obtained from the fossil record.  We find that near power-law

64:   distributions are statistically almost inevitable for taxa higher

65:   than species.  The branching model also sheds light on species

66:   abundance patterns, as well as on links between evolutionary

67:   processes, self-organized criticality and fractals.

68: \end{abstract}

69:

70: Taxonomic abundance distributions have been studied since the

71: pioneering work of Yule~\cite{YULE}, who proposed a continuous time

72: branching process model to explain the distributions at the generic

73: level, and found that they were power laws in the limit of

74: equilibrated populations. Deviations from the geometric law were

75: attributed to a finite-time effect, namely, to the fact that the

76: populations had not reached equilibrium. Much later,

77: Burlando~\cite{BURLANDO1,BURLANDO2} compiled data that appeared to

78: corroborate the geometric nature of the distributions, even though

79: clear violations of the law are visible in his data also. In this

80: paper, we present a model which is based on a discrete branching process

81: whose distributions are time-independent and where violations of the

82: geometric form reflect specific environmental conditions and pressures

83: that the assemblage under consideration was subject to during

84: evolution. As such, it holds the promise that an analysis of taxonomic

85: abundance distributions may reveal certain characteristics of

86: ecological niches long after their inhabitants have disappeared.

87:

88: The model described here is based on the simplest of branching

89: processes, known in the mathematical literature as the {\it

90:   Galton-Watson} process. Consider an assemblage of taxa at one

91: taxonomic level. This assemblage can be all the families under a

92: particular order, all the subspecies of a particular species, or any

93: other group of taxa at the same taxonomic level that can be assumed to

94: have suffered the same evolutionary pressures. We are interested in

95: the shape of the rank-frequency distribution of this assemblage and

96: the factors that influence it.

97:

98: We describe the model by explaining a specific example: the

99: distribution of the number of families within orders for a particular

100: phylum. The adaptation of this model to different levels in the

101: taxonomic hierarchy is obvious.  We can assume that the assemblage was

102: founded by one order in the phylum and that this order consisted of

103: one family which had one genus with one species.  We further assume

104: that new families in this order are created by means of mutation in

105: individuals of extant families.  This can be viewed as a process where

106: existing families can ``replicate'' and create new families of the

107: same order, which we term {\em daughters} of the initial family.  Of

108: course, relatively rarely, mutations may lead to the creation of a new

109: order, a new class, etc. We define a probability $p_i$ for a family to

110: have $i$ daughter families of the same order ({\em true daughters}).

111: Thus, a family will have no true daughters with probability $p_0$, one

112: true daughter with probability $p_1$, and so on.  For the sake of

113: simplicity, we initially assume that all families of this phylum share

114: the same $p_i$.  We show later that variance in $p_i$ among different

115: families does not significantly affect the results, in particular the

116: shape of the distribution. The branching process described above gives

117: rise to an abundance distribution of families within orders, and its

118: probability distribution can be obtained from the Lagrange expansion

119: of a nonlinear differential equation~\cite{HARRIS}.  Using a simple

120: iterative algorithm~\cite{JCCA2} in place of this Lagrange expansion

121: procedure, we can calculate rank-frequency curves for many different

122: sets of $p_i$.  It should be emphasized here that we are mostly

123: concerned with the shape of this curve for $n \lesssim 10^4$, and not

124: the asymptotic shape as $n \rightarrow \infty$, a limit that is not

125: reached in nature.
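
For reference, the standard total-progeny result for a Galton-Watson process can be written compactly. The sketch below assumes that $n$ counts every family in the order, the founder included, and introduces two symbols that are not used elsewhere in this paper: the offspring generating function $f(s)$ and the generating function $F(s)$ of $P(n)$.
\begin{displaymath}
f(s)=\sum_{i=0}^{\infty} p_i s^i , \qquad
F(s)=\sum_{n=1}^{\infty} P(n)\, s^n = s\, f\bigl(F(s)\bigr) .
\end{displaymath}
Lagrange inversion of this functional equation gives
\begin{displaymath}
P(n)=\frac{1}{n}\,[s^{n-1}]\, f(s)^n ,
\end{displaymath}
where $[s^{k}]$ denotes the coefficient of $s^k$. As a check, take the purely illustrative choice $p_0=p_2=1/2$, for which $m=1$. Only odd $n=2k+1$ can occur, and
\begin{displaymath}
P(2k+1)=\frac{1}{2k+1}{2k+1 \choose k}\, 2^{-(2k+1)} ,
\end{displaymath}
so that $P(1)=1/2$ and $P(3)=1/8$, as direct enumeration of the smallest trees confirms. For large $n$, Stirling's formula gives $P(n)\propto n^{-3/2}$ on the odd integers.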

126:

127: For different sets of $p_i$, the theoretical curve

128: can either be close to a power-law, a power law with an exponential tail

129: or a purely exponential distribution (Fig.\ 1).

130: \begin{figure}[h]

131: \centerline{\psfig{figure=FIG1.PS,width=4in,angle=90}}

132: \caption{Predicted abundance pattern $P(n)$

133: (probability for a taxon to have $n$ subtaxa) of the

134: branching model with different values of $m$. The

135: curves have been individually rescaled. }

136: \end{figure}

137: We show here that there is a

138: global parameter that distinguishes among these cases.

139: Indeed, the mean number of true daughters, i.e., the mean number

140: of different families of the same order that each family gives

141: rise to in the example above,

142: \be

143: m = \sum_{i=0}^{\infty} i \cdot p_i

144: \ee

145: is a good indicator of the overall shape of the curve.

146: Universally, $m = 1$ leads to a power law for the abundance distribution.

147: The further

148: $m$ is away from $1$, the further the curve diverges from

149: a power-law and towards an exponential curve. The value

150: of $m$ for a particular assemblage can be estimated from

151: the fossil record also, allowing for a characterization

152: of the evolutionary process with no free parameters. Indeed,

153: if we assume that the number of families in this phylum existing at one

154: time is roughly constant, or varies slowly compared to the average rate

155: of family creation

156: (an assumption the fossil record seems to vindicate~\cite{RAUP}),

157: we find that $m$ can be related to the ratio $R_o/R_f$

158: of the rates of creation of orders and families by

159: \be

160: m = \left(1+\frac{R_o}{R_f}\right)^{-1}

161: \ee

162: to leading order~\cite{JCCA2}.
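
To make the role of $m$ concrete, consider the special case of Poisson-distributed daughter numbers, $p_i=e^{-m}m^i/i!$. This choice is an assumption made only for illustration; nothing in the data singles it out. The number $n$ of families in an order (founder included) then follows the Borel distribution, and Stirling's formula gives its large-$n$ form:
\begin{displaymath}
P(n)=\frac{e^{-mn}(mn)^{n-1}}{n!}
\approx \frac{1}{m\sqrt{2\pi}}\; n^{-3/2}\, e^{-n/n_c} ,
\qquad n_c=\frac{1}{m-1-\ln m} .
\end{displaymath}
For $m=1$ the cutoff $n_c$ diverges and a pure power law with exponent $3/2$ remains. For any $m\neq 1$ one has $m-1-\ln m>0$, so the tail is exponential, and the cutoff moves to smaller $n$ as $m$ moves away from $1$. As a numerical example, a ratio $R_o/R_f=0.115$ (the value quoted in the caption of Fig.~3) gives $m=1/1.115\approx 0.897$ and, under the Poisson assumption, $n_c\approx 175$: the distribution is then close to a power law for $n$ well below a few hundred and bends down exponentially beyond that.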

163:

164: In general, we cannot expect all the families within an order to share

165: the same $m$. Interestingly, it turns out that

166: even if the $p_i$ and $m$ differ

167: widely between different families, the rank-frequency curve is

168: identical to that obtained by assuming a fixed $m$ equal to the average

169: of $m$ across the families (Fig.\ 2), i.e., the variance of the

170: $p_i$ across families appears to be completely immaterial to the shape

171: of the distribution---only the average  $\mu \equiv \langle m\rangle$ counts.

172: \begin{figure}[h]

173: \centerline{\psfig{figure=FIG2.PS,width=4in,angle=90}}

174: \caption{Abundance patterns obtained from two sets of

175: numerical simulations of the branching model, each with $\mu=\langle m\rangle =

176: 0.5$. $m$ was chosen from a uniform probability distribution of width

177: $1$ for the runs represented by crosses, and from a distribution of

178: width $0.01$ for those represented by circles.

179: Simulations where $m$ and $p_i$ are allowed to vary significantly

180: and those where they are severely constricted are impossible to distinguish

181: if they share the same $\langle m\rangle$.}

182: \end{figure}
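
A simplified argument suggests why the average plays this role. Suppose, as an assumption of this sketch and not a statement about how the simulations were set up, that each family $\alpha$ draws its own probabilities $p_i^{(\alpha)}$ independently from some ensemble, regardless of its ancestry. Averaged over the ensemble, a family then has $i$ true daughters with probability
\begin{displaymath}
\bar p_i=\langle p_i^{(\alpha)}\rangle , \qquad
\sum_{i=0}^{\infty} i\,\bar p_i=\langle m\rangle=\mu ,
\end{displaymath}
so the assemblage is statistically equivalent to a single branching process with the averaged probabilities $\bar p_i$, whose mean number of true daughters is $\mu$.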

183:

184:

185:

186:

187: \begin{figure}[h]

188: \centerline{\psfig{figure=FIG3.PS,width=4in,angle=90}}

189: \caption{The abundance distribution of fossil marine animal

190: orders~\cite{SEPKOSKI} (squares) and the predicted curve from the

191: branching model (solid line).  The fossil data has been binned above

192: $n=37$ with a variable bin size~\cite{JCCA2}.  The predicted curve was

193: generated using $R_o/R_f = N_o/N_f = 0.115$, where $N_o$ and $N_f$

194: were obtained directly from the fossil data.  The inset shows

195: Kolmogorov-Smirnov (K-S) significance levels $P$ obtained from

196: comparison of the fossil data to several predicted distributions with

197: different values of $R_o/R_f$, which shows that the data is best fit

198: by $R_o/R_f=0.135$. The arrow points to our prediction $R_o/R_f=0.115$

199: where $P=0.12$. A Monte Carlo analysis shows that for a sample size of

200: $626$ (as we have here), the predicted $R_o/R_f=0.115$ is within the

201: 66\% confidence interval of the best fit $R_o/R_f=0.135$ ($P=0.44$).

202: The K-S tests were done after removal of the first point, which

203: suffers from sampling uncertainties.}

204: \end{figure}

205: In Fig.\ 3, we show the abundance distribution of families within

206: orders for fossil marine animals~\cite{SEPKOSKI}, together with the

207: prediction of our branching model.  The theoretical curve was obtained

208: by assuming that the ratio $R_o/R_f$ is approximated by the ratio of

209: the total number of orders to the total number of families

210: \begin{equation}

211: \frac{R_o}{R_f} \simeq \frac{N_o}{N_f}  \label{HMM}

212: \end{equation}

213: and that both are very small compared to the rate of mutations.  The

214: prediction $\mu=0.9(16)$ obtained from the branching process model by

215: using (\ref{HMM}) as the sole parameter fits the observed data

216: remarkably well ($P=0.12$, Kolmogorov-Smirnov test, see inset in

217: Fig.~3).  Alternatively, we can use a best fit to determine the ratio

218: $R_o/R_f$ without resorting to (\ref{HMM}), yielding

219: $R_o/R_f=0.135(20)$ ($P=0.44$). Fitting abundance distributions to the

220: branching model thus allows us to determine a ratio of parameters

221: which reflect dynamics intrinsic to the taxon under consideration, and

222: the niche(s) it inhabits. Indeed, some taxa analyzed in

223: Refs.~\cite{BURLANDO1,BURLANDO2} are better fit with $0.5<\mu<0.75$,

224: pointing to conditions in which the rate of taxon formation was much

225: closer to the rate of subtaxon formation, indicating either a more

226: ``robust'' genome or richer and more diverse niches.

227:

228:

229: In general, however, Burlando's data~\cite{BURLANDO1,BURLANDO2}

230: suggest that a wide variety of taxonomic distributions are fit quite

231: well by power laws ($\mu = 1$).  This seems to imply that actual

232: taxonomic abundance patterns from the fossil record are characterized

233: by a relatively narrow range of $\mu$ near $1$. This is plausible within

234: the model advanced here.  It is obvious that $\mu$ cannot

235: remain above $1$ for significant time scales as this would lead to an

236: infinite number of subtaxa for each taxon.  What about low $\mu$?  We

237: propose that low values of $\mu$ are not observed for large (and

238: therefore statistically important) taxon assemblages for the following

239: reasons.  If $\mu$ is very small, this implies either a small number

240: of total individuals for this assemblage, or a very low rate of

241: beneficial taxon-forming (or niche-filling) mutations. The former

242: might lead to this assemblage not being recognized at all in field

243: observations. Either case will lead to an assemblage with too few

244: taxa to be statistically tractable. Also, since such an assemblage

245: either contains a small number of individuals or is less suited for

246: further adaptation or both, it would seem to be susceptible to early

247: extinction.

248:

249: The branching model can---with appropriate care---also be applied to

250: species-abundance distributions, even though these are more

251: complicated than those for higher taxonomic levels for several

252: reasons. Among these are the effects of sexual reproduction and the

253: localized and variable effects of the environment and other species on

254: specific populations. Still, as the arguments for using a branching

255: process model essentially rely on mutations which may produce lines of

256: individuals that displace others, species-abundance distributions may

257: turn out {\em not} to be qualitatively as different from taxonomically

258: higher-level rank-frequency distributions as is usually expected.

259: \begin{figure}[h]

260: \centerline{\psfig{figure=FIG4.PS,width=4in,angle=90}}

261: \caption{The abundance distribution of fossil marine

262: animal orders in logarithmic abundance classes (the same data as Fig.\ 3).

263: The histogram shows the

264: number of orders in each abundance class (left scale), while the solid line

265: depicts the number of families in each abundance class (right scale).

266: Species rank-abundance distributions where the highest abundance class present

267: also has the highest number of individuals (as in these data)

268: are termed {\em canonical lognormal}~\cite{PRESTON2}.}

269: \end{figure}

270:

271: Historically, species abundance distributions have been characterized

272: using frequency histograms of the number of species in logarithmic

273: abundance classes.  For many taxonomic assemblages, this was found

274: to produce a humped distribution truncated on the left---a shape usually

275: dubbed {\em lognormal}~\cite{PRESTON1,PRESTON2,SUGIHARA}.

276: In fact, this distribution is

277: not incompatible with the power-law type distributions described

278: above. Indeed, plotting the fossil data of Fig.\ 3 in logarithmic

279: abundance classes produces a lognormal (Fig.\ 4).
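
A short calculation illustrates this compatibility. As a sketch, assume a pure power law $P(n)\propto n^{-3/2}$ (the form a critical branching process takes when the offspring distribution has finite variance), treat $n$ as continuous, and take abundance class $k$ to be the octave $2^k\le n<2^{k+1}$. The symbols $S_k$ and $F_k$ are introduced here for the number of orders and the number of families in class $k$:
\begin{displaymath}
S_k\propto\int_{2^k}^{2^{k+1}} n^{-3/2}\,dn
=2\left(1-2^{-1/2}\right)2^{-k/2} ,
\qquad
F_k\propto\int_{2^k}^{2^{k+1}} n\cdot n^{-3/2}\,dn
=2\left(\sqrt{2}-1\right)2^{k/2} .
\end{displaymath}
Under these assumptions the number of orders per class falls by a factor $\sqrt{2}$ per octave, consistent with the descending flank of a histogram truncated on the left, while the number of families per class grows by the same factor, so that the highest occupied class holds the most families. For $\mu<1$ the exponential cutoff ends both trends at large $n$.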

280:

281: For species, $\mu$ is the mean number of children each individual of

282: the species has. (Of course, for sexual species, $\mu$ would be half

283: the mean number of children per individual.)  In the present case,

284: $\mu$ less than $1$ implies that extant species' populations {\em

285:   decrease} on average, while $\mu$ equal to $1$ implies that average

286: populations do not change.  An extant species' population can decline

287: due to the introduction of competitors and/or the decrease of the size

288: of the species' ecological niche.  Let us examine the former more

289: closely. If a competitor is introduced into a saturated niche, all

290: species currently occupying that niche would temporarily see a

291: decrease in their $m$ until a new equilibrium was obtained. If the new

292: species is significantly fitter than the previously existing species,

293: it may eliminate the others. If the new species is significantly less

294: fit, then it may be the one eliminated. If the competitors are about

295: as efficient as the species already present, then the outcome is less

296: certain. Indeed, it is analogous to an unbiased random walk with a

297: possibility of ruin.  The effects of introducing a single competitor

298: are transient. However, if new competitors are introduced more or less

299: periodically, then this would act to push $m$ lower for all species in

300: this niche and we would expect an abundance pattern closer to the

301: exponential curve as opposed to the power-law than otherwise expected.

302: We have examined this in simulations of populations where new

303: competitors were introduced into the population by means of neutral

304: mutations---mutations leading to new species of the same fitness as

305: extant species---and found that these are fit very well by the

306: branching model.  A higher rate of neutral mutations and thus of new

307: competitors leads to distributions closer to exponential.  We have

308: performed the same experiment in more sophisticated systems of digital

309: organisms (artificial life)~\cite{AVIDA,SANDA} and found the same

310: result~\cite{JCCA2}.

311:

312: If no new competitors are introduced but the size of the niche

313: is gradually reduced, we expect the same effect on $m$ and on the

314: abundance distributions. Whether it is possible to separate

315: the effects of these two mechanisms in ecological abundance patterns

316: obtained from field data is an open question.

317: An analysis of such data to examine these trends would certainly

318: be very interesting.

319:

320: So far, we have sidestepped the difference between historical and

321: ecological distributions.  For the fossil record, the historical

322: distribution we have modeled here should work well. For field

323: observations where only currently living groups are considered, the

324: nature of the death and extinction processes for each group will

325: affect the abundance pattern.  In our simulations and artificial-life

326: experiments, we have universally observed a strong correlation between

327: the shapes of historical and ecological distributions.  We believe

328: this correspondence will hold in natural distributions as well when

329: death rates are affected mainly by competition for resources.

330: The model's validity for different scenarios is an interesting question,

331: which could be answered by comparison with more taxonomical data.

332:

333: Our branching process model allows us to reexamine

334: the question of whether any type of special

335: dynamics---such as self-organized criticality~\cite{soc} (SOC)---is

336: at work in evolution~\cite{BakSneppen,Adami-soc}. While showing that the

337: statistics of taxon rank-frequency patterns in evolution are

338: closely related to the avalanche sizes in SOC sandpile models,

339: the present model clearly shows that

340: instead of a subsidiary relationship where evolutionary processes may

341: be self-organized critical, the power-law

342: behaviour of both evolutionary {\em and} sandpile

343: distributions can be understood in terms of the mechanics

344: of a Galton-Watson branching process~\cite{JCCA2,ZAPPERI}.

345: The mechanics of this branching process

346: are such that the branching trees are probabilistic

347: fractal constructs. However, the underlying stochastic process

348: responsible for the observed behaviour can be

349: explained simply in terms of a random walk~\cite{SPITZER}.

350: For evolution, the propensity for near power-law behaviour is found to

351: stem from a dynamical process in which $\mu\approx1$ is selected for and

352: much more likely to be observed than other values, while the

353: ``self-tuning'' of the SOC models is seen to result from arbitrarily

354: enforcing conditions which would correspond to the limit

355: $R_o/R_f \rightarrow 0$ and therefore

356: $m \rightarrow 1$~\cite{JCCA2}.
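
The random-walk picture can be stated explicitly. The construction below is the standard one for Galton-Watson trees; the symbols $X_j$, $S_t$ and $T$ are introduced here for illustration only. Visit the taxa of a tree one at a time, let $X_j$ be the number of true daughters of the $j$th taxon visited, and define
\begin{displaymath}
S_t=1+\sum_{j=1}^{t}\left(X_j-1\right) , \qquad
T=\min\{t : S_t=0\} .
\end{displaymath}
$S_t$ counts the taxa that have been created but not yet visited, so the tree is complete when the walk first reaches zero, and the size of the tree is $n=T$. The steps $X_j-1$ have mean $m-1$: the walk is unbiased for $m=1$ and drifts towards zero for $m<1$. Because the steps are bounded below by $-1$, the hitting-time theorem applies and gives
\begin{displaymath}
P(T=n)=\frac{1}{n}\,P(S_n=0) .
\end{displaymath}
For $m=1$ and finite step variance, the local central limit theorem gives $P(S_n=0)\propto n^{-1/2}$ on the values of $n$ that the lattice structure of the steps allows, and hence $P(n)\propto n^{-3/2}$.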

357: \newpage

358:

359: \bibliographystyle{unsrt}

360: \begin{thebibliography}{99}

361: \bibitem{YULE}

362: Yule, G. U. (1924)

363: A mathematical theory of evolution.

364: {\it Phil. Trans. Roy. Soc. London Ser. B} {\bf 213}, 21-87.

365: \bibitem{BURLANDO1}

366: Burlando, B. (1990)

367: The fractal dimension of taxonomic systems.

368: {\it J. theor. Biol.} {\bf 146}, 99-114.

369: \bibitem{BURLANDO2}

370: Burlando, B. (1993)

371: The fractal geometry of evolution.

372: {\it J. theor. Biol.} {\bf 163}, 161-172.

373: \bibitem{HARRIS}

374: Harris, T. E. (1963) {\it The Theory of Branching Processes}.

375: (Springer, Berlin; Prentice-Hall, Englewood Cliffs, N.J.).

376: \bibitem{JCCA2}

377: Chu, J. \& Adami, C. (1999)

378: Critical and near-critical branching processes (submitted).

379: \bibitem{RAUP}

380: Raup, D. M. (1985)

381: Mathematical models of cladogenesis.

382: {\it Paleobiology}, {\bf 11}, 42-52.

383: \bibitem{SEPKOSKI}

384: Sepkoski, J. J. (1992)

385: {\it A Compendium of Fossil Marine Animal Families},

386: 2nd ed. (Milwaukee Public Museum; Milwaukee, WI; 1992)

387: with emendations by J. J. Sepkoski based largely on

388: {\it The Fossil Record 2}, Benton, M. J., ed. (Chapman \& Hall;

389: New York; 1993).

390: \bibitem{PRESTON1}

391: Preston, F. W. (1948)

392: The commonness, and rarity, of species.

393: {\it Ecology} {\bf 29}, 255-283.

394: \bibitem{PRESTON2}

395: Preston, F. W. (1962)

396: The canonical distribution of commonness and rarity.

397: {\it Ecology} {\bf 43}, 185-215, 410-432.

398: \bibitem{SUGIHARA}

399: Sugihara, G. (1980)

400: Minimal community structure: An explanation of species abundance patterns.

401: {\it Am. Nat.} {\bf 116}, 770-787.

402: \bibitem{AVIDA}

403: Adami, C. (1998) {\it Introduction to Artificial Life}.

404: (Telos, Springer-Verlag, New York).

405: \bibitem{SANDA}

406: Chu, J. \& Adami, C. (1997)

407: Propagation of information in populations of self-replicating code,

408: in {\it Artificial Life V: Proceedings of the Fifth International

409: Workshop on the Synthesis and Simulation of Living Systems},

410: Langton, C. G. \& Shimohara, K., eds., pp. 462-469

411: (MIT Press, Cambridge, MA).

412: \bibitem{soc}

413: Bak, P., Tang, C. \& Wiesenfeld, K. (1987)

414: Self-organized criticality---an explanation of 1/$f$ noise.

415: {\it Phys. Rev. Lett.} {\bf 59}, 381-384.

416: Bak, P., Tang, C. \& Wiesenfeld, K. (1988) Self-organized criticality.

417: {\it Phys. Rev. A} {\bf 38}, 364-374.

418: \bibitem{BakSneppen}

419: Sneppen, K., Bak P., Flyvbjerg, H. \& Jensen, M. H. (1995)

420: Evolution as a self-organized critical phenomenon.

421: {\it Proc. Nat. Acad. Sci. USA} {\bf 92}, 5209-5213.

422: \bibitem{Adami-soc}

423: Adami, C. (1995)

424: Self-organized criticality in living systems.

425: {\it Phys. Lett. A} {\bf 203}, 29-32.

426: \bibitem{ZAPPERI}

427: Vespignani, A. \& Zapperi, S. (1998)

428: How self-organized criticality works: A unified mean-field picture.

429: {\it Phys. Rev. E} {\bf 57}, 6345-6362.

430: \bibitem{SPITZER}

431: Spitzer, F. (1964) {\it Principles of Random Walk}.

432: (Springer-Verlag, New York).

433: \end{thebibliography}

434: \vskip 0.25in

435:

436: \noindent {\bf Acknowledgments.} We would like to thank J. J. Sepkoski

437: for kindly sending us his amended data set of fossil marine animal families.

438: Access to the Intel Paragon XP/S was provided by the Center for

439: Advanced Computing Research at the California Institute of Technology.

440: This work was supported by a grant from the NSF.

441: \vskip 0.25cm

442:

443: \noindent Correspondence and requests for materials should be addressed to C.A.

444: (e-mail: adami@krl.caltech.edu).

445:

446:

447: \end{document}
