physics0002001/B1.tex
1: \documentstyle[12pt,amssymb,psfig]{article}
2: \setlength{\textwidth}{6.0in}
3: \setlength{\textheight}{8.0in}
4: \setlength{\oddsidemargin}{0.3in}
5: \renewcommand{\theequation}{\arabic{equation}}
6: \newcommand{\be}{\begin{eqnarray}}
7: \newcommand{\ee}{\end{eqnarray}}
8: \newcommand{\tierra}{{\sf tierra}}
9: \newcommand\ie {{\it i.e.\ }}
10: \newcommand\eg {{\it e.g.\  }}
11: \newcommand\etc{{\it etc.\  }}
12: \newcommand\cf {{\it cf.\   }}
13: \renewcommand{\simeq}{\stackrel\sim=}
14: \newcommand{\la}{\langle}
15: \newcommand{\ra}{\rangle}
16: %
17: \newcommand{\myhead}{\sf J. Chu and C. Adami}
18: \begin{document}
19: %\thepage
20: \pagestyle{myheadings}
21: \markboth{\myhead}{\myhead}
22: \begin{titlepage}
23: \vskip 7cm
24: \centerline{\Large\bf A Simple Explanation for}
25: \centerline{\Large\bf Taxon Abundance Patterns}
26: \vskip 0.5in
27: \centerline{Johan Chu and Christoph Adami}
28: \vskip 0.25in
29: \centerline{\it W.K.\ Kellogg Radiation Laboratory 106-38}
30: \centerline{\it California Institute of Technology, Pasadena, CA 91125} 
31: \vskip 3.5 cm
32: \noindent{\it Classification:}\\
33: \noindent{\it  Biological Sciences (Evolution), Physical
34:   Sciences (Applied Mathematics)}
35: \vskip 2cm
36: \noindent Corresponding author:
37: \vskip 0.25in
38: Dr. Chris Adami
39: 
40: E-mail: adami@krl.caltech.edu
41: 
42: Phone: (+) 626 395-4256
43: 
44: Fax: (+) 626 564-8708
45: 
46: 
47: \end{titlepage}
48: \setcounter{page}{2}
49: %\setlength{\baselineskip}{25pt}
50: \begin{abstract}
51:   For taxonomic levels higher than
52:   species, the abundance distributions of number of subtaxa per taxon
53:   tend to approximate power laws, but often show strong deviations
54:   from such a law.  Previously, these deviations were attributed
55:   to finite-time effects in a continuous time branching process at the
56:   generic level. Instead, we describe here a simple discrete branching
57:   process which generates the observed distributions and find that the
58:   distribution's deviation from power-law form is not caused by
59:   disequilibration, but rather that it is time-independent and
60:   determined by the evolutionary properties of the taxa of interest.
61:   Our model predicts---with no free parameters---the rank-frequency
62:   distribution of number of families in fossil marine animal orders
63:   obtained from the fossil record.  We find that near power-law
64:   distributions are statistically almost inevitable for taxa higher
65:   than species.  The branching model also sheds light on species
66:   abundance patterns, as well as on links between evolutionary
67:   processes, self-organized criticality and fractals.
68: \end{abstract}
69: 
70: Taxonomic abundance distributions have been studied since the
71: pioneering work of Yule~\cite{YULE}, who proposed a continuous time
72: branching process model to explain the distributions at the generic
73: level, and found that they were power laws in the limit of
74: equilibrated populations. Deviations from the power law were
75: attributed to a finite-time effect, namely, to the fact that the
76: populations had not reached equilibrium. Much later,
77: Burlando~\cite{BURLANDO1,BURLANDO2} compiled data that appeared to
78: corroborate the geometric nature of the distributions, even though
79: clear violations of the law are visible in his data also. In this
80: paper, we present a model which is based on a discrete branching process 
81: whose distributions are time-independent and where violations of the
82: geometric form reflect specific environmental conditions and pressures
83: that the assemblage under consideration was subject to during
84: evolution. As such, it holds the promise that an analysis of taxonomic
85: abundance distributions may reveal certain characteristics of
86: ecological niches long after their inhabitants have disappeared.
87: 
88: The model described here is based on the simplest of branching
89: processes, known in the mathematical literature as the {\it
90:   Galton-Watson} process. Consider an assemblage of taxa at one
91: taxonomic level. This assemblage can be all the families under a
92: particular order, all the subspecies of a particular species, or any
93: other group of taxa at the same taxonomic level that can be assumed to
94: have suffered the same evolutionary pressures. We are interested in
95: the shape of the rank-frequency distribution of this assemblage and
96: the factors that influence it.
97: 
98: We describe the model by explaining a specific example: the
99: distribution of the number of families within orders for a particular
100: phylum. The adaptation of this model to different levels in the
101: taxonomic hierarchy is obvious.  We can assume that the assemblage was
102: founded by one order in the phylum and that this order consisted of
103: one family which had one genus with one species.  We further assume
104: that new families in this order are created by means of mutation in
105: individuals of extant families.  This can be viewed as a process where
106: existing families can ``replicate'' and create new families of the
107: same order, which we term {\em daughters} of the initial family.  Of
108: course, relatively rarely, mutations may lead to the creation of a new
109: order, a new class, etc. We define a probability $p_i$ for a family to
110: have $i$ daughter families of the same order ({\em true daughters}).
111: Thus, a family will have no true daughters with probability $p_0$, one
112: true daughter with probability $p_1$, and so on.  For the sake of
113: simplicity, we initially assume that all families of this phylum share
114: the same $p_i$.  We show later that variance in $p_i$ among different
115: families does not significantly affect the results, in particular the
116: shape of the distribution. The branching process described above gives
117: rise to an abundance distribution of families within orders, and its
118: probability distribution can be obtained from the Lagrange expansion
119: of a nonlinear differential equation~\cite{HARRIS}.  Using a simple
120: iterative algorithm~\cite{JCCA2} in place of this Lagrange expansion
121: procedure, we can calculate rank-frequency curves for many different
122: sets of $p_i$.  It should be emphasized here that we are mostly
123: concerned with the shape of this curve for $n \lesssim 10^4$, and not
124: the asymptotic shape as $n \rightarrow \infty$, a limit that is not
125: reached in nature.
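
As a minimal sketch of what such a calculation involves, the standard
generating-function formulation of a Galton-Watson process can be
written down explicitly. The symbols $f(s)$ and $F(s)$ and the Poisson
example are introduced here for illustration only and are not taken
from Refs.~\cite{HARRIS,JCCA2}; $n$ is assumed to count all families
of an order, including the founding one. With the offspring generating
function $f(s)=\sum_{i} p_i s^i$, the generating function
$F(s)=\sum_{n} P(n) s^n$ of the total family count satisfies
\begin{equation}
F(s) = s\, f\bigl(F(s)\bigr),
\end{equation}
and Lagrange inversion gives
\begin{equation}
P(n) = \frac{1}{n}\,[s^{\,n-1}]\, f(s)^n ,
\end{equation}
where $[s^{k}]$ denotes the coefficient of $s^k$. If, purely as an
example, the number of true daughters is Poisson distributed with mean
$m$, so that $f(s)=e^{m(s-1)}$, this yields
\begin{equation}
P(n) = \frac{(mn)^{n-1} e^{-mn}}{n!}
\approx \frac{1}{m\sqrt{2\pi}}\; n^{-3/2}\, e^{-n\,(m-1-\ln m)} ,
\end{equation}
using Stirling's formula for large $n$. Since $m-1-\ln m \geq 0$, with
equality only at $m=1$, this particular choice gives a pure power law
$n^{-3/2}$ at $m=1$ and a power law with an exponential cutoff
otherwise.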
126: 
127: For different sets of $p_i$, the theoretical curve
128: can be close to a power law, a power law with an exponential tail,
129: or a purely exponential distribution (Fig.\ 1). 
130: \begin{figure}[h]
131: \centerline{\psfig{figure=FIG1.PS,width=4in,angle=90}}
132: \caption{Predicted abundance pattern $P(n)$
133: (probability for a taxon to have $n$ subtaxa) of the
134: branching model with different values of $m$. The
135: curves have been individually rescaled. }
136: \end{figure}
137: We show here that there is a
138: global parameter that distinguishes among these cases. 
139: Indeed, the mean number of true daughters, i.e., the mean number
140: of different families of the same order that each family gives 
141: rise to in the example above,  
142: \be
143: m = \sum_{i=0}^{\infty} i \cdot p_i
144: \ee
145: is a good indicator of the overall shape of the curve.
146: Universally, $m = 1$ leads to a power law for the abundance distribution.
147: The further 
148: $m$ is away from $1$, the further the curve diverges from
149: a power law and towards an exponential curve. The value
150: of $m$ for a particular assemblage can be estimated from
151: the fossil record also, allowing for a characterization
152: of the evolutionary process with no free parameters. Indeed,  
153: if we assume that the number of families in this phylum existing at one
154: time is roughly constant, or varies slowly compared to the average rate
155: of family creation 
156: (an assumption the fossil record seems to vindicate~\cite{RAUP}), 
157: we find that $m$ can be related to the ratio $R_o/R_f$
158: of the rates of creation of orders and families by
159: \be
160: m = \left(1+\frac{R_o}{R_f}\right)^{-1}
161: \ee
162: to leading order~\cite{JCCA2}.
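
A heuristic balance argument makes this relation plausible. It is a
sketch under simplifying assumptions, not the derivation of
Ref.~\cite{JCCA2}, and the symbol $q$ is introduced here only for
illustration. If the number of families is stationary, each family is
replaced on average by exactly one newly created family. Let $q$ be
the probability that a newly created family founds a new order (or a
higher taxon) instead of joining the order of its parent, and
identify $q$ with $R_o/R_f$, where $R_f$ is assumed to count all newly
created families. The mean number of true daughters is then
\begin{equation}
m = 1 - q = 1 - \frac{R_o}{R_f}
  = \left(1+\frac{R_o}{R_f}\right)^{-1}
  + O\!\left(\frac{R_o^2}{R_f^2}\right),
\end{equation}
which agrees with the expression above to leading order in $R_o/R_f$.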
163: 
164: In general, we cannot expect all the families within an order to share
165: the same $m$. Interestingly, it turns out that 
166: even if the $p_i$ and $m$ differ
167: widely between different families, the rank-frequency curve is 
168: identical to that obtained by assuming a fixed $m$ equal to the average
169: of $m$ across the families (Fig.\ 2), i.e., the variance of the
170: $p_i$ across families appears to be completely immaterial to the shape
171: of the distribution---only the average  $\mu \equiv \langle m\rangle$ counts.
172: \begin{figure}[h]
173: \centerline{\psfig{figure=FIG2.PS,width=4in,angle=90}}
174: \caption{Abundance patterns obtained from two sets of
175: numerical simulations of the branching model, each with $\mu=\langle m\rangle =
176: 0.5$. $m$ was chosen from a uniform probability distribution of width
177: $1$ for the runs represented by crosses, and from a distribution of 
178: width $0.01$ for those represented by circles.
179: Simulations where $m$ and $p_i$ are allowed to vary significantly
180: and those where they are severely constricted are impossible to distinguish
181: if they share the same $\langle m\rangle$.}
182: \end{figure}
183: 
184: 
185: 
186: 
187: \begin{figure}[h]
188: \centerline{\psfig{figure=FIG3.PS,width=4in,angle=90}}
189: \caption{The abundance distribution of fossil marine animal
190: orders~\cite{SEPKOSKI} (squares) and the predicted curve from the
191: branching model (solid line).  The fossil data has been binned above
192: $n=37$ with a variable bin size~\cite{JCCA2}.  The predicted curve was
193: generated using $R_o/R_f = N_o/N_f = 0.115$, where $N_o$ and $N_f$
194: were obtained directly from the fossil data.  The inset shows
195: Kolmogorov-Smirnov (K-S) significance levels $P$ obtained from
196: comparison of the fossil data to several predicted distributions with
197: different values of $R_o/R_f$, which shows that the data is best fit
198: by $R_o/R_f=0.135$. The arrow points to our prediction $R_o/R_f=0.115$
199: where $P=0.12$. A Monte Carlo analysis shows that for a sample size of
200: $626$ (as we have here), the predicted $R_o/R_f=0.115$ is within the
201: 66\% confidence interval of the best fit $R_o/R_f=0.135$ ($P=0.44$).
202: The K-S tests were done after removal of the first point, which
203: suffers from sampling uncertainties.}
204: \end{figure}
205: In Fig.\ 3, we show the abundance distribution of families within
206: orders for fossil marine animals~\cite{SEPKOSKI}, together with the
207: prediction of our branching model.  The theoretical curve was obtained
208: by assuming that the ratio $R_o/R_f$ is approximated by the ratio of
209: the total number of orders to the total number of families
210: \begin{equation}
211: \frac{R_o}{R_f} \simeq \frac{N_o}{N_f}  \label{HMM}
212: \end{equation}
213: and that both are very small compared to the rate of mutations.  The
214: prediction $\mu=0.9(16)$ obtained from the branching process model by
215: using (\ref{HMM}) as the sole parameter fits the observed data
216: remarkably well ($P=0.12$, Kolmogorov-Smirnov test, see inset in
217: Fig.~3).  Alternatively, we can use a best fit to determine the ratio
218: $R_o/R_f$ without resorting to (\ref{HMM}), yielding
219: $R_o/R_f=0.135(20)$ ($P=0.44$). Fitting abundance distributions to the
220: branching model thus allows us to determine a ratio of parameters
221: which reflect dynamics intrinsic to the taxon under consideration, and
222: the niche(s) it inhabits. Indeed, some taxa analyzed in
223: Refs.~\cite{BURLANDO1,BURLANDO2} are better fit with $0.5<\mu<0.75$,
224: pointing to conditions in which the rate of taxon formation was much
225: closer to the rate of subtaxon formation, indicating either a more
226: ``robust'' genome or richer and more diverse niches.
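
As a worked check of these numbers, the ratios can be inserted into
the relation $m=(1+R_o/R_f)^{-1}$, taking $\mu$ to be the value of $m$
obtained this way. The two ratios used are those quoted in the caption
of Fig.~3:
\begin{equation}
\mu = \frac{1}{1+0.115} \approx 0.897 , \qquad
\mu = \frac{1}{1+0.135} \approx 0.881 .
\end{equation}
Conversely, inverting the relation gives $R_o/R_f = 1/\mu - 1$, so the
range $0.5<\mu<0.75$ corresponds to $1 > R_o/R_f > 1/3$, that is, to
rates of taxon formation comparable to the rates of subtaxon
formation.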
227: 
228: 
229: In general, however, Burlando's data~\cite{BURLANDO1,BURLANDO2}
230: suggest that a wide variety of taxonomic distributions are fit quite
231: well by power laws ($\mu = 1$).  This seems to imply that actual
232: taxonomic abundance patterns from the fossil record are characterized
233: by a relatively narrow range of $\mu$ near $1$. This is likely within
234: the model description advanced here.  It is obvious that $\mu$ cannot
235: remain above $1$ for significant time scales as this would lead to an
236: infinite number of subtaxa for each taxon.  What about low $\mu$?  We
237: propose that low values of $\mu$ are not observed for large (and
238: therefore statistically important) taxon assemblages for the following
239: reasons.  If $\mu$ is very small, this implies either a small number
240: of total individuals for this assemblage, or a very low rate of
241: beneficial taxon-forming (or niche-filling) mutations. The former
242: might lead to this assemblage not being recognized at all in field
243: observations. Either case will lead to an assemblage with too few
244: taxa to be statistically tractable. Also, since such an assemblage
245: either contains a small number of individuals or is less suited for
246: further adaptation or both, it would seem to be susceptible to early
247: extinction.
248: 
249: The branching model can---with appropriate care---also be applied to
250: species-abundance distributions, even though these are more
251: complicated than those for higher taxonomic levels for several
252: reasons. Among these are the effects of sexual reproduction and the
253: localized and variable effects of the environment and other species on
254: specific populations. Still, as the arguments for using a branching
255: process model essentially rely on mutations which may produce lines of
256: individuals that displace others, species-abundance distributions may
257: turn out {\em not} to be qualitatively as different from taxonomically
258: higher-level rank-frequency distributions as is usually expected.
259: \begin{figure}[h]
260: \centerline{\psfig{figure=FIG4.PS,width=4in,angle=90}}
261: \caption{The abundance distribution of fossil marine
262: animal orders in logarithmic abundance classes (the same data as Fig.\ 3).
263: The histogram shows the 
264: number of orders in each abundance class (left scale), while the solid line 
265: depicts the number of families in each abundance class (right scale). 
266: Species rank-abundance distributions where the highest abundance class present
267: also has the highest number of individuals (as in these data)
268: are termed {\em canonical lognormal}~\cite{PRESTON2}.}
269: \end{figure}
270: 
271: Historically, species abundance distributions have been characterized 
272: using frequency histograms of the number of species in logarithmic 
273: abundance classes.  For many taxonomic assemblages, this was found 
274: to produce a humped distribution truncated on the left---a shape usually 
275: dubbed {\em lognormal}~\cite{PRESTON1,PRESTON2,SUGIHARA}.  
276: In fact, this distribution is
277: not incompatible with the power-law type distributions described
278: above. Indeed, plotting the fossil data of Fig.\ 3 in logarithmic
279: abundance classes produces a lognormal (Fig.\ 4). 
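
A short estimate illustrates how a power law appears when binned in
this way. It assumes, for illustration only, a pure power law
$P(n)\propto n^{-3/2}$ (a stand-in for the case $\mu\approx 1$ below
any exponential cutoff), a continuum approximation in $n$, and
abundance classes $2^k \leq n < 2^{k+1}$ with $k=0,1,2,\ldots$; none
of these choices is taken from the fossil data. The number of taxa in
class $k$ and the number of subtaxa they contain then scale as
\begin{equation}
\int_{2^k}^{2^{k+1}} n^{-3/2}\, dn
   = 2\left(1-2^{-1/2}\right) 2^{-k/2} , \qquad
\int_{2^k}^{2^{k+1}} n \cdot n^{-3/2}\, dn
   = 2\left(\sqrt{2}-1\right) 2^{k/2} .
\end{equation}
Under these assumptions the number of taxa per class falls by a factor
$\sqrt{2}$ from one class to the next, while the number of subtaxa per
class grows by the same factor, so that the highest class present
holds the most subtaxa. The hump and the truncation on the left are
not captured by this idealized estimate.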
280: 
281: For species, $\mu$ is the mean number of children each individual of
282: the species has. (Of course, for sexual species, $\mu$ would be half
283: the mean number of children per individual.)  In the present case,
284: $\mu$ less than $1$ implies that extant species' populations {\em
285:   decrease} on average, while $\mu$ equal to $1$ implies that average
286: populations do not change.  An extant species' population can decline
287: due to the introduction of competitors and/or the decrease of the size
288: of the species' ecological niche.  Let us examine the former more
289: closely. If a competitor is introduced into a saturated niche, all
290: species currently occupying that niche would temporarily see a
291: decrease in their $m$ until a new equilibrium was obtained. If the new
292: species is significantly fitter than the previously existing species,
293: it may eliminate the others. If the new species is significantly less
294: fit, then it may be the one eliminated. If the competitors are about
295: as efficient as the species already present, then the outcome is less
296: certain. Indeed, it is analogous to an unbiased random walk with a
297: possibility of ruin.  The effects of introducing a single competitor
298: are transient. However, if new competitors are introduced more or less
299: periodically, then this would act to push $m$ lower for all species in
300: this niche, and we would expect an abundance pattern closer to an
301: exponential curve than to the power law that would otherwise be expected.
302: We have examined this in simulations of populations where new
303: competitors were introduced into the population by means of neutral
304: mutations---mutations leading to new species of the same fitness as
305: extant species---and found that these are fit very well by the
306: branching model.  A higher rate of neutral mutations and thus of new
307: competitors leads to distributions closer to exponential.  We have
308: performed the same experiment in more sophisticated systems of digital
309: organisms (artificial life)~\cite{AVIDA,SANDA} and found the same
310: result~\cite{JCCA2}.
311: 
312: If no new competitors are introduced but the size of the niche
313: is gradually reduced, we expect the same effect on $m$ and on the
314: abundance distributions. Whether it is possible to separate
315: the effects of these two mechanisms in ecological abundance patterns
316: obtained from field data is an open question.
317: An analysis of such data to examine these trends would certainly 
318: be very interesting.
319: 
320: So far, we have sidestepped the difference between historical and
321: ecological distributions.  For the fossil record, the historical
322: distribution we have modeled here should work well. For field
323: observations where only currently living groups are considered, the
324: nature of the death and extinction processes for each group will
325: affect the abundance pattern.  In our simulations and artificial-life
326: experiments, we have universally observed a strong correlation between
327: the shapes of historical and ecological distributions.  We believe
328: this correspondence will hold in natural distributions as well when
329: death rates are affected mainly by competition for resources.
330: The model's validity for different scenarios is an interesting question,
331: which could be answered by comparison with more taxonomic data.
332: 
333: Our branching process model allows us to reexamine 
334: the question of whether any type of special
335: dynamics---such as self-organized criticality~\cite{soc} (SOC)---is 
336: at work in evolution~\cite{BakSneppen,Adami-soc}. While showing that the
337: statistics of taxon rank-frequency patterns in evolution are
338: closely related to the avalanche sizes in SOC sandpile models,
339: the present model clearly shows that
340: instead of a subsidiary relationship where evolutionary processes may
341: be self-organized critical, the power-law
342: behaviour of both evolutionary {\em and} sandpile
343: distributions can be understood in terms of the mechanics
344: of a Galton-Watson branching process~\cite{JCCA2,ZAPPERI}. 
345: The mechanics of this branching process 
346: are such that the branching trees are probabilistic
347: fractal constructs. However, the underlying stochastic process
348: responsible for the observed behaviour can be
349: explained simply in terms of a random walk~\cite{SPITZER}. 
350: For evolution, the propensity for near power-law behaviour is found to 
351: stem from a dynamical process in which $\mu\approx1$ is selected for and
352: far more likely to be observed than other values, while the
353: ``self-tuning'' of the SOC models is seen to result from arbitrarily 
354: enforcing conditions which would correspond to the limit
355: $R_o/R_f \rightarrow 0$ and therefore
356: $m \rightarrow 1$~\cite{JCCA2}.
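
The link to a random walk can be made concrete in the simplest
critical case. The example is chosen here for illustration and is not
a construction taken from Refs.~\cite{JCCA2,SPITZER}; it assumes
$p_0=p_2=1/2$ and all other $p_i=0$, so that $m=1$. The total size of
a tree is then odd, $n=2k+1$, and
\begin{equation}
P(2k+1) = \frac{C_k}{2^{2k+1}} , \qquad
C_k = \frac{1}{k+1}{2k \choose k} ,
\end{equation}
which is exactly the probability that an unbiased simple random walk
first returns to its starting point at step $2k+2$. For large $k$,
\begin{equation}
P(2k+1) \approx \frac{1}{2\sqrt{\pi}}\; k^{-3/2} ,
\end{equation}
so in this example the power law of the critical branching process
and the first-return statistics of the walk are the same statement.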
357: \newpage
358: 
359: \bibliographystyle{unsrt}
360: \begin{thebibliography}{99}
361: \bibitem{YULE}
362: Yule, G. U. (1924)
363: A mathematical theory of evolution.
364: {\it Phil. Trans. Roy. Soc. London Ser. B} {\bf 213}, 21-87.
365: \bibitem{BURLANDO1} 
366: Burlando, B. (1990)
367: The fractal dimension of taxonomic systems.
368: {\it J. theor. Biol.} {\bf 146}, 99-114.
369: \bibitem{BURLANDO2} 
370: Burlando, B. (1993)
371: The fractal geometry of evolution.
372: {\it J. theor. Biol.} {\bf 163}, 161-172.
373: \bibitem{HARRIS}
374: Harris, T. E. (1963) {\it The Theory of Branching Processes}.
375: (Springer, Berlin; Prentice-Hall, Englewood Cliffs, N.J.).
376: \bibitem{JCCA2} 
377: Chu, J. \& Adami, C. (1999)
378: Critical and near-critical branching processes (submitted).
379: \bibitem{RAUP}
380: Raup, D. M. (1985)
381: Mathematical models of cladogenesis.
382: {\it Paleobiology} {\bf 11}, 42-52.
383: \bibitem{SEPKOSKI} 
384: Sepkoski, J. J. (1992)
385: {\it A Compendium of Fossil Marine Animal Families},
386: 2nd ed. (Milwaukee Public Museum; Milwaukee, WI; 1992)
387: with emendations by J. J. Sepkoski based largely on 
388: {\it The Fossil Record 2}, Benton, M. J., ed. (Chapman \& Hall;
389: New York; 1993).
390: \bibitem{PRESTON1}
391: Preston, F. W. (1948)
392: The commonness, and rarity, of species.
393: {\it Ecology} {\bf 29}, 255-283.
394: \bibitem{PRESTON2}
395: Preston, F. W. (1962)
396: The canonical distribution of commonness and rarity.
397: {\it Ecology} {\bf 43}, 185-215, 410-432.
398: \bibitem{SUGIHARA}
399: Sugihara, G. (1980)
400: Minimal community structure: An explanation of species abundance patterns.
401: {\it Am. Nat.} {\bf 116}, 770-787.
402: \bibitem{AVIDA} 
403: Adami, C. (1998) {\it Introduction to Artificial Life}. 
404: (Telos, Springer-Verlag, New York).
405: \bibitem{SANDA}
406: Chu, J. \& Adami, C. (1997)
407: Propagation of information in populations of self-replicating code,
408: in {\it Artificial Life V: Proceedings of the Fifth International
409: Workshop on the Synthesis and Simulation of Living Systems},
410: Langton, C. G. and Shimohara, K. eds.,  p. 462-469
411: (MIT Press, Cambridge, MA).
412: \bibitem{soc}
413: Bak, P., Tang, C. \& Wiesenfeld, K. (1987)
414: Self-organized criticality: An explanation of 1/$f$ noise.
415: {\it Phys. Rev. Lett.} {\bf 59}, 381-384;
416: Bak, P., Tang, C. \& Wiesenfeld, K. (1988) Self-organized criticality.
417: {\it Phys. Rev. A} {\bf 38}, 364-374.
418: \bibitem{BakSneppen}
419: Sneppen, K., Bak P., Flyvbjerg, H. \& Jensen, M. H. (1995) 
420: Evolution as a self-organized critical phenomenon. 
421: {\it Proc. Nat. Acad. Sci. USA} {\bf 92}, 5209-5213.
422: \bibitem{Adami-soc}
423: Adami, C. (1995)
424: Self-organized criticality in living systems. 
425: {\it Phys. Lett. A} {\bf 203}, 29-32.
426: \bibitem{ZAPPERI} 
427: Vespignani, A. \& Zapperi, S. (1998)
428: How self-organized criticality works: A unified mean-field picture.
429: {\it Phys. Rev. E} {\bf 57}, 6345-6362.
430: \bibitem{SPITZER} 
431: Spitzer, F. (1964) {\it Principles of Random Walk}.
432: (Springer-Verlag, New York).
433: \end{thebibliography}
434: \vskip 0.25in
435: 
436: \noindent {\bf Acknowledgments.} We would like to thank J. J. Sepkoski
437: for kindly sending us his amended data set of fossil marine animal families.
438: Access to the Intel Paragon XP/S was provided by the Center for
439: Advanced Computing Research at the California Institute of Technology.
440: This work was supported by a grant from the NSF.
441: \vskip 0.25cm
442: 
443: \noindent Correspondence and requests for materials should be addressed to C.A.
444: (e-mail: adami@krl.caltech.edu).
445: 
446: 
447: \end{document}
448: 
449: 
450: 
451: 
452: 
453: