1: \documentstyle[12pt,amssymb,psfig]{article}
2: \setlength{\textwidth}{6.0in}
3: \setlength{\textheight}{8.0in}
4: \setlength{\oddsidemargin}{0.3in}
5: \renewcommand{\theequation}{\arabic{equation}}
6: \newcommand{\be}{\begin{eqnarray}}
7: \newcommand{\ee}{\end{eqnarray}}
8: \newcommand{\tierra}{{\sf tierra}}
9: \newcommand\ie {{\it i.e.\ }}
10: \newcommand\eg {{\it e.g.\ }}
11: \newcommand\etc{{\it etc.\ }}
12: \newcommand\cf {{\it cf.\ }}
13: \renewcommand{\simeq}{\stackrel\sim=}
14: \newcommand{\la}{\langle}
15: \newcommand{\ra}{\rangle}
16: %
17: \newcommand{\myhead}{\sf J. Chu and C. Adami}
18: \begin{document}
19: %\thepage
20: \pagestyle{myheadings}
21: \markboth{\myhead}{\myhead}
22: \begin{titlepage}
23: \vskip 7cm
24: \centerline{\Large\bf A Simple Explanation for}
25: \centerline{\Large\bf Taxon Abundance Patterns}
26: \vskip 0.5in
27: \centerline{Johan Chu and Christoph Adami}
28: \vskip 0.25in
29: \centerline{\it W.K.\ Kellogg Radiation Laboratory 106-38}
30: \centerline{\it California Institute of Technology, Pasadena, CA 91125}
31: \vskip 3.5 cm
32: \noindent{\it Classification:}\\
33: \noindent{\it Biological Sciences (Evolution), Physical
34: Sciences (Applied Mathematics)}
35: \vskip 2cm
36: \noindent Corresponding author:
37: \vskip 0.25in
38: Dr. Chris Adami
39:
40: E-mail: adami@krl.caltech.edu
41:
42: Phone: (+) 626 395-4256
43:
44: Fax: (+) 626 564-8708
45:
46:
47: \end{titlepage}
48: \setcounter{page}{2}
49: %\setlength{\baselineskip}{25pt}
50: \begin{abstract}
51: For taxonomic levels higher than
52: species, the abundance distributions of number of subtaxa per taxon
tend to approximate power laws, but often show strong deviations
54: from such a law. Previously, these deviations were attributed
55: to finite-time effects in a continuous time branching process at the
56: generic level. Instead, we describe here a simple discrete branching
57: process which generates the observed distributions and find that the
58: distribution's deviation from power-law form is not caused by
59: disequilibration, but rather that it is time-independent and
60: determined by the evolutionary properties of the taxa of interest.
61: Our model predicts---with no free parameters---the rank-frequency
62: distribution of number of families in fossil marine animal orders
63: obtained from the fossil record. We find that near power-law
64: distributions are statistically almost inevitable for taxa higher
65: than species. The branching model also sheds light on species
66: abundance patterns, as well as on links between evolutionary
67: processes, self-organized criticality and fractals.
68: \end{abstract}
69:
70: Taxonomic abundance distributions have been studied since the
71: pioneering work of Yule~\cite{YULE}, who proposed a continuous time
72: branching process model to explain the distributions at the generic
73: level, and found that they were power laws in the limit of
74: equilibrated populations. Deviations from the geometric law were
75: attributed to a finite-time effect, namely, to the fact that the
76: populations had not reached equilibrium. Much later,
77: Burlando~\cite{BURLANDO1,BURLANDO2} compiled data that appeared to
78: corroborate the geometric nature of the distributions, even though
79: clear violations of the law are visible in his data also. In this
80: paper, we present a model which is based on a discrete branching process
81: whose distributions are time-independent and where violations of the
82: geometric form reflect specific environmental conditions and pressures
83: that the assemblage under consideration was subject to during
84: evolution. As such, it holds the promise that an analysis of taxonomic
85: abundance distributions may reveal certain characteristics of
ecological niches long after their inhabitants have disappeared.
87:
88: The model described here is based on the simplest of branching
89: processes, known in the mathematical literature as the {\it
90: Galton-Watson} process. Consider an assemblage of taxa at one
91: taxonomic level. This assemblage can be all the families under a
92: particular order, all the subspecies of a particular species, or any
93: other group of taxa at the same taxonomic level that can be assumed to
94: have suffered the same evolutionary pressures. We are interested in
95: the shape of the rank-frequency distribution of this assemblage and
96: the factors that influence it.
97:
98: We describe the model by explaining a specific example: the
99: distribution of the number of families within orders for a particular
100: phylum. The adaptation of this model to different levels in the
101: taxonomic hierarchy is obvious. We can assume that the assemblage was
102: founded by one order in the phylum and that this order consisted of
103: one family which had one genus with one species. We further assume
104: that new families in this order are created by means of mutation in
105: individuals of extant families. This can be viewed as a process where
106: existing families can ``replicate'' and create new families of the
107: same order, which we term {\em daughters} of the initial family. Of
108: course, relatively rarely, mutations may lead to the creation of a new
109: order, a new class, etc. We define a probability $p_i$ for a family to
110: have $i$ daughter families of the same order ({\em true daughters}).
111: Thus, a family will have no true daughters with probability $p_0$, one
112: true daughter with probability $p_1$, and so on. For the sake of
113: simplicity, we initially assume that all families of this phylum share
114: the same $p_i$. We show later that variance in $p_i$ among different
115: families does not significantly affect the results, in particular the
116: shape of the distribution. The branching process described above gives
117: rise to an abundance distribution of families within orders, and its
118: probability distribution can be obtained from the Lagrange expansion
119: of a nonlinear differential equation~\cite{HARRIS}. Using a simple
120: iterative algorithm~\cite{JCCA2} in place of this Lagrange expansion
121: procedure, we can calculate rank-frequency curves for many different
122: sets of $p_i$. It should be emphasized here that we are mostly
123: concerned with the shape of this curve for $n \lesssim 10^4$, and not
124: the asymptotic shape as $n \rightarrow \infty$, a limit that is not
125: reached in nature.
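
As a sketch of the calculation referred to above, in standard textbook
notation that is not used elsewhere in this paper (the symbols $f$, $F$
and $s$ are assumptions introduced only for this illustration), let
$f(s)=\sum_i p_i s^i$ be the generating function of the number of true
daughters and $F(s)=\sum_n P(n)\, s^n$ that of the total number $n$ of
families in an order, the founding family included. Since the founder
contributes one family and each of its true daughters starts an
independent copy of the same process,
\[
F(s) = s\, f\bigl(F(s)\bigr) ,
\]
and Lagrange inversion of this relation gives
\[
P(n) = \frac{1}{n}\,[s^{n-1}]\, f(s)^n ,
\]
where $[s^{k}]$ denotes the coefficient of $s^k$. For instance, if a
family can only have zero or two true daughters with $p_0=p_2=1/2$,
then $f(s)^n = 2^{-n}(1+s^2)^n$, so that $P(1)=1/2$, $P(2)=0$ and
$P(3)=\frac{1}{3}\cdot\frac{3}{8}=1/8$, which agrees with the direct
count (the founder has two daughters, and both have none).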
126:
For different sets of $p_i$, the theoretical curve
can be close to a power law, a power law with an exponential tail,
or a purely exponential distribution (Fig.\ 1).
130: \begin{figure}[h]
131: \centerline{\psfig{figure=FIG1.PS,width=4in,angle=90}}
132: \caption{Predicted abundance pattern $P(n)$
133: (probability for a taxon to have $n$ subtaxa) of the
134: branching model with different values of $m$. The
135: curves have been individually rescaled. }
136: \end{figure}
137: We show here that there is a
138: global parameter that distinguishes among these cases.
139: Indeed, the mean number of true daughters, i.e., the mean number
140: of different families of the same order that each family gives
141: rise to in the example above,
142: \be
143: m = \sum_{i=0}^{\infty} i \cdot p_i
144: \ee
145: is a good indicator of the overall shape of the curve.
146: Universally, $m = 1$ leads to a power law for the abundance distribution.
147: The further
148: $m$ is away from $1$, the further the curve diverges from
149: a power-law and towards an exponential curve. The value
150: of $m$ for a particular assemblage can be estimated from
151: the fossil record also, allowing for a characterization
152: of the evolutionary process with no free parameters. Indeed,
153: if we assume that the number of families in this phylum existing at one
154: time is roughly constant, or varies slowly compared to the average rate
155: of family creation
156: (an assumption the fossil record seems to vindicate~\cite{RAUP}),
we find that $m$ can be related to the ratio $R_o/R_f$
of the rates of creation of orders and families by
\be
m = \left(1+\frac{R_o}{R_f}\right)^{-1}
\ee
162: to leading order~\cite{JCCA2}.
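
To illustrate how $m$ controls the shape of the curve, consider the
special case in which the number of true daughters is Poisson
distributed with mean $m\le 1$. This choice is an assumption made here
purely for illustration and is not derived from the data. The
probability for an order to contain $n$ families is then the Borel
distribution
\[
P(n) = \frac{e^{-mn}\,(mn)^{n-1}}{n!} ,
\]
and Stirling's approximation for large $n$ gives
\[
P(n) \approx \frac{1}{m\sqrt{2\pi}}\; n^{-3/2}\, e^{-n/n_c} ,
\qquad
n_c = \frac{1}{m-1-\ln m} .
\]
For $m=1$ the cutoff $n_c$ diverges and a pure power law with exponent
$3/2$ remains. As $m$ moves away from $1$ the cutoff shrinks and the
exponential factor takes over for $n \gtrsim n_c$: in this
illustration $m=0.9$ gives $n_c\approx 187$, while $m=0.5$ gives
$n_c\approx 5.2$.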
163:
In general, we cannot expect all the families within an order to share
165: the same $m$. Interestingly, it turns out that
166: even if the $p_i$ and $m$ differ
167: widely between different families, the rank-frequency curve is
168: identical to that obtained by assuming a fixed $m$ equal to the average
169: of $m$ across the families (Fig.\ 2), i.e., the variance of the
170: $p_i$ across families appears to be completely immaterial to the shape
171: of the distribution---only the average $\mu \equiv \langle m\rangle$ counts.
172: \begin{figure}[h]
173: \centerline{\psfig{figure=FIG2.PS,width=4in,angle=90}}
174: \caption{Abundance patterns obtained from two sets of
175: numerical simulations of the branching model, each with $\mu=\langle m\rangle =
176: 0.5$. $m$ was chosen from a uniform probability distribution of width
177: $1$ for the runs represented by crosses, and from a distribution of
178: width $0.01$ for those represented by circles.
179: Simulations where $m$ and $p_i$ are allowed to vary significantly
180: and those where they are severely constricted are impossible to distinguish
181: if they share the same $\langle m\rangle$.}
182: \end{figure}
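
One way to see why the average alone fixes the mean behaviour is the
following sketch, which assumes (for illustration only) that each
family draws its own set of $p_i$ from some fixed distribution,
independently of its ancestry. The probability that a randomly chosen
family has $i$ true daughters is then the mixture
\[
\bar p_i = \langle p_i \rangle ,
\]
so that the process is statistically equivalent to one in which every
family shares the single set $\bar p_i$, whose mean is
\[
\sum_{i=0}^{\infty} i\, \bar p_i
 = \Big\langle \sum_{i=0}^{\infty} i\, p_i \Big\rangle
 = \langle m \rangle = \mu .
\]
Under this assumption the spread of $m$ across families can enter only
through the detailed form of $\bar p_i$, not through its mean.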
183:
184:
185:
186:
187: \begin{figure}[h]
188: \centerline{\psfig{figure=FIG3.PS,width=4in,angle=90}}
189: \caption{The abundance distribution of fossil marine animal
190: orders~\cite{SEPKOSKI} (squares) and the predicted curve from the
191: branching model (solid line). The fossil data has been binned above
192: $n=37$ with a variable bin size~\cite{JCCA2}. The predicted curve was
193: generated using $R_o/R_f = N_o/N_f = 0.115$, where $N_o$ and $N_f$
194: were obtained directly from the fossil data. The inset shows
195: Kolmogorov-Smirnov (K-S) significance levels $P$ obtained from
196: comparison of the fossil data to several predicted distributions with
197: different values of $R_o/R_f$, which shows that the data is best fit
198: by $R_o/R_f=0.135$. The arrow points to our prediction $R_o/R_f=0.115$
199: where $P=0.12$. A Monte Carlo analysis shows that for a sample size of
200: $626$ (as we have here), the predicted $R_o/R_f=0.115$ is within the
201: 66\% confidence interval of the best fit $R_o/R_f=0.135$ ($P=0.44$).
202: The K-S tests were done after removal of the first point, which
203: suffers from sampling uncertainties.}
204: \end{figure}
205: In Fig.\ 3, we show the abundance distribution of families within
206: orders for fossil marine animals~\cite{SEPKOSKI}, together with the
207: prediction of our branching model. The theoretical curve was obtained
208: by assuming that the ratio $R_o/R_f$ is approximated by the ratio of
209: the total number of orders to the total number of families
210: \begin{equation}
211: \frac{R_o}{R_f} \simeq \frac{N_o}{N_f} \label{HMM}
212: \end{equation}
213: and that both are very small compared to the rate of mutations. The
214: prediction $\mu=0.9(16)$ obtained from the branching process model by
215: using (\ref{HMM}) as the sole parameter fits the observed data
216: remarkably well ($P=0.12$, Kolmogorov-Smirnov test, see inset in
217: Fig.~3). Alternatively, we can use a best fit to determine the ratio
218: $R_o/R_f$ without resorting to (\ref{HMM}), yielding
$R_o/R_f=0.135(20)$ ($P=0.44$). Fitting abundance distributions to the
220: branching model thus allows us to determine a ratio of parameters
221: which reflect dynamics intrinsic to the taxon under consideration, and
222: the niche(s) it inhabits. Indeed, some taxa analyzed in
223: Refs.~\cite{BURLANDO1,BURLANDO2} are better fit with $0.5<\mu<0.75$,
224: pointing to conditions in which the rate of taxon formation was much
225: closer to the rate of subtaxon formation, indicating either a more
226: ``robust'' genome or richer and more diverse niches.
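
Taking the leading-order relation $m=(1+R_o/R_f)^{-1}$ at face value,
it can be inverted to read off the rate ratio from a fitted value of
$\mu$:
\[
\frac{R_o}{R_f} = \frac{1}{\mu} - 1 .
\]
The range $0.5<\mu<0.75$ then corresponds to $1/3 < R_o/R_f < 1$, that
is, between one new taxon for every three new subtaxa and one new
taxon for every new subtaxon. By comparison, the ratio
$R_o/R_f=0.115$ obtained from the fossil marine animal data gives
$\mu = 1/1.115 \approx 0.897$.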
227:
228:
229: In general, however, Burlando's data~\cite{BURLANDO1,BURLANDO2}
230: suggest that a wide variety of taxonomic distributions are fit quite
231: well by power laws ($\mu = 1$). This seems to imply that actual
232: taxonomic abundance patterns from the fossil record are characterized
233: by a relatively narrow range of $\mu$ near $1$. This is likely within
the model description advanced here. It is obvious that $\mu$ cannot
235: remain above $1$ for significant time scales as this would lead to an
236: infinite number of subtaxa for each taxon. What about low $\mu$? We
237: propose that low values of $\mu$ are not observed for large (and
238: therefore statistically important) taxon assemblages for the following
239: reasons. If $\mu$ is very small, this implies either a small number
240: of total individuals for this assemblage, or a very low rate of
241: beneficial taxon-forming (or niche-filling) mutations. The former
242: might lead to this assemblage not being recognized at all in field
243: observations. Either case will lead to an assemblage with too few
taxa to be statistically tractable. Also, since such an assemblage
245: either contains a small number of individuals or is less suited for
246: further adaptation or both, it would seem to be susceptible to early
247: extinction.
248:
249: The branching model can---with appropriate care---also be applied to
250: species-abundance distributions, even though these are more
complicated than those for higher taxonomic levels for several
252: reasons. Among these are the effects of sexual reproduction and the
253: localized and variable effects of the environment and other species on
254: specific populations. Still, as the arguments for using a branching
255: process model essentially rely on mutations which may produce lines of
256: individuals that displace others, species-abundance distributions may
257: turn out {\em not} to be qualitatively as different from taxonomically
258: higher-level rank-frequency distributions as is usually expected.
259: \begin{figure}[h]
260: \centerline{\psfig{figure=FIG4.PS,width=4in,angle=90}}
261: \caption{The abundance distribution of fossil marine
262: animal orders in logarithmic abundance classes (the same data as Fig.\ 3).
263: The histogram shows the
264: number of orders in each abundance class (left scale), while the solid line
265: depicts the number of families in each abundance class (right scale).
266: Species rank-abundance distributions where the highest abundance class present
267: also has the highest number of individuals (as in these data)
268: are termed {\em canonical lognormal}~\cite{PRESTON2}.}
269: \end{figure}
270:
271: Historically, species abundance distributions have been characterized
272: using frequency histograms of the number of species in logarithmic
273: abundance classes. For many taxonomic assemblages, this was found
274: to produce a humped distribution truncated on the left---a shape usually
275: dubbed {\em lognormal}~\cite{PRESTON1,PRESTON2,SUGIHARA}.
276: In fact, this distribution is
277: not incompatible with the power-law type distributions described
278: above. Indeed, plotting the fossil data of Fig.\ 3 in logarithmic
279: abundance classes produces a lognormal (Fig.\ 4).
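
The relation between the two descriptions can be sketched with an
idealization that is not a fit to the data: a pure power law
$P(n)\propto n^{-3/2}$, the form approached at large $n$ by a critical
branching process, with $n$ treated as continuous and with abundance
classes taken to be factor-of-two intervals (both are assumptions made
for this illustration). The fraction of orders in the $k$-th class,
$2^k \le n < 2^{k+1}$, is proportional to
\[
\int_{2^k}^{2^{k+1}} n^{-3/2}\, dn
 = 2\left(1-2^{-1/2}\right) 2^{-k/2} ,
\]
while the number of families contained in that class is proportional
to
\[
\int_{2^k}^{2^{k+1}} n \cdot n^{-3/2}\, dn
 = 2\left(\sqrt{2}-1\right) 2^{k/2} .
\]
In this idealization the number of orders falls by a factor
$\sqrt{2}$ from one class to the next while the number of families
rises by the same factor, so that the highest abundance class present
holds the most families. The behaviour at small $n$, where the
continuum approximation fails, and the exponential cutoff for
$\mu<1$ are not captured by this sketch.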
280:
281: For species, $\mu$ is the mean number of children each individual of
282: the species has. (Of course, for sexual species, $\mu$ would be half
283: the mean number of children per individual.) In the present case,
284: $\mu$ less than $1$ implies that extant species' populations {\em
285: decrease} on average, while $\mu$ equal to $1$ implies that average
286: populations do not change. An extant species' population can decline
287: due to the introduction of competitors and/or the decrease of the size
288: of the species' ecological niche. Let us examine the former more
289: closely. If a competitor is introduced into a saturated niche, all
290: species currently occupying that niche would temporarily see a
291: decrease in their $m$ until a new equilibrium was obtained. If the new
292: species is significantly fitter than the previously existing species,
293: it may eliminate the others. If the new species is significantly less
294: fit, then it may be the one eliminated. If the competitors are about
295: as efficient as the species already present, then the outcome is less
certain. Indeed, it is analogous to an unbiased random walk with a
297: possibility of ruin. The effects of introducing a single competitor
298: are transient. However, if new competitors are introduced more or less
299: periodically, then this would act to push $m$ lower for all species in
this niche and we would expect an abundance pattern closer to the
exponential curve, and further from the power law, than otherwise expected.
302: We have examined this in simulations of populations where new
303: competitors were introduced into the population by means of neutral
304: mutations---mutations leading to new species of the same fitness as
305: extant species---and found that these are fit very well by the
306: branching model. A higher rate of neutral mutations and thus of new
307: competitors leads to distributions closer to exponential. We have
308: performed the same experiment in more sophisticated systems of digital
309: organisms (artificial life)~\cite{AVIDA,SANDA} and found the same
310: result~\cite{JCCA2}.
311:
312: If no new competitors are introduced but the size of the niche
313: is gradually reduced, we expect the same effect on $m$ and on the
314: abundance distributions. Whether it is possible to separate
315: the effects of these two mechanisms in ecological abundance patterns
316: obtained from field data is an open question.
317: An analysis of such data to examine these trends would certainly
318: be very interesting.
319:
320: So far, we have sidestepped the difference between historical and
321: ecological distributions. For the fossil record, the historical
322: distribution we have modeled here should work well. For field
323: observations where only currently living groups are considered, the
324: nature of the death and extinction processes for each group will
325: affect the abundance pattern. In our simulations and artificial-life
326: experiments, we have universally observed a strong correlation between
327: the shapes of historical and ecological distributions. We believe
328: this correspondence will hold in natural distributions as well when
329: death rates are affected mainly by competition for resources.
330: The model's validity for different scenarios is an interesting question,
which could be answered by comparison with more taxonomic data.
332:
333: Our branching process model allows us to reexamine
334: the question of whether any type of special
335: dynamics---such as self-organized criticality~\cite{soc} (SOC)---is
336: at work in evolution~\cite{BakSneppen,Adami-soc}. While showing that the
337: statistics of taxon rank-frequency patterns in evolution are
338: closely related to the avalanche sizes in SOC sandpile models,
339: the present model clearly shows that
340: instead of a subsidiary relationship where evolutionary processes may
341: be self-organized critical, the power-law
342: behaviour of both evolutionary {\em and} sandpile
343: distributions can be understood in terms of the mechanics
344: of a Galton-Watson branching process~\cite{JCCA2,ZAPPERI}.
345: The mechanics of this branching process
346: are such that the branching trees are probabilistic
347: fractal constructs. However, the underlying stochastic process
348: responsible for the observed behaviour can be
349: explained simply in terms of a random walk~\cite{SPITZER}.
350: For evolution, the propensity for near power-law behaviour is found to
351: stem from a dynamical process in which $\mu\approx1$ is selected for and
much more likely to be observed than other values, while the
353: ``self-tuning'' of the SOC models is seen to result from arbitrarily
354: enforcing conditions which would correspond to the limit
355: $R_o/R_f \rightarrow 0$ and therefore
356: $m \rightarrow 1$~\cite{JCCA2}.
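
The random-walk picture can be made concrete with a standard result,
given here as an illustration rather than as the derivation of
Ref.~\cite{JCCA2}. For an unbiased walk that starts at $1$ and moves by
$\pm 1$ with equal probability at each step, the probability of first
reaching $0$ (ruin) at step $2k-1$ is
\[
P(T=2k-1) = \frac{1}{2k-1} {2k-1 \choose k}\, 2^{-(2k-1)}
 \approx \frac{k^{-3/2}}{2\sqrt{\pi}} \qquad (k \gg 1) .
\]
For example, $k=1$ gives $1/2$ and $k=2$ gives $1/8$. The duration of
such a walk thus has the same $3/2$ power-law tail as the size
distribution of a critical ($m=1$) branching process. Indeed, this is
exactly the size distribution of a branching process in which each
member has zero or two daughters with probability $1/2$ each.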
357: \newpage
358:
359: \bibliographystyle{unsrt}
360: \begin{thebibliography}{99}
361: \bibitem{YULE}
362: Yule, G. U. (1924)
363: A mathematical theory of evolution.
{\it Phil. Trans. Roy. Soc. London Ser. B} {\bf 213}, 21-87.
365: \bibitem{BURLANDO1}
366: Burlando, B. (1990)
367: The fractal dimension of taxonomic systems.
368: {\it J. theor. Biol.} {\bf 146}, 99-114.
369: \bibitem{BURLANDO2}
370: Burlando, B. (1993)
371: The fractal geometry of evolution.
372: {\it J. theor. Biol.} {\bf 163}, 161-172.
373: \bibitem{HARRIS}
374: Harris, T. E. (1963) {\it The Theory of Branching Processes}.
375: (Springer, Berlin; Prentice-Hall, Englewood Cliffs, N.J.).
376: \bibitem{JCCA2}
377: Chu, J. \& Adami, C. (1999)
378: Critical and near-critical branching processes (submitted).
379: \bibitem{RAUP}
380: Raup, D. M. (1985)
381: Mathematical models of cladogenesis.
{\it Paleobiology} {\bf 11}, 42-52.
383: \bibitem{SEPKOSKI}
384: Sepkoski, J. J. (1992)
385: {\it A Compendium of Fossil Marine Animal Families},
386: 2nd ed. (Milwaukee Public Museum; Milwaukee, WI; 1992)
387: with emendations by J. J. Sepkoski based largely on
388: {\it The Fossil Record 2}, Benton, M. J., ed. (Chapman \& Hall;
389: New York; 1993).
390: \bibitem{PRESTON1}
391: Preston, F. W. (1948)
392: The commonness, and rarity, of species.
393: {\it Ecology} {\bf 29}, 255-283.
394: \bibitem{PRESTON2}
395: Preston, F. W. (1962)
396: The canonical distribution of commonness and rarity.
397: {\it Ecology} {\bf 43}, 185-215, 410-432.
398: \bibitem{SUGIHARA}
399: Sugihara, G. (1980)
400: Minimal community structure: An explanation of species abundance patterns.
401: {\it Am. Nat.} {\bf 116}, 770-787.
402: \bibitem{AVIDA}
403: Adami, C. (1998) {\it Introduction to Artificial Life}.
(Telos, Springer-Verlag, New York).
405: \bibitem{SANDA}
406: Chu, J. \& Adami, C. (1997)
407: Propagation of information in populations of self-replicating code,
408: in {\it Artificial Life V: Proceedings of the Fifth International
409: Workshop on the Synthesis and Simulation of Living Systems},
Langton, C. G. and Shimohara, K., eds., pp. 462-469
411: (MIT Press, Cambridge, MA).
412: \bibitem{soc}
413: Bak, P., Tang, C. \& Wiesenfeld, K. (1987)
414: Self-organized criticality---an explanation of 1/$f$ noise.
{\it Phys. Rev. Lett.} {\bf 59}, 381-384;
Bak, P., Tang, C. \& Wiesenfeld, K. (1988)
Self-organized criticality.
{\it Phys. Rev. A} {\bf 38}, 364-374.
418: \bibitem{BakSneppen}
Sneppen, K., Bak, P., Flyvbjerg, H. \& Jensen, M. H. (1995)
420: Evolution as a self-organized critical phenomenon.
421: {\it Proc. Nat. Acad. Sci. USA} {\bf 92}, 5209-5213.
422: \bibitem{Adami-soc}
423: Adami, C. (1995)
424: Self-organized criticality in living systems.
425: {\it Phys. Lett. A} {\bf 203}, 29-32.
426: \bibitem{ZAPPERI}
427: Vespignani, A. \& Zapperi, S. (1998)
428: How self-organized criticality works: A unified mean-field picture.
429: {\it Phys. Rev. E} {\bf 57}, 6345-6362.
430: \bibitem{SPITZER}
431: Spitzer, F. (1964) {\it Principles of Random Walk}.
432: (Springer-Verlag, New York).
433: \end{thebibliography}
434: \vskip 0.25in
435:
436: \noindent {\bf Acknowledgments.} We would like to thank J. J. Sepkoski
437: for kindly sending us his amended data set of fossil marine animal families.
438: Access to the Intel Paragon XP/S was provided by the Center of
439: Advanced Computing Research at the California Institute of Technology.
440: This work was supported by a grant from the NSF.
441: \vskip 0.25cm
442:
443: \noindent Correspondence and requests for materials should be addressed to C.A.
444: (e-mail: adami@krl.caltech.edu).
445:
446:
447: \end{document}
448:
449:
450:
451:
452:
453: