1: \documentclass[amssymb,amsmath,aps,secnumarabic,floatfix,rmp,showpacs]{revtex4}
2: \usepackage{color}
3: \usepackage{alltt}
4: \usepackage{psfig}
5: \usepackage{epsfig}
6: \usepackage{pstricks,pst-node,pst-tree,pst-grad,graphics}
7: \usepackage{dcolumn}
8: \usepackage{xspace}
9: \usepackage{natbib}
10: \bibliographystyle{apsrev}
11: 
12: \expandafter\ifx\csname package@font\endcsname\relax\else
13:  \expandafter\expandafter
14:  \expandafter\usepackage
15:  \expandafter\expandafter
16:  \expandafter{\csname package@font\endcsname}%
17: \fi
18: 
19: \begin{document}
20: \title{A new, efficient algorithm for the Forest Fire Model}
21: \author{Gunnar Pruessner and Henrik Jeldtoft Jensen}
22: \affiliation{
23: Department of Mathematics,
24: Imperial College London,
25: 180 Queen's Gate,
26: London SW7 2BZ,
27: UK\\
28: gunnar.pruessner@physics.org
29: and
30: h.jensen@ic.ac.uk
31: }
32: \date{\today}
33: \begin{abstract}
34: The Drossel-Schwabl Forest Fire Model is one of the best studied models
35:  of non-conservative self-organised criticality. However, using a new
36:  algorithm, which allows us to study the model on large statistical and
37:  spatial scales, it has been shown to lack simple scaling. We thereby
38:  show that the considered model is not critical. This
39:  paper presents the algorithm and its parallel implementation in detail,
40:  together with large scale numerical results for several
41:  observables. The algorithm can easily be adapted to related problems
42:  such as percolation.
43: \end{abstract}
44: \pacs{02.70.-c, 64.60.Ht, 05.65.+b, 02.50.-r}
45: \maketitle
46: \tableofcontents
47: 
48: \section{Introduction}
49: The assumption that self-organised criticality (SOC) \citep{Jensen:98} is
50: the correct framework to describe and explain the ubiquity of power laws in
51: nature has been greatly supported by the development of non-conservative
52: models, because natural processes are typically dissipative. In contrast,
53: analytical work has suggested that the deterministic part of the
54: dynamics must be conservative in order to obtain scale invariance
55: \citep{Hwa:1989,Grinstein:1990}. On a mean-field level, however, this is
56: not necessarily true \citep{VespignaniZapperi:1998}, as has been
57: exemplified by the exact solution of a model with forest-fire-like
58: driving \citep{JensenPruessner:2002a}. As a random neighbour
59: model, though, the latter lacks spatial extension.
60: 
61: The Drossel-Schwabl Forest Fire Model (DS-FFM)
62: \citep{DrosselSchwabl:1992} is one of the few spatially extended,
63: dissipative models that supposedly exhibit SOC. In contrast to the
64: Olami-Feder-Christensen stick-slip model
65: \citep{OlamiFederChristensen:1992}, where criticality is still disputed
66: (for recent results see for example
67: \citep{LisePaczuski:2001,LisePaczuski:2001b,BoulterMiller:2003}),
68: the asymptotic divergence of several moments of the DS-FFM statistics, and
69: therefore the divergence of an upper cutoff, can be shown rigorously.
70: Although this might be considered a sign of criticality, it is far
71: from being sufficient proof. In equilibrium thermodynamics ``criticality''
72: usually refers to a divergent correlation length
73: \citep{Binney:98,Stanley:71} in the two-point correlation function, which
74: is associated with scale-invariant, power-law-like behaviour. This
75: is how the term ``criticality'' is to be interpreted in SOC: Observables
76: need to be scale invariant\footnote{In a finite system the
77: distributions are not expected to be free of any scale, but to be
78: dominated asymptotically by one scale only.}, i.e. their statistics
79: must follow power laws. There are many examples of divergent moments without
80: scale invariance, such as the supercritical branching process
81: \citep{Harris:1963} or supercritical percolation
82: \citep{StaufferAharonyENG:1994}.
83: 
84: Thus, there is \emph{a priori} no reason to assume that the DS-FFM is
85: scale free. Many numerical studies nevertheless suggest that it is
86: \citep{DrosselSchwabl:1992,Christensen:1993,ClarDrosselSchwabl:1994},
87: although one study suggests the breakdown of simple scaling
88: \citep{Grassberger:1993}. Since an analytical approach is still lacking,
89: numerical methods are required to investigate this problem. In this
90: paper, we propose a new, very fast algorithm to simulate the DS-FFM with
91: large statistics and on large scales. The implementation of the
92: algorithm has produced data of very high statistical quality. Some of
93: the results have already been published elsewhere
94: \citep{JensenPruessner:2002b}.
95: 
96: The structure of the paper is as follows:
97: The next section contains the definition of the model together with its
98: standard observables and their relations. Then the algorithm is
99: explained in detail. The section finishes with a detailed discussion on
100: the changes necessary to run the algorithm on parallel or distributed
101: machines. In the third section results for the two-dimensional FFM are
102: presented and analysed. The paper concludes with a summary in the fourth
103: section.
104: 
105: \section{Method and Model}
106: This section is mainly technical: After defining the model, all relevant
107: details of the implementation are discussed. Apart from concepts such as the
108: change from a tree oriented algorithm to a cluster oriented algorithm,
109: concrete technical details are given, for example memory requirements
110: and methods for handling histograms. The section also contains a
111: description of the performance analysis of the implementation. A
112: parallelised version of the algorithm is introduced and discussed in the
113: last section.
114: 
115: \subsection{The Model} \label{sec:the_model}
116: A Forest Fire Model was first proposed by Bak, Chen and Tang
117: \citep{BakChenTang:1990} and changed later by Drossel and Schwabl
118: \citep{DrosselSchwabl:1992} to what is now known as \emph{the} Forest
119: Fire Model (or DS-FFM as we call it): On a $d$-dimensional lattice of
120: linear length $L$, each site has a variable associated with it, which
121: indicates the state of the site. This can either be ``occupied'' (by a tree),
122: ``burning'' (occupied by a fire) or ``empty'' (ash). In each time step,
123: all sites are updated in parallel according to the following rules: If a
124: site is occupied and at least one of its neighbours is burning, it
125: becomes burning in the next time step. If a site is occupied and none of
126: its neighbours is burning, it becomes burning with probability $f$. If a
127: site is empty, it becomes occupied with probability $p$. If a site is
128: burning, it becomes empty in the next time step with probability
129: one. As the probabilities $f$ and $p$ become very small, they are better described
130: as rates in a Poisson-like process. From
131: a simple analysis it is immediately clear
132: \citep{ClarDrosselSchwabl:1994} that the model can become critical only in
133: the limit $p\to 0$ and $f\to 0$. In this limit, the burning process
134: becomes instantaneous compared to all other processes (see also
135: sec. \ref{sec:timescales}) and can be represented by the algorithm
136: shown in \aref{straight}.
137: 
138: \begin{figure}[ht]
139: \begin{alltt}
140: FOREVER \{
141: \emph{   /* Choose a site randomly */}
142:    rn = random site;
143: \emph{   /* If empty occupy with probability p */}
144:    IF (rn empty) THEN \{
145:       with probability p: rn=occupied;
146:    \} ELSE \{
147: \emph{   /* If occupied start a fire with probability f */}
148:       with probability f: 
149:          burn entire cluster connected to rn;
150:    \}
151: \}
152: \end{alltt}
153: \caption{\alabel{straight} The naive, basic algorithm of the DS-FFM}
154: \end{figure}
155: 
156: Compared to the instantaneous burning, both of the remaining processes
157: are slow. In section \ref{sec:timescales} it is shown that $p\gg f$ is
158: required \citep{ClarDrosselSchwabl:1994} for criticality, so that $f/p<1$
159: and the algorithm in \aref{straight} can be written as
160: \aref{straight_fast}, which is faster than the former, because
161: the number of random choices of a site is reduced, but equivalent
162: otherwise.
163: 
164: \begin{figure}[ht]
165: \begin{alltt}
166: FOREVER \{
167: \emph{   /* The following line is without effect */}
168:    with probability p: \{ 
169:       rn = randomly chosen site;
170:       IF (rn empty) THEN \{
171:          rn=occupied;
172:       \} ELSE \{
173:          with probability f/p: 
174:             burn entire cluster connected to rn;
175:       \}
176:    \}
177: \}
178: \end{alltt}
179: \caption{\alabel{straight_fast} A faster algorithm, doing essentially
180:  the same as the one shown in \aref{straight}.}
181: \end{figure}
182: 
183: The line \verb#with probability p# makes sure that the occupation
184: attempt still happens with probability $p$ and the burning attempt still
185: occurs with probability $p \cdot f/p = f$. Of course, the line is meaningless,
186: because the alternative, which occurs with probability $1-p$, is no
187: action at all. It can therefore be omitted. Then every randomly picked
188: empty site becomes occupied, while burning happens with the reduced
189: probability $f/p$.
190: 
191: This rescaling of probabilities is only possible in this form if the two
192: processes are independent, which is the case because a new occupation
193: can only occur for empty sites, while a burning attempt operates only on
194: occupied sites. If both processes were to operate on the same type of
195: site, a reduced probability $(1+f/p)^{-1}$ would decide between the two
196: alternatives.
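
To illustrate the last remark with a hypothetical variant (not the DS-FFM
itself), suppose growth and lightning were mutually exclusive events acting
on the same type of site, occurring with probabilities $p$ and $f$ per
attempt, with $p+f \le 1$. Dropping the idle attempts, which occur with
probability $1-p-f$, amounts to conditioning on something happening, so that
\begin{equation}
 P(\mathrm{growth}) = \frac{p}{p+f} = \left(1+\frac{f}{p}\right)^{-1} ,
 \qquad
 P(\mathrm{lightning}) = \frac{f}{p+f} = \frac{f}{p} \left(1+\frac{f}{p}\right)^{-1} \quad .
\end{equation}
Both probabilities depend on $f/p$ only and add up to one. In the DS-FFM
the state of the chosen site already decides which process applies, so
no such additional random choice is needed.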
197: 
198: The implementation shown in \aref{straight_fast} (without the
199: meaningless line) has been used for example in
200: \citep{HoneckerPeschel:1997,Henley:1993}. However, probably for
201: historical reasons, the model is usually
202: \citep{Grassberger:1993,ClarDrosselSchwabl:1994,SchenkETAL:2000}
203: implemented as shown in \aref{straight_traditional}, where trees
204: are grown in chunks of $p/f$ growth attempts between two lightning attempts. Although
205: this means that sites become re-occupied only in such chunks,
206: it turns out that, apart from peaks in the histogram of the time series
207: of global densities of occupied sites \citep{SchenkETAL:2000}, the
208: statistics do not depend on these details. To avoid any
209: confusion, all data for this article have been produced by means of the
210: algorithm in \aref{straight_traditional}. Moreover, this algorithm
211: is much more suitable for parallelisation (see section
212: \ref{sec:Parallelizing_the_code}).
213: 
214: \begin{figure}[ht]
215: \begin{alltt}
216: FOREVER \{
217: \emph{   /* This is just a loop to occupy the }
218: \emph{    * right number of sites */}
219:    REPEAT p/f TIMES \{
220:       rn = randomly chosen site;
221:       IF (rn empty) THEN \{rn=occupied;\}
222:    \}
223:    rn = randomly chosen site;
224:    IF (rn occupied) THEN \{
225:       burn entire cluster connected to rn;
226:    \}
227: \}
228: \end{alltt}
229: \caption{\alabel{straight_traditional} The traditional implementation.}
230: \end{figure}
231: 
232: 
233: \subsection{Statistical Quantities} \label{sec:statistical_quantities}
234: The objects of interest in the DS-FFM are clusters formed by occupied
235: sites: Two trees belong to the same cluster if there exists a path
236: between them along nearest-neighbouring, occupied sites. The clusters in
237: the DS-FFM correspond to avalanches in sandpile-like models
238: \citep{Jensen:98}. The cluster that is burnt at each burning step can
239: be examined more closely, so that various geometrical properties can be
240: determined either as averages (and higher moments) or as entire
241: distributions: Mass (in the following this term is used synonymously with
242: size), diameter, time to burn it etc. The last property is better
243: expressed as the maximum length for all paths parallel to the axes and
244: fully within the given cluster, connecting the initially burnt tree and
245: each tree within the same cluster. It is the maximum number of nearest
246: neighbour moves one has to make to reach all sites in the same cluster,
247: in this sense a ``Manhattan distance''
248: \citep{CormenLeisersonRivest_Manhattan:1990}. As trees catch fire due to
249: nearest neighbours only, this maximum distance is the total burning time
250: of the entire cluster. In the definition above, the ``time to burn''
251: $\manh$ becomes a purely geometrical property of the cluster and
252: therefore independent of the actual implementation
253: (see sec.~\ref{sec:burning_procedure}) of the burning procedure.
254: 
255: \subsubsection{Cluster size distribution} \label{sec:clusterdistribution}
256: The most prominent property of the model, however, is the size
257: distribution of the clusters, $\dns$, which is the single-site
258: normalised number density of clusters of mass $s$, i.e. the number of
259: clusters of size $s$ per unit volume. The average cluster size, i.e. the
260: average size of a cluster a randomly chosen occupied site belongs to, is
261: correspondingly defined as
262: \begin{equation}
263:  \aves{s} = \frac{\sum_{s} s^2 \dns}{\sum_{s} s \dns} \quad .
264: \elabel{ave_s}
265: \end{equation}
266: As indicated by the bar, $\dns$ denotes the \emph{expected}
267: distribution, i.e. something to be \emph{estimated} from the
268: observables. On average, the probability that a randomly chosen site
269: belongs to a cluster of size $s$ is then $s \dns$. If $n_t(s)$ denotes
270: the cluster size distribution of the configuration at time $t$ (see
271: below), then one expects
272: \begin{equation}
273:  \ave{n_t(s)} = \dns \quad ,
274: \end{equation}
275: where $\ave{}$ denotes the ensemble average (as opposed to
276: $\aves{\ }$, which denotes the average over $s \dnst$). Assuming
277: ergodicity, one has
278: \begin{equation}
279: \lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^T A_t = \ave{A}
280: \end{equation}
281: for an arbitrary quantity $A_t$ measured at each step $t$ of the
282: simulation. The limit exists for all bounded observables $A_t$.
283: 
284: Regarding the time $t$, it is worth noting that a step in the simulation
285: is considered completed, i.e. $t \to t+1$ if the randomly chosen site
286: for the lightning attempt was occupied, i.e. the attempt was successful,
287: so that $T$ is the number of burnt clusters. For sufficiently large
288: systems, the changes of the system due to growing or lightning are
289: almost negligible, and so are the differences between averages taken
290: over all lightning attempts or all \emph{successful} lightning
291: attempts. Also, the distributions found directly before and directly
292: after burning tend to the same expectation value for sufficiently large
293: systems, see sec.~\ref{sec:finite_size_scaling}. It is noted only for
294: completeness, that in this paper the cluster size distribution $n_t(s)$
295: has been measured directly \emph{after} the burning procedure. Therefore
296: $n_t(s)$ does not include the cluster burnt at time step $t$, just like
297: $n_{t+1}(s)$ does not in an implementation, where the distribution is
298: measured \emph{before} burning.
299: 
300: Introducing
301: \begin{equation} \elabel{def_rho}
302:  \rhobar = \sum_{s=1} s \dns
303: \end{equation}
304: as average density of occupied sites, the expected distribution of
305: burnt clusters is $s \dns/\rhobar$. To see this, $\PCB_t(s)$ is
306: introduced, denoting the distribution of clusters burnt in the $t$th
307: step of the simulation. This distribution contains only one non-zero
308: value for each $t$, namely $\PCB_t(s)=1$ for the size $s$ of the cluster
309: burnt at time $t$, and $\PCB_t(s)=0$ for all other $s$. Therefore
310: \begin{equation}
311:  \sum_{s=1}^N \PCB_t(s) = 1
312: \end{equation}
313: where $N$ is the number of sites in the system, $N=L^d$, which is also
314: the maximum mass of a cluster.  Since the site where the fire starts is
315: picked randomly, the cluster burnt in time step $t+1$ is drawn randomly
316: from the distribution $n_t(s)$ with a probability proportional to the
317: mass of the cluster. The normalisation of the distribution $s \dns$ is
318: given by \eref{def_rho}, so that, for $t$ large enough that the effect of
319: the initial condition can be neglected,
320: \begin{equation} \elabel{converge_pcb}
321:  \ave{\PCB_t(s)} = s \dns/\rhobar \ .
322: \end{equation}
323: 
324: 
325: 
326: In the stationary state the average density of trees, $\rhobar$, is
327: related to $\aves{s}$ and $\theta \equiv f/p$ (see Sec.~\ref{sec:timescales}) by \citep{ClarDrosselSchwabl:1994}
328: \begin{equation} \elabel{aves_averho}
329:  \aves{s} =\frac{1-\rhobar}{\theta \rhobar} \ .
330: \end{equation}
331: This equation, as well as \eref{converge_pcb}, is strictly exact only if the density of occupied
332: sites is constant over the course of the growing phase. For very large
333: system sizes \eref{aves_averho} holds almost perfectly, as shown in
334: Tab.~\ref{tab:absolute_results}; however, note the remarks in Sec.~\ref{sec:finite_size_scaling}.
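
A heuristic balance argument, given here only as a sketch, makes
\eref{aves_averho} plausible. It assumes that the density of occupied sites
stays at $\rhobar$ throughout and uses the rates $p$ and $f$ of
Sec.~\ref{sec:the_model}. Per site and time step, trees are planted at rate
$p(1-\rhobar)$, while lightning strikes a tree at rate $f \rhobar$. Each
strike removes on average $\aves{s}$ trees, because the burnt cluster is
selected through a randomly chosen occupied site, cf.\ \eref{ave_s}.
Stationarity requires gain and loss to balance,
\begin{equation}
 p (1-\rhobar) = f \rhobar \aves{s}
 \quad \Longrightarrow \quad
 \aves{s} = \frac{p}{f} \frac{1-\rhobar}{\rhobar} = \frac{1-\rhobar}{\theta \rhobar} \quad .
\end{equation}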
335: 
336: For a coherent picture $\PCA_t(s)$ is introduced, which is the histogram
337: of \emph{all} clusters, i.e. $\sum_s \PCA_t(s)$ is the number of clusters
338: in the system at time $t$. According to the definition of $\dns$, one has
339: \begin{equation} \elabel{converge_pca}
340:  \ave{\PCA_t(s)} = N \dns \quad ,
341: \end{equation}
342: and correspondingly
343: \begin{equation}
344:  \rho_t = \frac{1}{N} \sum_s s \PCA_t(s)
345: \end{equation}
346: with $\ave{\rho_t}=\rhobar$. Since \eref{converge_pcb} and
347: \eref{converge_pca} differ on the RHS only by constants rather than by
348: random variables, both distributions, $\PCB_t(s)$ and $\PCA_t(s)$, provide
349: estimators of the expected distribution $\dns$. Clearly, the burnt
350: cluster distribution $\PCB_t(s)$ is much sparser than
351: $\PCA_t(s)$, and the estimator for $\dns$ derived from this quantity is
352: therefore expected to have a significantly larger standard deviation. On
353: the other hand, its autocorrelation time is expected to be considerably
354: smaller than that of $\PCA_t(s)$, because on average only $p/f+1$
355: entries ($\rhobar p/f$ sites are occupied in each ``growing loop'', which
356: is repeated on average $1/\rhobar$ times) of the latter are changed between
357: two subsequent measurements, corresponding to the number of newly
358: occupied sites plus the cluster which is burnt down. So, $\PCA_t(s)$
359: provides a much larger sample size, but is also expected to be much more
360: correlated. In order to judge whether it is wise to spend CPU time on
361: calculating the full $\PCA_t(s)$ rather than only $\PCB_t(s)$, as was
362: done in the past \citep{ClarDrosselSchwabl:1994}, these competing effects
363: need to be weighed by estimating the standard
364: deviation of the estimator of $\dns$ from both observables, which is
365: discussed in detail in section~\ref{sec:std_details}.
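
Written out explicitly, \eref{converge_pca} and \eref{converge_pcb} suggest
two estimators of $\dns$. The following is only a sketch: the hats, the
subscripts $A$ and $B$ and the number $T$ of measurements taken in the
stationary state are notation introduced here for illustration, and
replacing $\rhobar$ by its sample mean is an additional approximation.
\begin{equation}
 \hat{n}_A(s) = \frac{1}{N T} \sum_{t=1}^{T} \PCA_t(s) , \qquad
 \hat{n}_B(s) = \frac{\hat{\rho}}{s T} \sum_{t=1}^{T} \PCB_t(s) , \qquad
 \hat{\rho} = \frac{1}{T} \sum_{t=1}^{T} \rho_t \quad .
\end{equation}
In $\hat{n}_B(s)$ each step contributes to a single value of $s$, whereas
in $\hat{n}_A(s)$ every cluster present in the system contributes at every
step.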
366: 
367: \subsubsection{Timescales} \label{sec:timescales}
368: In order to obtain critical behaviour in the FFM, a double separation of
369: time scales is required \citep{ClarDrosselSchwabl:1996}
370: \begin{equation} \elabel{double_separation}
371:  f \ll p \ll \left(\frac{f}{p}\right)^{\nu'} \quad ,
372: \end{equation}
373: with some positive exponent $\nu'$.
374: The left relation, $f \ll p$, entails $f/p \to 0$ and therefore
375: \eref{double_separation} entails $p\to 0$ and $f\to 0$. This is also
376: the case for
377: \begin{equation} \elabel{double_separation_sloppy}
378:  f \ll p \ll 1 \quad ,
379: \end{equation}
380: and therefore leads to the same prescription for driving the system.
381: However, \eref{double_separation} entails
382: \eref{double_separation_sloppy} but not vice versa. This can be seen
383: by noting that \eref{double_separation} entails the non-trivial
384: relation $p^{1+1/\nu'} \ll f \ll p$. Some authors, however, just state
385: \eref{double_separation_sloppy}
386: \citep{Grassberger:1993,VespignaniZapperi:1998}.  The three scales
387: involved are due to three different processes and their corresponding
388: rates: \\
389: \begin{enumerate}
390: \item The timescale on which the burning happens, the typical time of
391:       which is handwavingly estimated as the average number of sites in
392:       a burnt cluster, $\aves{s} \propto p/f$. A more appropriate
393:       assumption is that the typical burning time scales like a power of
394:       the average cluster size \citep{ClarDrosselSchwabl:1996}. This
395:       should be distinguished from the scaling of the \emph{average}
396:       time it takes to burn a cluster, because the \emph{typical} time
397:       represents the characteristic scale of the burning time
398:       distribution, which might be very different from its average.\\
399: \item The timescale of the growing, which is $1/p$. \\ 
400: \item The timescale of the lightning, $1/f$. \\ 
401: \end{enumerate}
402: Burning must be fast compared to growing, so that clusters are burnt
403: down before new trees grow on their edges \citep{ClarDrosselSchwabl:1996},
404: i.e. $(p/f)^{\nu'} \ll 1/p$ or $(f/p)^{\nu'} \gg p$. 
405: In order to obtain divergent cluster
406: sizes, growing must be much faster than lightning, i.e. $p \gg f$. Thus,
407: the double separation reads as stated in \eref{double_separation}.
408: By making the burning instantaneous compared to all other processes, the
409: dynamics effectively loses one timescale. In this case, the rates $f$ and $p$,
410: measured on this microscopic timescale, vanish, i.e. $f=0$ and
411: $p=0$, so that the right relation of \eref{double_separation} is
412: perfectly met, provided that $p/f$ does not vanish. However, the ratio $f/p$ remains finite, and $f \ll p$ is
413: still to be fulfilled. A finite $f/p$ means that one rate provides a scale
414: for the other. Measuring the rates on the macroscopic timescale, defined
415: by the sequence of burning attempts, $f$ becomes $1$ in these new
416: units, and $p$ becomes $p/f \equiv \theta^{-1}$. The notation
417: $\theta = f/p$ corresponds to \citep{VespignaniZapperi:1998}, which is,
418: unfortunately, the inverse of $\theta$ used in
419: \citep{Grassberger:1993}. \Eref{double_separation} then means
420: $\theta \to 0$.
421: At first sight, this result seems paradoxical, since $\theta=0$
422: would violate the condition $p\ll \theta^{\nu'}$ that instantaneous
423: burning is meant to guarantee. However, this problem does not appear in the
424: \emph{limit} $\theta\to 0$.
425: In a finite system, one cannot make $\theta$ arbitrarily
426: small, as the system will asymptotically oscillate between the two states
427: of being completely filled and completely empty. On the other hand, for
428: fixed $\theta$ and sufficiently large system sizes, a further increase
429: in system size will leave the main observables, such as $\rho_t$ and
430: $\PCA$ (see Section~\ref{sec:clusterdistribution}), essentially
431: unchanged. These asymptotic values, namely the observables at a given
432: $\theta$ in the thermodynamic limit, are to be measured.
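
The non-trivial lower bound on $f$ quoted above follows from the right
relation of \eref{double_separation} in one line, using only $\nu'>0$:
\begin{equation}
 p \ll \left(\frac{f}{p}\right)^{\nu'}
 \quad \Longleftrightarrow \quad
 p^{1/\nu'} \ll \frac{f}{p}
 \quad \Longleftrightarrow \quad
 p^{1+1/\nu'} \ll f \quad .
\end{equation}
That \eref{double_separation_sloppy} does not entail
\eref{double_separation} can be seen from an example chosen here purely
for illustration: for $f = p^{2+1/\nu'}$ and $p \to 0$ one has
$f \ll p \ll 1$, but $(f/p)^{\nu'} = p^{\nu'+1}$, so that
$p / (f/p)^{\nu'} = p^{-\nu'}$ diverges and the right relation of
\eref{double_separation} is violated.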
433: 
434: \subsubsection{Scaling of the cluster size distribution} \label{sec:scaling}
435: Assuming that finite size effects do not play any r\^ole, i.e. for $\theta$
436: not too small, the ansatz 
437: \begin{equation} \elabel{def_tau}
438:  \dnst = s^{-\tau} \GC(s/\Scutoff(\theta))
439: \end{equation}
440: as obtained in percolation \citep{StaufferAharonyENG:1994} is reasonable
441: for $s$ larger than a fixed lower cutoff. In the following, the
442: additional parameter $\theta$ in $\dnst$ is omitted, whenever
443: possible. The quantity $\Scutoff(\theta)$ is the upper cutoff and is
444: supposed to incorporate all $\theta$ dependence of the distribution. It
445: can be shown easily \citep{ClarDrosselSchwabl:1994} that the second
446: moment of $\dnst$ (see \eref{ave_s}) diverges in the limit $\theta \to
447: 0$ and $L \to \infty$, so that $\Scutoff$ must diverge with $\theta \to
448: 0$.  Here, $\GC(x)$ plays the r\^ole of a cutoff function, so that
449: $\lim_{x \to \infty} \GC(x)=0$ and $\GC(x)$ falls off faster than any power for large $x$,
450: because all moments of $\dnst$
451: are finite in a finite system. For finite $x$, $\GC(x)$ can show any
452: structure and does not have to be constant. However, assuming
453: $\lim_{\Scutoff \to \infty }\dnst$ finite, $\GC(s/\Scutoff)$ can be
454: regarded as constant in $s$ for sufficiently large $\Scutoff$, so that $\dnst$
455: behaves like a power law, $s^{-\tau}$, for certain $s$. However, \emph{a
456: priori} it is completely unknown whether $\Scutoff$ is large enough in
457: that sense, and the \emph{only} way to determine $\tau$ directly from
458: $\dnst$ is via a data
459: collapse.  It is already known that ``simple scaling'' \eref{def_tau}
460: does not apply in the presence of finite size effects
461: \citep{SchenkETAL:2000}.
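
In terms of \eref{def_tau}, a data collapse can be sketched as follows.
Multiplying the ansatz by $s^{\tau}$ gives
\begin{equation}
 s^{\tau} \dnst = \GC\left(\frac{s}{\Scutoff(\theta)}\right) \quad ,
\end{equation}
so that, if the ansatz holds, plotting $s^{\tau} \dnst$ against
$s/\Scutoff(\theta)$ for several values of $\theta$ produces a single
curve, namely $\GC(x)$. The exponent $\tau$ and the cutoffs
$\Scutoff(\theta)$ are the parameters to be adjusted. Conversely, if no
choice of these parameters makes the curves coincide, the ansatz is not
supported by the data.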
462: 
463: The assumption \eref{def_tau} states that the FFM is scale-free in the
464: limit $\Scutoff(\theta) \to \infty$ and \emph{defines} the exponent
465: $\tau$ which characterises the scale invariance. One cannot stress
466: enough that, with the breakdown of \eref{def_tau}, the proposed exponent
467: is undefined, unless a new scaling behaviour is proposed. It has been
468: pointed out that \eref{def_tau} certainly contains corrections
469: \citep{Pastor-SatorrasVespignani:2000}. This asymptotic character of the
470: universal scaling function is well known \citep{Wegner:72} from
471: equilibrium critical phenomena.
472: 
473: While Grassberger concludes that the ansatz \eref{def_tau} ``cannot
474: be correct'' \citep{Grassberger:1993}, this is rejected in
475: \citep{SchenkETAL:2000}. However, the latter authors do not actually investigate
476: $\GC(x)$ and simply plot their estimate of $s \dnst$
477: vs. $s/\Scutoff(\theta)$. In the results section it is shown that there
478: is no reason to believe that \eref{def_tau} could hold in any finite
479: system.
480: 
481: \subsubsection{Other distributions}
482: \label{sec:other_dists}
483: The exponent $\tau$ as defined in \eref{def_tau} can be related to
484: exponents of other assumed power laws. To this end, the distribution
485: $\PSF(s, \manh ; \theta)$ is introduced, which is the joint
486: probability density function (PDF), for a cluster burnt to be of mass
487: $s$ and burning time (see sec. \ref{sec:statistical_quantities})
488: $\manh$ at given $\theta$. Then it is possible to define conditional expectation values as
489: \citep{ChristensenFogedbyJensen:1991}
490: \begin{eqnarray}
491:  {\mathsf E}(s | \manh ; \theta) & = & \sum_{s'} s' \PSF(s', \manh ; \theta) \\
492:  {\mathsf E}(\manh | s ; \theta) & = & \sum_{\manh'} \manh' \PSF(s, \manh' ; \theta) \quad .
493: \end{eqnarray}
494: Moreover it is clear that $\dnst$ is just a marginal distribution, i.e.
495: \begin{equation}
496:  s \dnst =  \sum_{\manh'} \PSF(s, \manh' ; \theta) \equiv  \PSF_s(s ; \theta) \quad .
497: \end{equation}
498: In the assumed absence of any scale, it is reasonable to assume for the
499: distribution of $\manh$, in analogy to \eref{def_tau},
500: \begin{equation} \label{def:expo_b}
501:  \PSF_{\manh}(\manh ; \theta) = \manh^{-b} \GC_{\manh}(\manh/\manh_0 (\theta))
502: \end{equation}
503: and for the relation between ${\mathsf E}(s | \manh)$ and $\manh$
504: \begin{equation} \elabel{Esmanh}
505:  {\mathsf E}(s | \manh) \propto \manh^{\mu'} \quad .
506: \end{equation}
507: To avoid confusion, it is important to keep in mind that the absence of
508: scales is not a physical or mathematical necessity: The system could as
509: well ``self-organise'' to any other, sufficiently broad distribution,
510: which could have an intrinsic, finite scale, i.e. a natural constant
511: characterising the features of the distribution. This looks much less
512: surprising considering the fact that standard models of critical
513: phenomena \citep{Stanley:71}, like the Ising model, possess such a scale
514: everywhere apart from the critical point.
515: 
516: An additional assumption is necessary in order to produce a scaling
517: relation:
518: \begin{equation} \elabel{asumption_peaked}
519:  \PSF_{\manh}(\manh ; \theta) d\manh = \PSF_s( {\mathsf E}(s | \manh ; \theta)  ; \theta) d({\mathsf E}(s | \manh ; \theta)) \quad ,
520: \end{equation}
521: where $\PSF_{\manh}$ and $\PSF_s$ denote the marginal distributions of
522: $\PSF(s, \manh ; \theta)$, which leads --- assuming sufficiently large
523: $\Scutoff$ and $\manh_0$ --- to
524: \begin{equation} \elabel{scaling_relation}
525:  b = 1 + \mu' (\tau - 2)
526: \end{equation}
527: using $\PSF_s=s\dnst$ and \eref{def_tau}.  \Eref{asumption_peaked} is
528: based on the idea that a cluster requiring burning time $\manh$ is as
529: likely to occur as a cluster of the size corresponding to the average taken
530: conditional to the burning time $\manh$. If the distribution
531: $\PSF(s, \manh ; \theta)$ is very narrow, such that ${\mathsf E}(s | \manh)$ is
532: virtually the only value of $s$ with non-vanishing
533: probability\footnote{The extreme case would be $\PSF(s, \manh ; \theta)
534: = \delta(s-f(\manh)) g(\manh)$ with a monotonic function $f(\manh)$
535: representing the conditional average.}, this
536: condition is met. However, the distribution can have any shape and still
537: obey the assumption, as illustrated in \Fref{peaked_dist}.
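
The step from \eref{asumption_peaked} to \eref{scaling_relation} can be
sketched as follows, under the simplifying assumptions that the cutoff
functions are constant in the range considered
($s \ll \Scutoff$, $\manh \ll \manh_0$) and that
${\mathsf E}(s | \manh)$ may be treated as a continuous variable. With
$\PSF_s(s;\theta) = s \dnst \propto s^{1-\tau}$ and, from \eref{Esmanh},
$d {\mathsf E}(s | \manh) \propto \manh^{\mu'-1} d\manh$, the assumption
\eref{asumption_peaked} reads
\begin{equation}
 \manh^{-b} \propto \left(\manh^{\mu'}\right)^{1-\tau} \manh^{\mu'-1}
 = \manh^{\mu'(2-\tau)-1} \quad .
\end{equation}
Comparing exponents gives $b = 1 - \mu'(2-\tau) = 1 + \mu'(\tau-2)$.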
538: 
539: \begin{figure}[th]
540: \begin{center}
541: \input{peaked_dist}
542: \end{center}
543: \caption{\flabel{peaked_dist} A schematic joint PDF $\PSF(s, \manh';
544: \theta )$. The gray shading is used to indicate the density and the
545:  straight lines indicate roughly the limits of the distribution. While a
546:  narrower distribution would most easily obey \eref{asumption_peaked},
547:  it does not necessarily have to be sharply peaked. In this example the
548:  weighted areas of the horizontal and the vertical stripes might be the
549:  same. They cross at the conditional averages.}
550: \end{figure}
551: 
552: Scaling relation \eref{scaling_relation} can only be derived via
553: \eref{asumption_peaked}, which cannot be mathematically correct, as
554: $\PSF_s$ is actually only defined for integer arguments, while in general
555: ${\mathsf E}(s | \manh)$ is not integer valued. However, the
556: scaling relation might hold in some limit. 
557: 
558: The exponent characterising the divergence of $\Scutoff$ in \eref{def_tau} is
559: defined by
560: \begin{equation} \elabel{def_scutoff}
561:  \Scutoff(\theta) = \theta^{-\lambda} \quad ,
562: \end{equation}
563: leading, together with \eref{ave_s} and \eref{aves_averho},
564: to the scaling relation \citep{ClarDrosselSchwabl:1996}
565: \begin{equation}
566:  \lambda (3-\tau) =1 \quad .
567: \elabel{scaling_relation_ltau}
568: \end{equation}
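
A sketch of this derivation, assuming $2 < \tau < 3$ and that $\rhobar$
tends to a constant strictly between $0$ and $1$ as $\theta \to 0$:
approximating the sum in the numerator of \eref{ave_s} by an integral and
inserting \eref{def_tau} gives
\begin{equation}
 \sum_s s^2 \dnst \approx \int ds \, s^{2-\tau} \GC(s/\Scutoff)
 = \Scutoff^{3-\tau} \int dx \, x^{2-\tau} \GC(x) \propto \Scutoff^{3-\tau} \quad ,
\end{equation}
where the integral over $x$ converges at its lower limit because
$\tau < 3$. The denominator of \eref{ave_s} is $\rhobar$, so that
$\aves{s} \propto \Scutoff^{3-\tau} = \theta^{-\lambda(3-\tau)}$ by
\eref{def_scutoff}, while \eref{aves_averho} gives
$\aves{s} \propto \theta^{-1}$. Comparing exponents yields
\eref{scaling_relation_ltau}.
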
569: The corresponding exponent for $\manh_0$ in (\ref{def:expo_b}) is defined by
570: \begin{equation}
571:  \manh_0(\theta) = \theta^{-\nu'} \quad .
572: \elabel{def_nu}
573: \end{equation}
574: The assumption $\manh_0 = {\mathsf E}(\manh | \Scutoff) \propto \Scutoff^{1/\mu'}$
575: then gives the scaling relation
576: \begin{equation}\elabel{scaling_relation_nup}
577:  \nu' = \frac{\lambda}{\mu'} \quad .
578: \end{equation}
579: It is interesting to note that this assumption is consistent with the
580: assumption that clusters with a size of the order
581: $\Scutoff(\theta)$ need a time of the order $\manh_0$ to burn. In
582: that case one has $\PSF_\manh(\manh_0;\theta) d\manh =
583: \PSF_s(\Scutoff;\theta) ds$ and, as $\manh_0 \propto
584: \Scutoff^{\nu'/\lambda}$, one obtains using (\ref{def:expo_b}) and
585: \eref{def_tau}
586: \begin{equation}
587:  (1-b)\frac{\nu'}{\lambda}=2-\tau \quad ,
588: \end{equation}
589: corresponding to \eref{scaling_relation} with \eref{scaling_relation_nup}.
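
The intermediate steps can be sketched as follows, assuming that the
cutoff functions evaluated at the cutoffs contribute only constants. From
(\ref{def:expo_b}) one has
$\PSF_\manh(\manh_0;\theta) \propto \manh_0^{-b}$, from \eref{def_tau}
$\PSF_s(\Scutoff;\theta) \propto \Scutoff^{1-\tau}$, and
$\manh_0 \propto \Scutoff^{\nu'/\lambda}$ implies
$d\manh \propto \Scutoff^{\nu'/\lambda-1} ds$, so that
\begin{equation}
 \Scutoff^{-b \nu'/\lambda} \, \Scutoff^{\nu'/\lambda - 1}
 \propto \Scutoff^{1-\tau}
 \quad \Longrightarrow \quad
 (1-b) \frac{\nu'}{\lambda} - 1 = 1 - \tau \quad .
\end{equation}
Inserting $\nu'/\lambda = 1/\mu'$ from \eref{scaling_relation_nup} gives
$b = 1 + \mu'(\tau-2)$, i.e. \eref{scaling_relation}.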
590: 
591: 
592: 
593: \subsection{The Implementation} \label{sec:implementation}
In this section the new implementation of the DS-FFM is discussed. An
implementation especially capable of handling large scales has been
proposed earlier by Honecker \citep{HoneckersFFMCode}. Its most prominent
feature is the bitwise encoding of the model, which significantly
reduces memory requirements. Some of the properties investigated profit
from this scheme of bitwise encoding, because bitwise logical operators
can be used to determine, for example, correlations, and operate on entire
words ``in parallel''. However, in this implementation it would have
been inefficient to count all clusters, i.e. $\dns$ is determined via
$\PCB(s)$ rather than $\PCA(s)$.
604: 
605: 
606: In contrast to standard implementations
607: \citep{ClarDrosselSchwabl:1994,SchenkETAL:2000,HoneckerPeschel:1997},
608: where $\dns$ is derived from $\PCB(s)$, the philosophy of the
609: implementation presented in this article is to count \emph{all} clusters
efficiently by keeping track of their growth and disappearance, so that
611: $\dns$ is derived from $\PCA(s)$. By comparing the standard deviation of
612: the estimates, and the costs (CPU time), the efficiency is found to be
613: at least one order of magnitude higher. At the same time, the complexity
614: of the algorithm is essentially unchanged, namely $\OC(\INFL \log(N))$
615: instead of $\OC(\INFL)$, while a naive implementation of the counting of
616: all clusters is typically of order $\OC(N)$. In the following the
617: algorithm is described in detail. Because of its close relation to
standard percolation, the algorithm presented below is also applicable
to this classical problem of statistical mechanics. In fact, the
620: percolation algorithm recently proposed by Newman and Ziff
621: \citep{NewmanZiff:2000,NewmanZiff:2001} is very similar. Based on many
622: principles presented in this paper, an asynchronously parallelised
623: version for percolation has been developed recently \citep{MoloneyPruessner:2003}.
624: 
625: \subsubsection{Tracking clusters}
626: \label{sec:tracking_clusters} Usually each site is represented by a
627: two-state variable, which indicates whether the site is occupied or
628: empty. The variable does not need to indicate the state ``burning'',
629: because the burning procedure is instantaneous compared to all other
630: processes and can be implemented without introducing a third state
631: (see sec.~\ref{sec:burning_procedure}). In order to keep track of the cluster
distribution, each site is associated with two further variables (in an
633: actual implementation the number of variables can be reduced to one,
634: see sec.~\ref{sec:reducing_memory}), one which
635: points (depending on the programming language either directly as an
636: address or as an index) to its ``representative'' and one which contains
637: the mass of the cluster the given site is connected to. The
638: representative of a site is another site of the same cluster, but not
639: necessarily and in fact typically not a nearest neighbour. This is shown
640: in \fref{lattice_figure}. If a site is empty, the pointer to a
representative is meaningless. The pointers to representatives form a
tree-like structure, because a representative might itself point to another
representative, as shown in \fref{tree_figure}. A site which
points to itself, and is therefore its own representative, is called a
``root'' site, since it forms the root of the tree-like structure. Only
at a root site is the second variable, denoting the mass of the cluster,
actually meaningful; there it indicates the mass of the entire cluster. Each
cluster is therefore uniquely identified by its root site: Any two
sites which belong to the same cluster have the same root and vice
versa. By construction of the clusters (shown below), it takes at most
$\OC(\log N)$ steps to find the root of any site in the system.
652: 
653: \begin{figure}[th]
654: \begin{center}
655: \input{ffm_lattice_figure}
656: \end{center}
657: \caption{All occupied sites (black) on the lattice point to a representative. The
658:  site pointing to itself is the root of the cluster. The site shown in
659:  light gray is the one which is about to become occupied, as shown in
660:  \fref{cluster_join}. The labels on the sites are just to uniquely
661:  identify them in other figures.
662: \flabel{lattice_figure}}
663: \end{figure}
664: 
665: \begin{figure}[th]
666: \begin{center}
667: \input{ffm_tree_figure}
668: \end{center}
669: \caption{The tree-like structure of the largest cluster in \fref{lattice_figure}.
670: \flabel{tree_figure}}
671: \end{figure}
672: 
673: The algorithm is a dynamically updated form of the Hoshen-Kopelman
674: algorithm \citep{HoshenKopelman:1976}. The same technique has recently
675: been used to simulate percolation efficiently for many different
occupation densities \citep{NewmanZiff:2000}. The method described in
677: the following differs from \citep{NewmanZiff:2000}, by not only growing
678: clusters, but also removing them. While one of the main advantages of
679: the original Hoshen-Kopelman algorithm is its strong reduction of memory
680: requirements to $\OC(L^{d-1})$, the algorithm described here
681: only makes use of the data representation proposed by Hoshen and
682: Kopelman, so that the memory requirements are still $\OC(L^d)$.
683: 
684: 
In the following, the technique for creating and updating the clusters
is described in detail.
687: 
688: Starting from an empty lattice, the first site becomes occupied by
setting the state variable. Since this site cannot be a member of a larger
690: cluster, its representative is the site itself. Therefore the mass
691: variable must be set to one. The same pattern applies to all other sites
692: which get occupied, as long as they are isolated. The procedure becomes
693: more involved, when a site induces a merging of clusters. This is the
694: case whenever one or more neighbours of the newly occupied site are
695: already occupied. In general the procedure is then as follows:
696: \begin{itemize}
697: \item Find the root of all neighbouring clusters.
698: \item Reject all roots, which appear more than once in
699:       order to avoid double counting.
700: \item Identify the largest neighbouring cluster. 
701: \item Increase the mass variable of the root of this cluster by the mass
702:       of all remaining clusters (ignoring those which have been rejected
703:       above) plus one (for the newly occupied site).
704: \item Bend the representative pointers of the roots of all remaining
705:       clusters to point to the root of the largest cluster (keeps the
706:       tree height small, see below).
\item Bend the representative pointer of the newly occupied site to
708:       point to the root of the largest cluster.
709: \end{itemize}
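
The steps above can be sketched in C as follows. This is only a minimal illustration under stated assumptions, not the implementation used for the results in this paper: the names \texttt{ptr}, \texttt{mass}, \texttt{init\_lattice} and \texttt{occupy} are hypothetical, the lattice is a small square lattice with open boundaries, and pointer and mass are kept in two separate arrays of indices.

```c
#include <assert.h>

#define LX 4                 /* hypothetical linear lattice size */
#define NSITES (LX * LX)
#define EMPTY (-1)           /* marks an unoccupied site */

static int ptr[NSITES];      /* index of the representative, or EMPTY */
static int mass[NSITES];     /* cluster mass, meaningful at root sites only */

static void init_lattice(void) {
    for (int i = 0; i < NSITES; i++) { ptr[i] = EMPTY; mass[i] = 0; }
}

/* Follow representative pointers until a site points to itself. */
static int find_root(int i) {
    while (ptr[i] != i) i = ptr[i];
    return i;
}

/* Occupy the empty site i and merge all neighbouring clusters.
 * Returns the root of the resulting cluster. */
static int occupy(int i) {
    int x = i % LX, y = i / LX;
    int nb[4], nnb = 0;          /* occupied neighbours (open boundaries) */
    int roots[4], nroots = 0;    /* distinct neighbouring roots */
    int largest, total = 1;      /* total mass includes the new site */

    assert(ptr[i] == EMPTY);
    if (x > 0      && ptr[i - 1]  != EMPTY) nb[nnb++] = i - 1;
    if (x < LX - 1 && ptr[i + 1]  != EMPTY) nb[nnb++] = i + 1;
    if (y > 0      && ptr[i - LX] != EMPTY) nb[nnb++] = i - LX;
    if (y < LX - 1 && ptr[i + LX] != EMPTY) nb[nnb++] = i + LX;

    /* Find the roots and reject those which appear more than once. */
    for (int k = 0; k < nnb; k++) {
        int r = find_root(nb[k]), seen = 0;
        for (int j = 0; j < nroots; j++) if (roots[j] == r) seen = 1;
        if (!seen) roots[nroots++] = r;
    }
    if (nroots == 0) {           /* isolated site: it is its own root */
        ptr[i] = i;
        mass[i] = 1;
        return i;
    }
    /* Identify the largest neighbouring cluster and add up all masses. */
    largest = roots[0];
    for (int j = 0; j < nroots; j++) {
        total += mass[roots[j]];
        if (mass[roots[j]] > mass[largest]) largest = roots[j];
    }
    /* Bend the pointers of the remaining roots and of the new site;
     * the largest root is assigned to itself and so stays a root. */
    for (int j = 0; j < nroots; j++) ptr[roots[j]] = largest;
    ptr[i] = largest;
    mass[largest] = total;
    return largest;
}
```

For example, after \texttt{init\_lattice()}, occupying sites $0$, $1$ and $3$ and then site $2$ merges a cluster of mass $2$ with an isolated site, so that the root (site $0$) carries the mass $4$.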
710: 
711: \begin{figure}[th]
712: \begin{center}
713: \input{ffm_lattice_figure_after}
714: \end{center}
\caption{The configuration in \fref{lattice_figure} after
 occupying the highlighted site. Sites whose pointers have been
 changed are shown in dark gray (sites $6$, $7$ and $9$).\flabel{cluster_join}}
718: \end{figure}
719: 
This procedure is depicted in \fref{cluster_join}, illustrating the merging
of the clusters shown in \fref{lattice_figure}. As an optimisation, one
could also bend the pointer of site $6$ to point to site $3$, which
would effectively be a form of path compression. However, as shown
below, the trees generated have only logarithmic height, so that the
path compression possibly costs more CPU time than it saves for system
sizes reachable with current computers\footnote{Similarly for other
forms of path compression, for example bending the pointer of the
preceding to the adjacent site in \texttt{find\_root} (\aref{find_root}).}. It is
important to note that only the root of the largest cluster is not
redirected.
731: 
732: 
733: \begin{figure}[ht]
734: \begin{alltt}
735: \emph{/* Find the root of the cluster identified by start_index. }
736: \emph{ * All sites are expected to have a pointer to their }
737: \emph{ * representative in the array pointer_of. The result}
738: \emph{ * is stored in index. */}
index = start_index;
740: WHILE ( index != pointer_of[index] ) \{
741:   index=pointer_of[index]; \}
742: \end{alltt}
743: \caption{\alabel{find_root} The \texttt{find\_root} algorithm. All sites are
744:  expected to have a pointer to their representative in the array
745:  \texttt{pointer\_of}. The result of this procedure is
746:  \texttt{index}.}
747: \end{figure}
748: 
749: To find the root of a given site, which is necessary, whenever clusters
750: are considered for merging, an algorithm like the one shown in
751: \aref{find_root} needs $\OC(h_m(M(\CC)))$ time (worst case), where
752: $h_m(M(\CC))$ is the maximum height of a cluster containing $M(\CC)$
753: sites, $\CC$ being the cluster under consideration.
754: 
755: All clusters are constructed by merging clusters, which might often involve
756: single sites. These clusters are represented as trees, like the one
757: shown in \Fref{tree_figure}. In the following this representation is
758: used. By construction, if at least two trees join, the resulting tree
759: has either the height of the tree representing the largest cluster or the
760: height of any of the smaller trees plus one --- whatever is
761: larger. Thus, by construction,
762: \begin{equation} \elabel{hmhm}
763:  h_m(M) \ge h_m(M') \text{ for any } M \ge M' \quad ,
764: \end{equation}
765: so in order to find the maximum height of a tree of mass $M$, one has to
766: consider the worst case when the smaller trees have maximum height. For
a given, fixed $M$, this is the case when only two clusters merge, so
768: \begin{equation}
769:  h_m(M) \le \max\Big( \max_{M' \le \lfloor \frac{M}{2} \rfloor} (h_m(M-M')), \max_{M' \le \lfloor \frac{M}{2} \rfloor} ( 1+h_m(M')) \Big) \quad ,
770: \end{equation}
771: where $\lfloor \frac{M}{2} \rfloor$ denotes the integer part of $M/2
\ge 0$, which is the maximum size of the smaller cluster. The outer
773: $\max$ picks the maximum of the two $\max$ running over all allowed sizes
774: of the smaller cluster.
775: Using \eref{hmhm},
776: \begin{equation}
777:  h_m(M) \le \max(h_m(M-1), 1+h_m(\lfloor \frac{M}{2} \rfloor))
778: \end{equation}
779: so that
780: \begin{equation}
781:  h_m(M) \le \left\{ 
782: \begin{array}{lr}
783: 1 + h_m(\lfloor \frac{M}{2} \rfloor) & \text{ for } 1 + h_m(\lfloor \frac{M}{2} \rfloor) \ge h_m(M-1) \\
784: & \\
785: h_m(M-1) & \text{ otherwise }
786: \end{array}
787: \right.
788: \end{equation}
789: With $h_m(1)=1$ one can see immediately that
790: \begin{equation}
791:  h_m(M) \le \lceil \log_2(M) \rceil \quad 
792: \end{equation}
by induction, noting that $\lceil \log_2(M/2) \rceil = \lceil \log_2(M)
794: \rceil -1$, where $\lceil a \rceil \equiv \lfloor a \rfloor + 1$ for any
795: $a\ge 0$. Hence 
796: \begin{equation}
797: \elabel{complexity_find}
798:  h_m(M) \in \OC(\log(M)) \quad ,
799: \end{equation} 
which is therefore the (worst case) complexity of the algorithm. It is
worth noting that the algorithms considered are just one
802: solution of the more general union-find (and also insert) problem
803: \citep{CormenLeisersonRivest:1990}.
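
The logarithmic bound can be checked numerically with a short C sketch. The names below (\texttt{parent}, \texttt{size\_of}, \texttt{join}, \texttt{height\_bound}, \texttt{worst\_case\_height}) are hypothetical and serve only as an illustration. The sketch assumes that the worst case is generated by repeatedly joining trees of equal mass, counts the height as the number of sites on the path to the root (so that a single site has height $1$), and compares it to $\lfloor \log_2 M \rfloor + 1$.

```c
#include <assert.h>

#define MAXN 1024            /* hypothetical maximum number of sites */

static int parent[MAXN];     /* representative of each site */
static int size_of[MAXN];    /* cluster mass, meaningful at roots only */

static void reset(int n) {
    for (int i = 0; i < n; i++) { parent[i] = i; size_of[i] = 1; }
}

static int find_root(int i) {
    while (parent[i] != i) i = parent[i];
    return i;
}

/* Number of sites visited on the way from i to its root, root included. */
static int depth(int i) {
    int d = 1;
    while (parent[i] != i) { i = parent[i]; d++; }
    return d;
}

/* Join the clusters of a and b: the smaller root points to the larger. */
static int join(int a, int b) {
    int ra = find_root(a), rb = find_root(b);
    if (ra == rb) return ra;
    if (size_of[ra] < size_of[rb]) { int t = ra; ra = rb; rb = t; }
    parent[rb] = ra;
    size_of[ra] += size_of[rb];
    return ra;
}

/* floor(log2(m)) + 1 for m >= 1. */
static int height_bound(int m) {
    int h = 1;
    while (m > 1) { m /= 2; h++; }
    return h;
}

/* Join neighbouring blocks of equal mass, doubling the block size each
 * round, and return the largest height found among all n sites. */
static int worst_case_height(int n) {
    int max = 0;
    assert(n >= 1 && n <= MAXN);
    reset(n);
    for (int block = 1; block < n; block *= 2)
        for (int i = 0; i + block < n; i += 2 * block)
            join(i, i + block);
    for (int i = 0; i < n; i++) {
        int d = depth(i);
        if (d > max) max = d;
    }
    return max;
}
```

For $M=8$ sites this construction produces a tree of height $4$, which coincides with $\lfloor \log_2 8 \rfloor + 1$; for masses which are not a power of two the height found stays at or below the bound.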
804: 
805: As the tree constructed is directed, there is no simple way to find all
806: sites which are pointing to a given site. This means that splitting
807: trees is extremely expensive in terms of complexity. However, in the
808: DS-FFM trees do not get removed individually, but always as complete
809: clusters. Thus, no part of the tree structure needs to be updated during
the burning (see section~\ref{sec:burning_procedure}).
811: 
812: \subsubsection{Reducing memory requirements} \label{sec:reducing_memory} 
813: The three variables (state, pointer, size) mentioned above would require
814: a huge amount of memory: At least a bit for the state (but for
815: convenience a byte), a word for the address and a word for the mass
816: (actually depending on the maximum size of the clusters). However, as
817: the pointers are only meaningful if the site is occupied, the
818: representative pointer can also be used to indicate the state of a site:
If it is $0$ (or \verb#NULL# if it is an address), the site is empty;
otherwise it is occupied.
821: 
822: As mentioned above (section \ref{sec:tracking_clusters}), the mass
823: variable is meaningful only at a root site.  Since only a certain range
824: of pointers is meaningful, the remaining range can be used to indicate
the mass of a cluster. Assuming that indices can only be positive,
negative numbers as the value of the pointer can be interpreted as
self references and their modulus as the total mass of the cluster. The
concept is restricted to system sizes which are small enough that the
space not occupied by meaningful pointers is large enough to store the
mass information. How large is the maximum representable system size
(not to be confused with the memory requirement, which is $N$ times the
word size)?  For a word size of $b=4$ bytes, i.e. $M=2^{8b}$
representable values in a word, the maximum system size is $N=2^{31}-1$,
namely $-1 \dots -N$ values for indicating masses, $1 \dots N$ for
indices and $0$ for the empty site, summing up to $2N+1 \le M$, which is
overruled by the memory required $b N \le M$, as $M$ is (usually) the
maximal addressable memory for a single process.
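
The index-based encoding can be illustrated by the following C sketch. It is a hypothetical minimal example, not the code used in this work: the names \texttt{site}, \texttt{occupy\_isolated}, \texttt{cluster\_mass} and \texttt{join} are assumptions, sites are numbered $1 \dots N$, and a signed \texttt{int} plays the role of the word.

```c
#include <assert.h>

#define NSITES 16            /* hypothetical system size N */

/* One word per site, sites are numbered 1..NSITES:
 *    0 : the site is empty
 *   >0 : index of the representative
 *   <0 : the site is a root and -value is the cluster mass */
static int site[NSITES + 1];

static void occupy_isolated(int i) {
    assert(site[i] == 0);
    site[i] = -1;            /* root of a cluster of mass 1 */
}

static int find_root(int i) {
    assert(site[i] != 0);    /* only occupied sites have a root */
    while (site[i] > 0) i = site[i];
    return i;
}

static int cluster_mass(int i) {
    return -site[find_root(i)];
}

/* Attach the root of the smaller cluster to the root of the larger one. */
static int join(int a, int b) {
    int ra = find_root(a), rb = find_root(b);
    if (ra == rb) return ra;
    /* A more negative entry means a larger mass. */
    if (site[ra] > site[rb]) { int t = ra; ra = rb; rb = t; }
    site[ra] += site[rb];    /* masses add up (both entries are negative) */
    site[rb] = ra;           /* rb now points to ra */
    return ra;
}
```

With this layout a single comparison with zero distinguishes empty sites, references and masses, at the price of halving the range of representable indices.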
838:  
839: \begin{figure}[th]
840: \begin{center}
841: \input{ffm_continuous_chunk}
842: \end{center}
843: \caption{The memory layout when using addresses as pointers to
844:  representative. The hatched area is used for valid addresses, what
845:  remains left can be used to represent cluster masses, i.e. if the value
846:  of an address points into the white area, the value is interpreted as a
847:  mass. \flabel{continuous_chunk}}
848: \end{figure}
849: 
850: When using addresses as pointers, it is less obvious how to identify the
851: range of meaningless pointers which could be used to store the mass
852: information.  In order to distinguish quickly whether a given value is
853: an address or a mass, the most obvious way is to use higher bits in the
854: pointers. What is the range of meaningless addresses? The addresses are
855: words, occupying $b N$ bytes. If each byte is individually addressable
(as usual), their values differ by at least $b$, i.e. they span a range of $b
857: N$ different values. As shown in \fref{continuous_chunk}, the
858: largest remaining continuous chunk of values, not used for references to
859: representatives, has therefore at least size $\lceil (M-b N)/2 \rceil = (M-b
860: N)/2$, assuming that the pointer values used, which is also the range of
861: addresses where they are stored, spans a continuous range.
862: If the $N+1$
863: different cluster masses are to be represented as pointer values
864: pointing into the meaningless region, one has $1+N \le (M - b N)/2 $, i.e. $(b+2)N + 2 \le
865: M$. If they do not have to be continuous, the condition is relaxed:
866: $1 + N \le M - bN$. Alternatively one can make use of the lower bits: If
867: the pointers point to words in a continuous chunk of memory or at least
868: are all aligned in the same way, then all pointers are identical
869: $\pmod{b}$, i.e. all pointers $p$ obey $p = c \pmod{b}$ where $0\le c <b$ is a 
870: constant. Since $b>1$ one can use $p \ne c \pmod{b}$ to
871: indicate that a given pointer value is to be interpreted as mass, which
872: can easily be calculated via a bit-shift.
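
The use of the lower bits can be sketched in C as follows. The macros \texttt{IS\_SIZE}, \texttt{MAKE\_SIZE} and \texttt{GET\_SIZE} below are hypothetical definitions chosen for illustration. They assume that all site addresses are at least $2$-byte aligned (i.e. $c=0$ and the lowest bit of a valid address vanishes) and that an address can be converted to \texttt{uintptr\_t} and back without loss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A word is either NULL (empty site), the address of another site
 * (lowest bit 0 by alignment) or a mass, stored shifted to the left by
 * one bit with the lowest bit set. */
#define IS_SIZE(v)   ((((uintptr_t)(v)) & 1u) != 0)
#define MAKE_SIZE(m) ((void *)((((uintptr_t)(m)) << 1) | 1u))
#define GET_SIZE(v)  (((uintptr_t)(v)) >> 1)

/* Follow the addresses until a word is found which holds a mass. */
static void **find_root(void **start) {
    void **root = start;
    void *content = *root;
    assert(content != NULL);         /* start must be occupied */
    while (!IS_SIZE(content)) {
        root = (void **)content;
        content = *root;
    }
    return root;
}
```

A root holding \texttt{MAKE\_SIZE(3)} is thus recognised by its lowest bit, and \texttt{GET\_SIZE} recovers the mass $3$ by a single bit-shift. One bit of the word is lost for the mass, so the largest representable mass is halved.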
873: 
In C it is reasonable to represent the sites as \verb#void *# and
interpret these as pointers to other sites, i.e. \verb#void **#, so that
876: the loop to search for a root just becomes the code shown in
877: \aref{Cfind_root}.
878: 
879: \begin{figure}
880: \begin{alltt}
881: void *start_pointer, *root, *content;
882: \emph{/* start_pointer is the address of the site, the root of which}
883: \emph{ * is to be found. root will always point to the site currently}
884: \emph{ * under consideration, while content is always the address}
885: \emph{ * root is pointing to. }
886: \emph{ * The macro IS_SIZE verifies, whether the value given is a size. */}
887: 
888: 
889: \emph{/* Initialise: Assume that start_pointer is the root and }
890: \emph{ * read its content. */}
891: for (content=*((void **)(root=start_pointer));
892: 
893: \emph{/* Test whether root's content is actually a size. */}
894:     (!IS_SIZE(content));
895: 
896: \emph{/* Iterate: content is not a size, so the next candidate}
897: \emph{ * is what root is currently pointing to. }
898: \emph{ * Content is updated accordingly. */}
899:     content=*((void **)(root=content)));
900: \end{alltt}
901: \caption{\alabel{Cfind_root} An implementation of
902:  \texttt{find\_root} in C using pointers to \texttt{void}.}
903: \end{figure}
904: 
905: \begin{figure}[th]
906: \begin{center}
907: \rotatebox{90}{\input{ffm_subgroupify}}
908: \end{center}
909: \caption{If occupied, each site within a dashed box belongs to the
910:  same cluster. On a triangular lattice the dashed patches would be
911:  triangular, each one containing three sites. The thick dashed line
912:  shows the orientation of the boundary between two consecutive slices
913:  in the parallelised code, see Sec.~\ref{sec:Parallelizing_the_code}.
914: \flabel{subgroupify}}
915: \end{figure}
916: 
917: Representing each site by a word instead of a byte or even a bit
918: \citep{HoneckersFFMCode}, still leads to reasonably small memory
919: requirements for typical system sizes (for instance a system of size
920: $N=4096 \times 4096$ would require $64\MB$). Since the
921: algorithm has an almost random memory access pattern, it is not
922: reasonable to implement it out of core \citep{DowdSeverance:1998}. In
923: order to simulate even larger sizes, the following representation has
924: been implemented: At the beginning of the simulation the entire lattice
925: is splitted in cells so that whatever site in such a cell is occupied,
926: it must belong to the same cluster as any other occupied site in the
927: same cell, i.e. each site in the cell is
928: nearest neighbor of all other sites in the cell. On an hyper-cubic lattice these cells
929: have size $2$, as depicted in \fref{subgroupify}: Each site
930: within such a cell must belong to the same cluster if it is
931: occupied. Therefore only one pointer is necessary to refer to its
932: representative. On a triangular lattice these cells would have size
933: $3$. Since a pointer can be non-null, although not all sites in the cell
934: are occupied, a new variable must represent the state of the sites in
each cell, unless lower or higher order bits of the pointers can be used
936: (see above). On the hyper-cubic lattice the memory requirement is therefore
937: for each pair of sites $2$ bit for the state and $1$ word for the
938: address or index of the representative. Storing the $2$ bits in a byte
939: (and keeping the remaining $6$ bits unused), the memory requirements are
940: therefore reduced to $(b + 1)N/2$ bytes. Using indices the maximum
941: representable system size is given by $3/2 N +1 \le M$ and using
942: pointers with a size identification as shown in \Fref{continuous_chunk} the
constraint is $1+N \le (M-\frac{bN}{2})/2 $ in the worst case.
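
As a worked example, assuming a word size of $b=4$ bytes and the system of $N=4096 \times 4096 = 2^{24}$ sites mentioned above, the two representations compare as
\begin{equation}
 b N = 4 \times 2^{24}\ \text{bytes} = 64\MB
 \quad \text{and} \quad
 \frac{(b+1) N}{2} = \frac{5}{2} \times 2^{24}\ \text{bytes} = 40\MB \quad ,
\end{equation}
i.e. under these assumptions the cell representation saves $3/8$ of the memory.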
944: 
945: \subsubsection{Efficient histogram superposition}
946: \label{sec:efficient_histogram_superposition} So far, only the
947: maintenance of the cluster structure has been described. Since the
948: masses of all clusters involved are known, it is simple to maintain a
949: histogram of the cluster mass distribution: If a cluster of size $s$ is
950: burnt, the corresponding entry in $\PCA_t(s)$ is decreased by one. If a
951: cluster changes size, $\PCA_t(s)$ is updated accordingly. For example,
952: when two clusters of size $s_1$ and $s_2$ merge as a particular site is
953: newly occupied during the growing procedure, $\PCA_t(s_1)$ and
954: $\PCA_t(s_2)$ are decreased by one and $\PCA_t(s_1+s_2+1)$ is increased
955: by one. 
956: 
957: Na\"{\i}vely, the average cluster size distribution is the average of
958: $\PCA_t(s)$, i.e.
959: \begin{equation}
  \frac{1}{T}\sum_{t=1}^{T}  \PCA_{t}(s)
961: \end{equation} 
962: with $T$ as the number of iterations. Depending on the resolution of the
963: histogram, it would be very time consuming to calculate this sum for
964: each $s$. Using exponential binning (which is in fact a form of hashing)
965: in order to reduce the size of the histogram solves the problem only
966: partly.
967: 
968: Ignoring any hashing, a na\"{\i}ve superposition, where each slot in
969: the histogram needs to be read, has complexity $\OC(T H)$, where $H$ is
970: the largest cluster mass in the histogram.
971: 
This problem is solved by noting that early changes in the histogram
propagate through the entire sequence of histograms. Denoting the initial
histogram as $\PCA_0(s)$ and $\Delta \PCA_t(s) = \PCA_{t}(s) - \PCA_{t-1}(s)$,
one has
976: \begin{equation}
977:  \PCA_t(s) = \PCA_0(s) + \sum_{t'=1}^t \Delta \PCA_{t'}(s)
978: \end{equation}
979: and therefore
980: \begin{equation} \elabel{histo_superposition}
981:  \sum_{t=1}^T \PCA_t(s) = T \PCA_0(s) + \sum_{t'=1}^T (T-t'+1) \Delta \PCA_{t'}(s) \quad .
982: \end{equation}
983: By using this identity only the right hand side of
984: \eref{histo_superposition} is maintained by increasing it by $T-t+1$
985: when a new cluster is created at time $t$ and by decreasing it by the
986: same amount when it is destroyed. In this way, the complexity is only of
987: order $\OC(T (\INFL +1))$, according to the number of clusters created
988: and destroyed, i.e. the number of changes in the distribution. This
989: concept becomes only problematic, if floating point numbers are used to
990: store the histogram and the accuracy is so small that changes in the sum
991: by $1$ do not change the result anymore.\footnote{For integers the
992: precision is not a problem, but the maximum representable number easily
993: becomes a problem.} The maximum value in $\PCA_t(s)$, where this does
994: not happen, is given by the largest $m$ with $m + 1 \ne m$ where $m$ is
a variable of the same type as $\PCA_t(s)$. For floating point numbers,
the value of $m$ is related to the constant
\verb#DBL_EPSILON# (or \verb#FLT_EPSILON# for single precision), which
essentially characterises the length of the mantissa. The concrete value
of $m$ is actually platform, precision and type dependent. For an
unsigned integer of size $4$ bytes, this value would be $(2^{32}-1)-1$,
corresponding to $\verb#ULONG_MAX# - 1$; for double precision IEEE~754
floating point numbers this value is $\verb#FLT_RADIX**DBL_MANT_DIG#
-1$, i.e. $2^{53}-1$.
1004: 
1005: Provided that the right hand side of \eref{histo_superposition} is below
1006: the threshold $m$ discussed above for all $s$, this means that only a single
1007: histogram needs to be maintained. It is initialised with $T \PCA_0(s)$
1008: and updated with $\pm(T-t+1)$ at time step $t$, when a cluster of size
1009: $s$ appears or disappears. It is worth mentioning that this concept
1010: obviously even works in conjunction with binning (or any other hashing).
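
The weighted update can be checked against the naive superposition with a small C sketch. Everything in it is hypothetical and only meant as an illustration: the names \texttt{weighted}, \texttt{current}, \texttt{naive}, \texttt{change} and \texttt{run\_example}, the number of time steps and the made-up history of cluster events. The initial histogram is assumed to vanish (empty lattice), so that the term $T \PCA_0(s)$ drops out.

```c
#include <assert.h>

#define T_TOTAL 5      /* hypothetical number of time steps T */
#define S_MAX   8      /* hypothetical largest cluster mass */

static long weighted[S_MAX + 1];  /* right hand side of the identity */
static long current[S_MAX + 1];   /* histogram at the current time step */
static long naive[S_MAX + 1];     /* naive sum over all time steps */

/* A cluster of mass s appears (delta = +1) or disappears (delta = -1)
 * at time step t. */
static void change(int t, int s, int delta) {
    assert(1 <= t && t <= T_TOTAL && 1 <= s && s <= S_MAX);
    weighted[s] += delta * (long)(T_TOTAL - t + 1);
    current[s]  += delta;
}

/* Naive superposition: every slot is read after each time step. */
static void end_of_step(void) {
    for (int s = 1; s <= S_MAX; s++) naive[s] += current[s];
}

/* Replay a short made-up history; return 1 if both sums agree. */
static int run_example(void) {
    for (int s = 0; s <= S_MAX; s++)
        weighted[s] = current[s] = naive[s] = 0;
    /* t = 1: two isolated sites appear */
    change(1, 1, +1); change(1, 1, +1); end_of_step();
    /* t = 2: a new site joins them, 1 + 1 + 1 = 3 */
    change(2, 1, -1); change(2, 1, -1); change(2, 3, +1); end_of_step();
    /* t = 3: nothing changes */
    end_of_step();
    /* t = 4: the cluster of mass 3 is burnt */
    change(4, 3, -1); end_of_step();
    /* t = 5: a new isolated site appears */
    change(5, 1, +1); end_of_step();
    for (int s = 1; s <= S_MAX; s++)
        if (weighted[s] != naive[s]) return 0;
    return 1;
}
```

In this example the entry for $s=1$ accumulates $5+5-4-4+1=3$ and the entry for $s=3$ accumulates $4-2=2$, in agreement with the naive sums, while only the slots actually changed are touched.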
1011: 
1012: \subsubsection{Implementation of the burning procedure} \label{sec:burning_procedure}
1013: The burning procedure was implemented in the obvious way, without making
1014: use of the tree structure, as shown in \aref{burning}. Although
1015: the burning procedure could also be implemented explicitly recursively,
1016: it is of course significantly faster when implemented iteratively. The
1017: usage of a stack in the procedure might be thought of as reminiscent of the
1018: underlying recursive structure.
1019: \begin{figure}[ht]
1020: \begin{alltt}
1021: \emph{/* Initialise current_stack. */}
1022: CLEAR current_stack;
1023: \emph{/* Put initial site on current_stack. */}
1024: PUT rn ON current_stack;
1025: \emph{/* Sites are cleared right after they have entered the current_stack. */}
1026: rn = empty;
1027: \emph{/* The first loop runs until there is nothing left to}
1028: \emph{ * burn, i.e. next_stack was not filled during the inner loop. */}
1029: DO \{
1030: \emph{   /*  Clear next_stack so that it can get filled in the next loop. */}
1031:    CLEAR next_stack;
1032: \emph{   /* The next loop runs as long as there are sites left to burn}
1033: \emph{    * in the current generation of the fire. */}
1034:    WHILE current_stack not empty \{
1035: \emph{      /* GET: remove the upmost element from current_stack and}
1036: \emph{       * put it in x */}
1037:       GET x FROM current_stack;
1038: \emph{      /* Visit all neighbours */}
1039:       FOR all neighbours n of x \{
1040:          if (n occupied) \{
\emph{            /* Put occupied sites on the stack of the next}
\emph{             * generation of the fire */}
            PUT n ON next_stack;
            n = empty;
1045:          \}
1046:       \}
1047:    \}
1048: \emph{/*  The next current_stack to be considered is next_stack. */}
1049:    current_stack = next_stack;
1050: \} WHILE current_stack is not empty
1051: \end{alltt}
1052: \caption{\alabel{burning} The burning procedure starting at
1053:  \texttt{rn}. In an actual implementation the copying of
1054:  \texttt{next\_stack} to \texttt{current\_stack} can easily be omitted by
1055:  repeating the code above with \texttt{current\_stack} and \texttt{next\_stack}
1056:  interchanged, similar to a red-black approach
1057:  \citep{DowdSeverance:1998}.}
1058: \end{figure}
1059: The number of times the outer loop in \aref{burning} runs,
1060: defines the generation of the fire front and gives $\manh$; other
1061: properties of the burnt cluster can be extracted accordingly.  The most
important resources required by this procedure are the stacks: One for
1063: the currently burning sites and one for the sites to be burnt in the
1064: next step. There is no upper limit known for the number of
1065: simultaneously burning sites, apart from the naive $N/2$ on a hyper-cubic
1066: lattice, which comes from the observation that sites, which belong to
1067: the same generation of the fire, must reside on the same sub-lattice
1068: (even or odd).
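
A C version of the burning procedure might look as follows. This is a sketch under stated assumptions rather than the code used for the simulations: the names \texttt{occupied}, \texttt{stack\_a}, \texttt{stack\_b}, \texttt{fill\_lattice} and \texttt{burn} are hypothetical, the lattice is a small square lattice with open boundaries, and both stacks are given the safe size $N$.

```c
#include <assert.h>

#define LX 5                        /* hypothetical linear lattice size */
#define NSITES (LX * LX)

static int occupied[NSITES];        /* 1 = tree, 0 = empty */
static int stack_a[NSITES], stack_b[NSITES];

static void fill_lattice(void) {
    for (int i = 0; i < NSITES; i++) occupied[i] = 1;
}

/* Burn the cluster containing the site rn. The number of burnt sites is
 * stored in *burnt, the largest fire front in *max_front. The return
 * value is the number of generations of the fire. */
static int burn(int rn, int *burnt, int *max_front) {
    int *cur = stack_a, *next = stack_b;
    int ncur = 0, nnext, generations = 0;

    assert(occupied[rn]);
    *burnt = 0;
    *max_front = 0;
    cur[ncur++] = rn;
    occupied[rn] = 0;               /* cleared as soon as it is stacked */
    do {
        nnext = 0;
        generations++;
        if (ncur > *max_front) *max_front = ncur;
        while (ncur > 0) {
            int x = cur[--ncur];
            int nb[4], nnb = 0;
            (*burnt)++;
            if (x % LX > 0)      nb[nnb++] = x - 1;
            if (x % LX < LX - 1) nb[nnb++] = x + 1;
            if (x / LX > 0)      nb[nnb++] = x - LX;
            if (x / LX < LX - 1) nb[nnb++] = x + LX;
            for (int k = 0; k < nnb; k++) {
                if (occupied[nb[k]]) {
                    next[nnext++] = nb[k];
                    occupied[nb[k]] = 0;
                }
            }
        }
        /* Swap the roles of the two stacks instead of copying. */
        { int *t = cur; cur = next; next = t; }
        ncur = nnext;
    } while (ncur > 0);
    return generations;
}
```

Starting the fire at the centre of a completely occupied $5 \times 5$ lattice, the fire fronts contain $1$, $4$, $8$, $8$ and $4$ sites, so this sketch reports five generations, $25$ burnt sites and a largest fire front of $8$.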
1069: 
1070: On the other hand it is also trivial to find the maximal number of sites
1071: which burn at the same time, if the fire starts in a completely dense
1072: forest, i.e. in a lattice with $\rho=1$. Obviously the size of the $t$th
1073: generation is then given by $4(t-1)$ for $t>1$ and $1$ at the beginning,
1074: $t=1$. Since the sum of these numbers is the number of sites which is
1075: reachable within a certain time $t$, \emph{the sum} is also an upper
1076: limit for the number of simultaneously burning sites. Indeed, the actual number
can easily be larger than $4(t-1)$, caused by arrangements of holes
1078: in the lattice, which delay the fire spreading at certain sites, so that
1079: they burn together with a larger fire front. Such a construction is shown
1080: in \fref{delayed_burning}.
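
Explicitly, for the dense forest on a square lattice without boundary effects, summing the generation sizes given above yields
\begin{equation}
 1 + \sum_{t'=2}^{t} 4(t'-1) = 1 + 2 t (t-1) \quad ,
\end{equation}
so that, for instance, at most $1+4+8=13$ sites can have been reached up to generation $t=3$.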
1081: 
1082: \begin{figure}[th]
1083: \begin{center}
1084: \input{ffm_delayed_burning}
1085: \end{center}
1086: \caption{The burning order for a $6\times 6$ patch of sites, where seven
1087:  sites are not occupied and form a barrier, such that some sites behind
1088:  it burn later, together with the fire front propagating away from the
1089:  starting point of the fire at the lower left hand corner. The sites
1090:  belonging to the largest set of trees burning at the same time are
1091:  shown in light gray, unoccupied sites are shown in white, occupied
1092:  sites in black. The numbers indicate the generation of the fire, which
1093:  is one plus the Manhattan distance from the starting point of the fire along
1094:  occupied sites.\flabel{delayed_burning}}
1095: \end{figure}
1096: 
1097: Of course it is neither reasonable, nor practically possible to provide
1098: enough memory for the theoretical worst case, i.e. two stacks each of
1099: size $N/2$. Indeed the typical memory requirements seem to be of order
1100: $\OC(\sqrt{\INFL})$, as shown in Table~\ref{tab:performance_data}, where
1101: $\maxff$ denotes the largest fire front observed during the simulation.
1102: Providing stacks only of size $4 L$ turned out to be a failsafe, yet
1103: pragmatic solution. Formally one could implement a slow out-of-core
algorithm for the rare yet possible case that the memory for the stack is insufficient, i.e. use
1105: hard-disk space to maintain it. In fact, this is what \emph{de facto}
1106: happens if one uses a stack of size $N/2$ on a virtual memory system.
1107: 
1108: \begin{table}
1109: \caption{\label{tab:performance_data}
1110: Performance data for different parameters and setups.
1111: ``ap3000,2'' denotes a parallel run on two nodes on an AP3000,
1112: accordingly ``ap3000,4''. ``cluster,10'' denotes a cluster of 25 Intel
1113: machines, connected via an old 10 MBit network, ``cluster,100'' denotes
1114: the same cluster on a 100 MBit network. ``single1'' and ``single2''
1115: denote two different types of single nodes.
1116: The largest fire front, $\maxff$, was only measured on these
1117: systems. The quantity $\cpuratio$ is the ratio of the average time
1118: (real time on the parallel systems in order to include communication overhead, user time on single nodes) 
1119: for one successful update during
1120: statistics, i.e. when all data structures need to be maintained, and
1121:  equilibration (transient) i.e. when the standard representation is used.}
1122: \newcolumntype{d}[1]{D{.}{.}{#1}}
1123: \begin{tabular}{l|r|r|d{2}|d{3}}
System & $L$ & $\INFL$ & \cpuratio & \maxff \\
1125: \hline
1126: ap3000,2    & 8000   & 4000       & 1.51         & \\
1127: ap3000,2    & 8000   & 8000       & 1.52         & \\
1128: \hline
1129: ap3000,4    & 16000  & 4000       & 1.34         & \\
1130: ap3000,4    & 16000  & 8000       & 1.48         & \\
1131: ap3000,4    & 16000  & 16000      & 1.37         & \\
1132: ap3000,4    & 16000  & 32000      & 1.41         & \\
1133: \hline
1134: cluster,10  & 32000  & 4000       & 2.71         & \\
1135: cluster,10  & 32000  & 64000      & 3.81         & \\
1136: \hline
1137: cluster,100 & 32000  & 32000      & 1.76         & \\
1138: \hline
1139: single1     & 1000   & 500        & 1.41         & 216 \\
1140: single1     & 2000   & 1000       & 1.41         & 326 \\
1141: single1     & 4000   & 125        & 1.42         & 106 \\
1142: single1     & 4000   & 250        & 1.47         & 172 \\
1143: single1     & 4000   & 500        & 1.48         & 255 \\
1144: single1     & 4000   & 1000       & 1.53         & 317 \\
1145: single1     & 4000   & 2000       & 1.50         & 518 \\
1146: single1     & 4000   & 4000       & 1.57         & 646 \\
1147: single1     & 4000   & 8000       & 1.48         & 907 \\
1148: single1     & 4000   & 16000      & 1.45         & 1327 \\
1149: \hline
1150: single2     & 8000   & 4000       & 2.11         & 687 \\
1151: single2     & 8000   & 8000       & 2.11         & 912 \\
1152: single2     & 8000   & 16000      & 2.09         & 1415 \\
1153: \end{tabular}
1154: \end{table}
1155: 
1156: \subsubsection{Complexity of the algorithm}
The overall complexity of the algorithm has two contributions: the
``growing'' part, where new clusters are generated from existing ones,
and the ``burning'' part. The time needed for the burning part is
proportional to the number of sites burnt and is therefore expected to be
$\OC(\LS)$ (see \eref{ave_s} and \eref{aves_averho}), and $\OC(N)$ in the worst case.
Since $\aves{\rho}$ in \eref{aves_averho} is bounded, the expected complexity
of ``burning'' is $\OC(\INFL)$. The complexity of
``growing'' is estimated by the average number of newly occupied sites,
$\INFL$, times the worst-case complexity of finding the root of any given
site, because up to four roots need to be found each time a tree is
grown. According to \eref{complexity_find} this worst-case complexity is
$\OC(\log(N))$, leading to an overall complexity for ``growing'' of
$\OC(\log(N) \INFL) \supset \OC(\INFL)$. In practice the logarithmic correction
is negligible, especially since $\log(N)$ is an extreme overestimate of
the average case, and therefore essentially the same runtime behaviour is
expected for both procedures \citep{NewmanZiff:2001}.
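
As a rough illustration of how small the logarithmic factor is (a sketch
only, assuming a two-dimensional lattice with $N=L^2$ and a logarithm to
base $2$, neither of which is fixed by the bound above), consider a
lattice of linear size $L=32000$:
\begin{equation}
 \log_2(N) = 2 \log_2(32000) \approx 29.9 \quad .
\end{equation}
Up to the constant hidden in the $\OC$ notation, the logarithm therefore
contributes at most a factor of about $30$ to the worst-case cost of a
single root search, independent of $\INFL$.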
1174: 
1175: Implementations like the one in \citep{HoneckersFFMCode} avoid this logarithmic
1176: factor by counting only the burnt cluster and therefore arrive at an
1177: overall complexity of $\OC(\INFL)$.
1178: 
The algorithm presented therefore has only a negligibly higher
computational complexity than implementations which measure only
$\PCB$. This is corroborated by comparing the CPU time per
burnt cluster during equilibration (the transient), when the cluster
structure does not need to be maintained and the algorithm used is the
standard implementation, to the CPU time per burnt cluster during
statistics, i.e. when observables are actually measured and, in particular,
$\PCA$ is produced. This ratio is shown as $\cpuratio$ in
Tab.~\ref{tab:performance_data} and Tab.~\ref{tab:corrtimes}. It varies only
slightly with $L$ or $\INFL$.
1189: 
The algorithm presented evidently offers more statistics, but it
suffers from one limitation: it requires about $(b + 1)N/2$ bytes of memory
(see section \ref{sec:reducing_memory}), compared to $N/8$ bytes in
bitwise implementations like \citep{HoneckersFFMCode}, i.e. typically a
factor $20$ more. In order to ascertain whether this disadvantage is
acceptable with respect to the statistical gain, one has to determine
the standard deviations of the calculated quantities for both
implementations.
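
The quoted factor can be made explicit. As a sketch, assuming that $b$
denotes the number of bytes per stored word and taking $b=4$ as an
illustrative value (an assumption, not a value stated here), the ratio of
the two memory requirements is
\begin{equation}
 \frac{(b+1)N/2}{N/8} = 4(b+1) \stackrel{b=4}{=} 20 \quad ,
\end{equation}
which reproduces the typical factor mentioned above.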
1198: 
1199: \subsection{Calculating the standard deviation} \label{sec:std_details}
In order to compare the two algorithms rigorously, it is necessary to
estimate the standard deviation of the estimators for $\dns$ produced by
them \citep{MuellerBinder:73,LandauBinder:2000}:
1203: \begin{equation} \elabel{stds}
1204: \begin{array}{rcl}
1205:  \sigma^2_{\PCB}(s) & = & \frac{2\tau_{\PCB} + 1}{T-1} 
1206:   \Big(\big\langle \PCB_t(s)^2 \big\rangle - \big\langle \PCB_t(s) \big\rangle^2\Big) \\
1207: &&\\
1208:  \sigma^2_{\PCA}(s) & = & \frac{2\tau_{\PCA} + 1}{T-1}  
1209:   \Big(\big\langle \PCA_t(s)^2 \big\rangle - \big\langle \PCA_t(s) \big\rangle^2\Big)
1210: \end{array}
1211: \end{equation}
Here $\tau_{\PCB}$ and $\tau_{\PCA}$ are the correlation times of the
two quantities. Calculating the correlation time in the standard fashion,
by recording the history $\PCA_t(s)$ and $\PCB_t(s)$ for each $s$, would
require storing millions of floating point numbers. Therefore
these calculations were restricted to a small yet representative
set of $s$ values. The results show that the standard deviation does not
fluctuate strongly in $s$.
1219: 
Because of the special form of $\PCB_t(s) \in \{0,1\}$, its variance is
particularly simple,
1222: \begin{equation}
1223:  \big\langle \PCB_t(s)^2 \big\rangle = \big\langle \PCB_t(s) \big\rangle
1224: \end{equation}
1225: so that
1226: \begin{equation}
1227:  \sigma^2_{\PCB}(s) = \frac{2\tau_{\PCB} + 1}{T-1} \big\langle \PCB_t(s) \big\rangle \Big(1-\big\langle \PCB_t(s) \big\rangle\Big) \quad .
1228: \end{equation}
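
Dividing by the squared mean gives the squared relative error. Writing
$p \equiv \langle \PCB_t(s) \rangle$ as a shorthand (a symbol introduced
here only for illustration), one finds
\begin{equation}
 \frac{\sigma^2_{\PCB}(s)}{p^2} = \frac{2\tau_{\PCB} + 1}{T-1} \, \frac{1-p}{p} \quad ,
\end{equation}
so that for a strongly diluted signal, $p \ll 1$, the relative error is
approximately $\sqrt{(2\tau_{\PCB}+1)/((T-1)p)}$, i.e. it decreases like
the inverse square root of the expected number of events $(T-1)p$.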
1229: 
The correlation time of $\PCB_t(s)$ is expected to be extremely small,
not only on physical grounds (a cluster can only burn down once)
but also because of the extreme dilution of $\PCB_t(s)$, as
described in section~\ref{sec:clusterdistribution}. For fixed $s$, most of the
$\PCB_t(s)$ are $0$. In contrast, the $\PCA_t(s)$ are expected to have a
large correlation time, because ``only'' $\INFL+1$ entries are changed
between two subsequent histograms.
1237: 
The correlation function is calculated in the symmetric way proposed in
\citep{Anderson:71}, here for an arbitrary quantity $A_t$:
1240: \begin{equation}
1241:  \phi_{t'}^{AA} = 
1242: \frac{\langle A_t A_{t+t'} \rangle_{T-t'} - 
1243: \langle A_t \rangle_{T-t'} \langle A_{t+t'} \rangle_{T-t'}}
1244: {\langle A_t^2 \rangle_{T} -
1245: \langle A_t \rangle_{T}^2}
1246: \end{equation}
1247: where $\ave{}_{T-t'}$ denotes the average taken over time $t$ from $t=1$
to $t=T-t'$. The quantity $\phi_{t'}^{AA}$ was fitted to
$\exp(-t'/\tau_A)$ in order to find the correlation time $\tau_A$. The
1250: results are given in Table~\ref{tab:corrtimes}.
1251: 
1252: \begin{table}
1253: \caption{\label{tab:corrtimes}
1254: Correlation times $\tau_b$ and $\tau_a$ of the corresponding observables
1255:  $\PCB$ and $\PCA$ as a function of $s$ and for different parameters
1256:  $L$, $\INFL$. Values of $s$ marked by ``B'' are results for bins around
1257:  the $s$ value indicated.  
1258: For each set of parameters, the quantity $\cpuratio$ is
1259:  given. It denotes the ratio between the average CPU-time
1260:  for one successful update during equilibration (transient) and during
1261:  statistics, see also Tab.~\ref{tab:performance_data}. The two fractions
1262: $\frac{\sqrt{\sigma^2_{\PCB}(s)}}{\langle \PCB_t(s) \rangle}$, 
1263: $\frac{\sqrt{\sigma^2_{\PCA}(s)}}{\langle \PCA_t(s) \rangle}$, 
1264: their
1265:  ratio $\alpharatio$
1266: and
1267:  $\alpharatio^2/\cpuratio$ are derived.
$\ast$ marks cases where $\tau_b(s)=0$ has been assumed. $\est$ marks
 values of $\tau_a(s)$ which have been extrapolated from $\tau_a(s)$
 for smaller $s$. 
1271: }
1272: \newcolumntype{d}[1]{D{.}{.}{#1}}
1273: 
1274: \begin{tabular}{r|r|d{2}|r|d{3}|d{2}|d{5}|d{5}|d{1} | d{1}}
1275: $L$ & 
1276: $\INFL$ & 
1277: \cpuratio & 
1278: $s$ & 
1279: \multicolumn{1}{c|}{$\tau_b(s)$} & 
1280: \multicolumn{1}{c|}{$\tau_a(s)$} & 
1281: \multicolumn{1}{c|}{$\frac{\sqrt{\sigma^2_{\PCB}(s)}}{\langle \PCB_t(s) \rangle}$} &
1282: \multicolumn{1}{c|}{$\frac{\sqrt{\sigma^2_{\PCA}(s)}}{\langle \PCA_t(s) \rangle}$} &
1283: \multicolumn{1}{c|}{$\alpharatio$} &
1284: \multicolumn{1}{c}{$\alpharatio^2/\cpuratio$} \\
1285: \hline
1286: 4000 & 4000  & 1.57 & $10$     &  \noth  &  \noth &   0.0138\ast &  \noth    &  \noth &  \noth  \\
1287: &&&                   $100$    &  0.170  &   23.6 &   0.0637     & 0.00099   &  64.3  &  2633.4 \\
1288: &&&                  B $10^3$  &  0.028  &   14.2 &   0.0450     & 0.00191   &  23.6  &   354.8 \\
1289: &&&                  B $10^4$  &  0.006  &   10.0 &   0.0412     & 0.00470   &   8.8  &    49.3 \\
1290: &&&                  B $10^5$  &  \noth  &    7.2 &   0.0662\ast & 0.02104   &   3.1  &     6.1 \\
1291: \hline                                                    
1292: 4000 & 16000 & 1.45 & $10$     &  0.013  &   39.9 &   0.0141     & 0.00056   &  25.4  &  444.9 \\
1293: &&&                   $100$    &  0.126  &   28.8 &   0.0608     & 0.00127   &  48.0  & 1589.0 \\
1294: &&&                  B $10^3$  &  0.006  &    4.7 &   0.0457     & 0.00175   &  26.1  &  469.0 \\
1295: &&&                  B $10^4$  &  0.013  &    2.9 &   0.0512     & 0.00332   &  15.4  &  163.6 \\
1296: &&&                  B $10^5$  &  \noth  &    2.2 &   0.0433     & 0.00795   &   5.4  &   20.1 \\
1297: \hline                                                    
1298: 8000 & 1000  & \noth& $10$     &  0.131  &   \noth&   0.0154     &  \noth    &  \noth & \noth \\
1299: &&&                   $100$    &  0.122  &  284.6 &   0.0602     & 0.00158   &  38.1  & \noth \\
1300: &&&                  B $10^3$  &  0.028  &  236.5 &   0.0399     & 0.00337   &  11.8  & \noth \\
1301: &&&                  B $10^4$  &  0.016  &  163.5 &   0.0397     & 0.00878   &   4.5  & \noth \\
1302: \hline                                                    
1303: 8000 & 4000  & 2.11 & $10$     &  0.122  &   78.2 &   0.0154     & 0.00052   &  29.8  &  420.9 \\
1304: &&&                   $100$    &  0.132  &   16.4 &   0.0634     & 0.00087   &  72.9  & 2518.7 \\
1305: &&&                  B $10^3$  &  0.022  &    8.2 &   0.0438     & 0.00147   &  29.7  &  418.1 \\
1306: &&&                  B $10^4$  &  0.005  &    5.5 &   0.0442     & 0.00241   &  18.3  &  158.7 \\
1307: &&&                  B $10^5$  &  \noth  &    4.2 &   0.0409\ast & 0.01006   &   4.1  &    8.0 \\
1308: &&&           B $2\cdot 10^5$  &  \noth  &    3.8\est &  0.0635\ast & 0.02055   &   3.1 &  4.6 \\
1309: \hline                                                    
1310: 8000 & 16000 & 2.09 & $10$     &  \noth  &  262.5 &   0.0139\ast & 0.00068  &  20.5   &  201.1 \\
1311: &&&                   $100$    &  0.131  &   56.1 &   0.0629     & 0.00087  &  72.0   & 2480.4 \\
1312: &&&                  B $10^3$  &  0.014  &   19.0 &   0.0467     & 0.00115  &  40.6   &  788.7 \\
1313: &&&                  B $10^4$  &  0.009  &   11.1 &   0.0503     & 0.00296  &  17.0   &  138.3 \\
1314: &&&                  B $10^5$  &  0.006  &    8.3 &   0.0411     & 0.00689  &   6.0   &   17.2 \\
1315: &&&           B $2\cdot 10^5$  &  \noth  &    7.5 &   0.0423\ast & 0.00947  &   4.5   &    9.7 \\
1316: &&&           B $5\cdot 10^5$  &  \noth  &    7.0\est&   1.1106\ast & 0.33331  &   3.3 &   5.2 
1317: \end{tabular}
1318: \end{table}
1319: 
1320: 
As described in \eref{converge_pcb} and \eref{converge_pca}, the
two estimators for $\dns$ differ slightly. However, apart from $\dns$ only constant
values appear on the RHS of \eref{converge_pcb} and
\eref{converge_pca}, so that the relative errors of $\langle
\PCB_t(s) \rangle_T$ and $\langle \PCA_t(s) \rangle_T$ are also the
relative errors of the estimators for $\dns$ derived from them. These
relative errors are shown in Tab.~\ref{tab:corrtimes} as well. Their ratio is
given as $\alpharatio$ and is an indicator of the advantage of the
algorithm proposed. Improving the relative error by a factor
$q$ requires a factor $q^2$ more CPU-time. Hence, if the algorithm proposed
in this paper costs a factor $\cpuratio$ more CPU-time and gains a factor
$\alpharatio$ in the relative error, the total gain is
$\alpharatio^2/\cpuratio$. The values for this quantity are also given
in table~\ref{tab:corrtimes}.
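
As a worked example of how the last two columns follow from the
others, take the entry $L=4000$, $\INFL=4000$, $s=100$ of
Tab.~\ref{tab:corrtimes}, for which the relative errors are $0.0637$ and
$0.00099$ and $\cpuratio=1.57$:
\begin{equation}
 \alpharatio = \frac{0.0637}{0.00099} \approx 64.3 \quad , \qquad
 \frac{\alpharatio^2}{\cpuratio} \approx \frac{64.3^2}{1.57} \approx 2633 \quad ,
\end{equation}
in agreement with the tabulated values.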
1335: 
According to the table, for fixed $\theta$ the relative errors and the
correlation times are only weakly affected by an increase in system size. At
first sight this is counter-intuitive, as the number of passes
\citep{Henley:1993,HoneckerPeschel:1997}, i.e. the mean number of times a site
has been visited between two lightnings, is inversely proportional
to the total number of sites in the system: $1/(\theta \rhobar L^2
)$, see Section~\ref{sec:tree_density}. If this number were
essentially responsible for the error, one would have to keep the number of
passes constant among different $L$. However, this is apparently not the
case, possibly because of self-averaging effects \citep{FeLaBi:91}.
1346: 
1347: The table also shows various tendencies, which are worth
1348: mentioning. First of all, the total gain becomes smaller for larger
1349: avalanche size $s$. The B in front of some of the values indicates that
1350: a bin around the $s$ value was investigated, 
1351: i.e. the time series of
1352: \begin{equation}
1353: \sum_{s' \in \BC} \PCAB(s')
1354: \end{equation}
was considered, where $\BC$ is a set of (consecutive) $s$ values representing the
bin. For larger values of $s$ these sets become exponentially larger,
which is necessary to obtain a reasonably large number of events as a basis for
the estimators. The general tendency that the proposed algorithm is even
more efficient at small $s$ is not surprising: $\PCB$ samples from $s
\dns$, while $\PCA$ samples only from $\dns$, i.e. $\PCB$ ``sees''
larger clusters more often. \emph{Nevertheless $\PCA$ is still
advantageous by roughly a factor $5$.} The empty entries in
Tab.~\ref{tab:corrtimes} are due to numerical inaccuracies or simply
missing simulations for certain parameters. Some entries are estimated
and marked as such.
1366: 
1367: There is an additional correlation not mentioned so far: The individual
1368: points in the estimator of the distribution $\PCA$ are not
1369: independent. There are ``horizontal correlations'', i.e. $\PCA(s)$ is
1370: correlated for different values of $s$. These are additional
1371: correlations due to clusters of small sizes, which are likely to grow
1372: and propagate through $s$ in $\PCA_t(s)$ for consecutive time
1373: steps, i.e.
1374: \begin{equation}
1375: \ave{\PCA_t(s) \PCA_{t'}(s')} - \ave{\PCA_t(s)} \ave{\PCA_{t'}(s')}  \quad .
1376: \end{equation}
1377: This correlation is at least partly captured by the correlations
1378: measured for the binned data. It is to be distinguished from the
1379: correlations of \emph{independent} realisations, where correlations are
1380: expected in the cluster size distribution also, i.e.
1381: \begin{equation}
1382:  \ave{\PCA_t(s) \PCA_t(s')} - \ave{\PCA_t(s)} \ave{\PCA_t(s')} \quad .
1383: \end{equation}
1384: 
This must be taken into account as soon as estimates of $\dns$ for
different $s$ are compared, as is done when an exponent is calculated
by fitting. The effect is also present for $\PCB$, which is, however,
diluted so strongly that it influences the outcome only
insignificantly.
1390: 
The horizontal correlations could be estimated using a Jackknife scheme
\citep{Efron:82}, similar to that used to calculate the error bar of the
exponent from the time evolution of a quenched Ising model
\citep{SchLoiPru:2001}. While such an estimate is certainly essential for the careful
estimation of the error bar of an exponent, it is irrelevant for the
discussion in this paper, which is quantitatively based only on
\emph{local} comparisons of error bars (overlaps), while the global
properties of a histogram, i.e. its shape and its collapse with other
estimated histograms, do not involve error bars. Some authors even seem to dismiss the
relevance of these correlations completely \citep{NewmanZiff:2001}.
1401: 
1402: \subsection{Parallelising the code}
\label{sec:Parallelizing_the_code} Constructing clusters and keeping
track of clusters rather than of single sites seems to contradict
any attempt to run the algorithm in a distributed fashion, that is,
splitting the lattice into $S$ {\it slices} (a one-dimensional
decomposition; as periodic boundaries apply, the slices may better be
called cylinders). Moreover, there is a general problem of
parallelisation which becomes apparent in this context: the usual
bottleneck of parallel systems is the communication layer. In order to
keep the communication between sub-lattices as low as possible, fast
parallel code on a lattice requires as little interaction between slices as
possible, while the whole point of doing physics on large lattices is
the assumption of significant interaction between their parts. It is
this fundamental competition between requirement and basic assumption which
makes successful parallel code so rare and which seems to indicate that
problems must have very specific characteristics in order to be
parallelisable in a reasonable way.
1419: 
However, it is indeed possible to run the algorithm described above on
parallel machines successfully, in the sense that it makes
use not only of the larger amount of (distributed) memory available, but also of
the larger computing power. In fact, the code was
successfully rewritten using MPI \citep{GroppLuskSkjellum:1999} and has
been run on two systems with distributed memory: the massively parallel
machine AP3000 at the Department of Computing at Imperial College and
a cluster of workstations (25 nodes).
1428: 
In the following the design characteristics are described
which proved most important in making the code run reasonably
fast. This concerns mainly the statistics part, but the equilibration
also needs some tricks. 
1433: 
MPI ensures that packets sent from one node to another in a certain
order are received in exactly the same order; in the language of MPI
this means that the message ordering is preserved within each particular
communicator. However, how different communicators relate to each other,
i.e. how one stream of packets relates to another, is not specified. If,
for instance, node A sends a packet to node B and then to node C, which
then sends a packet to node B, this packet might arrive at B earlier
than the packet first sent by A, see \fref{mpi_message_order}.
1442: 
1443: \begin{figure}[th]
1444: \begin{center}
1445: \input{ffm_mpi_message_order}
1446: \end{center}
1447: \caption{\flabel{mpi_message_order} 
1448: Nodes $A$, $B$ and $C$ send messages in the order indicated. However, it might
1449:  well happen that the message sent last by node $C$ to node $B$, namely
1450:  message $3$, arrives at that node before message $1$, sent
1451:  \emph{before} message $2$ was sent, which arrived \emph{before} message
1452:  $3$ was sent.}
1453: \end{figure}
1454: 
However, it is one of the main goals of parallelisation to avoid any
kind of synchronisation, which is extremely expensive. Even in a
master-slave design, as chosen here, one encourages communication
between the slaves whenever they can anticipate what to do next or can
indicate to each other what to do next.
1460: 
As explained above (section \ref{sec:the_model}) an update consists essentially
of two steps: growing and burning. Both processes are now distributed
among the slices. The growing procedure is realised by trying to grow
$\INFL/S$ trees in each slice. This is not an exact representation of a
growing procedure taking place on the entire lattice at once, because
the latter has a non-vanishing probability to grow all trees at one
particular spot, while the parallelised version distributes them evenly
among the different slices. Provided that $\INFL$ is large compared to
$S$, this effect can certainly be neglected. The advantage of the
procedure is that the growing procedure in each slice does not need to
be conducted by the master. The burning procedure is more complex, as
the fire starts at one particular site of the entire lattice, so that this site
must be selected by the master. The exact procedure of the possibly
following burning process depends on the stage of the algorithm.
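
The size of the effect neglected in the growing procedure can be
estimated as follows (a sketch, assuming that in the non-parallel version
each of the $\INFL$ growing attempts selects a site uniformly on the whole
lattice and that all $S$ slices have equal size; the symbol $n$ is
introduced here only for illustration). The number $n$ of attempts
falling into a given slice is then binomially distributed, with
\begin{equation}
 \ave{n} = \frac{\INFL}{S} \quad , \qquad
 \ave{n^2} - \ave{n}^2 = \frac{\INFL}{S}\Big(1-\frac{1}{S}\Big) \quad , \qquad
 \frac{\sqrt{\ave{n^2} - \ave{n}^2}}{\ave{n}} = \sqrt{\frac{S-1}{\INFL}} \quad .
\end{equation}
The relative fluctuations suppressed by the parallelised version
therefore vanish for $\INFL \gg S$; for example, $\INFL=4000$ and $S=4$
give $\sqrt{3/4000} \approx 0.027$.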
1475: 
1476: In the following the procedures are explained in terms of ``sites''
1477: rather than ``cells'', as introduced in section
1478: \ref{sec:reducing_memory}. Using cells instead of sites makes the code
1479: slightly more complicated, but the changes are obvious. If the cells are
1480: oriented parallel to the borders of slices (see
1481: \fref{subgroupify}), so that its width is a multiple of $2$ in
1482: case of a hyper-cubic lattice, the algorithm runs considerably faster, as
1483: the communication between the nodes is reduced by the same factor. 
1484: 
1485: \subsubsection{Equilibration}
During the equilibration phase it is not necessary to keep track of all
clusters. Nevertheless some statistics are very cheap to
gather: the distribution of burnt clusters and the density of trees. The
latter is very simple, as the number of trees changes in time only by the
number of grown trees minus the number of burnt trees. This also provides a
crosscheck for the overall statistics, as the tree density is equivalent
to the probability of a site to belong to {\it any} cluster
\eref{def_rho}. 
1494: 
The burning is implemented as follows: The master chooses a site from
the entire lattice and sends the corresponding slice (slave) the
coordinate and (implicitly) an identifier which uniquely identifies this
request within this update step. The slice's response consists of the
number of sites burnt (possibly $0$), the identifier referring to the
initial request and up to two further, new, unique
identifiers. These identifiers refer to the two possible sub-requests to
the right and left neighbouring slices due to a spreading of the fire. If
a slice contacts another slice, it does so by sending the coordinates of the
sites which are on fire in the sending patch, together with a unique
identifier. The slice contacted sends its result to the master, again
together with the identifier and possibly two new ones, corresponding to
the possibly two contacted neighbouring slices. In this way the master
keeps track of ``open (sub)requests'', i.e. requests the master has been
told about by receiving an answer containing information about
sub-requests, but which have not yet been matched by a corresponding
answer. The requests form a tree-like structure and if
there are no open requests, the master must have received all answers of
the currently burning fire. It is very important to rule out
that a delay of messages causes some answers not to be counted, as would
happen if the master just counted open requests without identifying
them individually: it can easily happen that the master receives an
answer to a request before having received the information about the
very existence of that request. It is worth mentioning that in this
scheme the order of burnings is irrelevant, provided the burn-time is not
measured, as was the case here.
1521: 
1522: Adding up the number of burnt sites gives the total size of the
1523: burnt cluster. This number is finally sent to all slices. If it is
nonzero, the step is considered successful.
1525: 
After equilibration the cluster structure of pointers and roots
described above (see section \ref{sec:tracking_clusters}) needs to be
constructed. This is done in a na\"{\i}ve manner: keeping track of
sites which have already been visited, every site is visited once. The
first site visited in each cluster becomes the root of all sites connected
to it, which are then marked as visited. The procedure corresponds to the
burning procedure described above (see section
\ref{sec:burning_procedure}).
1534: 
Each slice maintains a local histogram $\PCA$, which contains all
clusters that do not have a site on the border to another
slice. Clusters which do have such a site are maintained in the master's
histogram, as discussed below, and their
(local) root site is moved to the border. As periodic
boundary conditions apply, the only boundaries are those with other slices.
1541: 
1542: \subsubsection{Collecting statistics}
1543: After finishing the equilibration phase another concept needs to be
1544: applied in order to count the total cluster size distribution
1545: $\PCA_t(s)$. At every update of the lattice each slice must keep track
1546: of the clusters in the same way as it was described in section
1547: \ref{sec:tracking_clusters}. Clusters, which do not contain a site at a
1548: border to another slice are maintained locally, i.e. at each node as a
1549: \emph{local histogram}. However, if a cluster contains a site at a
1550: border, it might span several slices. As soon as a cluster acquires a
1551: site at the border, it is removed from the local histogram and the
1552: site under consideration becomes the root of the cluster. The algorithm
1553: ensures that a cluster with at least one site on the border has its root
1554: at the border.
1555: 
1556: During all processes (growing or burning), the size of all clusters is
1557: updated as usual, independent from the location of the root.  If the
1558: status of a border site changes, its new value or its change is put on a
1559: stack together with its coordinate.  During the growing procedure the
1560: following changes of the status are possible:
1561: \begin{itemize}
1562: \item \emph{New occupation:} Change in occupation information for a
1563:       site (cell); If this is the only change, then it must have been
1564:       already occupied (this is only possible, in an implementation
1565:       using cells). If this is not the case, the reference information
1566:       pointing to the root site of the given cluster, must be updated
1567:       also, see next point.
1568: \item \emph{Merging border clusters:} Change of the reference
1569:       information for a site (cell); This can only happen if the site
1570:       (cell) was (completely) unoccupied at the time of the change or
1571:       did contain a size information, i.e. it was itself a root.
1572: \item \emph{General merging of clusters:} Change in size information for
1573:       a site (cell); Only an increase is possible, so that any change
1574:       can be represented by a single number indicating the size
1575:       difference.
1576: \end{itemize}
1577: 
For each border site changing in each slice, the corresponding
information is sent to the master. Typically the number of messages is
not very large, because the total number of sites updated during a
single growing phase is bounded by $\INFL/S$. The expected number of
these messages is, however, not simply given by the fraction of border sites in each slice,
because changes in all border \emph{clusters} (i.e. clusters with at
least one site in the border) affect the border \emph{sites}, as the
root of each border cluster is a border site.
1586: 
However, the data regarding the updates in the border do not need to be
sent from the slaves to the master if the burning attempt following the
growing fails, i.e. if an empty site has been selected for lightning. Of
course it is much more efficient not to send any data unless
necessary. As there is only a finite number of sites in each slice, the
number of updates of border sites is bounded by this
number. However, it is sufficient to allocate a reasonable amount of
memory ($2 L$ turned out to be enough) for the stack of messages to be
sent and to check its limits, similar to the stack used in the burning
procedure described in section \ref{sec:burning_procedure}. Henceforth
the sending of the update information of the border is called ``sending
the border''.
1599: 
1600: The master maintains a copy of the state of the border sites and updates
1601: a \emph{global histogram} of border clusters. By sending the changes on
1602: the border to the master as described above, the master can update its
1603: copy of the configuration of the borders as well as the global
histogram. At the end of the simulation all histograms ($S$ slave
histograms plus the global histogram maintained by the master node) are
1606: summed to produce the total $\PCA$.
1607: 
1608: 
1609: \begin{figure}[th]
1610: \begin{center}
1611: \input{ffm_border_copy}
1612: \end{center}
1613: \caption{ \flabel{border_copy} The slices, three of which are shown
1614: here, maintain the references for all clusters within each slice
1615: (illustrated by arrows), even for border clusters. The references
1616: \emph{between} slices, however, are maintained by the master. The
variables $A=0$, $B=L-1$, $C=I$ and $D=I+L-1$ are the indices used for
1618: references within each slice.}
1619: \end{figure}
1620: 
1621: 
As suggested in \fref{border_copy}, the slices maintain the
pointers within each slice and these references are not changed by the
master, which only connects \emph{between} slices. If a reference at the
border changes in a slice, the master receives a message to apply the
corresponding changes (joining two clusters); if the size of a cluster
changes, the master updates the corresponding unique root, etc. These
changes are indicated by the slaves and the master only applies them to
its copy of the border sites. Only if a change in occupation occurs must the
master actually perform some non-trivial operations, because a
newly occupied site might introduce a new connection \emph{between}
borders of different slices. From the point of view of the master, only
borders belonging to two different, neighbouring slices are directly
connected and are therefore to be maintained by the master, while the
connectivity of the borders \emph{within} each slice is indicated and
maintained by the corresponding slave. Apart from that, the master
maintains the slice-spanning structures in exactly the same way as the
slaves, e.g. a cluster having multiple roots among the various slices
has a unique root at the master.
1640: 
The question arises how the master best keeps track of the changes of
the borders. Ideally, a change of reference of a site at the boundary is
communicated to the master simply by sending the new pointer value
(index). By choosing a reasonable indexing scheme, this is indeed
possible. If the value of the reference lies between $0$ and $L-1$, where
$L$ is the width in terms of the number of sites (or cells) (see
\fref{border_copy}), the reference denotes a site in the left border
within the same slice. Similarly, if the value of a reference lies between
$I$ and $I+L-1$, where $I$ denotes the first index in the last column, the
reference is bound to point to the right border of the
same slice.  If the master uses indices in the range $[L,I-1]$ for
denoting cross-references between slices, the references are therefore
unambiguous and no translation is necessary between indices used by the
slices and indices used by the master.
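
The indexing scheme can be summarised compactly. For a reference value
$r$ held in the master's copy of the border (the symbol $r$ is introduced
here only for illustration), the interpretation is
\begin{equation}
\begin{array}{lcl}
 0 \le r \le L-1 & : & \textrm{site in the left border of the same slice} \\
 L \le r \le I-1 & : & \textrm{cross-reference between slices, assigned by the master} \\
 I \le r \le I+L-1 & : & \textrm{site in the right border of the same slice}
\end{array}
\end{equation}
The three ranges are disjoint. The middle range is free for use by the
master because a cluster with at least one site on the border has its
root at the border, so that references reported by the slaves for border
clusters never point into the bulk.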
1655: 
1656: During the burning procedure the master can make use of its knowledge
about the borders. The site selected for starting the fire is most
likely a bulk site, so that the corresponding slave needs to be
1659: contacted for the occupation information. Three outcomes are possible:
1660: \begin{itemize}
1661: \item The site is unoccupied. Nothing happens, all slices get signalled
1662:       to continue with growing.
\item The site is occupied, but its cluster does not contain a border site. In this
      case the slice contacted can send back the size of the burnt
      cluster (information it knows even without actually doing the
1666:       burning as the size is stored in the root, which needs to be found
1667:       anyway in order to find out whether the cluster is a border
1668:       cluster) and the master can signal all other slices to send the
1669:       border and to continue. After receiving the borders it can update
1670:       the histogram.\footnote{One might be inclined to postpone the
1671:       sending of the borders to a time, when it is really
1672:       needed. However, after a successful burning the time $t$ is
1673:       increased and this enters the histogram (see section
1674:       \ref{sec:efficient_histogram_superposition}). Ignoring this change
1675:       for a large number of steps would introduce uncontrollable
1676:       deviations of the estimator of the histogram from its true value.}
\item The site is occupied and its cluster contains a border site. In this case the
      slice sends the reference of the border site back to the master,
      which then asks all slices to send their most recent border
      update. It updates the border and the histogram, deletes the
      cluster which is going to burn and sends the ``burning borders'',
      i.e.\ a list of all border sites which will be affected by the
      burning procedure, to the slices in the form of a stack as described in
      section \ref{sec:burning_procedure}. The slaves use this stack as
      the initial stack of the burning procedure and delete the
      corresponding sites. No communication between the slices is
      necessary.
1688: \end{itemize}
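
The three cases can be summarised as a small decision function. This is only a sketch: the name {\tt master\_action} and the action strings are hypothetical, and the actual message passing is omitted.

```python
def master_action(occupied, cluster_touches_border):
    """Return the master's reaction to the reply of the contacted slice.

    occupied: whether the site selected for ignition is occupied.
    cluster_touches_border: whether its cluster contains a border site.
    The strings are illustrative summaries of the three cases in the text.
    """
    if not occupied:
        return "signal all slices to continue growing"
    if not cluster_touches_border:
        return "take size from slice, collect borders, update histogram"
    return "collect borders, update histogram, send burning stack to slices"
```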
1689: 
1690: The global histogram contains much larger clusters than the local
1691: histograms. In order to keep memory requirements low, even for
1692: histograms of resolution unity, it is reasonable to introduce a threshold,
1693: above which slaves use the global histogram to maintain $\PCA$ even for
local clusters (i.e.\ non-border clusters). For that purpose a histogram
``appendix'' has been introduced. This is a finite stack, which stores
the size of the cluster $s$ together with the value of $t'=\pm(T-t+1)$ as
described in section \ref{sec:efficient_histogram_superposition}. During
the growing phase, when such large clusters grow quickly, one would obtain
a sequence of stack entries of the form $(s, t'), (s, -t'), (s+1, t'),
(s+1, -t'), (s+2, t'), \dots$, corresponding to entering the appendix,
$(s, t')$, increasing in size by $1$, which gives $(s, -t'), (s+1, t')$,
etc.  As soon as a cluster is larger than the upper cutoff, each update
causes two entries of the form $(s, -t'), (s+1, t')$, the first for the
deletion from the histogram, the second for the increase in the next
slot. These entries may cancel; for example, the sequence above is
equivalent to the single entry $(s+2, t')$. It turned out to be highly
efficient to perform this cancellation, i.e.\ to check whether the last entry in
the appendix is the negative of the entry about to be added.
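
A minimal sketch of the cancellation is given below. The class {\tt HistogramAppendix} and its method names are hypothetical; the entries $(s, t')$ and the rule of checking only the last entry follow the description above, with $t' \ne 0$ assumed.

```python
class HistogramAppendix:
    """Stack of (s, t') entries with cancellation against the last entry."""

    def __init__(self):
        self.entries = []

    def push(self, s, t_prime):
        # If the last entry is the negative of the new one, both cancel.
        if self.entries and self.entries[-1] == (s, -t_prime):
            self.entries.pop()
        else:
            self.entries.append((s, t_prime))

    def grow(self, s, t_prime):
        # A cluster growing from s to s + 1 is removed from slot s
        # and added to slot s + 1.
        self.push(s, -t_prime)
        self.push(s + 1, t_prime)
```

Pushing $(s, t')$ and then growing the cluster twice leaves the single entry $(s+2, t')$ on the stack, as in the example sequence above.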
1709: 
1710: As the maximum size of the appendix is finite, it must be emptied from
1711: time to time. The information about the size of the appendix of each
1712: slave is sent to the master together with the information about the
1713: borders. If a possible overflow is detected ($2/3$ of the maximum
1714: size in the implementation presented) the master requests all slices to
1715: send the content of their appendices and applies it to the global
1716: histogram. The slices then empty their appendices.
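
The overflow test amounts to a one-line check. In the sketch below the function name and the choice of ``greater than or equal'' at the $2/3$ mark are assumptions; only the fraction $2/3$ is taken from the text.

```python
def overflow_possible(appendix_sizes, max_size):
    """True if any slave's appendix has reached 2/3 of its maximum size.

    Integer arithmetic (3 * size >= 2 * max_size) avoids rounding issues.
    """
    return any(3 * size >= 2 * max_size for size in appendix_sizes)
```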
1717: 
1718: \subsubsection{The Random Number Generator}
1719: The random number generator (RNG) acquires a crucial role when used in a
1720: parallel environment. With $M$ the number of iterations, the expected
1721: number of calls of the RNG is $M \INFL / \rho$ (for $M \approx 10^7,
1722: \INFL \approx 5 \times 10^4$ this is more than $5 \times 10^{11}$), so
that an RNG such as {\tt ran1} in \citep{Press:92} with a period of only
$\approx 2 \times 10^9$ is insufficient. Therefore {\tt ran2} in
\citep{Press:92}, which has a period of $>2 \times 10^{18}$, was used for
all simulations, both parallel and non-parallel.  If the number of
1727: RNG calls is small enough, one can compare results obtained by means of
1728: {\tt ran1} and {\tt ran2}. No significant deviation was found.
1729: 
1730: In the parallel implementation, each slave requires an independent
1731: sequence of random numbers. This is a classical problem in parallel
1732: computing \citep{AluruPrabhuGustafson:1992,Coddington:1996}. The simplest
1733: solution is to divide a single sequence $r_1, r_2, \dots$ into distinct
1734: subsequences. This can be done either by a leapfrog scheme
1735: \citep{Coddington:1996,Entacher:1999}, where each subsequence consists of
random numbers which are $S$ calls apart, i.e.\ $S$ subsequences of the
form $r_u, r_{S+u}, r_{2S+u}, \dots$ with $u=1, 2, \dots, S$ unique at
each slave, or by splitting the sequence \citep{Coddington:1996}, so that
each subsequence consists of consecutive RNG calls, i.e.\ $r_{1+u X},
r_{2+u X}, r_{3+u X}, \dots$ again with $u=1, 2, \dots, S$ and offset $X$ large
enough to avoid any overlap. The latter scheme has the advantage that
each subsequence consists of consecutive RNG calls and has therefore been
used in the following. The offset $X$ at each
slave is easily realised by restoring the state variables of the RNG,
which have been produced once and for all in a single run generating all
$X S$ random numbers and saving the state variables at regular
intervals. However, such a technique is advisable only if the RNG calls do
not dominate the overall CPU time; otherwise it would take almost as
long as the simulation itself to produce the random numbers required for
it.
1752: 
1753: \section{Results}
1754: 
1755: \begin{figure}[t]
1756: \begin{center}
1757: \includegraphics[width=0.7\linewidth]{scaling_function.eps}
1758: \caption{\flabel{scaling_function} 
1759: The rescaled and binned histogram
1760: %$\frac{\PCA(s)}{\PCA(1)} s^{\Tast}$,  
1761: $\PCA(s) s^{\Tast}/\PCA(1)$,
1762: where $\Tast=2.10$ 
1763: for $\INFL=125, 250, 500, \dots, 32000, 64000$ (as indicated) in a double logarithmic
1764: plot. The linear size $L$ is chosen according to the bold printed
1765: entries in Tab.~\ref{tab:absolute_results} and large enough to ensure
1766: absence of finite size effects. The error-bars are estimated from
1767: shorter runs. The rightmost histogram (dotted,
1768: $\INFL=64000$) could not be cross-checked by another run, see
1769: text. 
1770: The dashed lines belong to different exponents, whose value is specified
1771: as the sum of the slope in the diagram and $\Tast$, i.e. a horizontal line
would correspond to an exponent $2.1$. The short-dashed lines
represent estimated exponents for different regions of the histogram ($2.22$
for $s$ within approx.\ $[20,200]$ and $2.19$ for $s$ within $[200,2000]$), the other
exponents are from the literature, namely $2.14(3)$ in
1776: \citep{ClarDrosselSchwabl:1994,ClarDrosselSchwabl:1996} and
1777: $223/91\approx 2.45$ in \citep{Schenk:2002}. Since it was impossible to
1778: relate these exponents to any property of the data, the exact position
1779: of the lines associated with them was chosen arbitrarily.  }
1780: \end{center}
1781: \end{figure}
1782: 
1783: 
1784: \begin{table}
1785: \caption{\label{tab:absolute_results} Parameters and results for different
1786: choices of $L$ and $\INFL$.  The average cluster size is denoted by
1787: $\aves{s}$, for definition see \eref{ave_s}, but due to a truncation
1788: in the histogram for some of the simulations in the range $2000\le \INFL
1789: \le 16000$, the number presented is actually the average size of the
1790: burnt cluster. In the stationary state it is --- apart from small
1791: corrections --- also given by $(1-\rhobar)/(\theta \rhobar)$, see \eref{aves_averho}.
1792: Values of $\INFL$ and $L$ printed in
1793: bold indicate results shown in \fref{scaling_function}, the
1794: other results are only for comparison.  All data are based on $5\times
10^6$ (successful) updates (see Sec.~\ref{sec:clusterdistribution}) for
1796: the transient and statistics, apart from those printed in italics which
1797: are based on short runs ($5\times 10^6$ updates for the transient and
1798: $1\times 10^6$ updates for statistics).
1799: }
1800: \newcolumntype{d}[1]{D{.}{.}{#1}}
1801: \begin{tabular}{r|r|d{5}|d{2}|d{6}|d{2}}
1802: $\INFL$    & $L$        & 
1803: \multicolumn{1}{c|}{$n(1)$} &
1804: \multicolumn{1}{c|}{$\aves{s}$} & 
1805: \multicolumn{1}{c|}{$\rhobar$} &
1806: \multicolumn{1}{c}{$\frac{1-\rhobar}{\theta \rhobar}$}
1807:  \\
1808: \hline 
1809: {\it 125}    & {\it 1000}                                           & 0.04553     & 204.07     & 0.37973 & 204.18  \\
1810:      125     &      1000                                            & 0.04552     & 203.81     & 0.37977 & 204.15   \\
1811: {\it 125}    & {\it 4000}                                           & 0.04553     & 203.88     & 0.37983 & 204.10  \\
1812: {\bf 125}    & {\bf 4000}                                           & 0.04552     & 203.77     & 0.37983 & 204.10   \\
1813: \hline
1814: {\it 250}    & {\it 1000}                                           & 0.04451     & 395.03     & 0.38756 & 395.06  \\
1815:      250     &      1000                                            & 0.04452     & 394.08     & 0.38750 & 395.15   \\
1816: {\it 250}    & {\it 4000}                                           & 0.04454     & 394.97     & 0.38766 & 394.89  \\
1817: {\bf 250}    & {\bf 4000}                                           & 0.04454     & 395.29     & 0.38765 & 394.91   \\
1818: \hline
1819: {\it 500}    & {\it 1000}                                           & 0.04380     & 764.73     & 0.39316 & 771.75  \\
1820:      500     &      1000                                            & 0.04380     & 764.81     & 0.39315 & 771.77 \\
1821: {\it 500}    & {\it 4000}                                           & 0.04382     & 771.12     & 0.39343 & 770.88  \\
1822: {\bf 500}    & {\bf 4000}                                           & 0.04382     & 771.90     & 0.39343 & 770.87  \\
1823: \hline
1824: {\it 1000}   & {\it 1000}                                           & 0.04328     & 1495.36   & 0.39716 & 1517.91  \\
1825:      1000    &      1000                                            & 0.04328     & 1490.05   & 0.39714 & 1518.00    \\
1826: {\it 1000}   & {\it 4000}                                           & 0.04331     & 1510.85   & 0.39761 & 1515.00   \\
1827: {\bf 1000}   & {\bf 4000}                                           & 0.04331     & 1513.13   & 0.39764 & 1514.81    \\
1828: {\it 1000}   & {\it 8000}                                           & 0.04332     & 1510.10   & 0.39763 & 1514.91   \\
1829: \hline
1830: {\it 2000}   & {\it 4000}                                           & 0.04296     & 2976.34    & 0.40053 & 2993.35  \\
1831: {\bf 2000}   & {\bf 4000}                                           & 0.04297     & 2990.50    & 0.40054 & 2993.15   \\
1832: {\it 2000}   & {\it 8000}                                           & 0.04297     & 2995.67    & 0.40060 & 2992.56  \\
1833: \hline
1834: {\it 4000}   & {\it 4000}                                           & 0.04273     & 5929.24     & 0.40258 & 5935.91 \\
1835:      4000    &      4000                                            & 0.04273     & 5930.97     & 0.40249 & 5938.03  \\
1836: {\it 4000}   & {\it 8000}                                           & 0.04274     & 5931.32     & 0.40261 & 5935.15 \\
1837: {\bf 4000}   & {\bf 8000}                                           & 0.04273     & 5935.36     & 0.40256 & 5936.47  \\
1838: \hline
1839: {\it 8000}   & {\it 4000}                                           & 0.04255     & 11786.97    & 0.40405 & 11799.72 \\
1840:      8000    &      4000                                            & 0.04255     & 11788.90    & 0.40406 & 11799.07  \\
1841: {\it 8000}   & {\it 8000}                                           & 0.04257     & 11801.31    & 0.40412 & 11795.98 \\
1842: {\bf 8000}   & {\bf 8000}                                           & 0.04257     & 11792.82    & 0.40413 & 11795.38 \\
1843: \hline
1844: {\it 16000}  & {\it 4000}                                           & 0.04244     & 23430.01    & 0.40525 & 23481.82 \\
1845: {\it 16000}  & {\it 8000}                                           & 0.04243     & 23466.93    & 0.40540 & 23467.22 \\
1846:      16000   &       8000                                           & 0.04243      &  23446.10   & 0.40542 & 23465.64   \\
1847: {\bf 16000}  & {\bf 16000}                                          &  0.04245      & 23449.31   & 0.40541 & 23466.57   \\
1848: \hline
1849:      32000   &      16000                                           & 0.04232     &  46443.83    & 0.40660 & 46701.82 \\
1850: {\bf 32000}  & {\bf 32000}                                          &  0.04233    &   46731.44   & 0.40662 & 46698.51 \\
1851: \hline
1852: {\bf 64000}  & {\bf 32000}                                          &
1853:  0.04220     &   91148.64  & 0.40777 & 92952.40 \\
1854: \end{tabular}
1855: \end{table}
1856: 
1857: The sections above were only concerned with the technical issues of the
1858: model and its implementation. Some of the actual results from the
1859: simulation carried out using the new algorithm have been published
1860: already \citep{JensenPruessner:2002b}. This article was focused on
1861: $\dns$. The main outcome was that the standard scaling assumption
1862: \eref{def_tau} is not supported by numerics, so the main conclusion was
1863: that the model \emph{is not scale invariant}.
1864: 
In the following these results are briefly restated and
discussed. Other observables are related to this observation to see
whether it is only $\dns$ which lacks scale invariance. All results
1868: presented are based on the same simulations, the parameters of which are
1869: given in Tab.~\ref{tab:absolute_results}.
1870: 
1871: 
1872: \subsection{Cluster size distribution}\label{sec:clusterdist}
1873: Before the actual findings are discussed, it is important to consider
how to avoid finite size effects, which might otherwise distort the results.
1875: Usually, finite size effects are avoided by keeping the correlation
1876: length $\xi$ small compared to the system size. However, it requires a
1877: significant amount of CPU-time to actually determine the correlation
length. Moreover, \emph{a priori} it would not be clear which ratio
1879: $\xi/L$ to choose in order to avoid finite size effects.
1880: 
The simplest way to determine whether finite size effects are present
1882: is to compare estimates of observables for two systems with the same
1883: parameters but different sizes \citep{SchLoiPru:2001}. If finite size
1884: effects are not present, the differences between the estimators of those
1885: two systems must be within the error bar of the quantity under
1886: consideration. This approach has the drawback that each set of
1887: parameters must be simulated at least twice, but it gives full control
1888: over finite size effects. Apart from $\INFL=64000$, which is specially
1889: marked in most of the plots, this approach has been applied throughout the
1890: results presented. The method was discussed in greater detail in
1891: \citep{JensenPruessner:2002b}.
1892: 
1893: \begin{figure}[t]
1894: \begin{center}
1895: \includegraphics[width=0.7\linewidth]{data_collapse.eps}
1896: \caption{\flabel{data_collapse} 
1897: Attempt to collapse the data shown in \fref{scaling_function}
1898:  using $\Tast=2.10$, $\Scutoff(\theta) = \theta^{-\lambda^\ast}$ and
1899:  $\lambda^\ast=1.11$ as derived from
1900:  \Eref{scaling_relation_ltau}. As expected the data do not
1901:  collapse. 
1902: The big arrow points in the direction of increasing $\INFL$.}
1903: \end{center}
1904: \end{figure}
1905: 
1906: 
1907: \Fref{scaling_function} shows a central result of
\citep{JensenPruessner:2002b}. This figure contains the reduced (and
1909: binned) data in the form
1910: \begin{equation} \elabel{pnorm}
1911: \frac{\PCA(s)}{\PCA(1)} \quad ,
1912: \end{equation}
which has the convenient property of being unity for $s=1$. The
1914: normalisation $\PCA(1)$, which converges anyway to a finite value as
1915: $\INFL \to \infty$ (see Tab.~\ref{tab:absolute_results}), does not
1916: affect any of the results, especially not the (attempted) data
1917: collapses.
1918: 
1919: The crucial problem shown in \fref{scaling_function} is the intermediate
1920: minimum that develops as $\INFL$ is increased. It renders the data
1921: collapse as described by \eref{def_tau} impossible (for more details see
1922: \citep{JensenPruessner:2002b}). \Fref{data_collapse} shows the same data
1923: again, now in an attempt to form a data collapse, using
1924: $\Scutoff(\theta) = \theta^{-\lambda^\ast}$ with $\lambda^\ast=1.11$
1925: from \Eref{scaling_relation_ltau} and $\Tast=2.10$ (for comparison see
1926: Tab.~\ref{tab:expos_literature}). As expected the collapse fails.
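
The rescaling used in the two figures can be sketched as follows. The function names and the dictionary representation of the histogram are hypothetical; the collapse is assumed to plot $\PCA(s) s^{\Tast}/\PCA(1)$ against $s/\Scutoff$ with $\Scutoff = \theta^{-\lambda^\ast}$, as suggested by the captions.

```python
def rescaled_histogram(hist, tau_star):
    """Map s -> P(s) to s -> P(s) * s**tau_star / P(1).

    hist: dict from cluster size s to the (binned) estimate P(s);
    it must contain an entry for s = 1.
    """
    p1 = hist[1]
    return {s: p * s ** tau_star / p1 for s, p in hist.items()}

def collapse_coordinates(hist, theta, tau_star, lambda_star):
    """Return sorted (s / s0, rescaled P) pairs with s0 = theta**(-lambda_star)."""
    s0 = theta ** (-lambda_star)
    rescaled = rescaled_histogram(hist, tau_star)
    return [(s / s0, y) for s, y in sorted(rescaled.items())]
```

For a pure power law $\PCA(s) \propto s^{-\Tast}$ the rescaled histogram is identically $1$; if simple scaling held, the curves for different $\theta$ would fall on top of each other in these coordinates.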
1927: 
1928: In less technical terms, it was shown in \citep{JensenPruessner:2002b}
that there is no choice of $\tau$ that allows a data collapse for
$\PCA(s; \theta)$. It seems that the distribution is the same for two
different values of $\theta$ up to a certain cluster size, which
increases seemingly without bound with $\INFL$, i.e.\ for two very large values
of $\INFL$ the two distributions collapse without any rescaling. Beyond
this cluster size the distributions deviate: the one with the larger
$\INFL$ forms a deeper dip and ascends afterwards to a maximum, which
can, by rescaling, be arranged to be the same for all $\INFL$. The ever
growing dip prohibits a reasonable definition of a lower cutoff and
makes a data collapse impossible. Equally, one could arrange the dips to
be at the same height and the maximum to increase with $\INFL$.
1940: 
The key problem of the DS-FFM is that more than one length scale is
visible, apparently for any system size $L$. The statistics of $\dns$ are
not even asymptotically dominated by a single length scale. For any
system size, knowing $\dns$ only for $s$ larger than an arbitrary lower
cutoff suffices to identify $\theta$ from the shape of $\dns$
alone.
1947: 
1948: This indicates that simple scaling \eref{def_tau} does not apply and
1949: the exponent $\tau$ is undefined. Keeping this in mind, it is very
1950: instructive to look for other properties as well and investigate their
1951: scaling.
1952: 
1953: \subsubsection{Finite size scaling} \label{sec:finite_size_scaling}
The failure of the DS-FFM to obey proper finite size scaling has already been
observed in \citep{SchenkETAL:2000}. In the following some finite
size scaling principles are applied in a straightforward manner
and subsequently ruled out.
1958: 
1959: As known from percolation \citep{StaufferAharonyENG:1994}, the
1960: generalised form of the scaling behaviour of $\Scutoff$ is 
1961: \begin{equation} \elabel{scutoff_generalised}
1962:  \Scutoff(\theta, L) = \theta^{-\lambda} m\left(\theta L^\sigma \right)
1963: \end{equation}
1964: where $m(x)$ is a crossover function describing the dependence of
1965: $\Scutoff$ on the two parameters $\theta$ and $L$. For sufficiently
1966: large argument $x$, the crossover function is expected to approach a
1967: constant, such that \eref{def_scutoff} is recovered. For small
1968: arguments, however, the dependence of the cutoff is expected to be
1969: strongly dominated by $L$, just like in equilibrium critical phenomena,
1970: where $L$ takes over the r\^ole of $\xi$ for sufficiently small systems.
1971: Thus, for small arguments $m(x) \propto x^\lambda$, so that for
1972: sufficiently small $L$, $\Scutoff$ becomes independent of $\theta$.
1973: 
1974: Generic models of SOC do not have any tuning parameters other than the 
1975: system size, so that the cutoff $\Scutoff$ is only a function of $L$. In this
1976: sense, finite size scaling is the only scaling behaviour in SOC, and a
failure of the model to comply with finite size scaling is identical to
the failure to comply with simple scaling altogether. Therefore, one might be
1979: surprised to see a simple scaling analysis \emph{and} a finite size
1980: scaling analysis in an article on an SOC model. However, the Forest Fire
1981: Model is different in this respect, as it has the additional parameter
1982: $\theta$, which is, supposedly, finite only because of the finiteness of
1983: the system size. In the thermodynamic limit it supposedly disappears as
1984: a free parameter.
1985: 
As seen above (see \Fref{data_collapse}), the $\theta$-dependence of $\dnst$ cannot be captured by
1987: $\Scutoff$ in the scaling function alone. However, the scaling form
1988: \eref{def_tau} would remain valid in some sense, if in the finite size
1989: scaling regime the $L$ dependence of $\dnst$ enters $\Scutoff$
1990: only. Therefore the original form \eref{def_tau} is generalised to
1991: \begin{equation} \elabel{scaling_generalised}
1992:  \dn(s; \theta, L) = s^{-\tau} \GC(s/\Scutoff(\theta, L))
1993: \end{equation}
1994: ignoring that it has been shown above already that it does not hold in
1995: the limit where $\dn(s; \theta, L)$ becomes independent of $L$. In this
1996: section the dependence of $\dn(s; \theta, L)$ on $L$ is investigated, in
1997: the limit of large $\INFL$ and small $L$. A similar study has been
1998: performed by Schenk \etal \citep{SchenkETAL:2000}, however on much
1999: smaller scales and using $\PCB$.
2000: 
2001: If the form \eref{scaling_generalised} holds, it should be possible
2002: to collapse $\dn(s; \theta, L)$ for different $L$ by choosing the correct
2003: $\tau$ and $\Scutoff$, just like for the cluster size distribution
2004: of standard percolation. This turns out
2005: not to be the case, as can be seen in
2006: \fref{finite_size_scaling_theta}: The \emph{smaller} $L$ is, the \emph{stronger}
2007: the changes of shape of $\dns$ for any $\theta$
2008: tested. Consequently, \eref{scaling_generalised} does not hold, and
2009: as $\Scutoff$ is only \emph{defined} via its r\^ole as cutoff in
2010: \eref{scaling_generalised}, $\Scutoff$ is undefined and
2011: \eref{scutoff_generalised} remains meaningless.
2012: 
2013: One might argue that the average density of trees, $\rhobar$ (see
2014: \eref{def_rho}), is the relevant parameter of $\dns$, so that $\dns$
2015: has the same shape for different, sufficiently small $L$ and constant
2016: $\rhobar$. However, as shown in \fref{density_theta}, for any value of
2017: $\theta$, there is a value of $L$, such that $\rhobar$ varies considerably
with decreasing $L$. In particular, there seems to be a maximum tree
2019: density for every system size, so that for large values of $\rhobar$, there
2020: is a smallest system size $L$, below which this density cannot be
2021: reached. This maximum increases monotonically with system size, so that
2022: the maximum for every finite system size is smaller than the expected
2023: average tree density in the thermodynamic limit, which is according to
2024: Tab.~\ref{tab:absolute_results} larger than $0.40777$ and was recently
2025: conjectured to be as large as $0.5927\dots$ \citep{Grassberger:2002},
2026: namely the critical density of site percolation \citep{NewmanZiff:2000}. 
2027: Accepting this limitation, \fref{finite_size_scaling_rho}
2028: shows an example for three $\dns$ with roughly the same $\rhobar$ and
2029: different $L$ and $\theta$. Most surprisingly two of the histograms
2030: collapse already without rescaling, while the third ($L=500$) reveals
the same problems as visible in \Fref{scaling_function}. Hence, finite
size scaling does not work for fixed $\rhobar$ either.
2033: 
That large densities of trees cannot be reached by small system sizes
is related to the specific way the histograms are generated and the
density is measured: is it done before or after each (successful) burning? For
sufficiently large systems, it becomes irrelevant when to do it, because
two histograms, one measured before, the other one right after the
burning, differ only by one cluster. For the same reason, the
question whether to average only over successful burnings is also
irrelevant.
2043: 
2044: \begin{figure}[t]
2045: \begin{center}
2046: \includegraphics[width=0.7\linewidth]{finite_size_scaling_theta.eps}
2047: \caption{\flabel{finite_size_scaling_theta}
2048: Plot of the rescaled PDF $\PCA(s; \theta, L) s^\Tast/\PCA(1; \theta, L)$ for fixed $\INFL=1000$ and
2049: different system sizes, $L=125,250,500,1000$. The different shapes make
2050: it impossible to collapse the data, as would be expected from a finite
2051: size scaling ansatz \eref{scaling_generalised} and
2052: \eref{scutoff_generalised}.
2053: }
2054: \end{center}
2055: \end{figure}
2056: 
Clearly, for small systems, the difference between the histogram before
and after the burning is just the one enormous cluster of size
$\OC(\INFL)$. \Fref{hist_before_after} shows the difference. Even
2060: though in principle every density is reachable for every system
2061: size if the histogram is measured before burning, the newly defined histograms do not have a considerably different
2062: shape, so that a collapse remains impossible. For example, the problems
2063: shown in \fref{finite_size_scaling_theta} become even more
2064: pronounced, if the histogram is taken before burning. 
2065: 
2066: Surprisingly and actually in contradiction to what has been said in
2067: \Eref{converge_pcb}, there is a discrepancy between the cluster
2068: size distribution of burnt clusters, $\PCB$, and the overall cluster
2069: size distribution $\PCA$, even if the latter is measured \emph{before}
2070: the burning takes place. This sounds paradoxical, because the random
2071: picking of a cluster to be burnt is just a sampling of $\PCA$. This cannot
2072: be caused by the correlation between those samples, due to the fact
2073: that $n_{t+1}(s)$ is actually a function of the cluster chosen at $t$ --- a
2074: correlation like this would be equally picked up by $\PCA$. The reason
2075: for this discrepancy is the fact that a site picked randomly as the
2076: starting point of the next fire is necessarily occupied. Therefore
$n_t(s)$ with a low occupation density enter $\PCB$ with too large a weight. As
low density states contain many more small clusters than large ones,
$\PCB$ overestimates the probability of small clusters.
A sample of $\PCB$ at a low density is indistinguishable from a
sample at high density, while a sample for $\PCA$ trivially
2082: contains the information about the density.
2083: To illustrate that, one might imagine a sequence of configurations that
2084: consists of one state, with exactly one cluster of size
2085: $1$, and a second state, with exactly one cluster of size
2086: $L^2$. 
2087: The two configurations appear with a frequency such that a
2088: cluster of size $1$ is burnt down as often as a cluster of
2089: size $L^2$.
2090: The resulting $\PCA$ reports that a randomly chosen site belongs
2091: to a cluster of size $L^2$ with probability $\half$ and to a cluster of
2092: size $1$ with probability $1/(2 L^2)$, while $\PCB$ incorrectly
2093: reports the same probability for both cluster sizes. The problem can
2094: actually already be spotted in \eref{converge_pcb}, which contains a
2095: $\rho$ on the RHS, which should rather be $\rho(t)$. The problem
2096: disappears in the limit where $\rho(t)$ hardly changes in time, i.e. in
2097: the limit of $\INFL \ll L^2$.
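
The two-state example can be worked through with exact fractions. The sketch below assumes, as the quoted probabilities imply, that the two states each make up half of the sequence and that both cluster sizes are burnt equally often; the function name is hypothetical and $L>1$ is required.

```python
from fractions import Fraction

def two_state_example(L):
    """Return (P_a, P_b) for the two-state sequence described in the text.

    P_a[s]: probability that a randomly chosen site belongs to a cluster
            of size s, averaged over both states (not normalised, since
            empty sites are not counted).
    P_b[s]: relative frequency with which a cluster of size s is burnt.
    """
    sites = L * L
    weight = Fraction(1, 2)  # each state occurs half of the time (assumption)
    p_a = {
        1: weight * Fraction(1, sites),         # one occupied site out of L^2
        sites: weight * Fraction(sites, sites), # every site is occupied
    }
    p_b = {1: Fraction(1, 2), sites: Fraction(1, 2)}
    return p_a, p_b
```

For $L=10$ this gives $1/200$ and $1/2$ for $\PCA$, while $\PCB$ assigns $1/2$ to both sizes.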
2098: 
It is also clear why \eref{aves_averho} breaks down for small
2100: systems and large $\INFL$: The average size of the burnt cluster tends
2101: to $L^2$, while the density tends to $0$. Apparently
2102: \eref{aves_averho} must be incorrect for $\rho<(L^2 \theta +1)^{-1}$.
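
The bound can be made explicit, reading \eref{aves_averho} as $\aves{s} \approx (1-\rho)/(\theta \rho)$ as in the caption of Tab.~\ref{tab:absolute_results}. Since no cluster can be larger than the lattice, the right hand side must not exceed $L^2$:
\begin{equation}
 \frac{1-\rho}{\theta \rho} \le L^2
 \quad \Leftrightarrow \quad
 1-\rho \le L^2 \theta \rho
 \quad \Leftrightarrow \quad
 \rho \ge \frac{1}{L^2 \theta + 1} \quad .
\end{equation}
For any smaller density the predicted average cluster size would exceed the number of sites.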
2103: 
2104: 
2105: \begin{figure}[t]
2106: \begin{center}
2107: \includegraphics[width=0.7\linewidth]{finite_size_scaling_rho.eps}
2108: \caption{\flabel{finite_size_scaling_rho}
2109: Plot of the rescaled PDF $\PCA(s; \theta, L) s^\Tast/\PCA(1; \theta, L)$ for fixed
2110: $\rhobar\approx 0.397$:
2111:  $L=500$ with $1/\theta=2000$ ($\rhobar=0.396827$),
2112:  $L=1000$ with $1/\theta=940$ ($\rhobar=0.396825$) and
2113:  $L=4000$ with $1/\theta=870$ ($\rhobar=0.396883$). Again,
2114:  a data collapse is impossible. 
2115: }
2116: \end{center}
2117: \end{figure}
2118: 
2119: \begin{figure}[t]
2120: \begin{center}
2121: \includegraphics[width=0.7\linewidth]{density_theta.eps}
2122: \caption{\flabel{density_theta}
2123: The average density of trees, $\rhobar$, as a function of $\theta$ and for
2124: various $L$. For sufficiently small systems, the maximum in
2125: $\rhobar$ is much smaller than the expected density at the ``critical
point'', which is larger than the value $0.40777$ found in
2127: Tab.~\ref{tab:absolute_results}. The straight line marks
2128:  $\rho=0.396827$, the density chosen in
2129:  \fref{finite_size_scaling_rho}. The inset is a magnification of the
2130:  crossing of the straight line with the simulation data, and shows all
2131:  three values of $\theta, L$ used in \Fref{finite_size_scaling_rho}.
2132: }
2133: \end{center}
2134: \end{figure}
2135: 
2136: \begin{figure}[t]
2137: \begin{center}
2138: \includegraphics[width=0.7\linewidth]{hist_before_after.eps}
2139: \caption{\flabel{hist_before_after}
2140: Comparison between the rescaled and binned histograms measured before
2141: and after the burning for small $L=125$ and large
2142: $\INFL=1000$. As expected, only the statistics for large $s$ is
2143:  affected. The dashed line shows the data for $\PCB(s)$.
2144: }
2145: \end{center}
2146: \end{figure}
2147: 
2148: 
2149: 
2150: \subsubsection{Scaling of the moments of $\PCA$}
2151: According to \eref{def_tau}, \eref{def_scutoff} and
2152: \eref{converge_pca} the $n$th moment of $\PCA$ should scale like
2153: (this analysis has apparently been introduced to SOC by De Menech \etal
2154: \citep{Pastor-SatorrasVespignani:2000,DeMenechStellaTebaldi:1998,TebaldiDeMenechStella:1999})
2155: \begin{equation} \elabel{moment_scaling}
2156:  \aves{s^n} = \frac{\sum_s s^n s \dnst}{\sum_{s} s \dnst} = q_n \theta^{-\lambda (2+n-\tau)} + \text{ corrections } \quad ,
2157: \end{equation}
2158: where $q_n$ is a non-universal amplitude (see section
2159: \ref{sec:universal_amplitude_ratios}) and 
2160: $\lambda$ is also known as the gap exponent \citep{Pfeuty:77}.  The
2161: corrections are due to the lower cutoff and the asymptotic character of
2162: the scaling, which is expected only for ``sufficiently small $\theta$''
2163: \citep{Wegner:72}. In turn, one can infer a scaling form like
2164: \eref{def_tau} if the moments scale in the form of
2165: \eref{moment_scaling}.
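
The definition of the moments in \eref{moment_scaling} translates directly into code. In this sketch the histogram is assumed to be stored as a dictionary from $s$ to $\dnst$; the function name is hypothetical.

```python
def moment(n_of_s, n):
    """n-th moment <s^n> = sum_s s^n * s * n(s) / sum_s s * n(s).

    n_of_s: dict mapping cluster size s to the cluster number density n(s).
    """
    numerator = sum(s ** n * s * count for s, count in n_of_s.items())
    denominator = sum(s * count for s, count in n_of_s.items())
    return numerator / denominator
```

The extra factor $s$ in both sums reflects that $\PCA$ is a site-weighted distribution, so the zeroth moment is $1$ by construction.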
2166: 
2167: 
2168: \begin{figure}[t]
2169: \begin{center}
2170: \includegraphics[width=0.7\linewidth]{moment_scaling.eps}
2171: \caption{\flabel{moment_scaling}
2172: Scaling of the $n$th moments of $\PCA$ in double logarithmic plots. The
2173:  straight lines show the results of a fit as $\exp(a'_n) \theta^{-\sigma_n}$, see \eref{moment_scaling}.
2174: }
2175: \end{center}
2176: \end{figure}
2177: 
2178: 
2179: \begin{figure}[t]
2180: \begin{center}
2181: \includegraphics[width=0.7\linewidth]{moment_expos.eps}
2182: \caption{\flabel{moment_expos}
2183: Exponents $\sigma_n$ of the scaling of $\aves{s^n}$ in $\theta$
2184:  vs. $n$. The slope of this curve gives $\lambda$ and $\tau$ can be derived
2185:  from the offset. The straight, full line shows the results $\lambda=1.0808\dots$
2186:  and $\tau=2.0506\dots$, the dashed line shows $\lambda=1.0998\dots$ and
2187:  $\tau=2.0864\dots$ from a fit excluding $\INFL=64000$.
2188: }
2189: \end{center}
2190: \end{figure}
2191: 
Contrary to what is observed in the attempted data collapse, it turns
out that the moments follow this scaling behaviour
beautifully. \fref{moment_scaling} shows the scaling of the moments
2195: for $n=2,3,5,10$. By simply fitting the double logarithmic data to a
2196: straight line, i.e.
2197: \begin{equation}
2198:  \log(\aves{s^n}) = a'_n - \sigma_n \log( \theta)
2199: \end{equation}
2200: one can derive an estimate of the exponents
2201: $\sigma_n$ and in turn compare them to the expected linear
2202: behaviour:
2203: \begin{equation} \elabel{expo_scaling}
2204:  \sigma_n = \lambda (2+n-\tau) \quad .
2205: \end{equation}
2206: The resulting estimates, using $n=2,\dots,8$ and $\sigma_1=1$ from
2207: \eref{ave_s}, are $\lambda=1.0808\dots$ and $\tau=2.0506\dots$,
where no statistical error is given because the systematic error, due to
neglecting the lower cutoff as well as the corrections in
\eref{moment_scaling}, is expected to be much more important. By
2211: using the assumption $\sigma_1=1$, this result is consistent with
2212: \eref{scaling_relation_ltau}. The results are shown in
2213: \fref{moment_expos}.
2214: 
2215: The exponent found for $\tau$ is remarkably close to the accepted value
2216: of standard 2D percolation, $187/91=2.054945\dots$. However, if one
2217: leaves out the results for $\INFL=64000$, which seem to be a bit off the
2218: lines shown in \fref{moment_scaling}, one finds a slightly larger
value for the exponent, namely $\tau=2.0864\dots$ and
2220: $\lambda=1.0998\dots$. This is much closer to the $\Tast=2.10$ used above. For comparison
2221: to values found in the literature, see Tab.~\ref{tab:expos_literature}. 
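
The relation between the fitted straight line and the exponents can be
made explicit. The following is only an illustrative check using the
truncated values quoted above, so the last digits are not
meaningful. Writing \eref{expo_scaling} as
\begin{equation}
 \sigma_n = \lambda n + \lambda (2-\tau) \quad ,
\end{equation}
the slope of $\sigma_n$ versus $n$ is $\lambda$ and the offset at $n=0$
is $c=\lambda(2-\tau)$, so that $\tau = 2 - c/\lambda$. Evaluating the
fitted line at $n=1$ gives
\begin{equation}
 \lambda (3-\tau) \approx 1.0808 \times 0.9494 \approx 1.026
 \quad \text{and} \quad
 \lambda (3-\tau) \approx 1.0998 \times 0.9136 \approx 1.005
\end{equation}
for the fit including and excluding $\INFL=64000$, respectively, both
close to the imposed $\sigma_1=1$.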
2222: 
It is very remarkable that the resulting estimates for the exponents are
so impressively consistent, even though in section \ref{sec:clusterdist}
it turned out that the scaling assumption \eref{def_tau} does not
2226: actually hold; one would much rather expect a failure of the moments to
2227: comply with \eref{moment_scaling}, or a failure of the exponents to
2228: comply with \eref{expo_scaling}.  Apparently the moments are hiding
2229: the breakdown of simple scaling. Therefore it is interesting to analyse
2230: the behaviour of the presumably universal amplitude ratios, which are
2231: solely a property of the (presumed) scaling function.
2232: 
2233: Another explanation for the moments being well behaved is the
2234: following: According to \citep{JensenPruessner:2002b} one might expect the
2235: moments to behave like
2236: \begin{equation}
2237:  \int_1^{\theta^{-\xmin}} ds f(s) s^n + \int_{\theta^{-\xmin}}^\infty ds s^{n-\tau} \GC(s/\theta^{-\xmax})
2238: \end{equation}
where the first integral describes the behaviour up to the minimum, which
scales like $\theta^{-\xmin}$ ($\xmin\approx0.95$), and the second
integral the behaviour from the minimum on. Because \Fref{data_collapse}
already indicates that the scaling function $\GC$ does not collapse
using a scale $\theta^{-\xmax}$, this scaling does not work and can therefore be
only an approximation. While the first integral is bounded by
$\OC(\theta^{-\xmin(n+1)})$, the second integral gives $\OC(\theta^{-\xmax(1+n-\tau)})$
2246: asymptotically, which dominates the moments for
2247: $\xmin(n+1)<\xmax(1+n-\tau)$, which leads to $n>9.08$ using
2248: $\xmax\approx1.2$ and $\tau\approx2.1$. \Fref{moment_scaling} shows
2249: clearly a deviation from the straight line behaviour for $\INFL=64000$
2250: and $n=10$ and even for $n=5$. It remains unclear whether this is due to
2251: the effect discussed or simply a finite size problem. According to the
2252: findings presented in section \ref{sec:universal_amplitude_ratios} the
2253: latter might well be the case.
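
The threshold quoted above follows from solving the inequality for $n$,
shown here as a short worked step with the rounded values
$\xmin\approx0.95$, $\xmax\approx1.2$ and $\tau\approx2.1$:
\begin{equation}
 \xmin(n+1)<\xmax(1+n-\tau)
 \quad \Longleftrightarrow \quad
 n > \frac{\xmin + \xmax(\tau-1)}{\xmax-\xmin}
 \approx \frac{0.95 + 1.2 \times 1.1}{1.2-0.95}
 = \frac{2.27}{0.25} = 9.08 \quad ,
\end{equation}
where $\xmax>\xmin$ has been used. Given the small denominator, the
threshold is sensitive to the estimates of $\xmin$ and $\xmax$.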
2254: 
It is worth pointing out that the analysis in this section arrives
2256: at estimates for the critical exponents very close to those obtained by
2257: Pastor-Satorras and Vespignani \citep{Pastor-SatorrasVespignani:2000},
2258: who, however, allow for the corrections in \eref{moment_scaling} which
2259: were omitted above.
2260: 
2261: \begin{table}
2262: \caption{\label{tab:expos_literature} Exponents of the Forest Fire Model
2263: found in the literature. The first column indicates the source, the
2264: second column the method. $P(s)$ denotes a direct analysis of $\dnst$,
2265: which sometimes may have been just an estimate of the slope of $\dnst$
2266: rather than a data collapse. For details the original sources should be consulted. The
2267: entry ``moments'' refers to an analysis of the moments of $P(s)$, the
2268: entry ``theoretical'' to theoretical considerations regarding the
2269: relation of the Forest Fire Model to percolation.}
2270: \begin{ruledtabular}
2271: \newcolumntype{d}[1]{D{.}{.}{#1}}
2272: \begin{tabular}{l|c|d{6}|d{7}}
2273: reference & 
2274: method &
2275: \multicolumn{1}{c|}{$\tau$} &
2276: \multicolumn{1}{c}{$\lambda$} \\
2277: \hline 
2278: Christensen \etal          1993 \citep{Christensen:1993}                & $P(s)$                 & 2.16(5)    & - \\
2279: Henley                     1993 \citep{Henley:1993}                     & $P(s)$                 & 2.150(5)   & 1.167(15) \\
2280: Grassberger                1993 \citep{Grassberger:1993}                & $P(s)$                 & 2.15(2)    & 1.08(2)   \\
2281: Clar \etal                 1994 \citep{ClarDrosselSchwabl:1994}         & $P(s)$                 & 2.14(3)    & 1.15(3)   \\
2282: Honecker and Peschel       1997 \citep{HoneckerPeschel:1997}            & $P(s)$                 & 2.159(6)   & 1.17(2)   \\
2283: Pastor-Satorras Vespignani 2000 \citep{Pastor-SatorrasVespignani:2000}  & moments                & 2.08(1)    & 1.09(1)   \\
2284: Schenk \etal               2002 \citep{Schenk:2002}                     & theoretical and $P(s)$ & 2.45\dots & 1.1       \\
2285: Grassberger                2002 \citep{Grassberger:2002}                & $P(s)$                 & 2.11       & 1.08      \\
2286: \end{tabular}
2287: \end{ruledtabular}
2288: \end{table}
2289: 
2290: \subsubsection{Universal amplitude ratios} \label{sec:universal_amplitude_ratios}
2291: In general simple scaling involves two additional non-universal
2292: parameters $a$ and $b$,
2293: \begin{equation} \elabel{general_scaling}
2294:  \dnst=a s^{-\tau} \GC\left(\frac{s}{b \Scutoff}\right) \quad .
2295: \end{equation}
2296: For $1<\tau<2$ the lower cutoff becomes asymptotically
2297: irrelevant compared to the upper cutoff for all moments $n\ge 1$ ---
2298: indeed the effective $\tau$ of $s \dnst$ fulfils this condition as
2299: $2<\tau<3$ \citep{ClarDrosselSchwabl:1996}. Neglecting the lower cutoff then gives
2300: for the $n$th moment of $s \dnst$
2301: \begin{equation}
2302:  \aves{s^n} = a (b \Scutoff)^{1+n-\tau} g_n
2303: \end{equation}
2304: with
2305: \begin{equation}
2306:  g_n \equiv \int_0^\infty dx x^{1+n-\tau} \GC(x)
2307: \end{equation}
2308: In order to construct universal amplitude ratios, one needs to get rid
2309: of all exponents and parameters. This can be achieved by 
2310: considering
2311: \begin{equation} \elabel{uni_tmp1}
2312: \frac{\aves{s^n}}{(\aves{s^2})^{n/2}} =  \left( a \left(b \theta^{-\lambda} \right)^{(1-\tau)}\right)^{(1-n/2)}
2313:     \frac{g_n}{g_2^{n/2}} 
2314: \end{equation}
2315: and (\Eref{uni_tmp1} with $n=1$)
2316: \begin{equation} \elabel{uni_tmp2}
2317: \frac{\aves{s}}{\sqrt{\aves{s^2}}} = \left( a \left(b \theta^{-\lambda} \right)^{(1-\tau)}\right)^{1/2} 
2318:     \frac{g_1}{g_2^{1/2}} \ .
2319: \end{equation}
If one now multiplies \eref{uni_tmp1} by the $(n-2)$th power of
\eref{uni_tmp2}, everything cancels apart from the $g_n$:
2322: \begin{equation}
2323: \frac{\aves{s^n}}{(\aves{s^2})^{n/2}} \frac{(\aves{s})^{(n-2)}}{(\aves{s^2})^{(n-2)/2}} =
2324: \frac{g_n g_1^{n-2}}{g_2^{n-1}}
2325: \end{equation}
2326: It is worth noting that for a trivial case, where $\aves{s^n} \propto
2327: \aves{s}^n$, the effective exponent $\tau$ is necessarily unity and
2328: \eref{uni_tmp1} as well as \eref{uni_tmp2} are already independent
2329: of $\theta$.
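
For completeness, the cancellation can be verified by counting
exponents; this is merely a spelled-out version of the step above. Using
$\aves{s^n} = a (b \Scutoff)^{1+n-\tau} g_n$, the prefactor in
\eref{uni_tmp1} carries the factors
\begin{equation}
 a^{1-n/2} \quad \text{and} \quad
 (b\Scutoff)^{(1+n-\tau) - \frac{n}{2}(3-\tau)} = (b\Scutoff)^{(1-\tau)(1-n/2)} \quad ,
\end{equation}
and multiplying by the $(n-2)$th power of \eref{uni_tmp2} changes the
exponent of the common prefactor $a (b\Scutoff)^{1-\tau}$ to
\begin{equation}
 \left(1-\frac{n}{2}\right) + \frac{n-2}{2} = 0 \quad ,
\end{equation}
while the powers of $g_2$ in the denominator add up to
$n/2+(n-2)/2=n-1$.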
2330: 
2331: A further simplification is to impose $g_1=1$ and $g_2=1$, which fixes
2332: the two free parameters $a$ and $b$ in \eref{general_scaling}, so that
2333: \begin{equation} \elabel{def_g_n}
2334: g_n = \frac{\aves{s^n}\  (\aves{s})^{(n-2)}}{\Big(\aves{s^2}\Big)^{(n-1)}}
2335: \end{equation}
2336: for $n\ge1$. In \fref{uni_amp} this quantity is shown for
2337: $n=3,4,5,6$. Now, for $\INFL=64000$ a deviation is clearly visible ---
2338: in turn that means that $\INFL=64000$ requires at least systems of the
2339: size $L=64000$, which might explain the large value of $\rhobar$
2340: obtained in \citep{Grassberger:2002}. Apart from that, this analysis
2341: agrees with the result found in Sec.~\ref{sec:clusterdist}: The
2342: supposedly universal amplitude ratios keep changing with $\theta$ and an
2343: asymptote cannot be estimated, i.e. the scaling \eref{def_tau} is
2344: broken. 
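
As a quick consistency check of \eref{def_g_n} (a sketch, not an
additional result), the two lowest orders reduce identically to the
imposed normalisation, and $n=3$ is the first non-trivial ratio:
\begin{equation}
 g_1 = \frac{\aves{s}\, (\aves{s})^{-1}}{(\aves{s^2})^{0}} = 1 , \quad
 g_2 = \frac{\aves{s^2}\, (\aves{s})^{0}}{\aves{s^2}} = 1 , \quad
 g_3 = \frac{\aves{s^3}\, \aves{s}}{(\aves{s^2})^{2}} \quad .
\end{equation}
If simple scaling held, each $g_n$ would approach a constant for
sufficiently small $\theta$.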
2345: 
2346: \begin{figure}[t]
2347: \begin{center}
2348: \includegraphics[width=0.7\linewidth]{uni_amp.eps}
2349: \caption{\flabel{uni_amp}
2350: The supposedly universal amplitude ratio $g_n$ \eref{def_g_n} for
2351:  $n=3,4,5,6$. The error bars are based on a Jackknife scheme
2352:  \citep{Efron:82,SchLoiPru:2001} using a roughly estimated correlation time of
2353:  $50$, see Tab.~\ref{tab:corrtimes}.
2354: }
2355: \end{center}
2356: \end{figure}
2357: 
2358: 
2359: 
2360: \subsubsection{Burning time distribution}
2361: Another distribution of interest is the distribution of burning times,
$\PSF_{\manh}(\manh ; \theta)$. The sample size is comparatively small
for this quantity, as the burning time is defined only for the cluster
removed. However, it still seems to be large enough to allow us to make
a statement about the scaling behaviour. The rescaled data,
2366: $\PSF_{\manh}(\manh ; \theta) \manh^{b^\ast}$ with a trial exponent
2367: $b^\ast=1.24$ can be seen in \fref{thisto}. The intermediate part of the
2368: distribution between $\manh=4$ and the maximum seems to bend down as
2369: $\INFL$ increases, but the developing dip is much less pronounced than
2370: in \fref{scaling_function}. Nevertheless, the region where a data
2371: collapse seems possible moves out towards larger values of $\manh$,
2372: which again prohibits simple scaling. Assuming that the bending might
2373: become weaker for sufficiently large $\manh$ leads to a data collapse
2374: shown in \fref{thisto_collapse}, using an exponent $\nu'=0.6$ as defined
in \Eref{def_nu}. However, the data possibly collapse only for values of
$\manh \approx \manh_0$. Again, this violates the assumption of
2377: simple scaling, namely that there is a \emph{constant} lower cutoff
2378: above which the behaviour is universal. 
2379: 
2380: \begin{figure}[t]
2381: \begin{center}
2382: \includegraphics[width=0.7\linewidth]{thisto.eps}
2383: \caption{\flabel{thisto} 
2384: The rescaled probability distribution of the burning time,
2385:  $\PSF_{\manh}(\manh ; \theta)$. Similar to
2386:  \fref{scaling_function} a dip seems to form between the low
2387:  $\manh$ region and the maximum, which again renders a data collapse
2388:  impossible.
2389: }
2390: \end{center}
2391: \end{figure}
2392: 
2393: \begin{figure}[t]
2394: \begin{center}
2395: \includegraphics[width=0.7\linewidth]{thisto_collapse.eps}
2396: \caption{\flabel{thisto_collapse} 
Attempted data collapse for $\PSF_{\manh}(\manh ; \theta)$. The data
 actually seem to collapse only at the far end of the scaling function,
 at the descent from the maximum. This, however, is not sufficient for a
 data collapse. The big arrow points in the direction of increasing $\INFL$.
2401: }
2402: \end{center}
2403: \end{figure}
2404: 
2405: \begin{figure}[ht]
2406: \begin{center}
2407: \resizebox{7cm}{5cm}{
2408: \input{figure_ST_125.tex}
2409: }
2410: \resizebox{7cm}{5cm}{
2411: \input{figure_ST_8000.tex}
2412: }
2413: \caption{\flabel{psf}
2414: Binned density plots of $\PSF(s, \manh ; \theta)$ for different values
2415:  of $\theta$ on a double logarithmic scale. High densities are presented
2416:  as dark areas. For better presentation, $\PSF(s, \manh ; \theta)$ has
2417:  been multiplied by a factor $s^{1.7}$, tilting the distribution similar
2418:  to those shown in \fref{scaling_function}, so that the second
2419:  maxima in the distribution, those at large $s$ and $\manh$, are roughly
2420:  as high as the first maxima, i.e. they show in the plot as dark as
2421:  around $s=5$. Since $\PSF(s, \manh ; \theta)$ is a
2422:  histogram only of burnt clusters, it contains a factor $s$ compared to
2423:  $\dns$ (see discussion around \eref{def_rho}). Therefore, the
2424:  exponent $2.7$ needs to be compared to $\Tast=2.10$, indicating that
2425:  the width of $\PSF(s, \manh ; \theta)$ roughly scales like
2426:  $s^{0.6}$, so that the reduced height of $\PSF(s, \manh ; \theta)$ is
2427:  caused by an increase in width. This coincides well with the slope of the
2428:  distribution, as shown by a straight line. Thus, the relative width
2429:  remains roughly constant.  }
2430: \end{center}
2431: \end{figure}
2432: 
2433: 
2434: The only remaining exponent of those defined in
2435: sec.~\ref{sec:other_dists}, $\mu'$, relates the statistics of $s$ and
2436: $\manh$. It requires the bivariate distribution $\PSF(s, \manh ;
2437: \theta)$, as the exponent is derived from ${\mathsf E}(\manh | s)
2438: \propto s^{1/\mu'}$, 
2439: which is essentially equivalent to \Eref{Esmanh}. The distribution $\PSF(s, \manh ; \theta)$ is shown
2440: in \fref{psf}. At first glance the assumption of a power law
2441: dependence of $s$ and $\manh$ seems to be confirmed. Also the width of
2442: the distribution seems to be very small, with almost no change over $5$
2443: orders of magnitude in $s$. However, the plot is double logarithmic, so
2444: that the width roughly scales like the slope, which is about $0.6$, as
2445: shown by straight lines. This matches perfectly the exponent chosen to
2446: rescale $\PSF$ (see caption of \Fref{psf}).
2447: 
2448: \begin{figure}[t]
2449: \begin{center}
2450: \includegraphics[width=0.7\linewidth]{figure_Esmanh.eps}
2451: \caption{\flabel{Esmanh} 
${\mathsf E}(\manh | s ; \theta)$ and ${\mathsf E}(s | \manh ; \theta)$,
2453:  based on the binned histogram $\PSF(s, \manh ; \theta)$ for different
2454:  values of $\INFL$. The straight lines in the plots are $1.4 s^{0.615}$
2455:  for $\INFL=125$ (left hand plot) and $1.6 s^{0.57}$ for
2456:  $\INFL=8000$. The two dashed lines in the right hand plot show alternative
2457:  exponents $1/\mu'=0.7$ and $1/\mu'=0.53$, which are consistent with data either for
2458:  small values of $s$ or for large values.
2459: }
2460: \end{center}
2461: \end{figure}
2462: 
2463: By inspecting ${\mathsf E}(\manh | s; \theta)$ and ${\mathsf E}(s |
\manh; \theta)$ for various $\theta$, one can determine $\mu'$ from the slope
2465: in a double logarithmic plot. \fref{Esmanh} shows that $\mu'$
2466: remains ambiguous and deviations from the expected behaviour do not
2467: vanish as $\INFL$ is increased. Asymptotically one might expect $1/\mu' \approx 0.62$,
2468: while $(\Tast-2)/(b^\ast-1)$ suggests $1/\mu' \approx 0.417$. The value
2469: of $0.62$ is consistent with the rough estimate $0.6$ made in
2470: \fref{psf}. \fref{Esmanh} also shows two other exponents, $0.53$
2471: and $0.7$, the former being in line with the value found in literature
2472: of $0.529(8)$ \citep{ClarDrosselSchwabl:1994}.
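
The two numbers quoted above can be reproduced from the trial exponents
used in this section, $\Tast=2.10$ and $b^\ast=1.24$; this is only the
arithmetic made explicit:
\begin{equation}
 \frac{\Tast-2}{b^\ast-1} = \frac{0.10}{0.24} \approx 0.417 \quad ,
\end{equation}
whereas the rough estimate of the slope in \fref{psf} follows from
comparing the rescaling exponent with $\Tast$, i.e. $2.7-2.10=0.6$.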
2473: 
In conclusion, the other observable available in this
study, $\manh$, does not seem to provide an alternative way to ascribe
critical behaviour to the DS-FFM in the sense of the scaling behaviour
proposed in the literature.
2478: 
2479: 
2480: \subsection{Tree density as a function of time} \label{sec:tree_density}
2481: As mentioned above (see section~\ref{sec:finite_size_scaling}), the
2482: density of trees, $\rhobar$, is actually a function of time. Initially, it is
periodic around the average value, with an amplitude that depends mainly
on $\theta$. This amplitude decays in time and after sufficiently long
2485: times $\rho(t)$ looks like a random
2486: walk around $\bar{\rho}$.
2487: 
2488: \begin{figure}[t]
2489: \begin{center}
2490: \includegraphics[width=0.7\linewidth]{static_densities.eps}
2491: \caption{\flabel{static_densities}
2492: The density of trees as a function of time, plotted versus the rescaled
2493:  time $(1-\rhobar)t/(\theta \rhobar L^2)$. Upper panel: Plot for $\INFL=125$ and
2494:  $L=1000, 2000, 4000$ with an additional plot for $\INFL=500$ and
2495:  $L=4000$ shown as dashed line, for comparison of period and
2496:  amplitude. Lower panel: Same plot for $\INFL=500$ and $L=1000, 2000, 4000$.
2497: }
2498: \end{center}
2499: \end{figure}
2500: 
\fref{static_densities} illustrates how the period and the amplitude
depend on $\theta$ and $L$: The period is proportional to $\theta L^2$,
2503: while the amplitude mainly depends on $\theta$, i.e. the strength of the
2504: influx $\propto \INFL$. The reason for the
2505: former is easy to understand: $\INFL / L^2$ is proportional to the
2506: fraction of newly grown trees \citep{HoneckerPeschel:1997}; the change of
2507: the tree density is roughly
2508: \begin{equation}
2509:  \frac{d}{dt} \rho = \frac{1-\rho}{\rho} \frac{1}{\theta L^2} - \eta(\rho(t),t)
2510: \end{equation}
assuming that it hardly changes during the growing. Otherwise, one would
have to introduce a microscopic timescale, which would
make it possible to measure the tree density on the timescale on which the trees are grown. The
2514: pre-factor $(1-\rho)/\rho$ takes into account that only empty sites can
2515: be re-occupied and that an occupied site is required for the burning to
2516: start. The second term on the right hand side, $\eta(\rho(t),t)$, is a
2517: noise, which represents the burning of the trees. From this equation one
2518: can already expect that the period is roughly linear in $\theta L^2
2519: \rhobar/(1-\rhobar)$. This has already been measured in detail by
2520: Honecker and Peschel \citep{HoneckerPeschel:1997}; the numerical results
2521: presented here (\Fref{static_densities}) are fully consistent with their
2522: results.
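
The rescaling of the time axis in \fref{static_densities} can be read
off directly from the equation above. As a sketch that neglects the
noise term $\eta$ and evaluates the growth term at $\rho\approx\rhobar$,
the growth rate and the associated time scale $T$ (a symbol introduced
here only for illustration) are
\begin{equation}
 \frac{d}{dt}\rho \approx \frac{1-\rhobar}{\rhobar}\frac{1}{\theta L^2}
 \quad \Longrightarrow \quad
 T \propto \theta L^2 \frac{\rhobar}{1-\rhobar} \quad ,
\end{equation}
so that $t/T \propto (1-\rhobar)t/(\theta \rhobar L^2)$ is the rescaled
time used in the figure.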
2523: 
2524: Apart from the relevance of the periodic behaviour for the equilibration
2525: time, the periodic behaviour of $\rho(t)$ is physically of great
2526: significance: What distinguishes the state of the system for a given
2527: $\rho$ at the ascending and the descending branches? Trivially, the
2528: sequence of configurations of the system is Markovian, while the tree density alone
2529: as a time series is certainly not. The configuration somehow manages to
2530: ``remember'' whether the tree density was increasing or decreasing
2531: during the last update, in order to keep $\rho(t)$ periodic.
2532: 
2533: One explanation for this behaviour might be a ``growing-and-harvesting''
2534: concept: From the initially completely random tree distribution larger
2535: and larger patches are formed, so that larger and larger patches are
2536: harvested by lightning. When the density reaches the maximum, for a
2537: while the patches harvested remain large compared to the amount
2538: grown. This is because the growing process does not actually produce
2539: those large patches itself, but makes them available to the harvesting
2540: by continuously connecting smaller patches in areas, where the lightning
2541: has not yet struck. This process goes on, until almost all the trees are
2542: newly grown, i.e. the trees are distributed almost randomly, apart from
2543: the spatial correlation in density. The period of this process would be
2544: proportional to the time it takes to renew the entire system, which is
2545: $L^2 \theta \rhobar/(1-\rhobar)$, namely $L^2$ divided by $\aves{s}$,
2546: see \eref{aves_averho}.
2547: 
2548: The time-dependent tree density gives only a hint of what actually
2549: happens in the system. It would be very instructive to study the
two-point correlation function as a function of time to answer the
question whether the explanation above is actually valid.
2552: 
2553: \subsection{Discussion}
From the results presented above it becomes clear that the Forest Fire Model
does not show the scaling behaviour expected for a system which becomes
2556: critical in the appropriate limit (namely $L\to\infty$ and $\INFL \to
2557: \infty$). One might argue that 
2558: another scaling ansatz could lead to a distribution which is
2559: asymptotically scalefree in this limit, for
2560: example a multifractal ansatz \citep{TebaldiDeMenechStella:1999} or the
2561: one proposed in \citep{Schenk:2002}, where more than one scale is assumed
2562: to govern the model. For an asymptotically scalefree distribution,
the scales have to diverge or to vanish in the appropriate limit. It was
suggested very early on \citep{HoneckerPeschel:1997} that more than
one characteristic length scale can be found in the Forest Fire Model.
2566: 
2567: However, changing the scaling assumption would entail a new
2568: \emph{definition} of the exponents $\tau$, $D$ etc., which would
2569: therefore prohibit comparison with other results, which are based on the
2570: assumption of simple scaling \eref{def_tau}. Moreover, introducing
2571: multiple scales would stretch the notion of universality, especially the
2572: universality of the scaling function, to its limits. As can be seen in
2573: \fref{scaling_function}, the shape of the distribution function
2574: \emph{is not universal}, i.e. the shape
2575: of this function is unique for every single $\INFL$, even for $L\to\infty$. This is in
2576: direct contradiction to the concept of universality, scaling and scale
2577: invariance. 
2578: 
2579: However, it might be possible to reestablish simple scaling by
2580: introducing another mechanism in the model, as was done for example in
2581: the ``autoignition Forest Fire model'' \citep{Sinha-RayJensen:2000}. If
2582: there were, for example, a mechanism parameterized by $u$, such that
2583: \begin{equation}
 \dn(s; \theta, u) = s^{-\tau} \GC(s/\Scutoff(\theta, u))
2585: \end{equation}
then simple scaling might possibly be reestablished by choosing an
appropriate $u=u(\theta)$; even the cutoff, $\Scutoff$, which
was assumed to diverge with $\theta^{-1}$, would then effectively depend
only on $\theta$. Currently, there is no hint as to what this new parameter
$u$ could be. 
2591: 
Lise and Paczuski \citep{LisePaczuski:2001} suggested, for a
similar problem in the OFC model \citep{OlamiFederChristensen:1992},
defining an exponent $\tau$ by the slope of the distribution $\PCA(s)$,
requiring the remaining background, $\mathcal{F}(s, L, \INFL)$, to be as
straight as possible: 
2597: \begin{equation}
 \ln\left(\PCA(s)\right) = -\tau \ln(s) + \mathcal{F}(s, L, \INFL)
2599: \end{equation}
This ansatz, in fact based on a multiscaling ansatz,
would indeed allow the measurement of an exponent, albeit with some
degree of ambiguity. The crucial problem with this approach is that,
2603: firstly, it again does not allow any direct comparison to other models,
2604: where the exponents are defined via \eref{def_tau} and that, secondly, the
2605: notion of a presumably universal exponent hides the fact of broken
2606: scaling.
2607: 
From Section~\ref{sec:clusterdist} one might conclude that there does
not even exist a limiting distribution for $\dnst$. However, even if it
exists, that does not mean that simple scaling is obeyed, and if it does,
it is still open whether the exponents are non-trivial or not and
whether the model possesses any spatio-temporal correlations which do not
vanish on sufficiently large scales.
2614: 
2615: 
2616: \section{Summary}
2617: Using a new method for simulating the Forest Fire Model on large scales,
2618: it is possible to make clear statements about the validity of the
2619: scaling assumption of this model. The two observables investigated in
2620: this paper suggest the model does not develop into a scale invariant state.
2621: 
2622: The method is based on the Hoshen-Kopelman algorithm
2623: \citep{HoshenKopelman:1976} and uses a master/slave parallelisation
2624: scheme to simulate the model on very large scales and very large sample
2625: sizes. The key to the parallelisation is to decompose the lattice in
2626: strips and to encode the connectivity of these strips in the border
2627: sites. Clusters crossing these strips are then maintained by the master
2628: node, while clusters within a strip are maintained on the local
2629: nodes. There is almost no data exchange apart from the border
2630: configuration, which lowers the impact on the network linking the nodes.
2631: 
2632: The resulting distribution $\PCA(s)$ is, different from other
2633: simulations found in the literature, the distribution of \emph{all}
2634: clusters in the system, rather than just the burnt clusters. The
2635: resulting statistics then allows to draw clear conclusions as to what
2636: extend the model does actually obey the scaling assumption. This turns
2637: out not to be case. The violation of scaling is also observed in the
2638: distribution of the burning time. Conclusively we find that there is no
2639: reason to assume that the Drossel-Schwabl Forest Fire Model develops
2640: into a critical state. This is in line with the conclusion by
Grassberger \citep{Grassberger:2002}, who, however, still finds some signs
2642: that the Forest Fire Model will finally show some characteristics of
2643: standard percolation.
2644: 
2645: \begin{acknowledgments}
2646: The authors wish to thank Andy Thomas for his fantastic technical
2647: support. A great deal of the results in this paper was possible
2648: only because of his work. This paper partly relies on resources
2649: provided by the Imperial College Parallel Computing Centre. We
2650: want to thank especially K. M. Sephton for his support.
2651: 
2652: Another part of this work was possible only because of the
2653: generous donation made by ``I-D Media AG, Application Servers \&
2654: Distributed Applications Architectures, Berlin''. We especially
2655: thank M. Kaulke and O. Kilian for their support.
2656: 
2657: G.P. wishes to thank P. Grassberger, A. Honecker, I. Peschel and
2658: K. Schenk for very helpful communication, as well as P. Anderson
2659: and K. Dahlstedt for their advice.
2660: 
2661: The authors gratefully acknowledge the support of EPSRC.
2662: \end{acknowledgments}
2663: 
2664: \bibliography{articles,books}
2665: \end{document}
2666: 
2667: 
2668: