1: \documentclass[twocolumn,showpacs,preprintnumbers,amsmath,amssymb]{revtex4}
2:
3: \usepackage{graphicx}
4: \usepackage{dcolumn}
5: \usepackage{bm}
6:
7: \begin{document}
8:
9: \title{Parallelization of Cellular Automata
10: for Surface Reactions}
11:
12: \author{R. Salazar}
13: \email{R.Salazar@tue.nl}
14: \author{A.P.J. Jansen}
15: \email{tgtatj@chem.tue.nl}
16: \affiliation{
17: Schuit Institute of Catalysis (ST/SKA), Eindhoven University of
18: Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands.}
19:
20: \author{V.N. Kuzovkov}
21: \email{kuzovkov@latnet.lv}
22: \affiliation{
23: Institute for Solid State Physics, University of Latvia,
24: Kengaraga 8, LV-1063, Riga, Latvia.}
25:
26: \date{\today}
27:
28: \begin{abstract}
29: We present a parallel implementation of cellular automata
30: to simulate chemical reactions on surfaces. The scaling
31: of the computer time with the number of processors for this parallel
32: implementation is quite close to the ideal $T/P$, where $T$ is the computer
33: time used for one single processor and $P$ the number of processors. Two
34: examples are presented to test the algorithm, the simple A$+$B$\to0$ model
35: and a realistic model for CO oxidation on Pt$(110)$.
36: By using large parallel simulations, it is possible to derive
37: scaling laws which allow us to extrapolate to even larger system sizes
38: and faster diffusion coefficients allowing us to make direct comparisons
39: with experiments.
40: \end{abstract}
41:
42: \pacs{ 82.65.+r; 82.20.Wt; 02.70.Tt; 82.40.Np; 89.75.Da}
43:
44: \keywords{Parallel Cellular Automata, Heterogeneous catalysis,
45: Scaling laws}
46:
47: \maketitle
48:
49: \section{Introduction}
50:
One of the most interesting features of surface reactions
is that in a large number of cases they produce patterns:
structures with a well--defined length scale, sometimes with symmetries
and temporal behavior, such as oscillations, traveling waves,
spirals, Turing patterns, etc.\ \cite{rab2000,imb1995}.
The usual approach to studying this
pattern formation is to use reaction--diffusion (RD) equations \cite{wal1997},
which simulate the dynamic behavior of chemical reactions on surfaces.
However, these partial
differential equations give only approximate solutions and in
several cases completely wrong results, because they are based on
the local mean--field approximation: the reactants are assumed to be
well mixed at the microscopic level, which ignores local correlations between
reactants, fluctuations, and lateral interactions in the
adsorbate \cite{win2002}.
The RD equations describe the coverages, which are macroscopic continuum
variables that neglect the discrete structure of matter,
and they do not describe the actual chemical processes underlying pattern
formation.
In fact, it is known from experimental studies \cite{win2002} that
a correct model has to assume kinetics different
from those prescribed by the RD equations.
73:
Based on some general assumptions about the physical processes involved,
a master equation can be derived that completely describes the
microscopic dynamics of the system \cite{kam1981}.
77: An exact method to solve this master equation is the Monte Carlo
78: (MC) method \cite{gil1977,jan1995,luk1998,zve2001}. In a Monte Carlo method a
79: sequence of discrete events (reactions, including diffusion) is generated
80: on a 2D lattice which represents the surface sites. These events
81: are generated in general in a random way, looking for possible
82: enabled reactions between nearest neighbors on the lattice, and
83: doing the reactions with some probability in correspondence with
84: some defined reaction rates.
85:
To compare MC simulations with experimental
pattern formation it is necessary to bridge the gap between the length scale of
the individual particles and the diffusion length.
The regime where spatio--temporal patterns usually occur, from
$\mu$m to mm scales, is orders of magnitude
larger than the nm scale of individual particles.
However, some new experiments \cite{win1997,vol1999} show that the
fast kinetic processes
are typically accompanied by the appearance of nanostructures.
Only microscopic simulations can deal with this two--scale behavior.
However, this regime implies lattice sizes above $10^6 \times 10^6$
and very large values for the diffusion rates to produce agreement with
the experimentally observed scales of pattern formation.
This would be a very large and slow simulation, due to the huge number
of particles involved and the fast diffusion rates, which mean that most
of the simulation time is spent on diffusion of particles instead of
chemical reactions.
Fortunately, we do not need to simulate such a macroscopic system
or experimental diffusion rates every time. We only need to find scaling laws
for lengths and diffusion coefficients, i.e., from the nm scale to the $\mu$m
or mm scale, and from microscopic to real time. This is the possibility
that we explore in this paper by using large simulations on parallel
computers.
109:
Although only MC simulations provide solutions to the exact master
equation for the surface reactions, they are not suitable for efficient
parallelization due to the random selection of lattice sites used.
113: However, there is another important approach to simulate discrete
114: events on lattices, the Cellular Automata (CA)
115: \cite{tof1987,wei1997}. This approach is fully parallel
116: in the sense that all the lattice sites can be updated simultaneously.
117: This has the advantages that fewer random numbers are required,
118: only those for the reaction probabilities (CA codes are faster than MC),
119: and the global updating is easier to implement in a parallel code.
120:
Under some well--defined conditions a CA is equivalent to a MC
simulation (see ref. \cite{kor1998} and section II).
A CA can then be used in place of
a MC simulation, and its parallelization will provide the
required scaling laws.
In this paper we present, for the first time, an attempt to provide a tool
to obtain these scaling laws.
The fact that CA are ideal simulation
methods for parallelization is shown in
a recent special issue dedicated to CA in {\em Parallel Computing}
\cite{rev2000}.
In particular, a quite interesting paper by J.R.~Weimar \cite{wei2000}
presents an object--oriented parallel CA, also based on ref.
\cite{kor1998}, and explores the possibility of dividing the surface into
regions to be simulated by RD or CA according to the level of detail
required.
137:
138: In this paper we will describe in detail the implementation of the
139: parallel version of a CA simulating
140: chemical reactions on surfaces, but the ideas discussed here are
141: also applicable to more general CA simulations.
142: In section II we will describe briefly the Cellular Automata method.
143: In section III we explain the key points to implement the parallelization
144: of the algorithm. In section IV we will present results of the application
145: of the parallel algorithm to the A$+$B$\to0$ model, and to a realistic
146: model of oxidation of CO on Pt$(110)$.
Finally, in section V we will draw some conclusions.
148:
149:
150: \section{Cellular Automata}
151:
152: A cellular automaton (CA) is a regular array of cells.
153: Each cell can be in one of a set of possible states.
154: The CA evolves in time in discrete steps by changing the
155: states of all cells simultaneously.
156: The next state which a cell will take is based on the previous
157: state of the cell and the states of some neighboring cells.
158: A CA is defined by providing prescriptions for
159: the lattice, the set of states, the neighbors, and the
160: transition rules \cite{tof1987,wei1997}.
161: The idea of CA deserves by itself a lot of study, since
162: in general the evolution of a CA cannot be predicted other than
by executing it. Additionally, the number of possible CAs is
quite large. For instance, in the simplest one--dimensional case,
with two states and two neighbors, there are $256$ possible CAs.
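
As a quick check of this count, consider a one--dimensional rule in which
each cell sees itself and its two nearest neighbors. The rule must assign
one of the two states to each of the $2^3=8$ possible neighborhood
configurations, which gives
\[
2^{2^3} = 2^8 = 256
\]
distinct rules. By the same argument, $s$ states and a neighborhood of
$n$ cells (symbols introduced here only for illustration) give $s^{s^n}$
rules, a number that grows very quickly with both $s$ and $n$.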
166:
167: Our CA uses the standard square two--dimensional lattice
168: of size $L\times L$.
169: The states of each cell represent occupation with some kind of particle.
170: The Margolus neighborhood \cite{tof1987} is used
171: instead of the common von Neumann neighborhood. Both definitions of
neighborhood are shown in Fig.\ref{margolus}.
In order to obey the CA laws in the von Neumann neighborhood, it is
necessary \cite{cho1988} to disobey the laws of stoichiometry, because
one particle could participate in more than one reactive pair.
176: Similar problems arise with the diffusion of particles \cite{cho1991}.
177: The use of a Margolus neighborhood overcomes these difficulties
178: \cite{mai1991,mai1992,mai1992b,mai1993}.
179: Using the value of the chemical rates we set up some
180: probabilistic transition rules to change the states of the cells
181: inside each Margolus block.
182:
183: \begin{figure}
184: \includegraphics[width=6cm]{fig_1}
185: \caption{
186: \label{margolus}
187: Von Neumann neighborhood (left) and Margolus neighborhood (right).
188: The lines joining points show the nearest neighbor couples in each case.
189: }
190: \end{figure}
191:
192: The Margolus blocks, Fig.\ref{margolus}, are used in the following way.
193: The blocks are periodically repeated to build up a tiled mask over the
194: whole lattice, considering periodic boundary conditions.
195: In this way there are four possible tilings as shown in
196: Fig.\ref{tiling}.
197: Only neighbor sites belonging to the same
198: block can react, so all the blocks can be accessed in parallel.
199: Inside each block a Monte Carlo update is done. After a full lattice
200: update the whole procedure starts over again using another of the
201: four possible tilings. The dynamics is not confined to blocks,
202: because the boundary between blocks is changing from one global sweep
203: to the next.
204:
For the four tilings shown in Fig.\ref{tiling}, we choose
randomly one of the four possible sequences of tilings:
($1$,$2$,$3$,$4$), ($2$,$3$,$4$,$1$), ($3$,$4$,$1$,$2$), and
($4$,$1$,$2$,$3$).
They always follow the same clockwise cyclic sequence, but start at a
different tiling. This produces better boundary diffusion and mixing between
211: blocks.
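
The tiling scheme described above can be illustrated with a minimal
Python sketch. The block offsets assigned to tilings $1$ to $4$ are an
assumption made for this illustration (the text defines them only through
the figure), chosen so that $1 \to 2$ and $3 \to 4$ are vertical shifts.
The names \texttt{TILING\_OFFSETS}, \texttt{blocks} and
\texttt{random\_cyclic\_sequence} are hypothetical.

```python
import random

# Assumed (row, column) offsets of the block origin for tilings 1..4.
# This ordering is an illustration only: 1->2 and 3->4 shift the blocks
# vertically, 2->3 and 4->1 shift them horizontally.
TILING_OFFSETS = {1: (0, 0), 2: (1, 0), 3: (1, 1), 4: (0, 1)}


def blocks(L, tiling):
    """Yield the four sites of every 2x2 block of the given tiling on an
    L x L lattice with periodic boundaries (L is assumed to be even)."""
    dy, dx = TILING_OFFSETS[tiling]
    for y in range(dy, L + dy, 2):
        for x in range(dx, L + dx, 2):
            # Order: top-left, top-right, bottom-left, bottom-right.
            yield [((y + i) % L, (x + j) % L) for i in (0, 1) for j in (0, 1)]


def random_cyclic_sequence(rng):
    """Pick one of (1,2,3,4), (2,3,4,1), (3,4,1,2), (4,1,2,3) at random."""
    start = rng.randrange(4)
    return tuple((start + n) % 4 + 1 for n in range(4))
```

Every tiling covers each lattice site exactly once, which is what allows
all blocks of one tiling to be updated independently.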
212:
213: \begin{figure}
214: \includegraphics[width=8cm]{fig_2}
215: \caption{
216: \label{tiling}
217: The four possible tilings using Margolus blocks. The arrows show the
218: sequence order.
219: The lines joining points illustrate the MC updating of pairs
220: inside blocks.
221: }
222: \end{figure}
223:
A schematic description of the steps to implement this CA is given
in the following:
226: \begin{itemize}
227: \item[1. ] Choose randomly one of the four possible
228: sequences of tilings: ($1$,$2$,$3$,$4$), ($2$,$3$,$4$,$1$), ($3$,$4$,$1$,$2$),
229: ($4$,$1$,$2$,$3$).
230: \item[2. ] Choose consecutive tilings from the sequence of step 1.
231: \item[3. ] Sweep over all the blocks and in each block make a single
232: Monte Carlo update:
233: \begin{itemize}
234: \item[3.1. ] Choose randomly one pair of neighbors.
235: \item[3.2. ] Choose a reaction $i$ from the set of all the
236: possible reactions with a probability proportional to the reaction
237: rate $k_i$.
238: \item[3.3. ] Check if the reaction chosen in step 3.2 is possible on the
239: sites chosen in step 3.1. If it is possible do the reaction,
240: otherwise do nothing.
241: \end{itemize}
242: \item[4. ] Increase the time by $\Delta t/4$, and return to step 2 until
243: the sequence of tilings is completed.
244: \item[5. ] Return to step 1.
245: \end{itemize}
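
Steps 3.1 to 3.3 above can be sketched in Python as follows. This is an
illustration under stated assumptions, not the code used for the results
of this paper: the pair list \texttt{PAIRS}, the random orientation of
the pair, and the callback convention for \texttt{reactions} are
hypothetical choices.

```python
import random


def choose_reaction(rates, rng):
    """Step 3.2: return index i with probability k_i / sum of all k."""
    threshold = rng.random() * sum(rates)
    running = 0.0
    for i, k in enumerate(rates):
        running += k
        if threshold < running:
            return i
    return len(rates) - 1  # guard against floating-point round-off


# Assumed nearest-neighbor pairs inside a 2x2 block, as indices into the
# site list [top-left, top-right, bottom-left, bottom-right].
PAIRS = [(0, 1), (2, 3), (0, 2), (1, 3)]


def update_block(lattice, sites, rates, reactions, rng):
    """Steps 3.1-3.3 for one block.

    reactions[i](a, b) must return the new pair of states, or None if
    reaction i is not enabled for the ordered pair of states (a, b).
    """
    i, j = rng.choice(PAIRS)              # step 3.1: pick a pair
    if rng.random() < 0.5:                # assumed: random orientation
        i, j = j, i
    r = choose_reaction(rates, rng)       # step 3.2: pick a reaction
    (ya, xa), (yb, xb) = sites[i], sites[j]
    new = reactions[r](lattice[ya][xa], lattice[yb][xb])
    if new is not None:                   # step 3.3: apply if possible
        lattice[ya][xa], lattice[yb][xb] = new
```

A full sweep would call \texttt{update\_block} once for every block of
the current tiling and then advance the time by $\Delta t/4$.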
246:
247: The updating scheme inside each block is the same as in the
248: Dynamic Monte Carlo algorithm called Random Selection Method
249: \cite{luk1998,zve2001}.
250: Consequently, the time increment is the same as one Monte Carlo
251: Step (MCS) \cite{luk1998}:
252: \begin{equation}
253: \Delta t = \frac 1 {L^2 \sum_i k_i}
254: \end{equation}
255: where the sum in the denominator corresponds to the sum of all the
256: possible reaction rates.
257:
258: There is a large number of possible CA prescriptions which could try to
259: reproduce a MC simulation of chemical reactions on surfaces.
260: In fact, an extensive study made by J.~Mai
261: \cite{mai1991,mai1992,mai1992b,mai1993}
shows that in general it is difficult to produce good CAs reproducing
chemical reactions and diffusion.
However, in a recent paper Kortl\"uke \cite{kor1998} studies under
which conditions a CA can adequately reproduce a MC simulation
of chemical reactions on surfaces.
He found that the main requirement is the use of large diffusion coefficients.
Some sort of compromise between MC and CA is required:
it is necessary to use a regular array of blocks as in CA, and a MC update
scheme has to be used inside each block.
The CA which we use here is a small modification of that originally used
by Kortl\"uke \cite{kor1998}.
We use blocks of $4$ sites (Margolus blocks) instead of
$2$ sites (Hantels blocks).
The main advantage of using $4$--site instead of $2$--site blocks
is that it reduces the level of CA noise (see \cite{kor1998}) inside
each block.
This CA noise is the difference between the diffusion simulated with
CA and the correct diffusion simulated with MC.
In fact, the larger the blocks, the smaller the noise.
281:
Additionally, we present here a full realization of the CA parallel
computing idea by implementing it in a parallel code, as is shown in
the next section.
This is the main reason for using CA instead of MC as a solution for long--time
and large--size simulations. Merely using CA instead of MC in
a serial simulation does not speed up the simulation substantially, in
the best cases only by $\sim 10$--$20\%$. This means that an efficient
implementation of the CA is very important.
290:
Although we have to accept that our CA is an approximation to exact MC
microscopic simulations, we know when the approximation is good
and we can always check that the CA reproduces the MC results.
In this way we can think of the CA as the MC realization of the exact
master equation for the chemical processes, which opens
the possibility of obtaining the scaling laws mentioned in the introduction
for these physical systems. In this paper we do not discuss the
quality of the CA approximation to the MC realization. This was done in
\cite{kor1998}, and we have made a similar study for our modified CA with
similar results.
301:
The fact that blocks can be updated without referring to other
blocks allows us to run the CA as a parallel algorithm, because
304: the whole set of blocks can be updated at the same time.
305: This is the subject of the next section.
306:
307: \section{Parallelization}
308:
309: The aim of the parallelization of any code is to distribute the whole
310: simulation over several computer processors, also called nodes.
We define $speed(P)=1/T_P$, where $T_P$ is the computer time used by
$P$ processors to complete a simulation.
The optimum result is that the speed of the simulation is multiplied
by the number of nodes, $speed(P)= P \times speed(1)$.
However, the usual result of parallelizing a code is not the optimum.
The key to achieving a good speed up is that the time spent by
each node sending and receiving information to/from the other nodes is
small in comparison with the time spent computing in each node.
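
In terms of the definition above, the speed up and a corresponding
parallel efficiency can be written as follows. The symbols $S$ and $E$
are standard notions introduced here only for illustration; they are not
used elsewhere in this paper.
\[
S(P) = \frac{speed(P)}{speed(1)} = \frac{T_1}{T_P},
\qquad
E(P) = \frac{S(P)}{P}.
\]
The optimum result then corresponds to $S(P)=P$ and $E(P)=1$, i.e.,
$T_P = T_1/P$.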
319:
In order to distribute the work, we use a geometrical division,
as is usual \cite{rev2000} in spatially extended systems, i.e., the full
lattice is divided into sublattices of equal area.
The time spent sending data is proportional to the length of
the borders, $\sim L$, and the time spent computing is proportional
to the system size, $\sim L^2$.
We divide the lattice into strips as shown in Fig.\ref{strips}.
Considering the global periodic boundary conditions, each node has
periodicity in one direction and in the other direction has to share
information with only two neighbor nodes.
Another possibility is to divide the lattice into squares. This choice is less
convenient: each node has to share information with 4 nodes instead of 2,
and it is only possible to use a perfect square number of nodes,
$P=4,9,16,\dots$.
Using the strip sublattices requires sending a factor $4/\sqrt{P}$
less data than using
the square sublattices \cite{note2}, provided that $P \le 16$.
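
One way to read the factor quoted above is the following estimate, which
rests on two assumptions made only for this illustration: a node
exchanges one column of $L$ sites per communication step in the strip
geometry, and the four edges of an $(L/\sqrt{P})\times(L/\sqrt{P})$
square in the square geometry. Then
\[
\frac{\mbox{data (squares)}}{\mbox{data (strips)}}
\approx \frac{4L/\sqrt{P}}{L} = \frac{4}{\sqrt{P}},
\]
which equals $2$ for $P=4$ and $1$ for $P=16$, and drops below $1$ for
$P>16$. Under the same assumptions, a rough cost model
$T_P \approx aL^2/P + bL$, where $a$ and $b$ are hypothetical
machine--dependent constants for computing and communication, gives
\[
\frac{speed(P)}{speed(1)} = \frac{T_1}{T_P}
\approx \frac{aL^2}{aL^2/P + bL} = \frac{P}{1 + bP/(aL)},
\]
which approaches the optimum value $P$ when $L \gg bP/a$.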
337:
338: \begin{figure}
339: \includegraphics[width=8cm]{fig_3}
340: \caption{
341: \label{strips}
342: Distribution of the lattice in strips sublattices for
343: computing in each node. Division for each tiling
344: shown in Fig.\ref{tiling}.
345: }
346: \end{figure}
347:
Note that due to the periodicity in the vertical direction inside
each node, it is not necessary to exchange border information between nodes
when we pass from tiling $1 \to 2$ and $3 \to 4$. It is only necessary
when we pass from $2 \to 3$ and $4 \to 1$. This halves the
amount of communication data.
From Fig.\ref{strips} we see the direction in which the first column of
data in each sublattice has to
be sent and received in each node. In the change of tilings $2 \to 3$ the
data goes from left to right and in $4 \to 1$ from right to left.
Instead of shifting the data of each node horizontally we use the
first column for sending to or receiving from other nodes.
As a consequence, and to get interaction between sites in the
same block, we consider the first column to be a neighbor of the last one.
361: For this Blocking CA, where the reactions are confined inside the blocks,
362: this produces an effective periodicity in the horizontal direction.
363: This extra periodic boundary condition inside each sublattice,
364: makes the final parallel code very similar to a full lattice
365: implementation.
366:
We use the SPMD (single program, multiple data) model to implement the
parallel simulation, in which every node executes the same code on a
different sublattice.
Also, we use a main node, called node zero, which is additionally dedicated to
distributing and collecting the global information needed for the input/output
of the simulation and to other global computations required.
Every node executes the same code, and
when some special code has to be
executed by only some nodes, it is necessary to use a node
identifier number $p$. From the global periodic boundary conditions and
the shape of the sublattices, there is periodicity also in the sequence of
nodes: the node to the left of node $p=0$ is node $p=P-1$ ($P$ is
the total number of nodes) and the node to the right of node $p=P-1$
is node $p=0$.
381:
382: In the following we present the CA code for a single node.
383: \begin{itemize}
\item[1. ] $p=0$: Choose randomly one of the four possible tiling
sequences: ($1$,$2$,$3$,$4$), ($2$,$3$,$4$,$1$), ($3$,$4$,$1$,$2$),
($4$,$1$,$2$,$3$). Send that choice to the other nodes.\\
$p>0$: Receive the information of which tiling sequence has been
chosen.
389: \item[2. ] $\forall p$: Choose consecutive tilings from the sequence
390: of step 1.
391: \item[N1. ] $\forall p$: If the previous tiling was $1$ or $2$ and
392: the new tiling is $3$ or $4$ then \\
393: send the first column to the left node $p-1$, \\
394: receive the data from right node $p+1$ and put it in the first column.
395: \item[N2. ] $\forall p$: If the previous tiling was $3$ or $4$ and
396: the new tiling is $1$ or $2$ then \\
397: send the first column to the right node $p+1$ \\
398: receive the data from left node $p-1$ and put it in the first column.
399: \item[3. ] $\forall p$: Sweep over all the blocks and in each block make a
400: single Monte Carlo update.
401: \item[4. ] $\forall p$: Increase the time by $\Delta t/4$, and return to
402: step 2 until the tiling sequence is completed.
403: \item[5. ] $\forall p$: Return to step 1.
404: \end{itemize}
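
The effect of steps N1 and N2 can be checked with a small serial Python
sketch that mimics the ring of nodes without any message passing. It is
an illustration only: each strip is stored as a list of columns, every
column is represented just by its global index, and the function names
are hypothetical.

```python
def make_strips(L, P):
    """Split global columns 0..L-1 into P strips of equal width.
    strips[p][c] is column c of node p, represented here only by its
    global column index (L is assumed to be a multiple of P)."""
    W = L // P
    return [[p * W + c for c in range(W)] for p in range(P)]


def exchange_N1(strips):
    """Step N1 on all nodes at once: every node sends its first column
    to the left node p-1 and stores the column received from the right
    node p+1 in its own first column (periodic ring of P nodes)."""
    P = len(strips)
    sent = [strip[0] for strip in strips]   # snapshot before overwriting
    for p in range(P):
        strips[p][0] = sent[(p + 1) % P]


def exchange_N2(strips):
    """Step N2: every node sends its first column to the right node p+1
    and stores the column received from the left node p-1."""
    P = len(strips)
    sent = [strip[0] for strip in strips]
    for p in range(P):
        strips[p][0] = sent[(p - 1) % P]
```

After \texttt{exchange\_N1}, the last and the first local columns of
node $p$ hold two globally adjacent columns, so the blocks that straddle
the strip border can be updated locally if the first column is treated as
a neighbor of the last one. A subsequent \texttt{exchange\_N2} restores
the original layout.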
405:
406: We have modified step $1$ and added two new steps $N1$ and $N2$, to
407: interchange information between the nodes. The rest is basically the same as
408: the single processor code, but applied to the respective sublattices for each
409: node. In order to avoid undesired correlations between different sublattices,
special attention should be given to a good random number generator,
producing a different sequence of random numbers within each node
412: \cite{knu1998}.
413:
For the implementation of the interprocess communication and synchronization,
the Message Passing Interface (MPI) library has been selected,
because it provides source portability to different kinds of computers.
This library provides, amongst a set of specialized and complete communication
routines, a so--called basic set of six routines for interprocessor
communication:
420: initialization, termination, getting the set of nodes, getting its own node
421: number, sending data to other nodes, and receiving data from other nodes.
422:
There are computers whose nodes are connected by gigabit Ethernet, or
which have several processors inside the same machine sharing the same
memory.
These computers represent optimal environments to test parallel
algorithms, e.g., high--end supercomputers like the Cray T3E and middle--end
supercomputers like the Silicon Graphics Origin 2000.
However, we will show in the next section that running
this parallel CA algorithm on a low--end Beowulf cluster of PCs connected
only via fast--ethernet already produces almost the ideal speed up.
432:
433: \section{Results}
434:
The parallel algorithm was tested on our local cluster of PCs
(17 Athlon 1.1~GHz/256~MB, fast--ethernet, Linux 2.4.18, MPICH 1.1.2).
437: From the results we see that the improvement of the performance
438: using $P$ processors with respect to a single processor is almost the
439: ideal, $speed(P) = P \times speed(1)$.
440: In order to test the speed up of the algorithm, we will use two
441: systems from surface catalysis:
442: the A$+$B$\to0$ model and a model for CO oxidation on Pt$(110)$ surface
443: \cite{kuz1998}, which has produced several important results
444: \cite{kor1998b,kor1999,kuz1999,kor1999b}.
445:
The A$+$B$\to0$ model has been studied for a long time
\cite{ovc1978,kuz1988,kot1996,pri1997,mar1999,arg2001}.
In the pioneering analytical paper by Ovchinnikov and Zeldovich
\cite{ovc1978} it was shown for the first time that the kinetic law of
mass action is violated in this model, producing incorrect results
when standard chemical kinetics is used. An illustrative case, also used
in our paper, is a situation with equal concentrations of
both reactants $C_A=C_B=C$, where the standard kinetics predicts
an asymptotic behavior $C \propto t^{-1}$. This prediction corresponds
to the mean--field approximation and it is only valid for high--dimensional
systems \cite{ovc1978}, ${\cal D}\ge 4$.
In low--dimensional systems, ${\cal D}<4$, with diffusion--controlled
processes it has been proved, using renormalization group
arguments \cite{lee1995}, that the correct asymptotic behavior is
$C \propto t^{-{\cal D}/4}$. A qualitative agreement with this behavior
was shown for the first time using MC simulations in reference \cite{tou1983}.
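
For reference, the mean--field prediction quoted above follows directly
from the law of mass action with equal concentrations. Writing $K$ for an
effective rate constant and $C_0$ for the initial concentration (both
symbols are introduced here only for illustration),
\[
\frac{dC}{dt} = -K C^2
\quad\Longrightarrow\quad
C(t) = \frac{C_0}{1 + K C_0 t} \propto t^{-1}
\quad (t \to \infty).
\]
For a two--dimensional surface lattice, ${\cal D}=2$, the correct law
$C \propto t^{-{\cal D}/4}$ gives instead $C \propto t^{-1/2}$, a much
slower decay.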
462:
Observing the asymptotic law requires a large simulation time
$t_{max}$. The diffusion length $\xi(t)=\sqrt{Dt}$ defines the pattern
formation scale. A simulation until $t=t_{max}$ needs a lattice
length $L \gg \xi(t_{max})$, which corresponds to a large
computer time of the order of $t_{max}L^2 \sim t_{max}^2$.
468: This case provides a good example of large--time and
469: large--size systems with pattern formation to test our
470: parallel algorithm described in the previous section.
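
The estimate above can be spelled out in one line. The number of site
updates is proportional to $t_{max}L^2$, and the condition
$L \gg \xi(t_{max})$ means $L^2 \gg D\,t_{max}$, so that
\[
t_{max} L^2 \gg D\, t_{max}^2 .
\]
As a rough illustration, for $D=1$ and $L=8192$ one has
$L^2 \approx 6.7\times 10^{7}$, so the diffusion length stays well below
the lattice length as long as $t_{max} \ll 6.7\times 10^{7}$.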
471:
472: In the corresponding lattice model we consider two kinds of
473: particles A and B.
The only possible chemical reaction is desorption of AB, which happens
when two particles A and B are next to each other, creating two
empty sites. This process occurs with a rate constant $k$.
Additionally, the particles are allowed to diffuse with
rate $D$. This happens when a particle is next to an empty site.
479: We simulate the behavior produced for an initial condition without
480: empty sites and where the same number of A and B particles are
481: initially randomly distributed on the surface: $N_A=N_B=L^2/2$.
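
The lattice model just described can be sketched in Python as follows.
This is a minimal illustration, not the code used for the results of
this paper. The state encoding, the function names, and the convention
that diffusion is tried only from the first site of an ordered pair onto
the second (the caller is assumed to randomize the orientation) are
assumptions made here.

```python
import random

EMPTY, A, B = 0, 1, 2  # assumed encoding of the site states


def initial_lattice(L, rng):
    """Full L x L lattice (L even) with exactly L*L/2 particles of each
    kind, randomly distributed and without empty sites."""
    cells = [A] * (L * L // 2) + [B] * (L * L // 2)
    rng.shuffle(cells)
    return [cells[y * L:(y + 1) * L] for y in range(L)]


def react(a, b):
    """A + B -> 0: a neighboring A-B pair leaves two empty sites."""
    if {a, b} == {A, B}:
        return EMPTY, EMPTY
    return None  # reaction not enabled for this pair


def diffuse(a, b):
    """A particle on the first site hops onto an empty second site."""
    if a != EMPTY and b == EMPTY:
        return EMPTY, a
    return None  # diffusion not enabled for this ordered pair


def attempt(a, b, k, D, rng):
    """Choose reaction or diffusion with probability proportional to its
    rate and apply it to the ordered pair (a, b) if it is enabled."""
    rule = react if rng.random() * (k + D) < k else diffuse
    new = rule(a, b)
    return new if new is not None else (a, b)
```

Since every reaction event removes one A and one B, the condition
$N_A=N_B$ of the initial state is preserved at all later times.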
482:
483: \begin{figure}
484: \includegraphics[width=8cm]{fig_4}
485: \caption{
486: \label{fig1}
487: Temporal behavior of the concentration of particles for the
488: A$+$B$\to0$ model starting with randomly mixed A and B.
489: The solid line is the result from the simulation using $16$
490: processors and the dots are from using a single processor.
491: The $t^{-1/2}$ curve shows the asymptotic behavior.
492: The two snapshots show the system at different times.
493: }
494: \end{figure}
495:
496: In Fig.\ref{fig1} we present the temporal behavior and also illustrate
497: the segregation process forming regions with high
498: concentration of A or B, which increase in size with time.
499: Also we show how the global concentration, $C=N_A/L^2=N_B/L^2$,
500: diminishes with time following the asymptotic power--law
501: $C \sim t^{-\frac 1 2}$.
502: The system size used is $L=8192$ and the parameters are $k=D=1$.
503: We present two sets of data in Fig.\ref{fig1}.
504: The points correspond to a single
505: processor simulation, and the solid line corresponds to a parallel simulation
506: using $16$ processors. Both simulations start with identical random
507: initial distribution of particles. It is noticeable that this initial
508: distribution almost completely determines the following behavior of the system.
The snapshots shown as insets in Fig.\ref{fig1} are from the
parallel simulation. They are very similar to the ones obtained from the single
processor simulation, which uses the same initial conditions but
a different sequence of random numbers to simulate the dynamics.
Moreover, in order to check quantitatively the agreement of the
spatial structures
between the single and parallel simulations, we present in Fig.\ref{fig2}
the radial correlation function. Again the points correspond
to the single processor case and the solid line to the parallel case.
A correlation length $r_c$ can be obtained by fitting these correlation
functions with $\exp[-(r/r_c)^2]$, which we show in Fig.\ref{fig2}
by dashed lines. The obtained values of $r_c$ are plotted in the
inset. This shows that the correlation length for these dynamics also
follows a power law $r_c \sim t^{\alpha}$. By numerical fitting we
obtain $\alpha = 0.5122 \pm 0.012$.
An analytical asymptotic solution for these correlation functions
is given in \cite{kot1996}, $\exp(-r^2/4Dt)$ or $\exp[-c(r/\xi(t))^2]$.
The value obtained for $\alpha$ means that the diffusion length
$\xi(t)=\sqrt{Dt}$ defines the scale of pattern formation for this
reaction model.
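
The expected exponent can be read off by comparing the fitting function
with the analytical form quoted above, assuming that the latter applies
without further corrections:
\[
\exp\left[-\left(\frac{r}{r_c}\right)^2\right]
= \exp\left(-\frac{r^2}{4Dt}\right)
\quad\Longrightarrow\quad
r_c(t) = 2\sqrt{Dt} = 2\,\xi(t) \propto t^{1/2}.
\]
This corresponds to $\alpha = 1/2$ and to $c=1/4$ in the second form,
and the fitted value of $\alpha$ is close to it.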
529:
530: \begin{figure}
531: \includegraphics[width=8cm]{fig_5}
532: \caption{
533: \label{fig2}
The radial correlation function for the A$+$B$\to 0$ model
at the same times as the snapshots in Fig.\ref{fig1}.
The lines are the $16$ processor results and the dots the
single processor results.
The dashed lines are fits with $\exp[-(r/r_c)^2]$.
The respective values $r_c(t)$ are plotted in the inset.
540: }
541: \end{figure}
542:
543: The performance or speed up of the parallel algorithm
544: for the A$+$B$\to 0$ model is shown in Fig.\ref{fig3},
545: using different system sizes $L=256,512,1024,4096,8192$
546: and number of processors $P=1,2,4,8,16$.
547: The simulated time for each computation was set to $t_{max}=500$.
548: The speed was normalized to the speed of the single processor case.
In the inset we can see the behavior of the single processor speed
for each system size.
The advantage of using a large number of processors increases when the
system size increases, as we expect from the discussion in the previous
section.
554:
555: \begin{figure}
556: \includegraphics[width=8cm]{fig_6}
557: \caption{
558: \label{fig3}
559: The speed of the simulation using $P$ processors normalized to the
560: speed of one single processor for the A$+$B$\to 0$ model.
561: Different system sizes
562: $L=256,512,1024,4096,8192$ and number of processors $P=1,2,4,8,16$.
The inset shows the behavior of the speed of one single processor.
564: }
565: \end{figure}
566:
567: The second model used to test the parallel algorithm is a model
568: for CO oxidation on Pt$(100)$ and Pt$(110)$ surfaces
569: \cite{kuz1998,kor1998b,kor1999,kuz1999,kor1999b}.
570: This system shows different types of kinetic oscillations.
571: On Pt$(100)$ local, irregular oscillations occur in a wide parameter
572: interval, whereas on Pt$(110)$ globally synchronized oscillations exist
573: only in a very narrow parameter interval. Both surfaces exhibit an
574: $\alpha \rightleftharpoons \beta$ surface reconstruction, where $\alpha$
575: denotes the {\it hex} or $1 \times 2$ phase on Pt$(100)$ or Pt$(110)$,
576: respectively. $\beta$ denotes the unreconstructed $1 \times 1$ phase in
577: both cases. Both surfaces have qualitatively quite similar properties
578: with the exception of the dissociative adsorption of O$_2$. The ratio
579: of the sticking coefficients of O$_2$ on the two phases is
580: $s_{\alpha}:s_{\beta} \approx 0.5:1$ for Pt$(110)$ and
581: $s_{\alpha}:s_{\beta} \approx 10^{-2}:1$ for Pt$(100)$ \cite{imb1995}.
582: From the experiments \cite{imb1995} it is known that kinetic
583: oscillations are closely connected with the
584: $\alpha \rightleftharpoons \beta$ reconstruction of the Pt surfaces.
585:
586: In the model \cite{kuz1998,kor1998b,kor1999,kuz1999,kor1999b},
CO is able to adsorb onto a free surface
588: site with rate constant $y$ and to desorb from the surface with rate
589: constant $k$,
590: independent of the surface phase to which the site belongs.
591: O$_2$ adsorbs dissociatively onto two nearest neighbor sites with
592: rate constant $(1-y)s_{\chi}$ with $\chi = \alpha$,$\beta$.
593: In addition CO is able to diffuse via hopping onto a
594: vacant nearest neighbor site with rate constant $D$. The CO$+$O reaction
occurs with rate constant $R$ when CO and O are on nearest neighbor sites,
and the reaction product CO$_2$ desorbs.
597: The $\alpha \rightleftharpoons \beta$ surface phase transition is modeled as a
598: linear front propagation induced by the presence of CO in the border
599: between phases with rate constant $V$.
600: Consider two nearest neighbor surface sites in the state $\alpha \beta$.
601: The transition $\alpha \beta \to \alpha \alpha$
602: ($\alpha \beta \to \beta \beta$) occurs if none (at least one) of
603: these two sites is occupied by CO.
604: Summarizing the above transition definitions written in the more usual
605: form of reaction equations gives:
606: \begin{eqnarray*}
607: && \mbox{CO(g)} + S^{\chi} \rightleftharpoons \mbox{CO(a)}, \\
608: && \mbox{O$_2$(g)} + 2S^{\alpha} \to 2\mbox{O(a)}, \\
609: && \mbox{O$_2$(g)} + 2S^{\beta} \to 2\mbox{O(a)}, \\
610: && \mbox{CO(a)} + S^{\chi} \to S^{\chi} + \mbox{CO(a)}, \\
611: && \mbox{CO(a)} + \mbox{O(a)} \to \mbox{CO$_2$(g)} + 2S^{\chi}, \\
612: && S^{\alpha} \rightleftharpoons S^{\beta},
613: \end{eqnarray*}
614: where $S$ stands for a free adsorption site, $\chi$ stands for either
615: $\alpha$ or $\beta$ and (a) or (g) for a particle adsorbed on the
surface or in the gas phase, respectively. For additional
details see Ref.~\cite{kuz1998}.
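
The front propagation rule stated above can also be written compactly.
As a sketch, let $n_{\rm CO} \in \{0,1,2\}$ denote the number of CO
particles on a nearest neighbor pair of sites in the state $\alpha \beta$;
this symbol is introduced here only for illustration and is not part of
the original notation. The pair then changes with rate constant $V$
according to
\[
\alpha \beta \to
\left\{
\begin{array}{ll}
\alpha \alpha, & n_{\rm CO} = 0, \\
\beta \beta, & n_{\rm CO} \geq 1,
\end{array}
\right.
\]
so that CO at the border favors the growth of the $\beta$ phase,
while a CO--free border favors the growth of the $\alpha$ phase.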
618:
Among the successful results of this model, we mention
that it was one of the first microscopic models for CO oxidation on Pt
to include surface
reconstruction, which is nowadays widely accepted as the key element
for obtaining oscillatory behavior. The model correctly reproduces the
oscillatory regimes for both surfaces, Pt$(100)$ and Pt$(110)$, by
changing only one parameter, $s_{\alpha}$.
The diffusion of CO is considered explicitly, and the model can be applied to
the fast diffusion regime without modification.
With this model an alternative mechanism for the
global synchronization of oscillations has been suggested
\cite{kor1999}, different from the traditional
gas--phase coupling. This new mechanism is stochastic resonance,
obtained by including a spontaneous nucleation of one surface phase
in the other, $\alpha \rightleftharpoons \beta$, at very low rates.
One unique result reproducing experimental observations is
the transition into chaotic behavior via the Feigenbaum route,
or period doubling \cite{kor1998b}.
It is also for this model that the compatibility of the two microscopic
simulation methods, MC and CA, has been studied extensively \cite{kor1998}.
639:
640: \begin{figure}
641: \includegraphics[width=8cm]{fig_7a}
642: \includegraphics[width=8cm]{fig_7b}
643: \includegraphics[width=8cm]{fig_7c}
644: \includegraphics[width=8cm]{fig_7d}
645: \caption{
646: \label{fig5}
647: Sequence of snapshots of the model of CO oxidation on Pt$(110)$.
648: The left part shows the chemical species: CO particles are dark--grey,
649: O particles are light--grey, and empty sites are black. The right part
650: shows the structure of the surface: $\alpha$ phase sites are black, and
651: $\beta$ phase sites are white.
The parameters are $L=8192$, $D=250$, $V=1$, $y=0.494$, $k=0.1$, and $R=D$.
From top to bottom, we show sections from the upper--left corner
with sizes $4096 \times 4096$, $1024 \times 1024$, and $256 \times 256$.
655: }
656: \end{figure}
657:
In Fig.~\ref{fig5} we show snapshots from a simulation
on Pt$(110)$ with system size $L=8192$ and parameter values
$D=250$, $V=1$, $y=0.494$, $k=0.1$, and $R=D$. In the left part
we plot the chemical species: CO particles are dark--grey,
O particles are light--grey, and empty sites are black.
The right part shows the structure of the surface: $\alpha$ phase
sites are black, and $\beta$ phase sites are white.
The pattern formation in this regime shows spatio--temporal behavior
in which spiral dynamics is the dominant phenomenon.
It is interesting to see the different
structures at different spatial scales. For this purpose
we include in Fig.~\ref{fig5} sections from the upper--left corner
with sizes $4096 \times 4096$, $1024 \times 1024$, and $256 \times 256$,
from top to bottom.
This sequence shows that the spiral dynamics occurs on a
slowly varying island structure with sizes of the order of $\sqrt{D/V}$.
The fact that we can see both mesoscopic and microscopic pattern formation
is a quite interesting feature of the model, which has some experimental
evidence \cite{win1997,vol1999} and has been studied theoretically in
Ref.~\cite{hil2002} by including lateral interactions between adsorbed particles.
The model used here is simpler, because it does not need that consideration
in order to obtain nanostructures.
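
As a rough numerical illustration of this length scale, assuming a
prefactor of order unity (which the estimate $\sqrt{D/V}$ does not fix),
the parameters of Fig.~\ref{fig5} give
\[
\sqrt{D/V} = \sqrt{250/1} \approx 15.8
\]
lattice sites. Under this assumption the smallest section shown,
$256 \times 256$, spans roughly $256/15.8 \approx 16$ such lengths per
side, whereas the full lattice with $L=8192$ spans about
$8192/15.8 \approx 5.2 \times 10^{2}$.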
680:
681: \begin{figure}
682: \includegraphics[width=8cm]{fig_8}
683: \caption{
684: \label{fig4}
The same as Fig.~\ref{fig3}, but for the model of CO
686: oxidation on Pt$(110)$.
687: }
688: \end{figure}
689:
In Fig.~\ref{fig4} we analyze the speed up of the parallel
algorithm for this realistic model. We use system sizes
$L=256,512,1024,4096,8192$ and numbers of nodes $P=1,2,4,8,16$.
The simulated time for each computation was also set to $t_{max}=500$.
The speed up of the parallel algorithm is
good even for small system sizes.
This is because the amount of computation
on each node is larger for this model than for the A$+$B$\to0$ model,
while the amount of communicated data is the same in both models.
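
This argument can be made explicit with a simple cost model. It is only
a sketch: the symbols $T_{\rm comp}$ and $T_{\rm comm}$ are introduced
here for illustration and are not measured quantities. Suppose that the
computing time $T_{\rm comp}$ of the single--processor run divides evenly
over $P$ nodes, and that each node additionally spends a time
$T_{\rm comm}$ on communication, so that $T(1) \approx T_{\rm comp}$. Then
\begin{eqnarray*}
&& T(P) \approx \frac{T_{\rm comp}}{P} + T_{\rm comm}, \\
&& \frac{T(1)}{P \, T(P)} \approx
\frac{1}{1 + P \, T_{\rm comm}/T_{\rm comp}}.
\end{eqnarray*}
The second expression, the parallel efficiency, approaches the ideal
value $1$ when $P \, T_{\rm comm}/T_{\rm comp}$ is small. Within this
sketch, a model with more computation per site and the same communication
cost stays closer to the ideal $T/P$ scaling at the same $L$ and $P$.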
699:
700: \section{Conclusions}
701:
702: In this paper we present a tool to obtain scaling laws
703: connecting experimental system sizes and diffusion coefficients to
704: standard values in microscopic MC simulations. By using a CA
705: equivalent to MC simulation we provide an efficient parallelization
706: algorithm. We have explained in detail how to implement the parallelization.
The speed up of the algorithm is close to ideal, and it improves
for larger system sizes and more complex models.
709: A full description and analysis of the scaling laws for the second
710: model used here is in preparation \cite{sal2002}.
711:
712: \begin{acknowledgments}
713: We thank J.J.~Lukkien and S.~Nedea for stimulating discussions.
This work was supported by the Nederlandse Organisatie voor
Wetenschappelijk Onderzoek (NWO), and the EC Excellence Center of
Advanced Material Research and Technology (contract N 1CA1-CT-2080-7007).
717: We thank the National Research School Combination Catalysis (NRSCC)
718: for computational facilities.
719: \end{acknowledgments}
720:
721: \begin{thebibliography}{3}
722:
723: \bibitem{rab2000} M.I.~Rabinovich, A.B.~Ezersky, P.D.~Weidman,
\newblock{\it The dynamics of patterns}. World Scientific, Singapore, (2000).
725:
726: \bibitem{imb1995} R.~Imbihl, G.~Ertl,
727: \newblock{\em Chem. Rev.} {\bf 95}, 697 (1995).
728:
729: \bibitem{wal1997} D.~Walgraef, \newblock{\it Spatio--Temporal Pattern
730: Formation: With Examples from Physics, Chemistry, and Materials Science}.
731: Springer Verlag, Berlin, (1997).
732:
733: \bibitem{win2002} J.~Wintterlin, \newblock{\em Chaos}, {\bf 12}, 108 (2002).
734:
735: \bibitem{kam1981} N.G.~van Kampen \newblock{\it Stochastic Processes
736: in Physics and Chemistry}. North--Holland, Amsterdam, (1981).
737:
738: \bibitem{gil1977} D.T.~Gillespie, \newblock{\em J. Phys. Chem.}
739: {\bf 81}, 2340 (1977).
740:
741: \bibitem{jan1995} A.P.J.~Jansen, \newblock{\em Comp. Phys. Comm.}
742: {\bf 86}, 1 (1995).
743:
744: \bibitem{luk1998} J.J.~Lukkien, J.P.L~Segers, P.A.J.~Hilbers, R.J.~Gelten, A.P.J.~Jansen, \newblock{\em Phys. Rev. E}
745: {\bf 58}, 2598 (1998).
746:
\bibitem{zve2001} G.~Zvejnieks, V.N.~Kuzovkov, \newblock{\em Phys. Rev. E}
{\bf 65}, 051104 (2001).
749:
750: \bibitem{win1997} J.~Wintterlin, S.~V\"olkening, T.V.W.~Janssens,
751: T.~Zambelli, G.~Ertl, \newblock{\em Science} {\bf 278}, 1931 (1997).
752:
753: \bibitem{vol1999} S.~V\"olkening, K.~Bed\"urftig, K.~Jacobi, J.~Wintterlin,
754: G.~Ertl, \newblock{\em Phys. Rev. Lett.} {\bf 83}, 2672 (1999).
755:
756: \bibitem{tof1987} T.~Toffoli, N.~Margolus, \newblock{\it Cellular
757: Automata Machines}. MIT Press, Massachusetts, (1987).
758:
759: \bibitem{wei1997} J.R.~Weimar, \newblock{\it Simulation with Cellular
760: Automata}. Logos Verlag, Berlin (1997).
761:
762: \bibitem{kor1998} O.~Kortl\"uke, \newblock{\em J. Phys. A}
763: {\bf 31}, 9185 (1998).
764:
765: \bibitem{rev2000} \newblock{\it Cellular automata; From modeling to
766: applications}. Special Issue, Parallel Computing 27 (2001).
767:
768: \bibitem{wei2000} J.R.~Weimar, \newblock{\em Parallel Computing}
769: {\bf 27}, 601 (2001).
770:
771: \bibitem{cho1988} B.~Chopard, M. Droz, \newblock{\em J. Phys. A}
772: {\bf 21}, 205 (1988).
773:
774: \bibitem{cho1991} B.~Chopard, M. Droz, \newblock{\em J. Stat. Phys.}
775: {\bf 64}, 859 (1991).
776:
777: \bibitem{mai1991} J.~Mai, W.~von Niessen, \newblock{\em Phys. Rev. A}
778: {\bf 44}, R6165 (1991).
779:
780: \bibitem{mai1992} J.~Mai, W.~von Niessen, \newblock{\em Chem. Phys.}
781: {\bf 165}, 57 (1992).
782:
783: \bibitem{mai1992b} J.~Mai, W.~von Niessen, \newblock{\em Chem. Phys.}
784: {\bf 165}, 65 (1992).
785:
786: \bibitem{mai1993} J.~Mai, W.~von Niessen, \newblock{\em J. Chem. Phys.}
787: {\bf 98}, 2032 (1993).
788:
789: %\bibitem{mai1993b} J.~Mai, PhD. Thesis (1993).
790:
791: %\bibitem{mai1994} J.~Mai, V.~Kuzovkov, W.~von Niessen
792: %\newblock{\em Physica A} {\bf 203}, 298 (1994).
793:
\bibitem{note2} Using $P$ processors and a system of size $L^2$,
the total interface between blocks is
$2 L \sqrt{P}$ for the squares and $L P$ for the strips. However, due
to the cyclic tiling sequence shown in Fig.~\ref{strips}, the
amount of data to be sent is halved in the strips case.
The ratio of data to be sent between squares and strips is then
$(2L\sqrt{P})/(LP/2)=4/\sqrt{P}$.
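As a worked illustration of this estimate, the ratio $4/\sqrt{P}$ equals
$2$ for $P=4$, $1$ for $P=16$, and $1/2$ for $P=64$. Under this count the
strips therefore require no more data transfer than the squares for
$P \leq 16$, the largest number of nodes used in Fig.~\ref{fig4}.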
801:
802: \bibitem{knu1998} D.E.~Knuth, \newblock{\em The art of computer
803: programming, Vol.2: Seminumerical algorithms}. Addison--Wesley,
804: Amsterdam, (1998).
805:
806: \bibitem{kuz1998} V.N.~Kuzovkov, O.~Kortl\"uke, W.~von Niessen,
\newblock{\em J. Chem. Phys.} {\bf 108}, 5571 (1998).
808:
809: \bibitem{kor1998b} O.~Kortl\"uke, V.N.~Kuzovkov, W.~von Niessen,
810: \newblock{\em Phys. Rev. Lett.} {\bf 81}, 2164 (1998).
811:
812: \bibitem{kor1999} O.~Kortl\"uke, V.N.~Kuzovkov, W.~von Niessen,
813: \newblock{\em Phys. Rev. Lett.} {\bf 83}, 3089 (1999).
814:
815: \bibitem{kuz1999} V.N.~Kuzovkov, O.~Kortl\"uke, W.~von Niessen,
816: \newblock{\em Phys. Rev. Lett.} {\bf 83}, 1636 (1999).
817:
818: \bibitem{kor1999b} O.~Kortl\"uke, V.N.~Kuzovkov, W.~von Niessen,
\newblock{\em J. Chem. Phys.} {\bf 110}, 11523 (1999).
820:
\bibitem{ovc1978} A.A.~Ovchinnikov, Ya.B.~Zeldovich,
\newblock{\em Chem. Phys.} {\bf 28}, 215 (1978).
823:
\bibitem{kuz1988} V.N.~Kuzovkov, E.A.~Kotomin,
\newblock{\em Rep. Prog. Phys.} {\bf 51}, 1479 (1988).
826:
827: \bibitem{kot1996} E.A.~Kotomin, V.N.~Kuzovkov, {\it Modern Aspects
828: of Diffusion-Controlled Reactions: Cooperative Phenomena in
829: Bimolecular Processes}. North--Holland, Amsterdam, Vol.34, (1996).
830:
831: \bibitem{pri1997} V.~Privman, \newblock{\it Nonequilibrium Statistical
Mechanics in One Dimension}. Cambridge University Press, Cambridge (1997).
833:
834: \bibitem{mar1999} J.~Marro, R.~Dickman, \newblock{\it Nonequilibrium
phase transitions in lattice models}. Cambridge University Press, Cambridge (1999).
836:
\bibitem{arg2001} P.~Argyrakis, S.F.~Burlatsky, E.~Clement, G.~Oshanin,
\newblock{\em Phys. Rev. E} {\bf 63}, 021110 (2001).

\bibitem{lee1995} B.P.~Lee, J.~Cardy,
\newblock{\em J. Stat. Phys.} {\bf 80}, 971 (1995).

\bibitem{tou1983} D.~Toussaint, F.~Wilczek,
\newblock{\em J. Chem. Phys.} {\bf 78}, 2642 (1983).

\bibitem{hil2002} M.~Hildebrand, \newblock{\em Chaos} {\bf 12}, 144 (2002).

\bibitem{sal2002} R.~Salazar, A.P.J.~Jansen, V.N.~Kuzovkov,
Preprint.
850:
851: \end{thebibliography}
852:
853: \newpage
854:
855: \end{document}
856:
857:
858: