1: \documentclass[twocolumn,showpacs,preprintnumbers,amsmath,amssymb]{revtex4}

2:

3: \usepackage{graphicx}

4: \usepackage{dcolumn}

5: \usepackage{bm}

6:

7: \begin{document}

8:

9: \title{Parallelization of Cellular Automata

10: for Surface Reactions}

11:

12: \author{R. Salazar}

13: \email{R.Salazar@tue.nl}

14: \author{A.P.J. Jansen}

15: \email{tgtatj@chem.tue.nl}

16: \affiliation{

17: Schuit Institute of Catalysis (ST/SKA), Eindhoven University of

18: Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands.}

19:

20: \author{V.N. Kuzovkov}

21: \email{kuzovkov@latnet.lv}

22: \affiliation{

23: Institute for Solid State Physics, University of Latvia,

24: Kengaraga 8, LV-1063, Riga, Latvia.}

25:

26: \date{\today}

27:

28: \begin{abstract}

29: We present a parallel implementation of cellular automata

30: to simulate chemical reactions on surfaces. The scaling

31: of the computer time with the number of processors for this parallel

32: implementation is quite close to the ideal $T/P$, where $T$ is the computer

33: time used for one single processor and $P$ the number of processors. Two

34: examples are presented to test the algorithm, the simple A$+$B$\to0$ model

35: and a realistic model for CO oxidation on Pt$(110)$.

36: By using large parallel simulations, it is possible to derive

37: scaling laws which allow us to extrapolate to even larger system sizes

38: and faster diffusion coefficients allowing us to make direct comparisons

39: with experiments.

40: \end{abstract}

41:

42: \pacs{ 82.65.+r; 82.20.Wt; 02.70.Tt; 82.40.Np; 89.75.Da}

43:

44: \keywords{Parallel Cellular Automata, Heterogeneous catalysis,

45: Scaling laws}

46:

47: \maketitle

48:

49: \section{Introduction}

50:

51: One of the most interesting features of surface reactions

52: is that in a large number of cases they produce pattern formation:

53: structures with some well--defined length scale, sometimes with symmetries

54: and temporal behavior, such as oscillations, traveling waves,

55: spirals, Turing patterns, etc.\ \cite{rab2000,imb1995}.

56: A common approach to studying this

57: pattern formation is to use reaction--diffusion (RD) equations \cite{wal1997},

58: which simulate the dynamic behavior of chemical reactions on surfaces.

59: However, these partial

60: differential equations give only approximate solutions and in

61: several cases completely wrong results, because they are based on

62: the local mean--field approximation, which assumes well--mixed reactants

63: at the microscopic level and ignores all local correlations between

64: reactants, as well as fluctuations and lateral interactions in the

65: adsorbate \cite{win2002}.

66: The RD equations describe coverages, which are macroscopic continuum

67: variables that neglect the discrete structure of matter,

68: and do not describe the actual chemical processes underlying pattern

69: formation.

70: In fact, from experimental studies \cite{win2002} it is known that

71: a correct model requires modified kinetics, different

72: from those prescribed for RD.

73:

74: Based on some general assumptions about the physical processes involved,

75: a master equation can be derived that completely describes the

76: microscopic dynamics of the system \cite{kam1981}.

77: An exact method to solve this master equation is the Monte Carlo

78: (MC) method \cite{gil1977,jan1995,luk1998,zve2001}. In a Monte Carlo method a

79: sequence of discrete events (reactions, including diffusion) is generated

80: on a 2D lattice which represents the surface sites. These events

81: are generally generated at random, by looking for possible

82: enabled reactions between nearest neighbors on the lattice and

83: performing them with probabilities determined by

84: the defined reaction rates.

85:

86: To compare MC simulations with experimental

87: pattern formation it is necessary to bridge the gap between the length scale of

88: the individual particles and the diffusion length.

89: The regime where spatio--temporal patterns usually occur, from

90: $\mu$m to mm scales, is orders of magnitude

91: larger than the nm scale of individual particles.

92: However, some new experiments \cite{win1997,vol1999} show that the

93: fast kinetic processes

94: are typically accompanied by the appearance of nanostructures.

95: Only microscopic simulations can deal with this two--scale behavior.

96: However, this regime implies lattice sizes above $10^6 \times 10^6$

97: and very large values for the diffusion rates to produce agreement with

98: the experimentally observed scales of pattern formation.

99: This would be a very large and slow simulation, due to the huge number

100: of particles involved and the fast diffusion rates, which mean that most

101: of the simulation time is spent on the diffusion of particles instead of

102: chemical reactions.

103: Fortunately, we do not need to simulate such a macroscopic system

104: or experimental diffusion rates every time. We only need to find scaling laws

105: for lengths and diffusion coefficients, i.e., from the nm scale to the $\mu$m

106: or mm scale, and from microscopic to real time. This is the possibility

107: which we explore in this paper by using large simulations on parallel

108: computers.

109:

110: Although only MC simulations provide solutions to the exact master

111: equations for the surface reactions, they are not suitable for efficient

112: parallelization due to the random selection of lattice sites used.

113: However, there is another important approach to simulate discrete

114: events on lattices, the Cellular Automata (CA)

115: \cite{tof1987,wei1997}. This approach is fully parallel

116: in the sense that all the lattice sites can be updated simultaneously.

117: This has the advantages that fewer random numbers are required,

118: only those for the reaction probabilities (CA codes are faster than MC),

119: and the global updating is easier to implement in a parallel code.

120:

121: Under some well--defined conditions a CA is equivalent to a MC

122: simulation (see ref. \cite{kor1998} and section II).

123: A CA can then be used as

124: a MC simulation, and its parallelization will provide the

125: required scaling laws.

126: In this paper we present, for the first time, an attempt to provide a tool

127: to obtain these scaling laws.

128: The fact that CA are ideal simulation

129: methods for parallelization is shown in

130: a recent special issue dedicated to CA in {\em Parallel Computing}

131: \cite{rev2000}.

132: In particular, a quite interesting paper by J.R.~Weimar \cite{wei2000}

133: presents an object--oriented parallel CA, also based on ref.

134: \cite{kor1998}, and explores the possibility of dividing the surface into

135: regions to be simulated by RD or CA according to the level of detail

136: required.

137:

138: In this paper we will describe in detail the implementation of the

139: parallel version of a CA simulating

140: chemical reactions on surfaces, but the ideas discussed here are

141: also applicable to more general CA simulations.

142: In section II we will describe briefly the Cellular Automata method.

143: In section III we explain the key points to implement the parallelization

144: of the algorithm. In section IV we will present results of the application

145: of the parallel algorithm to the A$+$B$\to0$ model, and to a realistic

146: model of oxidation of CO on Pt$(110)$.

147: Finally, in section V we will draw some conclusions.

148:

149:

150: \section{Cellular Automata}

151:

152: A cellular automaton (CA) is a regular array of cells.

153: Each cell can be in one of a set of possible states.

154: The CA evolves in time in discrete steps by changing the

155: states of all cells simultaneously.

156: The next state which a cell will take is based on the previous

157: state of the cell and the states of some neighboring cells.

158: A CA is defined by providing prescriptions for

159: the lattice, the set of states, the neighbors, and the

160: transition rules \cite{tof1987,wei1997}.

161: The idea of CA deserves a lot of study by itself, since

162: in general the evolution of a CA cannot be predicted other than

163: by executing it. Additionally, the number of possible CAs is

164: quite large. For instance, in the simplest one--dimensional case,

165: with two states and two neighbors, there are $256$ possible CAs.

166:
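
As a quick check of this count, assume that ``two neighbors'' means that the next state of a cell depends on the cell itself and on its left and right neighbors. A rule then assigns one of the two states to each of the $2^3$ possible local configurations, so that
\begin{equation*}
N_{\rm rules} = 2^{\,2^{3}} = 2^{8} = 256 .
\end{equation*}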

167: Our CA uses the standard square two--dimensional lattice

168: of size $L\times L$.

169: The states of each cell represent occupation with some kind of particle.

170: The Margolus neighborhood \cite{tof1987} is used

171: instead of the common von Neumann neighborhood. Both definitions of

172: neighborhood are shown in Fig.\ref{margolus}.

173: In order to obey the CA laws with the von Neumann neighborhood, it is

174: necessary \cite{cho1988} to violate the laws of stoichiometry, because

175: one particle could participate in more than one reactive pair.

176: Similar problems arise with the diffusion of particles \cite{cho1991}.

177: The use of a Margolus neighborhood overcomes these difficulties

178: \cite{mai1991,mai1992,mai1992b,mai1993}.

179: Using the value of the chemical rates we set up some

180: probabilistic transition rules to change the states of the cells

181: inside each Margolus block.

182:

183: \begin{figure}

184: \includegraphics[width=6cm]{fig_1}

185: \caption{

186: \label{margolus}

187: Von Neumann neighborhood (left) and Margolus neighborhood (right).

188: The lines joining points show the nearest neighbor couples in each case.

189: }

190: \end{figure}

191:

192: The Margolus blocks, Fig.\ref{margolus}, are used in the following way.

193: The blocks are periodically repeated to build up a tiled mask over the

194: whole lattice, considering periodic boundary conditions.

195: In this way there are four possible tilings as shown in

196: Fig.\ref{tiling}.

197: Only neighbor sites belonging to the same

198: block can react, so all the blocks can be accessed in parallel.

199: Inside each block a Monte Carlo update is done. After a full lattice

200: update the whole procedure starts over again using another of the

201: four possible tilings. The dynamics is not confined to blocks,

202: because the boundary between blocks is changing from one global sweep

203: to the next.

204:

205: For the four tilings shown in Fig.\ref{tiling}, we choose

206: randomly one of the four possible sequences of tilings:

207: ($1$,$2$,$3$,$4$), ($2$,$3$,$4$,$1$), ($3$,$4$,$1$,$2$), and

208: ($4$,$1$,$2$,$3$).

209: They always follow the same clockwise cyclic sequence, but starting at a

210: different tiling. This produces better boundary diffusion and mixing between

211: blocks.

212:

213: \begin{figure}

214: \includegraphics[width=8cm]{fig_2}

215: \caption{

216: \label{tiling}

217: The four possible tilings using Margolus blocks. The arrows show the

218: sequence order.

219: The lines joining points illustrate the MC updating of pairs

220: inside blocks.

221: }

222: \end{figure}

223:

224: A schematic description of the steps to implement this CA is given

225: below:

226: \begin{itemize}

227: \item[1. ] Choose randomly one of the four possible

228: sequences of tilings: ($1$,$2$,$3$,$4$), ($2$,$3$,$4$,$1$), ($3$,$4$,$1$,$2$),

229: ($4$,$1$,$2$,$3$).

230: \item[2. ] Choose consecutive tilings from the sequence of step 1.

231: \item[3. ] Sweep over all the blocks and in each block make a single

232: Monte Carlo update:

233: \begin{itemize}

234: \item[3.1. ] Choose randomly one pair of neighbors.

235: \item[3.2. ] Choose a reaction $i$ from the set of all the

236: possible reactions with a probability proportional to the reaction

237: rate $k_i$.

238: \item[3.3. ] Check if the reaction chosen in step 3.2 is possible on the

239: sites chosen in step 3.1. If it is possible do the reaction,

240: otherwise do nothing.

241: \end{itemize}

242: \item[4. ] Increase the time by $\Delta t/4$, and return to step 2 until

243: the sequence of tilings is completed.

244: \item[5. ] Return to step 1.

245: \end{itemize}

246:

247: The updating scheme inside each block is the same as in the

248: Dynamic Monte Carlo algorithm called Random Selection Method

249: \cite{luk1998,zve2001}.

250: Consequently, the time increment is the same as one Monte Carlo

251: Step (MCS) \cite{luk1998}:

252: \begin{equation}

253: \Delta t = \frac 1 {L^2 \sum_i k_i}

254: \end{equation}

255: where the sum in the denominator corresponds to the sum of all the

256: possible reaction rates.

257:
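
The steps listed above can be sketched in a few lines of Python. This is only a minimal serial illustration, not the code used for the results: the function names, the callback \texttt{try\_reaction}, and in particular the block offsets assigned to the four tilings are assumptions made here (the actual offsets are defined by Fig.\ref{tiling}), and $L$ is assumed to be even.

```python
import random

# Assumed (row, column) offsets of the 2x2 blocks for tilings 1..4.
# Consecutive tilings differ by a shift of one site (hypothetical choice).
TILING_OFFSETS = {1: (0, 0), 2: (1, 0), 3: (1, 1), 4: (0, 1)}


def block_pairs(i, j, L):
    """Nearest-neighbor pairs inside the 2x2 block with corner (i, j)."""
    a = (i % L, j % L)
    b = (i % L, (j + 1) % L)
    c = ((i + 1) % L, j % L)
    d = ((i + 1) % L, (j + 1) % L)
    return [(a, b), (c, d), (a, c), (b, d)]


def sweep(lattice, tiling, rates, try_reaction, rng):
    """Step 3: a single Monte Carlo update in every block of one tiling."""
    L = len(lattice)
    di, dj = TILING_OFFSETS[tiling]
    names = list(rates)
    weights = [rates[name] for name in names]
    for i in range(di, L + di, 2):
        for j in range(dj, L + dj, 2):
            s1, s2 = rng.choice(block_pairs(i, j, L))       # step 3.1
            name = rng.choices(names, weights=weights)[0]   # step 3.2
            try_reaction(lattice, name, s1, s2)             # step 3.3


def run(lattice, rates, try_reaction, n_cycles, seed=0):
    """Repeat steps 1-5 for n_cycles tiling sequences; return the time."""
    rng = random.Random(seed)
    L = len(lattice)
    dt = 1.0 / (L * L * sum(rates.values()))  # time increment from the text
    t = 0.0
    for _ in range(n_cycles):
        start = rng.randrange(4)              # step 1: pick the sequence
        for step in range(4):                 # step 2: consecutive tilings
            tiling = 1 + (start + step) % 4
            sweep(lattice, tiling, rates, try_reaction, rng)
            t += dt / 4                       # step 4
    return t
```

The callback receives the chosen reaction name and the two sites and is expected to change the lattice only if the reaction is possible on that pair, which corresponds to step 3.3.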

258: There is a large number of possible CA prescriptions which could try to

259: reproduce a MC simulation of chemical reactions on surfaces.

260: In fact, an extensive study made by J.~Mai

261: \cite{mai1991,mai1992,mai1992b,mai1993}

262: shows that in general it is difficult to produce good CAs reproducing

263: chemical reactions and diffusion.

264: However, in a recent paper Kortl\"uke \cite{kor1998} studied under

265: which conditions a CA can adequately reproduce a MC simulation

266: of chemical reactions on surfaces.

267: He found that the main requirement is the use of large diffusion coefficients.

268: Some sort of compromise between MC and CA is required:

269: it is necessary to use a regular array of blocks as in CA, and a MC update

270: scheme has to be used inside each block.

271: The CA which we use here is a small modification of that originally used by

272: Kortl\"uke \cite{kor1998}.

273: We use blocks of $4$ sites (Margolus blocks) instead of

274: $2$ sites (Hantels blocks).

275: The main advantage of using $4$ sites instead of $2$ sites

276: is that it reduces the level of CA noise (see \cite{kor1998}) inside

277: each block.

278: This CA noise is the difference between the diffusion simulated with

279: CA and the correct diffusion simulated with MC.

280: In fact, with larger blocks the noise is smaller.

281:

282: Additionally, we present here a full realization of the CA parallel

283: computing idea by implementing it in a parallel code, as shown in

284: the next section.

285: This is the main reason for using CA instead of MC as a solution for long--time

286: and large--size simulations. Merely using CA instead of MC in

287: a serial simulation does not speed up the simulation substantially, in

288: the best cases by only $\sim 10$--$20\%$. This means that the implementation of

289: an efficient CA is very important.

290:

291: Although we have to accept that our CA is an approximation to exact MC

292: microscopic simulations, we know when the approximation is good

293: and we can always check that the CA reproduces the MC results.

294: In this way we can regard the CA as the MC realization of the exact

295: master equation for the chemical processes, which opens

296: the possibility of obtaining the scaling laws mentioned in the introduction

297: for these physical systems. In this paper we do not discuss the

298: quality of the CA approximation to the MC realization. This was done in

299: \cite{kor1998}, and we have made a similar study for our modified CA with

300: similar results.

301:

302: The fact that blocks can be updated without referring to other

303: blocks allows us to do CA as a parallel algorithm, because

304: the whole set of blocks can be updated at the same time.

305: This is the subject of the next section.

306:

307: \section{Parallelization}

308:

309: The aim of the parallelization of any code is to distribute the whole

310: simulation over several computer processors, also called nodes.

311: We define $speed(P)=1/T_P$, where $T_P$ is the computer time used by

312: $P$--processors to complete a simulation.

313: The optimum result is that the speed of the simulation is multiplied

314: by the number of nodes, $speed(P)= P \times speed(1)$.

315: However, the usual result of parallelizing a code is not the optimum.

316: The key to achieving a good speed up is that the time spent by

317: each node sending and receiving information to/from the other nodes is

318: small in comparison with the time spent computing in each node.

319:

320: In order to distribute the work, we use here a geometrical division,

321: as is usual \cite{rev2000} in spatially extended systems, i.e., the full

322: lattice is divided into sublattices of equal area.

323: The time spent sending data is proportional to the length of

324: the borders, $\sim L$, and the time doing computing is proportional

325: to the system size $\sim L^2$.

326: We divide the lattice into strips, as shown in Fig.\ref{strips}.

327: Considering the global periodic boundary conditions, each node has

328: periodicity in one direction and in the other direction has to share

329: information with only two neighbor nodes.

330: Another possibility is to divide the lattice into squares. This choice is less

331: convenient. Each node has to share information with 4 nodes instead of 2,

332: and it is only possible to use a perfect square number of nodes

333: $P=4,9,16,\dots$.

334: Using the strip sublattices requires sending a factor $4/\sqrt{P}$

335: less data than using

336: the square sublattices \cite{note2}, provided that $P \le 16$.

337:
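
One way to arrive at this factor is a rough count, assuming (as a reading of the statement above, not taken from \cite{note2}) that a strip exchanges one border column of $L$ sites per communication step, while a square sublattice of side $L/\sqrt{P}$ exchanges data along its four borders:
\begin{equation*}
\frac{N_{\rm square}}{N_{\rm strip}} \approx \frac{4L/\sqrt{P}}{L}
= \frac{4}{\sqrt{P}} \ge 1
\quad \Longleftrightarrow \quad P \le 16 .
\end{equation*}
Under this count, $P=4$ gives a factor $2$ in favor of the strips, and the two decompositions send the same amount of data at $P=16$.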

338: \begin{figure}

339: \includegraphics[width=8cm]{fig_3}

340: \caption{

341: \label{strips}

342: Distribution of the lattice in strips sublattices for

343: computing in each node. Division for each tiling

344: shown in Fig.\ref{tiling}.

345: }

346: \end{figure}

347:

348: Note that due to the periodicity in the vertical direction inside

349: each node, it is not necessary to exchange border information between nodes

350: when we pass from the tiling $1 \to 2$ and $3 \to 4$. It is only necessary

351: when we pass from $2 \to 3$ and $4 \to 1$. This halves the

352: amount of communicated data.

353: From Fig.\ref{strips} we see the direction in which the first column of

354: data in each sublattice has to

355: be sent and received in each node. In the change of tilings $2 \to 3$ the

356: data goes from left to right and in $4 \to 1$ from right to left.

357: Instead of shifting the data of each node horizontally we use the

358: first column for sending to or receiving from other nodes.

359: As a consequence, and to get interaction between sites in the

360: same block, we treat the first column as a neighbor of the last one.

361: For this Blocking CA, where the reactions are confined inside the blocks,

362: this produces an effective periodicity in the horizontal direction.

363: This extra periodic boundary condition inside each sublattice,

364: makes the final parallel code very similar to a full lattice

365: implementation.

366:

367: We use the SPMD (single program, multiple data) model to implement the

368: parallel simulation, in which every node executes the same code using

369: a different sublattice.

370: Also, we use a main node, called node zero, dedicated additionally to

371: distribute and collect the global information needed for the input/output

372: of the simulation and other global computing required.

373: Every node executes the same code, and

374: when some special code has to be

375: executed on only some nodes, it is necessary to use a node

376: identifier number $p$. From the global periodic boundary conditions and

377: the shape of the sublattices, there is periodicity also in the sequence of

378: nodes: the node to the left of node $p=0$ is node $p=P-1$ ($P$ is

379: the total number of nodes) and the node to the right of node $p=P-1$

380: is node $p=0$.

381:

382: In the following we present the CA code for a single node.

383: \begin{itemize}

384: \item[1. ] $p=0$: Choose randomly one of the four possible tiling

385: sequences: ($1$,$2$,$3$,$4$), ($2$,$3$,$4$,$1$), ($3$,$4$,$1$,$2$),

386: ($4$,$1$,$2$,$3$). Send that choice to the other nodes.\\

387: $p>0$: Receive the information of which tiling sequence has been

388: chosen.

389: \item[2. ] $\forall p$: Choose consecutive tilings from the sequence

390: of step 1.

391: \item[N1. ] $\forall p$: If the previous tiling was $1$ or $2$ and

392: the new tiling is $3$ or $4$ then \\

393: send the first column to the left node $p-1$, \\

394: receive the data from right node $p+1$ and put it in the first column.

395: \item[N2. ] $\forall p$: If the previous tiling was $3$ or $4$ and

396: the new tiling is $1$ or $2$ then \\

397: send the first column to the right node $p+1$, \\

398: receive the data from left node $p-1$ and put it in the first column.

399: \item[3. ] $\forall p$: Sweep over all the blocks and in each block make a

400: single Monte Carlo update.

401: \item[4. ] $\forall p$: Increase the time by $\Delta t/4$, and return to

402: step 2 until the tiling sequence is completed.

403: \item[5. ] $\forall p$: Return to step 1.

404: \end{itemize}

405:

406: We have modified step $1$ and added two new steps $N1$ and $N2$, to

407: interchange information between the nodes. The rest is basically the same as

408: the single processor code, but applied to the respective sublattices for each

409: node. In order to avoid undesired correlations between different sublattices,

410: special attention should be given to a good random number generator,

411: producing a different sequence of random numbers within each node

412: \cite{knu1998}.

413:
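
The column exchange of steps N1 and N2 can be illustrated with a small serial emulation in Python. This is a sketch under stated assumptions rather than the parallel code itself: the nodes are represented by a plain list of strips, each strip is stored as a list of columns, and the function names are hypothetical. In a real run the assignments below would be replaced by MPI send and receive calls.

```python
def split_into_strips(lattice, P):
    """Split a lattice stored as a list of columns into P equal strips."""
    W = len(lattice) // P  # number of columns per node
    return [[list(col) for col in lattice[p * W:(p + 1) * W]]
            for p in range(P)]


def exchange_first_column(strips, direction):
    """Emulate step N1 or N2 for all nodes at once.

    direction = -1: every node sends its first column to the left node
                    p-1 and receives from the right node p+1 (step N1).
    direction = +1: every node sends its first column to the right node
                    p+1 and receives from the left node p-1 (step N2).
    """
    P = len(strips)
    sent = [strip[0] for strip in strips]   # all sends are posted first
    for p in range(P):
        source = (p - direction) % P        # periodic sequence of nodes
        strips[p][0] = sent[source]         # store in the first column


# Usage: label every column by its global index to follow the data.
L, P = 8, 4
lattice = [[c] * 3 for c in range(L)]
strips = split_into_strips(lattice, P)
exchange_first_column(strips, -1)           # tiling change 2 -> 3
after_n1 = [strip[0][0] for strip in strips]
exchange_first_column(strips, +1)           # tiling change 4 -> 1
after_n2 = [strip[0][0] for strip in strips]
```

After step N1 the first local column of node $p$ holds the first column of node $p+1$, which is the true lattice neighbor of the last local column. This is why treating the first column as a neighbor of the last one reproduces the horizontally shifted tiling without moving the rest of the data.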

414: For the implementation of the interprocess communication and synchronization,

415: the Message Passing Interface (MPI) library has been selected,

416: because it provides source portability across different kinds of computers.

417: This library provides, among a complete set of specialized communication

418: routines, a so--called basic set of six routines for interprocessor

419: communication:

420: initialization, termination, getting the set of nodes, getting its own node

421: number, sending data to other nodes, and receiving data from other nodes.

422:

423: There are computers with connections between nodes using giga--ethernet, or

424: with several processors inside the same machine sharing the same

425: memory.

426: These computers represent optimal environments to test parallel

427: algorithms, e.g., high--end supercomputers like the Cray T3E and middle--end

428: supercomputers like the Silicon Origin 2000.

429: However, we will show in the next section that running

430: this parallel CA algorithm on a low--end Beowulf cluster of PCs connected

431: only via fast--ethernet already produces almost the ideal speed up.

432:

433: \section{Results}

434:

435: The parallel algorithm was tested on our local cluster of PCs,

436: (17 Athlon 1.1\,GHz/256\,MB, fast--ethernet, Linux 2.4.18, MPICH 1.1.2).

437: From the results we see that the improvement of the performance

438: using $P$ processors with respect to a single processor is almost the

439: ideal, $speed(P) = P \times speed(1)$.

440: In order to test the speed up of the algorithm, we will use two

441: systems from surface catalysis:

442: the A$+$B$\to0$ model and a model for CO oxidation on Pt$(110)$ surface

443: \cite{kuz1998}, which has produced several important results

444: \cite{kor1998b,kor1999,kuz1999,kor1999b}.

445:

446: The A$+$B$\to0$ model has been studied for a long time

447: \cite{ovc1978,kuz1988,kot1996,pri1997,mar1999,arg2001}.

448: In the pioneering analytical paper by Ovchinnikov and Zeldovich

449: \cite{ovc1978} it was shown for the first time that the kinetic law of

450: mass action is violated in this model, producing incorrect results

451: when standard chemical kinetics is used. An illustrative case, also used

452: in our paper, is a situation with equal concentrations of

453: both reactants $C_A=C_B=C$, where the standard kinetics predicts

454: an asymptotic behavior $C \propto t^{-1}$. This prediction corresponds

455: to the mean--field approximation and is only valid for high

456: dimensional systems \cite{ovc1978}, ${\cal D}\ge 4$.

457: In low dimensional systems, ${\cal D}<4$, with diffusion controlled

458: processes it has been proved, using renormalization group

459: arguments \cite{lee1995}, that the correct asymptotic behavior is

460: $C \propto t^{-{\cal D}/4}$. A qualitative agreement with this behavior

461: was shown for the first time using MC simulations in reference \cite{tou1983}.

462:
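
For reference, the $t^{-1}$ law follows directly from the mean--field rate equation for equal concentrations. This is a standard step; $C_0$ denotes the initial concentration, a symbol introduced here only for this illustration:
\begin{equation*}
\frac{dC}{dt} = -k\,C^2
\quad \Longrightarrow \quad
C(t) = \frac{C_0}{1 + k\,C_0\,t} \simeq \frac{1}{k\,t}
\qquad \mbox{for } t \gg \frac{1}{k\,C_0} .
\end{equation*}
For a surface, ${\cal D}=2$, the diffusion controlled result quoted above gives instead the slower decay $C \propto t^{-1/2}$.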

463: Observing the asymptotic law requires a long simulated time

464: $t_{max}$. The diffusion length $\xi(t)=\sqrt{Dt}$ defines the pattern

465: formation scale.  A simulation until $t=t_{max}$ needs a lattice

466: length $L \gg \xi(t_{max})$, which corresponds to a large

467: computer time of the order of $t_{max}L^2 \sim t_{max}^2$.

468: This case provides a good example of large--time and

469: large--size systems with pattern formation to test our

470: parallel algorithm described in the previous section.

471:
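
The estimate can be spelled out as follows, taking the smallest admissible lattice, $L^2 \sim \xi^2(t_{max})$, and assuming that the computer time $T$ grows as the number of sites times the simulated time:
\begin{equation*}
T \sim t_{max}\,L^2 \sim t_{max}\,\xi^2(t_{max}) = D\,t_{max}^2 .
\end{equation*}
Doubling $t_{max}$ therefore roughly quadruples the cost.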

472: In the corresponding lattice model we consider two kinds of

473: particles A and B.

474: The only possible chemical reaction is desorption of AB, which happens

475: when two particles A and B are next to each other, creating two

476: empty sites. This process occurs with a rate constant $k$.

477: Additionally the particles are allowed to diffuse with

478: rate $D$. This happens when a particle is next to an empty site.

479: We simulate the behavior produced for an initial condition without

480: empty sites and where the same number of A and B particles are

481: initially randomly distributed on the surface: $N_A=N_B=L^2/2$.

482:
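
The rules of this lattice model are simple enough to be written down explicitly. The following Python sketch shows one possible update of a single neighbor pair and the construction of the initial condition. The state encoding, the function names, and the use of a single uniform random number \texttt{u} to select the process are assumptions made for illustration.

```python
import random

EMPTY, A, B = 0, 1, 2


def pair_update(s1, s2, k, D, u):
    """Return the new states of a neighbor pair for the A+B->0 model.

    u is a uniform random number in [0, 1) that selects the process with
    a probability proportional to its rate (k: reaction, D: diffusion).
    """
    if u < k / (k + D):
        # Reaction selected: an adjacent A-B pair desorbs and leaves
        # two empty sites behind.
        if {s1, s2} == {A, B}:
            return EMPTY, EMPTY
    else:
        # Diffusion selected: a particle hops onto an empty neighbor site.
        if (s1 == EMPTY) != (s2 == EMPTY):
            return s2, s1
    # The selected process is not possible on this pair: do nothing.
    return s1, s2


def random_initial_lattice(L, seed=0):
    """Full lattice with N_A = N_B = L*L/2 particles placed at random."""
    rng = random.Random(seed)
    sites = [A] * (L * L // 2) + [B] * (L * L - L * L // 2)
    rng.shuffle(sites)
    return [sites[i * L:(i + 1) * L] for i in range(L)]
```

With $k=D$, as used below, reaction and diffusion are each attempted with probability $1/2$.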

483: \begin{figure}

484: \includegraphics[width=8cm]{fig_4}

485: \caption{

486: \label{fig1}

487: Temporal behavior of the concentration of particles for the

488: A$+$B$\to0$ model starting with randomly mixed A and B.

489: The solid line is the result from the simulation using $16$

490: processors and the dots are from using a single processor.

491: The $t^{-1/2}$ curve shows the asymptotic behavior.

492: The two snapshots show the system at different times.

493: }

494: \end{figure}

495:

496: In Fig.\ref{fig1} we present the temporal behavior and also illustrate

497: the segregation process forming regions with high

498: concentration of A or B, which increase in size with time.

499: Also we show how the global concentration, $C=N_A/L^2=N_B/L^2$,

500: diminishes with time following the asymptotic power--law

501: $C \sim t^{-\frac 1 2}$.

502: The system size used is $L=8192$ and the parameters are $k=D=1$.

503: We present two sets of data in Fig.\ref{fig1}.

504: The points correspond to a single

505: processor simulation, and the solid line corresponds to a parallel simulation

506: using $16$ processors. Both simulations start with identical random

507: initial distribution of particles. It is noticeable that this initial

508: distribution almost completely determines the following behavior of the system.

509: The snapshots shown as insets in Fig.\ref{fig1} are from the

510: parallel simulation. They are very similar to the ones obtained from the single

511: processor simulation, which uses the same initial conditions but

512: a different sequence of random numbers to simulate the dynamics.

513: Moreover, in order to check quantitatively the agreement of the

514: spatial structures

515: between the single and parallel simulations, we present in Fig.\ref{fig2}

516: the radial correlation function. Again the points correspond

517: to the single processor case and the solid line to the parallel case.

518: A correlation length $r_c$ could be obtained by fitting these correlation

519: functions with $\exp[-(r/r_c)^2]$, which we show in Fig.\ref{fig2}

520: by dashed lines. The obtained values of $r_c$ are plotted in the

521: inset. This shows that the correlation length for this dynamics also

522: follows a power--law $r_c \sim t^{\alpha}$. By numerical fitting we

523: obtain $\alpha = 0.5122 \pm 0.012$.

524: An analytical asymptotic solution for these correlation functions

525: is given in \cite{kot1996}, $\exp(-r^2/4Dt)$ or $\exp[-c(r/\xi(t))^2]$.

526: The previous value obtained for $\alpha$ means that the diffusion length

527: $\xi(t)=\sqrt{Dt}$ defines the scale of pattern formation for this

528: reaction model.

529:
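
The two fits described above can be sketched as follows. The analytical form $\exp(-r^2/4Dt)$ corresponds to $r_c = 2\sqrt{Dt}$, i.e. $\alpha = 1/2$. The Python functions below are a minimal illustration with hypothetical names: they assume a log--linear least--squares fit, which is not necessarily the fitting procedure used for Fig.\ref{fig2}, and they require that only points with positive correlation values are passed in.

```python
import math


def fit_correlation_length(r, g):
    """Least-squares fit of g(r) = exp[-(r/r_c)^2] on a log scale.

    ln g = -r^2 / r_c^2 is a straight line through the origin in the
    variable x = r^2, so the slope is sum(x*y) / sum(x*x) with y = ln g.
    """
    x = [ri * ri for ri in r]
    y = [math.log(gi) for gi in g]
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return math.sqrt(-1.0 / slope)


def fit_power_law_exponent(t, rc):
    """Slope of ln r_c versus ln t, i.e. alpha in r_c ~ t^alpha."""
    X = [math.log(ti) for ti in t]
    Y = [math.log(ri) for ri in rc]
    n = len(X)
    mean_x, mean_y = sum(X) / n, sum(Y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(X, Y))
    sxx = sum((a - mean_x) ** 2 for a in X)
    return sxy / sxx
```

Applied to noise--free synthetic data generated from $\exp(-r^2/4Dt)$, these functions return $r_c = 2\sqrt{Dt}$ and $\alpha = 1/2$ up to rounding errors.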

530: \begin{figure}

531: \includegraphics[width=8cm]{fig_5}

532: \caption{

533: \label{fig2}

534: The radial correlation function for the A$+$B$\to 0$ model

535: at the same times as the snapshots in Fig.\ref{fig1}.

536: The lines are the $16$ processor results and the dots the

537: single processor results.

538: The dashed lines are fits with $\exp[-(r/r_c)^2]$.

539: The respective values $r_c(t)$ are plotted in the inset.

540: }

541: \end{figure}

542:

543: The performance or speed up of the parallel algorithm

544: for the A$+$B$\to 0$ model is shown in Fig.\ref{fig3},

545: using different system sizes $L=256,512,1024,4096,8192$

546: and number of processors $P=1,2,4,8,16$.

547: The simulated time for each computation was set to $t_{max}=500$.

548: The speed was normalized to the speed of the single processor case.

549: In the inset we can see the behavior of the single processor speed

550: for each system size.

551: The advantage of using a large number of processors increases when the

552: system size increases, as we expect from the discussion in the previous

553: section.

554:
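
This trend can be rationalized with a simple cost model. It is an illustrative assumption: $a$ and $b$ denote machine--dependent constants that are not measured here. If the computing time per node scales as $aL^2/P$ and the communication time as $bL$, then
\begin{equation*}
\frac{speed(P)}{speed(1)} = \frac{T_1}{T_P}
\approx \frac{aL^2}{aL^2/P + bL}
= \frac{P}{1 + (b/a)\,P/L} ,
\end{equation*}
which approaches the ideal value $P$ when $L \gg (b/a)\,P$ and degrades for small lattices or many processors.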

555: \begin{figure}

556: \includegraphics[width=8cm]{fig_6}

557: \caption{

558: \label{fig3}

559: The speed of the simulation using $P$ processors normalized to the

560: speed of one single processor for the A$+$B$\to 0$ model.

561:  Different system sizes

562: $L=256,512,1024,4096,8192$ and number of processors $P=1,2,4,8,16$.

563: The inset shows the speed of one single processor.

564: }

565: \end{figure}

566:

567: The second model used to test the parallel algorithm is a model

568: for CO oxidation on Pt$(100)$ and Pt$(110)$ surfaces

569: \cite{kuz1998,kor1998b,kor1999,kuz1999,kor1999b}.

570: This system shows different types of kinetic oscillations.

571: On Pt$(100)$ local, irregular oscillations occur in a wide parameter

572: interval, whereas on Pt$(110)$ globally synchronized oscillations exist

573: only in a very narrow parameter interval. Both surfaces exhibit an

574: $\alpha \rightleftharpoons \beta$ surface reconstruction, where $\alpha$

575: denotes the {\it hex} or $1 \times 2$ phase on Pt$(100)$ or Pt$(110)$,

576: respectively. $\beta$ denotes the unreconstructed $1 \times 1$ phase in

577: both cases. Both surfaces have qualitatively quite similar properties

578: with the exception of the dissociative adsorption of O$_2$. The ratio

579: of the sticking coefficients of O$_2$ on the two phases is

580: $s_{\alpha}:s_{\beta} \approx 0.5:1$ for Pt$(110)$ and

581: $s_{\alpha}:s_{\beta} \approx 10^{-2}:1$ for Pt$(100)$ \cite{imb1995}.

582: From the experiments \cite{imb1995} it is known that kinetic

583: oscillations are closely connected with the

584: $\alpha \rightleftharpoons \beta$ reconstruction of the Pt surfaces.

585:

586: In the model \cite{kuz1998,kor1998b,kor1999,kuz1999,kor1999b},

587: CO is able to adsorb onto a free surface

588: site with rate constant $y$ and to desorb from the surface with rate

589: constant $k$,

590: independent of the surface phase to which the site belongs.

591: O$_2$ adsorbs dissociatively onto two nearest neighbor sites with

592: rate constant $(1-y)s_{\chi}$ with $\chi = \alpha,\beta$.

593: In addition CO is able to diffuse via hopping onto a

594: vacant nearest neighbor site with rate constant $D$. The CO$+$O reaction

595: occurs with rate constant $R$, when CO and O are in nearest neighbor sites

596: desorbing the reaction product CO$_2$.

597: The $\alpha \rightleftharpoons \beta$ surface phase transition is modeled as a

598: linear front propagation, induced by the presence of CO at the border

599: between phases, with rate constant $V$.

600: Consider two nearest neighbor surface sites in the state $\alpha \beta$.

601: The transition $\alpha \beta \to \alpha \alpha$

602: ($\alpha \beta \to \beta \beta$) occurs if none (at least one) of

603: these two sites is occupied by CO.

604: Written in the more usual form of reaction equations, the above

605: transition definitions read:

606: \begin{eqnarray*}

607: && \mbox{CO(g)} + S^{\chi} \rightleftharpoons \mbox{CO(a)}, \\

608: && \mbox{O$_2$(g)} + 2S^{\alpha} \to 2\mbox{O(a)}, \\

609: && \mbox{O$_2$(g)} + 2S^{\beta} \to 2\mbox{O(a)}, \\

610: && \mbox{CO(a)} + S^{\chi} \to S^{\chi} + \mbox{CO(a)}, \\

611: && \mbox{CO(a)} + \mbox{O(a)} \to \mbox{CO$_2$(g)} + 2S^{\chi}, \\

612: && S^{\alpha} \rightleftharpoons S^{\beta},

613: \end{eqnarray*}

614: where $S$ stands for a free adsorption site, $\chi$ stands for either

615: $\alpha$ or $\beta$ and (a) or (g) for a particle adsorbed on the

616: surface or in the gas phase, respectively. For additional

617: details see Ref.~\cite{kuz1998}.
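As a compact restatement of the reconstruction rule described above, and assuming that the same rate constant $V$ applies in both directions, the front propagation for a nearest neighbor pair in the state $\alpha\beta$ can be sketched as
\begin{eqnarray*}
&& \alpha\beta \stackrel{V}{\longrightarrow} \alpha\alpha \quad \mbox{if neither site is occupied by CO}, \\
&& \alpha\beta \stackrel{V}{\longrightarrow} \beta\beta \quad \mbox{if at least one site is occupied by CO}.
\end{eqnarray*}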

618:

619: Amongst several successful results of this model, we can mention

620: that it was one of the first microscopic models for CO oxidation on Pt

621: including surface

622: reconstruction, which is nowadays widely accepted as the key element

623: in order to get oscillatory behavior. This model correctly reproduces

624: the oscillatory regimes for both surfaces, Pt$(100)$ and Pt$(110)$, by

625: changing only one parameter, $s_{\alpha}$.

626: The diffusion of CO is considered explicitly, so the model can be applied to

627: the fast diffusion regime without modification.

628: With this model an alternative mechanism for

629: global synchronization of oscillations has been suggested

630: \cite{kor1999}, different from the traditional

631: gas--phase coupling. This new mechanism is stochastic resonance,

632: obtained by including a spontaneous nucleation of one surface phase

633: in the other ($\alpha \rightleftharpoons \beta$) at very low rates.

634: One unique result reproducing experimental observations is

635: the transition into chaotic behavior via the Feigenbaum route

636: or period doubling \cite{kor1998b}.

637: It is also for this model that the compatibility of the two microscopic

638: simulation methods, MC and CA, has been studied extensively \cite{kor1998}.

639:

640: \begin{figure}

641: \includegraphics[width=8cm]{fig_7a}

642: \includegraphics[width=8cm]{fig_7b}

643: \includegraphics[width=8cm]{fig_7c}

644: \includegraphics[width=8cm]{fig_7d}

645: \caption{

646: \label{fig5}

647: Sequence of snapshots of the model of CO oxidation on Pt$(110)$.

648: The left part shows the chemical species: CO particles are dark--grey,

649: O particles are light--grey, and empty sites are black. The right part

650: shows the structure of the surface: $\alpha$ phase sites are black, and

651: $\beta$ phase sites are white.

652: The parameters are $L=8192$, $D=250$, $V=1$, $y=0.494$, $k=0.1$, and $R=D$.

653: From top to bottom, we show sections from the upper--left corner

654: with sizes $4096 \times 4096$, $1024 \times 1024$, and $256 \times 256$.

655: }

656: \end{figure}

657:

658: In Fig.~\ref{fig5} we show snapshots from a simulation

659: on Pt$(110)$ with system size $L=8192$, using the parameter values

660: $D=250$, $V=1$, $y=0.494$, $k=0.1$, and $R=D$. In the left part

661: we plot the chemical species: CO particles are dark--grey,

662: O particles are light--grey, and empty sites are black.

663: The right part shows the structure of the surface: $\alpha$ phase

664: sites are black, and $\beta$ phase sites are white.

665: The pattern formation in this regime shows spatio--temporal behavior

666: in which spiral dynamics is the dominant phenomenon.

667: It is interesting to see the different

668: structures at different spatial scales. For this purpose

669: we include in Fig.~\ref{fig5} sections from the upper--left corner

670: with sizes $4096 \times 4096$, $1024 \times 1024$, and $256 \times 256$,

671: from top to bottom.

672: This sequence shows that the spiral dynamics occurs on a

673: slowly varying island structure of sizes of the order of $\sqrt{D/V}$.

674: The fact that we can see both mesoscopic and microscopic pattern formation

675: is a quite interesting feature of the model, which has some experimental

676: evidence \cite{win1997,vol1999} and has been studied theoretically in

677: \cite{hil2002} by including lateral interactions between adsorbed particles.

678: The model used here is simpler, because it does not need such interactions

679: in order to obtain nanostructures.

680:

681: \begin{figure}

682: \includegraphics[width=8cm]{fig_8}

683: \caption{

684: \label{fig4}

685: The same as Fig.~\ref{fig3}, but for the model of CO

686: oxidation on Pt$(110)$.

687: }

688: \end{figure}

689:

690: In Fig.~\ref{fig4} we analyze the speed up of the parallel

691: algorithm for this realistic model. We use system sizes

692: $L=256,512,1024,4096,8192$ and numbers of nodes $P=1,2,4,8,16$.

693: The simulated time for each computation was also set to $t_{max}=500$.

694: Here we can see that the speed up of the parallel algorithm is

695: good even for small system sizes.

696: This is because the amount of computing

697: in each node for this model is larger than for the A$+$B$\to0$ model,

698: while the amount of communication data is the same in both models.
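This argument can be made explicit with a rough cost estimate, which is only a sketch under stated assumptions and not a fit to the measured data. Assume a strip decomposition in which the time per node is the sum of a computing part proportional to the $L^2/P$ sites it owns and, for $P>1$, a communication part proportional to the $L$ border sites of a strip. The symbols $c$ (model-dependent computing cost per site) and $\tau$ (cost per communicated border site) are hypothetical and introduced here only for illustration. Taking the speed up as $S(P)=T(1)/T(P)$ with $T(1)=cL^2$, one obtains
\begin{eqnarray*}
T(P) &\approx& c\,\frac{L^2}{P} + \tau L , \\
S(P) &\approx& \frac{c L^2}{c L^2/P + \tau L} = \frac{P}{1 + \tau P/(c L)} .
\end{eqnarray*}
In this estimate the deviation from the ideal value $S=P$ is controlled by the ratio $\tau P/(cL)$, so a model with a larger $c$ at the same $\tau$ approaches the ideal speed up already at smaller $L$.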

699:

700: \section{Conclusions}

701:

702: In this paper we present a tool to obtain scaling laws

703: connecting experimental system sizes and diffusion coefficients to

704: standard values in microscopic MC simulations. By using a CA

705: equivalent to MC simulation we provide an efficient parallelization

706: algorithm. We have explained in detail how to implement the parallelization.

707: The speed up of the algorithm is almost ideal, and it improves further

708: for larger system sizes and more complex models.

709: A full description and analysis of the scaling laws for the second

710: model used here is in preparation \cite{sal2002}.

711:

712: \begin{acknowledgments}

713: We thank J.J.~Lukkien and S.~Nedea for stimulating discussions.

714: This work was supported by the Nederlandse Organisatie voor

715: Wetenschappelijk Onderzoek (NWO), and the EC Excellence Center of

716: Advanced Material Research and Technology (contract N  1CA1-CT-2080-7007).

717: We thank the National Research School Combination Catalysis (NRSCC)

718: for computational facilities.

719: \end{acknowledgments}

720:

721: \begin{thebibliography}{3}

722:

723: \bibitem{rab2000} M.I.~Rabinovich, A.B.~Ezersky, P.D.~Weidman,

724: \newblock{\it The dynamics of patterns}. World Scientific, Singapore, (2000).

725:

726: \bibitem{imb1995} R.~Imbihl, G.~Ertl,

727: \newblock{\em Chem. Rev.} {\bf 95}, 697 (1995).

728:

729: \bibitem{wal1997} D.~Walgraef, \newblock{\it Spatio--Temporal Pattern

730: Formation: With Examples from Physics, Chemistry, and Materials Science}.

731: Springer Verlag, Berlin, (1997).

732:

733: \bibitem{win2002} J.~Wintterlin, \newblock{\em Chaos} {\bf 12}, 108 (2002).

734:

735: \bibitem{kam1981} N.G.~van Kampen, \newblock{\it Stochastic Processes

736: in Physics and Chemistry}. North--Holland, Amsterdam, (1981).

737:

738: \bibitem{gil1977} D.T.~Gillespie, \newblock{\em J. Phys. Chem.}

739: {\bf 81}, 2340 (1977).

740:

741: \bibitem{jan1995} A.P.J.~Jansen, \newblock{\em Comp. Phys. Comm.}

742: {\bf 86}, 1 (1995).

743:

744: \bibitem{luk1998} J.J.~Lukkien, J.P.L.~Segers, P.A.J.~Hilbers, R.J.~Gelten, A.P.J.~Jansen, \newblock{\em Phys. Rev. E}

745: {\bf 58}, 2598 (1998).

746:

747: \bibitem{zve2001} G.~Zvejnieks, V.N.~Kuzovkov, \newblock{\em Phys. Rev. E}

748: {\bf 65}, 051104 (2001).

749:

750: \bibitem{win1997} J.~Wintterlin, S.~V\"olkening, T.V.W.~Janssens,

751: T.~Zambelli, G.~Ertl, \newblock{\em Science} {\bf 278}, 1931 (1997).

752:

753: \bibitem{vol1999} S.~V\"olkening, K.~Bed\"urftig, K.~Jacobi, J.~Wintterlin,

754: G.~Ertl, \newblock{\em Phys. Rev. Lett.} {\bf 83}, 2672 (1999).

755:

756: \bibitem{tof1987} T.~Toffoli, N.~Margolus, \newblock{\it Cellular

757: Automata Machines}. MIT Press, Massachusetts, (1987).

758:

759: \bibitem{wei1997} J.R.~Weimar, \newblock{\it Simulation with Cellular

760: Automata}. Logos Verlag, Berlin (1997).

761:

762: \bibitem{kor1998} O.~Kortl\"uke, \newblock{\em J. Phys. A}

763: {\bf 31}, 9185 (1998).

764:

765: \bibitem{rev2000} \newblock{\it Cellular automata; From modeling to

766: applications}. Special Issue, Parallel Computing 27 (2001).

767:

768: \bibitem{wei2000} J.R.~Weimar, \newblock{\em Parallel Computing}

769: {\bf 27}, 601 (2001).

770:

771: \bibitem{cho1988} B.~Chopard, M. Droz, \newblock{\em J. Phys. A}

772: {\bf 21}, 205 (1988).

773:

774: \bibitem{cho1991} B.~Chopard, M. Droz, \newblock{\em J. Stat. Phys.}

775: {\bf 64}, 859 (1991).

776:

777: \bibitem{mai1991} J.~Mai, W.~von Niessen, \newblock{\em Phys. Rev. A}

778: {\bf 44}, R6165 (1991).

779:

780: \bibitem{mai1992} J.~Mai, W.~von Niessen, \newblock{\em Chem. Phys.}

781: {\bf 165}, 57 (1992).

782:

783: \bibitem{mai1992b} J.~Mai, W.~von Niessen, \newblock{\em Chem. Phys.}

784: {\bf 165}, 65 (1992).

785:

786: \bibitem{mai1993} J.~Mai, W.~von Niessen, \newblock{\em J. Chem. Phys.}

787: {\bf 98}, 2032 (1993).

788:

789: %\bibitem{mai1993b} J.~Mai, PhD. Thesis (1993).

790:

791: %\bibitem{mai1994} J.~Mai, V.~Kuzovkov, W.~von Niessen

792: %\newblock{\em Physica A} {\bf 203}, 298 (1994).

793:

794: \bibitem{note2} By using $P$ processors and a system size $L^2$

795: the total interface between blocks is for the squares

796: $2 L \sqrt{P}$ and for the strips $L P$. However, due

797: to the cyclic tiling sequence shown in Fig.~\ref{strips} we

798: reduce the amount of data to be sent by half for the strips case.

799: The ratio of data to be sent between squares and strips then is:

800: $(2L\sqrt{P})/(LP/2)=4/\sqrt{P}$.

801:

802: \bibitem{knu1998} D.E.~Knuth, \newblock{\em The art of computer

803: programming, Vol.2: Seminumerical algorithms}. Addison--Wesley,

804: Amsterdam, (1998).

805:

806: \bibitem{kuz1998} V.N.~Kuzovkov, O.~Kortl\"uke, W.~von Niessen,

807: \newblock{\em J. Chem. Phys.} {\bf 108}, 5571 (1998).

808:

809: \bibitem{kor1998b} O.~Kortl\"uke, V.N.~Kuzovkov, W.~von Niessen,

810: \newblock{\em Phys. Rev. Lett.} {\bf 81}, 2164 (1998).

811:

812: \bibitem{kor1999} O.~Kortl\"uke, V.N.~Kuzovkov, W.~von Niessen,

813: \newblock{\em Phys. Rev. Lett.} {\bf 83}, 3089 (1999).

814:

815: \bibitem{kuz1999} V.N.~Kuzovkov, O.~Kortl\"uke, W.~von Niessen,

816: \newblock{\em Phys. Rev. Lett.} {\bf 83}, 1636 (1999).

817:

818: \bibitem{kor1999b} O.~Kortl\"uke, V.N.~Kuzovkov, W.~von Niessen,

819: \newblock{\em J. Chem. Phys.} {\bf 110}, 11523 (1999).

820:

821: \bibitem{ovc1978} A.A.~Ovchinnikov, Ya.B.~Zeldovich, \newblock{\em Chem. Phys.}

822: {\bf 28}, 215 (1978).

823:

824: \bibitem{kuz1988} V.N.~Kuzovkov, E.A.~Kotomin,

825: \newblock{\em Rep. Prog. Phys.} {\bf 51}, 1479 (1988).

826:

827: \bibitem{kot1996} E.A.~Kotomin, V.N.~Kuzovkov, \newblock{\it Modern Aspects

828: of Diffusion-Controlled Reactions: Cooperative Phenomena in

829: Bimolecular Processes}. North--Holland, Amsterdam, Vol.34, (1996).

830:

831: \bibitem{pri1997} V.~Privman, \newblock{\it Nonequilibrium Statistical

832: Mechanics in One Dimension}. Cambridge University Press, Cambridge (1997).

833:

834: \bibitem{mar1999} J.~Marro, R.~Dickman, \newblock{\it Nonequilibrium

835: phase transitions in lattice models}. Cambridge University Press, Cambridge (1999).

836:

837: \bibitem{arg2001} P.~Argyrakis, S.F.~Burlatsky, E.~Clement, G.~Oshanin,

838: \newblock{\em Phys. Rev. E} {\bf 63}, 021110 (2001).

839:

840: \bibitem{lee1995} B.P.~Lee, J.~Cardy,

841: \newblock{\em J. Stat. Phys.} {\bf 80}, 971 (1995).

842:

843: \bibitem{tou1983} D.~Toussaint, F.~Wilczek, \newblock{\em J. Chem. Phys.}

844: {\bf 78}, 2642 (1983).

845:

846: \bibitem{hil2002} M.~Hildebrand, \newblock{\em Chaos} {\bf 12}, 144 (2002).

847:

848: \bibitem{sal2002} R.~Salazar, A.P.J.~Jansen, V.N.~Kuzovkov,

849: preprint.

850:

851: \end{thebibliography}

852:

853: \newpage

854:

855: \end{document}

856:

857:

858: