0606:physics0606226/ps6.tex

1: %\documentclass[preprint,preprintnumbers,amsmath,amssymb,superscriptaddress]{revtex4}

2: \documentclass[twocolumn,preprintnumbers,amsmath,amssymb,superscriptaddress]{revtex4}

3:

4: \usepackage{setspace}

5:

6: \usepackage{graphicx}% Include figure files

7: \usepackage{graphicx,epsf} %Include figure files

8: \usepackage{dcolumn}% Align table columns on decimal point

9: \usepackage{bm}% bold math

10: \usepackage{latexsym}

11: \newcommand{\xsize}{\epsfxsize=9.0cm}

12:

13: \begin{document}

14:

15: \title{A Complexity O(1) Priority Queue for Event Driven Molecular Dynamics

16: Simulations }

17:

18: \author{Gerald~Paul}

19: \affiliation{Center for Polymer Studies and Dept.\ of Physics, Boston

20:   University, Boston, MA 02215, USA}

21: \email{gerryp@bu.edu}

22:

23:

24:

25: \begin{abstract}

26:

27: We propose and implement a priority queue suitable for use in event

28: driven molecular dynamics simulations.  All operations on the queue take

29: on average O(1) time per collision.  In comparison, previously studied

30: queues for event driven molecular dynamics simulations require O(log

31: $N$) time per collision for systems of $N$ particles.

32:

33: \end{abstract}

34:

35: \maketitle

36:

37: \section{Introduction}

38:

39:

40:

41: Molecular dynamics simulations are a powerful tool for determining the

42: behavior of multiparticle systems and are used in a wide range of

43: applications \cite{Allen87,Becker,Daggett,Frenkel,Rapaport,Sadus,Urbanc}.

44:

45: There are two basic approaches to these simulations:

46:

47: (i) Time driven simulations \cite{Allen87} in which equations of motion of all

48: particles are solved for a series of small time slices.  The positions and

49: velocities of the particles are determined at

50: the end of each time slice and used as input to the calculation for the

51: next time slice.

52:

53: (ii) Event driven simulations

54: \cite{Alder,Allen89,Erpenbeck,Krantz,Donev1} which are applicable to

55: systems of hard spheres or more generally to systems with interparticle

56: potentials which are piecewise constant.  The approach with event driven

57: simulations is to determine when the next collision between two

58: particles occurs, determine the positions and velocities of these

59: particles after the collision and then repeat this process.  A collision

60: is defined as the event in which the hard spheres collide or more

61: generally when two particles reach a discontinuity in their

62: interparticle potential.

63:

64: We focus here on event driven simulations which, where applicable,

65: provide exact results and typically run faster than time driven

66: simulations.  Determination of the next event is usually composed of two

67: steps \cite{Krantz}:

68:

69: (i) determination of the collision event with the shortest time for

70: each particle.  By dividing the system into cells and/or maintaining

71: lists of particles within a certain distance of a given

72: particle (neighbor lists), the time taken for calculation of the first

73: collision event for a given particle can be made independent of N, the

74:  total number of particles in the system \cite{Erpenbeck,Donev1}.

75:

76: (ii) determination of the collision event with the shortest time among

77:  all the particles, given the events with the shortest time for each

78:  particle obtained in (i). Approaches have been proposed and implemented

79:  which allow this determination in O($\log N$) time.

80:

81: The subject of this paper is an approach to determining the next

82: collision event among all particles.  This has been a heavily researched

83: subject \cite{Lubachevsky,Marin93,Marin,Rapaport80,Shida}.  The

84: requirements for a queue to allow this determination are as follows. The

85: queue must support:

86:

87: (i) addition of an event to the queue;

88:

89: (ii) identification and deletion from the queue of the event with the

90: shortest collision time;

91:

92:  (iii) deletion of a given event from the queue (e.g. when a collision

93: (p,q) occurs we may want to remove the event (q,p) from the queue.)

94: These requirements define abstractly the concept of a {\it priority

95: queue}.

96:

97:

98: Implementations of priority queues for molecular dynamics

99: simulations have for the most part been based on various types of binary

100: trees. They all share the property that determining the event in the

101: queue with the smallest value requires O($\log N$) time \cite{Marin}.

102:

103:

104: The early work on priority queues is reviewed in

105: Ref.~\cite{Jones}.  The earliest implementations of priority queues used

106: linked lists, which result in $O(N)$ performance.  Implementations with

107: $O(N^{0.5})$ performance were introduced and analyzed in

108: Refs.~\cite{Blackstone, Hendriksen77,Hendriksen83,Kingston}.  The oldest

109: priority queue implementations with $O(\log N)$ performance used {\it

110: implicit heaps}, binary trees in which each item always has a priority

111: higher than its children and the tree is embedded in an array

112: \cite{Bentley, Knuth, Williams}. Other $O(\log N)$ implementations

113: include {\it leftist trees} \cite{Knuth}, {\it binomial queues}

114: \cite{Brown78, Vuillemin}, {\it pagodas} \cite{Francon,Nix}, {\it skew

115: heaps} \cite{Sleator83,Tarjan}, {\it splay trees}

116: \cite{Sleator83,Sleator86,Tarjan} and {\it pairing heaps}

117: \cite{Fredman}.

118:

119: Marin et al. \cite{Marin93,Marin} introduced a version of

120: the {\it complete binary tree} which also has $O(\log N)$ performance

121: and compared it to earlier priority queue implementations explicitly in

122: the context of molecular dynamics simulations.  They find that over a

123: wide range of densities their complete binary tree variant has the best

124: performance in terms of the coefficient of the $\log N$ term and large $N$

125: behavior.

126:

127:

128:

129:

130: In this work, we propose a priority queue for use in event driven

131: molecular dynamics simulations for which all operations require O(1)

132: time.  The approach is inspired by the concept of a {\it bounded priority

133: queue} which is typically implemented as an array of linear lists and

134: which is applicable to problems in which the values associated with

135: queue items are integers and are bounded (i.e. the values $t$ associated

136: with events obey $a < t< b$ where $a$ and $b$ are constants). Bounded

137: priority queues are not directly applicable to the molecular dynamics

138: queueing problem because neither of these requirements is met.

139:

140: We show, however, that with a hybrid approach that employs both a normal

141: priority queue and a bounded priority queue we can ensure all operations

142: on the queue take O(1) time.  We make use of the facts that for

143: molecular dynamics simulations:

144:

145: (i) The time associated with an event to be added to the queue is

146: always later than the time associated with the last event removed from

147: the top of the queue.  That is,

148: %

149: \begin{equation}

150: t -t_{\rm last} \ge 0,

151: \label{tge}

152: \end{equation}

153: %

154: where $t$ is the time associated with the  event to be added to the queue

155: and $t_{\rm last}$ is the time associated with the last event removed from the top of the queue.

156:

157: (ii) There exists a constant, $\Delta t_{\rm max}$ such that

158: %

159: \begin{equation}

160: t-t_{\rm last}  <  \Delta t_{\rm max}.

161: \label{tmax}

162: \end{equation}

163: %

164: We call a priority queue which supports such events a BIPQ (Bounded

165: Increasing Priority Queue).

166:

167: \section{Approach}

168:

169: The basic idea is to:

170:

171: (i) perform a gross sort of the events using an array

172: of linear lists and

173:

174: (ii) to use a binary tree to perform a fine sort of only those events which are

175: currently candidates for the event with the shortest time.

176:

177:

178: More specifically, our priority queue is composed of the following

179: components:

180:

181: 1.  An array, $A$, of $n$ linear lists $l_i$, $0 \le i < n $. (Section

182: \ref{choose} below discusses how to determine the size $n$ of the

183: array.)  The array is treated in a circular manner.  That is, the last

184: linear list in the array is followed logically by the first linear list.

185: We implement each linear list as a doubly linked list.

186:

187: 2.  A binary tree which is used to implement a conventional priority queue.

188:

189:

190: We also maintain two additional quantities: the {\it current index},

191: $i^*$, and $i_0$ a {\it base index} associated with the queue.

192: Initially, all linear lists and the binary tree are empty and $i^*$ and

193: $i_0$ are $0$.

194:

195:

196: \section{Queue Operation}

197: \label{QueueOperation}

198:

199: Here we describe how operations on the queue are implemented using the

200: data structures described above.

201:

202: (i) Addition. Events are added to either one of the linear lists or to

203: 	the binary tree as follows: An index $i$ for the event to be

204: 	added is determined by

205: %

206: \begin{equation}

207: i = \lfloor s\,t - i_0 \rfloor,

208: \end{equation}

209: %

210: where: $t$ is the time associated with the event; $i_0$ is the base

211: index; and $s$ is a {\it scale factor} the value of which is such that the

212: binary tree never contains more than a relatively small number of events

213: ($\approx 10$--$20$).  If $i$ is equal to the {\it current index}, $i^*$,

214: the event is added to the binary tree, otherwise it is added to linear

215: list $l_i$.

216:

217: (ii) Identification of the event with shortest time.  The event with the

218: shortest time is simply the root of the binary tree, as is the case with

219: a normal priority queue implemented using a binary tree.  If a request

220: is made for the event with the smallest time and the binary tree is

221: empty, the current index is incremented by one (wrapping around to

222: $i^*=0$ if we reach the end of the array) and all events in the linear

223: list $l_{i^*}$ are inserted in the binary tree.  If there are none, we

224: continue to increment $i^*$ until a non-empty linear list is found.  If

225: we wrap around to the beginning of the array, $i_0$ is incremented by

226: $n$.  We find that in practice, when the binary tree becomes empty the

227: next linear list is always non-empty (see Section \ref{choose} in which

228: we show the distribution of event times).

229:

230:

231: (iii) Deletion of an event.  We simply delete the event from the array

232: of linear lists or from the binary tree depending on the structure in which

233: it is located.
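
The search for the next non-empty linear list described in item (ii) can be sketched in C as follows.  This is only an illustration under stated assumptions: the function {\it advance\_current} and the array {\it count} (the number of events currently held in each list) are hypothetical names, and the overflow handling that the Appendix code performs on wrap-around is omitted.

```c
/* Hypothetical sketch: count[i] holds the number of events in list i.
   Advances *current to the next non-empty list, adding n to *base each
   time the index wraps around to 0.  Returns the number of empty lists
   skipped, or -1 if all n lists turn out to be empty. */
int advance_current(int *current, double *base, const int *count, int n)
{
    int skipped = 0;
    int steps;

    for (steps = 0; steps < n; steps++) {
        (*current)++;
        if (*current == n) {        /* end of the array: wrap around */
            *current = 0;
            *base += n;             /* i_0 is incremented by n */
        }
        if (count[*current] > 0)
            return skipped;         /* found a non-empty list */
        skipped++;
    }
    return -1;                      /* every list was empty */
}
```

The bound of $n$ steps is a safeguard added for the sketch so that the loop terminates even when no events are stored.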

234:

235: The fact that the time associated with an event to be added to the queue

236:  is always greater than or equal to the time associated with the last event

237:  removed allows us to use the array of linear lists in a circular

238:  fashion.

239:

240: The requirement that there exists a constant, $\Delta t_{\rm max}$, such that

241: %

242: \begin{equation}

243: t - t_{\rm last} < \Delta t_{\rm max}

244: %\label{tmax}

245: \end{equation}

246: %

247: allows us to use a finite number of linear lists. The number of linear

248: lists required is proportional to $\Delta t_{\rm max}$.  In practice we

249: find that we can always find a reasonable value of $\Delta t_{\rm max}$

250: such that Eq.~(\ref{tmax}) holds.  If a rare event occurs which

251: violates this constraint or we want to use less memory for linear lists

252: causing the constraint to be violated, the event is handled on an

253: exception basis as implemented in the {\it processOverflowList} function

254: in code contained in the Appendix.  Alternatively, the application which

255: calls the priority queue code can guarantee that such an event never

256: occurs by creating an earlier fictitious collision with a time which

257: does not violate the constraint.
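
The index computation, together with the wrap-around and overflow handling discussed above, can be illustrated by a small C function.  It is a sketch that mirrors the logic of {\it insertInEventQ} in the Appendix rather than a replacement for it; the name {\it bucket\_index} and its parameter list are hypothetical, and the cast assumes $s\,t - i_0 \ge 0$ so that truncation coincides with the floor.

```c
/* Hypothetical helper (not part of the Appendix code): map an event time t
   to a list index, given scale factor s, base index `base`, n linear lists
   and the current index.  Returns n when the event belongs in the
   overflow list. */
int bucket_index(double t, double s, double base, int n, int current)
{
    /* i = floor(s*t - i_0); the cast truncates, which equals the floor
       for the non-negative values assumed here */
    int i = (int)(s * t - base);

    if (i > n - 1) {                /* beyond the end of the array: wrap */
        i -= n;
        if (i >= current - 1)       /* wrapped index reaches the current
                                       position: event is too far ahead */
            i = n;                  /* store in the overflow list */
    }
    return i;
}
```

With $s=50$, $i_0=0$, $n=50000$ and current index 10, an event at $t=0.25$ maps to list 12, an event at $t=1000.125$ wraps to list 6, and an event at $t=1000.5$ is sent to the overflow list.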

258:

259: Thus all of the events, except for those deleted before they are placed

260: in the binary tree, will eventually be added to the binary tree, but at

261: any given time the tree, instead of containing O($N$) entries, will

262: contain only a relatively small number of entries.  The number of events

263: maintained in the binary tree is only a fraction of the total number of

264: particles $N$ in the system and can be made independent of $N$.

265:

266: Our priority queue is similar to a {\it calendar queue} \cite{Brown};

267:  however, the calendar queue does not employ a binary tree -- events are

268:  sorted in each of the linear lists.

269:

270: \section{How to choose parameters}

271: \label{choose}

272:

273: Two parameters, $n$ the number of linear lists and $s$ the scale factor,

274: must be chosen to specify the implementation of the queue.

275: Operationally, they can be chosen as follows:

276:

277: (i) First, by instrumenting the queue to count the number of events in

278: the binary tree, determine a value of $s$ such that the number of events

279: in the binary tree is relatively small ($\approx 10$--$20$).  Table I and

280: Fig.~\ref{mem}(a) summarize the values of $s$ we have used for our

281: simulations. The figure is consistent with a scale factor linear in $N$

282: with a different coefficient of linearity dependent on density.  Because

283: we use a binary tree to store events with the soonest times, the

284: performance of the algorithm is somewhat insensitive to the choice of

285: $s$.  For example, a choice of $s$ which results in a doubling of the

286: number of events in the binary tree results in only one additional level

287: in the tree.

288:

289: (ii) Instrument the queue to find $\Delta t_{\rm max}$, the maximum

290: difference between the time associated with an event to be added and the

291: time associated with the last event removed and set

292: %

293: \begin{equation}

294: n = s\,\Delta t_{\rm max}

295: \end{equation}

296: %

297: to ensure that Eq.~(\ref{tmax}) is met.  Table I and Fig.~\ref{mem}(b)

298: summarize the number of linear lists $n$ we have used for our

299: simulations. As with $s$, $n$ is linear in $N$ with a different coefficient of

300: linearity dependent on density.  We note that while memory requirements

301: are $O(N)$ as in the conventional implementation of priority queues, the

302: hybrid implementation does require significantly more memory than the

303: conventional implementation due to the memory required for the linear

304: lists.  Tradeoffs can be made of CPU time for memory by increasing the

305: scale factor and/or reducing the number of linear lists (resulting in

306: more exception conditions).
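
As a purely illustrative calculation, take the constants that appear in the Appendix code, $n=50000$ linear lists and a scale factor $s=50$.  Read through the relation between $n$ and $s$ given above, these values would correspond to
\[
\Delta t_{\rm max} = \frac{n}{s} = \frac{50000}{50} = 1000 ,
\]
with each linear list covering a time interval of width $1/s = 0.02$.  Whether these constants match any particular run in Table I is not stated here, so they should be read only as an example of the arithmetic.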

307:

308: Figure~\ref{pdist}(a) plots $\langle m_{\hat i} \rangle$, the average

309: number of events with index $\hat i$ versus $\hat i$ for various $N$.

310: Here

311: %

312: \begin{equation}

313: \hat i \equiv (i - i^* + n) \bmod n.

314: \end{equation}

315: %

316: That is, $\hat i$ is the distance of $i$ from the current index taking

317: into account the circular nature of the array of linear lists.  The data

318: was obtained by sampling the queue many times at regular intervals.

319: With the choice of scale factors shown in Table I we achieve our goal

320: of having $\approx 10$--$20$ events with index $i=i^*$ and thus in the

321: binary tree.  Note that to achieve this, the scale factor increases with

322: increasing $N$ resulting in the cutoff of the distributions also

323: increasing with increasing $N$.  (In fact, if the $x$-axis is

324: transformed by $x \to x/N$, the plots collapse as shown in

325: Fig.~\ref{pdist}(b) reflecting the fact that the probability

326: distribution of collision times is independent of $N$.)  Thus, the

327: number of linear lists required to ensure that Eq.~(\ref{tmax}) holds

328: also increases with $N$.  In Fig.~\ref{pBucket0}, we plot the

329: distribution $P(m^*)$ the probability that the number of events with

330: the current index, $i^*$ is $m^*$ versus $m^*$.  The distributions are

331: strongly peaked indicating that the number of events in the binary tree

332: does not vary much from the average.
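
The circular distance $\hat i$ defined above translates directly into C.  The helper below is a sketch; the name {\it circular\_distance} is hypothetical and both indices are assumed to lie in the range $0 \le i < n$.

```c
/* Hypothetical helper: distance of list index i ahead of the current
   index, measured around the circular array of n lists.
   Assumes 0 <= i < n and 0 <= current < n. */
int circular_distance(int i, int current, int n)
{
    /* adding n before taking the remainder keeps the operand non-negative */
    return (i - current + n) % n;
}
```

For $n=8$ and current index 5, list 7 is at distance 2 and list 2 is at distance 5, since the array wraps around after list 7.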

333:

334: \section{Complexity Analysis}

335: \label{complexity}

336:

337: The basic operations involved in the queue are:

338:

339: (i) insertion into and deletion from the linear lists.  Use of doubly

340: linked lists allows these operations to be implemented to take O(1) time.

341:

342: (ii) binary tree operations.  We use the code of Ref.~\cite{Marin} to

343: implement the binary tree operations.  When a leaf representing an item

344: in the priority queue is added to or deleted from the tree, the tree

345: must be traversed from the affected leaf possibly all the way to the

346: root node and adjustments made to reflect the presence or absence of the

347: affected leaf.  Thus a bound on the number of levels which must be

348: traversed is $\log_2 m$ where $m$ is the number of items in the priority

349: queue.  In Sec.~\ref{choose} we show that by choosing the scale

350: factor $s$ appropriately, $m$ can be made to be independent of $N$ (and

351: have a relatively small value, $\approx 10$--$20$).  Thus binary tree

352: operations will be O(1).

353:

354: (iii) identification of the next non-empty linear list, after the

355: current linear list is exhausted.  As explained in item (ii) of

356: Sec.~\ref{QueueOperation}, when the binary tree is empty, we search

357: forward through the array of linear lists until a non-empty list is

358: found.  If the number of lists we must search through increases with

359: $N$, this process will not be O(1).  We show below that with the proper

360: choice of $s$, the number of lists we must search does not grow with $N$

361: and in fact show that the next linear list after the current one almost

362: always is non-empty.  Thus the complexity of identification of the next

363: non-empty list will be O(1).

364:

365: Thus the overall time taken by queue operations per collision is O(1).
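
To make the bound $\log_2 m$ from item (ii) concrete, the following C sketch computes the smallest number of levels $k$ with $2^k \ge m$.  The function name {\it tree\_levels} is hypothetical, and the result is only the bound discussed above, not a measurement of the code of Ref.~\cite{Marin}.

```c
/* Hypothetical helper: smallest k such that 2^k >= m, i.e. the ceiling
   of log2(m) for m >= 1.  Uses integer arithmetic only. */
int tree_levels(int m)
{
    int k = 0;
    long capacity = 1;      /* number of leaves reachable with k levels */

    while (capacity < m) {
        capacity *= 2;
        k++;
    }
    return k;
}
```

For $m=10$ the bound is 4 levels and for $m=20$ it is 5 levels, so doubling the number of events held in the binary tree adds a single level.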

366:

367: \section{Experiments/Simulations}

368:

369: We run simulations using both a conventional priority queue and our new

370:  hybrid approach.  For simplicity, the simulations use identical

371:  hard spheres of radius one and unit mass.  The sizes $L$ of the cubic

372:  systems are set to maintain equal densities.  The parameters of the

373:  simulation are as shown in Table I.

374:

375: To demonstrate the performance of our approach, we run simulations for

376: cubic systems at four volume densities $\rho=0.01, 0.12, 0.40$ and $0.70$.

377: The first density represents a rarefied gas and the last density

378: represents a jammed system.  The jamming density for hard sphere systems

379: is $\approx 0.64$ \cite{Donev2004}. For both the conventional priority

380: queue and the hybrid queue we used the binary tree code from

381: Ref.~\cite{Marin}.

382:

383: Figure \ref{pcbt} shows the time taken for  $10^7$ collisions for queue

384: operations with both a conventional priority queue and the hybrid queue.

385: As expected, the time for the conventional priority queue increases as

386: $\log N$ while the time for the hybrid queue is essentially constant.

387:

388: There is, however, a slight upward trend in the hybrid queue results.

389: To determine if this trend is a feature of the algorithm or of the

390: benchmark environment, we proceed as follows.

391:

392: We first study the only two places in the hybrid code where looping is

393: involved:

394:

395: (i) in the {\it updateCBT} function of Ref.~\cite{Marin} we loop as we

396: traverse the binary tree.  If we traverse more levels as $N$ grows, the

397: algorithm will not be O(1).  To explore this possibility, we instrument

398: the function to count the number of levels we traverse in the

399: tree.  The results are shown in Fig.~\ref{pcbt}.  The number of loop

400: iterations is essentially constant, independent of $N$.

401:

402: (ii) in the {\it deleteFirstFromEventQ} function, after the

403: priority queue for the current linear list is exhausted, we loop until we

404: find a non-empty list.  If the number of lists we must examine before we

405: find the first non-empty list grows with the system size, the algorithm

406: will not be O(1).  We examine this possibility by counting the number of

407: times we encounter an empty list and find that on average the

408: probability of encountering an empty list does not grow with $N$ and

409: that the probability of encountering an empty list is very small: we

410: encounter an empty list in only a fraction $10^{-4}$ of the cases after exhausting the

411: priority queue.

412:

413: Having ruled out dependence of the number of loop iterations on $N$ as

414: the source of the upward trend in the execution times, we now consider

415: whether the larger memory needed as $N$ increases is the cause of the

416: trend.  All modern computer processors employ high speed CPU cache

417: memory to reduce the average time to access memory

418: \cite{Handy,Hennessy}.  In fact, the processor we use in our

419: simulations, the AMD Opteron, employs a two-level memory cache (64 KB

420: level 1 cache, 1 MB level 2 cache) \cite{Opteron}.  A similar cache

421: structure is used in the Intel Xeon processor \cite{Xeon}.  Because

422: memory caches have finite size, if the memory access is random the larger

423: the memory used by a program, the lower the probability that data will

424: be found in the cache resulting in slower instruction execution.  The

425: effect of cache in benchmark runtimes has been studied in

426: Ref.~\cite{Saavedra}.  We study the effect of the finite size of the

427: cache in our system as follows: Instead of running the molecular

428: dynamics simulations, we run a small test program which randomly

429: accesses the data structures used by the molecular dynamics simulations.

430: For each value of $N$, the test program executes exactly the same number

431: of instructions but uses data structures of the size used by the

432: molecular dynamics simulations for that value of $N$.  The results are

433: shown in Fig.~\ref{pcbt} and show an upward trend similar to that of the

434: simulation results for all of the densities studied.
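
A test of this kind can be sketched in C as follows.  This is an assumption-laden illustration, not the program used for the measurements: the function name {\it random\_touch}, the use of an array of unsigned integers as a stand-in for the simulation data structures, and the linear congruential generator are all choices made for the sketch.

```c
#include <stddef.h>

/* Hypothetical sketch of a random-access test: performs a fixed number of
   pseudo-random increments on an array of `size` elements and returns the
   number of accesses made.  The instruction count depends only on
   `accesses`, while the memory footprint depends only on `size`. */
unsigned long random_touch(unsigned int *data, size_t size,
                           unsigned long accesses, unsigned long seed)
{
    unsigned long state = seed;
    unsigned long k;

    for (k = 0; k < accesses; k++) {
        /* linear congruential step (an assumption of this sketch);
           unsigned overflow wraps around by definition */
        state = state * 1103515245UL + 12345UL;
        data[(state >> 16) % size]++;   /* touch a pseudo-random element */
    }
    return accesses;
}
```

Timing calls with the same number of accesses but with arrays sized to match each value of $N$ would isolate the effect of the memory footprint from that of the instruction count.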

435:

436: The above results thus suggest that the complexity of the hybrid

437: algorithm is, in fact, O(1) and that the upward trend in the results is

438: due to the finite size of the high speed memory cache.

439:

440: \section{Discussion and Summary}

441:

442: We have defined a new abstract data type, the Bounded Increasing

443: Priority Queue (BIPQ) having the same operations as a conventional

444: priority queue but which takes advantage of the fact that the value

445: associated with an item to be added to the queue has the properties that:

446: (i) the value is greater than or equal to the value associated with the

447: last item removed from the top of the queue and (ii) the value minus the

448: value of the last item removed from the top of the queue is bounded.

449: These properties are obeyed for events in event driven molecular dynamics

450: simulations.  We implement a BIPQ using a hybrid approach incorporating

451: a conventional priority queue (implemented with a binary tree) and a

452: bounded priority queue.  All operations on the BIPQ take an average O(1)

453: time per collision. This type of queue should provide performance

454: speedups for molecular dynamics simulations in which the event queue is

455: the bottleneck.

456:

457:

458: \section{Acknowledgments}

459:

460: We thank Sergey Buldyrev, Pradeep Kumar, Sameet Sreenivasan, and Brigita

461: Urbanc for helpful discussions.  We thank ONR, NSF and NIH for support.

462:

463: \newpage

464: \appendix

465: \begin{center}

466: \bf {APPENDIX}

467: \end{center}

468:

469: The following code implements the hybrid queue proposed here.  The calls

470: to Insert and Delete are to the functions contained in

471: Ref.~\cite{Marin}, which update NP and the complete binary tree, CBT.

472: Any code providing the same functions could be substituted for Insert

473: and Delete.

474:

475: \begin{singlespace}

476: \begin{verbatim}

477:

478: #define nlists 50000

479: #define scale 50

480:

481: typedef struct

482: {

483:   int next;

484:   int previous;

485:   int p;

486:   int q;

487:   int c;

488:   double t;

489:   unsigned int qMarker;

490:   int qIndex;

491:   statusType status;

492: }eventQEntry;

493:

494: eventQEntry * eventQEntries;

495: double baseIndex;

496:

497: int * CBT;  /* complete binary tree

498:               implemented in an array of

499:               2*N integers */

500: int NP;     /*current number of particles*/

501:

502: int linearLists[nlists+1];/*+1 for overflow*/

503: int currentIndex;

504:

505:

506: //----------------------------------------

507: int insertInEventQ(int p)

508: {

509:   int i,oldFirst;

510:   eventQEntry * pt;

511:   pt=eventQEntries+p;   /* use pth entry */

512:

513:   i=(int)(scale*pt->t-baseIndex);

514:   if(i>(nlists-1))     /* account for wrap */

515:   {

516:     i-=nlists;

517:     if(i>=currentIndex-1)

518:     {

519:       i=nlists;	/* store in overflow list */

520:     }

521:   }

522:   pt->qIndex=i;

523:

524:   if(i==currentIndex)

525:   {

526:     Insert(p);   /* insert in PQ  */

527:   }

528:   else

529:   {

530:     /* insert in linked list */

531:

532:     oldFirst=linearLists[i];

533:     pt->previous=-1;

534:     pt->next=oldFirst;

535:     linearLists[i]=p;

536:

537:     if(oldFirst!=-1)

538:       eventQEntries[oldFirst].previous=p;

539:    }

540:     return p;

541: }

542:

543: //----------------------------------------

544:

545: void processOverflowList(void)

546: {

547:   int i,e,eNext;

548:   i=nlists;  /* overflow list */

549:   e=linearLists[i];

550:   linearLists[i]=-1;  /* mark empty; we will

551:      treat all entries and may re-add some */

552:   while(e!=-1)

553:   {

554:     eNext=eventQEntries[e].next; /* save next */

555:     insertInEventQ(e);	/* try add to regular list now */

556:     e=eNext;

557:   }

558: }

559:

560: //---------------------------------------

561:

562: void deleteFromEventQ(int e)

563: {

564:   int prev,next,i;

565:   eventQEntry * pt=eventQEntries+e;

566:

567:   i=pt->qIndex;

568:   if(i==currentIndex)

569:     Delete(e);   /* delete from pq */

570:   else

571:   {

572:     /* remove from linked list */

573:

574:     prev=pt->previous;

575:     next=pt->next;

576:     if(prev==-1)

577:       linearLists[i]=pt->next;

578:     else

579:       eventQEntries[prev].next=next;

580:

581:     if(next!=-1)

582:       eventQEntries[next].previous=prev;

583:   }

584: }

585:

586: //---------------------------------------

587:

588: int deleteFirstFromEventQ()

589: {

590:   int e;

591:

592:   while(NP==0)/*if priority queue exhausted*/

593:   {

594:     /* change current index */

595:

596:     currentIndex++;

597:     if(currentIndex==nlists)

598:     {

599:       currentIndex=0;

600:       baseIndex+=nlists;

601:       processOverflowList();

602:     }

603:

604:     /* populate pq */

605:

606:     e=linearLists[currentIndex];

607:     while(e!=-1)

608:     {

609:       Insert(e);

610:       e=eventQEntries[e].next;

611:     }

612:     linearLists[currentIndex]=-1;

613:   }

614:

615:   e=CBT[1];    /* root contains shortest

616:                   time entry */

617:

618:   Delete(CBT[1]);

619:   return e;

620: }

621: //---------------------------------------

622:

623:

624: \end{verbatim}

625: \end{singlespace}

626:

627: \newpage

628:

629: \begin{thebibliography}{99}

630:

631: \bibitem{Alder}B.~J.~Alder and  T.~E.~Wainwright, Studies in molecular

632: dynamics. I. General method, J. Chem. Phys. 31 (1959) 459.

633:

634: \bibitem{Allen87} M. P. Allen and D. J. Tildesley, Computer Simulation of

635: Liquids, Oxford Science Publications, (1987).

636:

637: \bibitem{Allen89} M. P. Allen, D. Frenkel, J. Talbot, Molecular dynamics

638: simulation using hard particles, Comput. Phys. Rep. 9 (1989) 301.

639:

640: \bibitem{Opteron} AMD Opteron Product Data Sheet. Publication 23932. (2004)

641:

642: \bibitem{Becker} O. M. Becker, A. D. Mackerell, B. Roux, B. M. Becker (Eds.),

643:   Computational Biochemistry and Biophysics, Marcel Dekker (2001).

644:

645: \bibitem{Bentley} J. Bentley, Programming pearls: Thanks, heaps.

646: Commun. ACM (1985) 245-250.

647:

648: \bibitem{Blackstone} J. H. Blackstone, G. L. Hogg and D. T. Phillips, A

649: two-list synchronization procedure for discrete event simulation,

650: Comm. ACM 24, (1981) 825-829.

651:

652: \bibitem{Brown78} M. R. Brown, The analysis of a practical and nearly

653: optimal priority queue algorithms, SIAM J. Comput. 7 (1978) 298-319.

654:

655: \bibitem{Brown} R. Brown, Calendar queues. Commun. ACM 31, 10

656: (1988), 1220-1227.

657:

658: \bibitem{Daggett} V. Daggett and A. R. Fersht, Transition states in

659:   protein folding, in Mechanisms of Protein Folding, R. H. Pain (Ed.),

660:   Oxford University Press, (2000).

661:

662: \bibitem{Donev2004} A. Donev, S. Torquato, F. H. Stillinger, and

663: R. Connelly, Jamming in hard sphere and disk packings, J. Appl. Phys. 95

664: (2004) 989-999.

665:

666: \bibitem{Donev1} A.~Donev, S.~Torquato, and F.~H.~Stillinger. Neighbor

667:   list collision-driven molecular dynamics simulation for nonspherical

668:   hard particles. I. Algorithmic details, J. Comput. Phys. 202 (2005)

669:   737-764.

670:

671: \bibitem{Erpenbeck} J. J. Erpenbeck and  W. W. Wood, in Statistical mechanics

672: B: Modern theoretical chemistry, B. J. Berne (Ed.), Molecular Dynamics

673: Techniques for Hard Core Systems, vol.6, Institute of Physics

674: Publishing, London, (1977) 1-40.

675:

676: \bibitem{Francon} J. Francon, G. Viennot, and J. Vuillemin, Description

677: and analysis of an efficient priority queue representation.  In

678: Proceeding of the 19th Annual Symposium on Foundations of Computer

679: Science.  IEEE (1978) 1-7.

680:

681: \bibitem{Fredman} M. L. Fredman, R. Sedgewick, D. Sleator and

682: R. Tarjan, The pairing heap: A new form of self-adjusting

683: heap. Algorithmica 1 (1986) 111-129.

684:

685: \bibitem{Frenkel} D. Frenkel and B. Smit, Understanding Molecular Simulation,

686:   Academic Press, New York, (2002).

687:

688: \bibitem{Handy} H. Handy, Cache Memory Book, Academic Press (1998).

689:

690: \bibitem{Hendriksen77} J. O. Henriksen, An improved events list

691: algorithm.  In Proceedings of the 1977 Winter Simulation Conference, IEEE

692: (1977) 547-557.

693:

694: \bibitem{Hendriksen83} J. O. Henriksen, Event list management - A

695: tutorial.  In Proceeding of the 1983 Winter Simulation Conference, IEEE,

696: (1983) 543-551.

697:

698: \bibitem{Hennessy} J. L. Hennessy and D. A. Patterson, Computer

699:   Architecture: A Quantitative Approach, Elsevier (2002).

700:

701: \bibitem{Xeon} Intel Xeon Processor with 800 MHZ System Bus

702: Datasheet. Document Number 302355-001 (2004).

703:

704: \bibitem{Jones} D. W. Jones, An empirical comparison of priority-queue

705: and event-set implementations. Comm. ACM 29 (1986) 300-311.

706:

707: \bibitem{Kingston} J. H. Kingston, The amortized complexity of

708: Henriksen's algorithm.  Comput. Sci. Tech. Rep. 85-06, Dept. of Computer

709: Science, University of Iowa (1985).

710:

711: \bibitem{Knuth} D. E. Knuth, The Art of Computer Programming Vol. 3,

712: Sorting and Searching, Addison-Wesley (1973).

713:

714: \bibitem{Krantz} A. T. Krantz, Analysis of an efficient algorithm for the

715: hard-sphere problem, TOMACS 6 (1996) 185-229.

716:

\bibitem{Lubachevsky} B. D. Lubachevsky, How to simulate billiards and similar

718: systems, J. Comput. Phys. 94 (1991) 255.

719:

720: \bibitem{Marin93} M. Mar\'in, D. Risso, and P. Cordero, Efficient

721: algorithms for many-body hard particle molecular dynamics,

722: J. Comput. Phys. 109 (1993) 306-317.

723:

724: \bibitem{Marin} M. Mar\'in and P. Cordero, An empirical assessment of

725: priority queues in event-driven molecular dynamics

726: simulation. Comput. Phys. Comm. 92 (1995) 214-224.

727:

\bibitem{Nix} R. Nix, An evaluation of pagodas. Res. Rep. 164, Dept. of

729: Computer Science, Yale Univ.

730:

731: \bibitem{Rapaport80} D. C. Rapaport, The event scheduling problem in

molecular dynamics simulation, J. Comput. Phys. 34 (1980) 184-201.

733:

734: \bibitem{Rapaport} D. C. Rapaport, Art of Molecular Dynamics Simulation,

735:   Cambridge University Press, (2004).

736:

\bibitem{Sadus} R. J. Sadus, Molecular Simulation of Fluids, Elsevier

738:   (1999).

739:

740: \bibitem{Saavedra} R. H. Saavedra and A. J. Smith, Measuring cache and TLB

741: performance and their effects on benchmark runtimes, IEEE Trans. Comp. 44

(1995) 1223-1235.

743:

744: \bibitem{Shida} K. Shida, Y. Anzai, Reduction of the event-list for

745: molecular dynamic simulation, Comput. Phys. Commun. 69 (1992) 317-329.

746:

747: \bibitem{Sleator83} D. D. Sleator and R. E. Tarjan, Self-adjusting binary

trees. In Proceedings of the ACM SIGACT Symposium on Theory of Computing

749: (1983) 235-245.

750:

751: \bibitem{Sleator86} D. D. Sleator and R. E. Tarjan, Self-adjusting heaps,

752: SIAM J. Comput. 15 (1986) 52-69.

753:

754: \bibitem{Tarjan} R. E. Tarjan and D. D. Sleator, Self-adjusting binary

search trees, JACM 32 (1985) 652-686.

756:

757: \bibitem{Urbanc} B. Urbanc, J. M. Borreguero, L. Cruz, and

758: H. E. Stanley, Ab initio discrete molecular dynamics approach to protein

759: folding and aggregation, Methods in Enzymology (2006) (in press).

760:

\bibitem{Vuillemin} J. A. Vuillemin, A data structure for manipulating

762: priority queues.  Commun. ACM 21 (1978) 309-315.

763:

\bibitem{Williams} J. W. J. Williams, Algorithm 232:

765: Heapsort. Commun. ACM 7 (1964) 347-348.

766:

767: \end{thebibliography}

768:

769:

770:

771: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

772: %\begin{widetext}

773: \begin{table}[htb]

774: \caption{Parameters of molecular dynamics simulations.}

775: ~

776: \begin{tabular}{ccc}

777:

778:  {Number} &  {Scale Factor}& {Number} \\

779:  {of Particles}  & & {of lists} \\

780: ~

781:  $N$      & $s$ & $n$ \\  \hline

782: \multicolumn 3 c {$\rho=0.01$} \\  \hline

783: $1000$ &  $100$ &  $25000$ \\

784: $8000$ &  $700$ &  $200000$    \\

785: $64000$ & $5000$  &   $2.5 \times 10^6$ \\

786: $512000$ &  $45000$ &   $25 \times 10^6$ \\  \hline

787: \multicolumn 3 c {$\rho=0.12$} \\  \hline

788: $1000$ &  $50$ &  $50000$ \\

789: $8000$ &  $500$ &  $400000$    \\

790: $64000$ & $3400$  &   $5 \times 10^6$ \\

791: $512000$ &  $25000$ &   $50 \times 10^6$ \\  \hline

792: \multicolumn 3 c {$\rho=0.4$} \\  \hline

793: $1372$ &  $1000$ &  $250000$ \\

$8788$ &  $7500$ &  $2 \times 10^6$    \\

795: $70304$ & $60000$  &   $16 \times 10^6$ \\

796: $530604$ &  $500000$ &   $130 \times 10^6$ \\  \hline

797: \multicolumn 3 c {$\rho=0.7$} \\  \hline

798: $1372$ &  $15000$ &  $500000$ \\

799: $8788$ &  $75000$ &  $200000$    \\

800: $70304$ & $500000$  &   $35 \times 10^6$ \\

801: $530604$ &  $4 \times 10^6$ &   $300 \times 10^6$ \\  \hline

802: \end{tabular}

803: \end{table}

804: %\end{widetext}

805:

806: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

807:

808:

809: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

810: \begin{figure}

811: \centerline{

812: \xsize

813: \epsfclipon

814: \epsfbox{pdista.ps}

815: }

816:

817: \centerline{

818: \xsize

819: \epsfclipon

820: \epsfbox{pdistb.ps}

821: }

822:

823:

\caption{(a) For $\rho=0.12$, the average number of events $\langle m_{\hat i} \rangle$
  with index $\hat i$ versus $\hat i$ (the distance of $i$ from the
  current index $i^*$) for (from left to right) $N=1000$, 8000, and 64000.
  (b) Same as (a) with the x-axis scaled by $1/N$, which results in a
  collapse of the plots.}

829: \label{pdist}

830: \end{figure}

831: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

832:

833: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

834: \begin{figure}

835: \centerline{

836: \xsize

837: \epsfclipon

838: \epsfbox{pBucket0.ps}

839: }

\caption{For $\rho=0.12$, $P(m^*)$, the probability that the number of events in linear
list $i=0$ is $m^*$, vs. $m^*$ for $N=1000$ (squares), 8000 (triangles),
and 64000 (disks).}

843: \label{pBucket0}

844: \end{figure}

845:

846: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

847: \begin{figure}

848: \centerline{

849: \xsize

850: \epsfclipon

851: \epsfbox{pms.ps}

852: }

853: \centerline{

854: \xsize

855: \epsfclipon

856: \epsfbox{pmn.ps}

857: }

858:

\caption{(a) Scale factor, $s$, vs. $N$ for (from bottom to top) $\rho=0.12,
 0.01, 0.4$ and $0.7$. (b) Number of linear lists, $n$, vs. $N$ for (from bottom to top) $\rho=0.01,
 0.12, 0.4$ and $0.7$.}

862: \label{mem}

863: \end{figure}

864:

865:

866:

867:

868: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

869: \begin{widetext}

870:

871: \begin{figure}

872: \centerline{

873: \xsize

874: \epsfclipon

875: \epsfbox{pa.ps}

876: \xsize

877: \epsfclipon

878: \epsfbox{pb.ps}

879: }

880:

881: \centerline{

882: \xsize

883: \epsfclipon

884: \epsfbox{pc.ps}

885: \xsize

886: \epsfclipon

887: \epsfbox{pd.ps}

888: }

889:

\caption{Processing time for queue operations vs. $N$, the number of
particles in the system.  (a) Volume density $\rho=0.01$.  The higher
solid line is the processing time for queue operations for a normal
priority queue; the lower solid line is for the hybrid queuing system
introduced here.  The dashed line represents the benchmark test timing
to execute a fixed number of instructions independent of $N$ but with
memory sizes corresponding to the memory used for the hybrid system.
The dotted line represents the number of tree levels traversed ($\times
10^{-7}$) in the binary tree for the hybrid system.  (b), (c), and (d)
Same as (a) for $\rho=0.12, 0.4$, and $0.7$, respectively.}

900: \label{pcbt}

901: \end{figure}

902:

903:

904: \end{widetext}

905: }

906:

907: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

908:

909: \end{document}

910:

911: