1: %\documentclass[preprint,preprintnumbers,amsmath,amssymb,superscriptaddress]{revtex4}
2: \documentclass[twocolumn,preprintnumbers,amsmath,amssymb,superscriptaddress]{revtex4}
3:
4: \usepackage{setspace}
5:
6: \usepackage{graphicx}% Include figure files
7: \usepackage{graphicx,epsf} %Include figure files
8: \usepackage{dcolumn}% Align table columns on decimal point
\usepackage{bm}% bold math
10: \usepackage{latexsym}
11: \newcommand{\xsize}{\epsfxsize=9.0cm}
12:
13: \begin{document}
14:
15: \title{A Complexity O(1) Priority Queue for Event Driven Molecular Dynamics
16: Simulations }
17:
18: \author{Gerald~Paul}
19: \affiliation{Center for Polymer Studies and Dept.\ of Physics, Boston
20: University, Boston, MA 02215, USA}
21: \email{gerryp@bu.edu}
22:
23:
24:
25: \begin{abstract}
26:
27: We propose and implement a priority queue suitable for use in event
28: driven molecular dynamics simulations. All operations on the queue take
29: on average O(1) time per collision. In comparison, previously studied
30: queues for event driven molecular dynamics simulations require O(log
31: $N$) time per collision for systems of $N$ particles.
32:
33: \end{abstract}
34:
35: \maketitle
36:
37: \section{Introduction}
38:
39:
40:
41: Molecular dynamics simulations are a powerful tool for determining the
42: behavior of multiparticle systems and are used in a wide range of
43: applications \cite{Allen87,Becker,Daggett,Frenkel,Rapaport,Sadus,Urbanc}.
44:
45: There are two basic approaches to these simulations:
46:
47: (i) Time driven simulations \cite{Allen87} in which equations of motion of all
48: particles are solved for a series of small time slices. The positions and
49: velocities of the particles are determined at
50: the end of each time slice and used as input to the calculation for the
51: next time slice.
52:
53: (ii) Event driven simulations
54: \cite{Alder,Allen89,Erpenbeck,Krantz,Donev1} which are applicable to
55: systems of hard spheres or more generally to systems with interparticle
56: potentials which are piecewise constant. The approach with event driven
57: simulations is to determine when the next collision between two
58: particles occurs, determine the positions and velocities of these
59: particles after the collision and then repeat this process. A collision
60: is defined as the event in which the hard spheres collide or more
61: generally when two particles reach a discontinuity in their
62: interparticle potential.
63:
64: We focus here on event driven simulations which, where applicable,
65: provide exact results and typically run faster than time driven
66: simulations. Determination of the next event is usually composed of two
67: steps \cite{Krantz}:
68:
69: (i) determination of the collision event with the shortest time for
70: each particle. By dividing the system into cells and/or maintaining
71: lists of particles within a certain distance of a given
72: particle (neighbor lists), the time taken for calculation of the first
73: collision event for a given particle can be made independent of N, the
74: total number of particles in the system \cite{Erpenbeck,Donev1}.
75:
76: (ii) determination of the collision event with the shortest time among
77: all the particles, given the events with the shortest time for each
78: particle obtained in (i). Approaches have been proposed and implemented
79: which allow this determination in O($\log N$) time.
80:
81: The subject of this paper is an approach to determining the next
82: collision event among all particles. This has been a heavily researched
subject \cite{Lubachevsky,Marin93,Marin,Rapaport80,Shida}. The
requirements for a queue to allow this determination are as follows. The
queue must support:
86:
87: (i) addition of an event to the queue;
88:
89: (ii) identification and deletion from the queue of the event with the
90: shortest collision time;
91:
(iii) deletion of a given event from the queue (e.g., when a collision
$(p,q)$ occurs we may want to remove the event $(q,p)$ from the queue).

These requirements define abstractly the concept of a {\it priority
queue}.
96:
97:
98: Implementations of priority queues for molecular dynamics
99: simulations have for the most part been based on various types of binary
100: trees. They all share the property that determining the event in the
101: queue with the smallest value requires O($\log N$) time \cite{Marin}.
102:
103:
104: The early work on priority queues is reviewed in
Ref.~\cite{Jones}. The earliest implementations of priority queues used
linked lists, which result in $O(N)$ performance. Implementations with
$O(N^{0.5})$ performance were introduced and analyzed in
Refs.~\cite{Blackstone,Hendriksen77,Hendriksen83,Kingston}. The oldest
priority queue implementations with $O(\log N)$ performance used {\it
implicit heaps}, binary trees in which each item always has a priority
higher than its children and the tree is embedded in an array
\cite{Bentley,Knuth,Williams}. Other $O(\log N)$ implementations
113: include {\it leftist trees} \cite{Knuth}, {\it binomial queues}
114: \cite{Brown78, Vuillemin}, {\it pagodas} \cite{Francon,Nix}, {\it skew
115: heaps} \cite{Sleator83,Tarjan}, {\it splay trees}
116: \cite{Sleator83,Sleator86,Tarjan} and {\it pairing heaps}
117: \cite{Fredman}.
118:
Mar\'in et al.\ \cite{Marin93,Marin} introduced a version of
120: the {\it complete binary tree} which also has $O(\log N)$ performance
121: and compared it to earlier priority queue implementations explicitly in
122: the context of molecular dynamics simulations. They find that over a
123: wide range of densities their complete binary tree variant has the best
124: performance in terms of the coefficient of the $\log N$ term and large $N$
125: behavior.
126:
127:
128:
129:
130: In this work, we propose a priority queue for use in event driven
131: molecular dynamics simulations for which all operations require O(1)
132: time. The approach is inspired by the concept of a {\it bounded priority
133: queue} which is typically implemented as an array of linear lists and
134: which is applicable to problems in which the values associated with
135: queue items are integers and are bounded (i.e. the values $t$ associated
136: with events obey $a < t< b$ where $a$ and $b$ are constants). Bounded
priority queues are not directly applicable to the molecular dynamics
queueing problem because neither of these requirements is met.
139:
140: We show, however, that with a hybrid approach that employs both a normal
141: priority queue and a bounded priority queue we can ensure all operations
142: on the queue take O(1) time. We make use of the facts that for
143: molecular dynamics simulations:
144:
(i) The time associated with an event to be added to the queue is
never earlier than the time associated with the last event removed from
the top of the queue. That is,
%
\begin{equation}
t - t_{\rm last} \ge 0,
\label{tge}
\end{equation}
%
where $t$ is the time associated with the event to be added to the queue
and $t_{\rm last}$ is the time associated with the last event removed
from the top of the queue.
156:
(ii) There exists a constant, $\Delta t_{\rm max}$, such that
158: %
159: \begin{equation}
160: t-t_{\rm last} < \Delta t_{\rm max}.
161: \label{tmax}
162: \end{equation}
163: %
164: We call a priority queue which supports such events a BIPQ (Bounded
165: Increasing Priority Queue).
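
As a minimal illustration of conditions (\ref{tge}) and (\ref{tmax}), the
following C sketch tests whether an event time is admissible for a BIPQ.
The function name {\tt bipqAdmissible} and its arguments are hypothetical
and are introduced only for this illustration; they do not appear in the
code in the Appendix.

\begin{verbatim}
/* Hypothetical helper (not in the Appendix):
   returns 1 if an event at time t satisfies
   0 <= t - tLast < dtMax, and 0 otherwise. */
int bipqAdmissible(double t, double tLast,
                   double dtMax)
{
  double dt = t - tLast;
  return dt >= 0.0 && dt < dtMax;
}
\end{verbatim}

An event that fails this test would have to be handled separately, for
instance through an overflow mechanism.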
166:
167: \section{Approach}
168:
169: The basic idea is to:
170:
171: (i) perform a gross sort of the events using an array
172: of linear lists and
173:
(ii) use a binary tree to perform a fine sort of only those events which are
175: currently candidates for the event with the shortest time.
176:
177:
178: More specifically, our priority queue is composed of the following
179: components:
180:
181: 1. An array, $A$, of $n$ linear lists $l_i$, $0 \le i < n $. (Section
\ref{choose} below discusses how to determine the size $n$ of the
183: array.) The array is treated in a circular manner. That is, the last
184: linear list in the array is followed logically by the first linear list.
185: We implement each linear list as a doubly linked list.
186:
187: 2. A binary tree which is used to implement a conventional priority queue.
188:
189:
We also maintain two additional quantities associated with the queue:
the {\it current index}, $i^*$, and the {\it base index}, $i_0$.
192: Initially, all linear lists and the binary tree are empty and $i^*$ and
193: $i_0$ are $0$.
194:
195:
196: \section{Queue Operation}
197: \label{QueueOperation}
198:
199: Here we describe how operations on the queue are implemented using the
200: data structures described above.
201:
(i) Addition. Events are added either to one of the linear lists or to
the binary tree as follows. An index $i$ for the event to be
added is determined by
%
\begin{equation}
i = \lfloor s\,t - i_0 \rfloor,
\end{equation}
%
where $t$ is the time associated with the event, $i_0$ is the base
index, and $s$ is a {\it scale factor} whose value is chosen such that the
binary tree never contains more than a relatively small number of events
($\approx 10$--$20$). If $i$ is equal to the {\it current index}, $i^*$,
the event is added to the binary tree; otherwise it is added to linear
list $l_i$.
216:
217: (ii) Identification of the event with shortest time. The event with the
218: shortest time is simply the root of the binary tree, as is the case with
219: a normal priority queue implemented using a binary tree. If a request
220: is made for the event with the smallest time and the binary tree is
empty, the current index is incremented by one (wrapping around to
$i^*=0$ if we reach the end of the array) and all events in the linear
list $l_{i^*}$ are inserted in the binary tree. If there are none, we
continue to increment $i^*$ until a non-empty linear list is found. Each
time we wrap around to the beginning of the array, $i_0$ is incremented by
$n$. We find that in practice, when the binary tree becomes empty the
227: next linear list is always non-empty (see Section \ref{choose} in which
228: we show the distribution of event times).
229:
230:
231: (iii) Deletion of an event. We simply delete the event from the array
232: of linear lists or from the binary tree depending on the structure in which
233: it is located.
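
The index computation of item (i), including the circular wrap, can be
sketched in C as follows. This is a simplified illustration rather than
the implementation itself: the function name {\tt binIndex} and its
argument list are hypothetical, the cast to {\tt int} stands in for the
floor and assumes a non-negative argument, and the overflow test mirrors
the one used in {\it insertInEventQ} in the Appendix.

\begin{verbatim}
/* Hypothetical helper: slot for an event at
   time t in a circular array of nLists lists.
   Returns nLists to denote the overflow list. */
int binIndex(double t, double scale,
             double baseIndex,
             int nLists, int currentIndex)
{
  /* the cast truncates toward zero, which
     equals the floor for arguments >= 0 */
  int i = (int)(scale * t - baseIndex);

  if (i > nLists - 1) { /* past end: wrap */
    i -= nLists;
    if (i >= currentIndex - 1)
      i = nLists;       /* too far ahead */
  }
  return i;
}
\end{verbatim}

For example, with a scale factor of $50$, a base index of $0$ and $50000$
lists, an event at $t=3.27$ maps to slot $163$, while an event at
$t=1000.5$ wraps around to slot $25$ if the current index is $100$ and is
sent to the overflow list if the current index is $20$.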
234:
235: The fact that the time associated with an event to be added to the queue
236: is always greater than or equal to the time associated with the last event
237: removed allows us to use the array of linear lists in a circular
238: fashion.
239:
The requirement that there exists a constant, $\Delta t_{\rm max}$, such that
%
\begin{equation}
t - t_{\rm last} < \Delta t_{\rm max}
%\label{tmax}
\end{equation}
%
246: %
247: allows us to use a finite number of linear lists. The number of linear
248: lists required is proportional to $\Delta t_{\rm max}$. In practice we
249: find that we can always find a reasonable value of $\Delta t_{\rm max}$
such that Eq.~(\ref{tmax}) holds. If a rare event occurs which
violates this constraint, or if we want to use less memory for linear lists
at the cost of violating the constraint, the event is handled on an
exception basis as implemented in the {\it processOverflowList} function
in the code contained in the Appendix. Alternatively, the application which
255: calls the priority queue code can guarantee that such an event never
256: occurs by creating an earlier fictitious collision with a time which
257: does not violate the constraint.
258:
259: Thus all of the events, except for those deleted before they are placed
260: in the binary tree, will eventually be added to the binary tree, but at
261: any given time the tree, instead of containing O($N$) entries, will
262: contain only a relatively small number of entries. The number of events
263: maintained in the binary tree is only a fraction of the total number of
264: particles $N$ in the system and can be made independent of $N$.
265:
266: Our priority queue is similar to a {\it calendar queue} \cite{Brown};
however, the calendar queue does not employ a binary tree -- events are
268: sorted in each of the linear lists.
269:
270: \section{How to choose parameters}
271: \label{choose}
272:
273: Two parameters, $n$ the number of linear lists and $s$ the scale factor,
274: must be chosen to specify the implementation of the queue.
275: Operationally, they can be chosen as follows:
276:
277: (i) First, by instrumenting the queue to count the number of events in
278: the binary tree, determine a value of $s$ such that the number of events
in the binary tree is relatively small ($\approx 10$--$20$). Table I and
280: Fig.~\ref{mem}(a) summarize the values of $s$ we have used for our
281: simulations. The figure is consistent with a scale factor linear in $N$
282: with a different coefficient of linearity dependent on density. Because
283: we use a binary tree to store events with the soonest times, the
284: performance of the algorithm is somewhat insensitive to the choice of
285: $s$. For example, a choice of $s$ which results in a doubling of the
286: number of events in the binary tree results in only one additional level
287: in the tree.
288:
289: (ii) Instrument the queue to find $\Delta t_{\rm max}$, the maximum
290: difference between the time associated with an event to be added and the
291: time associated with the last event removed and set
292: %
\begin{equation}
n = s\,\Delta t_{\rm max}
\end{equation}
%
to ensure that Eq.~(\ref{tmax}) is met. Table I and Fig.~\ref{mem}(b)
298: summarize the number of linear lists $n$ we have used for our
299: simulations. As with $s$, $n$ is linear in $N$ with a different coefficient of
300: linearity dependent on density. We note that while memory requirements
301: are $O(N)$ as in the conventional implementation of priority queues, the
302: hybrid implementation does require significantly more memory than the
303: conventional implementation due to the memory required for the linear
lists. Tradeoffs can be made of CPU time for memory by increasing the
305: scale factor and/or reducing the number of linear lists (resulting in
306: more exception conditions).
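
As a worked example, and assuming that $n$ was set exactly by the relation
$n = s\,\Delta t_{\rm max}$, the $N=1000$ entries of Table I imply
%
\[
\Delta t_{\rm max} = \frac{n}{s} = \frac{50000}{50} = 1000
\qquad (\rho = 0.12)
\]
%
and
%
\[
\Delta t_{\rm max} = \frac{n}{s} = \frac{25000}{100} = 250
\qquad (\rho = 0.01),
\]
%
in the time units of the simulation. These values are inferred from the
tabulated $s$ and $n$; they are not separately measured quantities.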
307:
Figure \ref{pdist}(a) plots $\langle m_{\hat i} \rangle$, the average
number of events with index $\hat i$, versus $\hat i$ for various $N$.
310: Here
311: %
312: \begin{equation}
\hat i \equiv (i - i^* + n) \bmod n.
314: \end{equation}
315: %
316: That is, $\hat i$ is the distance of $i$ from the current index taking
317: into account the circular nature of the array of linear lists. The data
318: was obtained by sampling the queue many times at regular intervals.
With the choice of scale factors shown in Table I we achieve our goal
of having $\approx 10$--$20$ events with index $i=i^*$ and thus in the
binary tree. Note that to achieve this, the scale factor increases with
increasing $N$, resulting in the cutoff of the distributions also
increasing with increasing $N$. (In fact, if the $x$-axis is
rescaled as $x \to x/N$, the plots collapse as shown in
Fig.~\ref{pdist}(b), reflecting the fact that the probability
distribution of collision times is independent of $N$.) Thus, the
number of linear lists required to ensure that Eq.~(\ref{tmax}) holds
also increases with $N$. In Fig.~\ref{pBucket0}, we plot the
distribution $P(m^*)$, the probability that the number of events with
the current index $i^*$ is $m^*$, versus $m^*$. The distributions are
strongly peaked, indicating that the number of events in the binary tree
does not vary much from the average.
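
The circular distance $\hat i$ defined above can be computed as in the
following C sketch. The function name {\tt circularDistance} is
hypothetical; the sketch assumes $0 \le i < n$ and $0 \le i^* < n$, so
that the argument of the modulo operation is never negative.

\begin{verbatim}
/* Hypothetical helper: number of slots from
   currentIndex forward to i in a circular
   array of nLists lists.
   Assumes 0 <= i, currentIndex < nLists. */
int circularDistance(int i, int currentIndex,
                     int nLists)
{
  return (i - currentIndex + nLists) % nLists;
}
\end{verbatim}

For instance, with $n=50000$ and $i^*=49990$, the list with index $i=5$
lies at distance $\hat i = 15$ from the current index.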
333:
334: \section{Complexity Analysis}
335: \label{complexity}
336:
337: The basic operations involved in the queue are:
338:
339: (i) insertion into and deletion from the linear lists. Use of doubly
340: linked lists allows these operations to be implemented to take O(1) time.
341:
342: (ii) binary tree operations. We use the code of Ref.~\cite{Marin} to
343: implement the binary tree operations. When a leaf representing an item
344: in the priority queue is added to or deleted from the tree, the tree
345: must be traversed from the affected leaf possibly all the way to the
346: root node and adjustments made to reflect the presence or absence of the
347: affected leaf. Thus a bound on the number of levels which must be
348: traversed is $\log_2 m$ where $m$ is the number of items in the priority
queue. In Sec.~\ref{choose} we show that by choosing the scale
factor $s$ appropriately, $m$ can be made to be independent of $N$ (and
have a relatively small value, $\approx 10$--$20$). Thus binary tree
352: operations will be O(1).
353:
354: (iii) identification of the next non-empty linear list, after the
355: current linear list is exhausted. As explained in item (ii) of
Sec.~\ref{QueueOperation}, when the binary tree is empty, we search
357: forward through the array of linear lists until a non-empty list is
358: found. If the number of lists we must search through increases with
359: $N$, this process will not be O(1). We show below that with the proper
360: choice of $s$, the number of lists we must search does not grow with $N$
361: and in fact show that the next linear list after the current one almost
362: always is non-empty. Thus the complexity of identification of the next
363: non-empty list will be O(1).
364:
365: Thus the overall time taken by queue operations per collision is O(1).
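
To make the bound in item (ii) concrete, compare the depth of a tree
holding $m=20$ events with that of a conventional tree holding on the
order of $N$ events for the largest system in Table I, $N=512000$:
%
\[
\log_2 20 \approx 4.3, \qquad \log_2 512000 \approx 19.0 .
\]
%
This is an illustrative estimate of the number of levels traversed, not a
measured operation count.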
366:
\section{Experiments/Simulations}
368:
We run simulations using both a conventional priority queue and our new
hybrid approach. For simplicity, we simulate identical
hard spheres of radius one and unit mass. The sizes $L$ of the cubic
systems are set to maintain equal densities. The parameters of the
simulations are shown in Table I.
374:
375: To demonstrate the performance of our approach, we run simulations for
376: cubic systems at four volume densities $\rho=0.01, 0.12, 0.40$ and $0.70$.
377: The first density represents a rarefied gas and the last density
378: represents a jammed system. The jamming density for hard sphere systems
379: is $\approx 0.64$ \cite{Donev2004}. For both the conventional priority
380: queue and the hybrid queue we used the binary tree code from
381: Ref.~\cite{Marin}.
382:
383: Figure \ref{pcbt} shows the time taken for $10^7$ collisions for queue
384: operations with both a conventional priority queue and the hybrid queue.
385: As expected, the time for the conventional priority queue increases as
386: $\log N$ while the time for the hybrid queue is essentially constant.
387:
388: There is, however, a slight upward trend in the hybrid queue results.
To determine whether this trend is a feature of the algorithm or of the
benchmark environment, we proceed as follows.
391:
392: We first study the only two places in the hybrid code where looping is
393: involved:
394:
(i) in the {\it updateCBT} function of Ref.~\cite{Marin} we loop as we
traverse the binary tree. If we traverse more levels as $N$ grows, the
algorithm will not be O(1). To explore this possibility, we instrument
the function to count the number of levels we traverse in the
tree. The results are shown in Fig.~\ref{pcbt}. The number of loop
iterations is essentially constant, independent of $N$.
401:
402: (ii) in the {\it deleteFirstFromEventQ} function, after the
403: priority queue for the current linear list is exhausted, we loop until we
404: find a non-empty list. If the number of lists we must examine before we
405: find the first non-empty list grows with the system size, the algorithm
will not be O(1). We examine this possibility by counting the number of
times we encounter an empty list and find that the
probability of encountering an empty list does not grow with $N$ and
is very small: we
encounter an empty list in only a fraction $10^{-4}$ of the cases after
exhausting the priority queue.
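
As a rough estimate, suppose that each list examined is empty with the
same probability $p$, independently of the others (this independence is
an assumption made only for the estimate and is not something measured
here). The expected number of empty lists skipped before a non-empty one
is found is then
%
\[
\sum_{k \ge 0} k\, p^k (1-p) = \frac{p}{1-p} \approx 10^{-4}
\qquad {\rm for}\ p = 10^{-4},
\]
%
so under this assumption the search loop contributes a negligible cost
that does not depend on $N$.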
412:
413: Having ruled out dependence of the number of loop iterations on $N$ as
414: the source of the upward trend in the execution times, we now consider
415: whether the larger memory needed as $N$ increases is the cause of the
416: trend. All modern computer processors employ high speed cpu cache
417: memory to reduce the average time to access memory
418: \cite{Handy,Hennessy}. In fact, the processor we use in our
419: simulations, the AMD Opteron, employs a two-level memory cache (64 KB
420: level 1 cache, 1 MB level 2 cache) \cite{Opteron}. A similar cache
structure is used in the Intel Xeon processor \cite{Xeon}. Because
memory caches are of finite size, if memory access is random then the larger
the memory used by a program, the lower the probability that data will
be found in the cache, resulting in slower instruction execution. The
effect of cache on benchmark runtimes has been studied in
Ref.~\cite{Saavedra}. We study the effect of the finite size of the
427: cache in our system as follows: Instead of running the molecular
428: dynamics simulations, we run a small test program which randomly
429: accesses the data structures used by the molecular dynamics simulations.
430: For each value of $N$, the test program executes exactly the same number
431: of instructions but uses data structures of the size used by the
432: molecular dynamics simulations for that value of $N$. The results are
433: shown in Fig.~\ref{pcbt} and show an upward trend similar to that of the
434: simulation results for all of the densities studied.
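
The test program itself is not reproduced here. A minimal sketch of the
kind of probe described above is given below; it is not the program used
to produce Fig.~\ref{pcbt}. The function name {\tt randomAccessSum}, the
use of a plain byte array in place of the simulation data structures, and
the linear congruential generator are all assumptions made for
illustration.

\begin{verbatim}
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe: read nAccess
   pseudo-randomly chosen bytes of
   buf[0..len-1] and return their sum, so
   that the reads are not optimized away.
   Requires len > 0. The caller times the
   call, e.g. with clock(). */
uint64_t randomAccessSum(
    const unsigned char *buf, size_t len,
    uint64_t nAccess, uint64_t seed)
{
  uint64_t state = seed, sum = 0, k;

  for (k = 0; k < nAccess; k++) {
    /* 64-bit linear congruential step
       (constants from Knuth's MMIX) */
    state = state * 6364136223846793005ULL
            + 1442695040888963407ULL;
    sum += buf[(size_t)((state >> 16) % len)];
  }
  return sum;
}
\end{verbatim}

Calling the function with the same {\tt nAccess} for every $N$, while
setting {\tt len} to the memory footprint of the corresponding
simulation, keeps the instruction count fixed and varies only the size of
the working set.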
435:
436: The above results thus suggest that the complexity of the hybrid
437: algorithm is, in fact, O(1) and that the upward trend in the results is
438: due to the finite size of the high speed memory cache.
439:
440: \section{Discussion and Summary}
441:
442: We have defined a new abstract data type, the Bounded Increasing
443: Priority Queue (BIPQ) having the same operations as a conventional
444: priority queue but which takes advantage of the fact that the value
445: associated with an item to be added to the queue has the properties that:
446: (i) the value is greater than or equal to the value associated with the
447: last item removed from the top of the queue and (ii) the value minus the
448: value of the last item removed from the top of the queue is bounded.
These properties are obeyed for events in event driven molecular dynamics
450: simulations. We implement a BIPQ using a hybrid approach incorporating
451: a conventional priority queue (implemented with a binary tree) and a
452: bounded priority queue. All operations on the BIPQ take an average O(1)
453: time per collision. This type of queue should provide performance
454: speedups for molecular dynamics simulations in which the event queue is
455: the bottleneck.
456:
457:
458: \section{Acknowledgments}
459:
460: We thank Sergey Buldyrev, Pradeep Kumar, Sameet Sreenivasan, and Brigita
Urbanc for helpful discussions. We thank ONR, NSF and NIH for support.
462:
463: \newpage
464: \appendix
465: \begin{center}
466: \bf {APPENDIX}
467: \end{center}
468:
469: The following code implements the hybrid queue proposed here. The calls
470: to Insert and Delete are to the functions contained in
471: Ref.~\cite{Marin}, which update NP and the complete binary tree, CBT.
472: Any code providing the same functions could be substituted for Insert
473: and Delete.
474:
475: \begin{singlespace}
476: \begin{verbatim}
477:
478: #define nlists 50000
479: #define scale 50
480:
481: typedef struct
482: {
483: int next;
484: int previous;
485: int p;
486: int q;
487: int c;
488: double t;
489: unsigned int qMarker;
490: int qIndex;
491: statusType status;
492: }eventQEntry;
493:
494: eventQEntry * eventQEntries;
495: double baseIndex;
496:
497: int * CBT; /* complete binary tree
498: implemented in an array of
499: 2*N integers */
500: int NP; /*current number of particles*/
501:
502: int linearLists[nlists+1];/*+1 for overflow*/
503: int currentIndex;
504:
505:
506: //----------------------------------------
507: int insertInEventQ(int p)
508: {
509: int i,oldFirst;
510: eventQEntry * pt;
511: pt=eventQEntries+p; /* use pth entry */
512:
513: i=(int)(scale*pt->t-baseIndex);
514: if(i>(nlists-1)) /* account for wrap */
515: {
516: i-=nlists;
517: if(i>=currentIndex-1)
518: {
519: i=nlists; /* store in overflow list */
520: }
521: }
522: pt->qIndex=i;
523:
524: if(i==currentIndex)
525: {
526: Insert(p); /* insert in PQ */
527: }
528: else
529: {
530: /* insert in linked list */
531:
532: oldFirst=linearLists[i];
533: pt->previous=-1;
534: pt->next=oldFirst;
535: linearLists[i]=p;
536:
537: if(oldFirst!=-1)
538: eventQEntries[oldFirst].previous=p;
539: }
540: return p;
541: }
542:
543: //----------------------------------------
544:
void processOverflowList(void)
546: {
547: int i,e,eNext;
548: i=nlists; /* overflow list */
549: e=linearLists[i];
550: linearLists[i]=-1; /* mark empty; we will
551: treat all entries and may re-add some */
552: while(e!=-1)
553: {
554: eNext=eventQEntries[e].next; /* save next */
555: insertInEventQ(e); /* try add to regular list now */
556: e=eNext;
557: }
558: }
559:
560: //---------------------------------------
561:
562: void deleteFromEventQ(int e)
563: {
564: int prev,next,i;
565: eventQEntry * pt=eventQEntries+e;
566:
567: i=pt->qIndex;
568: if(i==currentIndex)
569: Delete(e); /* delete from pq */
570: else
571: {
572: /* remove from linked list */
573:
574: prev=pt->previous;
575: next=pt->next;
576: if(prev==-1)
577: linearLists[i]=pt->next;
578: else
579: eventQEntries[prev].next=next;
580:
581: if(next!=-1)
582: eventQEntries[next].previous=prev;
583: }
584: }
585:
586: //---------------------------------------
587:
588: int deleteFirstFromEventQ()
589: {
590: int e;
591:
592: while(NP==0)/*if priority queue exhausted*/
593: {
594: /* change current index */
595:
596: currentIndex++;
597: if(currentIndex==nlists)
598: {
599: currentIndex=0;
600: baseIndex+=nlists;
601: processOverflowList();
602: }
603:
604: /* populate pq */
605:
606: e=linearLists[currentIndex];
607: while(e!=-1)
608: {
609: Insert(e);
610: e=eventQEntries[e].next;
611: }
612: linearLists[currentIndex]=-1;
613: }
614:
615: e=CBT[1]; /* root contains shortest
616: time entry */
617:
618: Delete(CBT[1]);
619: return e;
620: }
621: //---------------------------------------
622:
623:
624: \end{verbatim}
625: \end{singlespace}
626:
627: \newpage
628:
629: \begin{thebibliography}{99}
630:
631: \bibitem{Alder}B.~J.~Alder and T.~E.~Wainwright, Studies in molecular
632: dynamics. I. General method, J. Chem. Phys. 31 (1959) 459.
633:
\bibitem{Allen87} M. P. Allen and D. J. Tildesley, Computer Simulation of
Liquids, Oxford Science Publications, (1987).
636:
637: \bibitem{Allen89} M. P. Allen, D. Frenkel, J.Talbot, Molecular dynamics
638: simulation using hard particles, Comput. Phys. Rep. 9 (1989) 301.
639:
640: \bibitem{Opteron} AMD Opteron Product Data Sheet. Publication 23932. (2004)
641:
642: \bibitem{Becker} O. M. Becker, A. D. Mackerell, B. Roux, B. M. Becker(Eds.),
643: Computational Biochemistry and Biophysics, Marcel Dekker (2001).
644:
645: \bibitem{Bentley} J. Bentley, Programming pearls: Thanks, heaps.
646: Commun. ACM (1985) 245-250.
647:
648: \bibitem{Blackstone} J. H. Blackstone, G. L. Hogg and D. T. Phillips, A
649: two-list synchronization procedure for discrete event simulation,
650: Comm. ACM 24, (1981) 825-829.
651:
652: \bibitem{Brown78} M. R. Brown, The analysis of a practical and nearly
653: optimal priority queue algorithms, SIAM J. Comput. 7 (1978) 298-319.
654:
655: \bibitem{Brown} R. Brown, Calendar queues. Commun. ACM 31, 10
656: (1988), 1220-1227.
657:
658: \bibitem{Daggett} V. Daggett and A. R. Fersht, Transition states in
659: protein folding, in Mechanisms of Protein Folding, R. H. Pain (Ed.),
660: Oxford University Press, (2000).
661:
662: \bibitem{Donev2004} A. Donev, S. Torquato, F. H. Stillinger, and
663: R. Connelly, Jamming in hard sphere and disk packings, J. Appl. Phys. 95
664: (2004) 989-999.
665:
666: \bibitem{Donev1} A.~Donev, S.~Torquato, and F.~H.~Stillinger. Neighbor
667: list collision-driven molecular dynamics simulation for nonspherical
668: hard particles. I. Algorithmic details, J. Comput. Phys. 202(2005)
669: 737-764.
670:
671: \bibitem{Erpenbeck} J. J. Erpenbeck and W. W. Wood, in Statistical mechanics
672: B: Modern theoretical chemistry,B.J.Berne (Ed.), Molecular Dynamics
673: Techniques for Hard Core Systems, vol.6, Institute of Physics
674: Publishing, London, (1977) 1-40.
675:
676: \bibitem{Francon} J. Francon, G. Viennot, and J. Vuillemin, Description
677: and analysis of an efficient priority queue representation. In
Proceedings of the 19th Annual Symposium on Foundations of Computer
679: Science. IEEE (1978) 1-7.
680:
681: \bibitem{Fredman} M. L. Fredman, R. Sedgewick, D. Sleator and
682: R. Tarjan, The pairing heap: A new form of self-adjusting
683: heap. Algorithmica 1 (1986) 111-129.
684:
685: \bibitem{Frenkel} D. Frenkel and B. Smit, Understanding Molecular Simulation,
686: Academic Press, New York, (2002).
687:
688: \bibitem{Handy} H. Handy, Cache Memory Book, Academic Press (1998).
689:
690: \bibitem{Hendriksen77} J. O. Henriksen, An improved events list
691: algorithm. In Proceedings of the 1977 Winter Simulation Conference, IEEE
692: (1977) 547-557.
693:
694: \bibitem{Hendriksen83} J. O. Henriksen, Event list management - A
tutorial. In Proceedings of the 1983 Winter Simulation Conference, IEEE,
696: (1983) 543-551.
697:
698: \bibitem{Hennessy} J. L. Hennessy and D. A. Patterson, Computer
699: Architecture: A Quantitative Approach, Elsevier (2002).
700:
\bibitem{Xeon} Intel Xeon Processor with 800 MHz System Bus
702: Datasheet. Document Number 302355-001 (2004).
703:
704: \bibitem{Jones} D. W. Jones, An empirical comparison of priority-queue
705: and event-set implementations. Comm. ACM 29 (1986) 300-311.
706:
707: \bibitem{Kingston} J. H. Kingston, The amortized complexity of
Henriksen's algorithm. Comput. Sci. Tech. Rep. 85-06, Dept. of Computer
709: Science, University of Iowa (1985).
710:
711: \bibitem{Knuth} D. E. Knuth, The Art of Computer Programming Vol. 3,
712: Sorting and Searching, Addison-Wesley (1973).
713:
714: \bibitem{Krantz} A. T. Krantz, Analysis of an efficient algorithm for the
715: hard-sphere problem, TOMACS 6 (1996) 185-229.
716:
717: \bibitem{Lubachevsky}B. D. Lubachevsky, How to simulate billiards and similar
718: systems, J. Comput. Phys. 94 (1991) 255.
719:
720: \bibitem{Marin93} M. Mar\'in, D. Risso, and P. Cordero, Efficient
721: algorithms for many-body hard particle molecular dynamics,
722: J. Comput. Phys. 109 (1993) 306-317.
723:
724: \bibitem{Marin} M. Mar\'in and P. Cordero, An empirical assessment of
725: priority queues in event-driven molecular dynamics
726: simulation. Comput. Phys. Comm. 92 (1995) 214-224.
727:
\bibitem{Nix} R. Nix, An evaluation of pagodas. Res. Rep. 164, Dept. of
729: Computer Science, Yale Univ.
730:
731: \bibitem{Rapaport80} D. C. Rapaport, The event scheduling problem in
732: molecular dynamics simulation, J. Comput. Phys. 34 (1980)184-201.
733:
734: \bibitem{Rapaport} D. C. Rapaport, Art of Molecular Dynamics Simulation,
735: Cambridge University Press, (2004).
736:
\bibitem{Sadus} R. J. Sadus, Molecular Simulation of Fluids, Elsevier
738: (1999).
739:
740: \bibitem{Saavedra} R. H. Saavedra and A. J. Smith, Measuring cache and TLB
741: performance and their effects on benchmark runtimes, IEEE Trans. Comp. 44
742: (1995)1223-1235.
743:
\bibitem{Shida} K. Shida and Y. Anzai, Reduction of the event-list for
molecular dynamic simulation, Comput. Phys. Commun. 69 (1992) 317-329.
746:
\bibitem{Sleator83} D. D. Sleator and R. E. Tarjan, Self-adjusting binary
trees. In Proceedings of the ACM SIGACT Symposium on Theory of Computing
(1983) 235-245.
750:
751: \bibitem{Sleator86} D. D. Sleator and R. E. Tarjan, Self-adjusting heaps,
752: SIAM J. Comput. 15 (1986) 52-69.
753:
\bibitem{Tarjan} R. E. Tarjan and D. D. Sleator, Self-adjusting binary
search trees, JACM 32 (1985) 652-686.
756:
757: \bibitem{Urbanc} B. Urbanc, J. M. Borreguero, L. Cruz, and
758: H. E. Stanley, Ab initio discrete molecular dynamics approach to protein
759: folding and aggregation, Methods in Enzymology (2006) (in press).
760:
\bibitem{Vuillemin} J. A. Vuillemin, A data structure for manipulating
priority queues. Commun. ACM 21 (1978) 309-315.
763:
\bibitem{Williams} J. W. J. Williams, Algorithm 232:
Heapsort. Commun. ACM 7 (1964) 347-348.
766:
767: \end{thebibliography}
768:
769:
770:
771: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
772: %\begin{widetext}
773: \begin{table}[htb]
774: \caption{Parameters of molecular dynamics simulations.}
775: ~
776: \begin{tabular}{ccc}
777:
778: {Number} & {Scale Factor}& {Number} \\
779: {of Particles} & & {of lists} \\
780: ~
781: $N$ & $s$ & $n$ \\ \hline
\multicolumn{3}{c}{$\rho=0.01$} \\ \hline
783: $1000$ & $100$ & $25000$ \\
784: $8000$ & $700$ & $200000$ \\
785: $64000$ & $5000$ & $2.5 \times 10^6$ \\
786: $512000$ & $45000$ & $25 \times 10^6$ \\ \hline
\multicolumn{3}{c}{$\rho=0.12$} \\ \hline
788: $1000$ & $50$ & $50000$ \\
789: $8000$ & $500$ & $400000$ \\
790: $64000$ & $3400$ & $5 \times 10^6$ \\
791: $512000$ & $25000$ & $50 \times 10^6$ \\ \hline
\multicolumn{3}{c}{$\rho=0.4$} \\ \hline
793: $1372$ & $1000$ & $250000$ \\
$8788$ & $7500$ & $2 \times 10^6$ \\
795: $70304$ & $60000$ & $16 \times 10^6$ \\
796: $530604$ & $500000$ & $130 \times 10^6$ \\ \hline
\multicolumn{3}{c}{$\rho=0.7$} \\ \hline
798: $1372$ & $15000$ & $500000$ \\
799: $8788$ & $75000$ & $200000$ \\
800: $70304$ & $500000$ & $35 \times 10^6$ \\
801: $530604$ & $4 \times 10^6$ & $300 \times 10^6$ \\ \hline
802: \end{tabular}
803: \end{table}
804: %\end{widetext}
805:
806: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
807:
808:
809: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
810: \begin{figure}
811: \centerline{
812: \xsize
813: \epsfclipon
814: \epsfbox{pdista.ps}
815: }
816:
817: \centerline{
818: \xsize
819: \epsfclipon
820: \epsfbox{pdistb.ps}
821: }
822:
823:
\caption{(a) For $\rho=0.12$, the average number of events $\langle m_{\hat i} \rangle$
with index $\hat i$ versus $\hat i$ (the distance of $i$ from the
current index $i^*$) for (from left to right) $N=1000$, 8000, and 64000.
(b) Same as (a) with the x-axis scaled by $1/N$, which results in a
collapse of the plots.}
829: \label{pdist}
830: \end{figure}
831: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
832:
833: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
834: \begin{figure}
835: \centerline{
836: \xsize
837: \epsfclipon
838: \epsfbox{pBucket0.ps}
839: }
\caption{For $\rho=0.12$, $P(m^*)$, the probability that the number of events in linear
list $i=0$ is $m^*$, vs. $m^*$ for $N=1000$ (squares), 8000 (triangles),
and 64000 (disks).}
843: \label{pBucket0}
844: \end{figure}
845:
846: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
847: \begin{figure}
848: \centerline{
849: \xsize
850: \epsfclipon
851: \epsfbox{pms.ps}
852: }
853: \centerline{
854: \xsize
855: \epsfclipon
856: \epsfbox{pmn.ps}
857: }
858:
\caption{(a) Scale factor, $s$, vs. $N$ for (from bottom to top) $\rho=0.12$,
$0.01$, $0.4$, and $0.7$. (b) Number of linear lists, $n$, vs. $N$ for (from bottom to top) $\rho=0.01$,
$0.12$, $0.4$, and $0.7$.}
862: \label{mem}
863: \end{figure}
864:
865:
866:
867:
868: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
869: \begin{widetext}
870:
871: \begin{figure}
872: \centerline{
873: \xsize
874: \epsfclipon
875: \epsfbox{pa.ps}
876: \xsize
877: \epsfclipon
878: \epsfbox{pb.ps}
879: }
880:
881: \centerline{
882: \xsize
883: \epsfclipon
884: \epsfbox{pc.ps}
885: \xsize
886: \epsfclipon
887: \epsfbox{pd.ps}
888: }
889:
\caption{Processing time for queue operations vs. $N$, the number of
particles in the system. (a) Volume density $\rho=0.01$. The upper
solid line is the processing time for queue operations for a normal
priority queue; the lower solid line is for the hybrid queuing system
introduced here. The dashed line represents the benchmark test timing
to execute a fixed number of instructions independent of $N$ but with
memory sizes corresponding to the memory used for the hybrid system.
The dotted line represents the number of tree levels traversed ($\times
10^{-7}$) in the binary tree for the hybrid system. (b), (c), and (d)
Same as (a) for $\rho=0.12$, $0.4$, and $0.7$, respectively.}
900: \label{pcbt}
901: \end{figure}
902:
903:
904: \end{widetext}
905: }
906:
907: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
908:
909: \end{document}
910:
911: