\section{Performance}
\label{sec:perf}

The computing-time overhead introduced by an event indexing system
such as the event directory system or the tag database must be
kept small compared with the time needed to read and analyse events
from the sequential event store. Both the event directory system and
the tag database were designed with this consideration in mind, and their
performance is monitored regularly.

Table~\ref{tab:evdirtime} shows measurements of the CPU time overhead
of the event directory system. These times were measured on a
Silicon Graphics Challenge XL computer~\cite{SGI} with R10000
processors running at 194~MHz.
\begin{table*}[htbp]
  \begin{center}
    \begin{tabular}{|c|c|}\hline
      Selection & CPU time \\
      \hline
      Sequential Read                        & 200~s \\
      Event Directory, No Selection          & 190~s \\
      Event Directory, Selection 1 out of 2  & 190~s \\
      Event Directory, Selection 1 out of 20 & 260~s \\
      \hline
    \end{tabular}
  \end{center}
\caption{Computing time used for reading 10,000 events from disk with the ZEUS
  Event Directory system. The times are CPU times measured on a Silicon
  Graphics Challenge XL computer with R10000 processors running at a
  clock rate of 194~MHz. Events were read, but not analysed.}
\label{tab:evdirtime}
\end{table*}
A sample of 10,000 events was read in four different ways.
In the first measurement (labelled ``Sequential Read'' in
Table~\ref{tab:evdirtime}), all events were read
sequentially from the event store without using the event
directory system. About $20~{\rm ms}$ of CPU time was required to read
one event. In the second case (``Event Directory, No
Selection''), events were accessed using the event directory system
without applying any selection. The time required was slightly less
than for the sequential read.
This is because the sequential event store contains, in addition to
the event data, test and calibration information, which was read and
ignored in the first measurement, whereas the event
directory system skips unused non-event data without ever reading it.
In the third measurement (``Event Directory, Selection 1 out of 2''),
a selection was applied which picked approximately one out of every
two events. No significant overhead was observed in this case. Only when a
stronger selection is applied does the event directory overhead become
appreciable. This can
be seen in the fourth measurement (``Event Directory, Selection 1
out of 20''), where a selection was applied which picked
approximately one out of every twenty events. The overhead here compared
with the previous measurement was 70~s.
Since 200,000 events had to be scanned to select 10,000 events, this
corresponds to a CPU time of about $0.3~{\rm ms}$ per scanned event.

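As a rough cross-check of this figure, the per-event overhead can be
recomputed from the entries of Table~\ref{tab:evdirtime} alone. The
symbol $t_{\rm scan}$ is introduced here purely for illustration, and
the estimate assumes that the whole difference between the third and
fourth measurements is due to scanning:
\[
  t_{\rm scan} \approx \frac{260~{\rm s} - 190~{\rm s}}{200{,}000}
               = 0.35~{\rm ms} .
\]
This agrees with the quoted value within the precision of the tabulated
times, and it is less than $2\%$ of the roughly $20~{\rm ms}$ needed to
read one event.
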
The performance of the tag database is illustrated in
Figures~\ref{fig:readrate}(a) and (b).
\begin{figure}[htbp]
  \begin{center}
    \epsfig{file=pix/readrate1.eps,height=11cm,angle=90}
    \epsfig{file=pix/readrate2.eps,height=11cm,angle=90}
    \caption{The rate of events processed using the ZEUS tag database
      (a) as a function of the number of variables used inside the
      query and (b) as a function of the total number of events stored
      in the tag database. The measurements were made with a prototype
      implementation of the tag database using Objectivity/DB version
      3.8 on a Silicon Graphics Challenge XL machine with R4400
      processors running at a clock rate of 150~MHz.}
    \label{fig:readrate}
  \end{center}
\end{figure}
Figure~\ref{fig:readrate}(a) shows the rate of events processed by the
tag database as a function of the number of variables used in the
query. At most six variables were used in the measurement,
since this is a typical number of variables used in data analyses.
For an empty query a rate of 5000 events per second is reached. For
non-empty queries the rate decreases only weakly with the
number of variables in the query, remaining above 3500 events per
second, i.e.\ below about $0.29~{\rm ms}$ per event. This shows that the
performance of the tag database is similar to that of the event
directory system, even though much more information is stored in the
database.  Figure~\ref{fig:readrate}(b) shows the rate of events
processed as a function of the total number of events contained in the
tag database.  In this case a query involving two variables was used.
No dependence of the rate on the size of the database is observed.
These two observations confirm that the CPU time overhead of the tag
database is even smaller than that of the event directory method.
The tag database thus exceeds the required performance.

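The conversion between the event rates read off
Figure~\ref{fig:readrate} and the time spent per event is simply a
reciprocal. Writing $R$ for the rate and $t$ for the time per event
(both symbols are introduced here for illustration only), the two
rates quoted in the text give
\[
  t = \frac{1}{R}: \qquad
  \frac{1}{5000~{\rm s}^{-1}} = 0.20~{\rm ms}, \qquad
  \frac{1}{3500~{\rm s}^{-1}} \approx 0.29~{\rm ms} .
\]
These values are of the same order as the per-event scanning overhead
found for the event directory system.
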
The gain in analysis efficiency achieved using either event
directories or the tag database is illustrated in
Table~\ref{tab:perfgain}.
\begin{table*}[htbp]
  \begin{center}
    \begin{tabular}{|c|c|c|c|c|}\hline
      System & Selection & Events scanned & Events selected & CPU time \\
      \hline
      Sequential Data & No Selection              & 25000 & 25000 &  485~s \\
      \hline
      Event Directory & One electron found        & 25000 & 11000 &  203~s \\
      Tag Database    & One electron found        & 25000 & 11000 &  197~s \\
      \hline
      Event Directory & $E_T > 30~{\rm GeV}$      & 45000 &  2750 &  793~s \\
      Tag Database    & $E_T > 30~{\rm GeV}$      & 45000 &  2750 &  105~s \\
      \hline
    \end{tabular}
  \end{center}
\caption{Comparison of the computing time required for different
  selections using event directories and the tag database. The times are
  CPU times measured on a Silicon Graphics Challenge XL computer with
  R10000 processors running at a clock rate of 194~MHz. Events were
  read but not analysed.}
\label{tab:perfgain}
\end{table*}
The first row shows the CPU time required to read 25,000 events
sequentially; about $20~{\rm ms}$ is needed to read one event from the
sequential event store. The second and third rows show the times
required to select the 11,000 events that contain at least one electron
candidate from a total sample of 25,000 events, using either event
directories or the tag database. The CPU requirement is almost the same
in both cases and again corresponds to about $20~{\rm ms}$ per selected
event. The fourth and fifth rows give the time required to select the
2,750 events with transverse energy greater than $30~{\rm GeV}$ from a
total of 45,000 events. With event directories (fourth row), the total
time corresponds to the time required to read all 45,000 events from
the sequential event store, since the event directories have no
precalculated flags for the query $E_T > 30~{\rm GeV}$. With the tag
database (fifth row), the selection is almost an order of magnitude
faster. This is possible because the tag database stores the value of
the transverse energy for each event, so that the time used is governed
by the time needed to read only the selected events. This
illustrates the power of the tag database.
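
The size of this gain can be checked directly from the entries of
Table~\ref{tab:perfgain}. As an illustrative calculation only, the
ratio of the two CPU times for the $E_T > 30~{\rm GeV}$ selection and
the time per scanned event in the event directory case are
\[
  \frac{793~{\rm s}}{105~{\rm s}} \approx 7.6 ,
  \qquad
  \frac{793~{\rm s}}{45{,}000} \approx 17.6~{\rm ms} .
\]
The first number is the speed-up of the tag database over event
directories for this query; the second is close to the roughly
$20~{\rm ms}$ needed to read one event, as expected if every one of the
45,000 events has to be read.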