0104:cs0104008/perf.tex

1:

2: \section{Performance}

3: \label{sec:perf}

4:

5: The overhead in computing time generated by an event indexing system

6: such as either the event directory system or the tag database must be

7: kept small in comparison to the time needed to read and analyse events

8: from the sequential event store. Both the event directory system and

9: the tag database were designed with this consideration in mind and their

10: performance is

11: monitored regularly.

12:

13: Table~\ref{tab:evdirtime} shows measurements of the CPU time overhead

14: from the event directory system. These times were measured on a

15: Silicon Graphics Challenge XL computer~\cite{SGI} with R10000

16: processors running at 194~MHz.

17: \begin{table*}[htbp]

18:   \begin{center}

19:     \begin{tabular}{|c|c|}\hline

20:       Selection & Time \\

21:       \hline

22:       Sequential Read                        & 200 s \\

23:       Event Directory, No Selection          & 190 s \\

24:       Event Directory, Selection 1 out of 2  & 190 s \\

25:       Event Directory, Selection 1 out of 20 & 260 s \\

26:       \hline

27:     \end{tabular}

28:   \end{center}

29: \caption{Computing time used for reading 10000 events from disk with the ZEUS

30:   Event Directory system. The times are CPU time measured on a Silicon

31:   Graphics Challenge XL computer with R10000 processors running at a

32:   clock rate of 194 MHz. Events were read, but not analysed.}

33: \label{tab:evdirtime}

34: \end{table*}

35: A total of 10,000 events were read in four different ways.

36:

37: In the first measurement (labelled ``Sequential Read'' in

38: Table~\ref{tab:evdirtime}), all events were read

39: sequentially from the event store without using the event

40: directory system. About $20~{\rm ms}$ of CPU time was required to read

41: one event. In the second case (``Event Directory, No

42: Selection''), events were accessed using the event directory system

43: without applying any selection. The time required was slightly smaller

44: than when the data were accessed without using the event directories.

45: This is due to the fact that in addition to the event data the sequential

46: event store contains test and calibration information which in the first

47: measurement was read and ignored. The event

48: directory system skips unused non-event data without ever reading it.

49: In the third measurement (``Event Directory, Selection 1 out of 2''),

50: a selection was applied which picked approximately one out of every

51: two events. No significant overhead was observed in this case. Only when a

52: stronger selection is applied does the event directory overhead become

53: considerable. This can

54: be seen in the fourth measurement (``Event Directory, Selection 1

55: out of 20''), where a selection was applied which selected

56: approximately one out of every twenty events. The overhead here compared

57: to the previous measurement was 70 seconds.

58: Since 200,000 events had to be scanned to select 10,000 events, this

59: corresponds to a CPU time of about $0.35~{\rm ms}$ per scanned event.
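
The per-event figure can be checked with a short calculation. As a rough
estimate, assuming that the time spent reading the 10,000 selected events is
the same as in the third measurement, the scanning overhead per event is
% Illustrative check using the values of Table~\ref{tab:evdirtime}.
% Assumption: the symbol t_scan is introduced here only for this estimate.
\[
  t_{\rm scan} \approx \frac{260~{\rm s} - 190~{\rm s}}{200\,000}
               = \frac{70~{\rm s}}{200\,000}
               = 0.35~{\rm ms},
\]
i.e. a few tenths of a millisecond, which is small compared with the roughly
$20~{\rm ms}$ of CPU time needed to read one event.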

60:

61: The performance of the tag database is illustrated in

62: Figures~\ref{fig:readrate}(a) and (b).

63: \begin{figure}[htbp]

64:   \begin{center}

65:     \epsfig{file=pix/readrate1.eps,height=11cm,angle=90}

66:     \epsfig{file=pix/readrate2.eps,height=11cm,angle=90}

67:     \caption{The rate of events processed using the ZEUS tag database

68:       (a) as a function of the number of variables used inside the

69:       query and (b) as a function of the total number of events stored

70:       in the tag database. The measurements were done with a prototype

71:       implementation of the tag database using Objectivity/DB version

72:       3.8 on a Silicon Graphics Challenge XL machine with R4400

73:       processors running at a clock rate of 150 MHz.}

74:     \label{fig:readrate}

75:   \end{center}

76: \end{figure}

77: Figure~\ref{fig:readrate}(a) shows the rate of events processed by the

78: tag database as a function of the number of variables used in the

79: query. A maximum of six variables was used for the measurement

80: since this is a typical number of variables used for data analyses.

81: For an empty query a rate of 5000 events per second is reached. For

82: non-empty queries the rate decreases only weakly with the increasing

83: number of variables in the query, remaining above 3500 events per

84: second, i.e. $0.28~{\rm ms}$ per event. This shows that the

85: performance of the tag database is similar to that of the event

86: directory system, even though much more information is stored in the

87: database.  Figure~\ref{fig:readrate}(b) shows the rate of events

88: processed as a function of the total number of events contained in the

89: tag database.  In this case a query involving two variables was used.

90: No dependence of the rate on the size of the database is observed.

91: These two observations confirm that the tag database has a CPU time

92: overhead comparable to that of the event directory method.

93: Thus the tag database meets the required performance.
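
The conversion between the event rate and the CPU time per event is simply
the reciprocal. As an illustrative check (the symbols $t_{\rm query}$ and
$t_{\rm empty}$ are introduced here only for this purpose),
% Illustrative check of the rates quoted in the text for Figure (a).
% Assumption: t_query and t_empty are hypothetical labels, not from the original.
\[
  t_{\rm query} = \frac{1}{3500~{\rm s}^{-1}} \approx 0.286~{\rm ms}, \qquad
  t_{\rm empty} = \frac{1}{5000~{\rm s}^{-1}} = 0.2~{\rm ms}.
\]
Both values are far below the roughly $20~{\rm ms}$ needed to read one event
from the sequential event store.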

94:

95: The gain in analysis efficiency achieved using either event

96: directories or the tag database is illustrated in

97: Table~\ref{tab:perfgain}.

98: \begin{table*}[htbp]

99:   \begin{center}

100:     \begin{tabular}{|c|c|c|c|c|}\hline

101:       System & Selection & Events scanned & Events selected & Time \\

102:       \hline

103:       Sequential Data & No Selection              & 25000 & 25000 &  485 s \\

104:       \hline

105:       Event Directory & One electron found        & 25000 & 11000 &  203 s \\

106:       Tag Database    & One electron found        & 25000 & 11000 &  197 s \\

107:       \hline

108:       Event Directory & $E_T > 30~{\rm GeV}$      & 45000 &  2750 &  793 s \\

109:       Tag Database    & $E_T > 30~{\rm GeV}$      & 45000 &  2750 &  105 s \\

110:       \hline

111:     \end{tabular}

112:   \end{center}

113: \caption{Comparison of computing time required for different

114:   selections for event directories and tag database. The times are

115:   CPU time measured on a Silicon Graphics Challenge XL computer with

116:   R10000 processors running at a clock rate of 194~MHz. Events were

117:   read but not analysed.}

118: \label{tab:perfgain}

119: \end{table*}

120: The first row shows the CPU time required to read 25,000 events. About

121: $20~{\rm ms}$ is needed to read one event from the sequential event

122: store. The second and third rows show the times required to select the

123: 11,000 events that contain at least one electron candidate from a total

124: sample of 25,000 events using either event directories or the tag

125: database. The CPU requirement is almost the same in both cases. It

126: corresponds again to about $20~{\rm ms}$ per selected event. The fourth and

127: fifth rows give the time required to select the 2,750 events with

128: transverse energy greater than $30~{\rm GeV}$ from a total of 45,000

129: events. The result from using event directories is shown in the fourth

130: row. In this case the total time corresponds to the time required to

131: read all 45,000 events from the sequential event store since the event

132: directories have no precalculated flags for the query $E_T > 30~{\rm

133:   GeV}$. The fifth row shows the result when using the tag database.

134: In this case the tag database is more than seven times faster (105~s compared with 793~s).

135: This is possible since the tag database stores the value of the

136: transverse energy for each event.  Hence the time used is governed by

137: the time needed to read in the selected events only. This

138: illustrates the power of the tag database.
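
The per-event times quoted above can be checked against
Table~\ref{tab:perfgain}. As an illustrative calculation, dividing the
measured CPU time by the number of events actually read gives
% Illustrative check; all numbers are taken from Table~\ref{tab:perfgain}.
\[
  \frac{485~{\rm s}}{25\,000} \approx 19.4~{\rm ms}, \qquad
  \frac{203~{\rm s}}{11\,000} \approx 18.5~{\rm ms}, \qquad
  \frac{793~{\rm s}}{45\,000} \approx 17.6~{\rm ms}
\]
for the sequential read, the electron selection with event directories and
the $E_T$ selection with event directories, respectively. All three values
are close to $20~{\rm ms}$ per event read. For the $E_T$ selection the ratio
of the two measured times is
\[
  \frac{793~{\rm s}}{105~{\rm s}} \approx 7.6 .
\]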

139: