\section{The Tag Database}
\label{sec:tagdb}

While event directories work very efficiently, their selection power is
limited to boolean combinations of precalculated event flags. Furthermore,
only those selection conditions can be tested that were considered when
the event flags were calculated. Moreover, if the quantities on which a
selection condition was based have changed (e.g.\ due to recalibration),
the selection may be invalid. This happens quite frequently, since
analysis methods and detector understanding are evolving quickly. The
event flags must be recalculated after every change in the selection
procedure. For large data samples this amounts to a major effort that can
be afforded only a few times per year.

The number of recalculations can be reduced if more information is
stored with each event in the event directory. This information would then
be updated when, for instance, the calibration has changed. As more
information is stored, the event selection becomes more flexible. For
instance, rather than setting a flag when a vertex is found, the
reconstructed position of the vertex can be stored. In this case, with
appropriate database technology, one can select not only those events
that have a vertex but also those with a vertex within certain bounds.
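
To make the difference concrete, the following minimal Python sketch
contrasts a flag-based selection with a selection on a stored vertex
position. The record layout, the field names (\texttt{has\_vertex},
\texttt{vtx\_z}), the units, and all numerical values are illustrative
assumptions and do not reflect the actual schema of the tag database.

\begin{verbatim}
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tag:
    """Hypothetical per-event tag record (illustrative only)."""
    run: int
    event: int
    vtx_z: Optional[float]  # assumed vertex z position; None if no vertex

    @property
    def has_vertex(self) -> bool:
        # The boolean information an event directory flag would carry.
        return self.vtx_z is not None


def select_flag(tags):
    """Event-directory style: only 'a vertex was found' can be tested."""
    return [t for t in tags if t.has_vertex]


def select_window(tags, z_min, z_max):
    """Tag-database style: cut on the stored vertex position itself."""
    return [t for t in tags
            if t.vtx_z is not None and z_min <= t.vtx_z <= z_max]


# Made-up sample of four events.
tags = [
    Tag(1, 1, 3.2),
    Tag(1, 2, None),
    Tag(1, 3, -61.5),
    Tag(2, 1, 48.0),
]
with_vertex = select_flag(tags)            # three events have a vertex
central = select_window(tags, -50.0, 50.0) # two lie inside the window
\end{verbatim}

In this sketch, changing the bounds only requires a new call to
\texttt{select\_window}, whereas a precalculated flag would have to be
recomputed for the whole sample.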

Such a system is known as a {\it tag database} in order to stress its
database character and its indexing capabilities and thus distinguish it
from the simpler event directories.

The storage requirement for a tag database which stores 200 32-bit
quantities for each event in a data sample of 100 million events is 80
gigabytes. To be efficient, such a system requires advanced database
management technology. In particular, the CPU time overhead for
retrieving one event from the system must be kept small compared to the
time needed to read the sequential event information.
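
The quoted storage requirement follows directly from these numbers,
taking 32 bits as 4 bytes and one gigabyte as $10^{9}$ bytes, and
neglecting any indexing or storage overhead (an assumption made here for
the estimate):
\[
  200 \times 4\ \mathrm{bytes} \times 10^{8}\ \mathrm{events}
  = 8 \times 10^{10}\ \mathrm{bytes}
  = 80\ \mathrm{gigabytes}.
\]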

The need for a tag database within the ZEUS experiment became apparent in
1996, when it was recognized that data analysis would have to become
much more efficient in order to cope with the ever-growing data
samples of the experiment. A project was initiated
to design and build a tag database with the following goals:
\begin{itemize}
\item Provide at least the functionality of the
event directory system with equal or greater efficiency.
\item Substantially improve the selection capabilities
compared to those available with the event directory system.
\item Allow for growth. The database technology should not
limit future growth of the system. In particular, the system was
required to handle event samples of several terabytes in size and to be
capable of storing not only the tag information but also the entire data
of the experiment, which may be required in the future.
\item Provide an implementation that is backwards-compatible with the
existing system and requires only minimal changes to the physics analysis
codes.
\item Ensure that, in addition to serving as an event index, the tag
database is usable standalone as a compact data sample.
\item Allow simple maintenance of the system. In particular, it had to
be possible to update parts of the database quickly when needed.
\end{itemize}