\section{The Tag Database}
\label{sec:tagdb}

While event directories work very efficiently, their selection power is
limited to boolean combinations of precalculated event flags. Furthermore,
only those selection conditions that were anticipated when the event flags
were calculated can be tested. In addition, if the quantities on which a
selection condition is based change, for example after a recalibration,
the selection may become invalid. This happens frequently, since analysis
methods and the understanding of the detector evolve quickly. The event
flags must therefore be recalculated after every change in the selection
procedure. For large data samples this is a major effort that can be
afforded only a few times per year.

The number of recalculations can be reduced if more information is stored
with each event in the event directory. This information would then be
updated whenever the calibration changes, for instance. The more
information is stored, the more flexible the event selection becomes. For
example, rather than setting a flag when a vertex is found, one can store
the reconstructed position of the vertex itself. With appropriate database
technology, it is then possible to select not only those events that have a
vertex but also those whose vertex lies within given bounds.

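To illustrate the difference, the following Python sketch contrasts a
selection on a precalculated flag with a selection on a stored quantity.
The record layouts, the flag bit \texttt{HAS\_VERTEX} and the field
\texttt{vertex\_z} are hypothetical and chosen only for this example; they
do not reproduce the actual ZEUS event directory or tag database formats.

\begin{verbatim}
from dataclasses import dataclass
from typing import List, Optional

HAS_VERTEX = 0x1  # hypothetical flag bit, not a real ZEUS flag


@dataclass
class DirectoryEntry:
    """Event-directory style record: precalculated boolean flags only."""
    event_id: int
    flags: int  # bit field of flags fixed when the directory was built


@dataclass
class TagEntry:
    """Tag-database style record: the reconstructed quantity itself."""
    event_id: int
    vertex_z: Optional[float]  # None if no vertex was reconstructed


def select_by_flags(entries: List[DirectoryEntry], mask: int) -> List[int]:
    # Only conditions anticipated when the flags were set can be tested.
    return [e.event_id for e in entries if (e.flags & mask) == mask]


def select_vertex_in_range(entries: List[TagEntry],
                           zmin: float, zmax: float) -> List[int]:
    # Any bounds can be chosen at analysis time because the value is stored.
    return [e.event_id for e in entries
            if e.vertex_z is not None and zmin <= e.vertex_z <= zmax]


directory = [DirectoryEntry(1, HAS_VERTEX), DirectoryEntry(2, 0),
             DirectoryEntry(3, HAS_VERTEX)]
tags = [TagEntry(1, -3.2), TagEntry(2, None), TagEntry(3, 45.0)]

print(select_by_flags(directory, HAS_VERTEX))      # [1, 3]
print(select_vertex_in_range(tags, -30.0, 30.0))   # [1]

# After a (hypothetical) recalibration only the stored value is updated;
# the same range selection is simply run again.
tags[2].vertex_z = 25.0
print(select_vertex_in_range(tags, -30.0, 30.0))   # [1, 3]
\end{verbatim}

The flag answers only the question fixed when it was computed, whereas the
stored position supports arbitrary bounds and stays usable after a
recalibration once the single stored value per event has been updated.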

Such a system is known as a {\it tag database}. The name stresses its
database character and its indexing capabilities and thus distinguishes it
from the simpler event directories.

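The indexing capability of a tag database can be illustrated with a minimal
sorted index over a single tag quantity. The Python sketch below uses
binary search from the standard \texttt{bisect} module; the class name
\texttt{RangeIndex} and the data are hypothetical, and a real tag database
would rely on the indexing facilities of its database management system
rather than on code like this.

\begin{verbatim}
import bisect
from typing import List, Sequence, Tuple


class RangeIndex:
    """Minimal sorted index over one tag quantity (illustrative only)."""

    def __init__(self, rows: Sequence[Tuple[int, float]]) -> None:
        # rows are (event_id, value); sort by value for binary search
        pairs = sorted((value, event_id) for event_id, value in rows)
        self._values = [v for v, _ in pairs]
        self._ids = [eid for _, eid in pairs]

    def select(self, lo: float, hi: float) -> List[int]:
        # Locate the slice with lo <= value <= hi in O(log n) steps
        # instead of scanning every entry.
        start = bisect.bisect_left(self._values, lo)
        stop = bisect.bisect_right(self._values, hi)
        return self._ids[start:stop]


index = RangeIndex([(1, -3.2), (3, 45.0), (4, 12.5), (7, -28.0)])
print(index.select(-30.0, 30.0))  # [7, 1, 4]
\end{verbatim}

Only the entries inside the requested bounds are returned, ordered by the
indexed value; locating them costs a logarithmic number of comparisons
rather than a pass over the whole sample.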

A tag database that stores 200 quantities of 32 bits each for every event
in a sample of 100 million events requires 80 gigabytes of storage. To be
efficient, such a system needs advanced database management technology. In
particular, the CPU time overhead for retrieving one event must be kept
small compared to the time needed to read the sequential event information.

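The quoted storage requirement follows directly from the size of one tag
record, assuming 4~bytes per 32-bit quantity and
$1~\mathrm{GB} = 10^{9}$~bytes; the figure counts the raw tag values only:
\[
  200 \times 4~\mathrm{bytes} = 800~\mathrm{bytes/event}, \qquad
  800~\frac{\mathrm{bytes}}{\mathrm{event}} \times 10^{8}~\mathrm{events}
  = 8 \times 10^{10}~\mathrm{bytes} = 80~\mathrm{GB}.
\]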

The need for a tag database within the ZEUS experiment became apparent in
1996, when it was recognized that data analysis would have to become
much more efficient in order to cope with the ever-growing data
samples of the experiment. A project was initiated
to design and build a tag database with the following goals:
\begin{itemize}
\item Provide at least the functionality of the
event directory system with equal or greater efficiency.
\item Substantially improve the selection capabilities
  compared to those of the event directory system.
\item Allow for growth. The database technology should not
  limit the future growth of the system. In particular, the system was
required to handle event samples of several terabytes and to be capable
of storing not only the tag information but also the entire data of the
experiment, should this become necessary in the future.
\item Provide an implementation that is backward-compatible with the
  existing system and requires only minimal changes to the physics
  analysis code.
\item Ensure that, in addition to serving as an event index, the tag
database can be used standalone as a compact data sample.
\item Allow simple maintenance of the system. In particular, it was
required that parts of the database could be updated quickly when
needed.
\end{itemize}