1: %\documentclass[reviewcopy]{elsart}
2: \documentclass{elsart}
3: \usepackage{graphics}
4: \usepackage{graphicx}
5: \usepackage{epsfig}
6: \usepackage{amssymb}
7:
8: \begin{document}
9:
10: \begin{frontmatter}
11:
12: \title{HERA-B Framework for Online Calibration and Alignment}
13:
14: \author[desy-HH,desy-ifh]{J.M. Hern\'andez\thanksref{ciemat},}
15: \author[desy-HH]{D. Ressing,}
16: \author[desy-HH]{V. Rybnikov,}
17: \author[desy-HH]{F. S\'anchez\thanksref{ifae},}
18: \author[lip]{A. Amorim,}
19: \author[desy-HH]{M. Medinnis,}
20: \author[desy-HH]{P. Kreuzer\thanksref{athens},}
21: \author[desy-ifh]{U. Schwanke\thanksref{hu}}
22: \address[desy-HH]{DESY, D-22603 Hamburg, Germany}
23: \address[desy-ifh]{DESY, D-15738 Zeuthen, Germany}
24: %\address[MPI]{Max-Planck-Institut f\"ur Kernphysik, D-69117 Heidelberg, Germany}
25: \address[lip]{FCUL and LIP, P-1749-016 Lisboa, Portugal}
26: \thanks[ciemat]{Now at CIEMAT, E-28040 Madrid, Spain}
27: \thanks[ifae]{Now at Universitat Aut\`onoma Barcelona/IFAE, E-08193 Bellaterra, Spain}
28: \thanks[athens]{Now at Athens University, Athens, Greece}
29: \thanks[hu]{Now at Humboldt University, Berlin, Germany}
30:
31: \begin{abstract}
32: This paper describes the architecture and implementation of the
33: HERA-B framework for online calibration and alignment. At HERA-B the
34: performance of all trigger levels, including the online
35: reconstruction, strongly depends on using the appropriate
36: calibration and alignment constants, which might change during data
37: taking. A system to monitor, recompute and distribute those
38: constants to online processes has been integrated in the data
39: acquisition and trigger systems.
40:
41: %An online system has been implemented in the HERA-B experiment to monitor, recompute and distribute
42: %on the fly updates of the calibration, alignment and channel status constants without causing significant
43: %deadtime during the data acquisition. This system is necessary
44: %to keep the trigger performance and the online reconstruction stable under variations of the detector conditions.
45: \end{abstract}
46:
47: \begin{keyword}
48: Conditions database \sep calibration \sep alignment \sep online reconstruction \sep PC farms
49: \PACS \\
50: 07.05.-t Computers in experimental physics \\
51: 07.05.Hd Data acquisition: hardware and software\\
52: 07.05.Bx Computer systems: hardware, operating systems, computer languages and utilities \\
53:
54:
55: \end{keyword}
56: \end{frontmatter}
57:
58: \section{Introduction}
59: \label{introduction}
60:
In High Energy Physics experiments it is essential that accurate and
consistent detector parameter sets are used at all trigger levels and
in the event reconstruction. Similarly, simulation programs must
use detector parameters which are consistent with those used in the
trigger and reconstruction to properly simulate the detector and
trigger conditions. The detector parameters (such as calibrations,
alignments, detector channel maps, resolutions, etc.), globally known
as detector conditions\footnote{The detector conditions are
hereinafter also referred to as CnA ({\underline C}alibration
a{\underline n}d {\underline A}lignment) constants.}, are normally
calculated offline in a sporadic manner from monitoring information
and event data, and updated in the trigger and reconstruction codes.
The bookkeeping of the detector conditions then becomes an important
issue.
75:
76: The HERA-B experiment \cite{herab} was designed for the measurement of
77: CP violation in the neutral B-meson system. The data acquisition (DAQ)
78: and trigger systems were designed to cope with more than half a
79: million detector channels, a 40 MHz interaction rate and an extremely
80: low signal to background ratio of $10^{-10}$. A networked
81: high-bandwidth data acquisition system \cite{daq} and a highly
82: selective multi-level trigger \cite{trigger,hlt}, with a suppression
83: factor of $10^{6}$, were built. Unlike most HEP experiments, HERA-B
84: performs full event reconstruction online.
85:
86: % redundant...
87: %Stable performance demands that all trigger levels
88: %as well as the online reconstruction use up-to-date values for the
89: %parameters which describe detector conditions. When detector
90: %conditions change, for example due to temperature effects, new sets of
91: %constants must be computed and distributed.
92:
93:
94: A novel approach for handling the detector conditions has been
95: followed at HERA-B where a system to monitor, recompute and distribute
96: CnA constants to online clients is integrated into the DAQ and trigger
97: systems. Online updates of the CnA constants help to stabilize trigger
98: performance and online reconstruction as detector conditions vary during
99: data taking.
100: %redundant
101: %This system monitors and recomputes the CnA constants
102: %from detector monitoring information and event data, and is capable of distributing
103: %updated constants online to the trigger and reconstruction processors.
The CnA system is also employed offline during event data reprocessing
and Monte Carlo reconstruction. It allows offline updates of the CnA
constants to be incorporated during data reprocessing and also
ensures that the reconstruction of Monte Carlo simulated events is
performed using the same CnA constants employed in the reconstruction
of the real data being simulated. This approach is of potential
interest to future HEP experiments that are planning sophisticated
trigger systems and online event reconstruction.
112:
113: The architecture, implementation and performance of the CnA system are
114: described in this paper. The motivation for the CnA system and its
115: requirements are summarized in the next section. The system
architecture is described in section~\ref{s:design} and its
implementation and performance are discussed in section~\ref{s:desc}.
118: Section~\ref{s:mc} describes the offline usage of the CnA system for
119: data reprocessing and Monte Carlo reconstruction.
120:
121:
122: \section{Motivation and requirements}
123:
124: The design and requirements of the online CnA system are driven by the
125: design of the HERA-B detector and the architecture of the DAQ and
126: trigger systems. We therefore begin this section with a description
127: of the relevant aspects of the HERA-B detector, DAQ and trigger
128: systems.
129:
130: %Particularly important are
131: %the facts that event reconstruction is run online and that the performance of all %trigger levels strongly depend on
132: %detector conditions.
133: % makes necessary an online system for monitoring, computation and distribution of
134: %CnA constants.
135:
136: The HERA-B detector is depicted in figure~\ref{fig:detector}. The
137: target wires and the silicon vertex detector (SVD) stations are
138: movable. The target positions change to
139: stabilize the average interaction rate.
140: % by scraping the lateral tails of the HERA proton beam.
141: The SVD stations are also moved towards the beam at the beginning of
142: every fill and retracted at the end to avoid damage during injection.
143: These two subsystems are particularly subject to alignment changes.
144: Although the stepping motors provide sufficient precision (of order
145: 1~micron), alignment corrections are needed relatively often for
146: optimal performance due to thermal and other effects. Similarly, the
147: realignment of the tracking chambers is needed whenever they are moved
148: away from the beam line during accesses to the detector for repairs.
149: Such accesses occur typically at intervals of order one week. The
150: calibration of all detector subsystems is often updated as well.
151: Examples are pedestal following and energy calibration of the
152: electromagnetic calorimeter, the time calibration of the TDC boards of
153: the tracker system, the drift velocity calibration of the tracker
154: system and the maximal value of the Cherenkov angle in the RICH
155: detector which varies with atmospheric pressure, temperature and gas
156: composition. Moreover, the channel status (hot/dead/noisy) maps for
157: all subsystems must be monitored and updated periodically.
158:
159: \begin{figure}[htp]
160: \centering
161: \includegraphics[width=\textwidth]{detector_schematic_yz.eps}
162: \caption{Side view of the HERA-B detector}
163: \label{fig:detector}
164: \end{figure}
165:
166: The HERA-B DAQ system and its relationship with the trigger levels is
167: sketched in figure~\ref{fig:daq}. The trigger rates, latencies and
168: data volumes of each stage are also shown.
169:
170: \begin{figure}[htp]
171: \centering
172: \includegraphics[width=0.8\textwidth]{daq-trigger_scheme.eps}
173: \caption{Scheme of the data acquisition and trigger systems. The data throughput,
174: trigger rates and latencies for each of the DAQ and trigger stages
175: are also shown.}
176: \label{fig:daq}
177: \end{figure}
178:
The detector data are read out at the HERA bunch-crossing rate of
about 10 MHz and stored in 128-deep front-end pipelines during
First Level Trigger (FLT) processing. The large input event rate
182: forces the FLT to be entirely built from specialized hardware. The FLT
performs hardware tracking to select events with $J/\psi$ particles
decaying into two leptons. The FLT tracking is initiated by lepton
candidates in the electromagnetic calorimeter and muon systems. The
accepted events are pushed into a distributed system of buffers
(Second Level Buffers, SLBs). The events reside in the SLBs while
the Second Level Trigger (SLT) step is being run. The SLT is
189: implemented as a software trigger running in a PC farm of 240 nodes
190: \cite{dam}.
191:
192: A switching network provides full connectivity between the SLBs and
193: the SLT nodes. This high bandwidth and low latency switch is built by
194: interconnecting several hundreds of Digital Signal Processors (DSP)
195: between the SLB system and the SLT nodes. The total bandwidth of the
196: DSP switch is above 1 GB/sec. The switch message passing software
197: ensures zero packet loss and, in addition, possesses multicasting
198: capabilities which are used for distributing data sets to all SLT
199: nodes in parallel.
200:
The SLT operates on regions of the detector (Regions of Interest, RoIs)
defined either by FLT track candidates or by pretrigger information.
203: The SLT refines the FLT tracks and extrapolates them through the
204: spectrometer magnet, tracks them through the SVD and optionally
205: performs a vertex cut. Tracker and SVD data needed by the SLT are
206: fetched from the SLBs via the DSP switch. Events accepted by the SLT
207: are assembled and, optionally, further processed by the Third Level
208: Trigger (TLT) in the same trigger node. Events passing the TLT are
209: sent via a switched Ethernet network to a second 200-processor PC farm
210: to be fully reconstructed online \cite{gellrich}. Event
211: classification by physics category is performed after the event
212: reconstruction and an additional Fourth Level Trigger (4LT) step can
213: be run at this stage if further reduction of the event rate is
214: required.
215:
The HERA-B trigger system relies on track-finding to an unusual
degree. In turn, accurate track-finding and event selection at all
trigger levels require relatively precise knowledge of detector
calibration, alignment and detailed detector channel status
information, all of which can influence both trigger efficiency and
trigger rate (and therefore system deadtime).
222: %A proper alignment of the target, SVD and the tracker system is essential for
223: %the extrapolation and reconstruction of the tracks. Channel masks (i.e. maps of dead, %and noisy detector channels) significantly
224: %influence both trigger efficiency and trigger rate, and therefore the system deadtime. %Hot channels particularly affect the performance
225: %of the FLT.
226: %A good calibration of the tracker drift velocity is necessary for reaching the required
227: %suppression at the SLT by using an improved spatial resolution in the tracker.
228: The highly distributed DAQ and trigger systems require a dedicated
229: online system for monitoring and distributing updates of the CnA
230: constants into the trigger processors without incurring significant
231: deadtime.
232:
233: Running full reconstruction online imposes constraints on the quality
234: of the CnA constants but also allows immediate data analysis and
235: therefore detailed information for data quality monitoring. Given the
236: large data volume collected by the experiment and the large event
237: reconstruction time, offline data reprocessing should be minimized.
238: % redundant
239: %On the other hand, the online reconstruction opens the possibility
240: %to monitor and update online the calibration and alignment conditions
241: %using high level information from the fully reconstructed events.
242:
243: \section{Architecture}
244: \label{s:design}
245:
246:
247: The CnA system provides the online infrastructure for collecting data
248: suitable to align and calibrate the detector, for computing CnA
249: constants and for delivering updated constants to all processes
involved in the trigger and the online reconstruction (see
figure~\ref{fig:cna1}). The CnA system also tags the set
of CnA constants being used at any moment by the DAQ, providing an
exact history of the sets of constants used.
254:
255: \begin{figure}[htp]
256: \centering
257: \includegraphics[width=0.9\textwidth]{CnA-gatering+distribution.eps}
258: \caption{Gathering of CnA data and distribution of updated CnA constants.}
259: \label{fig:cna1}
260: \end{figure}
261:
262: \subsection{Data gathering}
263:
264: During the online reconstruction procedure, data for monitoring and
265: calculating detector conditions are collected from the reconstruction
266: processes. In addition, subsystem specific monitors continuously
267: check the raw data and derive channel status maps. To make use of the
268: large number of trigger and reconstruction nodes providing such data
269: in parallel, the CnA architecture relies on a gathering system to
270: collect data in a central place. Gathered data can then be used
271: centrally to compute updated CnA constants which are subsequently
272: stored in the online database system \cite{amorim}.
273:
274: \subsection{CnA distribution}
275:
276: The CnA distribution system delivers updated CnA constants to the
277: trigger and reconstruction processes. This involves distributing
278: large objects to a large number of clients as quickly as possible to
279: minimize deadtime. Two different approaches were followed according
280: to the different latency of the trigger levels and the bandwidth of
281: the DAQ at those stages (see figure~\ref{fig:cna1}): a push
282: architecture is best suited for distributing the CnA constants to the
283: FLT and SLT/TLT processes while a pull architecture was chosen for the
284: reconstruction farm.
285:
286: For the SLT/TLT, trigger latencies are of the order of milliseconds and
287: a fast distribution is required in order not to cause deadtime. Taking
288: advantage of the high speed, reliability and multicasting capabilities
289: of the DSP switch, the CnA data can be synchronously pushed to the
290: trigger processes.
291:
292: For the online reconstruction farm, several factors favor a pull
293: architecture. The Ethernet switching network of the reconstruction
294: farm has substantially less bandwidth than the DSP switch and would be
295: rapidly saturated if operated under a synchronous push protocol. This
would lead to frequent data retransmissions and, consequently, to
297: degraded performance. In addition, the Ethernet switches have no
298: support for multicasting so that the same data sets would have to be
299: sequentially pushed into all nodes. Furthermore, the online
300: reconstruction latency is much larger than that of the SLT/TLT and
301: pausing data taking to wait until all events are fully consumed would
302: cause deadtime on the order of seconds. On the other hand, the
303: reconstruction nodes process events asynchronously and independently
304: from each other, and therefore an asynchronous pull protocol would
305: distribute the requests for data over the average reconstruction time
306: and thus make more efficient use of the available bandwidth. Finally,
307: with a pull architecture, a distributed system of fast memory database
308: caches can be implemented to replicate the CnA data and allow for
309: faster uploading and reduction of overall bandwidth requirements.
310:
311: Since uploading into the reconstruction nodes is asynchronous, the
312: reconstruction processes need to be individually notified when new
313: constants become available. The notification is done through the
event data. A database table (the ``keytable'') contains the identifiers
of all sets of CnA tables which are used by all online processes. The
identifier of the current keytable (the CnA key) is stamped into
317: events by the SLT process at event assembly time. This index allows an
318: event to be associated with all the calibration and alignment data
319: used in its triggering process and online reconstruction. Whenever
320: updated CnA constants have been distributed to the SLT/TLT nodes or
321: become available for the online reconstruction, a new identifier is
322: stamped in the event data. The reconstruction processes check the CnA
323: identifier and request updated CnA constants when the identifier
324: changes.
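
As a minimal sketch of this notification scheme (written in Python for
illustration only; the class \texttt{ReconstructionProcess} and the two
fetch callables are hypothetical and are not part of the HERA-B
software), a reconstruction process could compare the CnA key stamped in
each event with the key of the constants it currently holds, and pull
only those sets whose index has changed:

\begin{verbatim}
```python
class ReconstructionProcess:
    """Toy consumer: reloads constants when the CnA key changes."""

    def __init__(self, fetch_keytable, fetch_constants):
        # Both callables are hypothetical stand-ins for database
        # requests; they are assumptions of this sketch.
        self.fetch_keytable = fetch_keytable
        self.fetch_constants = fetch_constants
        self.current_key = None
        self.indices = {}    # set name -> index of loaded version
        self.constants = {}  # set name -> loaded constants
        self.reloads = 0

    def process(self, event):
        key = event["cna_key"]
        if key != self.current_key:
            keytable = self.fetch_keytable(key)
            for name, index in keytable.items():
                # Pull only the sets whose index has changed.
                if self.indices.get(name) != index:
                    self.constants[name] = self.fetch_constants(
                        name, index)
                    self.indices[name] = index
                    self.reloads += 1
            self.current_key = key
        return key


KEYTABLES = {1: {"rich": 10, "ecal": 20},
             2: {"rich": 11, "ecal": 20}}
proc = ReconstructionProcess(KEYTABLES.__getitem__,
                             lambda name, index: (name, index))
for ev in ({"cna_key": 1}, {"cna_key": 1}, {"cna_key": 2}):
    proc.process(ev)
```
\end{verbatim}

In this toy run the first event loads both sets, the second event
causes no request, and the third reloads only the set whose index
differs between the two keytables.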
325:
326: For the synchronous distribution of the CnA constants to the FLT and
327: SLT/TLT, a manager process is needed for synchronization during the
328: distribution and also as an intermediary between the CnA producers
329: (processes producing online updated constants) and the consumers
330: (trigger and reconstruction processes). The manager is notified
331: when updated constants are available for distribution.
On notification, the manager requests that the FLT/SLT/TLT be paused,
supervises the distribution, and requests resumption of data taking
when the distribution is completed.
335:
336: The system is quite flexible in that it allows distribution of any kind
337: of information to the trigger processors. This includes FLT and SLT
338: trigger settings as well as geometry and detector calibration data
sets. The same distribution protocols used for the on-the-fly
distribution of CnA constants are employed for the initial loading of
the constants at DAQ booting time.
342:
343: \subsection{CnA offline usage}
344:
345: The trigger and online reconstruction farms together with the online
346: booting, control, monitoring and online data transmission protocols
347: and processes are used offline for performing data reprocessing and
348: Monte Carlo production during DAQ idle time \cite{jhnim}. The CnA
349: system was also designed for offline use. During data reprocessing,
350: the CnA system allows any online changes of the CnA conditions to be
351: accurately reproduced. Moreover, it provides for use of recalculated
352: sets of CnA constants in place of the online tables, when appropriate.
353: For Monte Carlo reconstruction, the geometry, calibrations and channel
354: maps of the run period being simulated are identified and loaded as
355: well as additional data sets containing detector resolution and
356: efficiency data.
357:
358: \section{Implementation and performance}
359: \label{s:desc}
360:
361:
362: We describe in this section the implementation of the online
363: calibration and alignment system following the requirements and
364: architecture discussed in the previous sections. Key elements of the
365: system are the data collection and monitor processes (gatherers), the
366: distributed system of database servers and proxies for storage and
367: replication of the constants, and the procedure for distribution to
368: the trigger and reconstruction processes. The CnA framework also
369: includes the software modules for the trigger and reconstruction codes
370: needed to upload the CnA constants.
371:
372:
373: \subsection{Data gathering and computation}
374:
375: Data needed to monitor detector conditions are collected, by
376: gatherers, from the reconstruction processes and from dedicated nodes
377: in the SLT farm which run subsystem-specific monitors. The dedicated
378: SLT nodes receive unbiased events at rates up to several Hz and
379: continuously check the raw event data, updating channel status maps as
380: needed.
381:
382:
383: %Unlike the event data transmission protocols which must be fully reliable (lossless), %the gathering protocols can tolerate some data loss
384: % is tolerable since it only results in a decrease of the statistics
385: %available for the monitoring and computation of updated CnA constants. This fact makes %easier the implementation
386: %of the data gathering protocols.
387: As sketched in figure~\ref{fig:cna1}, gatherers collect summary data
388: via Ethernet in parallel from all farm nodes. Gatherer processes can
389: work in two distinct modes, either requesting data from the providers
390: or subscribing for the data in the provider nodes which then
391: periodically publish the data to the subscribers. Gatherers can also
392: serve as data providers to other gatherers which subscribe to the
393: provider gatherer for needed data and use the data to update CnA
394: constants. In order to limit the amount of CnA data kept locally in
395: the provider nodes, the CnA data are stored either as histograms or as
396: ring buffers with arbitrary format. A remote histogramming package
(RHP~\cite{schwanke}) was developed for the data definition and data
398: collection. RHP implements part of the functionality of the CERN
399: HBOOK package \cite{hbook}.
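
The two gathering modes and the bounded local storage can be sketched
as follows. This is an illustrative Python toy under stated
assumptions, not the RHP interface: the class \texttt{Provider} and its
methods are hypothetical names, and the ring buffer is modelled with a
fixed-length double-ended queue.

\begin{verbatim}
```python
from collections import deque


class Provider:
    """Toy data provider with a bounded ring buffer."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest entry when full.
        self.buffer = deque(maxlen=capacity)
        self.subscribers = []

    def record(self, item):
        self.buffer.append(item)

    def request(self):
        """Request mode: a gatherer asks for the current data."""
        return list(self.buffer)

    def subscribe(self, callback):
        """Subscribe mode: register a gatherer callback."""
        self.subscribers.append(callback)

    def publish(self):
        """Called periodically; sends a snapshot to subscribers."""
        snapshot = list(self.buffer)
        for callback in self.subscribers:
            callback(snapshot)


provider = Provider(capacity=3)
received = []
provider.subscribe(received.append)
for value in range(5):
    provider.record(value)
provider.publish()
```
\end{verbatim}

With a capacity of three entries, only the three most recent values
survive, which is how a ring buffer bounds the memory used in the
provider node.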
400:
401: By calling subdetector specific functions, CnA gatherers compute new
402: CnA constants and monitor their evolution over time. If significant
changes are observed, the new constants are stored in the distributed
404: database system described in section~\ref{s:db}, triggering the online
405: on-the-fly distribution as described in section~\ref{s:dist}.
406:
407:
408: \subsection{CnA distributed database system}
409: \label{s:db}
410:
411: The storage of updated CnA constants into active database servers
triggers online distribution. Upon an update, the CnA database servers
send update messages that notify the CnA distribution
system. Indexed objects (CnA keytables), whose identifiers (CnA keys)
are stored in the event data, are created automatically by a dedicated
CnA database server. The CnA keytable contains CnA metadata, namely
the indices of the sets of CnA constants used online during a
particular period. The CnA key allows every event to be associated with
all the CnA constants used in its triggering process and online
reconstruction. This bookkeeping is of crucial importance for
421: identifying the correct sets of CnA constants in the event simulation
422: and in the offline event data reprocessing as explained in
423: section~\ref{s:mc}.
424:
425: The CnA distributed system of databases consists of active subdetector
426: CnA database servers, a dedicated CnA keytable database server and a
427: distributed system of fast memory database cache servers. The
428: subdetector CnA database servers store the subdetector specific CnA
429: constants and notify the CnA keytable server of any update. The CnA
430: keytable server holds the keytable CnA metadata. After an update
431: message, it generates a new keytable incrementally, i.e., it copies
432: the last keytable and updates the indices of the updated sets of CnA
433: constants. It then publishes to the CnA distribution system the CnA
434: key of the new CnA keytable. The distributed system of memory
435: database caches is used to replicate the CnA constants in order to
436: speed up their distribution to the online reconstruction farm as
437: explained in the next section.
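
The incremental generation of keytables can be illustrated with a
short Python sketch. The class \texttt{KeytableServer}, the integer
keys and the set names are assumptions made for this example; the
actual server and its key format are not specified here.

\begin{verbatim}
```python
class KeytableServer:
    """Toy keytable server: key -> {set name: index}."""

    def __init__(self, initial):
        self.keytables = {0: dict(initial)}
        self.current_key = 0
        self.subscribers = []  # callables notified of new keys

    def on_update(self, set_name, new_index):
        # Copy the last keytable and change only the updated entry.
        table = dict(self.keytables[self.current_key])
        table[set_name] = new_index
        self.current_key += 1
        self.keytables[self.current_key] = table
        # Publish the new CnA key to the distribution system.
        for notify in self.subscribers:
            notify(self.current_key)
        return self.current_key


server = KeytableServer({"rich_calib": 3, "ecal_pedestals": 7})
published = []
server.subscribers.append(published.append)
new_key = server.on_update("ecal_pedestals", 8)
```
\end{verbatim}

Because earlier keytables are never modified, an old key still
resolves to exactly the indices that were valid when it was issued.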
438:
439:
440: \subsection{CnA distribution}
441: \label{s:dist}
442:
443: As stated before, the distribution procedure of CnA constants is
444: initiated by the storage of new constants into active database servers
445: which then propagate the updates to the CnA keytable database server.
446: This server in turn notifies the CnA distribution system of the
447: existence of updated CnA constants. The distribution procedure is
448: sketched in figure~\ref{fig:cna2}.
449:
450: \begin{figure}[htp]
451: \centering
452: \includegraphics[width=\textwidth]{cna_distribution.eps}
453: \caption{Online distribution of updated CnA constants.}
454: \label{fig:cna2}
455: \end{figure}
456:
457: The CnA manager is the process receiving the notifications from the
458: CnA keytable database server. This process is in charge of the control
459: and synchronization of the distribution of the CnA constants. At DAQ
460: booting time, the CnA consumers (trigger and reconstruction processes)
subscribe with the CnA manager for updates of particular sets of CnA
462: constants.
463:
The format of the CnA constants as stored in the databases might
differ from the format required by the CnA consumers. To centralize
the formatting of the constants, to save CPU processing time, and to
simplify code in the CnA consumers, CnA formatters were introduced.
The formatters are CnA processes that call subdetector specific
functions which use one or more raw database tables to produce the
tables which are actually distributed to the consumers. The
subscription messages sent by the CnA consumers to the CnA manager
identify the CnA formatter associated with the desired set of constants.
The CnA formatters fetch any needed CnA tables from the database, perform
the required formatting and send the formatted constants to the CnA
consumers.
476:
477: The full sequence of the distribution of the CnA constants to the
478: trigger processors is the following: a CnA producer (CnA gatherer)
stores updated constants into an active database. The database server
informs the CnA keytable server of the update; the keytable server creates a new
keytable with a new CnA key, which is sent to the CnA manager. The
manager informs the appropriate formatters of the update; these then fetch
the updated CnA constants from the appropriate database and produce
formatted tables. When the formatting is finished, the CnA manager
requests the DAQ event controller (EVC) to pause the data taking. The
EVC waits until all events in the second level buffers are processed
by the SLT nodes before acknowledging the CnA manager's request. This way
488: the events already processed by the FLT are processed by the SLT and
489: are stamped with the correct CnA key. In addition, consuming all
490: events in the buffers guarantees that the DSP switch bandwidth will be
491: fully available for the transmission of the CnA constants. The CnA
492: manager then requests the CnA formatter to push the constants into the
493: trigger nodes. The new CnA key is also distributed to the nodes and
494: will be written into the event data of the new events. When the
495: distribution is complete, the CnA manager requests the event
496: controller to resume the data taking.
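
The ordering of the steps in this sequence can be summarized in a
small Python sketch. All class and function names
(\texttt{EventController}, \texttt{Formatter}, \texttt{distribute}) are
hypothetical stubs that only record the order of calls; the
resumption of data taking in a \texttt{finally} clause is an
assumption of this sketch rather than a documented feature of the CnA
manager.

\begin{verbatim}
```python
class EventController:
    """Stub for the DAQ event controller (EVC)."""

    def __init__(self, log):
        self.log = log

    def pause(self):
        # The real EVC first lets the SLT consume buffered events.
        self.log.append("evc.pause")

    def resume(self):
        self.log.append("evc.resume")


class Formatter:
    """Stub formatter serving one group of consumers."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def prepare(self, key):
        self.log.append(f"{self.name}.prepare({key})")

    def push(self, key):
        self.log.append(f"{self.name}.push({key})")


def distribute(key, formatters, evc):
    """Run one synchronous distribution for a new CnA key."""
    for formatter in formatters:
        formatter.prepare(key)   # formatting precedes the pause
    evc.pause()
    try:
        for formatter in formatters:
            formatter.push(key)  # constants and key go to nodes
    finally:
        evc.resume()             # assumption: always resume


log = []
distribute(42, [Formatter("flt", log), Formatter("slt", log)],
           EventController(log))
```
\end{verbatim}

Performing the formatting before the pause keeps the paused interval
limited to the transmission itself.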
497:
498: For distribution to the FLT processors, the FLT CnA formatter sends
499: the constants through Ethernet to a master process running in each of
500: the FLT trigger crates which in turn distributes them in parallel to
501: the trigger boards in the crate. For the distribution into the SLT/TLT
502: farm, the SLT/TLT formatter sends the constants via Ethernet to a
503: dedicated process running in one of the SLT/TLT nodes, the SLT
504: distributor. This process in turn uses the multicasting capabilities
505: of the DSP switch to transmit the constants in parallel to all the 240
506: trigger nodes. The throughput of the multicasting via the DSP switch
507: is about 1 GB/sec. The constants are synchronously pushed into the
508: trigger nodes under the coordination of the CnA manager. Thanks to
509: the large bandwidth of the DSP switch, the distribution introduces no
510: significant deadtime.
511:
512: Unlike the distribution to the trigger nodes, where a synchronous
513: distribution is done using a push architecture, the distribution of
514: updated CnA constants to the reconstruction nodes is done
asynchronously in each node using a pull architecture. After any
update of the CnA constants, the new CnA key is stamped into the
event data. Upon a change of this key, the reconstruction processes
fetch the new keytable from the CnA keytable database to
determine which updated sets of constants should be loaded. In order to
520: speed up the loading of the new constants, the reconstruction nodes
521: fetch them via a distributed system of memory database caches. Given
522: the smaller Ethernet bandwidth compared to the DSP switch bandwidth,
523: an asynchronous retrieval of the constants is more efficient.
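
The benefit of the memory caches can be illustrated with a toy Python
example. The class \texttt{CacheProxy} and the function
\texttt{database} are hypothetical; the sketch assumes that a set of
constants is uniquely identified by its name and index, so that a
cached entry never becomes stale.

\begin{verbatim}
```python
class CacheProxy:
    """Toy in-memory cache between the nodes and a database."""

    def __init__(self, backend):
        self.backend = backend  # callable: (name, index) -> data
        self.store = {}
        self.backend_requests = 0

    def get(self, name, index):
        key = (name, index)
        if key not in self.store:
            # Only the first request reaches the database server.
            self.backend_requests += 1
            self.store[key] = self.backend(name, index)
        return self.store[key]


def database(name, index):
    """Hypothetical database lookup returning a dummy payload."""
    return f"{name}-v{index}"


cache = CacheProxy(database)
# 200 reconstruction processes ask for the same updated set.
results = [cache.get("rich_calib", 11) for _ in range(200)]
```
\end{verbatim}

In this example the database server sees a single request, while the
remaining 199 are served from the cache.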
524:
525: Exactly the same distribution procedure is applied at DAQ booting time
526: to upload the CnA constants into the trigger and reconstruction
527: processors. At booting time, all the sets of constants appear to be
updated and all of them are distributed. The trigger settings are
distributed to the trigger nodes by the same procedure as the
detector condition constants.
532:
533: \subsection{Performance}
534:
Over 60 sets of CnA constants, amounting to a total volume of 6.5 MB, are
used. At DAQ booting time, they are pushed into the SLT/TLT trigger
nodes in 1.5 sec, at an aggregate rate of about 1 GB/sec (6.5 MB $\times$ 240 nodes / 1.5
sec). Upon receiving their first events after run startup,
reconstruction nodes fetch the constants at an effective rate of about 50
MB/sec (6.5 MB $\times$ 200 processes / 25 sec). Note that in this case, 25
541: seconds is the total time required to completely upload the constants
542: in all the reconstruction processes, but not the deadtime caused since
543: the retrieval is asynchronous and independent in every node so that
544: each node starts processing events as soon as all the constants have
545: been read.
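
As a cross-check of these figures, with $R$ denoting an aggregate
delivery rate (a symbol introduced here only for illustration), the
numbers above give
\begin{eqnarray*}
R_{\rm SLT/TLT} & = & \frac{6.5~{\rm MB} \times 240}{1.5~{\rm s}}
  = 1040~{\rm MB/s} \approx 1~{\rm GB/s}, \\
R_{\rm reco} & = & \frac{6.5~{\rm MB} \times 200}{25~{\rm s}}
  = 52~{\rm MB/s} \approx 50~{\rm MB/s}.
\end{eqnarray*}
Each SLT/TLT node thus receives its own copy at about
$6.5~{\rm MB} / 1.5~{\rm s} \approx 4.3$~MB/s; the aggregate rate counts
all 240 delivered copies.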
546:
547: The deadtime caused in the data taking by the distribution of CnA
548: constants to the SLT/TLT nodes is dominated by the time needed for
data transmission and multicasting in the DSP switch. The distribution
messages containing individual sets of CnA constants are multicast
within the DSP switch. The multicast is based on message copy. The
copy of the messages in the first block of the switch dominates the
multicast latency. The contribution from the CnA control and
distribution protocol is small, of the order of 100 msec.
555:
556:
The RICH detector calibration and channel status map were updated
online at intervals of the order of minutes. Online updates of ECAL
pedestals and tracker channel maps occurred at intervals of the
order of hours. Although the online CnA system was fully functional,
not all subdetector groups developed online monitors or updated CnA
constants online, largely due to lack of manpower. Calibration and
alignment constants for the vertex detector, tracker and muon systems
were updated offline by manually updating the index to the relevant
tables in the online keytable. The change in the online keytable
566: caused the new constants to be automatically loaded at the next DAQ
567: startup or triggered their distribution if data taking was in
568: progress.
569:
The FLT lookup tables are distributed to the FLT trigger boards at DAQ
booting time using the CnA distribution procedure. The large number of
these tables and the slow input link to the boards prevent the online
distribution of updated lookup tables except as part of the startup
procedure. Steering parameters for the FLT processes are also
distributed via the CnA system.
576:
577:
578: \section{CnA system for offline reprocessing and Monte Carlo reconstruction}
579: \label{s:mc}
580:
As the knowledge of the detectors improves, the reconstruction
packages are further developed, and improved calibration and alignment
constants are made available, offline reprocessing of the event
data becomes necessary. At HERA-B the trigger and online
585: reconstruction farms together with the online booting, control,
586: monitoring and online data transmission protocols and processes are
587: used offline for performing data reprocessing and Monte Carlo
588: reconstruction as described in \cite{jhnim}.
589: Exactly the same CnA distributed database system and CnA data
590: uploading mechanism used for online reconstruction are also
591: employed offline for mass data processing. The only difference
592: between reprocessing and online reconstruction is that the source of
593: the data is not the detector but the recorded raw events archived on
tape. This system makes very efficient use of the online
computing resources during idle time and shutdown periods of the
detector.
597:
As mentioned earlier, the event record includes a tag that
links the event with the CnA keytable in the database containing
the indices of all the sets of CnA constants used in the online
reconstruction of that event.
The automatic bookkeeping of the keytables in the database
during data taking allows the detector calibration and
alignment conditions to be reproduced for offline data reprocessing. Sets of constants
605: improved offline are incorporated in the reprocessing by producing
606: a revision of the online keytables. The online keytables are first
607: duplicated in the database with a new revision number and then the
608: keytables corresponding to data taking periods for which updated
609: constants are available are modified with the indices of the updated constants.
610: The offline reconstruction
611: processes make use of a given revision number of the keytables
612: when reprocessing the data.
613:
614: Monte Carlo event reconstruction should be performed using the
615: reconstruction conditions of the real data one
616: intends to simulate. This is simply achieved by using the keytable (CnA
617: key and CnA revision number) employed in the reconstruction of the real
618: data. For extended data taking periods with several associated keytables,
619: Monte Carlo samples can be reconstructed using those keytables separately, and the
620: events are reweighted according to the relative luminosities of the periods
621: of validity of the different keytables.
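
One possible way to write the reweighting explicitly is sketched below.
The notation is an assumption made for illustration and is not taken
from the HERA-B software: $\mathcal{L}_k$ denotes the integrated
luminosity of the period of validity of keytable $k$, and $N_k$ the
number of Monte Carlo events reconstructed with that keytable. Each
event reconstructed with keytable $k$ then receives the weight
\[
w_k = \frac{\mathcal{L}_k}{\sum_j \mathcal{L}_j}
      \cdot \frac{\sum_j N_j}{N_k} ,
\]
so that the weighted fraction of Monte Carlo events associated with
keytable $k$ equals the luminosity fraction of the corresponding period,
$N_k w_k / \sum_j N_j w_j = \mathcal{L}_k / \sum_j \mathcal{L}_j$, and
the sum of all weights equals the total number of events
$\sum_j N_j$.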
622:
623:
624: \section{Summary}
625:
626: In the HERA-B experiment, all trigger levels as well as the online
627: reconstruction critically depend on calibration and alignment
628: constants. In order to keep the trigger performance and the online
629: reconstruction stable under variations of the detector conditions, an
630: online calibration and alignment system was implemented and used. This
system monitors the status of the calibration and alignment constants,
recomputes them upon significant changes in the calibration or
alignment conditions in the detector and, if necessary, distributes them
on the fly to the trigger and reconstruction processors without
causing significant deadtime in the data acquisition. The
636: distribution system exploits the high bandwidth and multicasting
637: capabilities of the DSP switch to synchronously push the constants to
638: the SLT/TLT trigger processes with a throughput of 1 GB/s. On the
639: other hand, given the smaller effective network bandwidth of the
640: reconstruction farm and the higher event processing time, the
641: reconstruction processes asynchronously fetch the updated constants
642: from a distributed and replicated database system.
643:
644: A tag in the event record associates every event with the detector
645: conditions used in the trigger and online reconstruction. This
646: mechanism provides the bookkeeping necessary for offline data
647: reprocessing and Monte Carlo reconstruction. The online CnA
648: distribution system is also used offline for mass data processing.
649:
The integration of the CnA system took place during the HERA-B
commissioning runs in 2000/2001. The system was fully operational and
in routine use during the 2002--2003 data taking period and is still
used for data reprocessing and Monte Carlo reconstruction.
654:
655: The upcoming LHC experiments will incorporate PC farms into their DAQ
656: and trigger systems and might find the HERA-B experience concerning
657: the online calibration and alignment system of interest.
658:
659: \section{Acknowledgments}
We are grateful to Andreas Gellrich for fruitful discussions. We thank
the DAQ, subdetector and trigger experts for their work on the
integration of the online subsystems into the DAQ CnA framework.
663:
664: \bibliographystyle{elsart-num}
665: \bibliography{cna}
666:
667:
668: \end{document}
669:
670: