0805.3170/ops.tex
1: \section{Detector Operation and Performance}
2: \label{sec:ops}
3: 
4: As the result of several years of routine data-taking, extensive operational
5: experience has been obtained with the MINOS detector systems.  Representative
6: observations are presented here.  Section~\ref{sec:ops-rel} summarizes
7: detector performance and reliability information.  
8: %Section~\ref{sec:ops-comm}
9: %[Reviewers' note:  this prints as ``section 4.5'']
10: % ATH - This section was indeed moved back to the electronics section as
11:  % part of the last round of reviews, so this intro text needs to be
12:  % changed.  In fact, this sentence will be removed.
13: %describes the inter-detector
14: %timing system that provides a beam spill ``gate'' for the far detector and
15: %allows the association of near and far data from the same beam pulse.
16: Section~\ref{sec:ops-quality} describes the systems and procedures used
17: to ensure data quality and to monitor detector system performance in real time.
18: Finally, Sec.~\ref{sec:ops-off-line} gives an overview of the offline software
19: used for detector performance measurements and data analysis.
20: 
21: Since the NuMI beam and near detector are located at Fermilab while
22: the far detector is \unit[735]{km} north in Soudan, MN,
23: the coordination of experimental operations is non-trivial.
24: This challenge has been addressed by making as much of the experiment as
25: possible controllable remotely over computer networks.  
26: Physicist shift workers are present \unit[24]{hours/day}, \unit[7]{days/week} 
27: in the main MINOS control room on the 12th floor of Wilson Hall at Fermilab,
28: where both near and far detectors are monitored and controlled.
29: In addition, the NuMI beam is monitored by the MINOS shift workers but
30: is controlled from the Fermilab accelerator control room.  Weekdays
31: between the hours of 7:30 and 17:30 US Central Time, four to five full-time 
32: technicians are present in the MINOS cavern of the Soudan Underground
33: Laboratory, monitoring and controlling the far detector, and, as needed, 
34: repairing the detector subsystems.  Both sites have
35: technical support on call for after-hours intervention when necessary.
36: Close coordination among the three control rooms has provided
37: high detector live times for periods when the beam is in operation (see
38: Sec.~\ref{sec:ops-rel-fd}).
39: 
40: \subsection{Detector reliability and live-time fractions}
41: \label{sec:ops-rel}
42: 
43: \subsubsection{Near detector}
44: \label{sec:ops-rel-nd}
45: 
46: The near detector was commissioned in January~2005.  Since
47: then the detector has been kept in an operational state
48: except during extended periods when the beam was not in operation.  
49: The near detector data taking is usually organized into an approximately
50: \unit[24]{hour} run sequence, consisting of \unit[210]{seconds} of
51: calibration runs followed by a \unit[24]{hour} physics run.
52: Excluding periods when the beam has been off, the fraction of
53: time during which the near detector has been in physics data taking mode
54: has averaged above 98.5\%. Typically more than 99.95\% of the
55: detector channels are operational.  The small fraction of beam time lost
56: is due to the daily calibration runs and infrequent detector maintenance.
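
As a rough consistency check, and assuming each cycle consists of exactly
\unit[210]{s} of calibration followed by \unit[24]{hours}
(\unit[86\,400]{s}) of physics running with no other gaps, the fraction of
time spent on calibration is
\[
  f_{\mathrm{cal}} \approx \frac{210}{210 + 86\,400} \approx 0.24\%,
\]
so under this assumption the daily calibration runs account for only a small
part of the $<$1.5\% of beam time not spent in physics data taking mode.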
57: 
58: Most downtime with beam on is due to maintenance, usually for the
59: replacement of failed front-end electronics cards.  A mean number of
60: \unit[3.5]{cards} (out of 9,360~total) failed per week before a mass
61: replacement of unreliable on-board fuses was done in the summer of 2007.
62: After that, the failure rate dropped to less than one card per week.  The
63: typical intervention to replace a few front-end cards requires
64: approximately one hour of downtime, including the calibration of the
65: replacement channels.
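
For scale, the failure rate quoted for the period before the fuse replacement
corresponds to a per-card failure probability of roughly
\[
  \frac{3.5}{9360} \approx 3.7\times10^{-4}~\mathrm{per~week},
\]
or about 2\% of the installed cards per year had that rate persisted (an
extrapolation for illustration only, not a measured figure).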
66: 
67: \subsubsection{Far detector}
68: \label{sec:ops-rel-fd}
69: 
70: The far detector installation was completed in July~2003 and the
71: detector has been recording cosmic ray and atmospheric-neutrino data
72: since then.  By the time the beam arrived in the spring of 2005,
73: reliable detector operation had become routine. 
74: As at the near detector, a
75: \unit[24]{hour} run sequence for physics data and calibrations is
76: the operational mode at the far detector.  The overall live-time
77: fraction was 96.7\% in 2005 and has since risen to the high 90\%s.  Typically all channels in the detector are
78: functional, with isolated failures being fixed during beam downtime,
79: usually within hours to days of the appearance of the problem.
80: 
81: After the neutrino beam turned on in March~2005, the important metric for
82: evaluating experimental performance became the fraction of
83: protons-on-target (``POT'') delivered while the far detector was taking good data.  The MINOS experiment's
84: sensitivity is driven by the statistics of neutrinos observed at the far
85: detector, making it crucial to keep the far detector operating as much as
86: possible.  Since the end of beam commissioning in March~2005,
87: the detector has run very smoothly, taking physics data for $>$98.7\% of all
88: delivered NuMI protons for the first year's beam
89: operations~\cite{Adamson:2007gu} and better than 99\% thereafter.
90: 
91: \subsection{Data quality and real-time monitoring}
92: \label{sec:ops-quality}
93: 
94: A combination of real-time
95: monitoring and offline or post-processing monitoring is
96: performed on a daily basis by physicists on shift to ensure data quality.
97: The systems developed
98: for these tasks keep track of similar parameters for both near and far
99: detectors.
100: 
101: The MINOS Online Monitoring (OM) system is designed to provide real-time
102: monitoring of data quality in the near and far detectors. It is based on
103: the system used for the CDF experiment's Run~II operations~\cite{wagner:2001is}
104: and consists of three main processes.
105: i)~The {\it Producer} process receives raw data records from the DAQ via the
106: MINOS Data Dispatcher, which are then processed to fill monitoring
107: histograms. ii)~The {\it Server} process receives monitoring
108: histograms from the Producer, handles connections from external GUI
109: processes, and serves histogram data to these processes on request.
110: iii)~The {\it GUI} process allows browsing and plotting of any of these
111: monitoring histograms.
112: 
113: The monitor histograms are grouped into sections, e.g., those relating to
114: digitized hits from the detector (channel occupancies, ADC
115: distributions, etc.), singles rates, and distributions relating to
116: electronics calibration and light injection data. A representative
117: subset of these histograms is checked once every six hours by
118: the shift crews at the detector sites and any problems are entered into 
119: the MINOS electronic logbook via a checklist template. All
120: monitoring histograms are archived to tape for future reference.
121: %A stand-alone OMhistory browser has been developed to examine data from
122: %a collection of such files to observe long-term trends in the data.
123: 
124: The raw event data are moved to storage at Fermilab and copied over to the
125: Farm Batch System for offline processing.  From there the reconstruction is
126: completed with a stable software release.  Offline reconstruction is
127: performed on data taken the previous day and used for the offline
128: monitoring and subsequent data quality checks. The data are divided into
129: separate streams in-time and out-of-time with the beam spills to
130: facilitate monitoring as well as analysis.
131: 
132: The Offline Monitoring system serves two main purposes. It allows
133: monitoring of the detector systems using reconstructed event data
134: quantities such as event rates per POT, demultiplexing and scintillator
135: strip efficiencies. Additionally, the system provides the ability to
136: verify that the offline
137: production is proceeding normally so that unexpected changes can be
138: tracked down quickly.  The Offline Monitoring system has a histogram-making
139: process that runs once per day, reading in all the reconstructed
140: data from the near and far detectors processed during the previous day and
141: producing a set of histograms for monitoring.  It also runs the
142: {\it OMhistory} package, a process for viewing how these histograms
143: change over time.   
144: % From a
145: %selected set of histograms, a daily checklist is filled by the shift
146: %physicists as a part of the data taking procedure.
147: 
148: Other tasks performed during shifts include completing a checklist of the 
149: DCS systems described in Sec.~\ref{sec:elec-dcs}, monitoring 
150: quasi-real-time event displays for both near and far detectors, and 
151: monitoring the NuMI beam performance~\cite{Kopp:2006nq}.
152: These checks are performed during each
153: shift to ensure that problems are noticed promptly and flagged for
154: repair. 
155: 
156: \subsubsection{Near detector}
157: \label{sec:ops-quality-nd}
158: 
159: In order to detect anomalies and trends in both the performance and data
160: quality of the MINOS near detector, several quantities are verified
161: weekly, including the total uncalibrated digitized detector
162: response per POT during the $\unit[\sim10]{\mu s}$ spill as a function
163: of time (Fig.~\ref{fig:ops-quality-nd}a), the number of reconstructed
164: events per POT as a function of time (Fig.~\ref{fig:ops-quality-nd}b),
165: and the reconstructed event time 
166: (Fig.~\ref{fig:ops-quality-nd}c).  Instabilities in these quantities
167: may indicate a detector and/or reconstruction problem. The data used for
168: these quantities come from the in-time spill stream, taking advantage of
169: the large flux of beam neutrino events at the near site.
170: 
171: \begin{figure*}[htpb]
172:   \centering
173:   \includegraphics[width=\textwidth,keepaspectratio=true,bb=0 0 740 235]{nim_near_monitor_plots-5.eps} 
174:   \caption{Distributions examined
175:     in near detector data quality monitoring include: (a) the average
176:     spill pulse height, (b) the average number of
177:     reconstructed events per \unit[$1\times10^{12}$]{POT} in a
178:     \unit[13]{day} period, and (c) reconstructed event times in the
179:     $\unit[20]{\mu s}$ gate.}
180:   \label{fig:ops-quality-nd}
181: \end{figure*}
182: 
183: \subsubsection{Far detector}
184: \label{sec:ops-quality-fd}
185: 
186: The far detector data are checked weekly for anomalies in
187: the reconstruction and data quality by comparing distributions of several
188: reconstructed quantities to a baseline data set.
189: Examples of distributions monitored are the number of planes crossed 
190: by muons in the detector (Fig.~\ref{fig:ops-quality-fd}a), the
191: incoming directions of the tracks and showers 
192: (Fig.~\ref{fig:ops-quality-fd}b), and track entry
193: locations (Fig.~\ref{fig:ops-quality-fd}c).
194: Other quantities that help ensure the detector calibration remains stable 
195: are the reconstructed velocity for cosmic ray muons 
196: for timing calibration and pulse heights of tracks and showers for
197: energy calibration.  Cosmic ray muons are most useful for these
198: checks as they are the most abundant data source in the far detector.
199: %Similar checks are also made for major changes in the reconstruction
200: %software.  In such cases, distributions made from the previous release
201: %are compared to those made with the current software version.
202: 
203: \begin{figure*}[htpb]
204:   \centering
205:   \includegraphics[width=\textwidth,keepaspectratio=true]{nim_far_monitor_plots.eps} 
206:   \caption{Distributions examined during
207:     far detector data quality monitoring include:  (a) the number
208:     of planes crossed by cosmic ray muon tracks, (b) the incoming
209:     direction of the cosmic ray muons with respect to the beam direction,
210:     and (c) the entry location for cosmic ray muons along the
211:     length of the detector.}
212:   \label{fig:ops-quality-fd}
213: \end{figure*}
214: 
215: \subsection{Offline software overview}
216: \label{sec:ops-off-line}
217: 
218: Although MINOS comprises three detectors (the near, far and calibration
219: detectors) at different depths and
220: latitudes and with different sizes, physical configurations, beam
221: characteristics and electronic readout schemes, the simplicity of the
222: active detector technology has allowed a single framework
223: %\footnote{MINOS offline software: \url{http://www-numi.fnal.gov/off-line_software/srt_public_context/WebDocs/WebDocs.html}}
224: of offline analysis software to be constructed for all detectors.  The
225: object-oriented characteristics of the C++ language~\cite{Stroustrup97} have
226: enabled the modularity required for this task.  MINOS software is
227: made available to collaborators using the Concurrent
228: Versions System (CVS)\footnote{\url{http://www.nongnu.org/cvs/}}
229: embedded in the SLAC-Fermilab Software Release Tools (SRT) code
230: management
231: system\footnote{\url{http://www.fnal.gov/docs/products/srt/}}.
232: %[Reviewers' comment:  are footnotes appropriate?  Should these just be
233: %endnotes?]
234: % ATH - good question, and I think what we decided is for stuff like
235:  % this, where the reference really is a URL, that footnotes were the
236:  % appropriate vehicle.  Will leave the final judgement to the journal
237:  % editors, though.
238: %and
239: %\url{http://www.slac.stanford.edu/BFROOT/www/Computing/Environment/Tools/SRT/SRTuser-node1.html}}.
240: The system uses software
241: libraries from the CERN ROOT project~\cite{Brun:1997pa}, including
242: ROOT tools for I/O, graphical display, analysis, geometric detector
243: representation, database access and networking.
244: 
245: Raw data from different data acquisition processes at the MINOS
246: detectors are written to disk as separate ROOT TTree ``streams.''  These
247: include physics event data, pulser calibration data, beam monitoring
248: data and detector control data.  This information immediately becomes available
249: for monitoring, calibration and event display processes 
250: through an online data ``dispatcher'' service.  This utility can access the
251: online ROOT files while they are still open for writing by the MINOS DAQ
252: systems.  Subsequent offline processing produces additional TTree
253: streams for event reconstruction results and analysis ntuples.
254: 
255: %[Reviewers' note:  this is a large paragraph -- suggest breaking in two.]
256: % ATH - OK 
257: The need to correlate these streams of MINOS data with each other has
258: motivated a key element in the MINOS software strategy called {\it VldContext}
259: (``Validity Context'').  VldContext is a C++ class that encapsulates
260: information needed to locate a data record in time and space.  Separate
261: streams of data from different sources can be synchronized by comparing
262: their VldContext objects.  When MINOS offline software opens files
263: containing these streams, it indexes each stream according to the
264: VldContext of each record.  The indexing information can then be used
265: %with ROOT's random access I/O capability 
266: to put VldContext-matched
267: records into computer memory simultaneously.  The GPS timestamps
268: attached to raw data records enable this matching for far and near
269: detector data in the same offline job.  These features are illustrated
270: in Fig.~\ref{fig:validity}.  
271: 
272: All MINOS record types derive from a common
273: record base class with a header that derives from a common header base
274: class.  The minimum data content of the record header is the VldContext,
275: used to associate records on input.  The small record header is stored
276: on a separate ROOT TTree Branch from the much larger data blocks.  A
277: MINOS stream is an ordered sequence of records stored in a ROOT TTree
278: containing objects of a single record type extending over one or more
279: sequential files.  On input, records stored in different streams are
280: associated with each other by VldContext and not by Tree index.  The
281: default mode is that records of a common VldContext form an input record
282: set.  Alternative input sequencing modes by VldContext are also
283: supported.
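
The idea of associating records by context rather than by tree index can be
illustrated with a minimal C++ sketch.  It is not the MINOS implementation:
the names \texttt{ContextSketch}, \texttt{BuildIndex} and
\texttt{MatchStreams} are hypothetical, and the context is reduced here to a
detector label plus a timestamp, which is an assumption about the minimum
information needed.  Each stream is indexed by the context of its records,
and records from two streams are paired whenever their contexts agree,
wherever they happen to sit in each stream.

\begin{verbatim}
```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for a validity context: which detector, and when.
// The interface of the real VldContext class is not reproduced here.
enum class Detector { kNear, kFar, kCalDet };

struct ContextSketch {
  Detector detector;
  std::int64_t sec;   // timestamp, whole seconds
  std::int32_t nsec;  // timestamp, nanosecond part

  // Strict ordering so that contexts can serve as keys of an ordered index.
  bool operator<(const ContextSketch& o) const {
    return std::tie(detector, sec, nsec) <
           std::tie(o.detector, o.sec, o.nsec);
  }
};

// Index of one stream: context of each record -> position in the stream.
using StreamIndex = std::map<ContextSketch, std::size_t>;

StreamIndex BuildIndex(const std::vector<ContextSketch>& headers) {
  StreamIndex index;
  for (std::size_t i = 0; i < headers.size(); ++i) {
    index.emplace(headers[i], i);
  }
  return index;
}

// Pair up the positions of records in two streams that share a context.
std::vector<std::pair<std::size_t, std::size_t>>
MatchStreams(const StreamIndex& a, const StreamIndex& b) {
  std::vector<std::pair<std::size_t, std::size_t>> matched;
  for (const auto& entry : a) {
    auto it = b.find(entry.first);
    if (it != b.end()) matched.emplace_back(entry.second, it->second);
  }
  return matched;
}
```
\end{verbatim}

In this sketch only the small headers are indexed, mirroring the separation
of the record header from the larger data blocks described above; the data
blocks of matched records would be loaded afterwards.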
284: 
285: \begin{figure*}[htpb]
286:   \centering
287:   \begin{minipage}[b]{0.45\textwidth}
288:     \includegraphics[width=\textwidth,bb=155 530 375 760]{NIMVldCFigA.eps}
289:   \end{minipage}
290:   \begin{minipage}[b]{0.4\textwidth}
291: % Made B a bit smaller since it's taller, trying to make the fonts match
292:     \includegraphics[width=\textwidth,bb=180 230 385 495]{NIMVldCFigB.eps}
293:   \end{minipage}
294:   \caption{
295:     MINOS data structure.  The schematic on the left shows a MINOS record type 
296:     derivation and header structure.  The schematic on the right shows 
297:     a MINOS stream as an ordered
298:     sequence of records stored in a ROOT TTree.  On input, records
299:     stored in different streams are associated with each other by
300:     VldContext and not by Tree index.}
301:   \label{fig:validity}
302: \end{figure*}
303: 
304: The MINOS offline database contains calibration and survey data,
305: including component locations and connection maps from the construction
306: phase of the detector.  These relational tables are keyed with a notion
307: of ``Validity Range'' or scope of VldContext values to which a 
308: database record applies.  For physics data of a particular VldContext, 
309: the offline database interface enables retrieval of matching database
310: records whose Validity Ranges encompass the VldContext of the physics
311: data in question.
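
A Validity Range lookup of this kind can be sketched as follows.  This is an
illustrative C++ fragment under stated assumptions, not the MINOS database
interface: \texttt{RowSketch} and \texttt{FindValidRows} are hypothetical
names, the context is reduced to a detector code and a time in seconds, and
the range is taken to be half-open, which the text above does not specify.

\begin{verbatim}
```cpp
#include <cstdint>
#include <vector>

// Hypothetical row of a calibration table: valid for one detector
// over the half-open time interval [begin, end).
struct RowSketch {
  int detector;        // illustrative detector code (assumption)
  std::int64_t begin;  // start of validity in seconds, inclusive
  std::int64_t end;    // end of validity in seconds, exclusive
  double value;        // payload, e.g. a calibration constant
};

// Collect every row whose validity range encompasses the query context,
// reduced here to a (detector, time) pair.
std::vector<const RowSketch*> FindValidRows(
    const std::vector<RowSketch>& table, int detector, std::int64_t time) {
  std::vector<const RowSketch*> hits;
  for (const RowSketch& row : table) {
    if (row.detector == detector && row.begin <= time && time < row.end) {
      hits.push_back(&row);
    }
  }
  return hits;
}
```
\end{verbatim}

A linear scan keeps the sketch short; a production system would more
plausibly delegate the range selection to the database query itself.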
312: 
313: %The offline software accesses the offline database through a generic
314: %ODBC\footnote{\url{http://www.unixodbc.org/}} interface, which allows
315: %data to be saved to and retrieved from any compliant database product.
316: %Currently, the central database warehouse is served by
317: %Oracle\footnote{\url{http://www.oracle.com/database/index.html}} and is
318: %around \unit[100]{GB} in size.
319: %% and \url{http://www.easysoft.com/developer/interfaces/odbc/index.html}}.
320: %Local distributed databases are in
321: %MySQL\footnote{\url{http://www.mysql.com/}}
322: %% and \url{http://www.mysql.com/products/connector/odbc/}} 
323: %installations and can be substantially smaller depending on the local
324: %needs.  Data in the MySQL servers are automatically synchronized with
325: %the Oracle warehouse through a multiple- master replication scheme.
326: 
327: %New paragraph from George:
328: The offline software accesses the offline database through
329: a low-level ROOT~\cite{Brun:1997pa} interface, which allows data to be
330: saved to and retrieved from compliant database products.
331: The central database warehouse is served by
332: MySQL\footnote{\url{http://www.mysql.com/}} 
333: and is about \unit[100]{GB} in size.  Local distributed databases
334: are in MySQL installations and can be substantially smaller
335: depending on the local needs.  Data in the distributed MySQL
336: servers are automatically synchronized with the MySQL
337: warehouse through a multiple-master replication scheme.
338: 
339: %%% Local Variables: 
340: %%% mode: latex
341: %%% TeX-master: "minos-nim"
342: %%% End: 
343: