\documentclass[12pt]{article}

%%\usepackage{balance}

\usepackage{graphicx}
\usepackage{amsmath}

\textwidth16cm
\textheight23cm
\topmargin-2.0cm
\oddsidemargin0cm
\parindent0pt
%\pagestyle{empty}

\begin{document}

%%\begin{flushright}
%%{\hfill \Large CMS Internal Note DRAFT}
%%\end{flushright}

\begin{center}
{\Huge On the Combining Significances}
\end{center}

\begin{center}
{Sergey Bityukov, Nikolai Krasnikov, Alexander Nikitenko\\
(e-mail: Serguei.Bitioukov@cern.ch) }
\end{center}

%%\date{November 21, 2006}

\begin{center}
Abstract
\end{center}

We present a statistical approach to combining signal significances.


\section{What Do We Mean by Significance?}

A measure of the excess of observed (or expected) events over the expected
background in an experiment is often called the signal significance.
According to ref.~\cite{Frodesen}, ``Common practice is to express
the significance of an enhancement by quoting the number of
standard deviations''.

\bigskip

Let us distinguish two classes of significance:

\begin{itemize}
\item ``the initial (or internal) significance'' $S$ of an experiment
is a function of two parameters of the experiment: the expected number of
signal events $N_s$ and the expected number of background events $N_b$
(``the initial significance'' can be considered as the discovery potential
of a planned experiment~\cite{BitDurh});

\item ``the observed significance'' $\hat S$ is a function of the observed
number of events $\hat N_{obs}$ and of the expected
background $N_b$~\cite{Bit2000}.
\end{itemize}

The first is a parameter of the experiment; we assume that it is constant
for a given integrated luminosity.
The second is a realization of a random variable, and the observed
significance is considered as an estimator of the initial significance.

\bigskip

Why can we consider the observed significance as a realization of
a random variable?

\bigskip

The observed number of events $\hat N_{obs}$ is a realization of a random
variable which obeys the Poisson distribution. Hence the observed
significance $\hat S$, being a function of $\hat N_{obs}$, is also
a realization of a random variable.

\bigskip

This is easy to demonstrate. Let us take, as examples, the
``counting''~\cite{GunBob} significance $\hat S_{c12}$~\cite{BitDurh}
and the significance $\hat S_{cP}$~\cite{NarBit}.

\noindent
The observed significance $\hat S_{c12}$ is given by the formula
\begin{equation}
\displaystyle \hat S_{c12} = 2 \cdot (\sqrt{\hat N_{obs}} - \sqrt{N_b}).
\end{equation}
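
For illustration only, Eq.~(1) translates directly into code. The Python
sketch below is ours; the function name \texttt{s\_c12} is hypothetical and
is not taken from CERNLIB or any other library.

\begin{verbatim}
import math


def s_c12(n_obs, n_b):
    """Observed 'counting' significance of Eq. (1).

    n_obs -- observed number of events (a Poisson realization)
    n_b   -- expected number of background events
    """
    return 2.0 * (math.sqrt(n_obs) - math.sqrt(n_b))


# 100 observed events over an expected background of 64 events
print(s_c12(100, 64))  # 4.0
\end{verbatim}

Because $\hat N_{obs}$ fluctuates from one experiment to another, so does the
returned value: with the same background, 81 observed events give
$2\cdot(9-8)=2$ instead of 4.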

The significance $\hat S_{cP}$ is the probability, under a Poisson distribution
with mean $N_b$, of observing $\hat N_{obs}$ or more events,
converted to the equivalent number of standard deviations of a Gaussian
distribution, i.e.

\begin{equation}
\beta = \displaystyle
\frac{1}{\sqrt{2\pi}}\int_{\hat S_{cP}}^{\infty}{e^{-\frac{x^2}{2}}dx},~
\text{where}~~\beta = \displaystyle
\sum_{i=\hat N_{obs}}^{\infty}{\frac{N_b^ie^{-N_b}}{i!}}.
\label{eq:1}
\end{equation}
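
A minimal sketch of Eq.~(2) is given below, assuming only the Python standard
library (version 3.8 or later for \texttt{statistics.NormalDist}). The function
names are hypothetical. The Poisson tail $\beta$ is summed term by term with the
probabilities computed in log space, and the conversion to sigmas inverts the
Gaussian upper-tail integral. This is an illustration, not the code used for the
results of this note.

\begin{verbatim}
import math
from statistics import NormalDist


def poisson_upper_tail(n_obs, n_b):
    """beta of Eq. (2): P(N >= n_obs) for N ~ Poisson(n_b), with n_b > 0."""
    if n_obs <= 0:
        return 1.0

    def pmf(i):
        # Poisson probability evaluated in log space to avoid overflow
        return math.exp(-n_b + i * math.log(n_b) - math.lgamma(i + 1))

    if n_obs <= n_b:
        # beta is not small here, so the complement of the lower tail is accurate
        return max(0.0, 1.0 - sum(pmf(i) for i in range(n_obs)))
    # beta may be tiny: sum the (decreasing) upper-tail terms directly
    i_max = int(n_obs + 40.0 * math.sqrt(n_b) + 100)
    return sum(pmf(i) for i in range(n_obs, i_max))


def s_cp(n_obs, n_b):
    """Equivalent number of Gaussian sigmas for beta, as in Eq. (2)."""
    beta = poisson_upper_tail(n_obs, n_b)
    if beta >= 1.0:
        return -math.inf
    if beta <= 0.0:
        return math.inf
    # beta = P(Z >= S) for a standard normal Z, hence S = -Phi^{-1}(beta)
    return -NormalDist().inv_cdf(beta)
\end{verbatim}

For example, \texttt{s\_cp(0, n\_b)} returns $-\infty$ because $\beta=1$ when
$\hat N_{obs}=0$, and at fixed $N_b$ the result grows with $\hat N_{obs}$.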


We use a method which relates the magnitude of
``the observed significance'' to the confidence density~\cite{Efr, BitSan}
of the parameter ``the initial significance''.
This method has been applied in a number of studies~\cite{Feldman, Bit2004B}.
We performed a uniform scan of the initial significances $S_{c12}$ and
$S_{cP}$, varying $S_{c12}$ from 1 to 16 in steps of 0.075
and $S_{cP}$ from 0 to 6.2 in steps of 0.031.
For each value of $S_{c12}$ and $S_{cP}$ we generated 30000 trials,
drawing from two Poisson distributions (with parameters $N_s$ and $N_b$)
with the RNPSSN function (CERNLIB~\cite{CERNLIB}), and constructed
the conditional distribution of the probability (the confidence density)
that the initial significance $S_{c12}$ or $S_{cP}$ produces the observed
value $\hat S_{c12}$ or $\hat S_{cP}$, respectively. We assume that the
integrated luminosity of the experiment is fixed, i.e. $N_s+N_b$ is constant.
The parameters $N_s$ and $N_b$
are chosen in accordance with the given initial significance $S_{c12}$ or
$S_{cP}$; the realization $\hat N_{obs}$ is the sum of the realizations
$\hat N_s$ and $\hat N_b$ of two Poisson random variables with parameters
$N_s$ and $N_b$, respectively.
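
How $N_s$ and $N_b$ are derived from a given initial significance is not spelled
out above. A natural choice, which we adopt here purely as an assumption, is to
evaluate Eq.~(1) at the expected count,
$S_{c12}=2(\sqrt{N_s+N_b}-\sqrt{N_b})$; with $N_s+N_b$ fixed this gives
$N_b=(\sqrt{N_s+N_b}-S_{c12}/2)^2$. For $N_s+N_b=70$ the scanned range
$S_{c12}\le 16$ indeed stays below the limit $2\sqrt{70}\approx 16.7$.
The sketch below uses NumPy's random generator in place of RNPSSN; all
function and parameter names are hypothetical.

\begin{verbatim}
import numpy as np


def split_for_s_c12(s_init, total):
    """Return (N_s, N_b) with N_s + N_b = total for a given initial S_c12.

    Assumption (not stated in the note): S_c12 = 2*(sqrt(N_s+N_b) - sqrt(N_b)),
    which gives N_b = (sqrt(total) - S_c12/2)**2.
    """
    if not 0.0 <= s_init < 2.0 * np.sqrt(total):
        raise ValueError("S_c12 must lie in [0, 2*sqrt(total))")
    n_b = (np.sqrt(total) - s_init / 2.0) ** 2
    return total - n_b, n_b


def scan_s_c12(total=70.0, s_min=1.0, s_max=16.0, step=0.075,
               n_trials=30000, seed=1):
    """Uniform scan of S_c12; returns the grid and the observed values."""
    rng = np.random.default_rng(seed)   # stands in for CERNLIB's RNPSSN
    n_points = int(round((s_max - s_min) / step)) + 1
    grid = np.linspace(s_min, s_max, n_points)
    observed = np.empty((n_points, n_trials))
    for k, s_init in enumerate(grid):
        n_s, n_b = split_for_s_c12(s_init, total)
        # N_obs is the sum of independent signal and background realizations
        n_obs = rng.poisson(n_s, n_trials) + rng.poisson(n_b, n_trials)
        observed[k] = 2.0 * (np.sqrt(n_obs) - np.sqrt(n_b))   # Eq. (1)
    return grid, observed
\end{verbatim}

With its default arguments \texttt{scan\_s\_c12()} uses the grid described in
the text (201 points from 1 to 16). A scan of $S_{cP}$ would follow the same
pattern with Eq.~(2) in place of Eq.~(1), but since the relation between
$S_{cP}$, $N_s$ and $N_b$ is not specified here, we do not sketch it.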

\begin{figure}[htpb]
\begin{center}
\includegraphics[width=0.9\textwidth]{sc12f1.eps}
\caption{Distributions of the observed significance $\hat S_{c12}$ for
several values of the initial significance $S_{c12}$, for the case
$N_s + N_b = 70$.}
\label{fig:1}
\end{center}
\end{figure}

\bigskip

Figure~\ref{fig:1} shows the distributions of $\hat S_{c12}$ for several
values of the initial significance $S_{c12}$ at a fixed integrated luminosity,
$N_s+N_b=70$. As can be seen, the distributions of the observed significance
are similar to those of a normally distributed random variable with
variance close to 1.
The distribution of the observed significance $\hat S_{c12}$ versus
the initial significance $S_{c12}$ (Fig.~\ref{fig:2}) shows the result of the
full scan.

\begin{figure}[htpb]
\begin{center}
\includegraphics[width=0.9\textwidth]{sc12f2.eps}
\caption{The distribution of the observed significance $\hat S_{c12}$ versus
the initial significance $S_{c12}$.}
\label{fig:2}
\end{center}
\end{figure}

Normal distributions with fixed variance are statistically self-dual
distributions~\cite{BitSan}. This means that the confidence density
of the parameter ``initial significance'' $S$ has the same distribution
as the random variable which produced the realization ``observed
significance'' $\hat S$.
Several distributions of the probability that the initial significance
$S_{c12}$ produces the observed values of $\hat S_{c12}$ are
presented in Fig.~\ref{fig:3}. These distributions clearly show that the
observed significance $\hat S_{c12}$ is an estimator of the initial
significance $S_{c12}$.
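
Reading the scan ``backwards'' gives the confidence density: among all trials
whose observed value falls in a narrow window around a chosen $\hat S_{c12}$,
one histograms the initial significances that produced them. The sketch below
does this with NumPy, assuming $N_s+N_b=70$ and
$N_b=(\sqrt{N_s+N_b}-S_{c12}/2)^2$ (an assumption, not a prescription of this
note). The window half-width, trial count and names are illustrative choices.

\begin{verbatim}
import numpy as np


def confidence_density_sample(s_hat, half_width=0.1, total=70.0,
                              s_min=1.0, s_max=16.0, step=0.075,
                              n_trials=30000, seed=2):
    """Initial values S_c12 whose simulated observations fall near s_hat.

    A histogram of the returned array approximates the confidence density
    of S_c12 given the observed value s_hat.
    Assumption: N_s + N_b = total and N_b = (sqrt(total) - S_c12/2)**2.
    """
    rng = np.random.default_rng(seed)
    n_points = int(round((s_max - s_min) / step)) + 1
    selected = []
    for s_init in np.linspace(s_min, s_max, n_points):
        n_b = (np.sqrt(total) - s_init / 2.0) ** 2
        n_obs = rng.poisson(total - n_b, n_trials) + rng.poisson(n_b, n_trials)
        s_obs = 2.0 * (np.sqrt(n_obs) - np.sqrt(n_b))          # Eq. (1)
        # one entry of s_init for every trial that lands in the window
        hits = np.count_nonzero(np.abs(s_obs - s_hat) < half_width)
        selected.extend([s_init] * hits)
    return np.asarray(selected)


sample = confidence_density_sample(8.0)
print(sample.size, sample.mean(), sample.std())
\end{verbatim}

Under the stated assumption $\hat N_{obs}$ is Poisson distributed with mean
$N_s+N_b=70$ whatever the value of $S_{c12}$, so
$\hat S_{c12}-S_{c12}=2(\sqrt{\hat N_{obs}}-\sqrt{70})$ does not depend on
$S_{c12}$. The scan therefore only shifts one and the same error distribution,
which is why the sampled confidence density mirrors the distribution of
$\hat S_{c12}$ in this setting.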

\begin{figure}[htpb]
\begin{center}
\includegraphics[width=0.9\textwidth]{sc12f3.eps}
\caption{Distributions of the initial significance $S_{c12}$
(confidence densities) for the case $N_s + N_b = 70$.}
\label{fig:3}
\end{center}
\end{figure}

Figure~\ref{fig:4} shows the result of the full scan for the observed
significance $\hat S_{cP}$ versus the initial significance $S_{cP}$.

\begin{figure}[htpb]
\begin{center}
\includegraphics[width=0.9\textwidth]{scplego.eps}
\caption{The distribution of the observed significance $\hat S_{cP}$ versus
the initial significance $S_{cP}$.}
\label{fig:4}
\end{center}
\end{figure}


The errors of these estimators obey, to good accuracy, the standard
normal distribution (variance equal to 1). This can be confirmed by
applying Eqs.~(1)--(2) to the case of pure background.
The results of simulating the absence of signal (3000000 trials) are
shown in Fig.~\ref{fig:5} (for the estimator $\hat S_{c12}$) and in
Fig.~\ref{fig:6} (for the estimator $\hat S_{cP}$).
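
The pure-background check for $\hat S_{c12}$ can be reproduced in a few lines.
The sketch below draws $\hat N_{obs}$ from a Poisson distribution with mean
$N_b$ only and reports the sample mean and standard deviation of Eq.~(1). The
background levels and trial count are hypothetical; they are not the four
experiments behind Fig.~\ref{fig:5}.

\begin{verbatim}
import numpy as np


def background_only_s_c12(n_b, n_trials=300000, seed=3):
    """Sample mean and standard deviation of Eq. (1) with no signal present."""
    rng = np.random.default_rng(seed)
    n_obs = rng.poisson(n_b, n_trials)            # background-only counts
    s_obs = 2.0 * (np.sqrt(n_obs) - np.sqrt(n_b))
    return s_obs.mean(), s_obs.std()


for n_b in (10.0, 50.0, 200.0):                   # hypothetical background levels
    mean, std = background_only_s_c12(n_b)
    print(f"N_b = {n_b:6.1f}:  mean = {mean:+.3f},  std = {std:.3f}")
\end{verbatim}

The mean is expected to lie slightly below zero, by roughly $1/(4\sqrt{N_b})$
to leading order, because $\sqrt{\hat N_{obs}}$ is a concave function of
$\hat N_{obs}$; the standard deviation approaches 1 as $N_b$ grows.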

\begin{figure}[htpb]
\begin{center}
\includegraphics[width=0.9\textwidth]{sc12bl1.eps}
\caption{Distributions of the observed significance $\hat S_{c12}$
for four different experiments without signal.}
\label{fig:5}
\end{center}
\end{figure}

\begin{figure}[htpb]
\begin{center}
\includegraphics[width=0.9\textwidth]{scpb1.eps}
\caption{Distributions of the observed significance $\hat S_{cP}$
for four different experiments without signal.}
\label{fig:6}
\end{center}
\end{figure}


\underline{
{\bf Statement 1:} The observed significance (in the case of
Poisson flows of events)} \\
\underline{is a realization of a random variable
which can be approximated by} \\
\underline{a normal distribution with variance close to 1}.

\section{What Is the Combined Significance?}

Statement 1 allows us to treat combinations of
several partial significances $S_i$ simply as combinations of
independent normally distributed random variables.

Let us define the observed sum $\hat S_{sum}$ of partial significances and
the observed combined significance $\hat S_{comb}$ for $n$ observed
partial significances $\hat S_i$ with variances $\mathrm{var}(\hat S_i)$:

\begin{equation}
\displaystyle
\hat S_{sum} = \sum_{i=1}^n \hat S_i,~~~~
\mathrm{var}(\hat S_{sum}) = \sum_{i=1}^n \mathrm{var}(\hat S_i),
\end{equation}

\begin{equation}
\displaystyle
\hat S_{comb} = \frac{\hat S_{sum}}{\sqrt{\mathrm{var}(\hat S_{sum})}}.
\end{equation}


\underline{{\bf Statement 2:} The ratio of the sum of several
observed partial significances}\\
\underline{to the standard deviation of this sum is
the observed combined significance}\\
\underline{of these partial significances.}~
\footnote{Note that observed combined significances are not additive.
When combining significances that are themselves combined, we must take
into account the number of partial significances in each of them in order
to apply Eq.~(3).}
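
Eqs.~(3)--(4) and the caveat of the footnote can be illustrated with a few
lines of Python. This is a sketch under the assumption, implicit in
Statement~1, that the partial significances are independent; the name
\texttt{combine} and all numbers are hypothetical.

\begin{verbatim}
import math


def combine(significances, variances=None):
    """Observed combined significance, Eqs. (3)-(4).

    If variances is omitted, every partial significance is given variance 1,
    so the result reduces to the approximation S_sum / sqrt(n) of Eq. (5).
    """
    if variances is None:
        variances = [1.0] * len(significances)
    s_sum = sum(significances)            # Eq. (3): observed sum
    var_sum = sum(variances)              # Eq. (3): variance of the sum
    return s_sum / math.sqrt(var_sum)     # Eq. (4)


print(combine([2.0, 1.5, 3.0, 1.5]))      # 8 / sqrt(4) = 4.0

# Combined significances are not additive. To merge two combined results,
# first recover each sum by multiplying by sqrt(n_k) (unit variances assumed).
s_first, n_first = combine([2.0]), 1
s_second, n_second = combine([1.5, 3.0, 1.5]), 3
merged = ((s_first * math.sqrt(n_first) + s_second * math.sqrt(n_second))
          / math.sqrt(n_first + n_second))
naive = combine([s_first, s_second])  # ignores n_k: does not reproduce 4.0
print(merged, naive)
\end{verbatim}

Here \texttt{merged} agrees, up to rounding, with combining all four partial
significances at once, whereas \texttt{naive} weights the two groups equally
regardless of how many partial significances each contains.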

\bigskip

In our case of Poisson flows of events, the variances of the considered
significances are close to 1. This means that Eq.~(4) is
approximated by

\begin{equation}
\displaystyle
\hat S_{comb} \approx \frac{\hat S_{sum}}{\sqrt{n}}.
\end{equation}

This can also be shown by Monte Carlo. Let us simultaneously generate
the observed significances $\hat S_{c12}$
for four experiments with different parameters
$N_b$ and $N_s$. The results of this simulation
(30000 trials) for
each experiment are presented in Fig.~\ref{fig:7}. The distribution of the
sums of the four observed significances in each trial
is shown in Fig.~\ref{fig:8} (top). Correspondingly, Fig.~\ref{fig:8} (bottom)
presents the distribution of these sums divided by $\sqrt{4}$ in each trial,
i.e. the distribution of the observed combined significances.
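
A minimal version of this Monte Carlo study is sketched below with NumPy. The
four $(N_s, N_b)$ pairs are hypothetical, since the values behind
Figs.~\ref{fig:7} and~\ref{fig:8} are not listed in the text; the sketch only
checks that the per-trial sum has a standard deviation close to $\sqrt{4}=2$
and the normalized sum one close to 1.

\begin{verbatim}
import numpy as np

# Hypothetical (N_s, N_b) pairs for four independent experiments.
EXPERIMENTS = [(10.0, 25.0), (20.0, 50.0), (15.0, 40.0), (30.0, 100.0)]


def simulate_combination(experiments=EXPERIMENTS, n_trials=30000, seed=4):
    """Observed S_c12 per experiment, their per-trial sum and Eq. (5)."""
    rng = np.random.default_rng(seed)
    partial = np.empty((len(experiments), n_trials))
    for k, (n_s, n_b) in enumerate(experiments):
        # sum of independent Poisson(N_s) and Poisson(N_b) is Poisson(N_s + N_b)
        n_obs = rng.poisson(n_s + n_b, n_trials)
        partial[k] = 2.0 * (np.sqrt(n_obs) - np.sqrt(n_b))     # Eq. (1)
    s_sum = partial.sum(axis=0)                   # one sum per trial
    s_comb = s_sum / np.sqrt(len(experiments))    # Eq. (5)
    return partial, s_sum, s_comb


partial, s_sum, s_comb = simulate_combination()
print("std of each partial significance:", partial.std(axis=1).round(3))
print("std of the sums:", round(float(s_sum.std()), 3))
print("std of the combined significances:", round(float(s_comb.std()), 3))
\end{verbatim}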

\begin{figure}[htpb]
\begin{center}
\includegraphics[width=0.9\textwidth]{sc12s1.eps}
\caption{Distributions of the observed significance $\hat S_{c12}$
for four different experiments.}
\label{fig:7}
\end{center}
\end{figure}

\begin{figure}[htpb]
\begin{center}
\includegraphics[width=0.9\textwidth]{sc12s2.eps}
\caption{Distribution of the sum of the observed significances of the
four experiments in each trial (top). Distribution of the normalized
sums of the observed significances (bottom).}
\label{fig:8}
\end{center}
\end{figure}

This property also holds for the significance $\hat S_{cP}$.

%%The results of the simulation for the significance $\hat S_{cP}$
%%are presented in Fig.9 and Fig.10.

%%\begin{figure}[htpb]
%% \begin{center}
%%\includegraphics[width=0.9\textwidth]{scps1.eps}
%%\caption{The distributions of the observed significances $\hat S_{cP}$
%%for four different experiments.}
%% \label{fig:9}
%% \end{center}
%%\end{figure}

%%\begin{figure}[htpb]
%% \begin{center}
%%\includegraphics[width=0.9\textwidth]{scps2.eps}
%%\caption{The distribution of the sum of observed significances $\hat S_{cP}$
%%in different experiments for each trial (top). The distribution
%%of the normalized sums of the same observed significances (bottom).}
%% \label{fig:10}
%% \end{center}
%%\end{figure}


\section{Conclusion}

The initial significance is a parameter of the given measurement,
while the observed significance is a realization of a random variable
and an estimator of the initial significance. Consequently, combinations
of significances must be treated as combinations of random variables,
with the corresponding estimators.

\section*{Acknowledgments}

We are grateful to Vladimir Gavrilov, Vassili Kachanov and
Vladimir Obraztsov for their interest in and support of this work.

We also thank Alexander Lanyov, Sergey Shmatov, Vera Smirnova and
Valeri Zhukov for very useful discussions.

This work has been partly supported by grants RFBR 05-07-90072
and RFBR 04-02-16381.

\begin{thebibliography}{99}

\bibitem{Frodesen} A.G.~Frodesen, O.~Skjeggestad, H.~T{\o}ft,
{\it Probability and Statistics in Particle Physics},
Universitetsforlaget, Bergen--Oslo--Troms{\o}, 1979, p.~408.

\bibitem{BitDurh} S.I.~Bityukov, N.V.~Krasnikov, {\it Proc.
of the Conference on Advanced Statistical Techniques in
Particle Physics}, Eds. M.R.~Whalley and L.~Lyons,
IPPP/02/039, DCPT/02/78, Durham, UK, 2002, p.~77;
also {\it e-Print:} hep-ph/0204426.

\bibitem{Bit2000} S.I.~Bityukov, N.V.~Krasnikov, {\it Nucl.\ Instr.\ \& Meth.}
{\bf A452} (2000) 518.

\bibitem{GunBob} V.~Bartsch and G.~Quast, {\it CMS Note 2005/004}, Aug.~2003;
R.~Cousins, J.~Mumford, V.~Valuev, {\it CMS Note 2005/003}, Feb.~2005.

\bibitem{NarBit} I.~Narsky, {\it Nucl.\ Instr.\ \& Meth.} {\bf A450} (2000) 444;
G.~Quast, {\it CMS Physics Analysis Days}, May 9, 2005;
S.I.~Bityukov et al., {\it in Conference Proceedings of PHYSTAT2005:
Statistical Problems in Particle Physics, Astrophysics, and Cosmology},
Eds. L.~Lyons, M.~Karagoz Unel,
Imperial College Press, 2006, p.~106.

\bibitem{Efr} B.~Efron, {\it Stat.\ Sci.} {\bf 13} (1998) 95.

\bibitem{BitSan} S.I.~Bityukov, N.V.~Krasnikov, {\it Proc.
of the 25th Int. Workshop on Bayesian Inference and Maximum Entropy
Methods in Science and Engineering}, Eds. K.H.~Knuth, A.E.~Abbas,
R.D.~Morris, J.P.~Castle, Melville, NY, 2005, AIP Conference
Proceedings {\bf 803}, 2005, p.~398.

\bibitem{Feldman} G.J.~Feldman and R.D.~Cousins, {\it Phys.\ Rev.}
{\bf D57} (1998) 3873.

\bibitem{Bit2004B} S.I.~Bityukov et al., {\it Nucl.\ Instr.\ \& Meth.}
{\bf A534} (2004) 228; also {\it e-Print:} physics/0403069, 2004.

\bibitem{CERNLIB} CERNLIB, CERN Program Library
(CERN, Geneva, Switzerland, Edition June 1996).

\end{thebibliography}

\medskip\noindent{Key Words: }
{Uncertainty, Measurement, Estimation}

\thispagestyle{empty}
\end{document}
