1: \documentstyle[12pt,epsfig,amssymb]{article}
2:
3: \voffset=-1in
4: \hoffset=-0.75in
5: \textwidth 6.5in
6: \textheight 9.5in
7:
8: \begin{document}
9:
10: \begin{center}
11:
12: {\Large \bf Statistical properties of the estimator using covariance matrix}
13:
14: \vspace{0.1in}
15:
16: {\bf S.I. Alekhin}
17:
18: \vspace{0.1in}
19: {\baselineskip=14pt Institute for High Energy Physics, 142284, Protvino, Russia}
20:
21: \begin{abstract}
The statistical properties of the estimator that uses a covariance matrix
to account for point-to-point correlations due to systematic
errors are analyzed. It is shown that the covariance matrix estimator
(CME) is consistent
in realistic cases (when the systematic errors on the fitted parameters are
not extremely large compared with the statistical ones)
and that its dispersion is always smaller than
the dispersion of the simplified $\chi^2$ estimator
applied to the correlated data.
The CME bias is negligible in realistic
cases if the covariance matrix is calculated iteratively during the fit
using the parameter estimator itself. An analytical formula for the covariance
matrix inversion allows one to perform fast and precise calculations
even for very large data sets. All this allows for efficient use
of the CME in global fits.
37: \end{abstract}
38: \end{center}
39: \newpage
40:
41: \section*{INTRODUCTION}
42:
The development of modern particle physics is increasingly based
on the analysis
of precise experimental data. This demands refinement of all stages
of the data inference, including the treatment of
correlations due to systematic uncertainties,
which are often comparable to or even larger than the statistical ones.
In particular, this problem is important for
precise tests of the Standard Model
and for the determination of the parton distributions \cite{:2000nr,Catani:2000jh}.
For the sake of simplicity many authors use
approaches that ignore point-to-point correlations due to systematic
errors, i.e. they sum all errors in quadrature or drop the systematics
altogether. It is evident that if the systematic errors are an important source of
the data uncertainty, such approaches can distort
the estimated errors on the fitted parameters.
At the same time the construction of estimators accounting for the
correlations is not straightforward, since
competing probabilistic models of the data can be used in the analysis.
Essentially two generic models are possible: one based on the frequentist
treatment of systematic shifts and another based on the Bayesian
approach. This paper concentrates on the analysis of the statistical
properties of the estimators within the Bayesian treatment of
systematic errors. An introduction
to this subject given in Ref.~\cite{D'Agostini:1995fv} contains
arguments in favor of this approach. The only point
that we would like to underline here is that
the Bayesian treatment is the only constructive way in
the case of many sources of systematics, when the classical treatment, which implies
the introduction of an additional parameter for every source of systematic errors,
can cause great problems with the
interpretation and representation of a function of a large number of arguments.
74:
The natural way to account for point-to-point correlations due to
systematic errors within the Bayesian approach is to use
the covariance matrix associated with the systematic errors
(see e.g. Refs.~\cite{Swartz:1994qz,Gates:1995rq}).
Meanwhile, there are concerns that the covariance matrix estimator (CME)
can result in biased values of the parameters and their dispersions (see
Refs.~\cite{Seibert:1994sf,Michael:1994yj,D'Agostini:1994uj,Swartz:1996hc}).
In this connection it is worth recalling
that estimators accounting for the data correlations
often exhibit poor statistical properties
regardless of whether they use a covariance matrix or not.
For example, as was shown in Ref.~\cite{Daniell:1984ea},
the sample dispersion estimated from
correlated Monte Carlo data sets can
acquire a bias equal to the dispersion value itself\footnote{This
effect is connected with the
well-known fact that the sample dispersion gives a biased
estimate of the dispersion of the studied distribution;
the correlations merely amplify this bias.}.
At the same time the estimators would be unbiased if the
covariance matrix were not evaluated from the
measurements themselves. Indeed, an unbiased estimator
for correlated Monte Carlo data
was constructed in Ref.~\cite{Michael:1995sz}
using a modeled covariance matrix.
100:
Proceeding in this way, one can hope to construct unbiased estimators
accounting for systematic errors through the covariance matrix,
but to be sure of their unbiasedness a study of their properties is needed.
In view of the lack of comprehensive information on this subject in the
literature, this paper is devoted to the analysis of the
statistical properties of such estimators, with particular attention paid
to the control of the bias. Throughout the paper the CME
properties are compared with the properties of the
simplest $\chi^2$ estimator (SCE), as was done
earlier in Ref.~\cite{Alekhin:1995ij}.
111:
112: \section{THE SIMPLEST $\chi^2$ ESTIMATOR}
113:
To illustrate our method of analyzing the statistical properties
we start from the analysis of uncorrelated measurements. In this case
the data sample $\{y_i\}$ is supposed to be
explicitly described by a theoretical model $t_i=f_i(\theta^0)$:
\begin{equation}
y_i=t_i+\mu_i \sigma_i,
\label{UNCSET}
\end{equation}
where $\mu_i$ are independent random variables with zero mean and unit dispersion,
$\sigma_i$ are statistical errors,
$i=1\ldots N$, and $N$ is the total number of points in the sample.
We assume that the theoretical model parameter $\theta^0$ is a scalar;
the generalization of the
formulae to the case of a vector parameter is evident.
If $y_i$ are obtained in a counting experiment with
a large number of events, $\mu_i$ are Gaussian distributed,
although this is not crucial for our consideration.
As a rule the values of $\sigma_i$
given in experimental publications are estimators of the
standard deviations of $y_i$, i.e. are random variables,
but we neglect their fluctuations.
The SCE is based on the minimization of the functional
136: \begin{equation}
137: \chi^2(\theta)=\sum_{i=1}^{N} \frac{(f_i(\theta)-y_i)^2}{\sigma_i^2}
138: \label{SIMCHI}
139: \end{equation}
or, equivalently, on the solution of the equation
141: \begin{equation}
142: \xi(\theta)\equiv\frac{1}{2}\frac{\partial\chi^2}{\partial\theta}=0.
143: \label{BASEQN}
144: \end{equation}
The solution $\hat\theta$ is the estimator of the
parameter $\theta$; it is a random variable depending on $\{y_i\}$.
To investigate the statistical properties of $\hat\theta$ we expand
the function $\xi(\theta)$
around $\theta^0$ and then apply Legendre inversion to
obtain the series for $\hat\theta$ (see Ref.~\cite{JAMES}
for the details of the method).
152: Introducing
153: \begin{displaymath}
154: X=\xi(\theta^0),~~~~~~
155: a=-\left\langle\frac{\partial\xi(\theta^0)}{\partial\theta}\right\rangle ,
156: \end{displaymath}
157: \begin{displaymath}
158: b=\left\langle\frac{\partial^2\xi(\theta^0)}{\partial\theta^2}\right\rangle
159: ,~~~~~Y=\frac{\partial\xi(\theta^0)}{\partial\theta}
160: -\left\langle\frac{\partial\xi(\theta^0)}{\partial\theta}\right\rangle,
161: \end{displaymath}
162: one can obtain
163: \begin{equation}
164: \hat\theta-\theta^0=\frac{X}{a}+\frac{X Y}{a^2}+\frac{b X^2}{2a^3}+\ldots
165: \label{GENBIAS}
166: \end{equation}
where $\langle~\rangle$ denotes averaging over the samples
and the omitted part of the expansion contains terms with higher
powers of $1/a$ and/or $X$ and $Y$.
In this approximation the dispersion of $\hat\theta$ is
171: \begin{displaymath}
172: D(\hat\theta)=\frac{\left\langle X^2\right\rangle}{a^2}
173: \end{displaymath}
174: and the bias is
175: \begin{displaymath}
176: B(\hat\theta)=\frac{\left\langle X\right\rangle}{a}+
177: \frac{\left\langle X Y\right\rangle}{a^2}
178: +\frac{b\left\langle X^2\right\rangle}{2a^3}.
179: \end{displaymath}
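The expansion (\ref{GENBIAS}) can be sketched as follows. The shorthand
$\delta=\hat\theta-\theta^0$ is introduced here for illustration only, and the
fluctuation of the second derivative of $\xi$ around its mean $b$ is assumed
to be of higher order. Expanding Eqn.~(\ref{BASEQN}) around $\theta^0$ gives
\begin{displaymath}
0=\xi(\hat\theta)\approx X+(Y-a)\delta+\frac{b}{2}\delta^2 ,
\end{displaymath}
so that
\begin{displaymath}
\delta\approx\frac{X}{a-Y}+\frac{b\,\delta^2}{2(a-Y)}
\approx\frac{X}{a}\Bigl(1+\frac{Y}{a}\Bigr)+\frac{b}{2a}\Bigl(\frac{X}{a}\Bigr)^2 ,
\end{displaymath}
where in the last step $\delta\approx X/a$ was substituted into the quadratic
term and only terms up to second order in $X$ and $Y$ were kept.
Averaging $\delta$ gives the bias, and averaging $\delta^2$ to leading order
gives the dispersion.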
180: For the SCE applied to the sample (\ref{UNCSET})
181: one can easily obtain
182: \begin{displaymath}
183: \left\langle X\right\rangle=0,
184: \end{displaymath}
185: \begin{displaymath}
186: \left\langle X^2\right\rangle=-a=\sum_{i=1}^{N}
187: \frac{\left[f_i'(\theta_0)\right]^2}{\sigma_i^2},
188: \end{displaymath}
189: \begin{equation}
190: \left\langle X Y\right\rangle=\frac{b}{3}=\sum_{i=1}^{N}
191: \frac{f_i'(\theta_0)f_i''(\theta_0)}{\sigma_i^2},
192: \label{eqn:ab0}
193: \end{equation}
where $f_i'(\theta)$ is the derivative with respect to $\theta$.
The dispersion and the bias of this estimator are
196: \begin{equation}
197: D_0^{\rm U}(\hat\theta)=-\frac{1}{a},~~~~~~
198: B_0^{\rm U}(\hat\theta)=-\frac{b}{6a^2}.
199: \label{DISPS}
200: \end{equation}
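As a check of Eqn.~(\ref{DISPS}), the moments listed above can be substituted
into the general expressions for the dispersion and the bias. The following
lines are only a worked substitution, not a new result:
\begin{displaymath}
D_0^{\rm U}(\hat\theta)=\frac{\left\langle X^2\right\rangle}{a^2}
=\frac{-a}{a^2}=-\frac{1}{a}
=\left[\sum_{i=1}^{N}\frac{\left[f_i'(\theta_0)\right]^2}{\sigma_i^2}\right]^{-1},
\end{displaymath}
\begin{displaymath}
B_0^{\rm U}(\hat\theta)=\frac{b}{3a^2}+\frac{b\,(-a)}{2a^3}
=\frac{b}{3a^2}-\frac{b}{2a^2}=-\frac{b}{6a^2}.
\end{displaymath}
For instance, for the straight-line model $f_i=\theta x_i$ with equal errors
$\sigma_i=\sigma$ (an illustrative choice) one gets
$D_0^{\rm U}=\sigma^2/\sum x_i^2$ and $b=0$, so that the bias vanishes.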
201:
If $f_i(\theta)$ are linear functions of $\theta$, the
series (\ref{GENBIAS}) terminates
and equation (\ref{BASEQN}) can be solved exactly.
One can see that in this case the estimator bias vanishes.
For a nonlinear data model the expansion (\ref{GENBIAS})
contains an infinite number of terms, but
the contributions from the higher-order terms are
proportional to powers of $D(\hat\theta)$ and/or
to the central moments of $y_i$ higher than the second.
These contributions are progressively suppressed compared with the main terms
as the data statistics increase. Here and throughout the paper we neglect the
contribution from the higher
moments of $y_i$. Recall that the same approximation
is used in deriving the central limit theorem of statistics.
This approach can
also be used to justify the analysis of a nonlinear
data model: the above formulae can be applied to
a data model with a ``weak nonlinearity'', i.e. one whose nonlinearity
is not significant on the scale of the parameter standard deviation.
221:
Now let the sample have a common
additive systematic error. In accordance with the Bayesian approach
to the treatment of systematic errors the measured values are given by
\begin{equation}
y_i=t_i+\mu_i \sigma_i+\lambda s_i,
\label{CADDSET}
\end{equation}
where $s_i$ are the systematic shifts for every point and $\lambda$
is a random variable with zero mean and unit dispersion\footnote
{We emphasize that $\lambda$ is not necessarily Gaussian distributed.}.
We consider the case of one source of systematic error; the generalization
to the case of many sources is straightforward.
For the sample (\ref{CADDSET}) we lose the statistical independence of the
measurements and,
taking their correlations into account, the relevant expressions
for the dispersion and bias are more complicated:
238: \begin{displaymath}
239: \left\langle X^2\right\rangle=\sum_{i,j=1}^{N}
240: \frac{C_{ij}}{\sigma_i^2 \sigma_j^2}f_i'(\theta_0)f_j'(\theta_0)
241: =-a+\left[\sum_{i=1}^N \frac{s_i}{\sigma_i^2}
242: f_i'(\theta_0)\right]^2,
243: \end{displaymath}
244: \begin{displaymath}
245: \left\langle X Y\right\rangle=\sum_{i,j=1}^{N}
246: \frac{C_{ij}}{\sigma_i^2 \sigma_j^2}f_i'(\theta_0)f_j''(\theta_0)
247: \end{displaymath}
248: \begin{displaymath}
249: =\frac{b}{3}+\left[\sum_{i=1}^N \frac{s_i}{\sigma_i^2}
250: f_i'(\theta_0)\right]
251: \left[\sum_{i=1}^N \frac{s_i}{\sigma_i^2}
252: f_i''(\theta_0)\right],
253: \end{displaymath}
254: where $a$ and $b$ are given by Eqn.~(\ref{eqn:ab0}),
255: $C_{ij}$ is the covariance matrix for $\{y_i\}$
256: \begin{equation}
257: C_{ij} = s_i s_j+\delta_{ij}\sigma_i\sigma_j,
258: \label{COVA}
259: \end{equation}
and $\delta_{ij}$ is the Kronecker symbol. The expressions for $a$ and $b$
are the same as in the case of uncorrelated data.
262: In terms of the $N$-component vectors
263: \begin{displaymath}
264: \rho_i=\frac{s_i}{\sigma_i},~~~~~
265: \phi_1^i=\frac{f_i'(\theta_0)}{\sigma_i},~~~~
266: \phi_2^i=\frac{f_i''(\theta_0)}{\sigma_i}
267: \end{displaymath}
268: the dispersion and the bias in this case can be expressed as
269: \begin{equation}
270: D_0^{\rm A}(\hat\theta)=\frac{1}{\phi_1^2}\left(1+\rho^2 z^2_1\right),
271: \label{DISPSA}
272: \end{equation}
273: \begin{equation}
B_0^{\rm A}(\hat\theta)=-\frac{\phi_2}
{2\phi_1^3}\left[\Bigl(1+3\rho^2 z^2_1\Bigr)z_{12} -
2\rho^2 z_1 z_2\right],
277: \label{BIASSA}
278: \end{equation}
where $\rho, \phi_1, \phi_2$ denote the moduli of the vectors,
$z_1$ is the cosine of the angle between $\vec\rho$ and $\vec\phi_1$,
$z_2$ that between $\vec\rho$ and $\vec\phi_2$, and
$z_{12}$ that between $\vec\phi_1$ and $\vec\phi_2$.
The dispersion of $\hat\theta$ is larger than for uncorrelated data
because now it also accounts for the fluctuations due to
systematic errors. As for the bias, it remains zero for the linear
model.
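
The vector form (\ref{DISPSA}) follows from the moments above by a short
substitution, sketched here using only the definitions of $\vec\rho$ and
$\vec\phi_1$ already given. Since $-a=\sum_i (\phi_1^i)^2=\phi_1^2$ and
$\sum_i s_i f_i'(\theta_0)/\sigma_i^2=(\vec\rho\cdot\vec\phi_1)=\rho\phi_1 z_1$,
\begin{displaymath}
\left\langle X^2\right\rangle=\phi_1^2+(\vec\rho\cdot\vec\phi_1)^2
=\phi_1^2\left(1+\rho^2 z_1^2\right),~~~~~~
D_0^{\rm A}(\hat\theta)=\frac{\left\langle X^2\right\rangle}{a^2}
=\frac{1}{\phi_1^2}+\frac{\rho^2 z_1^2}{\phi_1^2}.
\end{displaymath}
The first term is the purely statistical dispersion of Eqn.~(\ref{DISPS})
and the second one is the contribution of the systematic shift.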
287:
If the systematic errors are multiplicative, the measured values are given by
\begin{equation}
y_i=(t_i+\mu_i \sigma_i)(1+\lambda \eta_i),
\label{CMULSET}
\end{equation}
where $\eta_i$ quantify the systematic errors. If both the statistical and
systematic errors are small compared with $t_i$,
$$
y_i\approx t_i+\mu_i \sigma_i+\lambda \eta_i t_i,
$$
the covariance matrix is
\begin{equation}
C_{ij} = \eta_i \eta_j t_i t_j+\delta_{ij}\sigma_i\sigma_j,
\label{COVM}
\end{equation}
and the expressions for the bias and dispersion are the
same as for the case of additive systematics
after the substitution $s_i \rightarrow \eta_i t_i$.
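
The step from (\ref{CMULSET}) to (\ref{COVM}) can be spelled out as follows.
It is assumed here, as an explicit assumption of this sketch, that $\lambda$
is independent of the $\mu_i$, so that
$\langle\mu_i\mu_j\rangle=\delta_{ij}$, $\langle\lambda^2\rangle=1$ and
$\langle\mu_i\lambda\rangle=0$. Expanding the product,
\begin{displaymath}
y_i=t_i+\mu_i\sigma_i+\lambda\eta_i t_i+\lambda\mu_i\eta_i\sigma_i ,
\end{displaymath}
and dropping the last term, which is of second order in the small quantities
$\sigma_i/t_i$ and $\eta_i$, one finds
\begin{displaymath}
C_{ij}=\left\langle (y_i-t_i)(y_j-t_j)\right\rangle
\approx\sigma_i\sigma_j\langle\mu_i\mu_j\rangle
+\eta_i\eta_j t_i t_j\langle\lambda^2\rangle
=\delta_{ij}\sigma_i\sigma_j+\eta_i\eta_j t_i t_j .
\end{displaymath}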
306:
Eqn.~(\ref{DISPSA}) can be split into the parts
that correspond to the
statistical and systematic fluctuations. One can see that when the
vectors $\vec\rho$ and $\vec\phi_1$ are orthogonal
the systematic error on $\hat\theta$ is equal to zero
and the total dispersion is suppressed.
Such suppression can be illustrated by the example of
the extraction of an asymmetry from
data with a general offset error.
Let $f_i(\theta)=\theta x_i$ and let both the statistical and systematic errors be
constant throughout the sample: $s_i=s$, $\sigma_i=\sigma$.
Then $\rho_i=s/\sigma$, $\phi_1^i=x_i/\sigma$ and
$z_1\sim \sum x_i$. If the positive and negative values of $x_i$
compensate each other in
the measurements, $z_1=0$ and the systematic error vanishes.
Appropriate data filtering can also be used to suppress the dispersion
(\ref{DISPSA}). To clarify the mechanism of this suppression let us trace the
effect of a separate data point on the dispersion value.
We add to the data set a point with statistical error $\sigma_0$,
systematic error $s_0$ and data model $f_0(\theta)$. If the initial
data set is large and the systematic error is comparable with the statistical one, i.e.
328: $$
329: N \gg 1,~~~~~~~~~~~~~~\rho \gg 1,
330: $$
331: \begin{equation}
332: \phi_1 \gg \frac{f_0'(\theta_0)}{\sigma_0},~~~~~~~~~~
333: \rho\phi_1 z_1\gg \frac{s_0}{\sigma_0^2}f_0'(\theta_0),
334: \label{eqn:largeset}
335: \end{equation}
336: the change of $D_0^{\rm A}(\hat\theta)$ after adding the new point is
337: \begin{equation}
338: \Delta D_0^{\rm A}(\hat\theta)\approx\frac{2\rho}{\phi_1^3}
339: \frac{1}{\sigma_0^2}\left[z_1 s_0f_0'(\theta_0)
340: -\frac{\rho z_1^2}{\phi_1}\left[f_0'(\theta_0)\right]^2\right].
341: \label{eqn:deldisp0}
342: \end{equation}
The second term in brackets is always negative and gives
the decrease of the dispersion
due to the improved statistical precision. At the same time the first term
can be negative or positive, depending on the signs of $z_1$, $s_0$
and $f_0'(\theta_0)$.
Its absolute value can be larger than
the absolute value of the second term, so that
$D_0^{\rm A}(\hat\theta)$ can either increase or decrease after adding the new point.
This is a manifestation of the inconsistency of the SCE applied to a
correlated data set.
The balance between the terms of Eqn.~(\ref{eqn:deldisp0})
is defined by the distribution
of $f_i'(\theta_0)/s_i$, and cutting the tails of this distribution
can decrease the estimator dispersion.
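
The estimate (\ref{eqn:deldisp0}) can be reproduced by a first-order variation
of Eqn.~(\ref{DISPSA}). The shorthand $q=\phi_1^2$ and
$p=(\vec\rho\cdot\vec\phi_1)=\rho\phi_1 z_1$ is introduced only for this sketch.
With $D_0^{\rm A}=1/q+p^2/q^2$, the new point changes these sums by
$\Delta q=[f_0'(\theta_0)]^2/\sigma_0^2$ and
$\Delta p=s_0 f_0'(\theta_0)/\sigma_0^2$, so that
\begin{displaymath}
\Delta D_0^{\rm A}\approx-\frac{\Delta q}{q^2}+\frac{2p\,\Delta p}{q^2}
-\frac{2p^2\Delta q}{q^3}.
\end{displaymath}
For $\rho\gg 1$ and $z_1$ not close to zero the first term is negligible
compared with the third one, and the remaining two terms give
\begin{displaymath}
\Delta D_0^{\rm A}\approx\frac{2\rho z_1}{\phi_1^3}
\frac{s_0 f_0'(\theta_0)}{\sigma_0^2}
-\frac{2\rho^2 z_1^2}{\phi_1^4}\frac{\left[f_0'(\theta_0)\right]^2}{\sigma_0^2},
\end{displaymath}
which coincides with Eqn.~(\ref{eqn:deldisp0}).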
356:
357: \section{THE COVARIANCE MATRIX ESTIMATOR}
358:
If the systematic error is additive and the covariance matrix is known
a priori and is given by (\ref{COVA}), one can estimate the parameter
by minimizing the functional
\begin{equation}
\chi^2(\theta)=\sum_{i,j=1}^{N} (f_i(\theta)-y_i) E_{ij} (f_j(\theta)-y_j),
\label{CORCHI}
\end{equation}
where $E_{ij}$ is the inverse of the covariance matrix.
This problem can be reduced to the uncorrelated
case using a linear transformation of the vector $\{f_i(\theta)-y_i\}$,
and the estimator is linear for the linear data model.
Besides, if the statistical and systematic fluctuations obey
the Gaussian distribution,
this estimator provides the minimal dispersion allowed by the Cram\'er-Rao
inequality.
374:
375: One can easily derive the expressions necessary to calculate the
376: estimator bias and dispersion
377: \begin{displaymath}
378: \left\langle X\right\rangle=0,
379: \end{displaymath}
380: \begin{displaymath}
381: \left\langle X^2\right\rangle=-a=\sum_{i,j=1}^{N}
382: f_i'(\theta_0)E_{ij}f_j'(\theta_0),
383: \end{displaymath}
384: \begin{displaymath}
385: \left\langle X Y\right\rangle=\frac{b}{3}=\sum_{i,j=1}^{N}
386: f_i'(\theta_0)E_{ij}f_j''(\theta_0).
387: \end{displaymath}
388: Substituting in the above relations the explicit expression for $E_{ij}$
389: \begin{displaymath}
390: E_{ij}=\frac{1}{\sigma_i \sigma_j}
391: \Bigl(\delta_{ij} -
392: \frac{\rho_i\rho_j}{1+\rho^2}\Bigr)
393: \end{displaymath}
394: we obtain the estimator dispersion
395: \begin{equation}
396: D_{\rm M}^{\rm A}(\hat\theta)=\frac{1}{\phi_1^2}
397: \left[1+\frac{\rho^2 z_1^2}{1+\rho^2(1-z_1^2)}\right]
398: =\frac{1}{\phi_1^2}\xi_{\rm M},
399: \label{DISPCA}
400: \end{equation}
401: where $\xi_{\rm M}$ is the ratio of the total dispersion to the
402: pure statistical one.
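Both the expression for $E_{ij}$ and Eqn.~(\ref{DISPCA}) can be verified
directly; the following check is a sketch that uses only the definitions above.
Writing $C_{ik}=\sigma_i\sigma_k(\delta_{ik}+\rho_i\rho_k)$, one has
\begin{displaymath}
\sum_{k=1}^{N}C_{ik}E_{kj}=\frac{\sigma_i}{\sigma_j}
\left[\delta_{ij}+\rho_i\rho_j-\frac{\rho_i\rho_j}{1+\rho^2}
-\frac{\rho^2\rho_i\rho_j}{1+\rho^2}\right]=\delta_{ij},
\end{displaymath}
and
\begin{displaymath}
-a=\sum_{i,j=1}^{N}f_i'(\theta_0)E_{ij}f_j'(\theta_0)
=\phi_1^2-\frac{(\vec\rho\cdot\vec\phi_1)^2}{1+\rho^2}
=\phi_1^2\,\frac{1+\rho^2(1-z_1^2)}{1+\rho^2},
\end{displaymath}
whose inverse is the dispersion (\ref{DISPCA}).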
403: If $\vec\rho$ and $\vec\phi_1$ are collinear the dispersion of the estimator is
404: \begin{displaymath}
405: D^{A,\parallel}_{\rm M}(\hat\theta)=\frac{1+\rho^2}
406: {\phi_1^2},
407: \end{displaymath}
which coincides with the SCE dispersion (\ref{DISPSA}).
One can see that if $\vec\rho$ and $\vec\phi_1$ are neither collinear
nor orthogonal
the SCE dispersion (\ref{DISPSA})
is always larger than the CME dispersion (\ref{DISPCA}).
This can be readily explained qualitatively.
For the SCE the fitted curve tightly follows the
data points and, if these points are shifted due to fluctuations of the systematic
errors, the parameter acquires a corresponding systematic error.
At the same time, since for the CME the information on the data
correlations is explicitly included in $\chi^2$, a correlated
fluctuation of the data due to a systematic shift does not necessarily lead to
a shift of the fitted curve, and the parameter deviation is smaller than
for the SCE.
The exception occurs if $z_1=\pm 1$, when $\vec\rho$ and $\vec\phi_1$ are collinear
and the systematic shift can be perfectly
compensated by a change of the parameter.
If these vectors are orthogonal the CME dispersion is
425: \begin{displaymath}
426: D^{A,\perp}_{\rm M}(\hat\theta)=\frac{1}{\phi_1^2}
427: \end{displaymath}
i.e. it is just the same as the dispersion of the SCE
applied to the data set without correlations (\ref{DISPS}).
Qualitatively this corresponds to a measurement scheme in which the
systematic shifts for the different points compensate each other,
e.g. as in the example considered at the end of Sec.~1.
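The comparison of the two estimators can be made quantitative by subtracting
Eqn.~(\ref{DISPCA}) from Eqn.~(\ref{DISPSA}); this difference is worked out
here as an illustration:
\begin{displaymath}
D_0^{\rm A}(\hat\theta)-D_{\rm M}^{\rm A}(\hat\theta)
=\frac{\rho^2 z_1^2}{\phi_1^2}\left[1-\frac{1}{1+\rho^2(1-z_1^2)}\right]
=\frac{1}{\phi_1^2}\,\frac{\rho^4 z_1^2(1-z_1^2)}{1+\rho^2(1-z_1^2)}\ge 0.
\end{displaymath}
For $\rho\ne 0$ the difference vanishes only in the two limiting cases
$z_1^2=1$ (collinear vectors) and $z_1=0$ (orthogonal vectors).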
433:
In modern experiments the systematic errors are often of the same order
as the statistical ones, and if $N\gg 1$ then $\rho\gg 1$.
In this limit, if $\vec\rho$ and $\vec\phi_1$ are not collinear,
437: \begin{equation}
438: D_{\rm M}^{\rm A}(\hat\theta)\approx\frac{1}{\phi_1^2(1-z^2_1)}
439: \label{eqn:dispmr}
440: \end{equation}
441: and
442: \begin{equation}
443: D_0^{\rm A}(\hat\theta)\approx\frac{\rho^2 z^2_1}{\phi_1^2}.
444: \label{eqn:disp0r}
445: \end{equation}
446: One can see that in the second case
447: the estimator standard deviation rises linearly with
448: the increase of the systematics, whereas the CME dispersion saturates.
This difference can be illustrated by a numerical example
inspired by elastic proton-proton scattering. Let us choose
$$
f_i=U\exp(-V x_i),~~~~x_i=0.1 i,
$$
where $U=100$, $V=10$, $i=1\ldots 9$.
455: Generating 100 data sets (\ref{CADDSET}) with these $f_i$ and
456: \begin{equation}
457: \sigma_i=0.01\sqrt{\frac{U}{f_i}},~~~~s_i=\frac{\kappa}{x_i}
458: \label{eqn:testset}
459: \end{equation}
460: we minimized functionals (\ref{SIMCHI}) and (\ref{CORCHI}) varying
461: $U$ and $V$ to obtain their estimators $\hat U$ and $\hat V$.
The values of $(\hat U-U)^2$ and $(\hat V-V)^2$
for all of the generated data sets were averaged to obtain the
dispersions of the estimators.
The results for the standard deviation of $\hat U$ at different values of
$\kappa$ are given in
Fig.~\ref{fig:disp} (the picture for $\hat V$ is similar).
One can see that at large $\kappa$ the CME and the SCE standard deviations
differ by a factor of 3.
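The two-parameter calculation can be organized in matrix form. The expressions
below are the standard linearized error-propagation formulae for weighted
least squares and are given here as an assumption about what the vector
generalization looks like; they are not taken from the text. With the Jacobian
$J_{i\alpha}=\partial f_i/\partial\theta_\alpha$ and the weight matrix
$W_{ij}=\delta_{ij}/\sigma_i^2$ (both symbols introduced for this sketch),
the covariance matrices of the parameter estimators are
\begin{displaymath}
V_{\rm CME}=\left(J^{\rm T}EJ\right)^{-1},~~~~~~
V_{\rm SCE}=\left(J^{\rm T}WJ\right)^{-1}J^{\rm T}WCWJ\left(J^{\rm T}WJ\right)^{-1},
\end{displaymath}
which reduce to Eqns.~(\ref{DISPCA}) and (\ref{DISPSA}) for a single parameter.
For the model used in this example, with $\theta=(U,V)$,
\begin{displaymath}
\frac{\partial f_i}{\partial U}=\exp(-Vx_i),~~~~~~
\frac{\partial f_i}{\partial V}=-Ux_i\exp(-Vx_i).
\end{displaymath}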
470:
An example of dispersion suppression observed in
the analysis of real experimental
data can be found in Ref.~\cite{Alekhin:1995dz}.
In that paper we performed a
leading order QCD fit to the inclusive deep inelastic scattering data
of Refs.~\cite{Benvenuti:1989rh,Benvenuti:1990fm}
obtained by the BCDMS collaboration
in order to determine the parton distribution functions and the
value of the strong coupling constant $\alpha_{\rm s}$. The two different estimators
were used and different estimates were obtained. For the
SCE the standard deviation of $\alpha_{\rm s}(M_{\rm Z})$
is 0.015, while for the CME it is 0.007.
The difference in the gluon distribution bounds for these estimators
is shown in Fig.~\ref{fig:bcdms}. One can see that the standard deviation
of the gluon distribution for the CME is also about
a half of the SCE standard deviation.
487:
If $z_1\ne \pm 1$, the change of the CME dispersion
after adding a new point to the large sample, as defined by
Eqn.~(\ref{eqn:largeset}), is
491: \begin{displaymath}
492: \Delta D_{\rm M}^{\rm A}(\hat\theta)\approx-\frac{1}{\phi_1^4(1-z_1^2)^2}
493: \frac{1}{\sigma_0^2}
494: \left[f_0'(\theta_0)
495: -\frac{\phi_1 z_1}{\rho}s_0\right]^2.
496: \end{displaymath}
This change is never positive, which proves the consistency of the
CME. Recall that this does not necessarily hold for the
SCE (see Sec.~1). The same conclusion can be drawn
from the comparison of Eqns.~(\ref{eqn:dispmr})
and (\ref{eqn:disp0r}). Indeed, the CME dispersion falls with
increasing statistical significance of the data set (i.e. with a decrease of
$\sigma$ or a rise of $N$), while the SCE dispersion does not.
Note that, due to the consistency of the CME,
the filtering procedure described in Sec.~1
is not meaningful for it.
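
The expression for $\Delta D_{\rm M}^{\rm A}$ given above can be obtained by a
first-order variation of Eqn.~(\ref{DISPCA}). In the shorthand of this sketch,
$q=\phi_1^2$, $p=(\vec\rho\cdot\vec\phi_1)$ and $r=\rho^2$ (symbols introduced
here only), Eqn.~(\ref{DISPCA}) reads $1/D_{\rm M}^{\rm A}=q-p^2/(1+r)$.
Adding a point changes the sums by
$\Delta q=[f_0'(\theta_0)]^2/\sigma_0^2$,
$\Delta p=s_0f_0'(\theta_0)/\sigma_0^2$ and $\Delta r=s_0^2/\sigma_0^2$,
hence to first order
\begin{displaymath}
\Delta D_{\rm M}^{\rm A}\approx-\left[D_{\rm M}^{\rm A}\right]^2
\left[\Delta q-\frac{2p\,\Delta p}{1+r}+\frac{p^2\Delta r}{(1+r)^2}\right]
\approx-\left[D_{\rm M}^{\rm A}\right]^2\frac{1}{\sigma_0^2}
\left[f_0'(\theta_0)-\frac{p}{r}s_0\right]^2 ,
\end{displaymath}
where $1+r\approx r$ was used for $\rho\gg 1$. Substituting
$p/r=\phi_1 z_1/\rho$ and Eqn.~(\ref{eqn:dispmr}) reproduces the expression
above; the result is a negative multiple of a perfect square.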
507:
508: \begin{figure}[t]
509: \centerline{\psfig{figure=f1.ps,height=7cm}}
510: \caption{The standard deviations of SCE (circles) and CME (squares)
511: for $\hat U$ at different scales of systematic errors $\kappa$.
512: The lines correspond to the calculation performed with
513: the two-dimensional generalization of
514: Eqns.~(9,16).}
515: %Eqns.~(\ref{DISPCA},\ref{DISPSA}).}
516: \label{fig:disp}
517: \end{figure}
518:
519: The CME bias is
520: \begin{displaymath}
521: B_{\rm M}^{\rm A}(\hat\theta)=-\frac{\phi_1\phi_2}{2}
522: \left[D_{\rm M}^{\rm A}(\hat\theta)\right]^2\left(z_{12}
523: -\frac{\rho^2}{1+\rho^2}z_1 z_2\right),
524: \end{displaymath}
525: which vanishes for the linear data model and
526: saturates in the limit of $\rho\gg 1$ contrary to the SCE.
527: In the numerical example (\ref{eqn:testset}) at $\kappa=0.007$
528: the CME bias is 0.07, whereas the SCE bias is 0.13.
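
The bias formula follows from the general expression in the same way as
Eqn.~(\ref{DISPS}): since $\langle X\rangle=0$, $\langle X^2\rangle=-a$ and
$\langle XY\rangle=b/3$ also hold for the CME, the bias is $-b/(6a^2)$.
As a sketch of the remaining substitution,
\begin{displaymath}
\frac{b}{3}=\sum_{i,j=1}^{N}f_i'(\theta_0)E_{ij}f_j''(\theta_0)
=(\vec\phi_1\cdot\vec\phi_2)
-\frac{(\vec\rho\cdot\vec\phi_1)(\vec\rho\cdot\vec\phi_2)}{1+\rho^2}
=\phi_1\phi_2\left(z_{12}-\frac{\rho^2}{1+\rho^2}z_1z_2\right),
\end{displaymath}
\begin{displaymath}
B_{\rm M}^{\rm A}(\hat\theta)=-\frac{b}{6a^2}
=-\frac{1}{2}\left[D_{\rm M}^{\rm A}(\hat\theta)\right]^2\frac{b}{3},
\end{displaymath}
where $1/a^2=[D_{\rm M}^{\rm A}(\hat\theta)]^2$ was used.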
529:
530: \begin{figure}[t]
531: \centerline{\psfig{figure=bcdms.ps,height=7cm}}
532: \caption{Bounds of gluon distribution obtained from the
533: LO QCD fit to BCDMS data
534: with different estimators (the SCE: a; the CME: b).
535: Full lines correspond to the total experimental errors, dashed ones -- to
536: the statistical only.}
537: \label{fig:bcdms}
538: \end{figure}
539:
For multiplicative systematic errors
the covariance matrix is unknown a priori and one has to calculate
it using the parameter estimator. Proceeding in this
way in the minimization of the functional (\ref{CORCHI}) we get
544: \begin{equation}
545: a=-\sum_{i,j=1}^{N}f_i'(\theta^0)E_{ij}f_j'(\theta^0)-
546: \frac{1}{2}\sum_{i,j=1}^{N}E_{ij}''C_{ij}.
547: \label{eqn:dispm}
548: \end{equation}
The difference from the corresponding expression for the case
of additive systematic errors
is in the second term of Eqn.~(\ref{eqn:dispm}).
For the linear data model this term is
553: $$
554: a^{(2)}=\frac{1}{2}\sum_{i,j=1}^{N}E_{ij}''C_{ij}=
555: \frac{\phi_3^2}{2(1+\rho^2)^2}
556: \left[\rho^4(z^2_3-1)-3\rho^2 z^2_{3}+1\right],
557: $$
558: where
559: \begin{displaymath}
560: \phi_3^i=\rho_i'=\frac{\rho_i}{f_i}f_i'(\theta^0)=\eta_i\phi_1^i,
561: \end{displaymath}
562: $\phi_3$ is modulus of $\vec\phi_3$
563: and $z_{3}$ is the cosine of the angle between $\vec\phi_3$ and $\vec\rho$.
564: The ratio of the second term of Eqn.~(\ref{eqn:dispm}) to the first term
565: $a^{(1)}=\sum f_i'(\theta^0)E_{ij}f_j'(\theta^0)$ is
566: \begin{equation}
567: \frac{a^{(2)}}{a^{(1)}}=
568: \frac{\phi_3^2}{\phi_1^2}\cdot
569: \frac{\rho^4(z^2_3-1)-3\rho^2 z^2_{3}+1}
570: {(1+\rho^2)^2}\xi_{\rm M}.
571: \label{eqn:dispmult}
572: \end{equation}
If $\xi_{\rm M}\sim O(1)$ (which is valid for most real cases),
$a^{(2)}\sim O(\eta^2)a^{(1)}$ for all values of $\rho$, i.e. it
is suppressed compared with the first term for small $\eta$.
Neglecting, as elsewhere, the third and fourth central moments of $\{y_i\}$,
one obtains $\langle X^2\rangle\approx -a$, so that
the estimator dispersion for multiplicative
systematic errors is $D_{\rm M}^{\rm M}\approx D_{\rm M}^{\rm A}$.
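
The origin of the additional term in Eqn.~(\ref{eqn:dispm}) can be seen by
differentiating the functional (\ref{CORCHI}) with a $\theta$-dependent matrix
$E_{ij}(\theta)$. The following lines are a sketch in which
$\Delta_i=f_i(\theta)-y_i$ is a shorthand introduced here and the symmetry of
$E_{ij}$ is used:
\begin{displaymath}
\xi=\sum_{i,j}f_i'E_{ij}\Delta_j+\frac{1}{2}\sum_{i,j}\Delta_iE_{ij}'\Delta_j,
\end{displaymath}
\begin{displaymath}
\frac{\partial\xi}{\partial\theta}=\sum_{i,j}\left[f_i'E_{ij}f_j'
+f_i''E_{ij}\Delta_j+2f_i'E_{ij}'\Delta_j
+\frac{1}{2}\Delta_iE_{ij}''\Delta_j\right].
\end{displaymath}
At $\theta=\theta^0$ one has $\langle\Delta_i\rangle=0$ and
$\langle\Delta_i\Delta_j\rangle=C_{ij}$, so the terms linear in $\Delta$ drop
out on averaging and $a=-\langle\partial\xi/\partial\theta\rangle$ takes the
form (\ref{eqn:dispm}).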
580:
In the case of multiplicative systematic errors Eqn.~(\ref{BASEQN})
is nonlinear even for the linear data model.
As a consequence, the expressions responsible for the bias
584: $$
585: \left\langle X \right\rangle=
586: \frac{1}{2}\sum_{i,j=1}^{N} E_{ij}' C_{ij},
587: $$
588: $$
589: b=3 \sum_{i,j=1}^{N}
590: f_i'(\theta^0)E_{ij}f_j''(\theta^0)+
591: 3 \sum_{i,j=1}^{N}f_i'(\theta^0) E_{ij}'f_j'(\theta^0)+
592: \frac{1}{2}\sum_{i,j=1}^{N}E_{ij}'''C_{ij},
593: $$
594: \begin{equation}
595: \left\langle X Y\right\rangle=\sum_{i,j=1}^{N}
596: f_i'(\theta^0)E_{ij}f_j''(\theta^0)+
597: 2 \sum_{i,j=1}^{N}
598: f_i'(\theta^0) E_{ij}' f_j'(\theta^0)-
599: \frac{1}{4}\sum_{i,j=1}^{N}E_{ij}''C_{ij}\sum_{i,j=1}^{N} E_{ij}' C_{ij},
600: \label{eqn:biasm}
601: \end{equation}
do not vanish even if $f_i''({\theta})$ is equal to zero.
Meanwhile, the bias due to the estimator nonlinearity is small compared with
the estimator standard deviation. Since
$1/D_{\rm M}^{\rm M}\approx \langle X^2\rangle\approx -a$, the bias of the estimator with
multiplicative systematic errors is
607: \begin{equation}
608: B_{\rm M}^{\rm M}(\hat\theta)\approx\sqrt{D_{\rm M}^{\rm M}(\hat\theta)}
609: \left[\frac{\left\langle X\right\rangle}{\sqrt{-a}}+
610: \frac{\left\langle X Y\right\rangle-b/2}{(-a)^{3/2}}\right].
611: \label{eqn:biasmult}
612: \end{equation}
613: The first term in the brackets of Eqn.~(\ref{eqn:biasmult}) is
614: \begin{equation}
615: \frac{\left\langle X\right\rangle}{\sqrt{-a}}\approx
616: -\frac{\phi_3}{\phi_1}
617: \frac{\rho z_3}{1+\rho^2}\sqrt{\xi_{\rm M}}\sim O(\eta\sqrt{\xi_{\rm M}}).
618: \label{eqn:biasx}
619: \end{equation}
620: The contribution to the second term in brackets of Eqn.~(\ref{eqn:biasmult})
621: from $\sum f_i'(\theta^0) E_{ij}'f_j'(\theta^0)$ is proportional to
622: $$
623: \frac{\phi_3}{\phi_1}\frac{\rho z_1}{(1+\rho^2)}
624: \left(\frac{\rho^2}{1+\rho^2}z_1 z_3-z_{13}\right)\xi_{\rm M}^{3/2}
625: $$
626: and hence it is $\sim O(\eta\xi_{\rm M}^{3/2})$. As one can conclude from
627: Eqns.~(\ref{eqn:dispmult},\ref{eqn:biasx}) the contribution
628: to the same term from
629: $\sum E_{ij}''C_{ij}\cdot\sum E_{ij}' C_{ij}$ is $O(\eta^3\xi_{\rm M}^{3/2})$.
630: And finally since
631: $$
632: \frac{1}{2}
633: \sum_{i,j=1}^N E_{ij}'''C_{ij}=\frac{\rho z_1\phi_3^3}{(1+\rho^3)^2}
634: \left[\rho^4(z_3^2-1)+\rho^2(1-3z_3^2)+2\right]
635: $$
636: the contribution to Eqn.~(\ref{eqn:biasmult}) coming from this term
637: is $O(\eta^3\xi_{\rm M}^{3/2})$.
In summary, for the linear data model
the estimator bias is a sum of terms
$O(\eta^{p}\xi_{\rm M}^{q})\sqrt{D^{\rm M}_{\rm M}}$
with $p\ge 1$ and $q\le 3/2$.
Besides, at small $\rho$ all four contributions to the bias
that survive for the linear data model are
$\sim \rho$, while at large $\rho$
they are $\sim1/\rho$. Summarizing, one can conclude that
the estimator bias is negligible except in extreme cases with very large
$\xi_{\rm M}$.
648:
An explicit estimate of the bias can be obtained from
Eqns.~(\ref{eqn:biasm},\ref{eqn:biasmult}).
However, this requires rather lengthy calculations, and a
simpler tool for the bias evaluation is desirable.
A convenient way to do this is to trace the net residual
654: \begin{displaymath}
655: R=-\frac{1}{N}\sum_{i=1}^{N}\frac{f_i(\hat\theta)-y_i}
656: {\sqrt{\sigma_i^2+s_i^2}}.
657: \end{displaymath}
Expanding $f_i(\theta)$ near $\theta_0$
and keeping only the first term in Eqn.~(\ref{GENBIAS})
one obtains for the sample (\ref{CADDSET})
$$
R\approx \frac{1}{N}
\sum_{i=1}^{N}\frac{\mu_i+\lambda \rho_i}{\sqrt{1+\rho_i^2}}
-(\hat\theta-\theta_0)
\frac{1}{N}\sum_{i=1}^{N}\frac{\phi_1^i}{\sqrt{1+\rho_i^2}}.
$$
If the estimator is unbiased, the value of $R$ averaged over
the samples is equal to zero. Nevertheless, particular values of $R$
may differ from zero due to fluctuations.
For bounded $\xi_{\rm M}$ the dispersion of $R$ is
671: \begin{equation}
672: D(R)=\frac{1}{N^2}\sum_{i,j=1}^{N}\frac{\delta _{ij}+\rho_i\rho_j}
673: {\sqrt{1+\rho_i^2}\sqrt{1+\rho_j^2}}+O(1/N).
674: \label{eqn:Rdisp}
675: \end{equation}
If the analyzed data come from a single experiment
with dominating systematics (i.e. with $\rho > 1$) then
$D(R)\sim 1$. In particular, for the BCDMS data
of Refs.~\cite{Benvenuti:1989rh,Benvenuti:1990fm} $D(R)\approx 0.7$.
For $N_{exp}$ independent experiments involved in the analysis
$D(R)\sim 1/N_{exp}$. Comparing the net residual $R$
with this value gives an indication of the estimator bias.
A more definite conclusion
can be drawn from the comparison of $R$
with its dispersion calculated using Eqn.~(\ref{eqn:Rdisp}).
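
The two order-of-magnitude statements can be checked from
Eqn.~(\ref{eqn:Rdisp}) in the limiting case $\rho_i\gg 1$ with all $s_i$ of
the same sign; this limiting case is an assumption of the sketch. Then
$\rho_i\rho_j/\sqrt{(1+\rho_i^2)(1+\rho_j^2)}\approx 1$ for every pair of
points and
\begin{displaymath}
D(R)\approx\frac{1}{N^2}\sum_{i,j=1}^{N}1=1 .
\end{displaymath}
If the sample consists of $N_{exp}$ experiments with $n_k$ points each
($n_k$ is a symbol introduced here) and independent systematic shifts, only
pairs of points from the same experiment are correlated, so that
\begin{displaymath}
D(R)\approx\frac{1}{N^2}\sum_{k=1}^{N_{exp}}n_k^2=\frac{1}{N_{exp}}
~~~~{\rm for}~~n_k=\frac{N}{N_{exp}} .
\end{displaymath}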
686:
687: \section{PLANNING OF THE COUNTING EXPERIMENTS}
688:
In the particular case when the differential cross section in the
variable $x$ is measured, the predicted average
number of events in the $i$-th bin is
$$
\left\langle N_i\right\rangle =Lf_i\Delta x_i\beta_i,
$$
where $L$ is the integrated luminosity of the experiment,
$\beta_i$ is the registration efficiency, and $\Delta x_i$ is
the bin width. Neglecting the fluctuations of $N_i$,
the statistical error on the $i$-th measurement is
699: $$
700: \sigma_i=\frac{\sqrt{\left\langle N_i\right\rangle}}{L\Delta x_i\beta_i}
701: $$
702: and
703: $$
704: \frac{1}{\sigma_i^2}=\frac{L\beta_i}{f_i}\Delta x_i.
705: $$
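The last relation follows from substituting $\left\langle N_i\right\rangle$
into the expression for $\sigma_i$:
\begin{displaymath}
\sigma_i^2=\frac{\left\langle N_i\right\rangle}{(L\Delta x_i\beta_i)^2}
=\frac{f_i}{L\Delta x_i\beta_i}.
\end{displaymath}
As a numerical illustration with purely hypothetical values
(none of them is taken from a real experiment), let
$L=100~{\rm pb}^{-1}$, $f_i=10$~pb per unit of $x$, $\Delta x_i=0.1$, and $\beta_i=0.5$.
Then
\begin{displaymath}
\left\langle N_i\right\rangle=100\cdot 10\cdot 0.1\cdot 0.5=50,~~~~~~
\sigma_i=\frac{\sqrt{50}}{100\cdot 0.1\cdot 0.5}\approx 1.41~{\rm pb},
\end{displaymath}
so that the relative error is
$\sigma_i/f_i\approx 0.14=1/\sqrt{\left\langle N_i\right\rangle}$,
as expected for a counting measurement.
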
706: The scalar product of the vectors $\vec \rho$ and $\vec \phi$
707: is
708: $$
709: \left(\vec\rho \cdot \vec\phi\right)=L\sum_{i=1}^{N}\frac{f_i's_i}
710: {f_i}\beta_i\Delta x_i
711: $$
712: and
713: $$
714: \phi^2=L\sum_{i=1}^{N}\frac{\left[f_i'\right]^2}
715: {f_i}\beta_i\Delta x_i,~~~~~~
716: \rho^2=L\sum_{i=1}^{N}\frac{\left[s_i\right]^2}
717: {f_i}\beta_i\Delta x_i.
718: $$
For densely spaced measurements these sums
reduce to integrals over the measurement region $\Omega$:
721: $$
722: \left(\vec\rho \cdot \vec\phi\right)=L\int_{\Omega} f'(x)s(x) d\tilde{x}
723: $$
724: and
725: $$
726: \phi^2=L\int_{\Omega}\left[f'(x)\right]^2 d\tilde x,~~~~~~
727: \rho^2=L\int_{\Omega}\left[s(x)\right]^2 d\tilde x,
728: $$
where $d\tilde{x}=[\beta(x)/f(x)]\,dx$.
The latter expressions can be used in
the equations for the estimator dispersions\footnote{As a result
one obtains the Fisher information for the
correlated data case.}.
This approach is convenient for the optimization of future experiments
since it allows one to analyze the integrated expressions
in order to search for the optimal region of measurements.
For simple functions $f(x)$, $\beta(x)$, and $s(x)$ such an
analysis can certainly be performed analytically.
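
A minimal worked example of such an analytical treatment may be sketched as follows.
All specific choices here are assumptions made for illustration only:
the fitted parameter $\theta$ is taken to be an overall normalization,
$f(x)=\theta g(x)$, with the prime understood as the derivative with respect to
$\theta$, so that $f'(x)=g(x)$; the systematic error is taken to be a pure normalization
uncertainty of relative size $\eta$, $s(x)=\eta f(x)$; and the efficiency
is constant, $\beta(x)=\beta_0$. Denoting $G=\int_{\Omega}g(x)dx$ one obtains
\begin{displaymath}
\phi^2=\frac{L\beta_0 G}{\theta},~~~~~~
\rho^2=L\beta_0\eta^2\theta G,~~~~~~
\left(\vec\rho \cdot \vec\phi\right)=L\beta_0\eta G,
\end{displaymath}
and hence
\begin{displaymath}
\left(\vec\rho \cdot \vec\phi\right)^2=\rho^2\phi^2
\end{displaymath}
for any region $\Omega$. In this sketch the vectors $\vec\rho$ and $\vec\phi$ are
collinear, so no choice of the measurement region can separate
such a systematic shift from the fitted parameter, while $\phi^2$ itself
grows with the total expected number of events
$N_{\rm tot}=L\beta_0\theta G$ as $\phi^2=N_{\rm tot}/\theta^2$.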
739:
740: \section{CONCLUSION}
741:
In conclusion, the CME is a convenient tool
for the analysis of data sets with account of correlations due to
systematic errors. The CME is consistent
for the realistic cases (when systematic errors on the fitted parameters are
not extremely large compared with the statistical ones)
and its dispersion is always smaller than
the dispersion of the $\chi^2$ estimator without account of correlations.
The estimator bias is negligible for the realistic
cases if the covariance matrix is calculated during the fit iteratively
using the parameter estimator itself. An analytical formula for the covariance
matrix inversion allows fast and precise calculations
even for very large data sets. The latter
is especially important in view of the numerical instabilities occurring
in fits to precise data in the case of large correlations between
the fitted parameters (see in this connection
Ref.~\cite{Alekhin:1994}).
758:
Particular attention should be paid to
the connection between the estimator dispersion
and the confidence interval. For a known distribution
of the estimator the confidence interval
can be easily calculated
(e.g. it is well known that for the Gaussian distribution
one standard deviation corresponds to the 68\% confidence level).
Unfortunately, due to the possible non-Gaussian nature of the systematic errors
one cannot prove that an estimator accounting for systematics
is Gaussian distributed.
However, for a large number of systematic errors of comparable scale
the estimator should obey the Gaussian distribution by virtue of the central
limit theorem of statistics. Otherwise robust estimates of the
confidence intervals, e.g. those based on Chebyshev's inequality, should be used.
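
To indicate how conservative such a robust estimate is, recall that for any
estimator $\hat\theta$ with mean $\theta_0$ and finite dispersion $D(\hat\theta)$
(the notation $D(\hat\theta)$ and the factor $k$ are introduced here for illustration)
Chebyshev's inequality reads
\begin{displaymath}
P\left(|\hat\theta-\theta_0|\ge k\sqrt{D(\hat\theta)}\right)\le\frac{1}{k^2}.
\end{displaymath}
It gives no constraint for $k=1$, and guarantees a confidence level of at least
75\% for $k=2$ and of about 89\% for $k=3$. Conversely, a guaranteed 68\% confidence level
requires $k\approx 1.8$, and a guaranteed 95\% level requires $k\approx 4.5$,
to be compared with $k\approx 1$ and $k\approx 2$, respectively, for the Gaussian distribution.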
773:
774: {\bf Acknowledgments}
775:
I am indebted to S.~Keller for valuable discussions and comments.
777: The work was supported by RFBR grant 00-02-17432.
778:
779: \begin{thebibliography}{99}
780:
781: \bibitem{:2000nr}
782: [ALEPH, DELPHI, L3, OPAL Collaborations,
783: SLD Heavy Flavour Group, and Electroweak Group], CERN-EP-2000-016.
784:
785: \bibitem{Catani:2000jh}
786: S.~Catani {\it et al.},
787: hep-ph/0005025.
788:
789: \bibitem{D'Agostini:1995fv}
790: G.~D'Agostini, hep-ph/9512295.
791:
792: \bibitem{Swartz:1994qz}
793: M.~L.~Swartz, hep-ph/9411353.
794:
795: \bibitem{Gates:1995rq}
796: E.~Gates, L.~M.~Krauss and M.~White,
797: Phys.\ Rev.\ {\bf D51}, 2631 (1995)
798: [hep-ph/9406396].
799:
800: \bibitem{Seibert:1994sf}
801: D.~Seibert,
802: Phys.\ Rev.\ {\bf D49}, 6240 (1994)
803: [hep-lat/9305014].
804:
805: \bibitem{Michael:1994yj}
806: C.~Michael,
807: Phys.\ Rev.\ {\bf D49}, 2616 (1994)
808: [hep-lat/9310026].
809:
810: \bibitem{D'Agostini:1994uj}
811: G.~D'Agostini,
812: Nucl.\ Instrum.\ Meth.\ {\bf A346}, 306 (1994).
813:
814: \bibitem{Swartz:1996hc}
815: M.~L.~Swartz,
816: Phys.\ Rev.\ {\bf D53}, 5268 (1996)
817: [hep-ph/9509248].
818:
819: \bibitem{Daniell:1984ea}
820: G.~J.~Daniell, A.~J.~Hey and J.~E.~Mandula,
821: Phys.\ Rev.\ {\bf D30}, 2230 (1984).
822:
823: \bibitem{Michael:1995sz}
824: C.~Michael and A.~McKerrell,
825: Phys.\ Rev.\ {\bf D51}, 3745 (1995)
826: [hep-lat/9412087].
827:
828: \bibitem{Alekhin:1995ij}
829: S.~I.~Alekhin,
830: IFVE-95-48.
831:
832: \bibitem{JAMES}
W.~T.~Eadie, D.~Drijard, F.~E.~James, M.~Roos and B.~Sadoulet,
{\it Statistical Methods in Experimental Physics}, North Holland, 1971.
835:
836: \bibitem{Alekhin:1995dz}
837: S.~I.~Alekhin,
838: IFVE-95-65.
839:
840: \bibitem{Benvenuti:1989rh}
841: A.~C.~Benvenuti {\it et al.} [BCDMS Collaboration],
Phys.\ Lett.\ {\bf B223}, 485 (1989).
843:
844: \bibitem{Benvenuti:1990fm}
845: A.~C.~Benvenuti {\it et al.} [BCDMS Collaboration],
Phys.\ Lett.\ {\bf B237}, 592 (1990).
847:
848: \bibitem{Alekhin:1994}
849: S.~I.~Alekhin,
850: IFVE-94-70.
851:
852: \end{thebibliography}
853:
854: \end{document}
855: