% hep-ex0005042/c.tex
1: \documentstyle[12pt,epsfig,amssymb]{article}
2: 
3: \voffset=-1in
4: \hoffset=-0.75in
5: \textwidth 6.5in
6: \textheight 9.5in
7: 
8: \begin{document}
9: 
10: \begin{center}
11: 
12: {\Large \bf Statistical properties of the estimator using covariance matrix}
13: 
14: \vspace{0.1in}
15: 
16: {\bf S.I. Alekhin}
17: 
18: \vspace{0.1in}
19: {\baselineskip=14pt Institute for High Energy Physics, 142284, Protvino, Russia}
20: 
21: \begin{abstract}
The statistical properties of the estimator that uses the covariance matrix
to account for point-to-point correlations due to systematic 
errors are analyzed. It is shown that the covariance matrix estimator
(CME) is consistent 
in realistic cases (when the systematic errors on the fitted parameters are 
not extremely large compared with the statistical ones)
and that its dispersion is always smaller than 
the dispersion of the simplified $\chi^2$ estimator 
applied to the correlated data.
The CME bias is negligible in realistic 
cases if the covariance matrix is calculated iteratively during the fit 
using the parameter estimator itself. An analytical formula for the covariance 
matrix inversion allows one to perform fast and precise calculations 
even for very large data sets. All this allows for efficient use 
of the CME in global fits. 
37: \end{abstract}
38: \end{center}
39: \newpage
40: 
41: \section*{INTRODUCTION}
42: 
Progress in modern particle physics relies more and more 
on the analysis
of precise experimental data. This demands refinement of all stages
of data inference, including the account of 
correlations due to systematic uncertainties, 
which are often comparable to or even larger than the statistical ones.
This problem is particularly important for 
precision tests of the Standard Model 
and for the determination of parton distributions \cite{:2000nr,Catani:2000jh}.
For the sake of simplicity, many authors use
approaches that ignore point-to-point correlations due to systematic
errors, i.e. they add all errors in quadrature or drop the systematics
altogether. If the systematic errors are an important source of
data uncertainty, such approaches can distort 
the estimated errors on the fitted parameters.
At the same time, the construction of estimators accounting for the 
correlations is not straightforward, since
competing probabilistic models of the data can be used in the analysis. 
Essentially two generic models are possible: one based on the frequentist  
treatment of systematic shifts and another based on the Bayesian 
approach. This paper concentrates on the analysis of the statistical 
properties of estimators within the Bayesian treatment of 
systematic errors. An introduction
to this subject, given in Ref.~\cite{D'Agostini:1995fv}, contains 
arguments in favor of this approach. The only point 
that we would like to underline here is that 
the Bayesian treatment is the only constructive approach in
the case of many sources of systematics, when the classical treatment, which
introduces an additional parameter for every source of systematic errors,
can cause serious problems with the 
interpretation and representation of a function of a large number of arguments.
74: 
The natural way to account for point-to-point correlations due to 
systematic errors within the Bayesian approach is to use the
covariance matrix associated with the systematic errors 
(see e.g. Refs.~\cite{Swartz:1994qz,Gates:1995rq}).
Meanwhile, there are concerns that the covariance matrix estimator (CME)  
can result in biased values of the parameters and of their dispersions (see 
Refs.~\cite{Seibert:1994sf,Michael:1994yj,D'Agostini:1994uj,Swartz:1996hc}). 
In this connection it is worth recalling 
that estimators accounting for data correlations 
often exhibit poor statistical properties
regardless of whether they use the covariance matrix or not.
For example, as was shown in Ref.~\cite{Daniell:1984ea}, 
the sample dispersion estimated from 
correlated Monte Carlo data sets can 
acquire a bias equal to the dispersion value itself\footnote{This 
effect is connected with the 
well-known fact that the sample dispersion gives a biased 
estimate of the dispersion of the underlying distribution;
the correlations merely amplify this bias.}. 
At the same time, the estimators would be unbiased if the 
covariance matrix were not evaluated from the 
measurements themselves. Indeed, an unbiased estimator 
for correlated Monte Carlo data
was constructed in Ref.~\cite{Michael:1995sz}
using a modeled covariance matrix.
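The mechanism mentioned in the footnote can be seen in a simple toy model, which is our own illustration and not the construction of Ref.~\cite{Daniell:1984ea}: $N$ Gaussian values with unit dispersion and a common pairwise correlation $r$. For this model the expectation of the sample dispersion $\frac{1}{N}\sum_i(y_i-\bar y)^2$ is $(1-r)(N-1)/N$, so its bias approaches $-1$, i.e. minus the dispersion itself, as $r\to 1$. The Python sketch below (function names are hypothetical) checks this formula by Monte Carlo.

```python
import random

def sample_dispersion(values):
    """Sample dispersion (1/N) * sum_i (y_i - mean)^2, the biased estimator."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def mc_mean_dispersion(n_points, corr, n_sets, seed=7):
    """Average sample dispersion over n_sets equicorrelated Gaussian data sets.

    Each value is sqrt(corr) * c + sqrt(1 - corr) * e_i with c, e_i ~ N(0, 1),
    so every y_i has unit dispersion and every pair has correlation corr.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sets):
        common = rng.gauss(0.0, 1.0)
        ys = [corr ** 0.5 * common + (1.0 - corr) ** 0.5 * rng.gauss(0.0, 1.0)
              for _ in range(n_points)]
        total += sample_dispersion(ys)
    return total / n_sets

def expected_dispersion(n_points, corr):
    """Analytic expectation (1 - corr) * (N - 1) / N when the true dispersion is 1."""
    return (1.0 - corr) * (n_points - 1) / n_points

if __name__ == "__main__":
    for r in (0.0, 0.5, 0.9):
        print(r, mc_mean_dispersion(10, r, 20000), expected_dispersion(10, r))
```

For $N=10$ and $r=0.9$ the expected sample dispersion is only $0.09$, although the true dispersion is $1$.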
100: 
Proceeding in this way, one can hope to construct unbiased estimators
that account for systematic errors through the covariance matrix, 
but a study of their properties is needed to confirm that they are unbiased.
In view of the lack of comprehensive information on this subject in the 
literature, this paper is devoted to the analysis of the 
statistical properties of such estimators, with particular attention paid
to the control of the bias. Throughout the paper the CME  
properties are compared with those of the 
simplest $\chi^2$ estimator (SCE), as was done 
earlier in Ref.~\cite{Alekhin:1995ij}.
111: 
112: \section{THE SIMPLEST $\chi^2$ ESTIMATOR}
113: 
To illustrate our method of analyzing the statistical properties,
we start with uncorrelated measurements. Suppose that
the data sample $\{y_i\}$ is described by a theoretical model
$t_i=f_i(\theta^0)$ as
118: \begin{equation}
119: y_i=t_i+\mu_i \sigma_i,
120: \label{UNCSET}
121: \end{equation}
where $\mu_i$ are independent random variables with zero average and unit dispersion, $\sigma_i$ are 
the statistical errors,
$i=1\ldots N$, and $N$ is the total number of points in the sample. 
We assume that the model parameter $\theta^0$ is a scalar;
the generalization of the 
formulas to a vector parameter is straightforward.
If $y_i$ are obtained in a counting experiment with 
a large number of events, $\mu_i$ are Gaussian distributed,
although this is not crucial for our consideration.
As a rule, the values of $\sigma_i$ 
given in experimental publications are estimators of the
standard deviations of $y_i$, i.e. they are random variables,
but we neglect their fluctuations.
The SCE is based on the minimization of the functional
136: \begin{equation}
137: \chi^2(\theta)=\sum_{i=1}^{N} \frac{(f_i(\theta)-y_i)^2}{\sigma_i^2}
138: \label{SIMCHI}
139: \end{equation}
or, equivalently, on the solution of the equation
141: \begin{equation}
142: \xi(\theta)\equiv\frac{1}{2}\frac{\partial\chi^2}{\partial\theta}=0.
143: \label{BASEQN}
144: \end{equation}
The solution $\hat\theta$ is the estimator of the
parameter $\theta$; it is a random variable depending on $\{y_i\}$.
To investigate the statistical properties of $\hat\theta$, we expand 
the function $\xi(\theta)$
around $\theta^0$ and then invert this series to 
obtain the series for $\hat\theta$ (see Ref.~\cite{JAMES} 
for details of the method).
152: Introducing
153: \begin{displaymath}
154: X=\xi(\theta^0),~~~~~~
155: a=-\left\langle\frac{\partial\xi(\theta^0)}{\partial\theta}\right\rangle ,
156: \end{displaymath}
157: \begin{displaymath}
158: b=\left\langle\frac{\partial^2\xi(\theta^0)}{\partial\theta^2}\right\rangle
159: ,~~~~~Y=\frac{\partial\xi(\theta^0)}{\partial\theta}
160: -\left\langle\frac{\partial\xi(\theta^0)}{\partial\theta}\right\rangle,
161: \end{displaymath}
162: one can obtain
163: \begin{equation}
164: \hat\theta-\theta^0=\frac{X}{a}+\frac{X Y}{a^2}+\frac{b X^2}{2a^3}+\ldots
165: \label{GENBIAS}
166: \end{equation}
where $\langle~\rangle$ denotes averaging over the samples,
and the omitted part of the expansion contains terms with higher
powers of $1/a$ and/or $X$ and $Y$.
170: In this approximation the dispersion of $\hat\theta$ is 
171: \begin{displaymath}
172: D(\hat\theta)=\frac{\left\langle X^2\right\rangle}{a^2}
173: \end{displaymath}
174: and the bias is 
175: \begin{displaymath}
176: B(\hat\theta)=\frac{\left\langle X\right\rangle}{a}+
177: \frac{\left\langle X Y\right\rangle}{a^2}
178: +\frac{b\left\langle X^2\right\rangle}{2a^3}.
179: \end{displaymath}
180: For the SCE applied to the sample (\ref{UNCSET})
181: one can easily obtain
182: \begin{displaymath}
183: \left\langle X\right\rangle=0,
184: \end{displaymath}
185: \begin{displaymath}
186: \left\langle X^2\right\rangle=-a=\sum_{i=1}^{N} 
187: \frac{\left[f_i'(\theta_0)\right]^2}{\sigma_i^2},
188: \end{displaymath}
189: \begin{equation}
190: \left\langle X Y\right\rangle=\frac{b}{3}=\sum_{i=1}^{N} 
191: \frac{f_i'(\theta_0)f_i''(\theta_0)}{\sigma_i^2},
192: \label{eqn:ab0}
193: \end{equation}
where the prime denotes the derivative with respect to $\theta$.
195: The dispersion and the bias of this estimator are
196: \begin{equation}
197: D_0^{\rm U}(\hat\theta)=-\frac{1}{a},~~~~~~
198: B_0^{\rm U}(\hat\theta)=-\frac{b}{6a^2}.
199: \label{DISPS}
200: \end{equation}
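These leading-order formulas can be checked by Monte Carlo. The Python sketch below uses a hypothetical toy model $f_i(\theta)=x_i\theta^2$ with invented inputs (not taken from the text). For this model $\chi^2$ is quadratic in $u=\theta^2$, so the SCE has the closed form $\hat\theta=\sqrt{\hat u}$. The sketch compares the simulated bias and mean squared deviation of $\hat\theta$ with $B_0^{\rm U}=-b/(6a^2)$ and $D_0^{\rm U}=-1/a$ of Eqn.~(\ref{DISPS}); for this model $B_0^{\rm U}=-1/(8\theta_0^3\sum_i x_i^2/\sigma_i^2)$. Gaussian $\mu_i$ are assumed and the function names are ours. Agreement is expected only up to the neglected higher-order terms.

```python
import math
import random

def leading_order(theta0, x, sigma):
    """D_0^U = -1/a and B_0^U = -b/(6 a^2) for the toy model f_i = x_i * theta**2."""
    f1 = [2.0 * xi * theta0 for xi in x]   # f_i'(theta0)
    f2 = [2.0 * xi for xi in x]            # f_i''(theta0)
    a = -sum(d1 * d1 / s**2 for d1, s in zip(f1, sigma))
    b = 3.0 * sum(d1 * d2 / s**2 for d1, d2, s in zip(f1, f2, sigma))
    return -1.0 / a, -b / (6.0 * a * a)

def sce_fit(y, x, sigma):
    """Closed-form SCE: chi^2 is quadratic in u = theta^2; a negative u is clipped to 0."""
    num = sum(xi * yi / s**2 for xi, yi, s in zip(x, y, sigma))
    den = sum(xi * xi / s**2 for xi, s in zip(x, sigma))
    return math.sqrt(max(num / den, 0.0))

def monte_carlo(theta0, x, sigma, n_sets, seed=1):
    """Bias and mean squared deviation of the SCE for y_i = t_i + mu_i sigma_i."""
    rng = random.Random(seed)
    total, total_sq = 0.0, 0.0
    for _ in range(n_sets):
        y = [xi * theta0**2 + rng.gauss(0.0, s) for xi, s in zip(x, sigma)]
        dev = sce_fit(y, x, sigma) - theta0
        total += dev
        total_sq += dev * dev
    return total / n_sets, total_sq / n_sets

X = [1.0, 2.0, 3.0, 4.0, 5.0]
SIGMA = [1.5] * 5

if __name__ == "__main__":
    d0, b0 = leading_order(1.0, X, SIGMA)
    bias_mc, msd_mc = monte_carlo(1.0, X, SIGMA, 100000)
    print("bias:", b0, bias_mc)
    print("dispersion:", d0, msd_mc)
```

With $\sum_i x_i^2/\sigma_i^2\approx 24$ the predicted bias is about $-0.005$, roughly 5\% of the standard deviation. Resolving it therefore requires a large number of samples.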
201: 
If $f_i(\theta)$ are linear functions of $\theta$, the
series (\ref{GENBIAS}) terminates after the first term 
and equation (\ref{BASEQN}) can be solved exactly. 
Since $\langle X\rangle=0$, the estimator bias then vanishes.
For a nonlinear data model the expansion (\ref{GENBIAS})  
contains an infinite number of terms, but
the contributions of the higher-order terms are
proportional to higher powers of $D(\hat\theta)$ and/or 
to the central moments of $y_i$ higher than the second.
These contributions are progressively suppressed compared with the leading terms 
as the data statistics increases. Here and throughout the paper we neglect the 
contributions of the higher 
moments of $y_i$. Recall that the same approximation  
is used in deriving the central limit theorem.
This approach can 
also be used to justify the analysis of nonlinear 
data models: the above formulas can be applied to 
a data model with a ``weak nonlinearity'', i.e. if its nonlinearity 
is not significant on the scale of the parameter standard deviation.
221: 
Now let the sample have a common
additive systematic error. In accordance with the Bayesian approach
to the treatment of systematic errors, the measured values are given by  
225: \begin{equation}
226: y_i=t_i+\mu_i \sigma_i+\lambda s_i,
227: \label{CADDSET}
228: \end{equation}
where $s_i$ are the systematic shifts of the individual points and $\lambda$
is a random variable, independent of the $\mu_i$, with zero average and unit dispersion\footnote
{Note that $\lambda$ need not be Gaussian distributed.}.
We consider the case of one source of systematic error; the generalization
to many sources is straightforward.
For the sample (\ref{CADDSET}) the measurements are no longer statistically
independent, and,
once their correlations are taken into account, the relevant expressions
for the dispersion and bias become more complicated:
238: \begin{displaymath}
239: \left\langle X^2\right\rangle=\sum_{i,j=1}^{N} 
240: \frac{C_{ij}}{\sigma_i^2 \sigma_j^2}f_i'(\theta_0)f_j'(\theta_0)
241: =-a+\left[\sum_{i=1}^N \frac{s_i}{\sigma_i^2}
242: f_i'(\theta_0)\right]^2,
243: \end{displaymath}
244: \begin{displaymath}
245: \left\langle X Y\right\rangle=\sum_{i,j=1}^{N} 
246: \frac{C_{ij}}{\sigma_i^2 \sigma_j^2}f_i'(\theta_0)f_j''(\theta_0)
247: \end{displaymath}
248: \begin{displaymath}
249: =\frac{b}{3}+\left[\sum_{i=1}^N \frac{s_i}{\sigma_i^2}
250: f_i'(\theta_0)\right]
251: \left[\sum_{i=1}^N \frac{s_i}{\sigma_i^2}
252: f_i''(\theta_0)\right],
253: \end{displaymath}
254: where $a$ and $b$ are given by Eqn.~(\ref{eqn:ab0}),
255: $C_{ij}$ is the covariance matrix for $\{y_i\}$
256: \begin{equation}
257: C_{ij} =  s_i s_j+\delta_{ij}\sigma_i\sigma_j,
258: \label{COVA}
259: \end{equation}
and $\delta_{ij}$ is the Kronecker symbol. The correlations do not affect $a$ and $b$,
which remain the same as for uncorrelated data.
262: In terms of the $N$-component vectors
263: \begin{displaymath}
264: \rho_i=\frac{s_i}{\sigma_i},~~~~~
265: \phi_1^i=\frac{f_i'(\theta_0)}{\sigma_i},~~~~
266: \phi_2^i=\frac{f_i''(\theta_0)}{\sigma_i}
267: \end{displaymath}
268: the dispersion and the bias in this case can be expressed as
269: \begin{equation}
270: D_0^{\rm A}(\hat\theta)=\frac{1}{\phi_1^2}\left(1+\rho^2 z^2_1\right),
271: \label{DISPSA}
272: \end{equation}
\begin{equation}
B_0^{\rm A}(\hat\theta)=-\frac{\phi_2}
{2\phi_1^3}\left[\Bigl(1+3\rho^2 z^2_1\Bigr)z_{12} - 
2\rho^2 z_1 z_2\right],
\label{BIASSA}
\end{equation}
where $\rho$, $\phi_1$, $\phi_2$ denote the moduli of the vectors,
$z_1$ is the cosine of the angle between $\vec\rho$ and $\vec\phi_1$,
$z_2$ that between $\vec\rho$ and $\vec\phi_2$, and
$z_{12}$ that between $\vec\phi_1$ and $\vec\phi_2$.
The dispersion of $\hat\theta$ is larger than for uncorrelated data
because it now also includes the fluctuations due to 
systematic errors. As for the bias, it remains zero for a linear 
model.
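The correlated-data moments above can be checked numerically in the same hypothetical toy model $f_i(\theta)=x_i\theta^2$, again with invented inputs. The Python sketch below evaluates $\langle X^2\rangle$ and $\langle XY\rangle$ as explicit double sums over $C_{ij}$ of Eqn.~(\ref{COVA}). It then inserts them into the dispersion and bias that follow from Eqn.~(\ref{GENBIAS}), and compares the results with the vector form (\ref{DISPSA}) and with a direct estimate. In this model the SCE value $\hat u=\hat\theta^2$ is a linear combination of the $y_i$, so that, to the same order, $D\approx{\rm Var}(\hat u)/(4\theta_0^2)$ and $B\approx-{\rm Var}(\hat u)/(8\theta_0^3)$. Function names are ours.

```python
import math

def moments(theta0, x, sigma, s):
    """D and B of the SCE from a, b, <X^2>, <XY> with C_ij = s_i s_j + delta_ij sigma_i^2."""
    n = len(x)
    f1 = [2.0 * xi * theta0 for xi in x]   # f_i' for the toy model f_i = x_i * theta^2
    f2 = [2.0 * xi for xi in x]            # f_i''
    cov = [[s[i] * s[j] + (sigma[i] ** 2 if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    a = -sum(f1[i] ** 2 / sigma[i] ** 2 for i in range(n))
    b = 3.0 * sum(f1[i] * f2[i] / sigma[i] ** 2 for i in range(n))
    w = [[cov[i][j] / (sigma[i] ** 2 * sigma[j] ** 2) for j in range(n)]
         for i in range(n)]
    x2 = sum(w[i][j] * f1[i] * f1[j] for i in range(n) for j in range(n))   # <X^2>
    xy = sum(w[i][j] * f1[i] * f2[j] for i in range(n) for j in range(n))   # <XY>
    disp = x2 / a**2
    bias = xy / a**2 + b * x2 / (2.0 * a**3)   # <X> = 0 for additive systematics
    return disp, bias, cov

def vector_dispersion(theta0, x, sigma, s):
    """D_0^A = (1 + rho^2 z_1^2) / phi_1^2 with rho_i = s_i/sigma_i, phi_1^i = f_i'/sigma_i."""
    rho = [si / sg for si, sg in zip(s, sigma)]
    phi1 = [2.0 * xi * theta0 / sg for xi, sg in zip(x, sigma)]
    rho_mod = math.sqrt(sum(r * r for r in rho))
    phi_mod = math.sqrt(sum(p * p for p in phi1))
    z1 = sum(r * p for r, p in zip(rho, phi1)) / (rho_mod * phi_mod)
    return (1.0 + rho_mod**2 * z1**2) / phi_mod**2

def delta_method(theta0, x, sigma, cov):
    """Var(u_hat) of the linear SCE u_hat = sum_i w_i y_i, propagated to sqrt(u_hat)."""
    n = len(x)
    sx = sum(xi * xi / sg**2 for xi, sg in zip(x, sigma))
    wts = [xi / (sg**2 * sx) for xi, sg in zip(x, sigma)]
    var_u = sum(wts[i] * wts[j] * cov[i][j] for i in range(n) for j in range(n))
    return var_u / (4.0 * theta0**2), -var_u / (8.0 * theta0**3)

X = [1.0, 2.0, 3.0, 4.0, 5.0]
SIGMA = [1.0, 1.2, 1.5, 1.5, 2.0]
S = [0.5, 0.4, 0.8, 0.3, 0.6]

if __name__ == "__main__":
    disp, bias, cov = moments(1.0, X, SIGMA, S)
    d_dm, b_dm = delta_method(1.0, X, SIGMA, cov)
    print("D:", disp, vector_dispersion(1.0, X, SIGMA, S), d_dm)
    print("B:", bias, b_dm)
```

All three dispersion values agree to rounding accuracy, and so do the two bias estimates.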
287: 
If the systematic errors are multiplicative, the measured values are
289: \begin{equation}
290: y_i=(t_i+\mu_i \sigma_i)(1+\lambda \eta_i),
291: \label{CMULSET}
292: \end{equation}
where $\eta_i$ quantify the systematic errors. If both the statistical and 
systematic errors are small compared with $t_i$, then
$$
y_i\approx t_i+\mu_i \sigma_i+\lambda \eta_i t_i,
$$
the covariance matrix is
299: \begin{equation}
300: C_{ij} =  \eta_i \eta_j t_i t_j+\delta_{ij}\sigma_i\sigma_j,
301: \label{COVM}
302: \end{equation}
303: and the expressions for the bias and dispersion are the
304: same as for the additive systematics case
305: after the substitution $s_i \rightarrow \eta_i t_i$.
306: 
Eqn.~(\ref{DISPSA}) can be split into parts
corresponding to the  
statistical and systematic fluctuations. One can see that when the
vectors $\vec\rho$ and $\vec\phi_1$ are orthogonal,
the systematic error on $\hat\theta$ is equal to zero
and the total dispersion is suppressed. 
Such suppression can be illustrated by 
the extraction of an asymmetry from 
data with an overall offset error.
Let $f_i(\theta)=\theta x_i$, and let the statistical and systematic errors be 
constant throughout the sample: $s_i=s$, $\sigma_i=\sigma$.
Then $\rho_i=s/\sigma$, $\phi_1^i=x_i/\sigma$, and 
$z_1\propto \sum_i x_i$. If the positive and negative values of $x_i$
compensate each other in 
the measurements, $z_1=0$ and the systematic error vanishes.
Appropriate data filtration can also be used to suppress the dispersion 
(\ref{DISPSA}). To clarify the mechanism of this suppression, let us trace the 
effect of a single data point on the dispersion. 
Add to the data set a point with statistical error $\sigma_0$, 
systematic error $s_0$, and data model $f_0(\theta)$. If the initial 
data set is large and the systematic errors are comparable with the statistical ones, i.e.
328: $$
329: N \gg 1,~~~~~~~~~~~~~~\rho \gg 1,
330: $$
331: \begin{equation}
332: \phi_1 \gg \frac{f_0'(\theta_0)}{\sigma_0},~~~~~~~~~~  
333: \rho\phi_1 z_1\gg \frac{s_0}{\sigma_0^2}f_0'(\theta_0), 
334: \label{eqn:largeset}
335: \end{equation}
336: the change of $D_0^{\rm A}(\hat\theta)$ after adding the new point is
337: \begin{equation}
338: \Delta D_0^{\rm A}(\hat\theta)\approx\frac{2\rho}{\phi_1^3}
339: \frac{1}{\sigma_0^2}\left[z_1 s_0f_0'(\theta_0)
340: -\frac{\rho z_1^2}{\phi_1}\left[f_0'(\theta_0)\right]^2\right].
341: \label{eqn:deldisp0}
342: \end{equation}
The second term in brackets is always negative and describes 
the decrease of the dispersion 
due to the improved statistical precision. At the same time, the first term 
can be negative or positive, depending on the sign of $z_1 s_0 f_0'(\theta_0)$.
If it is positive and its absolute value exceeds 
that of the second term,
$D_0^{\rm A}(\hat\theta)$ increases after the new point is added.
This is a manifestation of the inconsistency of the SCE applied to 
correlated data.
The balance between the terms of Eqn.~(\ref{eqn:deldisp0})
is determined by the distribution 
of $f_i'(\theta_0)/s_i$, and cutting the tails of this distribution 
can decrease the estimator dispersion.
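Both effects are easy to reproduce numerically. The Python sketch below evaluates $D_0^{\rm A}=\bigl(\phi_1^2+(\vec\rho\cdot\vec\phi_1)^2\bigr)/\phi_1^4$, which is Eqn.~(\ref{DISPSA}) written with a scalar product, for $f_i=\theta x_i$ with unit errors. The data points are invented for illustration and the function names are ours. With symmetric $x_i$ one gets $z_1=0$ and $D_0^{\rm A}=1/\phi_1^2$. Adding a point with a small $f_0'$ and a full systematic shift to a sample with $z_1=1$ increases the dispersion instead of decreasing it.

```python
def sce_dispersion(fprime, sigma, s):
    """D_0^A = (phi_1^2 + (rho . phi_1)^2) / phi_1^4 for one additive systematic source."""
    phi_sq = sum((d / sg) ** 2 for d, sg in zip(fprime, sigma))
    rho_phi = sum(d * si / sg**2 for d, si, sg in zip(fprime, s, sigma))
    return (phi_sq + rho_phi**2) / phi_sq**2

# Asymmetry example: f_i = theta * x_i, so f_i' = x_i; sigma_i = s_i = 1.
X_SYM = [-2.0, -1.0, 1.0, 2.0]
D_SYM = sce_dispersion(X_SYM, [1.0] * 4, [1.0] * 4)   # z_1 = 0, so 1/phi_1^2 = 0.1

# Base sample with all x_i = 1 (z_1 = 1), then one hypothetical extra point.
BASE_X, BASE_SIG, BASE_S = [1.0] * 4, [1.0] * 4, [1.0] * 4
D_BASE = sce_dispersion(BASE_X, BASE_SIG, BASE_S)

def with_point(x0, sigma0, s0):
    """SCE dispersion after appending a point with f_0' = x0, sigma_0 and s_0."""
    return sce_dispersion(BASE_X + [x0], BASE_SIG + [sigma0], BASE_S + [s0])

D_LOW = with_point(0.1, 1.0, 1.0)    # small f_0', full systematic shift
D_HIGH = with_point(2.0, 1.0, 0.0)   # large f_0', no systematic shift

if __name__ == "__main__":
    print(D_SYM, D_BASE, D_LOW, D_HIGH)
```

With these numbers $D_0^{\rm A}$ grows from $1.25$ to about $1.29$ after the low-sensitivity point is added. It drops to $0.375$ after a high-sensitivity point without systematic shift is added.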
356: 
357: \section{THE COVARIANCE MATRIX ESTIMATOR}
358: 
If the systematic error is additive and the covariance matrix is known
a priori and given by (\ref{COVA}), the parameter can be 
estimated by minimizing the functional
362: \begin{equation}
363: \chi^2(\theta)=\sum_{i,j=1}^{N} (f_i(\theta)-y_i) E_{ij} (f_j(\theta)-y_j),
364: \label{CORCHI}
365: \end{equation}
where $E_{ij}$ is the inverse of the covariance matrix $C_{ij}$.
This problem can be reduced to the uncorrelated
case by a linear transformation of the vector $\{f_i(\theta)-y_i\}$,
and the estimator is linear for a linear data model.
Moreover, if the statistical and systematic fluctuations obey 
the Gaussian distribution,
this estimator attains the minimal dispersion allowed by the Cram\'er-Rao
inequality. 
374: 
375: One can easily derive the expressions necessary to calculate the
376: estimator bias and dispersion
377: \begin{displaymath}
378: \left\langle X\right\rangle=0,
379: \end{displaymath}
380: \begin{displaymath}
381: \left\langle X^2\right\rangle=-a=\sum_{i,j=1}^{N}
382: f_i'(\theta_0)E_{ij}f_j'(\theta_0),
383: \end{displaymath}
384: \begin{displaymath}
385: \left\langle X Y\right\rangle=\frac{b}{3}=\sum_{i,j=1}^{N}
386: f_i'(\theta_0)E_{ij}f_j''(\theta_0).
387: \end{displaymath}
Substituting into these relations the explicit expression for $E_{ij}$
389: \begin{displaymath}
390: E_{ij}=\frac{1}{\sigma_i \sigma_j}
391: \Bigl(\delta_{ij} -
392: \frac{\rho_i\rho_j}{1+\rho^2}\Bigr)
393: \end{displaymath}
394: we obtain the estimator dispersion
395: \begin{equation}
396: D_{\rm M}^{\rm A}(\hat\theta)=\frac{1}{\phi_1^2}
397: \left[1+\frac{\rho^2 z_1^2}{1+\rho^2(1-z_1^2)}\right]
398: =\frac{1}{\phi_1^2}\xi_{\rm M},
399: \label{DISPCA}
400: \end{equation}
401: where $\xi_{\rm M}$ is the ratio of the total dispersion to the 
402: pure statistical one.
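The explicit form of $E_{ij}$ is what makes the CME cheap for large samples: $\sum_{i,j}f_i'E_{ij}f_j'=\phi_1^2-(\vec\rho\cdot\vec\phi_1)^2/(1+\rho^2)$ requires only $O(N)$ operations. The Python sketch below uses arbitrary illustrative inputs and function names of our own. It checks numerically that $E_{ij}$ inverts $C_{ij}$ of Eqn.~(\ref{COVA}), and that the $O(N)$ evaluation of $D_{\rm M}^{\rm A}=-1/a$ agrees both with the full double sum and with Eqn.~(\ref{DISPCA}).

```python
def inverse_cov(sigma, s):
    """E_ij = (delta_ij - rho_i rho_j / (1 + rho^2)) / (sigma_i sigma_j), rho_i = s_i/sigma_i."""
    rho = [si / sg for si, sg in zip(s, sigma)]
    rho_sq = sum(r * r for r in rho)
    n = len(sigma)
    return [[((1.0 if i == j else 0.0) - rho[i] * rho[j] / (1.0 + rho_sq))
             / (sigma[i] * sigma[j]) for j in range(n)] for i in range(n)]

def cme_dispersion_fast(fprime, sigma, s):
    """D_M^A = 1 / (phi_1^2 - (rho . phi_1)^2 / (1 + rho^2)), evaluated in O(N)."""
    phi_sq = sum((d / sg) ** 2 for d, sg in zip(fprime, sigma))
    rho_phi = sum(d * si / sg**2 for d, si, sg in zip(fprime, s, sigma))
    rho_sq = sum((si / sg) ** 2 for si, sg in zip(s, sigma))
    return 1.0 / (phi_sq - rho_phi**2 / (1.0 + rho_sq))

def cme_dispersion_formula(fprime, sigma, s):
    """(1/phi_1^2) [1 + rho^2 z_1^2 / (1 + rho^2 (1 - z_1^2))], as in Eqn. (DISPCA)."""
    phi_sq = sum((d / sg) ** 2 for d, sg in zip(fprime, sigma))
    rho_sq = sum((si / sg) ** 2 for si, sg in zip(s, sigma))
    rho_phi = sum(d * si / sg**2 for d, si, sg in zip(fprime, s, sigma))
    z1_sq = rho_phi**2 / (rho_sq * phi_sq)
    return (1.0 + rho_sq * z1_sq / (1.0 + rho_sq * (1.0 - z1_sq))) / phi_sq

def full_matrix_checks(fprime, sigma, s):
    """Max |C E - I| and D_M^A = 1 / sum_ij f_i' E_ij f_j' from explicit N x N matrices."""
    n = len(sigma)
    cov = [[s[i] * s[j] + (sigma[i] ** 2 if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    e = inverse_cov(sigma, s)
    err = max(abs(sum(cov[i][k] * e[k][j] for k in range(n)) - (1.0 if i == j else 0.0))
              for i in range(n) for j in range(n))
    d_full = 1.0 / sum(fprime[i] * e[i][j] * fprime[j]
                       for i in range(n) for j in range(n))
    return err, d_full

FPRIME = [0.5, -1.0, 2.0, 0.3, 1.2]
SIGMA = [0.2, 0.3, 0.25, 0.4, 0.1]
S = [0.1, 0.05, 0.3, 0.2, 0.15]

if __name__ == "__main__":
    print(full_matrix_checks(FPRIME, SIGMA, S))
    print(cme_dispersion_fast(FPRIME, SIGMA, S), cme_dispersion_formula(FPRIME, SIGMA, S))
```

Only the $O(N)$ routine avoids building $N\times N$ matrices, which is what matters for very large data sets.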
If $\vec\rho$ and $\vec\phi_1$ are collinear, the dispersion of the estimator is
404: \begin{displaymath}
405: D^{A,\parallel}_{\rm M}(\hat\theta)=\frac{1+\rho^2}
406: {\phi_1^2},
407: \end{displaymath}
which coincides with the SCE dispersion (\ref{DISPSA}) at $|z_1|=1$.
One can see that if $\vec\rho$ and $\vec\phi_1$ are neither collinear nor orthogonal,
the SCE dispersion (\ref{DISPSA})
is always larger than the CME dispersion (\ref{DISPCA}).
This can be readily explained qualitatively. 
For the SCE the fitted curve closely follows the 
data points, and if these points are shifted by fluctuations of the systematic errors,
the parameter acquires a corresponding systematic error. 
For the CME, in contrast, the information on the data 
correlations is explicitly included in $\chi^2$, so a correlated  
fluctuation of the data due to a systematic shift does not necessarily lead to 
a shift of the fitted curve, and the parameter deviation is smaller than 
for the SCE. 
The exception occurs if $|z_1|=1$, when $\vec\rho$ and $\vec\phi_1$ are collinear 
and the systematic shift can be perfectly 
compensated by a change of the parameter.
If these vectors are orthogonal, the CME dispersion is 
\begin{displaymath}
D^{A,\perp}_{\rm M}(\hat\theta)=\frac{1}{\phi_1^2},
\end{displaymath}
i.e. it is the same as the dispersion of the SCE  
applied to the data set without correlations (\ref{DISPS}); since $z_1=0$ in this case,
it also coincides with $D_0^{\rm A}(\hat\theta)$.
Qualitatively, this corresponds to a measurement scheme in which the
systematic shifts of different points compensate each other,
e.g. as in the example considered at the end of Sec.~1.
433: 
For modern experiments the systematic errors are often of the same order 
as the statistical ones, and if $N\gg 1$ then $\rho\gg 1$.
In this limit, and if $\vec\rho$ and $\vec\phi_1$ are not collinear,
437: \begin{equation}
438: D_{\rm M}^{\rm A}(\hat\theta)\approx\frac{1}{\phi_1^2(1-z^2_1)}
439: \label{eqn:dispmr}
440: \end{equation}
441: and
442: \begin{equation}
443: D_0^{\rm A}(\hat\theta)\approx\frac{\rho^2 z^2_1}{\phi_1^2}.
444: \label{eqn:disp0r}
445: \end{equation}
One can see that for the SCE, Eqn.~(\ref{eqn:disp0r}), 
the standard deviation of the estimator rises linearly with
the systematic errors, whereas the CME dispersion (\ref{eqn:dispmr}) saturates. 
This difference can be illustrated by a numerical example
inspired by elastic proton-proton scattering. Let us choose
$$
f_i=U\exp(-V x_i),~~~~x_i=0.1\, i,
$$
where $U=100$, $V=10$, and $i=1\ldots 9$.
455: Generating 100 data sets (\ref{CADDSET}) with these $f_i$ and
456: \begin{equation}
457: \sigma_i=0.01\sqrt{\frac{U}{f_i}},~~~~s_i=\frac{\kappa}{x_i}
458: \label{eqn:testset}
459: \end{equation}
we minimized the functionals (\ref{SIMCHI}) and (\ref{CORCHI}), varying
$U$ and $V$, to obtain their estimators $\hat U$ and $\hat V$.
The values of $(\hat U-U)^2$ and $(\hat V-V)^2$
for all generated data sets were averaged to obtain the
dispersions of the estimators.
The results for the standard deviation of $\hat U$ at different values of
$\kappa$ are given in 
Fig.~\ref{fig:disp} (the picture for $\hat V$ is similar).
One can see that at large $\kappa$ the CME and SCE standard deviations   
differ by a factor of 3.
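The trend in Fig.~\ref{fig:disp} can be anticipated without simulation by linearizing both estimators around the true $(U,V)$. This is one possible form of the two-dimensional generalization mentioned in the figure caption; it is our reading, not necessarily the procedure used for the figure. With the Jacobian $J$ of $f_i$ with respect to $(U,V)$, $W={\rm diag}(1/\sigma_i^2)$, $A=J^TWJ$, and $\vec g=J^TW\vec s$, the SCE covariance is $A^{-1}+A^{-1}\vec g\,\vec g^{\,T}A^{-1}$. The CME covariance is $(A-\vec g\,\vec g^{\,T}/(1+\rho^2))^{-1}$, by the Sherman-Morrison form of $E_{ij}$. The Python sketch below uses the inputs of Eqn.~(\ref{eqn:testset}) and hypothetical function names; it does not reproduce the Monte Carlo points of the figure.

```python
import math

U0, V0 = 100.0, 10.0
XS = [0.1 * i for i in range(1, 10)]          # x_i = 0.1 i, i = 1..9

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def linearized_sd_u(kappa):
    """Linearized standard deviations of U-hat for the SCE and the CME at scale kappa."""
    A = [[0.0, 0.0], [0.0, 0.0]]
    g = [0.0, 0.0]
    rho_sq = 0.0
    for x in XS:
        f = U0 * math.exp(-V0 * x)
        sigma = 0.01 * math.sqrt(U0 / f)       # statistical error as in the example
        s = kappa / x                          # additive systematic shift
        w = 1.0 / sigma**2
        jac = (math.exp(-V0 * x), -U0 * x * math.exp(-V0 * x))   # df/dU, df/dV
        for p in range(2):
            g[p] += jac[p] * w * s
            for q in range(2):
                A[p][q] += jac[p] * w * jac[q]
        rho_sq += w * s * s
    a_inv = inv2(A)
    h = [a_inv[0][0] * g[0] + a_inv[0][1] * g[1],
         a_inv[1][0] * g[0] + a_inv[1][1] * g[1]]
    var_sce = a_inv[0][0] + h[0] ** 2                     # A^-1 + (A^-1 g)(A^-1 g)^T
    m = [[A[p][q] - g[p] * g[q] / (1.0 + rho_sq) for q in range(2)] for p in range(2)]
    var_cme = inv2(m)[0][0]                               # (J^T C^-1 J)^-1
    return math.sqrt(var_sce), math.sqrt(var_cme)

if __name__ == "__main__":
    for kappa in (0.0, 0.001, 0.003, 0.007, 0.01, 0.03):
        sce, cme = linearized_sd_u(kappa)
        print(f"kappa={kappa:<6} SCE sd={sce:.4f} CME sd={cme:.4f} ratio={sce / cme:.2f}")
```

The printed ratio becomes large at large $\kappa$, because the SCE standard deviation keeps rising roughly linearly while the CME one levels off. The absolute numbers rely on the linearization and need not match the figure.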
470: 
An example of dispersion suppression observed in
the analysis of real experimental 
data can be found in Ref.~\cite{Alekhin:1995dz}.
In that paper we performed a 
leading-order QCD fit to the inclusive deep inelastic scattering data 
of Refs.~\cite{Benvenuti:1989rh,Benvenuti:1990fm}
obtained by the BCDMS collaboration, 
in order to determine the parton distribution functions and the 
strong coupling constant $\alpha_{\rm s}$. Both estimators
were used, and they gave different estimates: for the 
SCE the standard deviation of $\alpha_{\rm s}(M_{\rm Z})$
is 0.015, while for the CME it is 0.007.
The gluon distribution bounds obtained with these estimators
are compared in Fig.~\ref{fig:bcdms}. One can see that the standard deviation 
of the gluon distribution for the CME is also about 
half of the SCE standard deviation.
487: 
If $|z_1|\ne 1$, the change of the CME dispersion
after adding a new point to the large sample defined by  
Eqn.~(\ref{eqn:largeset}) is 
491: \begin{displaymath}
492: \Delta D_{\rm M}^{\rm A}(\hat\theta)\approx-\frac{1}{\phi_1^4(1-z_1^2)^2}
493: \frac{1}{\sigma_0^2}
494: \left[f_0'(\theta_0)
495: -\frac{\phi_1 z_1}{\rho}s_0\right]^2.
496: \end{displaymath}
This change is always negative, which proves the consistency
of the CME. Recall that this is not necessarily the case for the 
SCE (see Sec.~1). The same conclusion can be drawn 
from the comparison of Eqns.~(\ref{eqn:dispmr})
and (\ref{eqn:disp0r}). Indeed, the CME dispersion falls with the 
increase of the statistical significance of the data set (i.e. with a decrease of 
$\sigma$ or a rise of $N$), while the SCE dispersion does not.
Note that, owing to the consistency of the CME, 
the filtration procedure described in Sec.~1 
is of no use for it.
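The different large-$N$ behavior can be seen in a toy version of the asymmetry example with hypothetical inputs $f_i=\theta x_i$, $x_i=i/N$, and $\sigma_i=s_i=1$, so that $\rho^2=N$. The Python sketch below (function names are ours) evaluates $D_0^{\rm A}$ and $D_{\rm M}^{\rm A}$ in their scalar-product forms. As $N$ grows, the SCE dispersion approaches a constant (here $9/4$), whereas $N\,D_{\rm M}^{\rm A}$ approaches a constant (here $12$), i.e. the CME dispersion falls as $1/N$.

```python
def dispersions(fprime, sigma, s):
    """Return (D_0^A, D_M^A) for one additive systematic source, via scalar products."""
    phi_sq = sum((d / sg) ** 2 for d, sg in zip(fprime, sigma))
    rho_sq = sum((si / sg) ** 2 for si, sg in zip(s, sigma))
    rho_phi = sum(d * si / sg**2 for d, si, sg in zip(fprime, s, sigma))
    d_sce = (phi_sq + rho_phi**2) / phi_sq**2
    d_cme = 1.0 / (phi_sq - rho_phi**2 / (1.0 + rho_sq))
    return d_sce, d_cme

def toy_sample(n):
    """Hypothetical asymmetry-type sample: f_i' = x_i = i/n, sigma_i = s_i = 1."""
    xs = [i / n for i in range(1, n + 1)]
    return xs, [1.0] * n, [1.0] * n

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        d_sce, d_cme = dispersions(*toy_sample(n))
        print(f"N={n:>6}  D_0^A={d_sce:.4f}  D_M^A={d_cme:.6f}  N*D_M^A={n * d_cme:.3f}")
```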
507: 
508: \begin{figure}[t]
509: \centerline{\psfig{figure=f1.ps,height=7cm}}
510: \caption{The standard deviations of SCE (circles) and CME (squares)
511: for $\hat U$ at different scales of systematic errors $\kappa$.
The lines correspond to the calculation performed with 
the two-dimensional generalization of
Eqns.~(\ref{DISPSA}) and (\ref{DISPCA}).}
516: \label{fig:disp}
517: \end{figure}
518: 
519: The CME bias is
520: \begin{displaymath}
521: B_{\rm M}^{\rm A}(\hat\theta)=-\frac{\phi_1\phi_2}{2}
522: \left[D_{\rm M}^{\rm A}(\hat\theta)\right]^2\left(z_{12}
523: -\frac{\rho^2}{1+\rho^2}z_1 z_2\right),
524: \end{displaymath}
which vanishes for a linear data model and, unlike the SCE bias, 
saturates in the limit $\rho\gg 1$. 
527: In the numerical example (\ref{eqn:testset}) at $\kappa=0.007$
528: the CME bias is 0.07, whereas the SCE bias is 0.13.
529: 
530: \begin{figure}[t]
531: \centerline{\psfig{figure=bcdms.ps,height=7cm}}
\caption{Bounds of the gluon distribution obtained from the 
LO QCD fit to the BCDMS data 
with the two estimators (a: the SCE; b: the CME).
Full lines correspond to the total experimental errors, dashed lines to
the statistical errors only.}
537: \label{fig:bcdms}
538: \end{figure}
539: 
For multiplicative systematic errors
the covariance matrix is unknown a priori and has to be calculated 
using the parameter estimator. Proceeding this
way in the minimization of the functional (\ref{CORCHI}), we get 
544: \begin{equation}
545: a=-\sum_{i,j=1}^{N}f_i'(\theta^0)E_{ij}f_j'(\theta^0)-
546: \frac{1}{2}\sum_{i,j=1}^{N}E_{ij}''C_{ij}.
547: \label{eqn:dispm}
548: \end{equation}
The difference from the corresponding expression for 
additive systematic errors
is the second term of Eqn.~(\ref{eqn:dispm}).
For a linear data model this term is
553: $$
a^{(2)}=-\frac{1}{2}\sum_{i,j=1}^{N}E_{ij}''C_{ij}=
\frac{\phi_3^2}{(1+\rho^2)^2}
556: \left[\rho^4(z^2_3-1)-3\rho^2 z^2_{3}+1\right],
557: $$
558: where
559: \begin{displaymath}
560: \phi_3^i=\rho_i'=\frac{\rho_i}{f_i}f_i'(\theta^0)=\eta_i\phi_1^i,
561: \end{displaymath}
$\phi_3$ is the modulus of $\vec\phi_3$
563: and $z_{3}$ is the cosine of the angle between $\vec\phi_3$ and $\vec\rho$.
564: The ratio of the second term of Eqn.~(\ref{eqn:dispm}) to the first term
565: $a^{(1)}=\sum f_i'(\theta^0)E_{ij}f_j'(\theta^0)$ is 
566: \begin{equation}
567: \frac{a^{(2)}}{a^{(1)}}=
568: \frac{\phi_3^2}{\phi_1^2}\cdot
569: \frac{\rho^4(z^2_3-1)-3\rho^2 z^2_{3}+1}
570: {(1+\rho^2)^2}\xi_{\rm M}.
571: \label{eqn:dispmult}
572: \end{equation}
If $\xi_{\rm M}\sim O(1)$ (which holds in most realistic cases),
then $a^{(2)}\sim O(\eta^2)a^{(1)}$ for all values of $\rho$, i.e. this term
is suppressed compared with the first one for small $\eta$.
Neglecting, as elsewhere, the third and fourth central moments of $\{y_i\}$, 
one obtains $\langle X^2\rangle\approx -a$, and 
the estimator dispersion for multiplicative 
systematic errors is $D_{\rm M}^{\rm M}\approx D_{\rm M}^{\rm A}$.
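A two-point toy of our own, with invented numbers, illustrates the difference between building $C_{ij}$ from the measured values and rebuilding it from the model prediction at each trial value of the parameter. Consider two measurements of the same quantity, $y=(8.0,\,8.5)$, with 2\% statistical errors and a common 10\% normalization error, i.e. $f_i(\theta)=\theta$ and $s_i=\eta t_i$ with $\eta=0.1$. The Python sketch below minimizes the functional (\ref{CORCHI}), with $E_{ij}$ in the single-source form, in two ways. The first uses $s_i=\eta y_i$; the second uses $s_i=\eta\theta$ re-evaluated at every trial $\theta$, which is one reading of ``calculated using the parameter estimator''. A golden-section search stands in for a real minimizer, and all names are ours.

```python
import math

Y = [8.0, 8.5]                    # hypothetical measurements of one quantity
SIGMA = [0.02 * y for y in Y]     # 2 percent statistical errors, kept fixed
ETA = 0.10                        # common 10 percent normalization uncertainty

def chi2(theta, s):
    """(y - theta)^T E (y - theta), E being the inverse of diag(sigma^2) + s s^T."""
    w = [1.0 / sg**2 for sg in SIGMA]
    r = [y - theta for y in Y]
    rr = sum(wi * ri * ri for wi, ri in zip(w, r))
    rs = sum(wi * ri * si for wi, ri, si in zip(w, r, s))
    ss = sum(wi * si * si for wi, si in zip(w, s))
    return rr - rs * rs / (1.0 + ss)

def golden_min(fun, lo, hi, tol=1e-10):
    """Minimize a unimodal function on [lo, hi] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = fun(c), fun(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = fun(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = fun(d)
    return 0.5 * (a + b)

# Covariance built from the measured values: s_i = eta * y_i, fixed during the fit.
THETA_DATA = golden_min(lambda t: chi2(t, [ETA * y for y in Y]), 6.0, 10.0)
# Covariance rebuilt from the model at each trial value: s_i = eta * theta.
THETA_MODEL = golden_min(lambda t: chi2(t, [ETA * t] * len(Y)), 6.0, 10.0)
# Weighted average with statistical errors only, for comparison.
W = [1.0 / sg**2 for sg in SIGMA]
WEIGHTED = sum(wi * y for wi, y in zip(W, Y)) / sum(W)

if __name__ == "__main__":
    print(THETA_DATA, THETA_MODEL, WEIGHTED)
```

With these inputs the covariance built from the data gives $\hat\theta\approx 7.87$, below both measurements, whereas the model-based covariance returns the statistical weighted average $\approx 8.23$. In this toy the latter equality is exact, because the functional reduces to a constant plus a non-negative term that vanishes at the weighted average.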
580: 
In the case of multiplicative systematic errors, Eqn.~(\ref{BASEQN})
is nonlinear even for a linear data model.
As a consequence, the expressions responsible for the bias
584: $$
585: \left\langle X \right\rangle=
586: \frac{1}{2}\sum_{i,j=1}^{N} E_{ij}' C_{ij},
587: $$
588: $$
589: b=3 \sum_{i,j=1}^{N}
590: f_i'(\theta^0)E_{ij}f_j''(\theta^0)+
591: 3 \sum_{i,j=1}^{N}f_i'(\theta^0) E_{ij}'f_j'(\theta^0)+
592: \frac{1}{2}\sum_{i,j=1}^{N}E_{ij}'''C_{ij},
593: $$
594: \begin{equation}
595: \left\langle X Y\right\rangle=\sum_{i,j=1}^{N}
596: f_i'(\theta^0)E_{ij}f_j''(\theta^0)+
597: 2 \sum_{i,j=1}^{N}
598: f_i'(\theta^0) E_{ij}' f_j'(\theta^0)-
599: \frac{1}{4}\sum_{i,j=1}^{N}E_{ij}''C_{ij}\sum_{i,j=1}^{N} E_{ij}' C_{ij},
600: \label{eqn:biasm}
601: \end{equation}
do not vanish even if $f_i''(\theta)$ is equal to zero.
Nevertheless, the bias due to the nonlinearity of the estimator is small compared with 
the estimator standard deviation. Since 
$1/D_{\rm M}^{\rm M}\approx \langle X^2\rangle\approx -a$, the bias of the estimator with 
multiplicative systematic errors is
607: \begin{equation}
608: B_{\rm M}^{\rm M}(\hat\theta)\approx\sqrt{D_{\rm M}^{\rm M}(\hat\theta)}
\left[-\frac{\left\langle X\right\rangle}{\sqrt{-a}}+
\frac{\left\langle X Y\right\rangle-b/2}{(-a)^{3/2}}\right].
\label{eqn:biasmult}
\end{equation}
Up to its sign, the first term in the brackets of Eqn.~(\ref{eqn:biasmult}) is
614: \begin{equation}
615: \frac{\left\langle X\right\rangle}{\sqrt{-a}}\approx
616: -\frac{\phi_3}{\phi_1}
617: \frac{\rho z_3}{1+\rho^2}\sqrt{\xi_{\rm M}}\sim O(\eta\sqrt{\xi_{\rm M}}).
618: \label{eqn:biasx}
619: \end{equation}
620: The contribution to the second term in brackets of Eqn.~(\ref{eqn:biasmult})
621: from $\sum f_i'(\theta^0) E_{ij}'f_j'(\theta^0)$ is proportional to 
622: $$  
623: \frac{\phi_3}{\phi_1}\frac{\rho z_1}{(1+\rho^2)} 
624: \left(\frac{\rho^2}{1+\rho^2}z_1 z_3-z_{13}\right)\xi_{\rm M}^{3/2}
625: $$
626: and hence it is $\sim O(\eta\xi_{\rm M}^{3/2})$. As one can conclude from 
627: Eqns.~(\ref{eqn:dispmult},\ref{eqn:biasx}) the contribution 
628: to the same term from 
629: $\sum E_{ij}''C_{ij}\cdot\sum E_{ij}' C_{ij}$ is $O(\eta^3\xi_{\rm M}^{3/2})$. 
Finally, since 
631: $$
632: \frac{1}{2}
633: \sum_{i,j=1}^N E_{ij}'''C_{ij}=\frac{\rho z_1\phi_3^3}{(1+\rho^3)^2}
634: \left[\rho^4(z_3^2-1)+\rho^2(1-3z_3^2)+2\right]
635: $$
636: the contribution to Eqn.~(\ref{eqn:biasmult}) coming from this term  
637: is $O(\eta^3\xi_{\rm M}^{3/2})$. 
In summary, for the linear data model
the estimator bias is a sum of terms 
$O(\eta^{p}\xi_{\rm M}^{q})\sqrt{D^{\rm M}_{\rm M}}$ 
with $p\ge 1$ and $q\le 3/2$.  
Moreover, at small $\rho$ all four contributions to the bias
that survive for the linear data model are 
$\sim \rho$, while at large $\rho$
they are $\sim1/\rho$. Summarizing, one can conclude that 
the estimator bias is negligible except in extreme cases with very large 
$\xi_{\rm M}$.
648: 
An explicit estimate of the bias can be obtained from 
Eqns.~(\ref{eqn:biasm},\ref{eqn:biasmult}). 
However, this requires rather lengthy calculations, and a 
simpler tool for evaluating the bias is desirable.
A convenient way to do this is to monitor the net residual
654: \begin{displaymath}
655: R=-\frac{1}{N}\sum_{i=1}^{N}\frac{f_i(\hat\theta)-y_i}
656: {\sqrt{\sigma_i^2+s_i^2}}.
657: \end{displaymath}
658: Expanding $f_i(\theta)$ near $\theta_0$ 
659: and keeping only the first term in Eqn.~(\ref{GENBIAS})
660: one obtains for the sample (\ref{CADDSET})
661: $$
R\approx \frac{1}{N}
\sum_{i=1}^{N}\frac{\mu_i+\lambda \rho_i}{\sqrt{1+\rho_i^2}}
-(\hat\theta-\theta_0)
\frac{1}{N}\sum_{i=1}^{N}\frac{\phi_1^i}{\sqrt{1+\rho_i^2}}.
666: $$
If the estimator is unbiased, the value of $R$ averaged over 
the samples is equal to zero. Nevertheless, individual values of $R$ 
may differ from zero because of fluctuations. 
For bounded $\xi_{\rm M}$ the dispersion of $R$ is
671: \begin{equation}
672: D(R)=\frac{1}{N^2}\sum_{i,j=1}^{N}\frac{\delta _{ij}+\rho_i\rho_j}
673: {\sqrt{1+\rho_i^2}\sqrt{1+\rho_j^2}}+O(1/N).
674: \label{eqn:Rdisp}
675: \end{equation}
If the analyzed data come from a single experiment 
with dominating systematics (i.e. with $\rho_i > 1$), then 
$D(R)\sim 1$. In particular, for the BCDMS data 
of Refs.~\cite{Benvenuti:1989rh,Benvenuti:1990fm} $D(R)\approx 0.7$.
For $N_{\rm exp}$ independent experiments involved in the analysis,
$D(R)\sim 1/N_{\rm exp}$. Comparing the net residual $R$
with this value gives a rough indication of the estimator bias.
A more definite conclusion 
can be drawn by comparing $R$
with its dispersion calculated using Eqn.~(\ref{eqn:Rdisp}).
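Eqn.~(\ref{eqn:Rdisp}) can be evaluated in $O(N)$ operations, since its double sum factorizes into $\sum_i 1/(1+\rho_i^2)+\bigl(\sum_i\rho_i/\sqrt{1+\rho_i^2}\bigr)^2$. The Python sketch below does this and compares the result with a Monte Carlo simulation of the fluctuation term of $R$ alone. The term proportional to $\hat\theta-\theta_0$ is dropped, in line with the $O(1/N)$ remainder. The $\rho_i$ values are invented, Gaussian $\mu_i$ and $\lambda$ are assumed, and the function names are ours.

```python
import math
import random

def residual_dispersion(rho):
    """Leading term of D(R): (1/N^2) sum_ij (delta_ij + rho_i rho_j) / sqrt(...) sqrt(...)."""
    n = len(rho)
    diag = sum(1.0 / (1.0 + r * r) for r in rho)
    corr = sum(r / math.sqrt(1.0 + r * r) for r in rho)
    return (diag + corr * corr) / n**2

def mc_residual_dispersion(rho, n_sets, seed=3):
    """Sample dispersion of (1/N) sum_i (mu_i + lambda rho_i) / sqrt(1 + rho_i^2)."""
    rng = random.Random(seed)
    n = len(rho)
    norms = [math.sqrt(1.0 + r * r) for r in rho]
    values = []
    for _ in range(n_sets):
        lam = rng.gauss(0.0, 1.0)          # common systematic shift
        values.append(sum((rng.gauss(0.0, 1.0) + lam * r) / nr
                          for r, nr in zip(rho, norms)) / n)
    mean = sum(values) / n_sets
    return sum((v - mean) ** 2 for v in values) / n_sets

RHO = [0.5 + 0.1 * i for i in range(20)]   # hypothetical per-point s_i / sigma_i

if __name__ == "__main__":
    print(residual_dispersion(RHO), mc_residual_dispersion(RHO, 20000))
    print(residual_dispersion([2.0] * 200))   # single experiment with rho_i = 2
```

For $\rho_i=2$ at all points the result is close to $\rho_i^2/(1+\rho_i^2)=0.8$, illustrating $D(R)\sim 1$ for a single experiment with dominating systematics.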
686: 
687: \section{PLANNING OF THE COUNTING EXPERIMENTS} 
688: 
In the particular case when the differential cross section in the
variable $x$ is measured, the predicted average 
number of events in the $i$-th bin is 
692: $$
693: \left\langle N_i\right\rangle =Lf_i\Delta x_i\beta_i,
694: $$
695: where $L$ is the integral experiment luminosity,
696: $\beta_i$ is the registration efficiency, and $\Delta x_i$ is 
697: the bin width. Neglecting the fluctuations of $N_i$
698: the statistical error on the $i-$th measurement is 
699: $$
700: \sigma_i=\frac{\sqrt{\left\langle N_i\right\rangle}}{L\Delta x_i\beta_i}
701: $$
702: and 
703: $$
704: \frac{1}{\sigma_i^2}=\frac{L\beta_i}{f_i}\Delta x_i.
705: $$
706: The scalar product of the vectors $\vec \rho$ and $\vec \phi$
707: is 
708: $$ 
709: \left(\vec\rho \cdot \vec\phi\right)=L\sum_{i=1}^{N}\frac{f_i's_i}
710: {f_i}\beta_i\Delta x_i
711: $$
712: and
713: $$ 
714: \phi^2=L\sum_{i=1}^{N}\frac{\left[f_i'\right]^2}
715: {f_i}\beta_i\Delta x_i,~~~~~~
716: \rho^2=L\sum_{i=1}^{N}\frac{\left[s_i\right]^2}
717: {f_i}\beta_i\Delta x_i.
718: $$
719: For the dense measurements these scalars can be  
720: reduced to the integrals over the measurements region $\Omega$: 
721: $$ 
722: \left(\vec\rho \cdot \vec\phi\right)=L\int_{\Omega} f'(x)s(x) d\tilde{x}
723: $$
724: and
725: $$ 
726: \phi^2=L\int_{\Omega}\left[f'(x)\right]^2 d\tilde x,~~~~~~
727: \rho^2=L\int_{\Omega}\left[s(x)\right]^2 d\tilde x,
728: $$
729: where $d\tilde{x}=\beta(x)/f(x) dx$.
730: The latter expressions can be used in 
731: the equations for the estimators dispersions\footnote{As a result 
732: one obtains the Fisher's information for the 
733: correlated data case.}.
734: This approach is convenient for the future experiment optimization 
735: since it allows for to analyze integrated expression  
736: in order to search for the optimal region of measurements. 
737: For the simple functions $f(x)$, $\beta(x)$, and $s(x)$ such 

\section{CONCLUSION}

In conclusion, the CME is a convenient tool 
for analyzing data sets while accounting for the correlations due to 
systematic errors. The CME is consistent 
in realistic cases (when the systematic errors on the fitted parameters are 
not extremely large compared with the statistical ones),
and its dispersion is always smaller than 
the dispersion of the $\chi^2$ estimator that neglects the correlations.
The estimator bias is negligible in realistic 
cases if the covariance matrix is calculated iteratively during the fit 
using the parameter estimator itself. The analytical formula for the covariance 
matrix inversion allows one to perform fast and precise calculations 
even for very large data sets. The latter 
is especially important in view of the numerical instabilities occurring 
in fits to precise data when the correlations between 
the fitted parameters are large (see in this connection 
Ref.~\cite{Alekhin:1994}).

Particular attention should be paid to 
the connection between the estimator dispersion 
and the confidence interval. For a known distribution 
of the estimator the confidence interval 
can be easily calculated
(e.g. it is well known that for the Gaussian distribution 
one standard deviation corresponds to the 68\% confidence level).
Unfortunately, due to the possibly non-Gaussian nature of the systematic errors,
one cannot prove that an estimator accounting for systematics
is Gaussian distributed.
However, for a large number of systematic errors of comparable scale
the estimator should obey the Gaussian distribution due to the central 
limit theorem. Otherwise, robust estimates of the 
confidence intervals, e.g. based on Chebyshev's inequality, should be used.
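To make the comparison quantitative, the sketch below (hypothetical function names, standard library only) returns the Gaussian coverage of a $\pm k$ standard deviation interval and the half-width $k$ that Chebyshev's inequality, $P(|X-\langle X\rangle|\ge k\sqrt{D})\le 1/k^2$, requires to guarantee a given confidence level for any distribution with finite dispersion.

```python
import math


def gaussian_coverage(k):
    """Probability that a Gaussian variable lies within +-k standard deviations."""
    return math.erf(k / math.sqrt(2.0))


def chebyshev_half_width(cl):
    """Half-width k (in standard deviations) that guarantees coverage >= cl for
    any distribution with finite dispersion, from P(|X - mean| >= k sd) <= 1/k^2."""
    if not 0.0 < cl < 1.0:
        raise ValueError("cl must lie strictly between 0 and 1")
    return 1.0 / math.sqrt(1.0 - cl)


cl = gaussian_coverage(1.0)          # about 0.683
print(cl, chebyshev_half_width(cl))  # Chebyshev needs about 1.78 standard deviations
```

The price of robustness is a wider interval: at the 95\% level Chebyshev's inequality requires $k=\sqrt{20}\approx 4.47$ instead of $1.96$ in the Gaussian case.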

{\bf Acknowledgments}

I am indebted to S.~Keller for valuable discussions and comments. 
The work was supported by RFBR grant 00-02-17432.

778: 
779: \begin{thebibliography}{99}
780: 
781: \bibitem{:2000nr}
782:   [ALEPH, DELPHI, L3, OPAL Collaborations, 
783: SLD Heavy Flavour Group, and Electroweak Group], CERN-EP-2000-016.
784: 
785: \bibitem{Catani:2000jh}
786: S.~Catani {\it et al.},
787: hep-ph/0005025.
788: 
789: \bibitem{D'Agostini:1995fv}
790: G.~D'Agostini, hep-ph/9512295.
791: 
792: \bibitem{Swartz:1994qz}
793: M.~L.~Swartz, hep-ph/9411353.
794: 
795: \bibitem{Gates:1995rq}
796: E.~Gates, L.~M.~Krauss and M.~White,
797: Phys.\ Rev.\  {\bf D51}, 2631 (1995)
798: [hep-ph/9406396].
799: 
800: \bibitem{Seibert:1994sf}
801: D.~Seibert,
802: Phys.\ Rev.\  {\bf D49}, 6240 (1994)
803: [hep-lat/9305014].
804: 
805: \bibitem{Michael:1994yj}
806: C.~Michael,
807: Phys.\ Rev.\  {\bf D49}, 2616 (1994)
808: [hep-lat/9310026].
809: 
810: \bibitem{D'Agostini:1994uj}
811: G.~D'Agostini,
812: Nucl.\ Instrum.\ Meth.\  {\bf A346}, 306 (1994).
813: 
814: \bibitem{Swartz:1996hc}
815: M.~L.~Swartz,
816: Phys.\ Rev.\  {\bf D53}, 5268 (1996)
817: [hep-ph/9509248].
818: 
819: \bibitem{Daniell:1984ea}
820: G.~J.~Daniell, A.~J.~Hey and J.~E.~Mandula,
821: Phys.\ Rev.\  {\bf D30}, 2230 (1984).
822: 
823: \bibitem{Michael:1995sz}
824: C.~Michael and A.~McKerrell,
825: Phys.\ Rev.\  {\bf D51}, 3745 (1995)
826: [hep-lat/9412087].
827: 
828: \bibitem{Alekhin:1995ij}
829: S.~I.~Alekhin,
830: IFVE-95-48.
831: 
832: \bibitem{JAMES}
833:     Eadie W.T., Drijard D., James F.E., Roos M., Sadoulet B.,
834: Statistical Methods in Experimental Physics, North Holland, 1971.
835: 
836: \bibitem{Alekhin:1995dz}
837: S.~I.~Alekhin,
838: IFVE-95-65.
839: 
840: \bibitem{Benvenuti:1989rh}
841: A.~C.~Benvenuti {\it et al.}  [BCDMS Collaboration],
842: Phys.\ Lett.\  {\bf B223} (1989) 485.
843: 
844: \bibitem{Benvenuti:1990fm}
845: A.~C.~Benvenuti {\it et al.}  [BCDMS Collaboration],
846: Phys.\ Lett.\  {\bf B237} (1990) 592.
847: 
848: \bibitem{Alekhin:1994}
849: S.~I.~Alekhin,
850: IFVE-94-70.
851: 
852: \end{thebibliography}
853: 
854: \end{document}
855: