
1: \documentstyle[12pt,epsfig,amssymb]{article}

2:

3: \voffset=-1in

4: \hoffset=-0.75in

5: \textwidth 6.5in

6: \textheight 9.5in

7:

8: \begin{document}

9:

10: \begin{center}

11:

12: {\Large \bf Statistical properties of the estimator using covariance matrix}

13:

14: \vspace{0.1in}

15:

16: {\bf S.I. Alekhin}

17:

18: \vspace{0.1in}

19: {\baselineskip=14pt Institute for High Energy Physics, 142284, Protvino, Russia}

20:

21: \begin{abstract}

The statistical properties of the estimator that uses a covariance matrix
to account for point-to-point correlations due to systematic
errors are analyzed. It is shown that the covariance matrix estimator
(CME) is consistent
for realistic cases (when the systematic errors on the fitted parameters are
not extremely large compared with the statistical ones)
and that its dispersion is never larger than
the dispersion of the simplified $\chi^2$ estimator
applied to the correlated data.
The CME bias is negligible for realistic
cases if the covariance matrix is calculated iteratively during the fit
using the parameter estimator itself. An analytical formula for the covariance
matrix inversion allows one to perform fast and precise calculations
even for very large data sets. All this allows for an efficient use
of the CME in global fits.

37: \end{abstract}

38: \end{center}

39: \newpage

40:

41: \section*{INTRODUCTION}

42:

The development of modern particle physics is increasingly based
on the analysis
of precise experimental data. This demands refinement of all stages
of the data analysis, including the account of
correlations due to systematic uncertainties,
which are often comparable to or even larger than the statistical ones.
In particular, this problem is important for
precise tests of the Standard Model
and for the determination of the parton distributions \cite{:2000nr,Catani:2000jh}.
For the sake of simplicity many authors use
approaches which ignore point-to-point correlations due to systematic
errors, i.e. they sum all errors in quadrature or drop the systematics
altogether. It is evident that if the systematic errors are an important source of
the data uncertainty, such approaches can
distort the estimated errors on the fitted parameters.
At the same time, the construction of estimators accounting for the
correlations is not straightforward, since
competing probabilistic models of the data can be used in the analysis.
Essentially two generic models are possible: one based on the frequentist
treatment of systematic shifts and another based on the Bayesian
approach. This paper concentrates on the analysis of the statistical
properties of the estimators within the Bayesian treatment of
systematic errors. The introduction
to this subject given in Ref.~\cite{D'Agostini:1995fv} contains
arguments in favor of this approach. The only point
that we would like to underline here is that
the Bayesian treatment is the only constructive way in
the case of many sources of systematics, when the classical treatment, which implies
the introduction of an additional parameter for every source of systematic error,
can cause great problems with the
interpretation and representation of a function of a large number of arguments.

74:

The natural way to account for point-to-point correlations due to
systematic errors within the Bayesian approach is to use
the covariance matrix associated with the systematic errors
(see e.g. Refs.~\cite{Swartz:1994qz,Gates:1995rq}).
Meanwhile, there are concerns that the covariance matrix estimator (CME)
can result in biased values of the parameters and their dispersions (see
Refs.~\cite{Seibert:1994sf,Michael:1994yj,D'Agostini:1994uj,Swartz:1996hc}).
In this connection it is worth recalling
that estimators accounting for the data correlations
often exhibit poor statistical properties
regardless of whether they use a covariance matrix or not.
For example, as was shown in Ref.~\cite{Daniell:1984ea},
the sample dispersion estimated from
correlated Monte Carlo data sets can
acquire a bias equal to the dispersion value itself\footnote{This
effect is connected with the
well-known fact that the sample dispersion gives a biased
estimate of the dispersion of the studied distribution;
the correlations merely amplify this bias.}.
At the same time, the estimators would be unbiased if the
covariance matrix were not evaluated from the
measurements themselves. Indeed, an unbiased estimator
for correlated Monte Carlo data
was constructed in Ref.~\cite{Michael:1995sz}
using a modeled covariance matrix.

100:

Proceeding in this way, one can hope to construct unbiased estimators
accounting for systematic errors through the covariance matrix,
but to be sure of their unbiasedness a study of their properties is needed.
In view of the lack of comprehensive information on this subject in the
literature, this paper is devoted to the analysis of the
statistical properties of such estimators, with particular attention paid
to the control of the bias. Throughout the paper the CME
properties are compared with the properties of the
simplest $\chi^2$ estimator (SCE), as was done
earlier in Ref.~\cite{Alekhin:1995ij}.

111:

112: \section{THE SIMPLEST $\chi^2$ ESTIMATOR}

113:

To illustrate our method of analyzing the statistical properties
we start from the analysis of uncorrelated measurements. In this case
the data sample $\{y_i\}$ is supposed to be
explicitly described by a theoretical model $t_i=f_i(\theta^0)$,

118: \begin{equation}

119: y_i=t_i+\mu_i \sigma_i,

120: \label{UNCSET}

121: \end{equation}

where $\mu_i$ are independent random variables with zero mean and unit dispersion,
$\sigma_i$ are the statistical errors,
$i=1\ldots N$, and $N$ is the total number of points in the sample.
We take the theoretical model parameter $\theta^0$ to be a scalar;
the generalization of the
formulae to the case of a vector parameter is evident.
If $y_i$ are obtained in a counting experiment with
a large number of events, $\mu_i$ are Gaussian distributed,
although this is not crucial for our consideration.
As a rule the values of $\sigma_i$
given in experimental publications are estimators of the
standard deviations of $y_i$, i.e. they are random variables,
but we neglect their fluctuations.
The SCE is based on the minimization of the functional

136: \begin{equation}

137: \chi^2(\theta)=\sum_{i=1}^{N} \frac{(f_i(\theta)-y_i)^2}{\sigma_i^2}

138: \label{SIMCHI}

139: \end{equation}

or, equivalently, on the solution of the equation

141: \begin{equation}

142: \xi(\theta)\equiv\frac{1}{2}\frac{\partial\chi^2}{\partial\theta}=0.

143: \label{BASEQN}

144: \end{equation}

The solution $\hat\theta$ is the estimator of the
parameter $\theta$; it is a random variable depending on $\{y_i\}$.
To investigate the statistical properties of $\hat\theta$ we expand
the function $\xi(\theta)$
around $\theta^0$ and then apply the Lagrange inversion to
obtain the series for $\hat\theta$ (see Ref.~\cite{JAMES}
for the details of the method).
Introducing

153: \begin{displaymath}

154: X=\xi(\theta^0),~~~~~~

155: a=-\left\langle\frac{\partial\xi(\theta^0)}{\partial\theta}\right\rangle ,

156: \end{displaymath}

157: \begin{displaymath}

158: b=\left\langle\frac{\partial^2\xi(\theta^0)}{\partial\theta^2}\right\rangle

159: ,~~~~~Y=\frac{\partial\xi(\theta^0)}{\partial\theta}

160: -\left\langle\frac{\partial\xi(\theta^0)}{\partial\theta}\right\rangle,

161: \end{displaymath}

162: one can obtain

163: \begin{equation}

164: \hat\theta-\theta^0=\frac{X}{a}+\frac{X Y}{a^2}+\frac{b X^2}{2a^3}+\ldots

165: \label{GENBIAS}

166: \end{equation}

where $\langle~\rangle$ means averaging over the samples
and the omitted part of the expansion contains the terms with higher
powers of $1/a$ and/or of $X$ and $Y$.
In this approximation the dispersion of $\hat\theta$ is

171: \begin{displaymath}

172: D(\hat\theta)=\frac{\left\langle X^2\right\rangle}{a^2}

173: \end{displaymath}

174: and the bias is

175: \begin{displaymath}

176: B(\hat\theta)=\frac{\left\langle X\right\rangle}{a}+

177: \frac{\left\langle X Y\right\rangle}{a^2}

178: +\frac{b\left\langle X^2\right\rangle}{2a^3}.

179: \end{displaymath}

180: For the SCE applied to the sample (\ref{UNCSET})

181: one can easily obtain

182: \begin{displaymath}

183: \left\langle X\right\rangle=0,

184: \end{displaymath}

185: \begin{displaymath}

186: \left\langle X^2\right\rangle=-a=\sum_{i=1}^{N}

187: \frac{\left[f_i'(\theta_0)\right]^2}{\sigma_i^2},

188: \end{displaymath}

189: \begin{equation}

190: \left\langle X Y\right\rangle=\frac{b}{3}=\sum_{i=1}^{N}

191: \frac{f_i'(\theta_0)f_i''(\theta_0)}{\sigma_i^2},

192: \label{eqn:ab0}

193: \end{equation}

where $f_i'(\theta)$ denotes the derivative of $f_i(\theta)$ with respect to $\theta$.

195: The dispersion and the bias of this estimator are

196: \begin{equation}

197: D_0^{\rm U}(\hat\theta)=-\frac{1}{a},~~~~~~

198: B_0^{\rm U}(\hat\theta)=-\frac{b}{6a^2}.

199: \label{DISPS}

200: \end{equation}
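As a cross-check of Eqn.~(\ref{DISPS}), which only restates the relations above and is not an additional result, one can substitute
$\left\langle X\right\rangle=0$, $\left\langle X^2\right\rangle=-a$ and
$\left\langle X Y\right\rangle=b/3$ into the general expression for the bias:
\begin{displaymath}
B_0^{\rm U}(\hat\theta)=\frac{b}{3a^2}+\frac{b\,(-a)}{2a^3}
=\frac{b}{3a^2}-\frac{b}{2a^2}=-\frac{b}{6a^2},
\end{displaymath}
while $D_0^{\rm U}(\hat\theta)=\left\langle X^2\right\rangle/a^2=-1/a$,
which is positive since $a<0$ for this estimator.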

201:

If $f_i(\theta)$ are linear functions of $\theta$, the
series (\ref{GENBIAS}) is truncated
and equation (\ref{BASEQN}) can be solved exactly.
One can see that in this case the estimator bias vanishes.
For a nonlinear data model the expansion (\ref{GENBIAS})
contains an infinite number of terms, but
the contributions from the higher-order terms are
proportional to powers of $D(\hat\theta)$ and/or
to the central moments of $y_i$ higher than the second.
These contributions are progressively suppressed compared with the main terms
as the data statistics increases. Here and throughout the paper we neglect the
contribution from the higher
moments of $y_i$. Recall that the same approximation
is used in deriving the central limit theorem of statistics.
This approach can
also be used to justify the analysis of a nonlinear
data model: the above formulae can be applied to
a data model with a ``weak nonlinearity'', i.e. if its nonlinearity
is not significant on the scale of the parameter standard deviation.

221:

Now let the sample have a common
additive systematic error. In accordance with the Bayesian approach
to the treatment of systematic errors, the measured values are given by

225: \begin{equation}

226: y_i=t_i+\mu_i \sigma_i+\lambda s_i,

227: \label{CADDSET}

228: \end{equation}

where $s_i$ are the systematic shifts for every point and $\lambda$
is a random variable with zero average and unit dispersion\footnote
{We emphasize that $\lambda$ is not necessarily Gaussian distributed.}.
We consider the case of one source of systematic error; the generalization
to the case of many sources is straightforward.
For the sample (\ref{CADDSET}) we lose the statistical independence of the
measurements, and
with account of their correlations the relevant expressions
for the dispersion and bias are more complicated:

238: \begin{displaymath}

239: \left\langle X^2\right\rangle=\sum_{i,j=1}^{N}

240: \frac{C_{ij}}{\sigma_i^2 \sigma_j^2}f_i'(\theta_0)f_j'(\theta_0)

241: =-a+\left[\sum_{i=1}^N \frac{s_i}{\sigma_i^2}

242: f_i'(\theta_0)\right]^2,

243: \end{displaymath}

244: \begin{displaymath}

245: \left\langle X Y\right\rangle=\sum_{i,j=1}^{N}

246: \frac{C_{ij}}{\sigma_i^2 \sigma_j^2}f_i'(\theta_0)f_j''(\theta_0)

247: \end{displaymath}

248: \begin{displaymath}

249: =\frac{b}{3}+\left[\sum_{i=1}^N \frac{s_i}{\sigma_i^2}

250: f_i'(\theta_0)\right]

251: \left[\sum_{i=1}^N \frac{s_i}{\sigma_i^2}

252: f_i''(\theta_0)\right],

253: \end{displaymath}

254: where $a$ and $b$ are given by Eqn.~(\ref{eqn:ab0}),

255: $C_{ij}$ is the covariance matrix for $\{y_i\}$

256: \begin{equation}

257: C_{ij} =  s_i s_j+\delta_{ij}\sigma_i\sigma_j,

258: \label{COVA}

259: \end{equation}

and $\delta_{ij}$ is the Kronecker symbol. The expressions for $a$ and $b$
are the same as for the uncorrelated data case.

262: In terms of the $N$-component vectors

263: \begin{displaymath}

264: \rho_i=\frac{s_i}{\sigma_i},~~~~~

265: \phi_1^i=\frac{f_i'(\theta_0)}{\sigma_i},~~~~

266: \phi_2^i=\frac{f_i''(\theta_0)}{\sigma_i}

267: \end{displaymath}

268: the dispersion and the bias in this case can be expressed as

269: \begin{equation}

270: D_0^{\rm A}(\hat\theta)=\frac{1}{\phi_1^2}\left(1+\rho^2 z^2_1\right),

271: \label{DISPSA}

272: \end{equation}

273: \begin{equation}

274: B_0^{\rm A}(\hat\theta)=-\frac{\phi_2}

275: {2\phi_1^3}\left[\Bigl(1+\frac{3}{2}\rho^2 z^2_1\Bigr)z_{12} -

276: \rho^2 z_1 z_2\right],

277: \label{BIASSA}

278: \end{equation}

where $\rho, \phi_1, \phi_2$ denote the moduli of the vectors,
$z_1$ is the cosine of the angle between $\vec\rho$ and $\vec\phi_1$,
$z_2$ is that between $\vec\rho$ and $\vec\phi_2$, and
$z_{12}$ is that between $\vec\phi_1$ and $\vec\phi_2$.
The dispersion of $\hat\theta$ is larger than for uncorrelated data
because it now also accounts for the fluctuations due to
systematic errors. As for the bias, it remains zero for the linear
model.
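The form of Eqn.~(\ref{DISPSA}) can be checked directly (a short sketch that uses only the definitions above). In terms of $\vec\rho$ and $\vec\phi_1$ one has
$a=-\phi_1^2$ and
$\left\langle X^2\right\rangle=\phi_1^2+(\vec\rho\cdot\vec\phi_1)^2$, so that
\begin{displaymath}
D_0^{\rm A}(\hat\theta)=\frac{\left\langle X^2\right\rangle}{a^2}
=\frac{\phi_1^2+\rho^2\phi_1^2 z_1^2}{\phi_1^4}
=\frac{1}{\phi_1^2}\left(1+\rho^2 z_1^2\right).
\end{displaymath}
The first term is the statistical part and the second one is the contribution of the systematic error.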

287:

If the systematic errors are multiplicative, the measured values are
\begin{equation}
y_i=(t_i+\mu_i \sigma_i)(1+\lambda \eta_i),
\label{CMULSET}
\end{equation}
where $\eta_i$ quantify the systematic errors. If both the statistical and
systematic errors are small compared with $t_i$,
$$
y_i\approx t_i+\mu_i \sigma_i+\lambda \eta_i t_i,
$$
the covariance matrix is
\begin{equation}
C_{ij} =  \eta_i \eta_j t_i t_j+\delta_{ij}\sigma_i\sigma_j,
\label{COVM}
\end{equation}
and the expressions for the bias and dispersion are the
same as for the case of additive systematics
after the substitution $s_i \rightarrow \eta_i t_i$.
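The matrix (\ref{COVM}) follows from the linearized form of Eqn.~(\ref{CMULSET}). The following is a short sketch, assuming in addition that $\lambda$ is independent of the $\mu_i$ and that the $\mu_i$ have unit dispersion. Dropping the second-order term $\lambda\eta_i\mu_i\sigma_i$,
\begin{eqnarray*}
C_{ij}&=&\left\langle(\mu_i\sigma_i+\lambda\eta_i t_i)
(\mu_j\sigma_j+\lambda\eta_j t_j)\right\rangle\\
&=&\sigma_i\sigma_j\left\langle\mu_i\mu_j\right\rangle
+\eta_i\eta_j t_i t_j\left\langle\lambda^2\right\rangle
=\delta_{ij}\sigma_i\sigma_j+\eta_i\eta_j t_i t_j,
\end{eqnarray*}
since $\left\langle\mu_i\mu_j\right\rangle=\delta_{ij}$,
$\left\langle\lambda^2\right\rangle=1$ and
$\left\langle\mu_i\lambda\right\rangle=0$.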

306:

Equation~(\ref{DISPSA}) can be split into the parts
which correspond to the
statistical and systematic fluctuations. One can see that when the
vectors $\vec\rho$ and $\vec\phi_1$ are orthogonal
the systematic error on $\hat\theta$ is equal to zero
and the total dispersion is suppressed.
Such a suppression can be illustrated by the example of
the extraction of an asymmetry from
data with a common offset error.
Let $f_i(\theta)=\theta x_i$ and let both the statistical and systematic errors be
constant through the sample: $s_i=s$, $\sigma_i=\sigma$.
Then $\rho_i=s/\sigma$, $\phi_1^i=x_i/\sigma$ and
$z_1\sim \sum x_i$. If the positive and negative values of $x_i$
compensate each other in
the measurements, $z_1=0$ and the systematic error vanishes.
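As a simple illustration, let $N$ be even and $x_i=\pm 1$ (values chosen here only as an example, not taken from a particular experiment). With
$\rho^2=Ns^2/\sigma^2$, $\phi_1^2=N/\sigma^2$ and $z_1=\sum x_i/N$,
Eqn.~(\ref{DISPSA}) gives
\begin{eqnarray*}
D_0^{\rm A}(\hat\theta)&=&\frac{\sigma^2}{N}+s^2
~~~~~\mbox{(all $x_i=+1$, $z_1=1$)},\\
D_0^{\rm A}(\hat\theta)&=&\frac{\sigma^2}{N}
~~~~~\mbox{(half of the $x_i$ equal to $+1$ and half to $-1$, $z_1=0$)}.
\end{eqnarray*}
In the first case the offset error enters the parameter in full and does not decrease with $N$, while in the balanced case it cancels.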

An appropriate data filtration can also be used to suppress the dispersion
(\ref{DISPSA}). To clarify the mechanism of this suppression let us trace the
effect of a separate data point on the dispersion value.
Let us add to the data set a point with statistical error $\sigma_0$,
systematic error $s_0$ and data model $f_0(\theta)$. If the initial
data set is large and the systematic error is comparable with the statistical one, i.e.

328: $$

329: N \gg 1,~~~~~~~~~~~~~~\rho \gg 1,

330: $$

331: \begin{equation}

332: \phi_1 \gg \frac{f_0'(\theta_0)}{\sigma_0},~~~~~~~~~~

333: \rho\phi_1 z_1\gg \frac{s_0}{\sigma_0^2}f_0'(\theta_0),

334: \label{eqn:largeset}

335: \end{equation}

336: the change of $D_0^{\rm A}(\hat\theta)$ after adding the new point is

337: \begin{equation}

338: \Delta D_0^{\rm A}(\hat\theta)\approx\frac{2\rho}{\phi_1^3}

339: \frac{1}{\sigma_0^2}\left[z_1 s_0f_0'(\theta_0)

340: -\frac{\rho z_1^2}{\phi_1}\left[f_0'(\theta_0)\right]^2\right].

341: \label{eqn:deldisp0}

342: \end{equation}

The second term in brackets is always negative and gives
the decrease of the dispersion
due to the improved statistical precision. At the same time the first term
can be negative or positive, depending on the signs of $z_1$, $s_0$ and $f_0'(\theta_0)$.
Its absolute value can be larger than
the absolute value of the second term, and then
$D_0^{\rm A}(\hat\theta)$ can either increase or decrease after adding the new point.
This is a manifestation of the inconsistency of the SCE applied to a
correlated data set.
The balance between the terms of Eqn.~(\ref{eqn:deldisp0})
is defined by the distribution
of $f_i'(\theta_0)/s_i$, and cutting the tails of this distribution
can decrease the estimator dispersion.

356:

357: \section{THE COVARIANCE MATRIX ESTIMATOR}

358:

If the systematic error is additive and the covariance matrix is known
a priori and is given by (\ref{COVA}), one can estimate the parameter
by minimizing the functional
\begin{equation}
\chi^2(\theta)=\sum_{i,j=1}^{N} (f_i(\theta)-y_i) E_{ij} (f_j(\theta)-y_j),
\label{CORCHI}
\end{equation}
where $E_{ij}$ is the inverse of the covariance matrix $C_{ij}$.
This problem can be reduced to the uncorrelated
case using a linear transformation of the vector $\{f_i(\theta)-y_i\}$,
and the estimator is linear for the linear data model.
Besides, if the statistical and systematic fluctuations obey
the Gaussian distribution,
this estimator provides the minimal dispersion allowed by the Cram\'er--Rao
inequality.

374:

375: One can easily derive the expressions necessary to calculate the

376: estimator bias and dispersion

377: \begin{displaymath}

378: \left\langle X\right\rangle=0,

379: \end{displaymath}

380: \begin{displaymath}

381: \left\langle X^2\right\rangle=-a=\sum_{i,j=1}^{N}

382: f_i'(\theta_0)E_{ij}f_j'(\theta_0),

383: \end{displaymath}

384: \begin{displaymath}

385: \left\langle X Y\right\rangle=\frac{b}{3}=\sum_{i,j=1}^{N}

386: f_i'(\theta_0)E_{ij}f_j''(\theta_0).

387: \end{displaymath}

388: Substituting in the above relations the explicit expression for $E_{ij}$

389: \begin{displaymath}

390: E_{ij}=\frac{1}{\sigma_i \sigma_j}

391: \Bigl(\delta_{ij} -

392: \frac{\rho_i\rho_j}{1+\rho^2}\Bigr)

393: \end{displaymath}

394: we obtain the estimator dispersion

395: \begin{equation}

396: D_{\rm M}^{\rm A}(\hat\theta)=\frac{1}{\phi_1^2}

397: \left[1+\frac{\rho^2 z_1^2}{1+\rho^2(1-z_1^2)}\right]

398: =\frac{1}{\phi_1^2}\xi_{\rm M},

399: \label{DISPCA}

400: \end{equation}

401: where $\xi_{\rm M}$ is the ratio of the total dispersion to the

402: pure statistical one.
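The explicit form of $E_{ij}$ used above can be verified by direct multiplication. This check is added for convenience; it is the rank-one case of the Sherman--Morrison formula. Writing the matrix (\ref{COVA}) as
$C_{ij}=\sigma_i\sigma_j(\delta_{ij}+\rho_i\rho_j)$,
\begin{displaymath}
\sum_{k=1}^{N}C_{ik}E_{kj}=\frac{\sigma_i}{\sigma_j}
\Bigl[\delta_{ij}+\rho_i\rho_j-\frac{\rho_i\rho_j}{1+\rho^2}
-\frac{\rho^2\rho_i\rho_j}{1+\rho^2}\Bigr]=\delta_{ij}.
\end{displaymath}
With this $E_{ij}$,
\begin{displaymath}
-a=\phi_1^2-\frac{(\vec\rho\cdot\vec\phi_1)^2}{1+\rho^2}
=\phi_1^2\,\frac{1+\rho^2(1-z_1^2)}{1+\rho^2},
\end{displaymath}
and $D_{\rm M}^{\rm A}(\hat\theta)=-1/a$ reproduces Eqn.~(\ref{DISPCA}).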

403: If $\vec\rho$ and $\vec\phi_1$ are collinear the dispersion of the estimator is

404: \begin{displaymath}

405: D^{A,\parallel}_{\rm M}(\hat\theta)=\frac{1+\rho^2}

406: {\phi_1^2},

407: \end{displaymath}

which coincides with the SCE dispersion (\ref{DISPSA}).
One can see that if $\vec\rho$ and $\vec\phi_1$ are neither collinear nor orthogonal
the SCE dispersion (\ref{DISPSA})
is always larger than the CME dispersion (\ref{DISPCA}).
This can be readily explained qualitatively.
For the SCE the fitted curve tightly follows the
data points and, if these points are shifted due to fluctuations of the systematic
errors, the parameter acquires a corresponding systematic error.
At the same time, since for the CME the information on the data
correlations is explicitly included in $\chi^2$, a correlated
fluctuation of the data due to a systematic shift does not necessarily lead to
a shift of the fitted curve, and the parameter deviation is smaller than
for the SCE.
The exception occurs if $z_1^2=1$, i.e. when $\vec\rho$ and $\vec\phi_1$ are collinear
and the systematic shift can be perfectly
compensated by a change of the parameter.
If these vectors are orthogonal the CME dispersion is
\begin{displaymath}
D^{A,\perp}_{\rm M}(\hat\theta)=\frac{1}{\phi_1^2},
\end{displaymath}
i.e. it is just the same as the dispersion of the SCE
applied to the data set without correlations (\ref{DISPS}).
Qualitatively this corresponds to a measurement scheme in which the
systematic shifts for the different points compensate each other,
e.g. as in the example considered at the end of Sec.~1.

433:

In modern experiments the systematic errors are often of the same order
as the statistical ones, so that $\rho\gg 1$ if $N\gg 1$.
In this limit, and if $\vec\rho$ and $\vec\phi_1$ are not collinear,

437: \begin{equation}

438: D_{\rm M}^{\rm A}(\hat\theta)\approx\frac{1}{\phi_1^2(1-z^2_1)}

439: \label{eqn:dispmr}

440: \end{equation}

441: and

442: \begin{equation}

443: D_0^{\rm A}(\hat\theta)\approx\frac{\rho^2 z^2_1}{\phi_1^2}.

444: \label{eqn:disp0r}

445: \end{equation}

446: One can see that in the second case

447: the estimator standard deviation rises linearly with

448: the increase of the systematics, whereas the CME dispersion saturates.

This difference can be illustrated by a numerical example
inspired by elastic proton-proton scattering. Let us choose
$$
f_i=U\exp(-V x_i),~~~~x_i=0.1 i,
$$
where $U=100$, $V=10$, $i=1\ldots 9$.

455: Generating 100 data sets (\ref{CADDSET}) with these $f_i$ and

456: \begin{equation}

457: \sigma_i=0.01\sqrt{\frac{U}{f_i}},~~~~s_i=\frac{\kappa}{x_i}

458: \label{eqn:testset}

459: \end{equation}

we minimized the functionals (\ref{SIMCHI}) and (\ref{CORCHI}), varying
$U$ and $V$ to obtain their estimators $\hat U$ and $\hat V$.
The values of $(\hat U-U)^2$ and $(\hat V-V)^2$
for all of the generated data sets were averaged to obtain the
estimator dispersions.
The results for the standard deviation of $\hat U$ at different values of
$\kappa$ are given in
Fig.~\ref{fig:disp} (the picture for $\hat V$ is similar).
One can see that at large $\kappa$ the CME and the SCE standard deviations
differ by a factor of 3.

470:

An example of the dispersion suppression observed in
the analysis of real experimental
data can be found in Ref.~\cite{Alekhin:1995dz}.
In that paper we performed a
leading order QCD fit to the inclusive deep inelastic scattering data
of Refs.~\cite{Benvenuti:1989rh,Benvenuti:1990fm}
obtained by the BCDMS collaboration
in order to determine the parton distribution functions and the
value of the strong coupling constant $\alpha_{\rm s}$. The two different estimators
were used and different estimates were obtained. For the
SCE the standard deviation of $\alpha_{\rm s}(M_{\rm Z})$
is 0.015, while for the CME it is 0.007.
The difference in the gluon distribution bounds for these estimators
is shown in Fig.~\ref{fig:bcdms}. One can see that the standard deviation
of the gluon distribution for the CME is also about
a half of the SCE standard deviation.

487:

If $z_1^2\ne 1$, the change of the CME dispersion
after adding a new point to a large sample, as defined by
Eqn.~(\ref{eqn:largeset}), is

491: \begin{displaymath}

492: \Delta D_{\rm M}^{\rm A}(\hat\theta)\approx-\frac{1}{\phi_1^4(1-z_1^2)^2}

493: \frac{1}{\sigma_0^2}

494: \left[f_0'(\theta_0)

495: -\frac{\phi_1 z_1}{\rho}s_0\right]^2.

496: \end{displaymath}

This change is never positive, which proves the CME
consistency. Recall that this is not necessarily so for the
SCE (see Sec.~1). The same conclusion can be drawn
from the comparison of Eqns.~(\ref{eqn:dispmr})
and (\ref{eqn:disp0r}). Indeed, the CME dispersion falls with the
increase of the statistical significance of the data set (i.e. a decrease of
$\sigma$ or a rise of $N$) while the SCE dispersion does not.
Note that due to the consistency of the CME
the filtration procedure described in Sec.~1
is not meaningful for it.
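The expression for $\Delta D_{\rm M}^{\rm A}(\hat\theta)$ given above can be recovered as follows (a sketch under the large-sample conditions (\ref{eqn:largeset})). From Eqn.~(\ref{eqn:dispmr}),
\begin{displaymath}
\frac{1}{D_{\rm M}^{\rm A}(\hat\theta)}\approx
\phi_1^2-\frac{(\vec\rho\cdot\vec\phi_1)^2}{\rho^2}.
\end{displaymath}
The new point changes $\phi_1^2$, $(\vec\rho\cdot\vec\phi_1)$ and $\rho^2$ by
$[f_0'(\theta_0)]^2/\sigma_0^2$, $s_0f_0'(\theta_0)/\sigma_0^2$ and
$s_0^2/\sigma_0^2$, respectively, so that to first order
\begin{displaymath}
\Delta\frac{1}{D_{\rm M}^{\rm A}(\hat\theta)}\approx\frac{1}{\sigma_0^2}
\left[f_0'(\theta_0)-\frac{\phi_1 z_1}{\rho}s_0\right]^2\ge 0,
\end{displaymath}
and $\Delta D_{\rm M}^{\rm A}=-(D_{\rm M}^{\rm A})^2\,
\Delta(1/D_{\rm M}^{\rm A})$ gives the quoted result. The increment is a perfect square, which is why its sign does not depend on the new point.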

507:

508: \begin{figure}[t]

509: \centerline{\psfig{figure=f1.ps,height=7cm}}

510: \caption{The standard deviations of SCE (circles) and CME (squares)

511: for $\hat U$ at different scales of systematic errors $\kappa$.

512: The lines correspond to the calculation performed with

513: the two-dimensional generalization of

514: Eqns.~(9,16).}

515: %Eqns.~(\ref{DISPCA},\ref{DISPSA}).}

516: \label{fig:disp}

517: \end{figure}

518:

519: The CME bias is

520: \begin{displaymath}

521: B_{\rm M}^{\rm A}(\hat\theta)=-\frac{\phi_1\phi_2}{2}

522: \left[D_{\rm M}^{\rm A}(\hat\theta)\right]^2\left(z_{12}

523: -\frac{\rho^2}{1+\rho^2}z_1 z_2\right),

524: \end{displaymath}

which vanishes for the linear data model and,
contrary to the SCE bias, saturates in the limit $\rho\gg 1$.

527: In the numerical example (\ref{eqn:testset}) at $\kappa=0.007$

528: the CME bias is 0.07, whereas the SCE bias is 0.13.

529:

530: \begin{figure}[t]

531: \centerline{\psfig{figure=bcdms.ps,height=7cm}}

532: \caption{Bounds of gluon distribution obtained from the

533: LO QCD fit to BCDMS data

534: with different estimators (the SCE: a; the CME: b).

535: Full lines correspond to the total experimental errors, dashed ones -- to

536: the statistical only.}

537: \label{fig:bcdms}

538: \end{figure}

539:

For multiplicative systematic errors
the covariance matrix is unknown a priori and one has to calculate
it using the parameter estimator. Proceeding in this
way, in the minimization of the functional (\ref{CORCHI}) we get

544: \begin{equation}

545: a=-\sum_{i,j=1}^{N}f_i'(\theta^0)E_{ij}f_j'(\theta^0)-

546: \frac{1}{2}\sum_{i,j=1}^{N}E_{ij}''C_{ij}.

547: \label{eqn:dispm}

548: \end{equation}

The difference from the corresponding expression for the case
of additive systematic errors
lies in the second term of Eqn.~(\ref{eqn:dispm}).

552: For the linear data model this term is

553: $$

554: a^{(2)}=\frac{1}{2}\sum_{i,j=1}^{N}E_{ij}''C_{ij}=

555: \frac{\phi_3^2}{2(1+\rho^2)^2}

556: \left[\rho^4(z^2_3-1)-3\rho^2 z^2_{3}+1\right],

557: $$

558: where

559: \begin{displaymath}

560: \phi_3^i=\rho_i'=\frac{\rho_i}{f_i}f_i'(\theta^0)=\eta_i\phi_1^i,

561: \end{displaymath}

562: $\phi_3$ is modulus of $\vec\phi_3$

563: and $z_{3}$ is the cosine of the angle between $\vec\phi_3$ and $\vec\rho$.

564: The ratio of the second term of Eqn.~(\ref{eqn:dispm}) to the first term

565: $a^{(1)}=\sum f_i'(\theta^0)E_{ij}f_j'(\theta^0)$ is

566: \begin{equation}

567: \frac{a^{(2)}}{a^{(1)}}=

568: \frac{\phi_3^2}{\phi_1^2}\cdot

569: \frac{\rho^4(z^2_3-1)-3\rho^2 z^2_{3}+1}

570: {(1+\rho^2)^2}\xi_{\rm M}.

571: \label{eqn:dispmult}

572: \end{equation}

If $\xi_{\rm M}\sim O(1)$ (which is valid for most real cases),
$a^{(2)}\sim O(\eta^2)a^{(1)}$ for all values of $\rho$, i.e. it
is suppressed compared with the first term for small $\eta$.
Neglecting, as elsewhere, the third and fourth central moments of $\{y_i\}$,
one obtains $\langle X^2\rangle\approx -a$, and
the estimator dispersion for multiplicative
systematic errors is $D_{\rm M}^{\rm M}\approx D_{\rm M}^{\rm A}$.

580:

In the case of multiplicative systematic errors Eqn.~(\ref{BASEQN})
is nonlinear even for the linear data model.
As a consequence, the expressions responsible for the bias,

584: $$

585: \left\langle X \right\rangle=

586: \frac{1}{2}\sum_{i,j=1}^{N} E_{ij}' C_{ij},

587: $$

588: $$

589: b=3 \sum_{i,j=1}^{N}

590: f_i'(\theta^0)E_{ij}f_j''(\theta^0)+

591: 3 \sum_{i,j=1}^{N}f_i'(\theta^0) E_{ij}'f_j'(\theta^0)+

592: \frac{1}{2}\sum_{i,j=1}^{N}E_{ij}'''C_{ij},

593: $$

594: \begin{equation}

595: \left\langle X Y\right\rangle=\sum_{i,j=1}^{N}

596: f_i'(\theta^0)E_{ij}f_j''(\theta^0)+

597: 2 \sum_{i,j=1}^{N}

598: f_i'(\theta^0) E_{ij}' f_j'(\theta^0)-

599: \frac{1}{4}\sum_{i,j=1}^{N}E_{ij}''C_{ij}\sum_{i,j=1}^{N} E_{ij}' C_{ij},

600: \label{eqn:biasm}

601: \end{equation}

do not vanish even if $f_i''(\theta)$ is equal to zero.
Meanwhile the bias due to the estimator nonlinearity is small compared with
the estimator standard deviation. Since
$1/D_{\rm M}^{\rm M}\approx \langle X^2\rangle\approx -a$, the bias of the estimator with
multiplicative systematic errors is

607: \begin{equation}

608: B_{\rm M}^{\rm M}(\hat\theta)\approx\sqrt{D_{\rm M}^{\rm M}(\hat\theta)}

609: \left[\frac{\left\langle X\right\rangle}{\sqrt{-a}}+

610: \frac{\left\langle X Y\right\rangle-b/2}{(-a)^{3/2}}\right].

611: \label{eqn:biasmult}

612: \end{equation}

613: The first term in the brackets of Eqn.~(\ref{eqn:biasmult}) is

614: \begin{equation}

615: \frac{\left\langle X\right\rangle}{\sqrt{-a}}\approx

616: -\frac{\phi_3}{\phi_1}

617: \frac{\rho z_3}{1+\rho^2}\sqrt{\xi_{\rm M}}\sim O(\eta\sqrt{\xi_{\rm M}}).

618: \label{eqn:biasx}

619: \end{equation}
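The estimate (\ref{eqn:biasx}) can be traced as follows. This is a sketch, assuming that the $\sigma_i$ do not depend on $\theta$ and that $E_{ij}$ depends on $\theta$ only through $\rho_i=\eta_i f_i(\theta)/\sigma_i$, so that $\rho_i'=\phi_3^i$. Differentiating the explicit form of $E_{ij}$ and contracting with
$C_{ij}=\sigma_i\sigma_j(\delta_{ij}+\rho_i\rho_j)$ one gets
\begin{displaymath}
\frac{1}{2}\sum_{i,j=1}^{N}E_{ij}'C_{ij}
=\frac{(\vec\rho\cdot\vec\phi_3)\rho^2}{1+\rho^2}-(\vec\rho\cdot\vec\phi_3)
=-\frac{\rho\phi_3 z_3}{1+\rho^2},
\end{displaymath}
and division by $\sqrt{-a}\approx\phi_1/\sqrt{\xi_{\rm M}}$ gives
Eqn.~(\ref{eqn:biasx}). Since $\phi_3^i=\eta_i\phi_1^i$, the ratio $\phi_3/\phi_1$ is of the order of the typical $\eta_i$.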

620: The contribution to the second term in brackets of Eqn.~(\ref{eqn:biasmult})

621: from $\sum f_i'(\theta^0) E_{ij}'f_j'(\theta^0)$ is proportional to

622: $$

623: \frac{\phi_3}{\phi_1}\frac{\rho z_1}{(1+\rho^2)}

624: \left(\frac{\rho^2}{1+\rho^2}z_1 z_3-z_{13}\right)\xi_{\rm M}^{3/2}

625: $$

626: and hence it is $\sim O(\eta\xi_{\rm M}^{3/2})$. As one can conclude from

627: Eqns.~(\ref{eqn:dispmult},\ref{eqn:biasx}) the contribution

628: to the same term from

629: $\sum E_{ij}''C_{ij}\cdot\sum E_{ij}' C_{ij}$ is $O(\eta^3\xi_{\rm M}^{3/2})$.

630: And finally since

$$
\frac{1}{2}
\sum_{i,j=1}^N E_{ij}'''C_{ij}=\frac{\rho z_3\phi_3^3}{(1+\rho^2)^3}
\left[\rho^4(z_3^2-1)+\rho^2(1-3z_3^2)+2\right]
$$

636: the contribution to Eqn.~(\ref{eqn:biasmult}) coming from this term

637: is $O(\eta^3\xi_{\rm M}^{3/2})$.

In summary, for the linear data model
the estimator bias is a sum of terms
$O(\eta^{p}\xi_{\rm M}^{q})\sqrt{D^{\rm M}_{\rm M}}$
with $p\ge 1$ and $q\le 3/2$.
Besides, at small $\rho$ all four contributions to the bias
which survive for the linear data model are
$\sim \rho$, while at large $\rho$
they are $\sim 1/\rho$. Summarizing, one can conclude that
the estimator bias is negligible except in the extreme cases with very large
$\xi_{\rm M}$.

648:

An explicit estimate of the bias can be obtained from
Eqns.~(\ref{eqn:biasm},\ref{eqn:biasmult}).
However, this requires rather lengthy calculations, and a
simpler tool for the bias evaluation is desirable.
A convenient way to do this is to trace the net residual

654: \begin{displaymath}

655: R=-\frac{1}{N}\sum_{i=1}^{N}\frac{f_i(\hat\theta)-y_i}

656: {\sqrt{\sigma_i^2+s_i^2}}.

657: \end{displaymath}

Expanding $f_i(\theta)$ near $\theta_0$
and keeping only the first term in Eqn.~(\ref{GENBIAS}),
one obtains for the sample (\ref{CADDSET})
$$
R\approx \frac{1}{N}
\sum_{i=1}^{N}\frac{\mu_i+\lambda \rho_i}{\sqrt{1+\rho_i^2}}
-(\hat\theta-\theta_0)
\frac{1}{N}\sum_{i=1}^{N}\frac{\phi_1^i}{\sqrt{1+\rho_i^2}}.
$$

If the estimator is unbiased, the value of $R$ averaged over
the samples is equal to zero. Nevertheless, particular values of $R$
may differ from zero due to fluctuations.
For limited $\xi_{\rm M}$ the dispersion of $R$ is

671: \begin{equation}

672: D(R)=\frac{1}{N^2}\sum_{i,j=1}^{N}\frac{\delta _{ij}+\rho_i\rho_j}

673: {\sqrt{1+\rho_i^2}\sqrt{1+\rho_j^2}}+O(1/N).

674: \label{eqn:Rdisp}

675: \end{equation}
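As an illustration, let all points have the same ratio $\rho_i=\rho_0$. This is a simplified case assumed here and is not a statement about a particular data set. The leading term of Eqn.~(\ref{eqn:Rdisp}) is then
\begin{displaymath}
D(R)\approx\frac{1}{N^2}\left[\frac{N}{1+\rho_0^2}
+\frac{N^2\rho_0^2}{1+\rho_0^2}\right]
=\frac{\rho_0^2}{1+\rho_0^2}+\frac{1}{N(1+\rho_0^2)},
\end{displaymath}
which tends to $\rho_0^2/(1+\rho_0^2)$ at large $N$. It therefore does not decrease with the number of points, and it is of order unity when the systematic error per point is not small compared with the statistical one.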

If the analyzed data come from a single experiment
with dominating systematics (i.e. with $\rho > 1$), then
$D(R)\sim 1$. In particular, for the BCDMS data
of Refs.~\cite{Benvenuti:1989rh,Benvenuti:1990fm} $D(R)\approx 0.7$.
For $N_{exp}$ independent experiments involved in the analysis
$D(R)\sim 1/N_{exp}$. Comparing the net residual $R$
with the square root of this value gives a rough indication of the estimator bias.
A more definite conclusion
can be drawn by comparing $R$
with the square root of its dispersion calculated using Eqn.~(\ref{eqn:Rdisp}).
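
These orders of magnitude can be illustrated by a simple sketch under an assumption not made in the text: let all points of a single experiment have the same ratio $\rho_i=\rho_0$. Up to the $O(1/N)$ correction, Eqn.~(\ref{eqn:Rdisp}) then gives
$$
D(R)=\frac{1}{N^2}\left[\frac{N}{1+\rho_0^2}+\frac{N^2\rho_0^2}{1+\rho_0^2}\right]
=\frac{1}{N(1+\rho_0^2)}+\frac{\rho_0^2}{1+\rho_0^2},
$$
which tends to $\rho_0^2/(1+\rho_0^2)$ at large $N$ and approaches unity when the systematics dominate. If instead the $N$ points are split into $N_{exp}$ experiments of $N/N_{exp}$ points each, with mutually independent systematic shifts, so that the $\rho_i\rho_j$ term contributes only for $i$ and $j$ within the same experiment (again an assumption made for illustration), the correlated term becomes
$$
\frac{1}{N^2}N_{exp}\left(\frac{N}{N_{exp}}\right)^2\frac{\rho_0^2}{1+\rho_0^2}
=\frac{1}{N_{exp}}\frac{\rho_0^2}{1+\rho_0^2},
$$
in line with the estimate $D(R)\sim 1/N_{exp}$.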

686:

687: \section{PLANNING OF THE COUNTING EXPERIMENTS}

688:

In the particular case when the differential cross section with respect to the
variable $x$ is measured, the predicted average
number of events in the $i$-th bin is

692: $$

693: \left\langle N_i\right\rangle =Lf_i\Delta x_i\beta_i,

694: $$

where $L$ is the integrated luminosity of the experiment,
$\beta_i$ is the registration efficiency, and $\Delta x_i$ is
the bin width. Neglecting the fluctuations of $N_i$,
the statistical error on the $i$-th measurement is

699: $$

700: \sigma_i=\frac{\sqrt{\left\langle N_i\right\rangle}}{L\Delta x_i\beta_i}

701: $$

702: and

703: $$

704: \frac{1}{\sigma_i^2}=\frac{L\beta_i}{f_i}\Delta x_i.

705: $$

706: The scalar product of the vectors $\vec \rho$ and $\vec \phi$

707: is

708: $$

709: \left(\vec\rho \cdot \vec\phi\right)=L\sum_{i=1}^{N}\frac{f_i's_i}

710: {f_i}\beta_i\Delta x_i

711: $$

712: and

713: $$

714: \phi^2=L\sum_{i=1}^{N}\frac{\left[f_i'\right]^2}

715: {f_i}\beta_i\Delta x_i,~~~~~~

716: \rho^2=L\sum_{i=1}^{N}\frac{\left[s_i\right]^2}

717: {f_i}\beta_i\Delta x_i.

718: $$
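These sums can be checked term by term. A short sketch, assuming (as the normalization of the net residual $R$ suggests) that the components of the vectors are $\rho_i=s_i/\sigma_i$ and $\phi_i=f_i'/\sigma_i$, with $f_i'$ the derivative of $f_i$ with respect to the fitted parameter:
$$
\sigma_i^2=\frac{\left\langle N_i\right\rangle}{\left(L\Delta x_i\beta_i\right)^2}
=\frac{f_i}{L\Delta x_i\beta_i},~~~~~~
\rho_i\phi_i=\frac{f_i's_i}{\sigma_i^2}
=L\frac{f_i's_i}{f_i}\beta_i\Delta x_i.
$$
Summing over $i$ reproduces $\left(\vec\rho \cdot \vec\phi\right)$; replacing $f_i's_i$ by $\left[f_i'\right]^2$ or $\left[s_i\right]^2$ gives $\phi^2$ and $\rho^2$ in the same way.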

For dense measurements these scalars can be
reduced to integrals over the measurement region $\Omega$:

721: $$

722: \left(\vec\rho \cdot \vec\phi\right)=L\int_{\Omega} f'(x)s(x) d\tilde{x}

723: $$

724: and

725: $$

726: \phi^2=L\int_{\Omega}\left[f'(x)\right]^2 d\tilde x,~~~~~~

727: \rho^2=L\int_{\Omega}\left[s(x)\right]^2 d\tilde x,

728: $$

where $d\tilde{x}=[\beta(x)/f(x)]\,dx$.
The latter expressions can be used in
the equations for the estimator dispersions\footnote{As a result
one obtains the Fisher information for the
correlated data case.}.
This approach is convenient for optimizing a future experiment,
since it allows one to analyze the integrated expressions
in order to search for the optimal region of measurements.
For simple functions $f(x)$, $\beta(x)$, and $s(x)$ such
an analysis can certainly be performed analytically.
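
As an illustration, consider a hypothetical one-parameter case that is not taken from the analysis above: a flat cross section $f(x)=\theta$ on $\Omega=[0,X]$, a constant efficiency $\beta(x)=\beta_0$, and a pure normalization uncertainty $s(x)=\varepsilon\theta$. Then $f'=1$, $d\tilde{x}=(\beta_0/\theta)\,dx$, and
$$
\phi^2=\frac{L\beta_0X}{\theta},~~~~~~
\rho^2=L\beta_0X\varepsilon^2\theta,~~~~~~
\left(\vec\rho \cdot \vec\phi\right)=L\beta_0X\varepsilon.
$$
Assuming further that the estimator dispersion for a single systematic source is the inverse of $\vec\phi^{\,T}(I+\vec\rho\vec\rho^{\,T})^{-1}\vec\phi$ (the generalized least-squares form, with the matrix inverted by the Sherman-Morrison formula), one gets
$$
D=\left[\phi^2-\frac{\left(\vec\rho \cdot \vec\phi\right)^2}{1+\rho^2}\right]^{-1}
=\frac{1+\rho^2}{\phi^2}
=\frac{\theta}{L\beta_0X}+\varepsilon^2\theta^2.
$$
In this sketch the first term falls with the luminosity and the width of the measured region, while the second term is a floor set by the normalization uncertainty that no increase of statistics can reduce.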

739:

740: \section{CONCLUSION}

741:

In conclusion, the CME is a convenient tool
for the analysis of data sets with account of the correlations due to
systematic errors. The CME is consistent
for the realistic cases (when the systematic errors on the fitted parameters are
not extremely large compared with the statistical ones),
and its dispersion is always smaller than
the dispersion of the $\chi^2$ estimator without account of correlations.
The estimator bias is negligible for the realistic
cases if the covariance matrix is calculated iteratively during the fit
using the parameter estimator itself. The analytical formula for the covariance
matrix inversion allows fast and precise calculations
even for very large data sets. The latter
is especially important in view of the numerical instabilities occurring
in fits to precise data in the case of large correlations between
the fitted parameters (see in this connection
Ref.~\cite{Alekhin:1994}).

758:

Particular attention should be paid to
the connection between the estimator dispersion
and the confidence interval. For a known distribution
of the estimator the confidence interval
can be easily calculated
(e.g., it is well known that for the Gaussian distribution
one standard deviation corresponds to the 68\% confidence level).
Unfortunately, due to the possibly non-Gaussian nature of the systematic errors,
one cannot prove that an estimator accounting for systematics
is Gaussian distributed.
However, for a large number of systematic errors of comparable scale
the estimator should obey the Gaussian distribution by virtue of the central
limit theorem. Otherwise robust estimates of the
confidence intervals, e.g. those based on Chebyshev's inequality, should be used.
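For orientation, Chebyshev's inequality for an estimator $\hat\theta$ with the mean $\langle\hat\theta\rangle$ and a finite dispersion $D$ reads
$$
P\left(|\hat\theta-\langle\hat\theta\rangle|\ge k\sqrt{D}\right)\le\frac{1}{k^2}.
$$
Thus the interval of $\pm 2\sqrt{D}$ has a confidence level of at least 75\% and that of $\pm 3\sqrt{D}$ at least 89\%, compared with about 95\% and 99.7\% for the Gaussian distribution, while for the one standard deviation interval the inequality gives no guarantee at all.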

773:

774: {\bf Acknowledgments}

775:

I am indebted to S.~Keller for valuable discussions and comments.

777: The work was supported by RFBR grant 00-02-17432.

778:

779: \begin{thebibliography}{99}

780:

781: \bibitem{:2000nr}

782:   [ALEPH, DELPHI, L3, OPAL Collaborations,

783: SLD Heavy Flavour Group, and Electroweak Group], CERN-EP-2000-016.

784:

785: \bibitem{Catani:2000jh}

786: S.~Catani {\it et al.},

787: hep-ph/0005025.

788:

789: \bibitem{D'Agostini:1995fv}

790: G.~D'Agostini, hep-ph/9512295.

791:

792: \bibitem{Swartz:1994qz}

793: M.~L.~Swartz, hep-ph/9411353.

794:

795: \bibitem{Gates:1995rq}

796: E.~Gates, L.~M.~Krauss and M.~White,

797: Phys.\ Rev.\  {\bf D51}, 2631 (1995)

798: [hep-ph/9406396].

799:

800: \bibitem{Seibert:1994sf}

801: D.~Seibert,

802: Phys.\ Rev.\  {\bf D49}, 6240 (1994)

803: [hep-lat/9305014].

804:

805: \bibitem{Michael:1994yj}

806: C.~Michael,

807: Phys.\ Rev.\  {\bf D49}, 2616 (1994)

808: [hep-lat/9310026].

809:

810: \bibitem{D'Agostini:1994uj}

811: G.~D'Agostini,

812: Nucl.\ Instrum.\ Meth.\  {\bf A346}, 306 (1994).

813:

814: \bibitem{Swartz:1996hc}

815: M.~L.~Swartz,

816: Phys.\ Rev.\  {\bf D53}, 5268 (1996)

817: [hep-ph/9509248].

818:

819: \bibitem{Daniell:1984ea}

820: G.~J.~Daniell, A.~J.~Hey and J.~E.~Mandula,

821: Phys.\ Rev.\  {\bf D30}, 2230 (1984).

822:

823: \bibitem{Michael:1995sz}

824: C.~Michael and A.~McKerrell,

825: Phys.\ Rev.\  {\bf D51}, 3745 (1995)

826: [hep-lat/9412087].

827:

828: \bibitem{Alekhin:1995ij}

829: S.~I.~Alekhin,

830: IFVE-95-48.

831:

832: \bibitem{JAMES}

W.~T.~Eadie, D.~Drijard, F.~E.~James, M.~Roos and B.~Sadoulet,
{\it Statistical Methods in Experimental Physics}, North Holland, 1971.

835:

836: \bibitem{Alekhin:1995dz}

837: S.~I.~Alekhin,

838: IFVE-95-65.

839:

840: \bibitem{Benvenuti:1989rh}

841: A.~C.~Benvenuti {\it et al.}  [BCDMS Collaboration],

Phys.\ Lett.\  {\bf B223}, 485 (1989).

843:

844: \bibitem{Benvenuti:1990fm}

845: A.~C.~Benvenuti {\it et al.}  [BCDMS Collaboration],

Phys.\ Lett.\  {\bf B237}, 592 (1990).

847:

848: \bibitem{Alekhin:1994}

849: S.~I.~Alekhin,

850: IFVE-94-70.

851:

852: \end{thebibliography}

853:

854: \end{document}

855: