0303:cs0303015/cl2.tex

1: \documentclass[12pt]{article}

2:

3: \textwidth6.25in \textheight8.5in \oddsidemargin.25in

4: \topmargin0in

5:

6: \usepackage{epsfig}

7:

8: %\renewcommand{\baselinestretch}{1.7}

9:

10: \def\be{\begin{equation}}

11: \def\ee{\end{equation}}

12: \def\la{\langle}

13: \def\ra{\rangle}

14: \def\IP{\hbox{\rm I\kern -1.6pt{\rm P}}}

15: \def\IC{{\hbox{\rm C\kern-.58em{\raise.53ex\hbox{$\scriptscriptstyle|$}}

16:     \kern-.55em{\raise.53ex\hbox{$\scriptscriptstyle|$}} }}}

17: \def\IN{\hbox{I\kern-.2em\hbox{N}}}

18: \def\IR{\hbox{\rm I\kern-.2em\hbox{\rm R}}}

19: \def\ZZ{\hbox{{\rm Z}\kern-.3em{\rm Z}}}

20: \def\IT{\hbox{\rm T\kern-.38em{\raise.415ex\hbox{$\scriptstyle|$}} }}

21: %\newtheorem{theorem}{Theorem}[section]

22: \newtheorem{theorem}{Theorem}

23: \newtheorem{lemma}[theorem]{Lemma}

24: \newtheorem{sublemma}[theorem]{Sublemma}

25: \newtheorem{proposition}[theorem]{Proposition}

26: \newtheorem{corollary}[theorem]{Corollary}

27: \newtheorem{remark}[theorem]{Remark}

28:

29: \begin{document}

30:

31: \title{Statistical efficiency of curve fitting algorithms}

32: \author{N. Chernov and C. Lesort\\

33: Department of Mathematics\\

34: University of Alabama at Birmingham\\

35: Birmingham, AL 35294, USA}

36: \date{\today}

37: \maketitle

38:

39: \begin{abstract}

40: We study the problem of fitting parametrized curves to noisy data.

41: Under certain assumptions (known as Cartesian and radial

42: functional models), we derive asymptotic expressions for the bias

43: and the covariance matrix of the parameter estimates. We also

44: extend Kanatani's version of the Cramer-Rao lower bound, which he

45: proved for unbiased estimates only, to more general estimates that

46: include many popular algorithms (most notably, the orthogonal

47: least squares and algebraic fits). We then show that the

48: gradient-weighted algebraic fit is statistically efficient and

49: describe all other statistically efficient algebraic fits.

50: \end{abstract}

51:

52: \begin{center}

53: Keywords: least squares fit, curve fitting, circle fitting,

54: algebraic fit, Cramer-Rao bound, efficiency, functional model.

55: \end{center}

56:

57: \renewcommand{\theequation}{\arabic{section}.\arabic{equation}}

58:

59: \section{Introduction}

60: \label{secI} \setcounter{equation}{0}

61:

62: In many applications one fits a parametrized curve described by an

63: implicit equation $P(x,y;\Theta)=0$ to experimental data $(x_i,y_i)$,

64: $i=1,\ldots,n$. Here $\Theta$ denotes the vector of unknown parameters

65: to be estimated. Typically, $P$ is a polynomial in $x$ and $y$, and its

66: coefficients are unknown parameters (or functions of unknown

67: parameters). For example, a number of recent publications

68: \cite{ARW01,CBH01,GGS94,LM00,Sp97} are devoted to the problem of

69: fitting quadrics $Ax^2+ Bxy+ Cy^2+ Dx+ Ey+ F=0$, in which case

70: $\Theta=(A,B,C,D,E,F)$ is the parameter vector. The problem of fitting

71: circles, given by equation $(x-a)^2+ (y-b)^2 -R^2=0$ with three

72: parameters $a,b,R$, has also attracted attention

73: \cite{CO84,Ka98,La87,Sp96}.

74:

75: We consider here the problem of fitting general curves given by

76: implicit equations $P(x,y;\Theta)=0$ with $\Theta= (\theta_1, \ldots,

77: \theta_k)$ being the parameter vector. Our goal is to investigate

78: statistical properties of various fitting algorithms. We are interested

79: in their bias, covariance matrices, and the Cramer-Rao lower

80: bound.

81:

82: First, we specify our model. We denote by $\bar{\Theta}$ the true

83: value of $\Theta$. Let $(\bar{x}_{i} ,\bar{y}_{i})$,

84: $i=1,\ldots,n$, be some points lying on the true curve

85: $P(x,y;\bar{\Theta})=0$. Experimentally observed data points

86: $(x_i, y_i)$, $i=1,\ldots,n$, are perceived as random

87: perturbations of the true points $(\bar{x}_{i} ,\bar{y}_{i})$. We

88: use notation ${\bf x}_i = ({x}_i, {y}_i)^T$ and $\bar{\bf x}_i =

89: ({\bar x}_i,\bar{y}_i)^T$, for brevity. The random vectors ${\bf

90: e}_i={\bf x}_i -\bar{\bf x}_{i}$ are assumed to be independent and

91: have zero mean. Two specific assumptions on their probability

92: distribution can be made, see \cite{BC86}:

93: \begin{itemize} \item[] {\em Cartesian model}: Each ${\bf e}_i$

94: is a two-dimensional normal vector with covariance matrix $\sigma^2_i

95: I$, where $I$ is the identity matrix. \item[] {\em Radial model}: ${\bf

96: e}_i = \xi_i {\bf n}_i$ where $\xi_i$ is a normal random variable

97: ${\cal N}(0,\sigma^2_i)$, and ${\bf n}_i$ is a unit normal vector to

98: the curve $P(x,y;\bar{\Theta})=0$ at the point $\bar{\bf x}_i$.

99: \end{itemize} Our analysis covers both models, Cartesian and radial.

100: For simplicity, we assume that $\sigma^2_i=\sigma^2$ for all $i$,

101: but note that our results can be easily generalized to arbitrary

102: $\sigma_i^2>0$.

103:

104: Concerning the true points $\bar{\bf x}_i$, $i=1,\ldots,n$, two

105: assumptions are possible. Many researchers \cite{Ch65,Ka96,Ka98}

106: consider them as fixed, but unknown, points on the true curve. In

107: this case their coordinates $(\bar{x}_{i} ,\bar{y}_{i})$ can be

108: treated as additional parameters of the model (nuisance

109: parameters). Chan \cite{Ch65} and others \cite{An81,BC86} call

110: this assumption a {\em functional model}. Alternatively, one can

111: assume that the true points $\bar{\bf x}_i$ are sampled from the

112: curve $P(x,y ;\bar{\Theta} )=0$ according to some probability

113: distribution on it. This assumption is referred to as a {\em

114: structural model} \cite{An81,BC86}. We only consider the

115: functional model here.

116:

117: It is easy to verify that maximum likelihood estimation of the

118: parameter $\Theta$ for the functional model is given by the

119: orthogonal least squares fit (OLSF), which is based on

120: minimization of the function

121: \be

122:        {\cal F}_1(\Theta) =  \sum_{i=1}^n [d_i(\Theta)]^2

123:          \label{Fmain1}

124: \ee

125: where $d_i(\Theta)$ denotes the distance from the point ${\bf x}_i$ to

126: the curve $P(x,y;\Theta)=0$. The OLSF is the method of choice in

127: practice, especially when one fits simple curves such as lines and

128: circles. However, for more general curves the OLSF becomes intractable,

129: because the precise distance $d_i$ is hard to compute. For example,

130: when $P$ is a generic quadric (ellipse or hyperbola), the computation

131: of $d_i$ is equivalent to solving a polynomial equation of degree four,

132: and its direct solution is known to be numerically unstable, see

133: \cite{ARW01,GGS94} for more detail. Then one resorts to various

134: approximations. It is often convenient to minimize

135: \be

136:        {\cal F}_2(\Theta) =  \sum_{i=1}^n [P(x_i,y_i;\Theta)]^2

137:          \label{Fmain2}

138: \ee

139: instead of (\ref{Fmain1}). This method is referred to as a

140: (simple) {\em algebraic fit} (AF); in this case one calls

141: $|P(x_i,y_i;\Theta)|$ the {\em algebraic distance}

142: \cite{ARW01,CBH01,GGS94} from the point $(x_i,y_i)$ to the curve.

143: The AF is computationally cheaper than the OLSF, but its accuracy

144: is often unacceptable, see below.

145:

146: The simple AF (\ref{Fmain2}) can be generalized to a {\em weighted

147: algebraic fit}, which is based on minimization of

148: \be

149:        {\cal F}_3(\Theta) =  \sum_{i=1}^n w_i\, [P(x_i,y_i;\Theta)]^2

150:          \label{Fmain3}

151: \ee

152: where $w_i=w(x_i,y_i;\Theta)$ are some weights, which may balance

153: (\ref{Fmain2}) and improve its performance. One way to define

154: weights $w_i$ results from a linear approximation to $d_i$:

155: $$

156:    d_i \approx \frac{|P(x_i,y_i;\Theta)|}

157:      {\|\nabla_{\bf x}P(x_i,y_i;\Theta)\|}

158: $$

159: where $\nabla_{\bf x}P=(\partial P/\partial x,\partial P/\partial

160: y)$ is the gradient vector, see \cite{Ta91}. Then one minimizes

161: the function

162: \be

163:        {\cal F}_4(\Theta) =  \sum_{i=1}^n \frac{[P(x_i,y_i;\Theta)]^2}

164:        {\|\nabla_{\bf x}P(x_i,y_i;\Theta)\|^2}

165:          \label{Fmain4}

166: \ee

167: This method is called the {\em gradient weighted algebraic fit} (GRAF).

168: It is a particular case of (\ref{Fmain3}) with $w_i = 1/ \|\nabla_{\bf

169: x}P(x_i,y_i;\Theta)\|^2$.
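
The linear approximation behind (\ref{Fmain4}) can be sketched as follows.
The symbol ${\bf x}_i^{\ast}$, denoting the point of the curve
$P(x,y;\Theta)=0$ closest to ${\bf x}_i$, is introduced here only for
illustration, and we assume that $P$ is smooth with a nonvanishing gradient
near the curve. Since $P({\bf x}_i^{\ast};\Theta)=0$ and the vector
${\bf x}_i-{\bf x}_i^{\ast}$ is normal to the curve, hence parallel to
$\nabla_{\bf x}P({\bf x}_i^{\ast};\Theta)$, a first order Taylor expansion gives
$$
   P({\bf x}_i;\Theta) \approx
   \la \nabla_{\bf x}P({\bf x}_i^{\ast};\Theta),\,
   {\bf x}_i-{\bf x}_i^{\ast} \ra,
   \qquad
   |P({\bf x}_i;\Theta)| \approx
   \|\nabla_{\bf x}P({\bf x}_i^{\ast};\Theta)\|\, d_i
$$
To the same order, the gradient may be evaluated at ${\bf x}_i$ instead of
${\bf x}_i^{\ast}$. For instance, for the quadric
$P=Ax^2+ Bxy+ Cy^2+ Dx+ Ey+ F$ the GRAF weight is the reciprocal of
$$
   \|\nabla_{\bf x}P(x,y;\Theta)\|^2 =
   (2Ax+By+D)^2 + (Bx+2Cy+E)^2
$$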

170:

171: The GRAF has been known since at least 1974 \cite{Tu74} and has recently

172: become standard for polynomial curve fitting

173: \cite{Ta91,LM00,CBH01}. The computational cost of GRAF depends on

174: the function $P(x,y;\Theta)$, but, generally, the GRAF is much

175: faster than the OLSF. It is also known from practice that the

176: accuracy of GRAF is almost as good as that of the OLSF, and our

177: analysis below confirms this fact. The GRAF is often claimed to be

178: a {\em statistically optimal} weighted algebraic fit, and we will

179: prove this fact as well.

180:

181: Not much has been published on statistical properties of the OLSF and

182: algebraic fits, apart from the simplest case of fitting lines and

183: hyperplanes \cite{Hu97}. Chan \cite{Ch65} and Berman and Culpin

184: \cite{BC86} investigated circle fitting by the OLSF and the simple

185: algebraic fit (\ref{Fmain2}) assuming the structural model. Kanatani

186: \cite{Ka96,Ka98} used the Cartesian functional model and considered a

187: general curve fitting problem. He established an analogue of the

188: Cramer-Rao lower bound for unbiased estimates of $\Theta$, which we

189: call here the Kanatani-Cramer-Rao (KCR) lower bound. He also showed that

190: the covariance matrices of the OLSF and the GRAF attain, to the leading

191: order in $\sigma$, his lower bound. We note, however, that in most

192: cases the OLSF and algebraic fits are {\em biased} \cite{BC86,Be89},

193: hence the KCR lower bound, as it is derived in \cite{Ka96,Ka98}, does

194: not immediately apply to these methods.

195:

196: In this paper we extend the KCR lower bound to biased estimates,

197: which include the OLSF and all weighted algebraic fits. We prove

198: the KCR bound for estimates satisfying the following mild

199: assumption:

200: \medskip

201:

202: \noindent{\bf Precision assumption}. For precise observations (when

203: ${\bf x}_i = \bar{\bf x}_i$ for all $1\leq i\leq n$), the estimate

204: $\hat{\Theta}$ is precise, i.e.

205: \be

206:   \hat{\Theta}(\bar{\bf x}_1, \ldots, \bar{\bf x}_n) = \bar{\Theta}

207:        \label{Tass}

208: \ee

209: It is easy to check that the OLSF and algebraic fits

210: (\ref{Fmain3}) satisfy this assumption. We will also show that all

211: unbiased estimates $\hat{\Theta}$ of $\Theta$ satisfy (\ref{Tass}).

212:

213: We then prove that the GRAF is, indeed, a statistically efficient

214: fit, in the sense that its covariance matrix attains, to the

215: leading order in $\sigma$, the KCR lower bound. On the other hand,

216: rather surprisingly, we find that GRAF is not the only

217: statistically efficient algebraic fit, and we describe all

218: statistically efficient algebraic fits. Finally, we show that

219: Kanatani's theory and our extension of it remain valid for the

220: radial functional model. Our conclusions are illustrated by

221: numerical experiments on circle fitting algorithms.

222:

223:

224: \section{Kanatani-Cramer-Rao lower bound}

225: \label{secKCR} \setcounter{equation}{0}

226:

227: Recall that we have adopted the functional model, in which the true

228: points $\bar{\bf x}_i$, $1\leq i\leq n$, are fixed. This automatically

229: makes the sample size $n$ fixed, hence, many classical concepts of

230: statistics, such as consistency and asymptotic efficiency (which

231: require taking the limit $n\to\infty$) lose their meaning. It is

232: customary, in the studies of the functional model of the curve fitting

233: problem, to take the limit $\sigma \to 0$ instead of $n\to\infty$, cf.\

234: \cite{Ka96,Ka98}. This is, by the way, not unreasonable from the

235: practical point of view: in many experiments, $n$ is rather small and

236: cannot be (easily) increased, so the limit $n\to \infty$ is of little

237: interest. On the other hand, when the accuracy of experimental

238: observations is high (thus, $\sigma$ is small), the limit $\sigma\to 0$

239: is quite appropriate.

240:

241: Now, let $\hat{\Theta}({\bf x}_1,\ldots,{\bf x}_n)$ be an arbitrary

242: estimate of $\Theta$ satisfying the precision assumption (\ref{Tass}).

243: In our analysis we will always assume that all the underlying functions

244: are regular (continuous, have finite derivatives, etc.), which is a

245: standard assumption \cite{Ka96,Ka98}.

246:

247: The mean value of the estimate $\hat{\Theta}$ is

248: \be

249:     E(\hat{\Theta}) =

250:   \int\cdots\int \hat{\Theta}({\bf x}_1,\ldots,{\bf x}_n)

251:   \, \prod_{i=1}^n f({\bf x}_i)\,

252:   d{\bf x}_1\cdots d{\bf x}_n

253:     \label{ET}

254: \ee

255: where $f({\bf x}_i)$ is the probability density function for the

256: random point ${\bf x}_i$, as specified by a particular model

257: (Cartesian or radial).

258: %For the Cartesian model

259: %$$

260: %   f({\bf x}_i) = \frac{1}{2\pi\sigma^2}\,

261: %  e^{-\frac{(x_i-\bar{x}_{i})^2 + (y_i-\bar{y}_{i})^2}{2\sigma^2}}

262: %$$

263: %is the normal density function. For the radial model, the integral

264: %variables only vary along the normal vectors ${\bf n}_i$ to the curve

265: %$P(x,y;\bar{\Theta})=0$ at the points $\bar{\bf x}_i$, and the density

266: %function is

267: %$$

268: %   f({\bf x}_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\,

269: %  e^{-\frac{(x_i-\bar{x}_{i})^2 + (y_i-\bar{y}_{i})^2}{2\sigma^2}}

270: %$$

271:

272: We now expand the estimate $\hat{\Theta}({\bf x}_1, \ldots, {\bf

273: x}_n)$ into a Taylor series about the true point $(\bar{\bf x}_1,

274: \ldots,

275: \bar{\bf x}_n)$ remembering (\ref{Tass}):

276: \be

277:    \hat{\Theta}({\bf x}_1, \ldots, {\bf x}_n) =

278:    \bar{\Theta} + \sum_{i=1}^n

279:    \Theta_i \times ({\bf x}_i - \bar{\bf x}_i)

280:    + {\cal O}(\sigma^2)

281:      \label{Texpand}

282: \ee

283: where

284: \be

285:   {\Theta}_i = \nabla_{{\bf x}_i}\hat{\Theta}

286:   (\bar{\bf x}_1, \ldots, \bar{\bf x}_n),

287:   \ \ \ \ \ i=1,\ldots,n

288:     \label{Ti}

289: \ee

290: and $\nabla_{{\bf x}_i}$ stands for the gradient with respect to

291: the variables $x_i,y_i$. In other words, $\Theta_i$ is a $k\times

292: 2$ matrix of partial derivatives of the $k$ components of the

293: function $\hat{\Theta}$ with respect to the two variables $x_i$

294: and $y_i$, and this derivative is taken at the point $(\bar{\bf

295: x}_1, \ldots, \bar{\bf x}_n)$.

296:

297: Substituting the expansion (\ref{Texpand}) into (\ref{ET}) gives

298: \be

299:    E(\hat{\Theta}) = \bar{\Theta} + {\cal O}(\sigma^2)

300:       \label{Tbias}

301: \ee

302: since $E({\bf x}_i - \bar{\bf x}_i)=0$. Hence, the bias of the

303: estimate $\hat{\Theta}$ is of order $\sigma^2$.

304:

305: It easily follows from the expansion (\ref{Texpand}) that the

306: covariance matrix of the estimate $\hat{\Theta}$ is given by

307: $$

308:   {\cal C}_{\hat{\Theta}} = \sum_{i=1}^n

309:   \Theta_i E[({\bf x}_i - \bar{\bf x}_i)({\bf x}_i - \bar{\bf x}_i)^T]

310:   \Theta_i^T + {\cal O}(\sigma^4)

311: $$

312: (it is not hard to see that the cubic terms ${\cal O}(\sigma^3)$

313: vanish because the normal random variables with zero mean also

314: have zero third moment, see also \cite{Ka96}). Now, for the

315: Cartesian model

316: $$

317:      E[({\bf x}_i - \bar{\bf x}_i)({\bf x}_i - \bar{\bf x}_i)^T]

318:      =\sigma^2 I

319: $$

320: and for the radial model

321: $$

322:      E[({\bf x}_i - \bar{\bf x}_i)({\bf x}_i - \bar{\bf x}_i)^T]

323:      =\sigma^2 {\bf n}_i {\bf n}_i^T

324: $$

325: where ${\bf n}_i$ is a unit normal vector to the curve

326: $P(x,y;\bar{\Theta})=0$ at the point $\bar{\bf x}_i$. Then we obtain

327: \be

328:   {\cal C}_{\hat{\Theta}} = \sigma^2 \sum_{i=1}^n

329:   \Theta_i \Lambda_i \Theta_i^T + {\cal O}(\sigma^4)

330:      \label{Csig0}

331: \ee

332: where $\Lambda_i=I$ for the Cartesian model and $\Lambda_i={\bf n}_i

333: {\bf n}_i^T$ for the radial model.

334: \\

335:

336: \noindent{\bf Lemma}. {\em We have $\Theta_i {\bf n}_i {\bf n}_i^T

337: \Theta_i^T = \Theta_i \Theta_i^T$ for each $i=1,\ldots,n$. Hence,

338: for both models, Cartesian and radial, the matrix ${\cal

339: C}_{\hat{\Theta}}$ is given by the same expression:}

340: \be

341:   {\cal C}_{\hat{\Theta}} = \sigma^2 \sum_{i=1}^n

342:   \Theta_i \Theta_i^T + {\cal O}(\sigma^4)

343:      \label{Csig}

344: \ee

345:

346: This lemma is proved in the Appendix.

347:

348: Our next goal is to find a lower bound for the matrix

349: \be

350:       {\cal D}_1:= \sum_{i=1}^n \Theta_i\Theta_i^T

351:         \label{calC1}

352: \ee

353: Following \cite{Ka96,Ka98}, we consider perturbations of the parameter

354: vector $\bar{\Theta} +\delta \Theta$ and the true points $\bar{\bf x}_i

355: + \delta \bar{\bf x}_i$ satisfying two constraints. First, since the

356: true points must belong to the true curve, $P(\bar{\bf

357: x}_i;\bar{\Theta})=0$, we obtain, by the chain rule,

358: \be

359:    \la \nabla_{{\bf x}}\, P(\bar{\bf x}_i;\bar{\Theta}), \delta \bar{\bf x}_i \ra

360:    + \la \nabla_{\Theta} P(\bar{\bf x}_i;\bar{\Theta}), \delta \Theta \ra = 0

361:       \label{Tcon1}

362: \ee

363: where $\la \cdot, \cdot \ra$ stands for the scalar product of vectors.

364: Second, since the identity (\ref{Tass}) holds for all $\Theta$, we get

365: \be

366:    \sum_{i=1}^n

367:    \Theta_i\, \delta \bar{\bf x}_i

368:    = \delta \Theta

369:      \label{Tcon2}

370: \ee

371: by using the notation (\ref{Ti}).
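
In more detail, the constraint (\ref{Tcon2}) can be obtained by the following
sketch. Applying the precision assumption (\ref{Tass}) to the perturbed
parameter $\bar{\Theta} +\delta \Theta$ and to the perturbed true points
$\bar{\bf x}_i + \delta \bar{\bf x}_i$, which lie on the perturbed curve,
and then expanding the left hand side to first order gives
$$
   \hat{\Theta}(\bar{\bf x}_1+\delta \bar{\bf x}_1, \ldots,
   \bar{\bf x}_n+\delta \bar{\bf x}_n)
   = \bar{\Theta} + \sum_{i=1}^n \Theta_i\, \delta \bar{\bf x}_i
   + \mbox{higher order terms}
   = \bar{\Theta} + \delta \Theta
$$
Cancelling $\bar{\Theta}$ and keeping only the first order terms yields
(\ref{Tcon2}).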

372:

373: Now we need to find a lower bound for the matrix (\ref{calC1})

374: subject to the constraints (\ref{Tcon1}) and (\ref{Tcon2}). That

375: bound follows from a general theorem in linear algebra:

376: \\

377:

378: \noindent{\bf Theorem (Linear Algebra)}. {\em Let $n\geq k\geq 1$ and

379: $m\geq 1$. Suppose $n$ nonzero vectors $u_i\in\IR^m$ and $n$ nonzero

380: vectors $v_i\in\IR^k$ are given, $1\leq i\leq n$. Consider $k\times m$

381: matrices

382: $$

383:         X_i = \frac{v_iu_i^T}{u_i^Tu_i}\

384: $$

385: for $1\leq i\leq n$, and $k\times k$ matrix

386: $$

387:    B = \sum_{i=1}^n X_i X_i^T

388:    = \sum_{i=1}^n \frac{v_iv_i^T}{u_i^Tu_i}

389: $$

390: Assume that the vectors $v_1,\ldots,v_n$ span $\IR^k$ (hence $B$

391: is nonsingular). We say that a set of $n$ matrices

392: $A_1,\ldots,A_n$ (each of size $k\times m$) is {\bf proper} if

393: \be

394:     \sum_{i=1}^n A_i w_i = r

395:       \label{properA1}

396: \ee

397: for any vectors $w_i\in\IR^m$ and $r\in \IR^k$ such that

398: \be

399:    u_i^Tw_i + v_i^Tr = 0

400:       \label{properA2}

401: \ee

402: for all $1\leq i\leq n$. Then for any proper set of matrices

403: $A_1,\ldots,A_n$ the $k\times k$ matrix $D = \sum_{i=1}^n A_iA_i^T$ is

404: bounded from below by $B^{-1}$ in the sense that $D - B^{-1}$ is a

405: positive semidefinite matrix. The equality $D=B^{-1}$ holds if and only

406: if $A_i = - B^{-1} X_i$ for all $i=1,\ldots,n$.}

407: \\

408:

409: This theorem is probably known, but we provide a full proof in

410: the Appendix for the sake of completeness.
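
As a quick check of the equality case (only the ``if'' direction, and not a
substitute for the full proof), one can verify that the set
$A_i = - B^{-1} X_i$ is proper and attains the bound. For any $w_i$ and $r$
satisfying (\ref{properA2}) we have $u_i^Tw_i = - v_i^Tr$, hence
$$
   \sum_{i=1}^n A_i w_i
   = - B^{-1} \sum_{i=1}^n \frac{v_i\, u_i^Tw_i}{u_i^Tu_i}
   = B^{-1} \left [ \sum_{i=1}^n \frac{v_iv_i^T}{u_i^Tu_i} \right ] r
   = B^{-1} B\, r = r
$$
so (\ref{properA1}) holds. Since $B$ is symmetric,
$$
   \sum_{i=1}^n A_iA_i^T
   = B^{-1} \left [ \sum_{i=1}^n X_iX_i^T \right ] B^{-1}
   = B^{-1} B\, B^{-1} = B^{-1}
$$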

411:

412: As a direct consequence of the above theorem we obtain the lower

413: bound for our matrix ${\cal D}_1$:

414: \\

415:

416: \noindent{\bf Theorem (Kanatani-Cramer-Rao lower bound)}. {\em We

417: have ${\cal D}_1\geq{\cal D}_{\min}$, in the sense that ${\cal

418: D}_1 - {\cal D}_{\min}$ is a positive semidefinite matrix, where}

419: \be

420:    {\cal D}_{\min}^{-1} = \sum_{i=1}^n

421:    \frac{(\nabla_{\Theta} P(\bar{\bf x}_i;\Theta))

422:    (\nabla_{\Theta} P(\bar{\bf x}_i;\Theta))^T}

423:    {\|\nabla_{{\bf x}}\, P(\bar{\bf x}_i;\Theta)\|^2}

424:      \label{Dmin}

425: \ee
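
The correspondence with the linear algebra theorem can be spelled out as
follows; this is a sketch that assumes the gradients below are nonzero and
that the vectors $\nabla_{\Theta} P(\bar{\bf x}_i;\bar{\Theta})$ span
$\IR^k$. Take $m=2$ and set
$$
   u_i = \nabla_{{\bf x}}\, P(\bar{\bf x}_i;\bar{\Theta}), \ \ \ \
   v_i = \nabla_{\Theta} P(\bar{\bf x}_i;\bar{\Theta}), \ \ \ \
   A_i = \Theta_i
$$
with the vectors $w_i$ and $r$ of the linear algebra theorem played by
$\delta \bar{\bf x}_i$ and $\delta \Theta$, respectively. Then the
constraint (\ref{Tcon1}) is exactly (\ref{properA2}), the constraint
(\ref{Tcon2}) is exactly (\ref{properA1}), and
$$
   D = \sum_{i=1}^n \Theta_i\Theta_i^T = {\cal D}_1, \ \ \ \
   B = \sum_{i=1}^n \frac{v_iv_i^T}{u_i^Tu_i} = {\cal D}_{\min}^{-1}
$$
so that the bound $D\geq B^{-1}$ reads ${\cal D}_1\geq{\cal D}_{\min}$.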

426:

427:

428: In view of (\ref{Csig}) and (\ref{calC1}), the above theorem says that

429: the lower bound for the covariance matrix ${\cal C}_{\hat{\Theta}}$ is,

430: to the leading order,

431: \be

432:   {\cal C}_{\hat{\Theta}} \geq {\cal C}_{\min}

433:   = \sigma^2 {\cal D}_{\min}

434:     \label{RC}

435: \ee

436: The standard deviations of the components of the estimate

437: $\hat{\Theta}$ are of order $\sigma_{\hat{\Theta}} = {\cal

438: O}(\sigma)$. Therefore, the bias of $\hat{\Theta}$, which is at

439: most of order $\sigma^2$ by (\ref{Tbias}), is infinitesimally

440: small, as $\sigma \to 0$, compared to the standard deviations.

441: This means that the estimates satisfying (\ref{Tass}) are

442: practically unbiased.

443:

444: The bound (\ref{RC}) was first derived by Kanatani

445: \cite{Ka96,Ka98} for the Cartesian functional model and strictly

446: unbiased estimates of $\Theta$, i.e.\ satisfying $E(\hat{\Theta})

447: =\bar{\Theta}$. One can easily derive (\ref{Tass}) from

448: $E(\hat{\Theta}) =\bar{\Theta}$ by taking the limit $\sigma \to

449: 0$, hence our results generalize those of Kanatani.

450:

451:

452: \section{Statistical efficiency of algebraic fits}

453: \label{secSE} \setcounter{equation}{0}

454:

455:

456: Here we derive an explicit formula for the covariance matrix of the

457: weighted algebraic fit (\ref{Fmain3}) and describe the weights $w_i$

458: for which the fit is statistically efficient. For brevity, we write

459: $P_i = P(x_i,y_i;\Theta)$. We assume that the weight function

460: $w(x,y;\Theta)$ is regular, in particular has bounded derivatives with

461: respect to $\Theta$; the next section will demonstrate the importance

462: of this condition. The solution of the minimization problem

463: (\ref{Fmain3}) satisfies

464: \be

465:    \sum P_i^2 \, \nabla_{\Theta} w_i +

466:    2 \sum w_i \, P_i \, \nabla_{\Theta} P_i = 0

467:       \label{weq}

468: \ee

469: Observe that $P_i = {\cal O} (\sigma)$, so that the first sum in

470: (\ref{weq}) is ${\cal O}(\sigma^2)$ and the second sum is ${\cal

471: O} (\sigma)$. Hence, to the leading order, the solution of

472: (\ref{weq}) can be found by discarding the first sum and solving

473: the reduced equation

474: \be

475:    \sum w_i\, P_i\, \nabla_{\Theta} P_i = 0

476:       \label{weq1}

477: \ee

478: More precisely, if $\hat{\Theta}_1$ and $\hat{\Theta}_2$ are

479: solutions of (\ref{weq}) and (\ref{weq1}), respectively, then

480: $\hat{\Theta}_1 -\bar{\Theta} = {\cal O} (\sigma)$,

481: $\hat{\Theta}_2 -\bar{\Theta} = {\cal O} (\sigma)$, and

482: $\|\hat{\Theta}_1 -\hat{\Theta}_2 \|= {\cal O} (\sigma^2)$.

483: Furthermore, the covariance matrices of $\hat{\Theta}_1$ and

484: $\hat{\Theta}_2$ coincide, to the leading order, i.e.\ ${\cal

485: C}_{\hat{\Theta}_1} {\cal C}_{\hat{\Theta}_2}^{-1} \to I$ as

486: $\sigma \to 0$. Therefore, in what follows, we only deal with the

487: solution of equation (\ref{weq1}).

488:

489: To find the covariance matrix of $\hat{\Theta}$ satisfying

490: (\ref{weq1}) we put $\hat{\Theta} =\bar{\Theta} +\delta \Theta$

491: and ${\bf x}_i = \bar{\bf x}_i + \delta {\bf x}_i$ and obtain,

492: working to the leading order,

493: $$

494:   \sum w_i (\nabla_{\Theta} P_i)

495:   (\nabla_{\Theta} P_i)^T\, (\delta \Theta)

496:     = - \sum w_i (\nabla_{\bf x} P_i)^T \, (\delta {\bf x}_i) \,

497:     (\nabla_{\Theta} P_i) + {\cal O}(\sigma^2)

498: $$

499: hence

500: $$

501:    \delta \Theta  = -

502:    \left [ \sum w_i (\nabla_{\Theta} P_i)

503:    (\nabla_{\Theta} P_i)^T \right ]^{-1}

504:    \left [ \sum w_i (\nabla_{\bf x} P_i)^T \,

505:    (\delta {\bf x}_i)\, (\nabla_{\Theta} P_i)\right ]

506:     + {\cal O}(\sigma^2)

507: $$

508: The covariance matrix is then

509: \begin{eqnarray*}

510:    {\cal C}_{\hat{\Theta}} & = &

511:    E \left [ (\delta \Theta)\, (\delta \Theta)^T \right ]\\

512:    & = & \sigma^2

513:    \left [ \sum w_i (\nabla_{\Theta} P_i)

514:    (\nabla_{\Theta} P_i)^T \right ]^{-1}

515:    \left [ \sum w_i^2 \|\nabla_{\bf x} P_i\|^2

516:    (\nabla_{\Theta} P_i)

517:    (\nabla_{\Theta} P_i)^T \right ]\\

518:    & & \times \left [ \sum w_i (\nabla_{\Theta} P_i)

519:    (\nabla_{\Theta} P_i)^T \right ]^{-1}

520:    + {\cal O}(\sigma^3)

521: \end{eqnarray*}

522: Denote by ${\cal D}_2$ the principal factor here, i.e.\

523: $$

524:    {\cal D}_2 =

525:    \left [ \sum w_i (\nabla_{\Theta} P_i)

526:    (\nabla_{\Theta} P_i)^T \right ]^{-1}

527:    \left [ \sum w_i^2 \|\nabla_{\bf x} P_i\|^2

528:    (\nabla_{\Theta} P_i)

529:    (\nabla_{\Theta} P_i)^T \right ]\,

530:    \left [ \sum w_i (\nabla_{\Theta} P_i)

531:    (\nabla_{\Theta} P_i)^T \right ]^{-1}

532: $$
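
The middle factor of ${\cal D}_2$ can be traced as follows, to the leading
order and with all gradients evaluated at the true points; the Kronecker
symbol $\delta_{ij}$ is notation introduced only for this remark. In the
Cartesian model the vectors $\delta {\bf x}_i$ are independent with
$E[(\delta {\bf x}_i)(\delta {\bf x}_j)^T] = \sigma^2 \delta_{ij} I$, hence
$$
   E\left [ (\nabla_{\bf x} P_i)^T (\delta {\bf x}_i)\,
   (\delta {\bf x}_j)^T (\nabla_{\bf x} P_j) \right ]
   = \sigma^2\, \delta_{ij}\, \|\nabla_{\bf x} P_i\|^2
$$
so all cross terms with $i\neq j$ vanish and only the diagonal terms
$w_i^2 \|\nabla_{\bf x} P_i\|^2 (\nabla_{\Theta} P_i)(\nabla_{\Theta} P_i)^T$
survive. In the radial model
$E[(\delta {\bf x}_i)(\delta {\bf x}_i)^T] = \sigma^2 {\bf n}_i {\bf n}_i^T$,
and since ${\bf n}_i$ is a unit vector parallel to $\nabla_{\bf x} P_i$, we
get $(\nabla_{\bf x} P_i)^T {\bf n}_i {\bf n}_i^T (\nabla_{\bf x} P_i)
= \|\nabla_{\bf x} P_i\|^2$, which leads to the same expression.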

533: The following theorem establishes a lower bound for ${\cal D}_2$:

534: \\

535:

536: \noindent{\bf Theorem}. {\em We have ${\cal D}_2\geq{\cal

537: D}_{\min}$, in the sense that ${\cal D}_2 - {\cal D}_{\min}$ is a

538: positive semidefinite matrix, where ${\cal D}_{\min}$ is given by

539: (\ref{Dmin}). The equality ${\cal D}_2 ={\cal D}_{\min}$ holds if

540: and only if $w_i = {\rm const}/\|\nabla_{{\bf x}}\, P_i\|^2$ for

541: all $i=1,\ldots,n$. In other words, an algebraic fit

542: (\ref{Fmain3}) is {\bf statistically efficient} if and only if the

543: weight function $w(x,y;\Theta)$ satisfies

544: \be

545:    w(x,y;\Theta) = \frac{c(\Theta)}{\|\nabla_{{\bf x}}\, P(x,y;\Theta)\|^2}

546:       \label{wopt}

547: \ee

548: for all triples $x,y,\Theta$ such that $P(x,y;\Theta)=0$. Here

549: $c(\Theta)$ may be an arbitrary function of $\Theta$.}

550: \\

551:

552: The bound ${\cal D}_2\geq{\cal D}_{\min}$ here is a particular case of

553: the previous theorem. It also can be obtained directly from the linear

554: algebra theorem if one sets $u_i= \nabla_{\bf x} P_i$, $v_i=

555: \nabla_{\Theta} P_i$, and

556: $$

557:    A_i = - w_i\, \left [ \sum_{j=1}^n w_j (\nabla_{\Theta} P_j)

558:    (\nabla_{\Theta} P_j)^T \right ]^{-1}

559:    (\nabla_{\Theta} P_i) \,

560:    (\nabla_{\bf x} P_i)^T

561: $$

562: for $1\leq i\leq n$.

563:

564: The expression (\ref{wopt}) characterizing the efficiency follows from

565: the last claim in the linear algebra theorem.
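
The ``if'' part of this characterization can also be checked by direct
substitution; the following is a sketch to the leading order, with all
gradients evaluated at the true points and with a constant $c\neq 0$.
If $w_i = c/\|\nabla_{{\bf x}}\, P_i\|^2$, then by (\ref{Dmin})
$$
   \sum w_i (\nabla_{\Theta} P_i)(\nabla_{\Theta} P_i)^T
   = c\, {\cal D}_{\min}^{-1}, \ \ \ \
   \sum w_i^2 \|\nabla_{\bf x} P_i\|^2
   (\nabla_{\Theta} P_i)(\nabla_{\Theta} P_i)^T
   = c^2\, {\cal D}_{\min}^{-1}
$$
and therefore
$$
   {\cal D}_2 = \left ( c\, {\cal D}_{\min}^{-1} \right )^{-1}
   \left ( c^2\, {\cal D}_{\min}^{-1} \right )
   \left ( c\, {\cal D}_{\min}^{-1} \right )^{-1}
   = {\cal D}_{\min}
$$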

566:

567: \section{Circle fit}

568: \label{secCF} \setcounter{equation}{0}

569:

570: Here we illustrate our conclusions by the relatively simple

571: problem of fitting circles. The canonical equation of a circle is

572: \be

573:          (x-a)^2+ (y-b)^2 -R^2=0

574:            \label{circ0}

575: \ee

576: and we need to estimate three parameters $a,b,R$. The simple

577: algebraic fit (\ref{Fmain2}) takes the form

578: \be

579:        {\cal F}_2(a,b,R) =

580:        \sum_{i=1}^n [(x_i-a)^2+ (y_i-b)^2 -R^2]^2

581:        \ \  \to\ \  \min

582:          \label{F2}

583: \ee

584: and the weighted algebraic fit (\ref{Fmain3}) takes the form

585: \be

586:        {\cal F}_3(a,b,R) =

587:        \sum_{i=1}^n w_i [(x_i-a)^2+ (y_i-b)^2 -R^2]^2

588:        \ \  \to\ \  \min

589:          \label{F3}

590: \ee

591: In particular, the GRAF becomes

592: \be

593:        {\cal F}_4(a,b,R) =

594:        \sum_{i=1}^n \frac{[(x_i-a)^2+ (y_i-b)^2 -R^2]^2}

595:        {(x_i-a)^2+ (y_i-b)^2}

596:        \ \  \to\ \  \min

597:          \label{F4}

598: \ee

599: (where the irrelevant constant factor of 4 in the denominator is

600: dropped).
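
One way to see why the simple AF (\ref{F2}) is computationally cheap is the
following standard change of parameters; the auxiliary names $p,q,s$ are
introduced here only for illustration. Writing
$$
   (x-a)^2+ (y-b)^2 -R^2 = x^2+y^2 + p\,x + q\,y + s,
   \ \ \ \ p=-2a,\ \ q=-2b,\ \ s=a^2+b^2-R^2
$$
we obtain
$$
   {\cal F}_2 = \sum_{i=1}^n [x_i^2+y_i^2 + p\,x_i + q\,y_i + s]^2
$$
which is a quadratic function of $(p,q,s)$, so its minimum is found by
solving a linear $3\times 3$ system of normal equations. The circle
parameters are then recovered by $a=-p/2$, $b=-q/2$, and
$R^2 = (p^2+q^2)/4 - s$, provided the right hand side is positive.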

601:

602: In terms of (\ref{Dmin}), we have

603: $$

604:   \nabla_{\Theta} P(\bar{\bf x}_i;\Theta)

605:   = -2(\bar{x}_i-a,\bar{y}_i-b,R)^T

606: $$

607: and $\nabla_{{\bf x}}\, P(\bar{\bf x}_i;\Theta) =

608: 2(\bar{x}_i-a,\bar{y}_i-b)^T$, hence

609: $$

610:   \|\nabla_{{\bf x}}\, P(\bar{\bf x}_i;\Theta)\|^2 =

611:     4[(\bar{x}_i-a)^2+(\bar{y}_i-b)^2]=4R^2

612: $$

613: Therefore,

614: \be

615:     {\cal D}_{\min} = \left (\begin{array}{ccc}

616:     \sum u_i^2 & \sum u_iv_i & \sum u_i \\

617:     \sum u_iv_i & \sum v_i^2 & \sum v_i \\

618:     \sum u_i & \sum v_i & n \\

619:     \end{array} \right )^{-1}

620:       \label{Dmincir}

621: \ee

622: where we denote, for brevity,

623: $$

624:    u_i=\frac{\bar{x}_i-a}{R},\ \ \ \

625:    v_i=\frac{\bar{y}_i-b}{R}

626: $$

627: The above expression for ${\cal D}_{\min}$ was derived earlier in

628: \cite{CT95,Ka98}.
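
As an illustration of (\ref{Dmincir}), consider $n\geq 3$ true points
equally spaced along the full circle, i.e.\
$\bar{x}_i = a + R\cos\varphi_i$ and $\bar{y}_i = b + R\sin\varphi_i$ with
$\varphi_i = \varphi_0 + 2\pi i/n$ (the angles $\varphi_i$ are notation
introduced for this example). Then $u_i=\cos\varphi_i$,
$v_i=\sin\varphi_i$, and
$$
   \sum u_i = \sum v_i = \sum u_iv_i = 0, \ \ \ \
   \sum u_i^2 = \sum v_i^2 = \frac{n}{2}
$$
so that
$$
    {\cal D}_{\min} = \left (\begin{array}{ccc}
    2/n & 0 & 0 \\
    0 & 2/n & 0 \\
    0 & 0 & 1/n \\
    \end{array} \right )
$$
Thus, to the leading order, the smallest attainable variance of each
coordinate of the center is $2\sigma^2/n$ and that of the radius is
$\sigma^2/n$. For $n=20$ the sum of the first two diagonal entries equals
$4/n = 0.2$.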

629:

630: Now, our Theorem in Section~\ref{secSE} shows that the weighted

631: algebraic fit (\ref{F3}) is statistically efficient if and only if

632: the weight function satisfies $w(x,y;a,b,R)=c(a,b,R)/(4R^2)$.

633: Since $c(a,b,R)$ may be an arbitrary function, the

634: denominator $4R^2$ here is irrelevant. Hence, statistical

635: efficiency is achieved whenever $w(x,y;a,b,R)$ is simply

636: independent of $x$ and $y$ for all $(x,y)$ lying on the circle. In

637: particular, the GRAF (\ref{F4}) is statistically efficient because

638: $w(x,y;a,b,R)=[(x-a)^2+(y-b)^2]^{-1}=R^{-2}$. The simple AF

639: (\ref{F2}) is also statistically efficient since $w(x,y;a,b,R)=1$.

640:

641: We note that the GRAF (\ref{F4}) is a highly nonlinear problem, and in

642: its exact form (\ref{F4}) it is not used in practice. Instead, there are

643: two modifications of GRAF popular among experimenters. One is due to

644: Chernov and Ososkov \cite{CO84} and Pratt \cite{Pr87}:

645: \be

646:        {\cal F}_4'(a,b,R) =

647:        R^{-2}\sum_{i=1}^n [(x_i-a)^2+ (y_i-b)^2 -R^2]^2

648:        \ \  \to\ \  \min

649:          \label{F4a}

650: \ee

651: (it is based on the approximation $(x_i-a)^2+ (y_i-b)^2 \approx R^2$),

652: and the other due to Agin \cite{Ag81} and Taubin \cite{Ta91}:

653: \be

654:        {\cal F}_4''(a,b,R) =

655:        \frac{1}{\sum_{i=1}^n [(x_i-a)^2+ (y_i-b)^2]}

656:        \sum_{i=1}^n [(x_i-a)^2+ (y_i-b)^2 -R^2]^2

657:        \ \  \to\ \  \min

658:          \label{F4b}

659: \ee

660: (here one simply averages the denominator of (\ref{F4}) over $1\leq

661: i\leq n$). We refer the reader to \cite{CL02} for a detailed analysis

662: of these and other circle fitting algorithms, including their numerical

663: implementations.

664:

665: We have tested experimentally the efficiency of four circle fitting

666: algorithms: the OLSF (\ref{Fmain1}), the simple AF (\ref{F2}), the

667: Pratt method (\ref{F4a}), and the Taubin method (\ref{F4b}). We have

668: generated $n=20$ points equally spaced on a circle, added an isotropic

669: Gaussian noise with variance $\sigma^2$ (according to the Cartesian

670: model), and estimated the efficiency of the estimate of the center by

671: \be

672:    E = \frac{\sigma^2 ({\cal D}_{11}+{\cal D}_{22})}

673:    {\la (\hat{a}-a)^2 + (\hat{b}-b)^2 \ra}

674:       \label{E}

675: \ee

676: Here $(a,b)$ is the true center, $(\hat{a},\hat{b})$ is its estimate,

677: $\la \cdots \ra$ denotes averaging over many random samples, and ${\cal

678: D}_{11}$, ${\cal D}_{22}$ are the first two diagonal entries of the

679: matrix (\ref{Dmincir}). Table~1 shows the efficiency of the above

680: mentioned four algorithms for various values of $\sigma/R$. We see that

681: they all perform very well, and indeed are efficient as $\sigma\to 0$.

682: One might notice that the OLSF slightly outperforms the other methods,

683: and the AF is the second best.

684:

685: \begin{center}

686: \begin{tabular}{||r||c|c|c|c||}

687: \hline\hline $\sigma/R$ & OLSF & AF & Pratt & Taubin \\

688: \hline \hline $<0.01$ & $\sim 1$ & $\sim 1$ & $\sim 1$ & $\sim 1$ \\

689:         \hline 0.01 & 0.999 & 0.999 & 0.999 & 0.999 \\ \hline

690:        0.02 & 0.999 & 0.998 & 0.997 & 0.997 \\ \hline

691:        0.03 & 0.998 & 0.996 & 0.995 & 0.995 \\ \hline

692:        0.05 & 0.996 & 0.992 & 0.987 & 0.987 \\ \hline

693:        0.10 & 0.985 & 0.970 & 0.953 & 0.953 \\ \hline

694:        0.20 & 0.935 & 0.900 & 0.837 & 0.835 \\ \hline

695:        0.30 & 0.825 & 0.824 & 0.701 & 0.692 \\ \hline

696: \hline

697: \end{tabular}\vspace*{0.2cm}

698: \end{center}

699:

700: \begin{center}

701: Table 1. Efficiency of circle fitting algorithms. Data are sampled

702: along a full circle.

703: \end{center}

704:

705: Table~2 shows the efficiency of the same algorithms as the data points

706: are sampled along half a circle, rather than a full circle. Again, the

707: efficiency as $\sigma\to 0$ is clear, but we also make another

708: observation. The AF now consistently falls behind the other methods for

709: all $\sigma/R\leq 0.2$, but for $\sigma/R=0.3$ the others suddenly

710: break down, while the AF keeps afloat.

711:

712: \begin{center}

713: \begin{tabular}{||r||c|c|c|c||}

714: \hline\hline $\sigma/R$ & OLSF & AF & Pratt & Taubin \\

715: \hline \hline $<0.01$ & $\sim 1$ & $\sim 1$ & $\sim 1$ & $\sim 1$ \\

716:         \hline 0.01 & 0.999 & 0.996 & 0.999 & 0.999 \\ \hline

717:        0.02 & 0.997 & 0.983 & 0.997 & 0.997 \\ \hline

718:        0.03 & 0.994 & 0.961 & 0.992 & 0.992 \\ \hline

719:        0.05 & 0.984 & 0.902 & 0.978 & 0.978 \\ \hline

720:        0.10 & 0.935 & 0.720 & 0.916 & 0.916 \\ \hline

721:        0.20 & 0.720 & 0.493 & 0.703 & 0.691 \\ \hline

722:        0.30 & 0.122 & 0.437 & 0.186 & 0.141 \\ \hline

723: \hline

724: \end{tabular}\vspace*{0.2cm}

725: \end{center}

726:

727: \begin{center}

728: Table 2. Efficiency of circle fitting algorithms with data sampled

729: along half a circle.

730: \end{center}

731:

The reason for the above turnaround is that at large noise the data
points may occasionally line up along a circular arc of a very large
radius. Then the OLSF, Pratt and Taubin fits dutifully return a large circle
whose center lies far away, and such fits blow up the denominator of
(\ref{E}), a typical effect of large outliers. By contrast, the AF
is notorious for its systematic bias toward smaller circles
\cite{CO84,GGS94,Pr87}; hence, while it is less accurate than the other fits
for typical random samples, its bias safeguards it from large outliers.

740:

This behavior is even more pronounced when the data are sampled along
a quarter\footnote{All our algorithms are invariant under simple
geometric transformations such as translations, rotations and
similarities, hence our experimental results do not depend on the
choice of the circle, its size, and the part of the circle the data are
sampled from.} of a circle (Table~3). We see that the AF is now far
worse than the other fits for $\sigma/R<0.1$, but the others
characteristically break down at some point ($\sigma/R=0.1$).

749:

750: \begin{center}

751: \begin{tabular}{||r||c|c|c|c||}

752: \hline\hline $\sigma/R$ & OLSF & AF & Pratt & Taubin \\

753: \hline \hline  0.01 & 0.997 & 0.911 & 0.997 & 0.997 \\ \hline

754:        0.02 & 0.977 & 0.722 & 0.978 & 0.978 \\ \hline

755:        0.03 & 0.944 & 0.555 & 0.946 & 0.946 \\ \hline

756:        0.05 & 0.837 & 0.365 & 0.843 & 0.842 \\ \hline

757:        0.10 & 0.155 & 0.275 & 0.163 & 0.158 \\ \hline

758: \hline

759: \end{tabular}\vspace*{0.2cm}

760: \end{center}

761:

762: \begin{center}

763: Table 3. Data are sampled along a quarter of a circle.

764: \end{center}

765:

766: It is interesting to test smaller circular arcs, too. Figure 1

767: shows a color-coded diagram of the efficiency of the OLSF and the

768: AF for arcs from $0^{\rm o}$ to $50^{\rm o}$ and variable $\sigma$

769: (we set $\sigma=ch$, where $h$ is the height of the circular arc,

see Fig.~2, and $c$ varies from 0 to 0.5). The efficiency of the
Pratt and Taubin fits is virtually identical to that of the OLSF, so it
is not shown here. We see that the OLSF and AF are efficient as

773: $\sigma\to 0$ (both squares in the diagram get white at the

774: bottom), but the AF loses its efficiency at moderate levels of

775: noise ($c>0.1$), while the OLSF remains accurate up to $c=0.3$

776: after which it rather sharply breaks down.

777:

778: \vspace*{10mm} \centerline{\epsffile{PrattD.eps}$\ \ \ \ $

779: \epsffile{AlgD.eps} $\ \ \ \ $ \epsffile{Bar.eps}}

780:

781: \begin{center}

782: Figure 1: The efficiency of the simple OLSF (left) and the AF (center).

783: The bar on the right explains color codes.

784: \end{center} \vspace*{5mm}

785:

786: The following analysis sheds more light on the behavior of the

787: circle fitting algorithms. When the curvature of the arc

788: decreases, the center coordinates $a,b$ and the radius $R$ grow to

789: infinity and their estimates become highly unreliable. In that

790: case the circle equation (\ref{circ0}) can be converted to a more

791: convenient algebraic form

792: \be

793:      A(x^2+y^2) + Bx + Cy + D = 0

794:        \label{ABCD}

795: \ee

with an additional constraint on the parameters: $B^2+C^2-4AD = 1$. This
parametrization was used in \cite{Pr87,GGS94}, and analyzed in detail
in \cite{CL02}. We note that the original parameters can be recovered
via $a=-B/(2A)$, $b=-C/(2A)$, and $R=(2\,|A|)^{-1}$. The new
parametrization (\ref{ABCD}) is safe to use for arcs with arbitrarily
small curvature: the parameters $A,B,C,D$ remain bounded and never
develop singularities, see \cite{CL02}. Even as the curvature vanishes,
we simply get $A=0$, and the equation (\ref{ABCD}) represents a line
$Bx+Cy+D=0$.
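The conversion between the two parametrizations can be sketched in a few
lines of Python. This is only an illustration: the function names are
hypothetical, and the sign convention $A>0$ is an assumption (the
constraint $B^2+C^2-4AD=1$ fixes $A$ only up to sign).

```python
import math


def to_algebraic(a, b, R):
    """Map center (a, b) and radius R > 0 to (A, B, C, D) with B^2 + C^2 - 4AD = 1."""
    A = 1.0 / (2.0 * R)  # choosing A > 0 is an assumption of this sketch
    B = -2.0 * A * a
    C = -2.0 * A * b
    D = A * (a * a + b * b - R * R)
    return A, B, C, D


def to_geometric(A, B, C, D):
    """Recover (a, b, R); valid only for A != 0, i.e. a genuine circle."""
    if A == 0:
        raise ValueError("A = 0 describes the line Bx + Cy + D = 0")
    return -B / (2.0 * A), -C / (2.0 * A), 1.0 / (2.0 * abs(A))
```

For a very large circle through the origin, say
\verb|to_algebraic(0.0, 1e6, 1e6)|, the parameters stay bounded and the
result is close to $(0,0,-1,0)$, i.e.\ to the line $y=0$, while $a$, $b$,
$R$ themselves are of order $10^6$.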

805:

806: \vspace*{5mm} \centerline{\epsffile{cl2-02.eps}}

807:

808: \begin{center}

809: Figure 2: The height of an arc, $h$, and our formula for $\sigma$.

810: \end{center} \vspace*{5mm}

811:

In terms of the new parameters $A,B,C,D$, the weighted algebraic fit
(\ref{Fmain3}) takes the form
\be
       {\cal F}_3(A,B,C,D) =
       \sum_{i=1}^n w_i [A(x_i^2+y_i^2) + Bx_i + Cy_i + D]^2
       \ \  \to\ \  \min
         \label{FF3}
\ee
(under the constraint $B^2+C^2-4AD = 1$). Converting the AF (\ref{F2})
to the new parameters gives
\be
       {\cal F}_2(A,B,C,D) =
       \sum_{i=1}^n A^{-2} [A(x_i^2+y_i^2) + Bx_i + Cy_i + D]^2
       \ \  \to\ \  \min
         \label{FF2}
\ee
which corresponds to the weight function $w=1/A^2$. The Pratt method
(\ref{F4a}) turns into
\be
       {\cal F}_4(A,B,C,D) =
       \sum_{i=1}^n [A(x_i^2+y_i^2) + Bx_i + Cy_i + D]^2
       \ \  \to\ \  \min
         \label{FF4}
\ee

836: We now see why the AF is unstable and inaccurate for arcs with

837: small curvature: its weight function $w=1/A^2$ develops a

838: singularity (it explodes) in the limit $A\to 0$. Recall that, in

839: our derivation of the statistical efficiency theorem (Section~3),

840: we assumed that the weight function was regular (had bounded

841: derivatives). This assumption is clearly violated by the AF

842: (\ref{FF2}). On the contrary, the Pratt fit (\ref{FF4}) uses a

843: safe choice $w=1$ and thus behaves decently on arcs with small

844: curvature, see next.
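
The origin of the weight $w=1/A^2$ can be spelled out as follows. This is
a sketch which assumes that (\ref{F2}) is the usual simple algebraic fit,
i.e.\ the sum of the squares of $(x_i-a)^2+(y_i-b)^2-R^2$, as its
converted form (\ref{FF2}) suggests. Substituting $a=-B/(2A)$,
$b=-C/(2A)$, $R^2=1/(4A^2)$ and using the constraint $B^2+C^2-4AD=1$ gives
$$
   a^2+b^2-R^2 = \frac{B^2+C^2-1}{4A^2} = \frac{D}{A}
$$
and therefore
$$
   (x_i-a)^2+(y_i-b)^2-R^2
   = x_i^2+y_i^2 + \frac{B}{A}\, x_i + \frac{C}{A}\, y_i + \frac{D}{A}
   = A^{-1}\, [A(x_i^2+y_i^2) + Bx_i + Cy_i + D]
$$
Squaring and summing over $i$ produces the factor $A^{-2}$ in
(\ref{FF2}), which is unbounded as $A\to 0$.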

845:

846: \vspace*{10mm} \centerline{\epsffile{AlgA.eps}$\ \ \ \ $

847: \epsffile{PrattA.eps} $\ \ \ \ $ \epsffile{Bar.eps}}

848:

849: \begin{center}

850: Figure 3: The efficiency of the simple AF (left) and the Pratt

851: method (center). The bar on the right explains color codes.

852: \end{center} \vspace*{5mm}

853:

854: Figure 3 shows a color-coded diagram of the efficiency of the estimate

855: of the parameter\footnote{Note that $|A|=1/2R$, hence the estimation of

856: $A$ is equivalent to that of the curvature, an important geometric

857: parameter of the arc.} $A$ by the AF (\ref{FF2}) versus Pratt

858: (\ref{FF4}) for arcs from $0^{\rm o}$ to $50^{\rm o}$ and the noise

859: level $\sigma=ch$, where $h$ is the height of the circular arc and $c$

860: varies from 0 to 0.5. The efficiency of the OLSF and the Taubin method

861: is visually indistinguishable from that of Pratt (the central square in

862: Fig.~3), so we did not include it here.

863:

We see that the AF performs significantly worse than the Pratt
method for all arcs and most of the values of $c$ (i.e.,
$\sigma$). Pratt's efficiency is close to 100\%; its lowest point
is 89\% for $50^{\rm o}$ arcs and $c=0.5$ (the top right corner of
the central square barely gets grey). The AF's efficiency is below
10\% for all $c>0.2$ and almost zero for $c>0.4$. Still, the AF
remains efficient as $\sigma\to 0$ (as the tiny white strip at the
bottom of the left square shows), but its efficiency can be
counted on only when $\sigma$ is extremely small.

873:

874: Our analysis demonstrates that the choice of the weights $w_i$ in the

875: weighted algebraic fit (\ref{Fmain3}) should be made according to our

876: theorem in Section~3, and, in addition, one should avoid singularities

877: in the domain of parameters.

878:

879:

880: \renewcommand{\theequation}{A.\arabic{equation}}

881:

882: \section*{Appendix}

883: \label{secA} \setcounter{equation}{0}

884:

885: Here we prove the theorem of linear algebra stated in

886: Section~\ref{secKCR}. For the sake of clarity, we divide our proof into

887: small lemmas:

888: \medskip

889:

890: \noindent{\bf Lemma 1}. {\em The matrix $B$ is indeed nonsingular}.

891:

{\em Proof}. If $Bz=0$ for some nonzero vector $z\in \IR^k$, then
$0 = z^TBz = \sum_{i=1}^n (v_i^Tz)^2/\|u_i\|^2$, hence $v_i^Tz=0$
for all $1\leq i\leq n$. Since the vectors $v_1,\ldots,v_n$ span
$\IR^k$, this forces $z=0$, a contradiction.

895: \medskip

896:

897: \noindent{\bf Lemma 2}. {\em If a set of $n$ matrices $A_1,\ldots,A_n$

898: is proper, then rank$(A_i)\leq 1$. Furthermore, each $A_i$ is given by

899: $A_i = z_iu_i^T$ for some vector $z_i\in \IR^k$, and the vectors

900: $z_1,\ldots,z_n$ satisfy $\sum_{i=1}^n z_iv_i^T = -I$ where $I$ is the

901: $k\times k$ identity matrix. The converse is also true.}

902:

903: {\em Proof}. Let vectors $w_1,\ldots,w_n$ and $r$ satisfy the

904: requirements (\ref{properA1}) and (\ref{properA2}) of the theorem.

905: Consider the orthogonal decomposition $w_i = c_iu_i + w_i^\perp$ where

906: $w_i^\perp$ is perpendicular to $u_i$, i.e.\ $u_i^Tw_i^\perp = 0$. Then

907: the constraint (\ref{properA2}) can be rewritten as

908: \be

909:     c_i = -\frac{v_i^Tr}{u_i^Tu_i}

910:       \label{properA3}

911: \ee

912: for all $i=1,\ldots,n$ and (\ref{properA1}) takes form

913: \be

914:     \sum_{i=1}^n c_iA_iu_i + \sum_{i=1}^n A_iw_i^\perp = r

915:       \label{properA4}

916: \ee

We conclude that $A_iw_i^\perp = 0$ for every vector $w_i^\perp$
orthogonal to $u_i$, hence the kernel of $A_i$ contains the orthogonal
complement of $u_i$, so indeed its rank is zero or one. If we denote
$z_i = A_iu_i/ \|u_i\|^2$, we obtain $A_i=z_iu_i^T$. Combining this with
(\ref{properA3})--(\ref{properA4}) gives
$$
   r = - \sum_{i=1}^n (v_i^Tr)z_i =
   - \left (\sum_{i=1}^n z_iv_i^T\right )\, r
$$
Since this identity holds for any vector $r\in \IR^k$, the expression
within parentheses is $-I$. The converse is obtained by straightforward
calculations. The lemma is proved. \medskip

929:

\noindent{\bf Corollary}. {\em Let ${\bf n}_i = u_i/\|u_i\|$. Then
$A_i{\bf n}_i{\bf n}_i^TA_i^T = A_iA_i^T$ for each $i$}.

932: \medskip

933:

934: This corollary implies our lemma stated in Section~\ref{secKCR}. We now

935: continue the proof of the theorem.\medskip

936:

937: \noindent{\bf Lemma 3}. {\em The sets of proper matrices make a linear

938: variety, in the following sense. Let $A_1',\ldots,A_n'$ and

939: $A_1'',\ldots,A_n''$ be two proper sets of matrices, then the set

940: $A_1,\ldots,A_n$ defined by $A_i = A_i' + c(A_i''- A_i')$ is proper for

941: every $c\in\IR$.}

942:

943: {\em Proof}. According to the previous lemma, $A_i'=z_i'u_i^T$ and

944: $A_i''=z_i''u_i^T$ for some vectors $z_i',z_i''$, $1\leq i\leq n$.

945: Therefore, $A_i=z_iu_i^T$ for $z_i= z_i' + c(z_i''- z_i')$. Lastly,

946: $$

947:    \sum_{i=1}^n z_iv_i^T = \sum_{i=1}^n z_i'v_i^T

948:    +c\sum_{i=1}^n z_i''v_i^T - c\sum_{i=1}^n z_i'v_i^T = - I

949: $$

950: Lemma is proved.

951: \medskip

952:

953: \noindent{\bf Lemma 4}. {\em If a set of $n$ matrices $A_1,\ldots,A_n$

954: is proper, then $\sum_{i=1}^n A_iX_i^T = -I$, where $I$ is the $k\times

955: k$ identity matrix.}

956:

957:

{\em Proof}. By Lemma~2, $\sum_{i=1}^n A_iX_i^T =
\sum_{i=1}^n z_iv_i^T = -I$. The lemma is proved.

960: \medskip

961:

962: \noindent{\bf Lemma 5}. {\em We have indeed $D \geq B^{-1}$.}

963:

964: {\em Proof}. For each $i=1,\ldots,n$ consider the $2k\times m$

965: matrix $Y_i = \left (\begin{array}{c} A_i\\X_i\end{array} \right

966: )$. Using the previous lemma gives

967: $$

968:    \sum_{i=1}^n Y_i\,Y_i^T =

969:    \left (\begin{array}{rr} D & -I\\ -I & B

970:    \end{array} \right )

971: $$

972: By construction, this matrix is positive semidefinite. Hence, the

973: following matrix is also positive semidefinite:

974: $$

975:    \left (\begin{array}{rr} I & B^{-1} \\ 0 & B^{-1}

976:    \end{array} \right )

977:    \left (\begin{array}{rr} D & -I\\ -I & B

978:    \end{array} \right )

979:    \left (\begin{array}{cc} I & 0 \\ B^{-1} & B^{-1}

980:    \end{array} \right ) =

981:    \left (\begin{array}{cc} D-B^{-1} & 0\\ 0 & B^{-1}

982:    \end{array} \right )

983: $$

984: By Sylvester's theorem, the matrix $D-B^{-1}$ is positive semidefinite.

985: \medskip

986:

987: \noindent{\bf Lemma 6}. {\em The set of matrices $A_i^{\rm o} = -

988: B^{-1} X_i$ is proper, and for this set we have $D=B^{-1}$.}

989:

990: {\em Proof}. Straightforward calculation.

991: \medskip
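
For the reader's convenience, the calculation can be sketched as follows.
The sketch assumes, as the preceding lemmas suggest, that
$X_i = v_iu_i^T/\|u_i\|^2$, $B=\sum_{i=1}^n X_iX_i^T$ and
$D=\sum_{i=1}^n A_iA_i^T$; these definitions are inferred from Lemmas~1,
4 and~5 rather than quoted from the theorem. First,
$A_i^{\rm o} = z_i^{\rm o}u_i^T$ with
$z_i^{\rm o} = -B^{-1}v_i/\|u_i\|^2$, and
$$
   \sum_{i=1}^n z_i^{\rm o}v_i^T
   = -B^{-1}\sum_{i=1}^n \frac{v_iv_i^T}{\|u_i\|^2}
   = -B^{-1}B = -I
$$
so the set is proper by the converse part of Lemma~2. Second, since $B$
is symmetric,
$$
   D = \sum_{i=1}^n A_i^{\rm o}(A_i^{\rm o})^T
   = B^{-1}\left (\sum_{i=1}^n X_iX_i^T\right ) B^{-1}
   = B^{-1}BB^{-1} = B^{-1}
$$
\medskip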

992:

993: \noindent{\bf Lemma 7}. {\em If $D=B^{-1}$ for some proper set of

994: matrices $A_1,\ldots,A_n$, then $A_i=A_i^{\rm o}$ for all $1\leq i\leq

995: n$}.

996:

997:

998: {\em Proof}. Assume that there is a proper set of matrices

999: $A_1',\ldots,A_n'$, different from $A_1^{\rm o},\ldots,A_n^{\rm

1000: o}$, for which $D=B^{-1}$. Denote $\delta A_i = A_i'-A_i^{\rm o}$.

1001: By Lemma 3, the set of matrices $A_i(\gamma) = A_i^{\rm o} +

1002: \gamma (\delta A_i)$ is proper for every real $\gamma$. Consider

1003: the variable matrix

1004: \begin{eqnarray*}

1005:    D(\gamma) & = & \sum_{i=1}^n [A_i(\gamma)] [A_i(\gamma)]^T\\

1006:    & = & \sum_{i=1}^n A_i^{\rm o}(A_i^{\rm o})^T

1007:    + \gamma\left (\sum_{i=1}^n A_i^{\rm o}(\delta A_i)^T

1008:    +\sum_{i=1}^n (\delta A_i)(A_i^{\rm o})^T\right )

1009:    +\gamma^2\sum_{i=1}^n (\delta A_i)(\delta A_i)^T

1010: \end{eqnarray*}

1011: Note that the matrix $R = \sum_{i=1}^n A_i^{\rm o}(\delta A_i)^T

1012: +\sum_{i=1}^n (\delta A_i)(A_i^{\rm o})^T$ is symmetric. By

1013: Lemma~5 we have $D(\gamma)\geq B^{-1}$ for all $\gamma$, and by

1014: Lemma~6 we have $D(0)=B^{-1}$. It is then easy to derive that

1015: $R=0$. Next, the matrix $S =\sum_{i=1}^n (\delta A_i)(\delta

1016: A_i)^T$ is symmetric positive semidefinite. Since we assumed that

1017: $D(1)=D(0)=B^{-1}$, it is easy to derive that $S=0$ as well.

1018: Therefore, $\delta A_i = 0$ for every $i=1,\ldots,n$. The theorem

1019: is proved.
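
The two steps left to the reader above can be filled in along the
following lines (a sketch, using only Lemmas~5 and~6 and the notation of
the proof). For any vector $x\in\IR^k$ and any real $\gamma$ we have
$$
   x^T[D(\gamma)-B^{-1}]\,x = \gamma\, x^TRx + \gamma^2\, x^TSx \geq 0
$$
Dividing by $\gamma>0$ and letting $\gamma\to 0$ gives $x^TRx\geq 0$;
dividing by $\gamma<0$ and letting $\gamma\to 0$ gives $x^TRx\leq 0$.
Hence $x^TRx=0$ for all $x$, and since $R$ is symmetric, $R=0$. With
$R=0$, the assumption $D(1)=B^{-1}$ reads $S=0$, and then
$$
   0 = {\rm tr}\, S = \sum_{i=1}^n {\rm tr}\, [(\delta A_i)(\delta A_i)^T]
$$
where each term is the sum of the squares of the entries of
$\delta A_i$, so that $\delta A_i=0$ for every $i$.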

1020:

1021: \begin{thebibliography}{99}

1022:

1023: \bibitem{Ag81} G.J. Agin,

1024:     {\em Fitting Ellipses and General Second-Order Curves},

    Carnegie Mellon University, Robotics Institute, Technical Report 81-5, 1981.

1026:

1027: \bibitem{ARW01} S.J. Ahn, W. Rauh, and H.J. Warnecke,

1028:     {\rm Least-squares orthogonal distances fitting of circle,

1029:     sphere, ellipse, hyperbola, and parabola},

1030:     {\em Pattern Recog.}, {\bf 34}, 2001, 2283--2303.

1031:

1032: \bibitem{An81} D. A. Anderson,

1033:     The circular structural model,

    {\em J. R. Statist. Soc. B}, {\bf 43}, 1981, 131--141.

1035:

1036: \bibitem{BC86} M. Berman and D. Culpin,

1037:     The statistical behaviour of some least squares estimators of the centre and radius of a

1038:     circle,  {\em J. R. Statist. Soc. B}, {\bf 48}, 1986, 183--196.

1039:

1040: \bibitem{Be89} M. Berman,

1041:     Large sample bias in least squares estimators of a circular arc center and its

1042:     radius,

1043:     {\em Computer Vision, Graphics and Image Processing}, {\bf 45}, 1989, 126--128.

1044:

1045: \bibitem{Ch65} N. N. Chan,

1046:     On circular functional relationships,

1047:     {\em J. R. Statist. Soc. B}, {\bf 27}, 1965, 45--56.

1048:

1049: \bibitem{CT95} Y. T. Chan and S. M. Thomas,

1050:     {\rm Cramer-Rao Lower Bounds for Estimation of a Circular Arc Center and Its

1051:     Radius},  {\em Graph. Models Image Proc.} {\bf 57}, 1995, 527--532.

1052:

1053: \bibitem{CO84} N. I. Chernov and G. A. Ososkov,

1054:     Effective algorithms for circle fitting,

1055:     {\em Comp. Phys. Comm.} {\bf 33}, 1984, 329--333.

1056:

1057: \bibitem{CL02} N. Chernov and C. Lesort,

1058:     {\rm Fitting circles and lines by least squares: theory and experiment},

1059:     preprint, available at http://www.math.uab.edu/cl/cl1

1060:

1061: \bibitem{CBH01} W. Chojnacki, M.J. Brooks, and A. van den Hengel,

1062:     {\rm Rationalising the renormalisation method of Kanatani},

1063:     {\em J. Math. Imaging \& Vision}, {\bf 14}, 2001, 21--38.

1064:

1065: \bibitem{GGS94} W. Gander, G.H. Golub, and R. Strebel,

1066:     {\rm Least squares fitting of circles and ellipses},

1067:     {\em BIT} {\bf 34}, 1994, 558--578.

1068:

1069: \bibitem{Hu97} {\em Recent advances in total least squares techniques

1070: and errors-in-variables modeling}, Ed. by S. van Huffel, SIAM,

1071: Philadelphia, 1997.

1072:

1073: \bibitem{Ka96} K. Kanatani,

1074:     {\em Statistical Optimization for Geometric Computation: Theory and Practice},

1075:     Elsevier Science, Amsterdam, 1996.

1076:

1077: \bibitem{Ka98} K. Kanatani,

1078:     {\rm Cramer-Rao lower bounds for curve fitting},

1079:     {\em Graph. Models Image Proc.} {\bf 60}, 1998, 93--99.

1080:

1081: \bibitem{La87} U.M. Landau,

1082:     {\rm Estimation of a circular arc center and its radius},

    {\em Computer Vision, Graphics and Image Processing}, {\bf 38}, 1987,
    317--326.

1085:

1086: \bibitem{LM00} Y. Leedan and P. Meer,

1087:     {\rm Heteroscedastic regression in computer vision: Problems with bilinear

1088:     constraint},

1089:     {\em Intern. J. Comp. Vision}, {\bf 37}, 2000, 127--150.

1090:

1091: \bibitem{Pr87} V. Pratt,

1092:     {\rm Direct least-squares fitting of algebraic surfaces},

1093:     {\em Computer Graphics} {\bf 21}, 1987, 145--152.

1094:

\bibitem{Sp96} H. Sp\"ath,

1096:     {\rm Least-Squares Fitting By Circles},

1097:     {\em Computing}, {\bf 57}, 1996, 179--185.

1098:

\bibitem{Sp97} H. Sp\"ath,

1100:     {\rm Orthogonal least squares fitting by conic sections},

1101:     in {\em Recent Advances in Total Least Squares techniques and

1102:     Errors-in-Variables Modeling}, SIAM, 1997, pp. 259--264.

1103:

1104: \bibitem{Ta91} G. Taubin,

1105:     {\rm Estimation Of Planar Curves, Surfaces And Nonplanar

1106:     Space Curves Defined By Implicit Equations,

1107:     With Applications To Edge And Range Image Segmentation},

1108:     {\em IEEE Transactions on Pattern Analysis and Machine

1109:     Intelligence},  {\bf 13}, 1991, 1115--1138.

1110:

1111: \bibitem{Tu74} K. Turner, {\em Computer perception of curved

1112: objects using a television camera}, Ph.D.\ Thesis, Dept.\ of Machine

1113: Intelligence, University of Edinburgh, 1974.

1114:

1115:

1116: \end{thebibliography}

1117:

1118: \end{document}


1212: {\em Final remark}. In the theorem, we assumed that the vectors

1213: $v_1,\ldots,v_n$ spanned $\IR^k$. If they do not, then the matrix $B$

1214: will be singular and, furthermore, no proper sets of matrices

1215: $A_1,\ldots,A_n$ would exist in the above sense. However, the theorem

1216: can be modified as follows: first, in the definition of proper sets of

1217: matrices we must require that $r\in\, $span$\, \{v_1,\ldots,v_n\}$

1218: rather than $r\in\IR^k$, and second, the matrix $B^{-1}$ must be

1219: replaced by its generalized (Moore-Penrose) inverse $B^-$. The proof of

1220: the theorem in this case only requires minor changes, which we omit.

1221:

1222: {\em Remark}. Consider the following popular iterative algorithm:

1223: using the $k$-th approximation $\Theta^{(k)}$, one computes the

1224: weight $w_i = w(x_i,y_i;\Theta^{(k)})$, then substitutes $w_i$

into (\ref{Fmain3}) and finds $\Theta^{(k+1)}$ by
minimizing ${\cal F}_3(\Theta)$ assuming that the weights $w_i$

1227: are fixed (this often becomes a linear problem in $\Theta$, so it

1228: is easily solvable). If this algorithm converges, i.e.\ if

1229: $\Theta^{(k)}\to\hat{\Theta}$, then the limit $\hat{\Theta}$ is a

1230: solution of (\ref{weq1}). We emphasize that this method solves

1231: (\ref{weq1}) rather than (\ref{weq}). Therefore, the above

1232: procedure fails to minimize the proper objective function

1233: (\ref{Fmain3}). But the resulting error is negligibly small, as

1234: $\sigma \to 0$. This error does not alter the principal term of

1235: the covariance matrix of the solution $\hat{\Theta}$, hence it

1236: does not affect the statistical behavior of $\hat{\Theta}$. In

1237: practice, one often uses the above iterative procedure for

1238: minimizing (\ref{Fmain3}) and ignores the error it involves, see

1239: \cite{Sa82,Ta91}.

1240: