1: \documentclass[12pt]{article}
2:
3: \textwidth6.25in \textheight8.5in \oddsidemargin.25in
4: \topmargin0in
5:
6: \usepackage{epsfig}
7:
8: %\renewcommand{\baselinestretch}{1.7}
9:
10: \def\be{\begin{equation}}
11: \def\ee{\end{equation}}
12: \def\la{\langle}
13: \def\ra{\rangle}
14: \def\IP{\hbox{\rm I\kern -1.6pt{\rm P}}}
15: \def\IC{{\hbox{\rm C\kern-.58em{\raise.53ex\hbox{$\scriptscriptstyle|$}}
16: \kern-.55em{\raise.53ex\hbox{$\scriptscriptstyle|$}} }}}
17: \def\IN{\hbox{I\kern-.2em\hbox{N}}}
18: \def\IR{\hbox{\rm I\kern-.2em\hbox{\rm R}}}
19: \def\ZZ{\hbox{{\rm Z}\kern-.3em{\rm Z}}}
20: \def\IT{\hbox{\rm T\kern-.38em{\raise.415ex\hbox{$\scriptstyle|$}} }}
21: %\newtheorem{theorem}{Theorem}[section]
22: \newtheorem{theorem}{Theorem}
23: \newtheorem{lemma}[theorem]{Lemma}
24: \newtheorem{sublemma}[theorem]{Sublemma}
25: \newtheorem{proposition}[theorem]{Proposition}
26: \newtheorem{corollary}[theorem]{Corollary}
27: \newtheorem{remark}[theorem]{Remark}
28:
29: \begin{document}
30:
31: \title{Statistical efficiency of curve fitting algorithms}
32: \author{N. Chernov and C. Lesort\\
33: Department of Mathematics\\
34: University of Alabama at Birmingham\\
35: Birmingham, AL 35294, USA}
36: \date{\today}
37: \maketitle
38:
39: \begin{abstract}
40: We study the problem of fitting parametrized curves to noisy data.
41: Under certain assumptions (known as Cartesian and radial
42: functional models), we derive asymptotic expressions for the bias
43: and the covariance matrix of the parameter estimates. We also
44: extend Kanatani's version of the Cramer-Rao lower bound, which he
45: proved for unbiased estimates only, to more general estimates that
46: include many popular algorithms (most notably, the orthogonal
47: least squares and algebraic fits). We then show that the
48: gradient-weighted algebraic fit is statistically efficient and
49: describe all other statistically efficient algebraic fits.
50: \end{abstract}
51:
52: \begin{center}
53: Keywords: least squares fit, curve fitting, circle fitting,
algebraic fit, Cramer-Rao bound, efficiency, functional model.
55: \end{center}
56:
57: \renewcommand{\theequation}{\arabic{section}.\arabic{equation}}
58:
59: \section{Introduction}
60: \label{secI} \setcounter{equation}{0}
61:
62: In many applications one fits a parametrized curve described by an
63: implicit equation $P(x,y;\Theta)=0$ to experimental data $(x_i,y_i)$,
64: $i=1,\ldots,n$. Here $\Theta$ denotes the vector of unknown parameters
65: to be estimated. Typically, $P$ is a polynomial in $x$ and $y$, and its
66: coefficients are unknown parameters (or functions of unknown
67: parameters). For example, a number of recent publications
68: \cite{ARW01,CBH01,GGS94,LM00,Sp97} are devoted to the problem of
69: fitting quadrics $Ax^2+ Bxy+ Cy^2+ Dx+ Ey+ F=0$, in which case
70: $\Theta=(A,B,C,D,E,F)$ is the parameter vector. The problem of fitting
circles, given by the equation $(x-a)^2+ (y-b)^2 -R^2=0$ with three
parameters $a,b,R$, has also attracted attention
\cite{CO84,Ka98,La87,Sp96}.
74:
75: We consider here the problem of fitting general curves given by
76: implicit equations $P(x,y;\Theta)=0$ with $\Theta= (\theta_1, \ldots,
\theta_k)$ being the parameter vector. Our goal is to investigate
the statistical properties of various fitting algorithms: their bias,
their covariance matrices, and the Cramer-Rao lower bound.
81:
82: First, we specify our model. We denote by $\bar{\Theta}$ the true
83: value of $\Theta$. Let $(\bar{x}_{i} ,\bar{y}_{i})$,
84: $i=1,\ldots,n$, be some points lying on the true curve
85: $P(x,y;\bar{\Theta})=0$. Experimentally observed data points
$(x_i, y_i)$, $i=1,\ldots,n$, are regarded as random
87: perturbations of the true points $(\bar{x}_{i} ,\bar{y}_{i})$. We
88: use notation ${\bf x}_i = ({x}_i, {y}_i)^T$ and $\bar{\bf x}_i =
89: ({\bar x}_i,\bar{y}_i)^T$, for brevity. The random vectors ${\bf
90: e}_i={\bf x}_i -\bar{\bf x}_{i}$ are assumed to be independent and
91: have zero mean. Two specific assumptions on their probability
92: distribution can be made, see \cite{BC86}:
93: \begin{itemize} \item[] {\em Cartesian model}: Each ${\bf e}_i$
94: is a two-dimensional normal vector with covariance matrix $\sigma^2_i
I$, where $I$ is the identity matrix. \item[] {\em Radial model}: ${\bf
e}_i = \xi_i {\bf n}_i$, where $\xi_i$ is a normal random variable
${\cal N}(0,\sigma^2_i)$ and ${\bf n}_i$ is a unit normal vector to
the curve $P(x,y;\bar{\Theta})=0$ at the true point $\bar{\bf x}_i$.
99: \end{itemize} Our analysis covers both models, Cartesian and radial.
100: For simplicity, we assume that $\sigma^2_i=\sigma^2$ for all $i$,
101: but note that our results can be easily generalized to arbitrary
102: $\sigma_i^2>0$.
103:
104: Concerning the true points $\bar{\bf x}_i$, $i=1,\ldots,n$, two
105: assumptions are possible. Many researchers \cite{Ch65,Ka96,Ka98}
regard them as fixed but unknown points on the true curve. In
107: this case their coordinates $(\bar{x}_{i} ,\bar{y}_{i})$ can be
108: treated as additional parameters of the model (nuisance
109: parameters). Chan \cite{Ch65} and others \cite{An81,BC86} call
110: this assumption a {\em functional model}. Alternatively, one can
111: assume that the true points $\bar{\bf x}_i$ are sampled from the
112: curve $P(x,y ;\bar{\Theta} )=0$ according to some probability
113: distribution on it. This assumption is referred to as a {\em
114: structural model} \cite{An81,BC86}. We only consider the
115: functional model here.
116:
It is easy to verify that the maximum likelihood estimate of the
parameter $\Theta$ for the functional model is given by the
119: orthogonal least squares fit (OLSF), which is based on
120: minimization of the function
121: \be
122: {\cal F}_1(\Theta) = \sum_{i=1}^n [d_i(\Theta)]^2
123: \label{Fmain1}
124: \ee
125: where $d_i(\Theta)$ denotes the distance from the point ${\bf x}_i$ to
126: the curve $P(x,y;\Theta)=0$. The OLSF is the method of choice in
127: practice, especially when one fits simple curves such as lines and
128: circles. However, for more general curves the OLSF becomes intractable,
129: because the precise distance $d_i$ is hard to compute. For example,
130: when $P$ is a generic quadric (ellipse or hyperbola), the computation
131: of $d_i$ is equivalent to solving a polynomial equation of degree four,
132: and its direct solution is known to be numerically unstable, see
133: \cite{ARW01,GGS94} for more detail. Then one resorts to various
134: approximations. It is often convenient to minimize
135: \be
136: {\cal F}_2(\Theta) = \sum_{i=1}^n [P(x_i,y_i;\Theta)]^2
137: \label{Fmain2}
138: \ee
139: instead of (\ref{Fmain1}). This method is referred to as a
(simple) {\em algebraic fit} (AF); in this case one calls
141: $|P(x_i,y_i;\Theta)|$ the {\em algebraic distance}
142: \cite{ARW01,CBH01,GGS94} from the point $(x_i,y_i)$ to the curve.
143: The AF is computationally cheaper than the OLSF, but its accuracy
144: is often unacceptable, see below.
145:
146: The simple AF (\ref{Fmain2}) can be generalized to a {\em weighted
147: algebraic fit}, which is based on minimization of
148: \be
149: {\cal F}_3(\Theta) = \sum_{i=1}^n w_i\, [P(x_i,y_i;\Theta)]^2
150: \label{Fmain3}
151: \ee
152: where $w_i=w(x_i,y_i;\Theta)$ are some weights, which may balance
153: (\ref{Fmain2}) and improve its performance. One way to define
154: weights $w_i$ results from a linear approximation to $d_i$:
155: $$
156: d_i \approx \frac{|P(x_i,y_i;\Theta)|}
157: {\|\nabla_{\bf x}P(x_i,y_i;\Theta)\|}
158: $$
159: where $\nabla_{\bf x}P=(\partial P/\partial x,\partial P/\partial
160: y)$ is the gradient vector, see \cite{Ta91}. Then one minimizes
161: the function
162: \be
163: {\cal F}_4(\Theta) = \sum_{i=1}^n \frac{[P(x_i,y_i;\Theta)]^2}
164: {\|\nabla_{\bf x}P(x_i,y_i;\Theta)\|^2}
165: \label{Fmain4}
166: \ee
167: This method is called the {\em gradient weighted algebraic fit} (GRAF).
168: It is a particular case of (\ref{Fmain3}) with $w_i = 1/ \|\nabla_{\bf
169: x}P(x_i,y_i;\Theta)\|^2$.
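
To make the objectives concrete, here is a minimal Python/NumPy sketch for
the circle $P(x,y;a,b,R)=(x-a)^2+(y-b)^2-R^2$ treated in
Section~\ref{secCF}, for which the distance in (\ref{Fmain1}) is simply
$d_i=|\sqrt{(x_i-a)^2+(y_i-b)^2}-R|$. The choice of example and all
function names are our own assumptions; note that ${\cal F}_4$ is
undefined if a data point sits at the center, where $\nabla_{\bf x}P=0$.

\begin{verbatim}
import numpy as np

def circle_P(x, y, a, b, R):
    """Algebraic residual P(x, y; a, b, R) = (x-a)^2 + (y-b)^2 - R^2."""
    return (x - a) ** 2 + (y - b) ** 2 - R ** 2

def circle_grad_norm2(x, y, a, b, R):
    """Squared gradient norm ||grad_x P||^2 = 4[(x-a)^2 + (y-b)^2]."""
    return 4.0 * ((x - a) ** 2 + (y - b) ** 2)

def F1(x, y, a, b, R):
    """OLSF objective: sum of squared orthogonal distances to the circle."""
    d = np.hypot(x - a, y - b) - R
    return float(np.sum(d ** 2))

def F2(x, y, a, b, R):
    """Simple algebraic fit: sum of squared algebraic distances."""
    return float(np.sum(circle_P(x, y, a, b, R) ** 2))

def F3(x, y, a, b, R, w):
    """Weighted algebraic fit with given weights w_i."""
    return float(np.sum(w * circle_P(x, y, a, b, R) ** 2))

def F4(x, y, a, b, R):
    """GRAF: the weighted fit with w_i = 1 / ||grad_x P(x_i, y_i)||^2."""
    return F3(x, y, a, b, R, 1.0 / circle_grad_norm2(x, y, a, b, R))

# One data point at distance 1 from the unit circle centered at the origin
x, y = np.array([2.0]), np.array([0.0])
print(F1(x, y, 0.0, 0.0, 1.0), F2(x, y, 0.0, 0.0, 1.0), F4(x, y, 0.0, 0.0, 1.0))
# prints: 1.0 9.0 0.5625
\end{verbatim}

For this single point the algebraic distance is $|P|=3$ and the
gradient-weighted distance is $\sqrt{{\cal F}_4}=0.75$, while the true
distance is $1$; the linear approximation behind (\ref{Fmain4}) is
accurate only for points close to the curve.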
170:
The GRAF has been known since at least 1974 \cite{Tu74} and has recently
become standard for polynomial curve fitting
\cite{Ta91,LM00,CBH01}. The computational cost of the GRAF depends on
the function $P(x,y;\Theta)$, but, generally, the GRAF is much
faster than the OLSF. It is also known from practice that the
accuracy of the GRAF is almost as good as that of the OLSF, and our
analysis below confirms this fact. The GRAF is often claimed to be
178: a {\em statistically optimal} weighted algebraic fit, and we will
179: prove this fact as well.
180:
181: Not much has been published on statistical properties of the OLSF and
182: algebraic fits, apart from the simplest case of fitting lines and
hyperplanes \cite{Hu97}. Chan \cite{Ch65} and Berman and Culpin
\cite{BC86} investigated circle fitting by the OLSF and the simple
algebraic fit (\ref{Fmain2}) assuming the structural model. Kanatani
\cite{Ka96,Ka98} used the Cartesian functional model and considered a
general curve fitting problem. He established an analogue of the
Cramer-Rao lower bound for unbiased estimates of $\Theta$, which we
call here the Kanatani-Cramer-Rao (KCR) lower bound. He also showed that
190: the covariance matrices of the OLSF and the GRAF attain, to the leading
191: order in $\sigma$, his lower bound. We note, however, that in most
192: cases the OLSF and algebraic fits are {\em biased} \cite{BC86,Be89},
193: hence the KCR lower bound, as it is derived in \cite{Ka96,Ka98}, does
194: not immediately apply to these methods.
195:
196: In this paper we extend the KCR lower bound to biased estimates,
197: which include the OLSF and all weighted algebraic fits. We prove
198: the KCR bound for estimates satisfying the following mild
199: assumption:
200: \medskip
201:
202: \noindent{\bf Precision assumption}. For precise observations (when
203: ${\bf x}_i = \bar{\bf x}_i$ for all $1\leq i\leq n$), the estimate
204: $\hat{\Theta}$ is precise, i.e.
205: \be
206: \hat{\Theta}(\bar{\bf x}_1, \ldots, \bar{\bf x}_n) = \bar{\Theta}
207: \label{Tass}
208: \ee
It is easy to check that the OLSF and the weighted algebraic fits
(\ref{Fmain3}) satisfy this assumption. We will also show that all
unbiased estimates of $\Theta$ satisfy (\ref{Tass}).
212:
213: We then prove that the GRAF is, indeed, a statistically efficient
214: fit, in the sense that its covariance matrix attains, to the
215: leading order in $\sigma$, the KCR lower bound. On the other hand,
rather surprisingly, we find that the GRAF is not the only
statistically efficient algebraic fit, and we describe all
statistically efficient algebraic fits. Finally, we show that
Kanatani's theory and our extension of it remain valid for the
radial functional model. Our conclusions are illustrated by
221: numerical experiments on circle fitting algorithms.
222:
223:
224: \section{Kanatani-Cramer-Rao lower bound}
225: \label{secKCR} \setcounter{equation}{0}
226:
227: Recall that we have adopted the functional model, in which the true
228: points $\bar{\bf x}_i$, $1\leq i\leq n$, are fixed. This automatically
makes the sample size $n$ fixed; hence many classical concepts of
statistics, such as consistency and asymptotic efficiency (which
require taking the limit $n\to\infty$), lose their meaning. It is
customary, in studies of the functional model of the curve fitting
problem, to take the limit $\sigma \to 0$ instead of $n\to\infty$, cf.\
\cite{Ka96,Ka98}. This is also reasonable from the
235: practical point of view: in many experiments, $n$ is rather small and
236: cannot be (easily) increased, so the limit $n\to \infty$ is of little
237: interest. On the other hand, when the accuracy of experimental
238: observations is high (thus, $\sigma$ is small), the limit $\sigma\to 0$
239: is quite appropriate.
240:
241: Now, let $\hat{\Theta}({\bf x}_1,\ldots,{\bf x}_n)$ be an arbitrary
242: estimate of $\Theta$ satisfying the precision assumption (\ref{Tass}).
243: In our analysis we will always assume that all the underlying functions
244: are regular (continuous, have finite derivatives, etc.), which is a
245: standard assumption \cite{Ka96,Ka98}.
246:
247: The mean value of the estimate $\hat{\Theta}$ is
248: \be
249: E(\hat{\Theta}) =
250: \int\cdots\int \hat{\Theta}({\bf x}_1,\ldots,{\bf x}_n)
251: \, \prod_{i=1}^n f({\bf x}_i)\,
252: d{\bf x}_1\cdots d{\bf x}_n
253: \label{ET}
254: \ee
255: where $f({\bf x}_i)$ is the probability density function for the
256: random point ${\bf x}_i$, as specified by a particular model
257: (Cartesian or radial).
258: %For the Cartesian model
259: %$$
260: % f({\bf x}_i) = \frac{1}{2\pi\sigma^2}\,
261: % e^{-\frac{(x_i-\bar{x}_{i})^2 + (y_i-\bar{y}_{i})^2}{2\sigma^2}}
262: %$$
263: %is the normal density function. For the radial model, the integral
264: %variables only vary along the normal vectors ${\bf n}_i$ to the curve
265: %$P(x,y;\bar{\Theta})=0$ at the points $\bar{\bf x}_i$, and the density
266: %function is
267: %$$
268: % f({\bf x}_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\,
269: % e^{-\frac{(x_i-\bar{x}_{i})^2 + (y_i-\bar{y}_{i})^2}{2\sigma^2}}
270: %$$
271:
We now expand the estimate $\hat{\Theta}({\bf x}_1, \ldots, {\bf x}_n)$
into a Taylor series about the point $(\bar{\bf x}_1, \ldots,
\bar{\bf x}_n)$ and use (\ref{Tass}):
276: \be
277: \hat{\Theta}({\bf x}_1, \ldots, {\bf x}_n) =
278: \bar{\Theta} + \sum_{i=1}^n
279: \Theta_i \times ({\bf x}_i - \bar{\bf x}_i)
280: + {\cal O}(\sigma^2)
281: \label{Texpand}
282: \ee
283: where
284: \be
285: {\Theta}_i = \nabla_{{\bf x}_i}\hat{\Theta}
286: (\bar{\bf x}_1, \ldots, \bar{\bf x}_n),
287: \ \ \ \ \ i=1,\ldots,n
288: \label{Ti}
289: \ee
290: and $\nabla_{{\bf x}_i}$ stands for the gradient with respect to
291: the variables $x_i,y_i$. In other words, $\Theta_i$ is a $k\times
292: 2$ matrix of partial derivatives of the $k$ components of the
293: function $\hat{\Theta}$ with respect to the two variables $x_i$
and $y_i$, evaluated at the point $(\bar{\bf x}_1, \ldots, \bar{\bf x}_n)$.
296:
297: Substituting the expansion (\ref{Texpand}) into (\ref{ET}) gives
298: \be
299: E(\hat{\Theta}) = \bar{\Theta} + {\cal O}(\sigma^2)
300: \label{Tbias}
301: \ee
302: since $E({\bf x}_i - \bar{\bf x}_i)=0$. Hence, the bias of the
303: estimate $\hat{\Theta}$ is of order $\sigma^2$.
304:
305: It easily follows from the expansion (\ref{Texpand}) that the
306: covariance matrix of the estimate $\hat{\Theta}$ is given by
307: $$
308: {\cal C}_{\hat{\Theta}} = \sum_{i=1}^n
309: \Theta_i E[({\bf x}_i - \bar{\bf x}_i)({\bf x}_i - \bar{\bf x}_i)^T]
310: \Theta_i^T + {\cal O}(\sigma^4)
311: $$
(it is not hard to see that the cubic terms ${\cal O}(\sigma^3)$
313: vanish because the normal random variables with zero mean also
314: have zero third moment, see also \cite{Ka96}). Now, for the
315: Cartesian model
316: $$
317: E[({\bf x}_i - \bar{\bf x}_i)({\bf x}_i - \bar{\bf x}_i)^T]
318: =\sigma^2 I
319: $$
320: and for the radial model
321: $$
322: E[({\bf x}_i - \bar{\bf x}_i)({\bf x}_i - \bar{\bf x}_i)^T]
323: =\sigma^2 {\bf n}_i {\bf n}_i^T
324: $$
325: where ${\bf n}_i$ is a unit normal vector to the curve
326: $P(x,y;\bar{\Theta})=0$ at the point $\bar{\bf x}_i$. Then we obtain
327: \be
328: {\cal C}_{\hat{\Theta}} = \sigma^2 \sum_{i=1}^n
329: \Theta_i \Lambda_i \Theta_i^T + {\cal O}(\sigma^4)
330: \label{Csig0}
331: \ee
332: where $\Lambda_i=I$ for the Cartesian model and $\Lambda_i={\bf n}_i
333: {\bf n}_i^T$ for the radial model.
334: \\
335:
336: \noindent{\bf Lemma}. {\em We have $\Theta_i {\bf n}_i {\bf n}_i^T
337: \Theta_i^T = \Theta_i \Theta_i^T$ for each $i=1,\ldots,n$. Hence,
338: for both models, Cartesian and radial, the matrix ${\cal
339: C}_{\hat{\Theta}}$ is given by the same expression:}
340: \be
341: {\cal C}_{\hat{\Theta}} = \sigma^2 \sum_{i=1}^n
342: \Theta_i \Theta_i^T + {\cal O}(\sigma^4)
343: \label{Csig}
344: \ee
345:
This lemma is proved in the Appendix.
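
The Lemma can be checked numerically for a concrete estimator. The
Python/NumPy sketch below assumes the circle example of
Section~\ref{secCF} with the simple algebraic fit, computed through a
linear least squares reparametrization, and approximates the matrices
$\Theta_i$ of (\ref{Ti}) by central finite differences at precise data;
the function names, the step $h$, and the test circle are illustrative
choices of ours. Up to finite-difference error, $\Theta_i{\bf t}_i$
vanishes for the unit tangent ${\bf t}_i$, which is expected from the
precision assumption (\ref{Tass}): sliding $\bar{\bf x}_i$ along the true
curve keeps the data precise and so does not change the estimate. Since
${\bf n}_i{\bf n}_i^T+{\bf t}_i{\bf t}_i^T=I$, this gives
$\Theta_i{\bf n}_i{\bf n}_i^T\Theta_i^T=\Theta_i\Theta_i^T$.

\begin{verbatim}
import numpy as np

def algebraic_circle_fit(x, y):
    """Simple AF (F2) for circles, returns (a, b, R).

    With z = x^2 + y^2, B = -2a, C = -2b, D = a^2 + b^2 - R^2 the residual is
    P = z + B x + C y + D, so minimizing F2 is linear least squares in (B, C, D).
    """
    A = np.column_stack([x, y, np.ones_like(x)])
    (B, C, D), *_ = np.linalg.lstsq(A, -(x ** 2 + y ** 2), rcond=None)
    a, b = -B / 2.0, -C / 2.0
    return np.array([a, b, np.sqrt(a ** 2 + b ** 2 - D)])

def point_jacobians(fit, x, y, h=1e-6):
    """Central-difference approximations of the k x 2 matrices Theta_i in (Ti)."""
    thetas = []
    for i in range(len(x)):
        cols = []
        for ex, ey in ((1.0, 0.0), (0.0, 1.0)):
            xp, yp, xm, ym = x.copy(), y.copy(), x.copy(), y.copy()
            xp[i] += h * ex
            yp[i] += h * ey
            xm[i] -= h * ex
            ym[i] -= h * ey
            cols.append((fit(xp, yp) - fit(xm, ym)) / (2.0 * h))
        thetas.append(np.column_stack(cols))
    return thetas

# Assumed true circle and precise points on half of it
a0, b0, R0 = 1.0, -2.0, 3.0
t = np.linspace(0.0, np.pi, 10)
xbar, ybar = a0 + R0 * np.cos(t), b0 + R0 * np.sin(t)
normals = np.column_stack([np.cos(t), np.sin(t)])
tangents = np.column_stack([-np.sin(t), np.cos(t)])

thetas = point_jacobians(algebraic_circle_fit, xbar, ybar)
tangential = max(np.abs(Th @ tv).max() for Th, tv in zip(thetas, tangents))
gap = max(np.abs(Th @ np.outer(nv, nv) @ Th.T - Th @ Th.T).max()
          for Th, nv in zip(thetas, normals))
print(tangential, gap)   # both tiny: only finite-difference and round-off error
\end{verbatim}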
347:
Our next goal is to find a lower bound for the matrix
349: \be
350: {\cal D}_1:= \sum_{i=1}^n \Theta_i\Theta_i^T
351: \label{calC1}
352: \ee
353: Following \cite{Ka96,Ka98}, we consider perturbations of the parameter
354: vector $\bar{\Theta} +\delta \Theta$ and the true points $\bar{\bf x}_i
+ \delta \bar{\bf x}_i$ satisfying two constraints. First, since the
perturbed true points must still lie on the perturbed true curve,
$P(\bar{\bf x}_i+\delta\bar{\bf x}_i;\bar{\Theta}+\delta\Theta)=0$, and
$P(\bar{\bf x}_i;\bar{\Theta})=0$, we obtain, by the chain rule and to
first order,
358: \be
359: \la \nabla_{{\bf x}}\, P(\bar{\bf x}_i;\bar{\Theta}), \delta \bar{\bf x}_i \ra
360: + \la \nabla_{\Theta} P(\bar{\bf x}_i;\bar{\Theta}), \delta \Theta \ra = 0
361: \label{Tcon1}
362: \ee
363: where $\la \cdot, \cdot \ra$ stands for the scalar product of vectors.
Second, since the identity (\ref{Tass}) holds for every true parameter
vector and all true points on the corresponding curve, we get, to first
order,
365: \be
366: \sum_{i=1}^n
367: \Theta_i\, \delta \bar{\bf x}_i
368: = \delta \Theta
369: \label{Tcon2}
370: \ee
371: by using the notation (\ref{Ti}).
372:
373: Now we need to find a lower bound for the matrix (\ref{calC1})
374: subject to the constraints (\ref{Tcon1}) and (\ref{Tcon2}). That
375: bound follows from a general theorem in linear algebra:
376: \\
377:
378: \noindent{\bf Theorem (Linear Algebra)}. {\em Let $n\geq k\geq 1$ and
379: $m\geq 1$. Suppose $n$ nonzero vectors $u_i\in\IR^m$ and $n$ nonzero
380: vectors $v_i\in\IR^k$ are given, $1\leq i\leq n$. Consider $k\times m$
381: matrices
382: $$
383: X_i = \frac{v_iu_i^T}{u_i^Tu_i}\
384: $$
385: for $1\leq i\leq n$, and $k\times k$ matrix
386: $$
387: B = \sum_{i=1}^n X_i X_i^T
388: = \sum_{i=1}^n \frac{v_iv_i^T}{u_i^Tu_i}
389: $$
390: Assume that the vectors $v_1,\ldots,v_n$ span $\IR^k$ (hence $B$
391: is nonsingular). We say that a set of $n$ matrices
392: $A_1,\ldots,A_n$ (each of size $k\times m$) is {\bf proper} if
393: \be
394: \sum_{i=1}^n A_i w_i = r
395: \label{properA1}
396: \ee
397: for any vectors $w_i\in\IR^m$ and $r\in \IR^k$ such that
398: \be
399: u_i^Tw_i + v_i^Tr = 0
400: \label{properA2}
401: \ee
402: for all $1\leq i\leq n$. Then for any proper set of matrices
403: $A_1,\ldots,A_n$ the $k\times k$ matrix $D = \sum_{i=1}^n A_iA_i^T$ is
404: bounded from below by $B^{-1}$ in the sense that $D - B^{-1}$ is a
405: positive semidefinite matrix. The equality $D=B^{-1}$ holds if and only
406: if $A_i = - B^{-1} X_i$ for all $i=1,\ldots,n$.}
407: \\
408:
This theorem is probably known, but we provide a full proof in the
Appendix for the sake of completeness.
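
The statement can be probed numerically. In the Python/NumPy sketch
below, with random vectors $u_i$, $v_i$ (the sizes $n=8$, $k=3$, $m=2$
and the seed are arbitrary), we check that $A_i=-B^{-1}X_i$ is proper
and gives $D=B^{-1}$. We then build a second proper set by adding
matrices $Z_i=z_iu_i^T$ with $\sum_i z_iv_i^T=0$; for it the cross terms
cancel and $D-B^{-1}=\sum_i Z_iZ_i^T$ is positive semidefinite. This
construction is our own illustration of the theorem, not the proof
given in the Appendix.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 8, 3, 2
U = rng.standard_normal((n, m))   # row i is u_i
V = rng.standard_normal((n, k))   # row i is v_i

# X_i = v_i u_i^T / (u_i^T u_i) and B = sum_i v_i v_i^T / (u_i^T u_i)
X = [np.outer(V[i], U[i]) / (U[i] @ U[i]) for i in range(n)]
B = sum(np.outer(V[i], V[i]) / (U[i] @ U[i]) for i in range(n))
Binv = np.linalg.inv(B)

def admissible_pair(rng):
    """Random (w_1, ..., w_n, r) with u_i^T w_i + v_i^T r = 0 for every i."""
    r = rng.standard_normal(k)
    W = rng.standard_normal((n, m))
    for i in range(n):
        # Project w_i so that the constraint (properA2) holds exactly
        W[i] -= U[i] * (U[i] @ W[i] + V[i] @ r) / (U[i] @ U[i])
    return W, r

def is_proper(A, trials=5):
    """Check (properA1) on several random admissible pairs."""
    for _ in range(trials):
        W, r = admissible_pair(rng)
        if not np.allclose(sum(A[i] @ W[i] for i in range(n)), r):
            return False
    return True

# Optimal set A_i = -B^{-1} X_i
A_opt = [-Binv @ Xi for Xi in X]
D_opt = sum(Ai @ Ai.T for Ai in A_opt)

# Another proper set: add Z_i = z_i u_i^T, where the k x n matrix [z_1 ... z_n]
# is orthogonal to the columns of V, so that sum_i z_i v_i^T = 0
Q, _ = np.linalg.qr(V)
G = rng.standard_normal((k, n))
Zmat = G - (G @ Q) @ Q.T
A_alt = [A_opt[i] + np.outer(Zmat[:, i], U[i]) for i in range(n)]
D_alt = sum(Ai @ Ai.T for Ai in A_alt)

print(is_proper(A_opt), np.allclose(D_opt, Binv))                     # True True
print(is_proper(A_alt), np.linalg.eigvalsh(D_alt - Binv).min() >= -1e-10)  # True True
\end{verbatim}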
411:
412: As a direct consequence of the above theorem we obtain the lower
413: bound for our matrix ${\cal D}_1$:
414: \\
415:
416: \noindent{\bf Theorem (Kanatani-Cramer-Rao lower bound)}. {\em We
417: have ${\cal D}_1\geq{\cal D}_{\min}$, in the sense that ${\cal
418: D}_1 - {\cal D}_{\min}$ is a positive semidefinite matrix, where}
419: \be
420: {\cal D}_{\min}^{-1} = \sum_{i=1}^n
\frac{(\nabla_{\Theta} P(\bar{\bf x}_i;\bar{\Theta}))
(\nabla_{\Theta} P(\bar{\bf x}_i;\bar{\Theta}))^T}
{\|\nabla_{{\bf x}}\, P(\bar{\bf x}_i;\bar{\Theta})\|^2}
424: \label{Dmin}
425: \ee
426:
427:
428: In view of (\ref{Csig}) and (\ref{calC1}), the above theorem says that
429: the lower bound for the covariance matrix ${\cal C}_{\hat{\Theta}}$ is,
430: to the leading order,
431: \be
432: {\cal C}_{\hat{\Theta}} \geq {\cal C}_{\min}
433: = \sigma^2 {\cal D}_{\min}
434: \label{RC}
435: \ee
436: The standard deviations of the components of the estimate
437: $\hat{\Theta}$ are of order $\sigma_{\hat{\Theta}} = {\cal
438: O}(\sigma)$. Therefore, the bias of $\hat{\Theta}$, which is at
439: most of order $\sigma^2$ by (\ref{Tbias}), is infinitesimally
440: small, as $\sigma \to 0$, compared to the standard deviations.
441: This means that the estimates satisfying (\ref{Tass}) are
442: practically unbiased.
443:
444: The bound (\ref{RC}) was first derived by Kanatani
445: \cite{Ka96,Ka98} for the Cartesian functional model and strictly
446: unbiased estimates of $\Theta$, i.e.\ satisfying $E(\hat{\Theta})
447: =\bar{\Theta}$. One can easily derive (\ref{Tass}) from
448: $E(\hat{\Theta}) =\bar{\Theta}$ by taking the limit $\sigma \to
449: 0$, hence our results generalize those of Kanatani.
450:
451:
452: \section{Statistical efficiency of algebraic fits}
453: \label{secSE} \setcounter{equation}{0}
454:
455:
456: Here we derive an explicit formula for the covariance matrix of the
457: weighted algebraic fit (\ref{Fmain3}) and describe the weights $w_i$
458: for which the fit is statistically efficient. For brevity, we write
$P_i = P(x_i,y_i;\Theta)$. We assume that the weight function
$w(x,y;\Theta)$ is regular; in particular, it has bounded derivatives with
respect to $\Theta$ (the next section will demonstrate the importance
of this condition). The solution of the minimization problem
463: (\ref{Fmain3}) satisfies
464: \be
465: \sum P_i^2 \, \nabla_{\Theta} w_i +
466: 2 \sum w_i \, P_i \, \nabla_{\Theta} P_i = 0
467: \label{weq}
468: \ee
469: Observe that $P_i = {\cal O} (\sigma)$, so that the first sum in
470: (\ref{weq}) is ${\cal O}(\sigma^2)$ and the second sum is ${\cal
471: O} (\sigma)$. Hence, to the leading order, the solution of
472: (\ref{weq}) can be found by discarding the first sum and solving
473: the reduced equation
474: \be
475: \sum w_i\, P_i\, \nabla_{\Theta} P_i = 0
476: \label{weq1}
477: \ee
478: More precisely, if $\hat{\Theta}_1$ and $\hat{\Theta}_2$ are
479: solutions of (\ref{weq}) and (\ref{weq1}), respectively, then
480: $\hat{\Theta}_1 -\bar{\Theta} = {\cal O} (\sigma)$,
481: $\hat{\Theta}_2 -\bar{\Theta} = {\cal O} (\sigma)$, and
482: $\|\hat{\Theta}_1 -\hat{\Theta}_2 \|= {\cal O} (\sigma^2)$.
483: Furthermore, the covariance matrices of $\hat{\Theta}_1$ and
484: $\hat{\Theta}_2$ coincide, to the leading order, i.e.\ ${\cal
485: C}_{\hat{\Theta}_1} {\cal C}_{\hat{\Theta}_2}^{-1} \to I$ as
486: $\sigma \to 0$. Therefore, in what follows, we only deal with the
487: solution of equation (\ref{weq1}).
488:
489: To find the covariance matrix of $\hat{\Theta}$ satisfying
490: (\ref{weq1}) we put $\hat{\Theta} =\bar{\Theta} +\delta \Theta$
491: and ${\bf x}_i = \bar{\bf x}_i + \delta {\bf x}_i$ and obtain,
492: working to the leading order,
493: $$
494: \sum w_i (\nabla_{\Theta} P_i)
495: (\nabla_{\Theta} P_i)^T\, (\delta \Theta)
496: = - \sum w_i (\nabla_{\bf x} P_i)^T \, (\delta {\bf x}_i) \,
497: (\nabla_{\Theta} P_i) + {\cal O}(\sigma^2)
498: $$
499: hence
500: $$
501: \delta \Theta = -
502: \left [ \sum w_i (\nabla_{\Theta} P_i)
503: (\nabla_{\Theta} P_i)^T \right ]^{-1}
504: \left [ \sum w_i (\nabla_{\bf x} P_i)^T \,
505: (\delta {\bf x}_i)\, (\nabla_{\Theta} P_i)\right ]
506: + {\cal O}(\sigma^2)
507: $$
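
Comparing this expression with (\ref{Texpand}), one can read off the
matrices $\Theta_i$ of the weighted algebraic fit (a short worked step;
the abbreviation $M$ is introduced here only for convenience):
$$
\Theta_i = - w_i\, M^{-1} (\nabla_{\Theta} P_i)(\nabla_{\bf x} P_i)^T,
\qquad
M = \sum_{j=1}^n w_j (\nabla_{\Theta} P_j)(\nabla_{\Theta} P_j)^T,
$$
with all gradients evaluated at the true values $\bar{\bf x}_i$ and
$\bar{\Theta}$. Since $\nabla_{\bf x} P_i$ is parallel to ${\bf n}_i$,
we have $\Theta_i {\bf t}_i = 0$ for any tangent vector ${\bf t}_i$ to the
curve at $\bar{\bf x}_i$, in agreement with the Lemma of
Section~\ref{secKCR}, and
$$
\sum_{i=1}^n \Theta_i\Theta_i^T = M^{-1}
\left[ \sum_{i=1}^n w_i^2 \|\nabla_{\bf x} P_i\|^2
(\nabla_{\Theta} P_i)(\nabla_{\Theta} P_i)^T \right] M^{-1}.
$$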
508: The covariance matrix is then
509: \begin{eqnarray*}
510: {\cal C}_{\hat{\Theta}} & = &
511: E \left [ (\delta \Theta)\, (\delta \Theta)^T \right ]\\
512: & = & \sigma^2
513: \left [ \sum w_i (\nabla_{\Theta} P_i)
514: (\nabla_{\Theta} P_i)^T \right ]^{-1}
515: \left [ \sum w_i^2 \|\nabla_{\bf x} P_i\|^2
516: (\nabla_{\Theta} P_i)
517: (\nabla_{\Theta} P_i)^T \right ]\\
518: & & \times \left [ \sum w_i (\nabla_{\Theta} P_i)
519: (\nabla_{\Theta} P_i)^T \right ]^{-1}
520: + {\cal O}(\sigma^3)
521: \end{eqnarray*}
522: Denote by ${\cal D}_2$ the principal factor here, i.e.\
523: $$
524: {\cal D}_2 =
525: \left [ \sum w_i (\nabla_{\Theta} P_i)
526: (\nabla_{\Theta} P_i)^T \right ]^{-1}
527: \left [ \sum w_i^2 \|\nabla_{\bf x} P_i\|^2
528: (\nabla_{\Theta} P_i)
529: (\nabla_{\Theta} P_i)^T \right ]\,
530: \left [ \sum w_i (\nabla_{\Theta} P_i)
531: (\nabla_{\Theta} P_i)^T \right ]^{-1}
532: $$
533: The following theorem establishes a lower bound for ${\cal D}_2$:
534: \\
535:
536: \noindent{\bf Theorem}. {\em We have ${\cal D}_2\geq{\cal
537: D}_{\min}$, in the sense that ${\cal D}_2 - {\cal D}_{\min}$ is a
538: positive semidefinite matrix, where ${\cal D}_{\min}$ is given by
539: (\ref{Dmin}). The equality ${\cal D}_2 ={\cal D}_{\min}$ holds if
540: and only if $w_i = {\rm const}/\|\nabla_{{\bf x}}\, P_i\|^2$ for
541: all $i=1,\ldots,n$. In other words, an algebraic fit
542: (\ref{Fmain3}) is {\bf statistically efficient} if and only if the
543: weight function $w(x,y;\Theta)$ satisfies
544: \be
545: w(x,y;\Theta) = \frac{c(\Theta)}{\|\nabla_{{\bf x}}\, P(x,y;\Theta)\|^2}
546: \label{wopt}
547: \ee
548: for all triples $x,y,\Theta$ such that $P(x,y;\Theta)=0$. Here
549: $c(\Theta)$ may be an arbitrary function of $\Theta$.}
550: \\
551:
552: The bound ${\cal D}_2\geq{\cal D}_{\min}$ here is a particular case of
the previous theorem. It can also be obtained directly from the linear
554: algebra theorem if one sets $u_i= \nabla_{\bf x} P_i$, $v_i=
555: \nabla_{\Theta} P_i$, and
556: $$
557: A_i = - w_i\, \left [ \sum_{j=1}^n w_j (\nabla_{\Theta} P_j)
558: (\nabla_{\Theta} P_j)^T \right ]^{-1}
559: (\nabla_{\Theta} P_i) \,
560: (\nabla_{\bf x} P_i)^T
561: $$
562: for $1\leq i\leq n$.
563:
The characterization (\ref{wopt}) of statistically efficient weights
follows from the last claim in the linear algebra theorem.
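
The theorem is easy to probe numerically for a curve whose gradient norm
varies along it. The Python/NumPy sketch below uses, purely as a
hypothetical example not treated in this paper, the parabola
$P(x,y;\theta_1,\theta_2,\theta_3)=y-\theta_1-\theta_2x-\theta_3x^2$ with
arbitrarily chosen true parameters and points, and evaluates
${\cal D}_2$ and ${\cal D}_{\min}$ from the gradients at the true points.
With the GRAF weights one gets ${\cal D}_2={\cal D}_{\min}$, while the
simple AF ($w_i=1$) gives a nonzero positive semidefinite difference
${\cal D}_2-{\cal D}_{\min}$, because $\|\nabla_{\bf x}P\|$ is not
constant along this parabola.

\begin{verbatim}
import numpy as np

def D2(grad_theta, grad_x, w):
    """Leading-order factor D_2 of the covariance of the weighted algebraic fit."""
    g2 = np.sum(grad_x ** 2, axis=1)                       # ||grad_x P_i||^2
    M = np.einsum('i,ij,ik->jk', w, grad_theta, grad_theta)
    N = np.einsum('i,ij,ik->jk', w ** 2 * g2, grad_theta, grad_theta)
    Minv = np.linalg.inv(M)
    return Minv @ N @ Minv

def D_min(grad_theta, grad_x):
    """KCR bound (Dmin) computed from the same gradients."""
    g2 = np.sum(grad_x ** 2, axis=1)
    return np.linalg.inv(np.einsum('i,ij,ik->jk', 1.0 / g2, grad_theta, grad_theta))

# Hypothetical example: P(x, y; t1, t2, t3) = y - t1 - t2*x - t3*x^2
t1, t2, t3 = 0.5, -1.0, 2.0
xs = np.linspace(-1.0, 1.0, 15)                                      # true x-coordinates
grad_theta = -np.column_stack([np.ones_like(xs), xs, xs ** 2])       # dP/d(t1, t2, t3)
grad_x = np.column_stack([-(t2 + 2.0 * t3 * xs), np.ones_like(xs)])  # (dP/dx, dP/dy)

Dmin = D_min(grad_theta, grad_x)
D_graf = D2(grad_theta, grad_x, 1.0 / np.sum(grad_x ** 2, axis=1))   # GRAF weights
D_af = D2(grad_theta, grad_x, np.ones_like(xs))                       # simple AF, w_i = 1

print(np.allclose(D_graf, Dmin))                                      # True
print(np.linalg.eigvalsh(D_af - Dmin).min() >= -1e-10,
      np.allclose(D_af, Dmin))                                        # True False
\end{verbatim}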
566:
567: \section{Circle fit}
568: \label{secCF} \setcounter{equation}{0}
569:
Here we illustrate our conclusions with the relatively simple
571: problem of fitting circles. The canonical equation of a circle is
572: \be
573: (x-a)^2+ (y-b)^2 -R^2=0
574: \label{circ0}
575: \ee
576: and we need to estimate three parameters $a,b,R$. The simple
algebraic fit (\ref{Fmain2}) takes the form
578: \be
579: {\cal F}_2(a,b,R) =
580: \sum_{i=1}^n [(x_i-a)^2+ (y_i-b)^2 -R^2]^2
581: \ \ \to\ \ \min
582: \label{F2}
583: \ee
and the weighted algebraic fit (\ref{Fmain3}) takes the form
585: \be
586: {\cal F}_3(a,b,R) =
587: \sum_{i=1}^n w_i [(x_i-a)^2+ (y_i-b)^2 -R^2]^2
588: \ \ \to\ \ \min
589: \label{F3}
590: \ee
591: In particular, the GRAF becomes
592: \be
593: {\cal F}_4(a,b,R) =
594: \sum_{i=1}^n \frac{[(x_i-a)^2+ (y_i-b)^2 -R^2]^2}
595: {(x_i-a)^2+ (y_i-b)^2}
596: \ \ \to\ \ \min
597: \label{F4}
598: \ee
599: (where the irrelevant constant factor of 4 in the denominator is
600: dropped).
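
The simple AF (\ref{F2}) needs no iterative minimization. Expanding
$P=x^2+y^2-2ax-2by+(a^2+b^2-R^2)$ and setting $B=-2a$, $C=-2b$,
$D=a^2+b^2-R^2$ turns (\ref{F2}) into a linear least squares problem for
$(B,C,D)$; the change of variables is one-to-one for $R>0$. A minimal
Python/NumPy sketch under this reformulation follows (the function name
and test data are our own); on precise data it returns the true circle,
as the precision assumption (\ref{Tass}) requires.

\begin{verbatim}
import numpy as np

def algebraic_circle_fit(x, y):
    """Minimize F2 over (a, b, R) via the linear parameters (B, C, D).

    The normal equation for D forces the residuals to sum to zero, so
    a^2 + b^2 - D equals the mean squared distance from the data to the
    fitted center (a, b); hence the square root below is always real.
    """
    z = x ** 2 + y ** 2
    A = np.column_stack([x, y, np.ones_like(x)])
    (B, C, D), *_ = np.linalg.lstsq(A, -z, rcond=None)   # min ||z + Bx + Cy + D||^2
    a, b = -B / 2.0, -C / 2.0
    return np.array([a, b, np.sqrt(a ** 2 + b ** 2 - D)])

# Precise data on a quarter circle: the fit returns the true circle
t = np.linspace(0.0, np.pi / 2, 7)
x, y = 1.0 + 3.0 * np.cos(t), -2.0 + 3.0 * np.sin(t)
print(algebraic_circle_fit(x, y))   # approximately [ 1. -2.  3.]
\end{verbatim}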
601:
602: In terms of (\ref{Dmin}), we have
603: $$
604: \nabla_{\Theta} P(\bar{\bf x}_i;\Theta)
605: = -2(\bar{x}_i-a,\bar{y}_i-b,R)^T
606: $$
607: and $\nabla_{{\bf x}}\, P(\bar{\bf x}_i;\Theta) =
608: 2(\bar{x}_i-a,\bar{y}_i-b)^T$, hence
609: $$
610: \|\nabla_{{\bf x}}\, P(\bar{\bf x}_i;\Theta)\|^2 =
611: 4[(\bar{x}_i-a)^2+(\bar{y}_i-b)^2]=4R^2
612: $$
613: Therefore,
614: \be
615: {\cal D}_{\min} = \left (\begin{array}{ccc}
616: \sum u_i^2 & \sum u_iv_i & \sum u_i \\
617: \sum u_iv_i & \sum v_i^2 & \sum v_i \\
618: \sum u_i & \sum v_i & n \\
619: \end{array} \right )^{-1}
620: \label{Dmincir}
621: \ee
622: where we denote, for brevity,
623: $$
624: u_i=\frac{\bar{x}_i-a}{R},\ \ \ \
625: v_i=\frac{\bar{y}_i-b}{R}
626: $$
627: The above expression for ${\cal D}_{\min}$ was derived earlier in
628: \cite{CT95,Ka98}.
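
Both routes to ${\cal D}_{\min}$ can be cross-checked numerically. The
Python/NumPy sketch below (function names and the test circle are
illustrative assumptions) evaluates (\ref{Dmincir}) and the general
formula (\ref{Dmin}) with the circle gradients given above. For $n\geq 3$
points equally spaced on a full circle the sums of $u_i$, $v_i$ and
$u_iv_i$ vanish and $\sum u_i^2=\sum v_i^2=n/2$, so
${\cal D}_{\min}={\rm diag}(2/n,2/n,1/n)$, independently of $a$, $b$, $R$.

\begin{verbatim}
import numpy as np

def dmin_circle(xbar, ybar, a, b, R):
    """D_min for circles via (Dmincir), with u_i = (xbar_i - a)/R, v_i = (ybar_i - b)/R."""
    u, v = (xbar - a) / R, (ybar - b) / R
    F = np.array([[np.sum(u * u), np.sum(u * v), np.sum(u)],
                  [np.sum(u * v), np.sum(v * v), np.sum(v)],
                  [np.sum(u),     np.sum(v),     float(len(u))]])
    return np.linalg.inv(F)

def dmin_general(grad_theta, grad_x):
    """D_min via (Dmin): inverse of sum_i grad_theta_i grad_theta_i^T / ||grad_x_i||^2."""
    g2 = np.sum(grad_x ** 2, axis=1)
    return np.linalg.inv((grad_theta / g2[:, None]).T @ grad_theta)

a, b, R, n = 1.0, -2.0, 3.0, 20
t = 2.0 * np.pi * np.arange(n) / n                 # equally spaced on the full circle
xbar, ybar = a + R * np.cos(t), b + R * np.sin(t)
grad_theta = -2.0 * np.column_stack([xbar - a, ybar - b, np.full(n, R)])
grad_x = 2.0 * np.column_stack([xbar - a, ybar - b])

D1 = dmin_circle(xbar, ybar, a, b, R)
print(np.allclose(D1, dmin_general(grad_theta, grad_x)))   # True
print(np.round(D1, 6))                                     # approximately diag(0.1, 0.1, 0.05)
\end{verbatim}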
629:
630: Now, our Theorem in Section~\ref{secSE} shows that the weighted
631: algebraic fit (\ref{F3}) is statistically efficient if and only if
632: the weight function satisfies $w(x,y;a,b,R)=c(a,b,R)/(4R^2)$.
Since $c(a,b,R)$ may be an arbitrary function, the
denominator $4R^2$ here is irrelevant. Hence, statistical
efficiency is achieved whenever $w(x,y;a,b,R)$ is simply
independent of $x$ and $y$ for all $(x,y)$ lying on the circle. In
637: particular, the GRAF (\ref{F4}) is statistically efficient because
638: $w(x,y;a,b,R)=[(x-a)^2+(y-b)^2]^{-1}=R^{-2}$. The simple AF
639: (\ref{F2}) is also statistically efficient since $w(x,y;a,b,R)=1$.
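
The efficiency of the simple AF for circles can be checked without any
Monte Carlo runs: by (\ref{Csig}), it suffices to compare
${\cal D}_1=\sum_i\Theta_i\Theta_i^T$ with (\ref{Dmincir}). The
Python/NumPy sketch below (our own illustration; names, circle, and step
size are assumptions) computes $\Theta_i$ by central finite differences
of the simple AF at precise data on a half circle, so that the weights
$w_i=1$ are tested on data without rotational symmetry.

\begin{verbatim}
import numpy as np

def algebraic_circle_fit(x, y):
    """Simple AF (F2) via linear least squares in (B, C, D); returns (a, b, R)."""
    A = np.column_stack([x, y, np.ones_like(x)])
    (B, C, D), *_ = np.linalg.lstsq(A, -(x ** 2 + y ** 2), rcond=None)
    a, b = -B / 2.0, -C / 2.0
    return np.array([a, b, np.sqrt(a ** 2 + b ** 2 - D)])

def D1_by_finite_differences(fit, x, y, h=1e-6):
    """D_1 = sum_i Theta_i Theta_i^T with Theta_i from central differences."""
    D1 = 0.0
    for i in range(len(x)):
        cols = []
        for ex, ey in ((1.0, 0.0), (0.0, 1.0)):
            xp, yp, xm, ym = x.copy(), y.copy(), x.copy(), y.copy()
            xp[i] += h * ex
            yp[i] += h * ey
            xm[i] -= h * ex
            ym[i] -= h * ey
            cols.append((fit(xp, yp) - fit(xm, ym)) / (2.0 * h))
        Theta_i = np.column_stack(cols)          # k x 2 matrix from (Ti)
        D1 = D1 + Theta_i @ Theta_i.T
    return D1

def dmin_circle(xbar, ybar, a, b, R):
    """KCR bound for circles, formula (Dmincir)."""
    u, v = (xbar - a) / R, (ybar - b) / R
    F = np.array([[np.sum(u * u), np.sum(u * v), np.sum(u)],
                  [np.sum(u * v), np.sum(v * v), np.sum(v)],
                  [np.sum(u),     np.sum(v),     float(len(u))]])
    return np.linalg.inv(F)

a, b, R, n = 1.0, -2.0, 3.0, 20
t = np.linspace(0.0, np.pi, n)                    # half circle
xbar, ybar = a + R * np.cos(t), b + R * np.sin(t)
D1 = D1_by_finite_differences(algebraic_circle_fit, xbar, ybar)
print(np.allclose(D1, dmin_circle(xbar, ybar, a, b, R), atol=1e-6))   # True
\end{verbatim}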
640:
We note that the GRAF (\ref{F4}) leads to a highly nonlinear minimization
problem, and in its exact form it is not used in practice. Instead, there are
643: two modifications of GRAF popular among experimenters. One is due to
644: Chernov and Ososkov \cite{CO84} and Pratt \cite{Pr87}:
645: \be
646: {\cal F}_4'(a,b,R) =
647: R^{-2}\sum_{i=1}^n [(x_i-a)^2+ (y_i-b)^2 -R^2]^2
648: \ \ \to\ \ \min
649: \label{F4a}
650: \ee
651: (it is based on the approximation $(x_i-a)^2+ (y_i-b)^2 \approx R^2$),
652: and the other due to Agin \cite{Ag81} and Taubin \cite{Ta91}:
653: \be
654: {\cal F}_4''(a,b,R) =
\frac{1}{\sum_{i=1}^n [(x_i-a)^2+ (y_i-b)^2]}
656: \sum_{i=1}^n [(x_i-a)^2+ (y_i-b)^2 -R^2]^2
657: \ \ \to\ \ \min
658: \label{F4b}
659: \ee
660: (here one simply averages the denominator of (\ref{F4}) over $1\leq
661: i\leq n$). We refer the reader to \cite{CL02} for a detailed analysis
662: of these and other circle fitting algorithms, including their numerical
663: implementations.
664:
We have experimentally tested the efficiency of four circle fitting
algorithms: the OLSF (\ref{Fmain1}), the simple AF (\ref{F2}), the
Pratt method (\ref{F4a}), and the Taubin method (\ref{F4b}). We have
generated $n=20$ points equally spaced on a circle, added isotropic
669: Gaussian noise with variance $\sigma^2$ (according to the Cartesian
670: model), and estimated the efficiency of the estimate of the center by
671: \be
672: E = \frac{\sigma^2 ({\cal D}_{11}+{\cal D}_{22})}
673: {\la (\hat{a}-a)^2 + (\hat{b}-b)^2 \ra}
674: \label{E}
675: \ee
676: Here $(a,b)$ is the true center, $(\hat{a},\hat{b})$ is its estimate,
677: $\la \cdots \ra$ denotes averaging over many random samples, and ${\cal
678: D}_{11}$, ${\cal D}_{22}$ are the first two diagonal entries of the
matrix (\ref{Dmincir}). Table~1 shows the efficiency of the four
algorithms mentioned above for various values of $\sigma/R$. We see that
they all perform very well and are indeed efficient as $\sigma\to 0$.
One might notice that the OLSF slightly outperforms the other methods,
and the AF is the second best.
684:
685: \begin{center}
686: \begin{tabular}{||r||c|c|c|c||}
687: \hline\hline $\sigma/R$ & OLSF & AF & Pratt & Taubin \\
688: \hline \hline $<0.01$ & $\sim 1$ & $\sim 1$ & $\sim 1$ & $\sim 1$ \\
689: \hline 0.01 & 0.999 & 0.999 & 0.999 & 0.999 \\ \hline
690: 0.02 & 0.999 & 0.998 & 0.997 & 0.997 \\ \hline
691: 0.03 & 0.998 & 0.996 & 0.995 & 0.995 \\ \hline
692: 0.05 & 0.996 & 0.992 & 0.987 & 0.987 \\ \hline
693: 0.10 & 0.985 & 0.970 & 0.953 & 0.953 \\ \hline
694: 0.20 & 0.935 & 0.900 & 0.837 & 0.835 \\ \hline
695: 0.30 & 0.825 & 0.824 & 0.701 & 0.692 \\ \hline
696: \hline
697: \end{tabular}\vspace*{0.2cm}
698: \end{center}
699:
700: \begin{center}
701: Table 1. Efficiency of circle fitting algorithms. Data are sampled
702: along a full circle.
703: \end{center}
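
The experiment behind Table~1 can be reproduced in outline. The
Python/NumPy sketch below estimates (\ref{E}) by Monte Carlo for the
simple AF (\ref{F2}) and for the OLSF (\ref{Fmain1}); the OLSF is
computed here by plain Gauss-Newton iterations started at the simple AF,
which is our own choice and has no safeguards (a Levenberg-Marquardt
variant would be more robust at large noise). The Pratt and Taubin fits
are not included, ${\cal D}_{11}+{\cal D}_{22}=4/n$ follows from
(\ref{Dmincir}) for equally spaced points, and the number of trials and
the random seed are arbitrary, so the printed values can agree with
Table~1 only up to Monte Carlo error.

\begin{verbatim}
import numpy as np

def algebraic_circle_fit(x, y):
    """Simple AF (F2) via linear least squares in (B, C, D); returns (a, b, R)."""
    A = np.column_stack([x, y, np.ones_like(x)])
    (B, C, D), *_ = np.linalg.lstsq(A, -(x ** 2 + y ** 2), rcond=None)
    a, b = -B / 2.0, -C / 2.0
    return np.array([a, b, np.sqrt(a ** 2 + b ** 2 - D)])

def geometric_circle_fit(x, y, iters=50, tol=1e-12):
    """OLSF (Fmain1) for circles by plain Gauss-Newton, started at the simple AF."""
    a, b, R = algebraic_circle_fit(x, y)
    for _ in range(iters):
        dx, dy = x - a, y - b
        r = np.hypot(dx, dy)
        d = r - R                                                 # signed distances d_i
        J = np.column_stack([-dx / r, -dy / r, -np.ones_like(r)])  # d(d_i)/d(a, b, R)
        step, *_ = np.linalg.lstsq(J, -d, rcond=None)
        a, b, R = a + step[0], b + step[1], R + step[2]
        if np.linalg.norm(step) < tol * (1.0 + abs(R)):
            break
    return np.array([a, b, R])

def center_efficiency(fit, sigma, n=20, trials=2000, seed=1):
    """Monte Carlo estimate of (E) for n equally spaced points on the unit circle."""
    rng = np.random.default_rng(seed)
    t = 2.0 * np.pi * np.arange(n) / n
    xbar, ybar = np.cos(t), np.sin(t)         # true center (0, 0), R = 1
    d11_plus_d22 = 4.0 / n                    # from (Dmincir) for this design
    sq_err = np.empty(trials)
    for j in range(trials):
        x = xbar + sigma * rng.standard_normal(n)   # Cartesian model
        y = ybar + sigma * rng.standard_normal(n)
        a_hat, b_hat, _ = fit(x, y)
        sq_err[j] = a_hat ** 2 + b_hat ** 2
    return sigma ** 2 * d11_plus_d22 / sq_err.mean()

for sigma in (0.01, 0.05):
    print(sigma, center_efficiency(algebraic_circle_fit, sigma),
          center_efficiency(geometric_circle_fit, sigma))
\end{verbatim}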
704:
705: Table~2 shows the efficiency of the same algorithms as the data points
706: are sampled along half a circle, rather than a full circle. Again, the
707: efficiency as $\sigma\to 0$ is clear, but we also make another
708: observation. The AF now consistently falls behind the other methods for
709: all $\sigma/R\leq 0.2$, but for $\sigma/R=0.3$ the others suddenly
break down, while the AF stays afloat.
711:
712: \begin{center}
713: \begin{tabular}{||r||c|c|c|c||}
714: \hline\hline $\sigma/R$ & OLSF & AF & Pratt & Taubin \\
715: \hline \hline $<0.01$ & $\sim 1$ & $\sim 1$ & $\sim 1$ & $\sim 1$ \\
716: \hline 0.01 & 0.999 & 0.996 & 0.999 & 0.999 \\ \hline
717: 0.02 & 0.997 & 0.983 & 0.997 & 0.997 \\ \hline
718: 0.03 & 0.994 & 0.961 & 0.992 & 0.992 \\ \hline
719: 0.05 & 0.984 & 0.902 & 0.978 & 0.978 \\ \hline
720: 0.10 & 0.935 & 0.720 & 0.916 & 0.916 \\ \hline
721: 0.20 & 0.720 & 0.493 & 0.703 & 0.691 \\ \hline
722: 0.30 & 0.122 & 0.437 & 0.186 & 0.141 \\ \hline
723: \hline
724: \end{tabular}\vspace*{0.2cm}
725: \end{center}
726:
727: \begin{center}
728: Table 2. Efficiency of circle fitting algorithms with data sampled
729: along half a circle.
730: \end{center}
731:
The reason for the above turnaround is that, at large noise, the data
points may occasionally line up along a circular arc of a very large
radius. Then the OLSF, Pratt and Taubin methods dutifully return a large
circle whose center lies far away, and such fits blow up the denominator of
(\ref{E}), a typical effect of large outliers. In contrast, the AF
is well known for its systematic bias toward smaller circles
\cite{CO84,GGS94,Pr87}; hence, while it is less accurate than the other
fits for typical random samples, its bias safeguards it from large outliers.
740:
This behavior is even more pronounced when the data are sampled along
a quarter\footnote{All our algorithms are invariant under simple
geometric transformations such as translations, rotations and
similarities; hence our experimental results do not depend on the
choice of the circle, its size, and the part of the circle the data are
sampled from.} of a circle (Table~3). We see that the AF is now far
worse than the other fits for $\sigma/R<0.1$, but the others
characteristically break down at some point (here at $\sigma/R=0.1$).
749:
750: \begin{center}
751: \begin{tabular}{||r||c|c|c|c||}
752: \hline\hline $\sigma/R$ & OLSF & AF & Pratt & Taubin \\
753: \hline \hline 0.01 & 0.997 & 0.911 & 0.997 & 0.997 \\ \hline
754: 0.02 & 0.977 & 0.722 & 0.978 & 0.978 \\ \hline
755: 0.03 & 0.944 & 0.555 & 0.946 & 0.946 \\ \hline
756: 0.05 & 0.837 & 0.365 & 0.843 & 0.842 \\ \hline
757: 0.10 & 0.155 & 0.275 & 0.163 & 0.158 \\ \hline
758: \hline
759: \end{tabular}\vspace*{0.2cm}
760: \end{center}
761:
762: \begin{center}
763: Table 3. Data are sampled along a quarter of a circle.
764: \end{center}
765:
It is interesting to test smaller circular arcs, too. Figure~1
shows a color-coded diagram of the efficiency of the OLSF and the
AF for arcs from $0^{\rm o}$ to $50^{\rm o}$ and variable $\sigma$
(we set $\sigma=ch$, where $h$ is the height of the circular arc,
see Fig.~2, and $c$ varies from 0 to 0.5). The efficiency of the
Pratt and Taubin methods is virtually identical to that of the OLSF, so
they are not shown here. We see that the OLSF and AF are efficient as
$\sigma\to 0$ (both squares in the diagram turn white at the
bottom), but the AF loses its efficiency at moderate levels of
noise ($c>0.1$), while the OLSF remains accurate up to $c=0.3$,
after which it breaks down rather sharply.
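To make the noise scaling $\sigma=ch$ concrete, here is a minimal Python/NumPy sketch of how such test data might be generated. It assumes that the height of an arc of angle $\varphi$ on a circle of radius $R$ is the sagitta $h=R\,(1-\cos(\varphi/2))$, that the true points are equally spaced along the arc, and that the noise is isotropic Gaussian; these choices and the function names \texttt{arc\_height} and \texttt{sample\_arc} are hypothetical and need not match the sampling scheme used in the experiments above.

\begin{verbatim}
```python
import numpy as np


def arc_height(R, phi):
    """Height (sagitta) of an arc of angle phi (radians) on a circle of radius R."""
    return R * (1.0 - np.cos(phi / 2.0))


def sample_arc(n, R, phi_deg, c, rng):
    """Return n noisy points near an arc of angle phi_deg (degrees) of the
    circle x^2 + y^2 = R^2, with noise level sigma = c * h.

    Assumptions of this sketch: the arc is symmetric about the positive
    y-axis, the true points are equally spaced in angle, and each
    coordinate gets independent N(0, sigma^2) noise.
    """
    phi = np.deg2rad(phi_deg)
    t = np.pi / 2.0 + np.linspace(-phi / 2.0, phi / 2.0, n)
    sigma = c * arc_height(R, phi)
    x = R * np.cos(t) + rng.normal(0.0, sigma, size=n)
    y = R * np.sin(t) + rng.normal(0.0, sigma, size=n)
    return x, y, sigma
```
\end{verbatim}

For example, \texttt{sample\_arc(20, 1.0, 50.0, 0.3, np.random.default\_rng(0))} draws 20 points near a $50^{\rm o}$ arc with $c=0.3$. Expressing $\sigma$ through $h$ rather than $R$ keeps the relative difficulty comparable across arcs of different angles.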
777:
778: \vspace*{10mm} \centerline{\epsffile{PrattD.eps}$\ \ \ \ $
779: \epsffile{AlgD.eps} $\ \ \ \ $ \epsffile{Bar.eps}}
780:
781: \begin{center}
782: Figure 1: The efficiency of the simple OLSF (left) and the AF (center).
783: The bar on the right explains color codes.
784: \end{center} \vspace*{5mm}
785:
The following analysis sheds more light on the behavior of the
circle fitting algorithms. When the curvature of the arc
decreases, the center coordinates $a,b$ and the radius $R$ grow to
infinity and their estimates become highly unreliable. In that
case it is more convenient to convert the circle equation (\ref{circ0})
to the algebraic form
\be
A(x^2+y^2) + Bx + Cy + D = 0
\label{ABCD}
\ee
with the additional constraint $B^2+C^2-4AD = 1$ on the parameters. This
parametrization was used in \cite{Pr87,GGS94} and analyzed in detail
in \cite{CL02}. We note that the original parameters can be recovered
via $a=-B/2A$, $b=-C/2A$, and $R=(2\,|A|)^{-1}$. The new
parametrization (\ref{ABCD}) is safe to use for arcs with arbitrarily
small curvature: the parameters $A,B,C,D$ remain bounded and never
develop singularities, see \cite{CL02}. Even when the curvature vanishes,
we simply get $A=0$, and equation (\ref{ABCD}) represents the line
$Bx+Cy+D=0$.
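The conversion between the two parametrizations is easy to code. The following minimal Python sketch (function names hypothetical) maps $(a,b,R)$ to $(A,B,C,D)$ via $A=1/(2R)$, $B=-2aA$, $C=-2bA$, $D=(a^2+b^2-R^2)A$, which satisfies $B^2+C^2-4AD=4A^2R^2=1$, and maps back via $a=-B/2A$, $b=-C/2A$, $R=(2|A|)^{-1}$. The constraint does not fix the overall sign of $(A,B,C,D)$; choosing $A>0$ is merely the convention of this sketch.

\begin{verbatim}
```python
def abcd_from_center_radius(a, b, R):
    """Map (x-a)^2 + (y-b)^2 = R^2 to (A, B, C, D) with
    A(x^2+y^2) + Bx + Cy + D = 0 and B^2 + C^2 - 4AD = 1.
    The sign convention A > 0 is an arbitrary choice."""
    A = 1.0 / (2.0 * R)
    B = -2.0 * a * A
    C = -2.0 * b * A
    D = (a * a + b * b - R * R) * A
    return A, B, C, D


def center_radius_from_abcd(A, B, C, D):
    """Recover (a, b, R) via a = -B/2A, b = -C/2A, R = 1/(2|A|).
    Returns None when A == 0, i.e. when the equation is the line
    Bx + Cy + D = 0 rather than a circle."""
    if A == 0.0:
        return None
    return -B / (2.0 * A), -C / (2.0 * A), 1.0 / (2.0 * abs(A))


def constraint(A, B, C, D):
    """Value of B^2 + C^2 - 4AD, which should equal 1."""
    return B * B + C * C - 4.0 * A * D
```
\end{verbatim}

For circles through the origin with center $(0,R)$ the sketch gives $(A,B,C,D)=(1/(2R),0,-1,0)$, which tends to $(0,0,-1,0)$, the line $y=0$, as $R\to\infty$: the parameters stay bounded while $a,b,R$ do not.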
805:
806: \vspace*{5mm} \centerline{\epsffile{cl2-02.eps}}
807:
808: \begin{center}
809: Figure 2: The height of an arc, $h$, and our formula for $\sigma$.
810: \end{center} \vspace*{5mm}
811:
In terms of the new parameters $A,B,C,D$, the weighted algebraic fit
(\ref{Fmain3}) takes the form
\be
{\cal F}_3(A,B,C,D) =
\sum_{i=1}^n w_i [A(x_i^2+y_i^2) + Bx_i + Cy_i + D]^2
\ \ \to\ \ \min
\label{FF3}
\ee
(under the constraint $B^2+C^2-4AD = 1$). Converting the AF (\ref{F2})
to the new parameters gives
\be
{\cal F}_2(A,B,C,D) =
\sum_{i=1}^n A^{-2} [A(x_i^2+y_i^2) + Bx_i + Cy_i + D]^2
\ \ \to\ \ \min
\label{FF2}
\ee
which corresponds to the weight function $w=1/A^2$. The Pratt method
(\ref{F4a}) becomes
\be
{\cal F}_4(A,B,C,D) =
\sum_{i=1}^n [A(x_i^2+y_i^2) + Bx_i + Cy_i + D]^2
\ \ \to\ \ \min
\label{FF4}
\ee
(again under the constraint $B^2+C^2-4AD = 1$).
We now see why the AF is unstable and inaccurate for arcs with
small curvature: its weight function $w=1/A^2$ becomes singular
(blows up) in the limit $A\to 0$. Recall that, in
our derivation of the statistical efficiency theorem (Section~3),
we assumed that the weight function was regular (had bounded
derivatives). This assumption is clearly violated by the AF
(\ref{FF2}). In contrast, the Pratt fit (\ref{FF4}) uses the
safe choice $w=1$ and thus behaves well on arcs with small
curvature, as we show next.
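Both fits are short to implement in this parametrization. The NumPy sketch below is an illustration under our own assumptions, not the reference implementation of \cite{Pr87}. It solves (\ref{FF4}) as the generalized eigenproblem $M\theta=\eta N\theta$, where $\theta=(A,B,C,D)^T$, $M=\sum_i \zeta_i\zeta_i^T$ with $\zeta_i=(x_i^2+y_i^2,x_i,y_i,1)^T$, and $\theta^TN\theta=B^2+C^2-4AD$, keeping the admissible eigenvector of smallest cost. For the AF, note that $A^{-2}[A(x_i^2+y_i^2)+Bx_i+Cy_i+D]^2=[x_i^2+y_i^2+px_i+qy_i+s]^2$ with $p=B/A$, $q=C/A$, $s=D/A$, so (\ref{FF2}) reduces to linear least squares in $(p,q,s)$; these are exactly the coefficients that blow up as $A\to 0$. All function names are hypothetical.

\begin{verbatim}
```python
import numpy as np

# theta^T N theta = B^2 + C^2 - 4AD for theta = (A, B, C, D)
N = np.array([[0.0, 0.0, 0.0, -2.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [-2.0, 0.0, 0.0, 0.0]])


def _design(x, y):
    """Rows (x_i^2 + y_i^2, x_i, y_i, 1); a row dotted with (A, B, C, D)
    is the algebraic residual A(x^2+y^2) + Bx + Cy + D."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.column_stack([x * x + y * y, x, y, np.ones_like(x)])


def pratt_fit(x, y):
    """Minimize sum_i [A(x_i^2+y_i^2) + Bx_i + Cy_i + D]^2 (w_i = 1)
    subject to B^2 + C^2 - 4AD = 1. Returns (A, B, C, D)."""
    Z = _design(x, y)
    M = Z.T @ Z
    _, vecs = np.linalg.eig(np.linalg.solve(N, M))
    best, best_cost = None, np.inf
    for theta in np.real(vecs).T:
        q = theta @ N @ theta
        if q <= 0:  # cannot be rescaled to satisfy the constraint
            continue
        theta = theta / np.sqrt(q)
        cost = theta @ M @ theta
        if cost < best_cost:
            best, best_cost = theta, cost
    return tuple(best)


def algebraic_fit(x, y):
    """AF (FF2): minimize sum_i [z_i + p x_i + q y_i + s]^2 by linear
    least squares, p = B/A, q = C/A, s = D/A. Returns (a, b, R)."""
    Z = _design(x, y)
    coef, *_ = np.linalg.lstsq(Z[:, 1:], -Z[:, 0], rcond=None)
    p, q, s = coef
    a, b = -p / 2.0, -q / 2.0
    # from p^2 + q^2 - 4s = (B^2 + C^2 - 4AD)/A^2 = 4R^2
    return a, b, np.sqrt(a * a + b * b - s)


def center_radius(A, B, C, D):
    """a = -B/2A, b = -C/2A, R = 1/(2|A|)."""
    return -B / (2.0 * A), -C / (2.0 * A), 1.0 / (2.0 * abs(A))
```
\end{verbatim}

On exactly collinear points \texttt{pratt\_fit} returns $A\approx 0$, i.e., the line itself, whereas the least squares system in \texttt{algebraic\_fit} becomes rank deficient, because its regressors $x_i$, $y_i$, $1$ are then linearly dependent.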
845:
846: \vspace*{10mm} \centerline{\epsffile{AlgA.eps}$\ \ \ \ $
847: \epsffile{PrattA.eps} $\ \ \ \ $ \epsffile{Bar.eps}}
848:
849: \begin{center}
850: Figure 3: The efficiency of the simple AF (left) and the Pratt
851: method (center). The bar on the right explains color codes.
852: \end{center} \vspace*{5mm}
853:
Figure~3 shows a color-coded diagram of the efficiency of the estimate
of the parameter\footnote{Note that $|A|=1/(2R)$, hence the estimation of
$A$ is equivalent to that of the curvature, an important geometric
parameter of the arc.} $A$ by the AF (\ref{FF2}) versus Pratt
(\ref{FF4}) for arcs from $0^{\rm o}$ to $50^{\rm o}$ and the noise
level $\sigma=ch$, where $h$ is the height of the circular arc and $c$
varies from 0 to 0.5. The efficiency of the OLSF and the Taubin method
is visually indistinguishable from that of Pratt (the central square in
Fig.~3), so we do not show them here.
863:
We see that the AF performs significantly worse than the Pratt
method for all arcs and most values of $c$ (i.e., of
$\sigma$). Pratt's efficiency is close to 100\%; its lowest value
is 89\%, for $50^{\rm o}$ arcs and $c=0.5$ (the top right corner of
the central square barely turns grey). The AF's efficiency is below
10\% for all $c>0.2$ and almost zero for $c>0.4$. Still, the AF
remains efficient as $\sigma\to 0$ (as the thin white strip at the
bottom of the left square shows), but its efficiency can only be
counted on when $\sigma$ is extremely small.
873:
Our analysis demonstrates that the weights $w_i$ in the
weighted algebraic fit (\ref{Fmain3}) should be chosen according to our
theorem in Section~3 and, in addition, the weight function should have
no singularities in the domain of parameters.
878:
879:
880: \renewcommand{\theequation}{A.\arabic{equation}}
881:
882: \section*{Appendix}
883: \label{secA} \setcounter{equation}{0}
884:
885: Here we prove the theorem of linear algebra stated in
886: Section~\ref{secKCR}. For the sake of clarity, we divide our proof into
887: small lemmas:
888: \medskip
889:
890: \noindent{\bf Lemma 1}. {\em The matrix $B$ is indeed nonsingular}.
891:
{\em Proof}. If $Bz=0$ for some nonzero vector $z\in \IR^k$, then
$0 = z^TBz = \sum_{i=1}^n (v_i^Tz)^2/\|u_i\|^2$, hence $v_i^Tz=0$
for all $1\leq i\leq n$, which contradicts the assumption that the
vectors $v_1,\ldots,v_n$ span $\IR^k$.
895: \medskip
896:
897: \noindent{\bf Lemma 2}. {\em If a set of $n$ matrices $A_1,\ldots,A_n$
898: is proper, then rank$(A_i)\leq 1$. Furthermore, each $A_i$ is given by
899: $A_i = z_iu_i^T$ for some vector $z_i\in \IR^k$, and the vectors
900: $z_1,\ldots,z_n$ satisfy $\sum_{i=1}^n z_iv_i^T = -I$ where $I$ is the
901: $k\times k$ identity matrix. The converse is also true.}
902:
{\em Proof}. Let vectors $w_1,\ldots,w_n$ and $r$ satisfy the
requirements (\ref{properA1}) and (\ref{properA2}) of the theorem.
Consider the orthogonal decomposition $w_i = c_iu_i + w_i^\perp$, where
$w_i^\perp$ is perpendicular to $u_i$, i.e.\ $u_i^Tw_i^\perp = 0$. Then
the constraint (\ref{properA2}) can be rewritten as
\be
c_i = -\frac{v_i^Tr}{u_i^Tu_i}
\label{properA3}
\ee
for all $i=1,\ldots,n$, and (\ref{properA1}) takes the form
\be
\sum_{i=1}^n c_iA_iu_i + \sum_{i=1}^n A_iw_i^\perp = r
\label{properA4}
\ee
The components $w_i^\perp$ are not restricted by (\ref{properA3}), so
they can be chosen arbitrarily and independently. Taking $r=0$ (hence
$c_i=0$ for all $i$) and all but one of the $w_j^\perp$ equal to zero,
we conclude that $A_iw_i^\perp = 0$ for every vector $w_i^\perp$
orthogonal to $u_i$. Thus the kernel of $A_i$ contains the hyperplane
orthogonal to $u_i$, so its rank is zero or one. If we denote
$z_i = A_iu_i/ \|u_i\|^2$, we obtain $A_i=z_iu_i^T$. Combining this with
(\ref{properA3})--(\ref{properA4}) gives
$$
r = - \sum_{i=1}^n (v_i^Tr)z_i =
- \left (\sum_{i=1}^n z_iv_i^T\right )\, r
$$
Since this identity holds for any vector $r\in \IR^k$, the expression
within parentheses is $-I$. The converse is obtained by straightforward
calculations. The lemma is proved. \medskip
929:
\noindent{\bf Corollary}. {\em Let ${\bf n}_i = u_i/\|u_i\|$. Then
$A_i{\bf n}_i{\bf n}_i^TA_i^T = A_iA_i^T$ for each $i$}.
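{\em Proof}. A direct check, using the representation $A_i=z_iu_i^T$
from Lemma~2:
$$
A_i{\bf n}_i = z_i\,\frac{u_i^Tu_i}{\|u_i\|} = \|u_i\|\,z_i ,
\qquad\mbox{hence}\qquad
A_i{\bf n}_i{\bf n}_i^TA_i^T = (A_i{\bf n}_i)(A_i{\bf n}_i)^T
= \|u_i\|^2 z_iz_i^T = z_i(u_i^Tu_i)z_i^T = A_iA_i^T .
$$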
932: \medskip
933:
934: This corollary implies our lemma stated in Section~\ref{secKCR}. We now
935: continue the proof of the theorem.\medskip
936:
\noindent{\bf Lemma 3}. {\em The sets of proper matrices form a linear
variety, in the following sense. Let $A_1',\ldots,A_n'$ and
$A_1'',\ldots,A_n''$ be two proper sets of matrices; then the set
$A_1,\ldots,A_n$ defined by $A_i = A_i' + c(A_i''- A_i')$ is proper for
every $c\in\IR$.}
942:
943: {\em Proof}. According to the previous lemma, $A_i'=z_i'u_i^T$ and
944: $A_i''=z_i''u_i^T$ for some vectors $z_i',z_i''$, $1\leq i\leq n$.
945: Therefore, $A_i=z_iu_i^T$ for $z_i= z_i' + c(z_i''- z_i')$. Lastly,
946: $$
947: \sum_{i=1}^n z_iv_i^T = \sum_{i=1}^n z_i'v_i^T
948: +c\sum_{i=1}^n z_i''v_i^T - c\sum_{i=1}^n z_i'v_i^T = - I
949: $$
The lemma is proved.
951: \medskip
952:
953: \noindent{\bf Lemma 4}. {\em If a set of $n$ matrices $A_1,\ldots,A_n$
954: is proper, then $\sum_{i=1}^n A_iX_i^T = -I$, where $I$ is the $k\times
955: k$ identity matrix.}
956:
957:
{\em Proof}. By Lemma~2, $\sum_{i=1}^n A_iX_i^T =
\sum_{i=1}^n z_iv_i^T = -I$. The lemma is proved.
960: \medskip
961:
962: \noindent{\bf Lemma 5}. {\em We have indeed $D \geq B^{-1}$.}
963:
964: {\em Proof}. For each $i=1,\ldots,n$ consider the $2k\times m$
965: matrix $Y_i = \left (\begin{array}{c} A_i\\X_i\end{array} \right
966: )$. Using the previous lemma gives
967: $$
968: \sum_{i=1}^n Y_i\,Y_i^T =
969: \left (\begin{array}{rr} D & -I\\ -I & B
970: \end{array} \right )
971: $$
Being a sum of matrices of the form $Y_iY_i^T$, this matrix is positive
semidefinite. Since $B$ (and hence $B^{-1}$) is symmetric, the left
factor below is the transpose of the right one, so the following
congruent matrix is also positive semidefinite:
$$
\left (\begin{array}{rr} I & B^{-1} \\ 0 & B^{-1}
\end{array} \right )
\left (\begin{array}{rr} D & -I\\ -I & B
\end{array} \right )
\left (\begin{array}{cc} I & 0 \\ B^{-1} & B^{-1}
\end{array} \right ) =
\left (\begin{array}{cc} D-B^{-1} & 0\\ 0 & B^{-1}
\end{array} \right )
$$
Since a diagonal block of a positive semidefinite matrix is itself
positive semidefinite, the matrix $D-B^{-1}$ is positive semidefinite,
i.e., $D\geq B^{-1}$.
985: \medskip
986:
987: \noindent{\bf Lemma 6}. {\em The set of matrices $A_i^{\rm o} = -
988: B^{-1} X_i$ is proper, and for this set we have $D=B^{-1}$.}
989:
990: {\em Proof}. Straightforward calculation.
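For instance, the second claim can be checked directly. Comparing the
lower right blocks in the proof of Lemma~5 gives
$\sum_{i=1}^n X_iX_i^T=B$, and $B^{-1}$ is symmetric, so
$$
\sum_{i=1}^n A_i^{\rm o}(A_i^{\rm o})^T
= B^{-1}\Big(\sum_{i=1}^n X_iX_i^T\Big)B^{-1}
= B^{-1}BB^{-1} = B^{-1} .
$$
Similarly, $\sum_{i=1}^n A_i^{\rm o}X_i^T = -B^{-1}\sum_{i=1}^n X_iX_i^T
= -I$, in agreement with Lemma~4.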
991: \medskip
992:
993: \noindent{\bf Lemma 7}. {\em If $D=B^{-1}$ for some proper set of
994: matrices $A_1,\ldots,A_n$, then $A_i=A_i^{\rm o}$ for all $1\leq i\leq
995: n$}.
996:
997:
998: {\em Proof}. Assume that there is a proper set of matrices
999: $A_1',\ldots,A_n'$, different from $A_1^{\rm o},\ldots,A_n^{\rm
1000: o}$, for which $D=B^{-1}$. Denote $\delta A_i = A_i'-A_i^{\rm o}$.
1001: By Lemma 3, the set of matrices $A_i(\gamma) = A_i^{\rm o} +
1002: \gamma (\delta A_i)$ is proper for every real $\gamma$. Consider
1003: the variable matrix
1004: \begin{eqnarray*}
1005: D(\gamma) & = & \sum_{i=1}^n [A_i(\gamma)] [A_i(\gamma)]^T\\
1006: & = & \sum_{i=1}^n A_i^{\rm o}(A_i^{\rm o})^T
1007: + \gamma\left (\sum_{i=1}^n A_i^{\rm o}(\delta A_i)^T
1008: +\sum_{i=1}^n (\delta A_i)(A_i^{\rm o})^T\right )
1009: +\gamma^2\sum_{i=1}^n (\delta A_i)(\delta A_i)^T
1010: \end{eqnarray*}
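Note that the matrix $R = \sum_{i=1}^n A_i^{\rm o}(\delta A_i)^T
+\sum_{i=1}^n (\delta A_i)(A_i^{\rm o})^T$ is symmetric and the matrix
$S =\sum_{i=1}^n (\delta A_i)(\delta A_i)^T$ is symmetric positive
semidefinite, so that $D(\gamma) = D(0) + \gamma R + \gamma^2 S$. By
Lemma~5 we have $D(\gamma)\geq B^{-1}$ for all $\gamma$, and by
Lemma~6 we have $D(0)=B^{-1}$; hence $\gamma R + \gamma^2 S\geq 0$ for
all real $\gamma$. Dividing by $\gamma>0$ and letting $\gamma\to 0^+$
gives $R\geq 0$; dividing by $\gamma<0$ and letting $\gamma\to 0^-$
gives $R\leq 0$; thus $R=0$. Since we assumed that
$D(1)=D(0)=B^{-1}$, we also get $S = D(1)-D(0) = 0$, and in particular
${\rm tr}\,S = \sum_{i=1}^n {\rm tr}\,[(\delta A_i)(\delta A_i)^T] = 0$.
As ${\rm tr}\,[(\delta A_i)(\delta A_i)^T]$ is the sum of the squares of
the entries of $\delta A_i$, we conclude that $\delta A_i = 0$ for every
$i=1,\ldots,n$. The theorem is proved.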
1020:
1021: \begin{thebibliography}{99}
1022:
1023: \bibitem{Ag81} G.J. Agin,
1024: {\em Fitting Ellipses and General Second-Order Curves},
Carnegie Mellon University, Robotics Institute, Technical Report 81-5, 1981.
1026:
1027: \bibitem{ARW01} S.J. Ahn, W. Rauh, and H.J. Warnecke,
1028: {\rm Least-squares orthogonal distances fitting of circle,
1029: sphere, ellipse, hyperbola, and parabola},
1030: {\em Pattern Recog.}, {\bf 34}, 2001, 2283--2303.
1031:
1032: \bibitem{An81} D. A. Anderson,
1033: The circular structural model,
{\em J. R. Statist. Soc. B}, {\bf 43}, 1981, 131--141.
1035:
1036: \bibitem{BC86} M. Berman and D. Culpin,
1037: The statistical behaviour of some least squares estimators of the centre and radius of a
1038: circle, {\em J. R. Statist. Soc. B}, {\bf 48}, 1986, 183--196.
1039:
1040: \bibitem{Be89} M. Berman,
1041: Large sample bias in least squares estimators of a circular arc center and its
1042: radius,
1043: {\em Computer Vision, Graphics and Image Processing}, {\bf 45}, 1989, 126--128.
1044:
1045: \bibitem{Ch65} N. N. Chan,
1046: On circular functional relationships,
1047: {\em J. R. Statist. Soc. B}, {\bf 27}, 1965, 45--56.
1048:
1049: \bibitem{CT95} Y. T. Chan and S. M. Thomas,
1050: {\rm Cramer-Rao Lower Bounds for Estimation of a Circular Arc Center and Its
Radius}, {\em Graph. Models Image Proc.}, {\bf 57}, 1995, 527--532.
1052:
1053: \bibitem{CO84} N. I. Chernov and G. A. Ososkov,
1054: Effective algorithms for circle fitting,
{\em Comp. Phys. Comm.}, {\bf 33}, 1984, 329--333.
1056:
1057: \bibitem{CL02} N. Chernov and C. Lesort,
1058: {\rm Fitting circles and lines by least squares: theory and experiment},
1059: preprint, available at http://www.math.uab.edu/cl/cl1
1060:
1061: \bibitem{CBH01} W. Chojnacki, M.J. Brooks, and A. van den Hengel,
1062: {\rm Rationalising the renormalisation method of Kanatani},
1063: {\em J. Math. Imaging \& Vision}, {\bf 14}, 2001, 21--38.
1064:
1065: \bibitem{GGS94} W. Gander, G.H. Golub, and R. Strebel,
1066: {\rm Least squares fitting of circles and ellipses},
{\em BIT}, {\bf 34}, 1994, 558--578.
1068:
1069: \bibitem{Hu97} {\em Recent advances in total least squares techniques
1070: and errors-in-variables modeling}, Ed. by S. van Huffel, SIAM,
1071: Philadelphia, 1997.
1072:
1073: \bibitem{Ka96} K. Kanatani,
1074: {\em Statistical Optimization for Geometric Computation: Theory and Practice},
1075: Elsevier Science, Amsterdam, 1996.
1076:
1077: \bibitem{Ka98} K. Kanatani,
1078: {\rm Cramer-Rao lower bounds for curve fitting},
{\em Graph. Models Image Proc.}, {\bf 60}, 1998, 93--99.
1080:
1081: \bibitem{La87} U.M. Landau,
1082: {\rm Estimation of a circular arc center and its radius},
{\em Computer Vision, Graphics and Image Processing}, {\bf 38}, 1987,
317--326.
1085:
1086: \bibitem{LM00} Y. Leedan and P. Meer,
1087: {\rm Heteroscedastic regression in computer vision: Problems with bilinear
1088: constraint},
1089: {\em Intern. J. Comp. Vision}, {\bf 37}, 2000, 127--150.
1090:
1091: \bibitem{Pr87} V. Pratt,
1092: {\rm Direct least-squares fitting of algebraic surfaces},
{\em Computer Graphics}, {\bf 21}, 1987, 145--152.
1094:
\bibitem{Sp96} H. Sp\"ath,
1096: {\rm Least-Squares Fitting By Circles},
1097: {\em Computing}, {\bf 57}, 1996, 179--185.
1098:
\bibitem{Sp97} H. Sp\"ath,
1100: {\rm Orthogonal least squares fitting by conic sections},
1101: in {\em Recent Advances in Total Least Squares techniques and
1102: Errors-in-Variables Modeling}, SIAM, 1997, pp. 259--264.
1103:
1104: \bibitem{Ta91} G. Taubin,
1105: {\rm Estimation Of Planar Curves, Surfaces And Nonplanar
1106: Space Curves Defined By Implicit Equations,
1107: With Applications To Edge And Range Image Segmentation},
1108: {\em IEEE Transactions on Pattern Analysis and Machine
1109: Intelligence}, {\bf 13}, 1991, 1115--1138.
1110:
1111: \bibitem{Tu74} K. Turner, {\em Computer perception of curved
1112: objects using a television camera}, Ph.D.\ Thesis, Dept.\ of Machine
1113: Intelligence, University of Edinburgh, 1974.
1114:
1115:
1116: \end{thebibliography}
1117:
1118: \end{document}
1211:
{\em Final remark}. In the theorem, we assumed that the vectors
$v_1,\ldots,v_n$ spanned $\IR^k$. If they do not, then the matrix $B$
is singular and, furthermore, no proper sets of matrices
$A_1,\ldots,A_n$ exist in the above sense. However, the theorem
can be modified as follows: first, in the definition of proper sets of
matrices we must require that $r\in\, $span$\, \{v_1,\ldots,v_n\}$
rather than $r\in\IR^k$, and second, the matrix $B^{-1}$ must be
replaced by its generalized (Moore-Penrose) inverse $B^-$. The proof of
the theorem in this case requires only minor changes, which we omit.
1221:
{\em Remark}. Consider the following popular iterative algorithm:
using the $k$-th approximation $\Theta^{(k)}$, one computes the
weights $w_i = w(x_i,y_i;\Theta^{(k)})$, then substitutes $w_i$
into (\ref{Fmain3}) and finds $\Theta^{(k+1)}$ by
minimizing ${\cal F}_3(\Theta)$ with the weights $w_i$ held fixed
(this often becomes a linear problem in $\Theta$, so it
is easily solvable). If this algorithm converges, i.e.\ if
$\Theta^{(k)}\to\hat{\Theta}$, then the limit $\hat{\Theta}$ is a
solution of (\ref{weq1}). We emphasize that this method solves
(\ref{weq1}) rather than (\ref{weq}). Therefore, the above
procedure does not exactly minimize the objective function
(\ref{Fmain3}). However, the resulting error becomes negligible as
$\sigma \to 0$: it does not alter the principal term of
the covariance matrix of the solution $\hat{\Theta}$, hence it
does not affect the statistical behavior of $\hat{\Theta}$. In
practice, one often uses the above iterative procedure for
minimizing (\ref{Fmain3}) and ignores the error it involves, see
\cite{Sa82,Ta91}.
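The iteration described in this remark can be sketched for circles in the parametrization (\ref{ABCD}). In this minimal NumPy illustration the weights are taken, as an assumption, to be the gradient weights $w_i=1/\|\nabla P(x_i,y_i)\|^2$ with $P=A(x^2+y^2)+Bx+Cy+D$ (the gradient-weighted algebraic fit); each step minimizes ${\cal F}_3$ with the weights frozen, under $B^2+C^2-4AD=1$, via a generalized eigenproblem. The starting point $w_i=1$ (the Pratt fit), the stopping rule and all function names are hypothetical choices.

\begin{verbatim}
```python
import numpy as np

# theta^T N theta = B^2 + C^2 - 4AD for theta = (A, B, C, D)
N = np.array([[0.0, 0.0, 0.0, -2.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [-2.0, 0.0, 0.0, 0.0]])


def fixed_weight_step(Z, w):
    """Minimize sum_i w_i (Z_i . theta)^2 with the weights held fixed,
    subject to theta^T N theta = 1 (generalized eigenproblem)."""
    M = (Z * w[:, None]).T @ Z
    _, vecs = np.linalg.eig(np.linalg.solve(N, M))
    best, best_cost = None, np.inf
    for theta in np.real(vecs).T:
        q = theta @ N @ theta
        if q <= 0:  # cannot be rescaled to satisfy the constraint
            continue
        theta = theta / np.sqrt(q)
        cost = theta @ M @ theta
        if cost < best_cost:
            best, best_cost = theta, cost
    return best


def gradient_weights(x, y, theta):
    """Assumed weight w = 1/|grad P|^2 = 1/((2Ax+B)^2 + (2Ay+C)^2)."""
    A, B, C, _ = theta
    return 1.0 / ((2.0 * A * x + B) ** 2 + (2.0 * A * y + C) ** 2)


def iterative_weighted_fit(x, y, max_iter=50, tol=1e-12):
    """Theta^(k) -> w_i = w(x_i, y_i; Theta^(k)) -> Theta^(k+1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([x * x + y * y, x, y, np.ones_like(x)])
    theta = fixed_weight_step(Z, np.ones_like(x))  # w_i = 1: Pratt fit
    for _ in range(max_iter):
        new = fixed_weight_step(Z, gradient_weights(x, y, theta))
        if new @ theta < 0:  # eigenvectors are defined only up to sign
            new = -new
        converged = np.linalg.norm(new - theta) < tol
        theta = new
        if converged:
            break
    return theta
```
\end{verbatim}

If the loop converges, its limit is a fixed point of the reweighting, i.e., a solution of (\ref{weq1}), not an exact minimizer of (\ref{Fmain3}), as explained in the remark.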
1240: