1: \documentclass[12pt]{article}
2: 
3: \textwidth6.25in \textheight8.5in \oddsidemargin.25in
4: \topmargin0in
5: 
6: \usepackage{epsfig}
7: %\usepackage{showkeys} % HERE: Comment this out in final version
8: 
9: %\renewcommand{\baselinestretch}{2.0}
10: 
11: \def\be{\begin{equation}}
12: \def\ee{\end{equation}}
13: \def\la{\langle}
14: \def\ra{\rangle}
15: \def\IP{\hbox{\rm I\kern -1.6pt{\rm P}}}
16: \def\IC{{\hbox{\rm C\kern-.58em{\raise.53ex\hbox{$\scriptscriptstyle|$}}
17:     \kern-.55em{\raise.53ex\hbox{$\scriptscriptstyle|$}} }}}
18: \def\IN{\hbox{I\kern-.2em\hbox{N}}}
19: \def\IR{\hbox{\rm I\kern-.2em\hbox{\rm R}}}
20: \def\ZZ{\hbox{{\rm Z}\kern-.3em{\rm Z}}}
21: \def\IT{\hbox{\rm T\kern-.38em{\raise.415ex\hbox{$\scriptstyle|$}} }}
22: %\newtheorem{theorem}{Theorem}[section]
23: \newtheorem{theorem}{Theorem}
24: \newtheorem{lemma}[theorem]{Lemma}
25: \newtheorem{sublemma}[theorem]{Sublemma}
26: \newtheorem{proposition}[theorem]{Proposition}
27: \newtheorem{corollary}[theorem]{Corollary}
28: \newtheorem{remark}[theorem]{Remark}
29: 
30: \begin{document}
31: 
32: \title{On the complexity of curve fitting algorithms}
33: \author{N. Chernov, C. Lesort, N. Sim\'{a}nyi\\
34: Department of Mathematics\\
35: University of Alabama at Birmingham\\
36: Birmingham, AL 35294, USA}
37: \date{\today}
38: \maketitle
39: 
40: %The corresponding author:\\
41: 
42: %\noindent
43: %Nikolai Chernov, the address above,\\
44: %E-mail: chernov@math.uab.edu\\
45: %Fax: 1-205-934-9025
46: 
47: %\newpage
48: 
49: \begin{abstract}
50: We study a popular algorithm for fitting polynomial curves to scattered
51: data based on the least squares with gradient weights. We show that
52: sometimes this algorithm admits a substantial reduction of complexity,
53: and, furthermore, find precise conditions under which this is possible.
54: It turns out that this is, indeed, possible when one fits circles but
55: not ellipses or hyperbolas.
56: \end{abstract}
57: 
58: %\vspace*{3cm}
59: %\begin{center}
60: %Keywords: least squares fit, curve fitting, algebraic gradient weight
61: %fitting, complexity.
62: %\end{center}
63: 
64: %\newpage
65: 
66: %\renewcommand{\theequation}{\arabic{section}.\arabic{equation}}
67: 
68: %\section{Introduction}
69: %\label{secI} \setcounter{equation}{0}
70: 
71: In many applications one needs to fit a curve described by a polynomial
72: equation
73: $$
74:            P(x,y;\Theta)=0
75: $$
76: (here $\Theta$ denotes the vector of unknown parameters) to
77: experimental data $(x_i,y_i)$, $i=1,\ldots,n$. In this equation $P$ is
78: a polynomial in $x$ and $y$, and its coefficients are either unknown
79: parameters or functions of unknown parameters. For example, a number of
80: recent publications \cite{CBH01,GGS94,LM00} are devoted to the problem
81: of fitting quadrics $Ax^2+ Bxy+ Cy^2+ Dx+ Ey+ F=0$, in which case
82: $\Theta=(A,B,C,D,E,F)$ is the parameter vector. The problem of fitting
83: circles, given by equation $(x-a)^2+ (y-b)^2 -R^2=0$ with three
84: parameters $a,b,R$, also arises in practice \cite{CO84,Ka98}.
85: 
86: It is standard to assume that the data $(x_i,y_i)$ are noisy
87: measurements of some true (but unknown) points $(\bar{x}_i,\bar{y}_i)$
88: on the curve, see \cite{BC86,CL03a,Ka96,Ka98} for details. The noise
89: vectors $e_i=(x_i-\bar{x}_i,y_i-\bar{y}_i)$ are then assumed to be
90: independent Gaussian vectors with zero mean and a scalar covariance
91: matrix, $\sigma^2 I$. In this case the maximum likelihood estimate of
92: $\Theta$ is given by the {\em orthogonal least squares fit} (OLSF),
93: which is based on the minimization of the function
94: \be
95:        {\cal F}(\Theta) =  \sum_{i=1}^n d_i^2
96:          \label{Fmain1}
97: \ee
98: where $d_i$ denotes the distance from the point $(x_i,y_i)$ to the
99: curve $P(x,y;\Theta)=0$.
100: 
101: Under these assumptions the OLSF is statistically optimal -- it
102: provides estimates of $\Theta$ whose covariance matrix attains its
103: Rao-Cramer lower bound \cite{CL03a,Ka96,Ka98}. The OLSF is widely used
104: in practice, especially when one fits simple curves such as lines or
105: circles. However, for more general curves the OLSF becomes intractable,
106: because the precise distance $d_i$ is hard to compute. In those cases
107: one resorts to various alternatives, and the most popular one is the
108: {\em algebraic fit} (AF) based on the minimization of
109: \be
110:        {\cal F}_{\rm a}(\Theta) =
111:        \sum_{i=1}^n w_i\, [P(x_i,y_i;\Theta)]^2
112:          \label{Fmain2}
113: \ee
114: where $w_i=w(x_i,y_i;\Theta)$ are suitably defined weights. The choice
115: of the weight function $w(x,y;\Theta)$ is important.
116: % The minimization of (\ref{Fmain2}) is usually much cheaper than that
117: % of (\ref{Fmain1}), but the weights $w_i$ must be chosen wisely.
118: The AF is known \cite{CL03a} to provide a statistically optimal
119: estimate of $\Theta$ (in the sense that the covariance matrix will
120: attain its Rao-Cramer lower bound) if and only if the weight
121: function satisfies
122: \be
123:    w(x,y;\Theta) = a(\Theta) / \|\nabla P(x,y;\Theta)\|^2
124:      \label{wgrad}
125: \ee
126: for all points $x,y$ on the curve, i.e.\ such that
127: $P(x,y;\Theta)=0$. Here $\nabla P = (\partial P/\partial
128: x,\partial P/\partial y)$ is the gradient vector of the polynomial
129: $P$, and $a(\Theta)>0$ may be an arbitrary function of $\Theta$
130: (in practice, one simply sets $a(\Theta)=1$). Any other choice of
131: $w$ will result in the loss of accuracy, see \cite{CL03a}. We call
132: $w(x,y;\Theta)$ a {\em gradient weight function} if it satisfies
133: (\ref{wgrad}) for all $x,y$ on the curve $P(x,y;\Theta)=0$. The AF
134: (\ref{Fmain2}) with a gradient weight function $w(x,y;\Theta)$ is
135: commonly referred to as the {\em gradient weighted algebraic fit}
136: (GRAF). It was introduced in the mid-seventies \cite{Tu74} and
137: recently became standard for polynomial curve fitting, see, for
138: example, \cite{CBH01,LM00,Ta91}.
139: 
140: Even though the GRAF is much cheaper than the OLSF, it is still a
141: nonlinear problem requiring iterative methods. For example, in a
142: popular {\em reweight procedure} \cite{Sa82,Ta91} one uses the $k$-th
143: approximation $\Theta^{(k)}$ to compute the weights $w_i =
144: w(x_i,y_i;\Theta^{(k)})$ and then finds $\Theta^{(k+1)}$ by minimizing
145: (\ref{Fmain2}) regarding the just computed $w_i$'s as constants. Note
146: that if the parameters $\Theta$ are the coefficients of $P$, then
147: (\ref{Fmain2}), with fixed weights, becomes a quadratic function in
148: $\Theta$, and its minimum can be easily found. Another algorithm is
149: based on solving the equation $\nabla_{\Theta}{\cal F}_{\rm a}(\Theta)
150: = 0$, i.e.\
151: \be
152:    \sum P_i^2 \, \nabla_{\Theta} w_i +
153:    2 \sum w_i \, P_i \, \nabla_{\Theta} P_i = 0
154:       \label{weq}
155: \ee
156: for which various iterative schemes could be used. In the case of
157: fitting quadrics, for example, the most advanced algorithms are the
158: renormalization method \cite{Ka96}, the heteroscedastic
159: error-in-variables method \cite{LM00} and the fundamental numerical
160: scheme \cite{CBH01}. In all these algorithms, one needs to evaluate
161: ${\cal O}(n)$ terms at each iteration. Therefore, the complexity of
162: those algorithms is ${\cal O}(kn)$, where $k$ is the number of
163: iterations. Moreover, each algorithm requires access to individual
164: coordinates $x_i,y_i$ of the data points at each iteration. These
165: difficulties can sometimes be avoided in a remarkable way, as we show
166: next.
167: 
168: Suppose we need to fit circles given by equation
169: $$
170:      P(x,y)=(x-a)^2+ (y-b)^2-R^2=0.
171: $$
172: Then we have
173: \be
174:    \|\nabla P(x,y;\Theta)\|^2 =
175:     4(x-a)^2 +4(y-b)^2
176:    = 4P(x,y) + 4R^2
177:      \label{4444}
178: \ee
179: hence $\|\nabla P(x,y;\Theta)\|^2 = 4R^2$ for all the points
180: $(x,y)$ lying on the circle $P(x,y)=0$, and we can set
181: $w(x,y;\Theta) = 1/R^2$ (that is, $a(\Theta)=4$ in (\ref{wgrad})). Therefore
182: \begin{eqnarray}
183:     {\cal F}_{\rm a}(a,b,R) &=&
184:        \sum_{i=1}^n R^{-2} \left[x_i^2+y_i^2-2ax_i-2by_i+
185:        a^2+b^2-R^2\right]^2\nonumber\\
186:        &=& R^{-2}[z_1+az_2+bz_3+a^2z_4+b^2z_5+abz_6
187:        +cz_7+ac z_8+bc z_9+ c^2n]
188:          \label{FmainC}
189: \end{eqnarray}
190: where we denoted $c=a^2+b^2-R^2$ for brevity, and
191: $$
192:    z_1=\sum (x_i^2+y_i^2)^2,\ z_2=-4\sum x_i(x_i^2+y_i^2),\ldots
193: $$
194: are some expressions involving $x_i$ and $y_i$ only.
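
For the reader's convenience we list all nine sums. They follow by
expanding the square in the first line of (\ref{FmainC}) term by term;
the shorthand $r_i=x_i^2+y_i^2$ is our notation for this sketch and is
not used elsewhere in the text:
$$
   \begin{array}{lll}
   z_1=\sum r_i^2, & z_2=-4\sum x_ir_i, & z_3=-4\sum y_ir_i,\\
   z_4=4\sum x_i^2, & z_5=4\sum y_i^2, & z_6=8\sum x_iy_i,\\
   z_7=2\sum r_i, & z_8=-4\sum x_i, & z_9=-4\sum y_i.
   \end{array}
$$
Each of these sums can be accumulated in a single pass over the data.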
195: 
196: The minimization of (\ref{FmainC}) is still a nonlinear problem
197: requiring iterative methods \cite{CO84,CL02,Pr87}, but it has
198: obvious advantages over the reweight procedure described above and
199: other generic methods for solving the equation (\ref{weq}). First
200: of all, the values of $z_1,\ldots,z_9$ only need to be computed
201: once, and then the cost of minimization of (\ref{FmainC}) will not
202: depend on $n$ anymore. Thus, the complexity of this algorithm is
203: ${\cal O}(n) + {\cal O}(k)$, where ${\cal O}(n)$ is the cost of
204: evaluation of $z_1,\ldots,z_9$ and ${\cal O}(k)$ is the cost of
205: some $k$ iterations spent on the subsequent minimization of ${\cal
206: F}_{\rm a}(a,b,R)$. Moreover, once the values of $z_1,\ldots,z_9$
207: are computed and stored, the coordinates $x_i,y_i$ can be
208: destroyed. Practically, $z_1,\ldots,z_9$ can be computed
209: ``on-line'', when the data are collected. The minimization
210: procedure per se can be implemented ``off-line'', without storage
211: of (or access to) the data points. The quantities $z_1,\ldots,z_9$
212: here play the role of sufficient statistics.
213: 
214: Inspired by the above example, we might say that the problem of fitting
215: a polynomial curve $P(x,y;\Theta)=0$ {\em admits a reduction of
216: complexity} if there are $\ell$ functions
217: $z_j(x_1,y_1,\ldots,x_n,y_n)$, $1\leq j\leq\ell$, with $\ell $ being
218: independent of $n$ and $\Theta$, and a gradient weight function
219: $w(x,y;\Theta)$ such that
220: \be
221:        {\cal F}_{\rm a} = f(z_1,\ldots,z_{\ell};\Theta)
222:          \label{Fzz}
223: \ee
224: i.e.\ ${\cal F}_{\rm a}$ is a function of $z_1,\ldots,z_{\ell}$ and
225: $\Theta$ only.
226: 
227: This definition does not suggest how to find the functions
228: $z_1,\ldots,z_{\ell}$ in practical terms, though. Since ${\cal F}_{\rm
229: a}$ is given by (\ref{Fmain2}) with $P(x_i,y_i;\Theta)$ being a
230: polynomial in $x_i,y_i$, the most natural (if not the only) way to
231: construct the functions $z_1,\ldots,z_{\ell}$ is to express the
232: gradient weight function (\ref{wgrad}) in the form
233: \be
234:        w(x,y;\Theta) = \sum_{k=1}^K C_k(\Theta)\, D_k(x,y)
235:          \label{wCD}
236: \ee
237: where $C_k$ are functions of the parameter vector $\Theta$ alone, and
238: $D_k$ are functions of $x$ and $y$ only (here the number of terms, $K$,
239: must be independent of $\Theta$). Indeed, suppose that the
240: representation (\ref{wCD}) is found. Since $P^2$ is a polynomial in
241: $x,y$, we can expand it as
242: $$
243:     P^2(x,y) = \sum_{p,q} c_{p,q}x^py^q
244: $$
245: where $c_{p,q} = c_{p,q} (\Theta)$ denote its coefficients. Now the
246: function ${\cal F}_{\rm a}$ can be evaluated as
247: \begin{eqnarray*}
248:    {\cal F}_{\rm a} &=& \sum_{k=1}^K\sum_{p,q}
249:    C_k(\Theta)c_{p,q}(\Theta)
250:    \sum_{i=1}^n x_i^py_i^qD_k(x_i,y_i) \\
251:    &=& \sum_{k=1}^K\sum_{p,q}
252:    C_k(\Theta)c_{p,q}(\Theta)
253:    z_{k,p,q}
254: \end{eqnarray*}
255: where
256: $$
257:    z_{k,p,q} = \sum_{i=1}^n x_i^py_i^qD_k(x_i,y_i)
258: $$
259: The values of $z_{k,p,q}$ depend on the data $x_i,y_i$ only, hence we
260: obtain the desired representation (\ref{Fzz}). Therefore, (\ref{wCD})
261: implies (\ref{Fzz}). We believe that the converse is also true, i.e.\
262: the conditions (\ref{Fzz}) and (\ref{wCD}) are actually equivalent, but
263: we do not attempt to prove that.
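
As an illustration, the circle fit above is the simplest instance of
(\ref{wCD}): there $K=1$, $C_1(\Theta)=R^{-2}$ and $D_1(x,y)\equiv 1$,
so that $z_{1,p,q}=\sum x_i^py_i^q$ are the ordinary moments of the
data with $p+q\leq 4$. The quantities $z_1,\ldots,z_9$ of
(\ref{FmainC}) are linear combinations of these moments; for example,
$$
   z_1=\sum (x_i^2+y_i^2)^2 = z_{1,4,0}+2z_{1,2,2}+z_{1,0,4}.
$$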
264: 
265: Motivated by the above considerations, we adopt the following
266: definition: the problem of fitting a polynomial curve $P(x,y;\Theta)=0$
267: {\em admits a reduction of complexity} if the gradient weight function
268: (\ref{wgrad}) can be expressed in the form (\ref{wCD}).
269: 
270: 
271: As we have seen, the problem of fitting circles admits a reduction
272: of complexity (and so does the simpler problem of fitting lines).
273: Now if the problem of fitting ellipses and/or hyperbolas admitted
274: a reduction of complexity as defined above, we would be able to
275: dramatically improve the known GRAF algorithms
276: \cite{CBH01,Ka96,LM00}. Unfortunately, this is impossible -- there
277: are deep mathematical reasons which prevent a reduction of
278: complexity in the case of ellipses, hyperbolas, and parabolas.
279: 
280: In this paper we find general conditions on the polynomial
281: $P(x,y;\Theta)$ under which the problem of fitting the curve
282: $P(x,y;\Theta)=0$ allows a reduction of complexity. It turns out
283: that lines and circles satisfy these conditions, but ellipses,
284: hyperbolas, and parabolas do not. Our results thus demonstrate (in
285: a rigorous mathematical way) that fitting noncircular conics is an
286: intrinsically more complicated problem than fitting circles or
287: lines.
288: 
289: For convenience, let us denote
290: $$
291:    Q(x,y;\Theta) :=
292:    \|\nabla P(x,y;\Theta)\|^2 =
293:    (\partial P/\partial x)^2+(\partial P/\partial y)^2
294: $$
295: Clearly, $Q(x,y;\Theta)$ is itself a polynomial in $x$ and $y$. Our
296: subsequent arguments will involve some facts from complex analysis. We
297: will treat $x$ and $y$ as {\em complex}, rather than {\em real},
298: variables.
299: 
300: \medskip\noindent{\bf Theorem}. {\em The problem of fitting curves
301: $P(x,y;\Theta)=0$ admits a reduction of complexity (as defined
302: above) if and only if the system of polynomial
303: equations}
304: \be
305:   \left \{
306:   \begin{array}{c}
307:   P(x,y) = 0\\
308:   Q(x,y) = 0
309:   \end{array} \right .
310:   \label{PQ0}
311: \ee
312: {\em has no solutions, real or complex.}
313: \medskip
314: 
315: Before we prove our theorem, we shall show how to use it. For the
316: problem of fitting circles, we have already computed $Q=4P+4R^2$, see
317: (\ref{4444}), hence the system (\ref{PQ0}) has indeed no solutions for
318: nondegenerate circles (for which $R\neq 0$).
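
The case of lines is even simpler. As a sketch, write a line as
$P(x,y)=\alpha x+\beta y+\gamma=0$ with real $\alpha,\beta$ not both
zero (this parametrization is our choice, made only for illustration).
Then
$$
   Q=\alpha^2+\beta^2>0
$$
is a nonzero constant, so the system (\ref{PQ0}) has no solutions, and
the constant weight $w=1/(\alpha^2+\beta^2)$ has the form (\ref{wCD})
with $K=1$. Accordingly,
$$
   {\cal F}_{\rm a} = \frac{\alpha^2\sum x_i^2+2\alpha\beta\sum x_iy_i
   +\beta^2\sum y_i^2+2\alpha\gamma\sum x_i+2\beta\gamma\sum y_i
   +n\gamma^2}{\alpha^2+\beta^2}
$$
depends on the data only through five sums and $n$.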
319: 
320: When using the theorem, the following {\em invariance} property
321: will be helpful. Let $(x,y)\mapsto (\tilde{x},\tilde{y})$ be a
322: transformation of the $xy$ plane that is a composition of
323: translations, rotations, mirror reflections and similarities (the
324: latter are defined by $(x,y)\mapsto (cx,cy)$ for some $c\neq 0$).
325: Denote by $\tilde{P}(\tilde{x},\tilde{y})$ the polynomial $P$ in
326: the new coordinates $\tilde{x},\tilde{y}$. Then the system
327: (\ref{PQ0}) has a solution (real or complex) if and only if the
328: corresponding system
329: $$
330:   \left \{
331:   \begin{array}{c}
332:   \tilde{P}(\tilde{x},\tilde{y}) = 0\\
333:   \tilde{Q}(\tilde{x},\tilde{y}) = 0
334:   \end{array} \right .
335: $$
336: has a solution, real or complex. Here $\tilde{Q} = \|\nabla
337: \tilde{P}\|^2$. This simple fact, which can be verified directly
338: by the reader, allows us to simplify the polynomial $P(x,y)$
339: before applying the theorem.
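
For completeness, here is a sketch of that verification. Write the
transformation as $\tilde{\bf x}=cM{\bf x}+{\bf t}$, where ${\bf
x}=(x,y)^T$, $M$ is a real orthogonal $2\times 2$ matrix, $c\neq 0$,
and ${\bf t}$ is a fixed vector (this matrix notation is ours). Since
$P({\bf x})=\tilde{P}(\tilde{\bf x})$, the chain rule gives $\nabla
P=cM^T\nabla\tilde{P}$, and therefore
$$
   Q=(\nabla P)^T\nabla P
   =c^2(\nabla\tilde{P})^TMM^T\nabla\tilde{P}
   =c^2\tilde{Q}.
$$
This identity between polynomials remains valid for complex $x,y$, so
$P$ and $Q$ vanish at a point if and only if $\tilde{P}$ and
$\tilde{Q}$ vanish at its image.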
340: 
341: Consider the problem of fitting ellipses and hyperbolas. By using
342: a translation and rotation of the $xy$ plane we can always reduce
343: the polynomial $P$ to a canonical form $ax^2+by^2+c=0$ (with
344: $a\neq b$ and $abc\neq 0$). Then $Q=4a^2x^2+4b^2y^2$ and we arrive
345: at a system of equations
346: $$
347:   \left \{
348:   \begin{array}{c}
349:    ax^2+by^2+c = 0\\
350:    a^2x^2+b^2y^2 = 0
351:   \end{array} \right .
352: $$
353: It is easy to see that it always has a solution
354: $$
355:    x=\pm\sqrt{\frac{bc}{a(a-b)}},\quad
356:    y=\pm\sqrt{-\frac{ac}{b(a-b)}}
357: $$
358: (note that $x$ or $y$ may be an imaginary number, which is allowed
359: by our theorem). Therefore, the problem does not admit a reduction
360: of complexity.
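
For instance, for the ellipse $x^2/4+y^2=1$ we have $a=1/4$, $b=1$,
$c=-1$, and the formulas above give
$$
   x=\pm\frac{4}{\sqrt{3}},\quad
   y=\pm\frac{{\bf i}}{\sqrt{3}}.
$$
Indeed, $x^2=16/3$ and $y^2=-1/3$, so that $ax^2+by^2+c=4/3-1/3-1=0$
and $a^2x^2+b^2y^2=1/3-1/3=0$.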
361: 
362: If our curve is a parabola, then we can use its canonical equation
363: $y=cx^2$ for $c>0$, hence $P=y-cx^2$ and $Q=4c^2x^2+1$. Here again we
364: have a common zero of $P$ and $Q$ at the point $x={\bf i}/(2c)$ and
365: $y=-1/(4c)$. Thus, no conic sections (except circles) satisfy the
366: conditions of our theorem.
367: 
368: %Even though we only prove that the conditions of our theorem are
369: %sufficient for a reduction of complexity, we believe they are also
370: %necessary (but we do not attempt to prove their necessity).
371: 
372: We now prove our theorem. Since $w(x,y;\Theta)$ must be a gradient
373: weight function, the requirement (\ref{wCD}) is equivalent to
374: \be
375:      \frac{1}{Q(x,y)} = \sum_{k=1}^K C_k(\Theta)\, D_k(x,y)
376:      \ \ \ \ \ \ {\rm whenever}\ \ \ \
377:      P(x,y)=0
378:        \label{QU}
379: \ee
380: (here we incorporated the factor $a(\Theta)$ into the coefficients
381: $C_k(\Theta)$, for convenience). We emphasize that the left
382: identity in (\ref{QU}) does not have to hold on the entire $xy$
383: plane; it only has to hold {\em on the curve} $P(x,y)=0$. If we
384: denote that curve by $\cal L$, then (\ref{QU}) can be restated as
385: \be
386:      \frac{1}{Q(x,y)} = \sum_{k=1}^K C_k(\Theta)\, D_k(x,y)
387:      \ \ \ \ \ \ {\rm whenever}\ \ \ \
388:      (x,y)\in{\cal L}
389:        \label{QU1}
390: \ee
391: 
392: The functions $D_k(x,y)$ in (\ref{QU}) cannot be arbitrary; they
393: must be easily computable, i.e.\ available in machine
394: arithmetic. That is, they must be combinations of elementary
395: functions -- polynomials, exponentials, logarithms, trigonometric
396: functions, etc. In that case $D_k(x,y)$ are analytic functions of
397: $x$ and $y$. Therefore, they have analytic extensions to the
398: two-dimensional complex space $\IC^2$. We note that they need
399: not be {\em entire functions}, i.e.\ analytic everywhere in
400: $\IC^2$; they may have some singularities. For example, the
401: function $(1+x^2+y^2)^{-1}$ is analytic in $\IR^2$ but has
402: singularities in $\IC^2$, e.g.\ the point $x={\bf i}$ and $y=0$ is
403: its singularity. Also, those extensions may be multivalued
404: functions (examples are $\ln x$ or $\sqrt{x}$).
405: 
406: Now, the following function will also be analytic in $\IC^2$:
407: $$
408:   G(x,y) = 1 - Q(x,y)\sum_{k=1}^K C_k(\Theta)\, D_k(x,y)
409: $$
410: since it is a combination of analytic functions. By (\ref{QU1}),
411: it vanishes on the curve $\cal L$ in the real $xy$ plane. Consider
412: the subset ${\cal Z}\subset\IC^2$ defined by the equation
413: $P(x,y)=0$, where $x$ and $y$ are treated as complex variables.
414: Note that $\cal L$ is a curve on the two-dimensional manifold
415: $\cal Z$. We will prove that the function $G(x,y)$ vanishes on the
416: entire $\cal Z$.
417: 
418: We can assume that $P(x,y)$ is an irreducible polynomial
419: (otherwise we can apply our argument to each irreducible factor of
420: $P$). Then $\cal Z$ is an algebraic variety, hence it admits a
421: complex parametrization (a complex coordinate, $z$), and the
422: restriction of the function $G$ onto $\cal Z$ will be an analytic
423: function of $z$. By the identity theorem of complex analysis, if an
424: analytic function $G(z)$, $z\in\IC$, vanishes on a one-dimensional
425: curve in $\IC$, then it is identically zero, i.e.\
426: $G(z)\equiv 0$ for all $z\in\IC$. In our case the curve on which
427: $G$ vanishes is $\cal L$ (and we assume, of course, that it is a
428: nondegenerate curve for all the relevant values of the parameter
429: $\Theta$). Hence, $G$ vanishes on the entire $\cal Z$, and
430: therefore
431: \be
432:      G(x,y)=0
433:      \ \ \ \ \ \ {\rm whenever}\ \ \ \
434:      (x,y)\in{\cal Z}
435:        \label{GP}
436: \ee
437: On the other hand, if the system of equations (\ref{PQ0}) has a
438: complex solution $(x,y)$, then (\ref{GP}) would be impossible,
439: since any solution of (\ref{PQ0}) lies on the manifold $\cal Z$
440: (because $P(x,y) = 0$), and at the same time $Q(x,y)=0$ implies
441: $G(x,y) = 1$. Therefore, if the system (\ref{PQ0}) has a solution
442: (real or complex), then the representation (\ref{wCD}) cannot
443: possibly exist.
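
The parabola illustrates this argument concretely. For $P=y-cx^2$ the
set $\cal Z$ is parametrized by $z=x$, namely $(x,y)=(z,cz^2)$ with
$z\in\IC$, and the real curve $\cal L$ corresponds to real $z$. On
$\cal Z$ we have
$$
   Q=4c^2z^2+1,\qquad
   G(z)=1-(4c^2z^2+1)\sum_{k=1}^K C_k(\Theta)\,D_k(z,cz^2).
$$
If $G$ vanished for all real $z$, it would vanish identically, yet at
$z=\pm{\bf i}/(2c)$ the factor $4c^2z^2+1$ is zero and $G=1$ (provided
the functions $D_k$ are finite there), a contradiction.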
444: 
445: %One can argue here that if the solutions of the system (\ref{PQ0})
446: %belonged to the singularities of the functions $D_k(x,y)$, then
447: %(\ref{GP}) might still hold on the domain of the function $G(x,y)$.
448: %However, this objection is easy to overturn. Indeed, the functions
449: %$D_k(x,y)$ are independent of the parameter vector $\Theta$, hence
450: %their singularities in $\IC^2$ are fixed. On the contrary, both
451: %functions $P(x,y)$ and $Q(x,y)$ in (\ref{PQ0}) depend on $\Theta$, thus
452: %the solutions of that system ``float'' in $\IC^2$ as $\Theta$ changes,
453: %and so they cannot be always ``blocked'' by the singularities of $D_k$.
454: 
455: It remains to show that if the system (\ref{PQ0}) has no solutions,
456: then the representation (\ref{wCD}) is possible, and hence our problem
457: indeed admits a reduction of complexity. Assuming that (\ref{PQ0}) has
458: no solutions, we will construct the representation (\ref{wCD}) in the
459: simplest, polynomial form:
460: \be
461:        w(x,y;\Theta) = \sum_{p,q} w_{p,q}(\Theta)\, x^py^q
462:          \label{wmn}
463: \ee
464: the degree of this polynomial being independent of the parameter
465: $\Theta$. Consider a polynomial equation
466: \be
467:     P(x,y)\, U(x,y) + Q(x,y)\, W(x,y) = 1
468:       \label{UW}
469: \ee
470: where $U(x,y)$ and $W(x,y)$ are unknown polynomials. A classical
471: mathematical theorem, Hilbert's Nullstellensatz \cite{ZS}, says
472: that the equation (\ref{UW}) has polynomial solutions $U(x,y)$ and
473: $W(x,y)$ if and only if $P(x,y)$ and $Q(x,y)$ have no common
474: zeroes in $\IC^2$, i.e.\ whenever the system (\ref{PQ0}) has no
475: complex solutions, which is exactly what we have assumed. Note
476: that since $P$ and $Q$ depend on $\Theta$, so do $U$ and $W$,
477: but we suppress this dependence in the equation (\ref{UW}).
478: 
479: Now the polynomial $W(x,y)$ solving (\ref{UW}) gives us the weight
480: function $w(x,y;\Theta)=W(x,y)$: setting $P=0$ in (\ref{UW}) leaves $QW=1$, so
481: $$
482:      W(x,y) = 1/Q(x,y)
483:      \ \ \ \ \ \ {\rm whenever}\ \ \ \
484:      P(x,y) = 0
485: $$
486: Technically, the theorem is proved, but we make a further
487: practical remark. Suppose we know that the system (\ref{PQ0}) has
488: no solutions, so that the problem admits a reduction of
489: complexity. In this case we need to find the polynomial $W(x,y)$
490: solving (\ref{UW}) in an explicit form, in order to determine the
491: weight function $w(x,y;\Theta)$. To this end we describe a finite
492: and relatively simple algorithm for computing the coefficients
493: $w_{p,q}$ of the polynomial $W$. We substitute the expansions
494: $$
495:        W(x,y) = \sum_{p,q} w_{p,q}\, x^py^q
496:        \ \ \ \ \ {\rm and}\ \ \ \ \
497:        U(x,y) = \sum_{p,q} u_{p,q}\, x^py^q
498: $$
499: into the identity (\ref{UW}) and then equate the coefficients of each
500: monomial $x^py^q$ on the left hand side and on the right hand
501: side. This gives a linear system of equations for the
502: unknown coefficients $w_{p,q}$ and $u_{p,q}$. This might be a large
503: system (its size depends on the degrees of $U$ and $W$), but it is a
504: linear system whose solution can be found by routine matrix methods. If
505: the assumed degrees of $U$ and $W$ are high enough, then the above
506: system is always solvable by the so-called {\em effective
507: Nullstellensatz}, see \cite{S67}. By solving that system we can obtain
508: explicit formulas for the coefficients $w_{p,q}$ and $u_{p,q}$. In fact,
509: we only need the coefficients of $W$, not $U$. Lastly, we remark that
510: those coefficients will be rational functions of the coefficients of
511: the polynomial $P(x,y)$, hence they will be easily computable.
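
As a check of this procedure on the simplest case, consider the circle,
where $Q=4P+4R^2$ by (\ref{4444}). Trying constant polynomials
$U=u_{0,0}$ and $W=w_{0,0}$ in (\ref{UW}) and equating coefficients
gives
$$
   u_{0,0}+4w_{0,0}=0,\qquad 4R^2w_{0,0}=1,
$$
hence
$$
   W=\frac{1}{4R^2},\qquad U=-\frac{1}{R^2}.
$$
This is the gradient weight (\ref{wgrad}) with $a(\Theta)=1$; it
differs from the weight $1/R^2$ used in (\ref{FmainC}) only by the
constant factor $4$.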
512: 
513: \noindent{\bf Acknowledgement}. N. Chernov is partially supported by
514: NSF grant DMS-0098788 and N.~Sim\'{a}nyi is partially supported by NSF
515: grant DMS-0098773.
516: 
517: 
518: 
519: \begin{thebibliography}{99}
520: 
521: \bibitem{BC86} M. Berman and D. Culpin,
522:     The statistical behaviour of some least squares estimators of the centre and radius of a
523:     circle,  {\em J. R. Statist. Soc. B}, {\bf 48}, 1986, 183--196.
524: 
525: \bibitem{CO84} N. I. Chernov and G. A. Ososkov,
526:     Effective algorithms for circle fitting,
527:     {\em Comp. Phys. Comm.} {\bf 33}, 1984, 329--333.
528: 
529: \bibitem{CL02} N. Chernov and C. Lesort,
530:     {\rm Fitting circles and lines by least squares: theory and experiment},
531:     preprint, available at http://www.math.uab.edu/cl/cl1
532: 
533: \bibitem{CL03a} N. Chernov and C. Lesort,
534:     {\rm Statistical efficiency of curve fitting algorithms},
535:     preprint, available at http://www.math.uab.edu/cl/cl2
536: 
537: \bibitem{CBH01} W. Chojnacki, M. J. Brooks, and A. van den Hengel,
538:     {\rm Rationalising the renormalisation method of Kanatani},
539:     {\em J. Math. Imaging \& Vision}, {\bf 14}, 2001, 21--38.
540: 
541: \bibitem{GGS94} W. Gander, G. H. Golub, and R. Strebel,
542:     {\rm Least squares fitting of circles and ellipses},
543:     {\em BIT} {\bf 34}, 1994, 558--578.
544: 
545: \bibitem{Ka96} K. Kanatani,
546:     {\em Statistical Optimization for Geometric Computation: Theory and Practice},
547:     Elsevier Science, Amsterdam, 1996.
548: 
549: \bibitem{Ka98} K. Kanatani,
550:     {\rm Cramer-Rao lower bounds for curve fitting},
551:     {\em Graph. Models Image Proc.} {\bf 60}, 1998, 93--99.
552: 
553: \bibitem{LM00} Y. Leedan and P. Meer,
554:     {\rm Heteroscedastic regression in computer vision: Problems with bilinear
555:     constraint},
556:     {\em Intern. J. Comp. Vision}, {\bf 37}, 2000, 127--150.
557: 
558: \bibitem{Pr87} V. Pratt,
559:     {\rm Direct least-squares fitting of algebraic surfaces},
560:     {\em Computer Graphics} {\bf 21}, 1987, 145--152.
561: 
562: \bibitem{Sa82} P. D. Sampson,
563:     {\rm Fitting conic sections to very scattered data:
564:     an iterative refinement of the Bookstein algorithm},
565:     {\em Comp. Graphics Image Proc.} {\bf 18}, 1982, 97--108.
566: 
567: \bibitem{S67} J. R. Shoenfield, {\em Mathematical logic}, Reading, Mass.,
568: Addison-Wesley, 1967, p. 100, Ex. 18 (e).
569: 
570: \bibitem{Ta91} G. Taubin,
571:     {\rm Estimation Of Planar Curves, Surfaces And Nonplanar
572:     Space Curves Defined By Implicit Equations,
573:     With Applications To Edge And Range Image Segmentation},
574:     {\em IEEE Transactions on Pattern Analysis and Machine
575:     Intelligence},  {\bf 13}, 1991, 1115--1138.
576: 
577: \bibitem{Tu74} K. Turner, {\em Computer perception of curved
578: objects using a television camera}, Ph.D.\ Thesis, Dept.\ of Machine
579: Intelligence, University of Edinburgh, 1974.
580: 
581: \bibitem{ZS} O. Zariski and P. Samuel, {\em Commutative algebra}, Vol. 2.
582: Princeton, N.J., Van Nostrand [1958-60], p. 164.
583: 
584: \end{thebibliography}
585: 
586: \end{document}