1: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2: % mst.tex -
3: % Approximating the minimum spanning tree under the
4: % intersection metric.
5: %
6: % Sariel Har-Peled and Piotr Indyk
7: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
8:
9:
10: \documentclass[12pt]{article}
11: \usepackage{amstext}
12: \usepackage{amsmath,amssymb}
13: \usepackage{theorem}
14: \usepackage{enumerate}
15: \usepackage{tabularx}
16: \usepackage{graphicx}
17: \usepackage{sariel,wide}
18: \usepackage{url,hyperref}
19:
20:
21: \newcommand{\Wopt}{{\cal W}_{opt}}
22: \newcommand{\Topt}{{\cal T}_{opt}}
23: \newcommand{\Wf}{{\cal W}}
24: \newcommand{\weight}{{\mathop{\mathrm{weight}}}}
25: \newcommand{\IDL}{\D_L}
26: \newcommand{\IDX}[1]{\D_{{#1}}}
27: \newcommand{\MSTA}{\widehat{\MST}}
28: \newcommand{\ApproxMST}{{\tt ApproxMST}}
29: \newcommand{\PropagateWavefront}{{\tt PropagateWavefront}}
30: \newcommand{\PropagateApproxWavefront}{{\tt PropagateApproxWavefront}}
31: \newcommand{\cAnother}{c_1}
32: \newcommand{\cmindist}{c_5}
33: \newcommand{\cSampleProb}{c_6}
34: \newcommand{\cFarEnough}{c_7}
35: \newcommand{\cSample}{c_{samp}}
36: \newcommand{\Gadj}{G_{adj}}
37: \newcommand{\Ot}{\widetilde{O}}
38: \newcommand{\RS}{{\cal RS}}
39: \newcommand{\polylog}{\mathop{\mathrm{polylog}}}
40: \newcommand{\vL}{\vec{v}_L}
41: \newcommand{\plow}{z}
42: \newcommand{\phigh}{Z}
43: \newcommand{\HX}{{\cal H}}
44: \newcommand{\R}{{\cal R}}
45: \newcommand{\lshort}{l_{short}}
46: \newcommand{\out}{\mathrm{out}}
47: \newcommand{\Oe}{O_\eps}
48: \newcommand{\EmbedDim}{7}
49:
50: % Title
51: \title{When Crossings Count --- Approximating the Minimum
52: Spanning Tree\thanks{A preliminary version of the paper
appeared in the {\em 16th ACM Symposium on
54: Computational Geometry}, 166--175, 2000.}}
55: %\remove{
56: \author{Sariel Har-Peled\sarielthanks{}
57: \and
58: Piotr Indyk\thanks{MIT Laboratory for Computer Science;
59: 545 Technology Square, NE43-373;
60: Cambridge, Massachusetts 02139-3594;
61: {{\tt indyk\atgen{}theory.lcs.mit.edu}}}}
62: %}
63: \date{\today}
64:
65: \begin{document}
66: %\let\ps@plain=\ps@empty
67: %\nopagenumber{}
68: \maketitle
69: %\renewcommand{\thefootnote}{}
70: %\copyrightspace{}
71: %\renewcommand\thefootnote{\arabic{footnote}}%
72: \begin{abstract}
We present a $(1+\eps)$-approximation algorithm for
computing the minimum spanning tree of points in a
planar arrangement of lines, where the metric is the
number of crossings between the spanning tree and the
lines. The expected running time of the algorithm is
near linear. We also show how to embed such a crossing
metric of hyperplanes in $d$ dimensions, in subquadratic
time, into a high-dimensional space so that the distances
are approximately preserved. As a result, we can deploy a large
collection of subquadratic approximation algorithms
\cite{im-anntr-98,giv-rahdp-01} for problems involving
points with the crossing metric as a distance function.
85: Applications include MST, matching, clustering,
86: nearest-neighbor, and furthest-neighbor.
87: \end{abstract}
88:
89: \begin{figure}
90: \centerline{\includegraphics{figs/crossing-mst}}
91:
92: \caption{A set of lines and points, and the resulting
93: crossing MST. Note that in this case the crossing MST
94: is different from the Euclidean MST.}
95:
96: \figlab{mst}
97: \end{figure}
98:
99: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
100: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
101:
102: \section{Introduction}
103:
Given a set of lines in the plane, a natural measure of the
distance between two points is the number of lines one
has to cross to get from one point to the other. This is
a discrete distance measure that can be used to approximate
the Euclidean distance and other distance measures.
However, since this measure is defined by an arrangement of
lines, it is not locally defined and is thus computationally
cumbersome. The minimum spanning tree (MST) of a set
of points, chosen so that the number of intersections between the
tree and the given set of lines is minimized, quantifies how
the set of points interacts with the set of lines; see Figure
\ref{fig:mst}. In fact, when the set of lines is the set of
all possible lines, this MST is the standard Euclidean
MST \cite{af-acrsm-97} (here one minimizes the average
number of edges of the MST crossed when picking a random
line). Such an MST is related to a spanning tree of low
stabbing number (STLSN) \cite{w-stlcn-92,a-idapa-91}. While
the spanning tree of low stabbing number guarantees that any
line intersects at most $O(\sqrt{n})$ edges of the spanning
tree, the MST guarantees that the overall number of
intersections between the tree and a given set of lines is
minimized. Thus, if the set of lines is known in advance,
the MST has no more intersections overall than the
STLSN. The spanning tree of low stabbing number has been used in
several applications; see for example
\cite{a-idapa-91,mww-deabv-91}. In particular, having such
an MST enables one (i) to answer half-plane range queries
efficiently using near-linear space
\cite{ghs-citds-91}, (ii) to bound the complexity of the faces
of the arrangement of lines that contain the points
\cite{hs-oplpa-01-dcg}, and (iii) to traverse the points in
an efficient way, so that the number of updates needed is
minimized. (Imagine traversing the points while
maintaining the set of half-planes that contain the current
point. Each time one crosses a line, an update operation is
performed.)
140:
Computing the MST for the general case of arcs can be done
in $O(n^2\log{n})$ time by performing wavefront propagation
from each of the points (see Section
\ref{sec:cont:dijkstra}). As for approximation algorithms,
Har-Peled and Sharir \cite{hs-oplpa-01-dcg} recently gave an
approximation algorithm for the case of arcs, computing a
Steiner tree in expected running time $w=O(
\lambda_{t+2}(n+\Wopt) \log(n))$, where $t$ is the maximum
number of intersections between a pair of arcs,
$\lambda_{t+2}(\cdot)$ is the maximum length of a
Davenport-Schinzel sequence of order $t+2$, and $\Wopt$ is the
weight of the optimal Steiner tree.\footnote{It is easy to
verify that if the triangle inequality holds, then the
weight of the Steiner tree is at least half the weight of the
MST.} The algorithm outputs a tree of weight $w$ (and
thus gives roughly an $O(\log n)$-approximation).
157:
158: In this paper, we present two results:
159: \begin{itemize}
\item A near linear time $(1+\eps)$-approximation algorithm for the
minimum spanning tree under the crossing metric in the
planar case.

\item We show how to embed the crossing metric among
hyperplanes into a Hamming distance in high dimensions.
As a result, we show how one can apply known
subquadratic approximation algorithms for problems
involving point-sets and hyperplanes in high dimensions
(MST, clustering, matching, etc.).
170:
171: \remove{ Intuitively, all those applications rely on a
172: black-box (called $\eps$-PLEB in \cite{im-anntr-98})
173: that decides whether, for a query point $q$, there is
174: no point close to $q$ in our point-set (i.e., $\geq
175: r$ for all points), or alternatively finds a nearby
176: point (i.e., $\leq (1+\eps)r$) under the crossing
177: metric, where $r$ is a prespecified threshold.
178:
179: Using $\eps$-PLEB for the embedded points
180: \cite{im-anntr-98} we construct the required
181: $\eps$-PLEB for the points we started with. For
182: $d>2$ dimensions, we show how to embed the crossing
183: metric induced by $n$ hyperplanes over a set of $n$
184: points in $\Re^d$, in $O(n^{2d/(d+1) + \delta})$
185: time, where $\delta>0$ is arbitrary.
186:
187: we can maintain the dynamic $c$-approximate nearest
188: neighbor problem over this $n$-point metric in
189: $\Ot( n^{1/c})$ time per
190: operation~\cite{im-anntr-98}. This in turn implies
191: dynamic amortized $\Ot(n^{4/3} +
192: n^{1+1/c})$-time $c$-approximation algorithms for
193: bichromatic closest pair~\cite{e-fhcoa-98} and
194: $\Ot(n^{4/3} + n^{1+1/c})$-time algorithms for:
195: $c$-approximate diameter and discrete minimum
196: enclosing ball \cite{giv-rahdp-01},
197: $O(c)$-approximate facility location and bottleneck
198: matching (all for $d=2$) \cite{giv-rahdp-01}. }
199:
The connection between the crossing metric and points
in high dimensions follows by interpreting the input
points as points in an abstract VC-space \cite{pa-cg-95}
induced by the lines. Namely, we associate with each
point in the plane an $n$-dimensional binary vector,
whose $i$-th coordinate indicates on which side of the
$i$-th line the point lies. In this way, we map our
input points to vertices of the $n$-dimensional
hypercube. The crossing distance between two points is exactly
the Hamming distance between their images, since a line
separates the two points if and only if the corresponding
coordinates differ. We can now
deploy the techniques of \cite{im-anntr-98} on the
mapped points, yielding an approximation algorithm for
the MST problem. Bringing the running time down to
subquadratic requires some additional work.
214:
Specifically, we show how to compute a mapping of the
points into a space of dimension $O(\log^\EmbedDim n)$; this
embedding can be computed in $\Ot(n^{4/3})$
time\footnote{Here and in the rest of this paper
$f(n)=\Ot(g(n))$ iff $f(n)=O\pth{g(n)
(1/\eps)^{O(1)}\log^{O(1)}n}$, and
$f(n)={O_\eps}(g(n))$ iff $f(n)=O( g(n)
/\eps^{O(1)})$.} for $n$ points, such that a
$(1+\eps)$ gap property is preserved for a specified range of
distances.
225:
As a result, we can solve several approximation problems
for this metric, among them the MST problem. In
228: fact, our near-linear approximate MST algorithm in the
229: plane can be roughly viewed as an unraveling of the
230: corresponding MST approximation algorithm in high
231: dimensions. Similar bounds can be derived for $d > 2$
232: dimensions. See \secref{embed} for details.
233: \end{itemize}
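The hypercube embedding described above can be illustrated with a short Python sketch. This is only an illustration of the exact $n$-dimensional embedding, not of the subquadratic low-dimensional one. It assumes that each line is given as a hypothetical triple $(a,b,c)$ representing $ax+by=c$, that no point lies on a line, and the function names are ours; it checks that the Hamming distance between side vectors equals the number of lines crossing the segment $pq$.

```python
def signed_side(line, point):
    """Value a*x + b*y - c; its sign tells on which side of the line
    {a*x + b*y = c} the point lies (lines are hypothetical (a, b, c) triples)."""
    a, b, c = line
    x, y = point
    return a * x + b * y - c


def side_vector(point, lines):
    """Map a point to a vertex of the |lines|-dimensional hypercube:
    coordinate i is 1 iff the point is on the positive side of line i.
    Assumes general position (no point lies on a line)."""
    return tuple(1 if signed_side(l, point) > 0 else 0 for l in lines)


def hamming(u, v):
    """Number of coordinates in which u and v differ."""
    return sum(ui != vi for ui, vi in zip(u, v))


def crossing_distance(p, q, lines):
    """Number of lines crossing the segment pq, i.e., lines having p and q
    strictly on opposite sides."""
    return sum(1 for l in lines if signed_side(l, p) * signed_side(l, q) < 0)


# Two lines: x = 0 and y = 0.
axes = [(1, 0, 0), (0, 1, 0)]
print(crossing_distance((1, 1), (-1, -1), axes))                        # 2
print(hamming(side_vector((1, 1), axes), side_vector((-1, 1), axes)))   # 1
```

Storing one bit per line costs $n$ bits per point, which is why the text reduces the dimension before applying the high-dimensional machinery.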
234:
The paper is organized as follows. In Section
\ref{sec:cont:dijkstra}, we describe how one can compute the
237: exact MST using wavefront propagation. In \secref{speedup},
238: we present the planar $(1+\eps)$-approximation algorithm for
239: the MST. Next, in Section \ref{sec:embed}, we
240: describe the embedding into points in high dimension and
241: demonstrate its usage for computing an approximate MST.
242: Concluding remarks are given in Section \ref{sec:conc}.
243:
244:
245: \begin{figure}
246: \begin{center}
247: \begin{tabular}{cc}
248: {\includegraphics{figs/mst0}}
249: &
250: {\includegraphics{figs/mst1}}\\
251: (i) & (ii)\\
252: \\
253: {\includegraphics{figs/mst2}}
254: &
255: {\includegraphics{figs/mst3}}\\
256: (iii) & (iv)\\
257: \end{tabular}
258: \end{center}
259:
260: \caption{Computing the crossing MST by doing wavefront
261: propagation. The thick lines denote the boundary
262: of the current connected components of the
263: spanning forest.}
264:
265: \figlab{mst:wavefront}
266: \end{figure}
267:
268: %-------------------------------------------------------------
269: %-------------------------------------------------------------
270:
271: \section{Minimum Spanning Tree by Continuous Dijkstra}
272: \seclab{cont:dijkstra}
273:
In this section, we present a simple algorithm for computing
the crossing MST. It is a direct graph-based solution,
interpreted as a geometric algorithm. We also present a
``weight sensitive'' algorithm (\lemref{propagate}) that
computes portions of the MST in time proportional to its
overall weight.
280:
281: In the following, we assume that we are given a set $L$ of
282: lines and a set $P$ of points in the plane. For simplicity,
283: we assume $|P| = |L| = n$.
284:
285: \begin{defn}
286: For a set $L$ of lines, the {\em crossing metric} is
287: defined to be the minimum number of lines of $L$ that
288: one has to cross as one moves between two prespecified
289: points. Thus, for a pair of points $p,q \in \Re^2$ the
290: crossing distance between $p$ and $q$, denoted by
$\IDL(p,q)$, is the number of lines of $L$ that
intersect the segment $p q$. If $L$ is a set of arcs, a
293: similar crossing metric is defined, although the
294: ``shortest path'' in this case is no longer necessarily
295: a straight segment.
296: \end{defn}
297:
298: \begin{defn}
299: For a set $L$ of lines, and a set $P$ of points in the
300: plane, let $\Topt(P,L)$ denote a minimum spanning tree
301: of $P$ under the crossing metric induced by $L$, and let
302: $\Wopt(P,L)$ denote the weight of $\Topt(P,L)$.
303: \end{defn}
304:
Let $\Arr = \Arr(L)$ denote the planar arrangement induced
by the lines of $L$. Let $\Gadj = \Gadj(\Arr)$ be the
adjacency graph of $\Arr$; namely, each face of $\Arr$ is a
vertex, and two vertices are connected if the two
corresponding faces share an edge. Let $V$ be the set of
vertices of $\Gadj$ that correspond to the faces of $\Arr$
that contain points of $P$. Clearly, the crossing MST of
$P$ in $\Arr$ corresponds to the MST of $V$ in the graph
$\Gadj$ (here, each edge has weight $1$, and the distance
between two vertices is their shortest-path distance in $\Gadj$).
314:
Computing the MST of $V$ in $\Gadj$ can be done by performing
a simultaneous flooding of $\Gadj$ from the vertices of $V$.
Indeed, in the $i$-th iteration we compute all the vertices
of $\Gadj$ that are at distance $\leq i$ from some vertex of
$V$. This can be easily done using a modified BFS. In the
beginning, the flood front is made out of $|V|$ connected
components. Every time two connected components of the flood
front collide, we discover a new edge of the MST. This edge
connects the two vertices that induced the two parts of the
wavefront that collided. This is a somewhat non-standard
algorithm for computing the MST, but one can easily verify
that it indeed computes the MST of $V$ in $\Gadj$.
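The flooding procedure above can be sketched in Python. This is a sketch under simplifying assumptions: $\Gadj$ is given explicitly as an adjacency dictionary (the text instead builds it on the fly), the vertices of $V$ are distinct, and all collisions are collected first and then processed in order of the time at which the two fronts meet, which amounts to running Kruskal's algorithm on the collision edges. A collision across an edge $(u,v)$ certifies a path of length $d(u)+1+d(v)$ between the two sources; that this ordering yields an MST of $V$ is the standard Voronoi-region argument, which we assume here rather than reprove. The function name is hypothetical.

```python
from collections import deque


def flooding_mst(adj, sources):
    """MST of `sources` under the hop (shortest-path) metric of the
    unweighted graph `adj` (dict: vertex -> list of neighbours).
    Returns (total_weight, list of (source_a, source_b, weight))."""
    # Simultaneous BFS: each vertex records its distance and the source
    # whose wavefront reached it first.
    dist, owner = {}, {}
    queue = deque()
    for s in sources:
        dist[s], owner[s] = 0, s
        queue.append(s)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], owner[v] = dist[u] + 1, owner[u]
                queue.append(v)

    # Wavefronts of different sources collide across edge (u, v); the
    # collision time dist[u] + 1 + dist[v] is the length of a connecting path.
    collisions = [
        (dist[u] + 1 + dist[v], owner[u], owner[v])
        for u in dist
        for v in adj[u]
        if v in dist and owner[u] != owner[v]
    ]
    collisions.sort(key=lambda c: c[0])

    # Union-find over the sources: merge components in collision order.
    parent = {s: s for s in sources}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total, tree = 0, []
    for w, a, b in collisions:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            total += w
            tree.append((a, b, w))
    return total, tree
```

For example, on the path $0-1-2-3-4$ with sources $0,2,4$, the fronts of $0$ and $2$ meet across the edge $(1,2)$ and those of $2$ and $4$ across $(3,4)$, giving two MST edges of weight $2$ each.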
327:
This flooding algorithm has a natural geometric
interpretation: Let $\F_{2i}$ denote the set of all faces of
$\Arr$ that are at (crossing) distance at most $i$ from some
point of $P$. Clearly, $\F_0$ is the set of faces of $\Arr$
that contain points of $P$. The algorithm works in $n/2$
phases, as no two points are at crossing distance more than
$|L| = n$. We do a wavefront propagation in $\Gadj$, starting
from all the vertices that correspond to the marked faces
(i.e., faces of $\Arr$ that contain points of $P$). In the $i$-th
iteration, we propagate the wavefront from the faces of
$\F_{2i-2}$ into the faces of $\F_{2i}$. It is easy to
verify that a connected component of the flood corresponds
to a connected component of the wavefront of $\F_{2i}$.
(Note that two faces of $\F_{2i}$ might be adjacent but
belong to different wavefronts, as the wavefronts have not yet
crossed the separating edge and thus were not merged into
a single wavefront.) The connected components are
maintained implicitly by a union-find data-structure. In
particular, during the $i$-th iteration of the wavefront
propagation in $\Gadj$, when two different connected
components of the wavefront collide, it corresponds to two
points of $P$ at crossing distance $2i-1$ or $2i$
from each other.
350:
351: In particular, if there is an edge of the MST of weight
352: $2i-1$ or $2i$ it would be discovered when the corresponding
353: wavefronts collide. The $i$-th iteration of the wavefront
354: propagation, corresponds to the detection of edges of weight
$2i-1$ and $2i$ in the MST. For the MST application, we
first handle all relevant edges of weight $2i-1$, and later
all such edges of weight $2i$. This requires a somewhat
careful implementation, and we omit the technical but
straightforward details. See Figure \ref{fig:mst:wavefront}.
360:
Note that the wavefront propagation can be done without
constructing $\Gadj$ in advance, and one can compute parts
363: of $\Gadj$ on the fly as needed (i.e., we need to compute
364: only the parts of $\Gadj$ that are covered by the wavefront,
365: or are about to be covered). Of course, in the worst case,
366: the whole graph $\Gadj$ would be computed, which takes
367: $O(n^2 \log{n})$ time (this corresponds to computing the
368: whole arrangement $\Arr(L)$).
369:
370: \begin{lemma}
371: Given a set $L$ of $n$ lines, and a set $P$ of $n$ points, a
372: minimum spanning tree $\Topt(P,L)$ of $P$ under the crossing
373: metric $\IDL$ can be computed in $O(n^2\log{n})$ time.
374:
375: \lemlab{bf:prop}
376: \end{lemma}
377:
378: \begin{remark}
In the algorithm of \lemref{bf:prop} we did not use the fact
that $L$ is a set of lines. The same algorithm works for the
case where $L$ is a set of arcs. However, in this case a path
crossing the fewest arcs need not be a straight segment, so the
edges of the MST are no longer line segments, but rather Jordan
arcs. (For example, imagine that the
384: set $L$ is a single segment and we would like to connect two
385: points that are separated by this segment. This can be done with
386: no crossing by going ``around'' this segment.)
387: \end{remark}
388:
389: To be able to generate parts of $\Gadj$ incrementally, as we
390: perform the wavefront propagation, we need a way to compute
391: the relevant portions of $\Arr(L)$ on the fly.
392:
393: \begin{theorem}[\cite{hs-oplpa-01-dcg}]
394: Let $L$ be a set of $n$ lines, as above, and $P$ a set
395: of $m$ points in the plane. Then one can compute, in
396: expected $O(\pth{ n+w +m}\alpha(n)\log{n})$ time,
397: a Steiner tree $\MSTA$ of $P$, so that the expected
398: weight of $\MSTA$ is $O((n+w)\alpha(n)\log{n})$,
399: where $w= \Wopt(P,L)$ and $\alpha(n)$ is the inverse of
400: the Ackermann function. Alternatively, one can compute
401: the $m$ faces that contain the points of $P$ in the same
402: time bound.
403:
404: \theolab{hs}
405: \end{theorem}
406:
407:
408: \begin{lemma}[\cite{a-idapa-91}]
There exists a Steiner tree $\MST'$ of $P$ of weight
$O(n\sqrt{n})$; hence $\Wopt(P,L) = O(n\sqrt{n})$, and this
bound is tight in the worst case (even when the arcs are lines).
412: \lemlab{span:tree}
413: \end{lemma}
414:
In the worst case, \theoref{hs} is inferior to
implicit point-location data-structures \cite{ams-cmfal-98},
which can perform the needed implicit point-location in
roughly $O(n^{4/3})$ time for $m=n$. Indeed, by
\lemref{span:tree}, the weight of the MST can be
$\Omega(n^{3/2})$ in the worst case, and this is then also the
time needed to compute the relevant portions of the arrangement
using the algorithm of \theoref{hs}. However, the running time of
the algorithm of \theoref{hs} is sensitive to the overall
weight of the MST, and this is crucial for our algorithm.
425:
426: \begin{lemma}
427: Given a set $L$ of $n$ lines, a set $P$ of $n$ points,
428: and a parameter $i$, one can compute, in expected
429: $O(i(n+\Wopt)\alpha^2(n)\log{n})$ time, a minimum
430: spanning forest of $P$ under the crossing metric $\IDL$,
that connects all the points of $P$ at crossing distance at most
$2i$ from each other, where $\Wopt = \Wopt(P,L)$.
433:
434: \lemlab{propagate}
435: \end{lemma}
436:
437: \begin{proof}
The wavefront propagation on $\Gadj$ can be done using
an implicit representation of the arrangement
$\Arr(L)$. Namely, we compute the set $\F_i$ of faces
of $\Arr(L)$ at distance at most $i$ from the points of $P$.
Observe that the complexity of $\F_i$ is $O((n+\Wopt)
i\alpha( n/i))$. Indeed, the points of $P$ can be
connected by a tree $\gamma = \Topt(P,L)$ having
$O(\Wopt)$ intersections with the lines of $L$. Let
$\Arr'$ be the arrangement resulting from $\Arr$ by
creating a tiny gate for each intersection of $\gamma$
with the lines of $L$. The zone of $\gamma$ in $\Arr(L)$
corresponds to a single face $F$ of $\Arr'$, and the
faces of $\F_i$ are contained in the set of faces at
distance $\leq i$ from $F$. By \cite{bds-lric-95}, the
complexity of this region is $O((n+\Wopt) i\alpha(
n/i))$ (this bounds the number of all the
vertices at distance $\leq i$ from the face $F$).

Clearly, the faces of $\F_i$ have a spanning tree of
weight $O((n+\Wopt) i\alpha( n/i))$, and so they can be
computed in an online fashion in $O((n+\Wopt) i\alpha^2
( n) \log{n})$ expected time, by \theoref{hs}.
460: \end{proof}
461:
462: \begin{figure*}[tb]
463: \vspace{-0.5cm}
464: \begin{center}
465: \fbox{
466: \begin{program}
467: \> \>{\large{\sc{Algorithm}}}\ \ \
468: \Proc{\ApproxMST{}($P, L, \eps$)} \\
469: \> \>{\tt Input:} {\rm{A set of points $P$,
470: a set of lines $L$, and an approximation
471: parameter $\eps$}}\\
472: \> \>{\tt Output:} A spanning tree of $P$ of
473: weight $\leq (1+\eps)\Wopt(P,L)$\\
474: \> \Procbegin \\
\>\> $M \leftarrow $ Approximate the weight
of the MST
using the algorithm of \lemref{rough}.\\
478: \> \> $l_0 \leftarrow \max \pth{ \frac{\eps
479: M}{\cmindist n \alpha(n)\log^2{n}}, 1}
480: $\\
\> \> Set $F = (P, \emptyset)$ to be
an empty spanning forest of $P$.\\
483:
\>\>\PropagateApproxWavefront{}( $P$, $L$,
$l_0$, $F$ )\\
486:
487: \>\> $i \leftarrow 1$\\
488: \> \> \While $F$ is not a single connected
489: component \Do\\
490: \>\>\> $l_i \leftarrow l_{i-1} \cdot 2$\\
\>\>\>\PropagateApproxWavefront{}( $P$, $L$,
$l_i$, $F$ )\\
493: \>\>\> $i \leftarrow i + 1$\\
494: \> \> \End \While\\
495: \>\>\\
496: \>\>\Return $F$ \\
497: \>\Endproc{\ApproxMST{}}
498: \end{program}
499: }
500: \end{center}
501: \vspace{-0.5cm}
502: \caption{Approximating the MST in the Plane}
503: \figlab{alg:mst2}
504: % \vspace{-0.5cm}
505: \end{figure*}
506:
507: \begin{figure*}[tb]
508: \vspace{-0.5cm}
509: \begin{center}
510: \fbox{
511: \begin{program}
512: \> \>{\large{\sc{Algorithm}}}\ \ \
513: \Proc{\PropagateWavefront{}( $P$, $R$, $l$, $F$ )}\\
514: \> \>{\tt Input:} ~
515: {\rm{$P$ - set of points}}\\
516: \>\>\>\>\>{\rm{$R$ - set of lines}}\\
517: \>\>\>\>\>{\rm{$l$ - propagation distance}}\\
518: \>\>\>\>\>{\rm{$F$ - current spanning forest}}\\
519: \> \>{\tt Output:} An updated forest $F$ with
520: any pair of points of distance $\leq 2l$ in a\\
521: \>\>\>\>\> single
522: connected component\\
523: \> \Procbegin \\
524: \>\> Initialize the data-structure $D(R)$ of
525: \cite{hs-oplpa-01-dcg} for online point-location.\\
\> \> Set $W_0$ to be the set of faces of
$\Arr(R)$ that contain points of $P$.\\
528: \>\>\>\>Use
529: $D(R)$ to compute those faces.\\
530: \> \> \For $i=1, \ldots, l$ \Do\\
531: \>\>\> $W_i \leftarrow$
532: Set of faces of
533: $\Arr(R)$ of distance $=i$
534: from points of $P$.\\
535: \>\>\>\>\>Do wavefront propagation from
536: $W_{i-1}$, and use $D(R)$ to retrieve
537: \\
538: \>\>\>\>\>the faces of interest in $\Arr(R)$. \\
539: \>\>\> \If two different wavefronts collide
540: \Then\\
541: \>\>\>\> Add
542: an edge connecting the two corresponding points
543: to $F$\\
544: \>\>\>\>Merge the corresponding connected
545: components.\\
546: \>\> \Endfor\\
547:
548: \>\Endproc{\PropagateWavefront{}}
549: \end{program}
550: }
551: \end{center}
552: \vspace{-0.5cm}
553: \caption{Doing the wavefront propagation}
554: \figlab{alg:propagate}
555: \vspace{0.5cm}
556: \end{figure*}
557:
558:
559: \begin{figure*}[tb]
560: \vspace{-0.5cm}
561: \begin{center}
562: \fbox{
563: \begin{program}
564: \> \>{\large{\sc{Algorithm}}}\ \ \
565: \Proc{\PropagateApproxWavefront{}( $P$, $L$, $l$, $F$ )}\\
566: \> \>{\tt Input:} ~
567: {\rm{$P$ - set of points}}\\
568: \>\>\>\>\>{\rm{$L$ - set of lines}}\\
569: \>\>\>\>\>{\rm{$l$ - starting propagation distance}}\\
570: \>\>\>\>\>{\rm{$F$ - current spanning forest}}\\
571: \> \>{\tt Output:} An updated forest $F$ with
572: any pair of points of distance $\leq 2l$ in a\\
573: \>\>\>\>\> single
574: connected component\\
575: \> \Procbegin \\
576: \>\>\> Compute a random sample $R$ by choosing
577: each line of $L$ into the sample with
578: \\
579: \>\>\> \> \> probability $f(l) = 128
580: \cSampleProb
581: \frac{\log{n}}{l\eps^2}$\\
\> \> \> {\tt /* Approximate the wavefront propagation
in $\Arr(L)$ by doing}\\
584: \>\>\>\> {\tt it (exactly) in $\Arr(R)$ */}\\
585: \> \> \> \PropagateWavefront{}( $P$, $R$, $\cFarEnough
586: \log{n}/\eps^2$, $F$ )\\
587: \>\>\>\>\>\>{\tt /*
588: $\cFarEnough$ is an appropriate constant */}\\
589: \>\Endproc{\PropagateApproxWavefront{}}
590: \end{program}
591: }
592: \end{center}
593: \vspace{-0.5cm}
594: \caption{Doing the approximate wavefront propagation}
595: \figlab{alg:propagate:x}
596: \vspace{0.5cm}
597: \end{figure*}
598:
599:
600:
601:
602:
603: %-------------------------------------------------------
604: %-------------------------------------------------------
605: \section{Approximation Algorithm for the Planar Case}
606: \seclab{speedup}
607:
608: The algorithm is depicted in \figref{alg:mst2},
609: \figref{alg:propagate} and \figref{alg:propagate:x}. We next
610: describe the algorithm and its analysis in more detail.
611:
\lemref{propagate} provides us with an algorithm for
computing the MST in roughly quadratic time in the worst
case. To get a near linear running time, we simulate
Dijkstra's algorithm by performing the wavefront propagation
in an approximate fashion.
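To make the schedule of \figref{alg:mst2} and \figref{alg:propagate:x} concrete, the following Python sketch lists the thresholds $l_0, 2l_0, 4l_0, \ldots$ together with the probability of keeping a line in each round's sample. It is only an illustration under stated assumptions: the constants $\cmindist$ and $\cSample$ are set to $1$, the $\alpha(n)$ factor in $l_0$ is dropped, the sampling probability is capped at $1$, the function name is hypothetical, and instead of testing whether $F$ is connected it stops once $2l \geq n$, which suffices because no two points are at crossing distance more than $|L| = n$.

```python
import math


def doubling_schedule(M, n, eps, c_mindist=1.0, c_samp=1.0):
    """Return [(l_i, p_i)] for the rounds of the doubling loop, where
    l_0 = max(eps*M / (c_mindist * n * log^2 n), 1), l_i = 2 * l_{i-1}, and
    p_i = min(128 * c_samp * log(n) / (l_i * eps^2), 1) is the probability of
    keeping a line in the round's sample. Constants are placeholders and the
    alpha(n) factor is omitted."""
    log_n = math.log(n)
    l = max(eps * M / (c_mindist * n * log_n ** 2), 1.0)
    rounds = []
    while True:
        p = min(128 * c_samp * log_n / (l * eps ** 2), 1.0)
        rounds.append((l, p))
        if 2 * l >= n:  # every pair is within crossing distance n <= 2l
            return rounds
        l *= 2


for l, p in doubling_schedule(M=10**7, n=10**6, eps=0.5):
    print(f"l = {l:>9.0f}   expected sample size n*p = {10**6 * p:>9.0f}")
```

Once $p<1$, the expected sample size $n \cdot p$ halves from round to round, while each round propagates only $\cFarEnough \log n/\eps^2$ steps in the sampled arrangement; this is the intuition behind the near-linear running time.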
617:
618:
619: \begin{defn}
620: A metric $\D'$ {\em $\eps$-approximates} a metric $\D$,
if for any $p,q,r,s \in P$ such that $\D'(p,q) \leq
\D'(r,s)$, we have $\D(p,q) \leq (1+\eps)\D(r,s)$.
623: \end{defn}
624:
625: \begin{defn}
626: For a set $F$ of segments in the plane, and a metric $\D$,
627: let $\weight_\D(F) = \sum_{e \in F} \D(e)$ denote the
total weight of $F$ under the metric $\D$.
629: \end{defn}
630:
631: The proof of the following lemma is straightforward, and is
632: included only for the sake of completeness.
633: \begin{lemma}
634: Let the metric $\D'$ be an $\eps$-approximation to the
635: metric $\D$ over a point-set $P$. Let $T'$ be an MST of
$P$ under $\D'$. Then, $\weight_\D(T') \leq
(1+\eps)\weight_\D(T)$, where $T$ is an MST of $P$
under $\D$, and $\weight_\D(T)$ is the total weight of the
edges of $T$ under $\D$.
640:
641: \lemlab{approx:mst}
642: \end{lemma}
643:
644: \begin{proof}
Let $e_1', \ldots, e_{n-1}'$ be the edges of $T'$
sorted by their weight $\D'(e_1') \leq \ldots \leq
\D'(e_{n-1}')$. Let $T_0 = T$, and let $T_i$ be the tree
resulting from removing the heaviest edge (according to
$\D'$) from the cycle present in $T_{i-1} \cup
\brc{e_i'}$; let $e_i$ denote this removed edge (if $e_i'$ is
already in $T_{i-1}$, we do nothing and set $e_i = e_i'$). Clearly,
$\D'(e_i') \leq \D'(e_i)$ and, by definition, $\D(e_i')
\leq (1+\eps) \D(e_i)$. Namely, we replaced an edge
$e_i$ by an edge $e_i'$ which is heavier by at most a factor of
$(1+\eps)$. At the end of the process $T_{n-1}$ is just
$T'$, and $\weight_\D(T') \leq
\sum_{i=1}^{n-1}(1+\eps)\D(e_i) \leq
(1+\eps)\weight_\D(T)$.
659: \end{proof}
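As a sanity check of \lemref{approx:mst}, the following Python sketch (our illustration, with hypothetical helper names) takes the Euclidean distance as $\D$ and, as $\D'$, the distance rounded down to a power of $1+\eps$. Whenever $\D'(p,q) \leq \D'(r,s)$ we get $\D(p,q) < (1+\eps)\D(r,s)$, so $\D'$ $\eps$-approximates $\D$ in the above sense; $\D'$ is not a metric, but the proof only uses the ordering of the distances. The sketch compares the $\D$-weight of a Kruskal MST under $\D'$ with the true MST weight.

```python
import math
import random


def kruskal(n, weight):
    """Kruskal's algorithm on the complete graph over vertices 0..n-1;
    `weight(i, j)` is the edge weight. Returns the list of tree edges."""
    edges = sorted(
        ((weight(i, j), i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda e: e[0],
    )
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree


def compare_msts(points, eps):
    """Return (D-weight of an MST under D, D-weight of an MST under D'),
    with D Euclidean and D' = floor(log_{1+eps} D) (hypothetical choice)."""
    def d(i, j):
        return math.dist(points[i], points[j])

    def d_coarse(i, j):
        # d_coarse(p,q) <= d_coarse(r,s) implies d(p,q) < (1 + eps) d(r,s).
        return math.floor(math.log(d(i, j), 1 + eps))

    n = len(points)
    exact = sum(d(i, j) for i, j in kruskal(n, d))
    approx = sum(d(i, j) for i, j in kruskal(n, d_coarse))
    return exact, approx


rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(40)]
exact, approx = compare_msts(pts, eps=0.25)
print(exact, approx, approx / exact)
```

The ratio printed in the last line is guaranteed by the lemma to be at most $1.25$ here (up to floating-point rounding).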
660:
\lemref{approx:mst} suggests that if we can find a
computationally cheaper approximate metric than
$\IDL(\cdot,\cdot)$, then we can use it to compute the MST.
A natural way to do that is to randomly sample a subset $R
\subseteq L$, and use $\IDX{R}( \cdot, \cdot )$ as the
approximate metric. However, in general $\IDX{R}$ is an
$\eps$-approximate metric to $\IDL$ only if $L = R$, since short
distances are badly distorted: two points separated only by
lines of $L \setminus R$ are at distance $0$ in $\Arr(R)$.
669:
670: \begin{defn}
671: Let $\D',\D$ be two metrics, $\eps > 0$, and $l$ be
672: prescribed parameters. The metric $\D'$ is an {\em
673: $(\eps,l)$-approximation} to $\D$, if for any
674: $p,q,r,s \in P$, such that (i) $\D( p,q), \D(r,s) \geq
675: l$, and (ii) $\D'(p,q) \leq \D'(r,s)$, we have $\D(p,q)
676: \leq (1+\eps)\D(r,s)$.
677:
678: Namely, $\D'$ $\eps$-approximates $\D$ for distances not
679: smaller than $l$.
680: \end{defn}
681:
682: \begin{defn}
For $l, \eps$, let $\nu(l, \eps ) = \min \pth{ 128
\cSample \frac{\log{n}}{l\eps^2}, 1 }$, where
$\cSample$ is an appropriate constant. Let $\RS( L, l,
\eps)$ be a random subset of $L$ generated by picking
independently each line of $L$ with probability
$\nu(l,\eps)$.

Let $\rho(l, \eps) = \nu(l, \eps) l$; when $\nu(l,\eps) < 1$,
we have $\rho(l,\eps) = 128 \cSample
\frac{\log{n}}{\eps^2}$. The value $\rho(l,\eps)$ is the
expected crossing distance in $\Arr(\RS(L, l, \eps))$
between two points $p, q \in P$ such that $\IDL(p,q) =
l$.
695: \deflab{def:sample}
696: \end{defn}
697:
698: \begin{lemma}
Let $L$ be a set of $n$ lines in the plane, $l$ a
positive integer, $0 < \eps \leq 1$, and let $R = \RS(L,
l, \eps)$ be a random subset of $L$.

For any two points $p,q$ of distance $\IDL(p,q) \geq l$
from each other we have
\[
\IDL(p,q) \leq \frac{\IDX{R}(p,q)}{\nu(l,\eps)(1-\eps/4)}
\leq (1+\eps)\IDL(p,q),
\]
with probability $\geq 1-n^{-\cSample}$.
710:
711: Furthermore, $\IDX{R}( \cdot, \cdot)$ is an
712: $(\eps,l)$-approximation to $\IDL(\cdot, \cdot)$ with
713: high probability.
714:
715: \lemlab{good:estimate}
716: \end{lemma}
717:
718:
719: \begin{proof}
Indeed, let $X_{p q} = \IDX{R}(p,q)$. If $\nu(l,\eps) = 1$, then
$R = L$ and there is nothing to prove, so assume $\nu(l,\eps) < 1$.
We have
\begin{eqnarray*}
\mu = E[ X_{p q} ] = \IDL(p,q)\cdot \nu( l, \eps )
= 128 \IDL(p,q) \cSample \frac{\log{n}}{l \eps^2}
\geq \frac{128 \cSample
\log{n}}{\eps^2},
\end{eqnarray*}
since $\IDL(p,q) \geq l$.
727:
By the Chernoff inequality \cite{mr-ra-95,mps-lpvaa-98}, we have that
\begin{eqnarray*}
\Pr \pbrc{ \cardin{X_{p q} - \mu} > \frac{\eps}{4}\mu} &\leq& 2
\pth{ \frac{e^{\eps/4}}{{\pth{1 + \frac{\eps}{4}}^{1+
\eps/4}}}}^\mu
= 2 \exp \pth{\mu \pth{ \frac{\eps}{4} -
\pth{1+\frac{\eps}{4}}\log \pth{ 1 + \frac{\eps}{4}}}} \\
&\leq& 2 \exp \pth{\mu \pth{
\frac{\eps}{4} - \pth{1+\frac{\eps}{4}}
\pth{ \frac{\eps}{4} - \frac{\eps^2}{32}}}}
= 2 \exp \pth{-\mu \frac{\eps^2}{32}\pth{1 - \frac{\eps}{4}}}\\
&\leq& 2
\exp \pth{-\mu \frac{\eps^2}{64}}
\leq
2\exp \pth{- \frac{128 \cSample
\log{n}}{\eps^2} \cdot \frac{\eps^2}{64}}
\leq 2 n^{-2\cSample} \leq n^{-\cSample},
\end{eqnarray*}
since $\log(1+x) \geq x - x^2/2$ for $0 \leq x \leq 1$, and
$1 - \eps/4 \geq 1/2$ for $\eps \leq 1$. In
particular, this implies that with high probability
$\mu(1-\eps/4) \leq X_{p q} \leq \mu
(1+\eps/4)$. Namely, with high probability we have
749: \begin{eqnarray*}
750: \IDL(p,q) &\leq& \frac{X_{p q}}{\nu(l, \eps )(1-\eps/4)}
751: \leq
752: \frac{\nu(l, \eps )(1+\eps/4)}{\nu(l, \eps )(1-\eps/4)} \IDL(p,q)
753: =
754: \frac{1+\eps/4}{1-\eps/4} \IDL(p,q) \\
755: &\leq &
756: (1+\eps)\IDL(p,q).
757: \end{eqnarray*}
758:
Consider now four points $p,q,r,s$ such that
$\IDL(p,q), \IDL(r,s) \geq l$ and $\IDX{R}(p,q) \leq
\IDX{R}(r,s)$. By the above discussion, we have with
high probability
\[
\IDL(p,q) \cdot \nu(l,\eps) (1-\eps/4) \leq \IDX{R}(p,q)
\leq \IDX{R}(r,s) \leq \IDL(r,s) \cdot \nu(l,\eps)(1+\eps/4)
\leq (1+\eps) \IDL(r,s) \cdot
\nu(l,\eps)(1-\eps/4),
\]
where the last inequality uses $1+\eps/4 \leq (1+\eps)(1-\eps/4)$,
which holds for $0 < \eps \leq 1$.
Namely, $\IDL(p,q) \leq (1+\eps)\IDL(r,s)$. Thus, by the union
bound over all pairs of points of $P$,
$\IDX{R}(\cdot, \cdot)$ is an $(\eps,l)$-approximation
to $\IDL(\cdot, \cdot)$ with probability $\geq 1 - {n
\choose 2} n^{-\cSample}$.
772: \end{proof}
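The estimator of \lemref{good:estimate} can be simulated directly, since only the $\IDL(p,q)$ lines separating $p$ and $q$ matter: each of them survives in $R$ independently with probability $\nu(l,\eps)$. The Python sketch below is illustrative only; it assumes $\cSample = 1$ (the constant is unspecified in the text), takes $\log$ to be the natural logarithm, and uses hypothetical function names.

```python
import math
import random


def nu(n, l, eps, c_samp=1.0):
    """Sampling probability nu(l, eps) = min(128 c_samp log n / (l eps^2), 1).
    c_samp = 1 is a placeholder for the unspecified constant."""
    return min(128 * c_samp * math.log(n) / (l * eps ** 2), 1.0)


def sampled_distance(d_L, prob, rng):
    """D_R(p, q): how many of the d_L lines separating p and q survive when
    each line is kept independently with probability `prob`."""
    return sum(1 for _ in range(d_L) if rng.random() < prob)


def estimate(d_R, prob, eps):
    """The estimator D_R / (nu (1 - eps/4)) from the lemma."""
    return d_R / (prob * (1 - eps / 4))


n, eps, l = 1000, 0.5, 20000
prob = nu(n, l, eps)          # about 0.177, so rho = prob * l is about 3537
rng = random.Random(0)
d_L = l                       # a pair exactly at the threshold distance
est = estimate(sampled_distance(d_L, prob, rng), prob, eps)
print(d_L <= est <= (1 + eps) * d_L)
```

With these parameters the expected sampled distance is about $3537$, and the window allowed by the lemma is many standard deviations wide, which matches the high-probability guarantee.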
773:
\lemref{good:estimate} and \lemref{approx:mst} suggest that
we compute the MST by computing an appropriate random sample
$R$ (using a threshold $l$), and deploy the algorithms of
\secref{cont:dijkstra} to compute the MST of $P$ in
$\Arr(R)$. Such an MST would be an approximate MST. There
are two main problems with this approach: (i) for short
distances (i.e., $l=1$), just starting the wavefront
propagation (i.e., \lemref{propagate}) is prohibitively
expensive (it roughly takes $O(\Wopt(P,L))$ time, which might
be $\Omega(n^{3/2})$), and (ii) for long distances (i.e., $\geq
i \cdot l$), the wavefront propagation again becomes
prohibitively expensive (i.e., $\Ot(ni)$) by
\lemref{propagate}.
787:
788: \begin{corollary}
   Let $U$ be the total weight of all the edges of $\T$
   having weight less than $\eps\Wopt(P,L)/(10n)$. Then $U
   \leq \eps\Wopt(P,L)/10$, since $\T$ has fewer than $n$ edges.
792: \corlab{idiotic}
793: \end{corollary}
794:
795: \lemref{rough} describes how we can approximate $\Wopt(P,L)$
796: to within a polylogarithmic factor using random sampling in
797: near linear time. Since the algorithm of this lemma is very
798: similar to the techniques used below, we defer its
description to the appendix. Equipped with such an
approximation $M$, we know by \corref{idiotic} that we do
801: not ``care'' about edges of the MST of length smaller than
802: $l_0 = O(\eps M/(n \polylog(n)))$. In particular, we can
803: generate a random sample $R_0$ which provides an
804: $(\eps,l_0)$-approximation to $\IDL(\cdot, \cdot)$. Thus, we
805: can approximate the MST by computing the MST of
806: $\Topt(P,R_0)$.
807:
808: This, however, does not address the second problem. Indeed,
809: computing the MST of $\Topt(P,R_0)$ might still be too
810: expensive, as the following lemma testifies.
811:
812:
813: \begin{lemma}
   Given a set $L$ of $n$ lines, a set $P$ of $n$ points,
   and parameters $l, i, \eps, U$ such that $l
   =\Omega\pth{ \Wopt(P,L)/(n U) }$, let $R =\RS(L, l,
   \eps)$ be a random sample of $L$. Then one can
   compute, in expected $\Ot (i U n)$ time, a minimum
   spanning forest of $P$ under the crossing metric
   $\IDX{R}$ that connects all the points of $P$ at
   distance at most $2i$ from each other.
822:
823: \lemlab{propagate:ext}
824: \end{lemma}
825:
826: \begin{proof}
   Let $X$ denote the size of $R$. Clearly, the expected
   value of $X$ is
   \[
      E[X] = n \nu(l, \eps) = 128 n \cSample
      \frac{\log{n}}{l\eps^2} = O \pth{ \frac{U n^2 \log
            n}{\eps^2\Wopt(P,L)}},
   \]
   by \defref{def:sample} and since $1/l = O(nU/\Wopt(P,L))$.
   Let $\gamma = \Topt(P,L)$, and let $Y = \weight(\gamma, R)$.
   Clearly,
   \[
      E[Y] = \nu(l, \eps)\weight(\gamma, L) = \nu(l,
      \eps)\Wopt(P,L) = O \pth{
         \frac{U n \log n}{\eps^2}}.
   \]
840: Namely, $E[ \Wopt(P,R) ] \leq E[ Y] = O \pth{ \frac{ U n
841: \log n}{\eps^2}}$. The running time bound now
842: follows immediately by applying the algorithm of
843: \lemref{propagate} to $P$ and $R$.
844: \end{proof}
845:
846: The algorithm of \lemref{propagate:ext} first performs
847: wavefront propagation for distances in $\Arr(R)$ which are
smaller than $\rho( l, \eps)$. For such distances, $\Arr(R)$
{\em does not provide} a reliable estimate (i.e., ordering) of
the crossing distances between points. However, once the
distances propagated exceed $\rho(l,\eps)$, we know by
\lemref{good:estimate} that the distances are
$(\eps,l)$-approximated correctly. The main point of
\lemref{propagate:ext} is that the algorithm has near linear
running time for small values of $U$ and $i$.
857:
Combining \lemref{propagate:ext} with \corref{idiotic}, we
can compute a spanning forest for the ``short'' edges of
$\Topt(P,L)$ in near linear time.
861:
862: \begin{lemma}
   Given a set $P$ of $n$ points in the plane and a set
   $L$ of $n$ lines in the plane, one can compute a
   spanning forest $F$ of $P$ such that the weight of $F$
   is $\leq \eps\Wopt(P,L)/10$. Furthermore, for some
   threshold $l = \Omega( \Wopt(P,L)\eps/(n \log^3 n) )$,
   every pair of points of $P$ at distance at most $l$
   belongs to the same connected component of $F$. The
   running time of this algorithm is $\Ot \pth{ n }$.
871:
872: \lemlab{start:forest}
873: \end{lemma}
874: \begin{proof}
   Using the algorithm of \lemref{rough}, compute in
   $\Ot(n)$ time a number $M$ such that $\Wopt(P,L) \leq M
   = O(n \alpha(n) \log^2{n} + \Wopt(P,L) \alpha(n) \log
   n)$. Now, let
879: \begin{equation}
880: \lshort = \frac{\eps M}{\cAnother n\log^3{n}} \leq
881: \frac{\eps}{40n}\Wopt(P,L),
882: \eqlab{specify:l}
883: \end{equation}
884: for $\cAnother$ large enough. On the other hand,
885: $\lshort = \Omega( \Wopt(P,L)/ ( U n) )$, where $U =
886: O((\log^3{n})/\eps)$.
887:
   We now compute a spanning forest for $P$ using
   \lemref{propagate:ext} with $\lshort$ and $U$ as specified and
   $i= 2\rho(\lshort,\eps)$. The running time of this algorithm
   is
892: \[
893: \Ot \pth{i U n} = \Ot \pth{ \rho(\lshort,\eps) n } = \Ot \pth{
894: \frac{\log{n}}{\eps^2} \cdot n } = \Ot\pth{n}.
895: \]
896:
   Clearly, $F$ has fewer than $n$ edges, and, by
   \lemref{good:estimate}, with high probability all the
   points of $P$ at distance $\leq \lshort$ from each other
   are in the same connected component of $F$.

   Furthermore, for any edge $p q$ of $F$, we have that
   with high probability
   $\IDL(p,q) \leq 2(1+\eps)\lshort \leq 4\lshort$ by
   \lemref{good:estimate} (as $\eps \leq 1$). In particular,
   $\weight(F, L) \leq 4n\lshort \leq (\eps/10)\Wopt(P,L)$,
   by \eqref{specify:l}.
906: \end{proof}
907:
908: \lemref{start:forest} implies that we can compute a cheap
909: spanning forest of $P$ in near linear time that ``captures''
910: all the light edges of the MST. Next, we can compute the
911: rest of the edges of the MST using \lemref{propagate:ext}
912: repeatedly.
913: \begin{lemma}
   Given a set $P$ of $n$ points in the plane, a set
   $L$ of $n$ lines in the plane, a parameter $\eps>0$, and
   a spanning forest $F$ of $P$ such that every pair of
   points of $P$ at distance $\leq l$ belongs to the same
   connected component of $F$, where $l = \Omega(
   \Wopt(P,L)\eps/(n \log^3 n) )$, one can compute a
   spanning forest $F'$ of $P$ such that all the points of
   $P$ at distance $\leq 2l$ from each other belong to the
   same connected component of $F'$. The forest $F'$ can be
   computed in $\Ot \pth{ n}$ expected time.
924:
925: \lemlab{round:forest}
926: \end{lemma}
927:
928: \begin{proof}
   We use the algorithm of \lemref{start:forest}, with
   the modification that when calling the algorithm of
   \lemref{propagate:ext}, we pass on $F$, so that the
   algorithm ignores generated edges whose endpoints belong
   to the same connected component of $F$. Again, only
   edges of length between $l$ and $2 (1+\eps)l$ are
   added to the spanning forest. The exact details of how
   to specify $U$ and $i$ are similar to
   \lemref{start:forest}, and are omitted.
938: \end{proof}
939:
Our algorithm for computing the MST starts by applying
\lemref{start:forest}. This results in a spanning forest
$F_0$ of the points of $P$, and a value $\lshort$ as
specified by \eqref{specify:l}. We now apply
\lemref{round:forest} repeatedly, $O(\log{n})$ times, where
the $i$-th iteration handles distances between
$2^{i-1}\lshort$ and $2 \cdot 2^i \lshort (1+\eps)$, for
$i=1, \ldots, O(\log{n})$, until all distances $\leq n$ are
handled (the crossing distance never exceeds $|L| = n$).
Namely, in the $i$-th iteration, we compute a spanning
forest $F_i$ of all points at distance $\leq 2^i\lshort$
from each other by applying \lemref{round:forest} with
$F_{i-1}$ as our ``starting'' spanning forest.
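The round structure can be illustrated with a minimal Python
sketch. It is not the algorithm above: brute-force enumeration
of all pairs with exact crossing distances stands in for the
sampled arrangements and the wavefront propagation of
\lemref{propagate:ext}, so the sketch shows only how each round
extends the forest of the previous one (pairs already connected
are ignored) and says nothing about running time. The names
\texttt{doubling\_rounds\_forest} and \texttt{DisjointSet}, and the
encoding of a line as a triple $(a,b,c)$ for $ax+by=c$, are
hypothetical.

```python
from itertools import combinations


class DisjointSet:
    """Union-find with path halving; tracks connected components of the forest."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False  # already in the same component: the edge is ignored
        self.parent[rx] = ry
        return True


def crossing_distance(p, q, lines):
    """Number of lines (a, b, c), i.e. a*x + b*y = c, separating p and q."""
    def side(pt, ln):
        a, b, c = ln
        return a * pt[0] + b * pt[1] > c
    return sum(side(p, ln) != side(q, ln) for ln in lines)


def doubling_rounds_forest(points, lines, l_short):
    """Grow a spanning forest in rounds; each round handles distances in (lo, hi], hi doubling."""
    n = len(points)
    dist = {(i, j): crossing_distance(points[i], points[j], lines)
            for i, j in combinations(range(n), 2)}
    dsu = DisjointSet(n)
    forest = []
    lo, hi = -1, max(1, l_short)  # first round: all distances <= l_short, including 0
    while True:
        # Kruskal pass restricted to this round's distance window.
        window = sorted((d, e) for e, d in dist.items() if lo < d <= hi)
        for d, (i, j) in window:
            if dsu.union(i, j):  # skip pairs connected by earlier rounds
                forest.append((i, j, d))
        if hi >= len(lines):  # crossing distances never exceed |L|
            break
        lo, hi = hi, 2 * hi
    return forest
```

Because every round processes its window in sorted order, the
rounds together reproduce Kruskal's algorithm, so with exact
distances the output is an exact MST; the approximation in the
text enters only through the sampled distance estimates.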
952:
Since there are $O(\log n)$ iterations, each taking $\Ot(n)$
expected time, the expected running time of the resulting
algorithm is $\Ot \pth{ n }$. What is less clear is that
the resulting tree is indeed an $\eps$-approximate MST.
956:
957: \begin{lemma}
958: With high probability, the tree $T$ computed by the above
959: algorithm is an $\eps$-MST of $P$ in $\Arr(L)$.
960: \end{lemma}
961:
962: \begin{proof}
963: All the edges generated by the algorithm of
964: \lemref{start:forest}, in the first stage of the
965: algorithm, have total weight $\leq (\eps/20) \Wopt(P,L)$
966: with high probability.
967:
   Let $\Topt(P,L)$ be the optimal spanning tree. If $T$ is
   not an $\eps$-approximate MST, then $\weight_{\IDL}(T)>
   (1+\eps)\weight_{\IDL}(\Topt)$. In particular, there
   must be an edge of $\Topt$ whose insertion into $T$
   would result in a substantially lighter spanning tree.
   Formally, for an edge $e$, let $T(e)$ be the tree
   resulting from $T$ by inserting $e$ into $T$ and
   removing from $T$ the heaviest (according to $\IDL$)
   edge on the newly created cycle, and let
   $\out(T,e)$ denote this ``ejected'' edge.
978:
979: Arguing as in the proof of \lemref{approx:mst}, it must
980: be that there exists an edge $\phi=p q$ of $\Topt$ such that
981: \[
982: (1+\eps)\IDL(\phi) < \IDL( \out(T,\phi) ),
983: \]
984: and $\IDL(\phi) > \Wopt(P,L)/(20n)$.
985:
986: Let $i$ be the index such that $2^{i-1} \lshort \leq
987: \IDL(\phi) \leq 2^i \lshort$. With high probability, we
988: know that after the $i$-th iteration $p$ and $q$ are in
989: the same connected component of $F_i$. Assume that $p$
990: and $q$ were not in the same connected component of
991: $F_{i-1}$ (the other case is easier and as such is
992: omitted).
993:
   Let $T''$ be the spanning forest maintained by the
   algorithm just after $p$ and $q$ became part of the
   same connected component. With high probability, for
   any edge $e''$ of $T''$, we have $\IDL(e'') \leq
   (1+\eps)\IDL(\phi)$, since the random sample $R_i$ we
   used in the $i$-th iteration is a $(2^{i-1}
   \lshort,\eps)$-approximation to $\IDL$.
1001:
   But then the algorithm could not have added
   $\out(T,\phi)$ to the spanning tree $T''$, as all the
   edges on the cycle in $T'' \cup \brc{\phi}$ are lighter
   than $(1+\eps)\IDL( \phi)$. This is a contradiction.
1006: \end{proof}
1007:
1008: We summarize our result:
1009: \begin{theorem}
   Given a set $P$ of $n$ points in the plane, a set $L$ of
   $n$ lines, and a parameter $\eps > 0$, one can
1012: compute a spanning tree $T$ of $P$, in $\Ot \pth{ n }$
1013: expected time, such that $\weight(T, L) \leq
1014: (1+\eps)\Wopt(P,L)$. The result is correct with high
1015: probability.
1016: \end{theorem}
1017:
1018:
1019:
1020:
1021:
1022:
1023:
1024:
1025:
1026:
1027:
1028:
1029:
1030:
1031: %----------------------------------------------------------------
1032: %----------------------------------------------------------------
1033:
1034: \section{Approximation Algorithms for the Intersection
1035: Metric via Embeddings}
1036:
1037: \seclab{embed}
1038:
1039: Let $P=\brc{p_1, \ldots, p_n}$ be a given set of $n$ points,
1040: and $L = \brc{l_1, \ldots, l_m}$ be a set of $m$ lines,
1041: where $m= n^{O(1)}$. As mentioned earlier, the metric
$\IDL$ is computationally cumbersome. One possible way to
overcome this problem is to embed this metric into a more
convenient metric (while introducing a small distortion).
1046:
In this section, we show a somewhat weaker result: we show
how to embed the points of $P$ into an
$O(\log^\EmbedDim n)$-dimensional space in
$\Ot(n+m+n^{2/3}m^{2/3})$ time, so that a specific distance
gap in the crossing metric is mapped to a corresponding gap
in the target space.
1052:
We first observe that the crossing distance between two
points $p$ and $q$ can be computed by interpreting this
distance as a Hamming distance on the hypercube in $m$
dimensions induced by the lines. Namely, each line $l$
contributes a coordinate: a point gets a `1' in this
coordinate if it is on one side of $l$, and a `0' if it is
on the other side of $l$. Formally, assume (by translating
the plane if necessary) that no line of $L$ passes through
the origin, let $l^+$ denote the open half-plane defined by
a line $l$ that contains the origin, and let $l^-$ denote
the other open half-plane. For a point $p \in \Re^2$, let
$\vL(p) = (b_1, \ldots, b_m)$ be the $m$-bit vector such
that $b_i=1$ {\bf iff} $p \in l_i^+$. It is easy to verify
that $\IDL(p,q) = d_H(\vL(p), \vL(q))$, where $d_H$ is the
Hamming distance.
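The identity $\IDL(p,q) = d_H(\vL(p),\vL(q))$ can be checked
mechanically. The Python sketch below assumes (our convention,
not the paper's) that each line is a triple $(a,b,c)$
describing $ax+by=c$ with $c \neq 0$, so that no line passes
through the origin and $l^+$ is well defined, and that no point
lies on a line; \texttt{sign\_vector} computes $\vL(p)$ and
\texttt{crossing\_distance} counts separating lines directly.

```python
def sign_vector(p, lines):
    """Return v_L(p): bit i is 1 iff p lies in the open half-plane of line i containing the origin.

    Each line is a hypothetical triple (a, b, c) meaning a*x + b*y = c, with c != 0,
    so no line passes through the origin; points are assumed to lie on no line.
    """
    bits = []
    for a, b, c in lines:
        value = a * p[0] + b * p[1] - c  # its sign tells which side of the line p is on
        origin_value = -c                # the same expression evaluated at the origin
        bits.append(1 if (value > 0) == (origin_value > 0) else 0)
    return bits


def hamming(u, v):
    """Number of coordinates in which u and v differ."""
    return sum(x != y for x, y in zip(u, v))


def crossing_distance(p, q, lines):
    """D_L(p, q): number of lines strictly separating p and q."""
    def value(pt, line):
        a, b, c = line
        return a * pt[0] + b * pt[1] - c
    return sum((value(p, ln) > 0) != (value(q, ln) > 0) for ln in lines)
```

A line contributes a differing bit exactly when $p$ and $q$
lie on opposite sides of it, which is the whole content of the
identity.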
1066:
1067: \remove{
1068: On this mapped set, we can now deploy several approximation
1069: algorithms for points in high-dimension. However, all those
1070: algorithms first need to read all their input, which
1071: requires $\Omega(nm)$ time. A standard technique to reduce
1072: the dimension of the input (and thus its size), while
1073: preserving distances between points, is to use dimension
1074: reduction techniques \cite{jl-elmih-84,im-anntr-98}. We
1075: next show how one performs a (somewhat restricted) dimension
1076: reduction in an implicit way, by using the underlining
1077: geometry in $o(m n)$ time.
1078: }
1079:
1080: \begin{defn}
   For $R \subseteq L$, let $f_R:\Re^2 \rightarrow \ZZ$ be
   the mapping that maps a point $p$ in the plane to its
   face ID in the arrangement $\Arr(R)$. Formally, we
   assign to each face in the arrangement $\Arr(R)$ a
   unique integer (say, an integer between $1$ and
   $O(|R|^2)$). The mapping $f_R$ maps a point $p$ in the
   plane to the integer identifying the face that contains
   $p$. (Note that this does not uniquely define
   $f_R(\cdot)$, as we did not specify how we assign the IDs
   to the faces.)
1091:
1092: For a set $\R = (R_1, \ldots, R_\mu)$ of subsets of $L$,
1093: let $f_\R:\Re^2 \rightarrow \ZZ^\mu$ be the mapping
1094: $f_\R(p) = ( f_{R_1}(p), f_{R_2}(p), \ldots,
   f_{R_\mu}(p))$. For two points $p,q \in \Re^2$, let
   $d_H(f_\R(p),f_\R(q))$ be the Hamming distance between
   $f_\R(p)$ and $f_\R(q)$; namely, this is the number of
   coordinates in which the two vectors $f_\R(p)$ and
   $f_\R(q)$ disagree.
1100:
1101: One can view $f_\R$ as an embedding of the crossing
1102: metric $\IDL$ to the Hamming space $\ZZ^\mu$.
1103: \end{defn}
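A minimal Python sketch of $f_\R$ and of $d_H(f_\R(p),
f_\R(q))$, assuming lines given as hypothetical triples
$(a,b,c)$ for $ax+by=c$ and points that lie on no line. Instead
of integers between $1$ and $O(|R|^2)$, it labels a face of
$\Arr(R)$ by the sign vector of its points with respect to $R$.
As noted in the definition, any injective assignment of IDs to
faces will do, and sign vectors are injective because each face
of a line arrangement is exactly the set of points sharing one
sign vector. This labeling is for illustration only and is not
an efficient way to compute $f_R$.

```python
def face_id(p, sample):
    """Stand-in for f_R(p): the sign vector of p with respect to every line of R.

    Lines are hypothetical triples (a, b, c) meaning a*x + b*y = c.  A face of a
    line arrangement is exactly the set of points (off the lines) sharing one sign
    vector, so this labeling is injective on faces, which is all f_R needs.
    """
    return tuple(a * p[0] + b * p[1] > c for a, b, c in sample)


def embed(p, samples):
    """f_R(p) for a sequence of subsets (R_1, ..., R_mu): one face ID per subset."""
    return tuple(face_id(p, R) for R in samples)


def hamming(u, v):
    """d_H: number of coordinates where the two embedded vectors disagree."""
    return sum(x != y for x, y in zip(u, v))
```

For example, if only the first subset contains a line
separating $p$ and $q$, the two images differ in exactly one
coordinate.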
1104:
1105: \begin{lemma}
   Given a set $P$ of $n$ points in the plane, a set $L$ of
   $m$ lines in the plane, a parameter $\eps > 0$, and a
   parameter $r$, one can compute a set $\R$ of $\mu$
   subsets of $L$ such that for the embedding $f_\R:\Re^2
   \rightarrow \ZZ^\mu$, we have that, with high
   probability, for any $p,q \in P$:
   \begin{itemize}
       \item If $\IDL(p,q) \leq r$, then $d_H(f_\R(p), f_\R(q))
       \leq M$.
       \item If $\IDL(p,q) \geq (1+\eps)r$, then $d_H(f_\R(p),
       f_\R(q)) \geq (1+\eps)(1-a/\log{n})M$.
   \end{itemize}
   Here $a$ is an appropriate constant, $M$ is a threshold
   independent of $p$ and $q$, and $\mu =O(\log^4 n)$.
1120:
1121: \lemlab{good:embed}
1122: \end{lemma}
1123: \remove{
1124: In the following, we restrict ourselves to the case where
1125: only distances in a certain range are approximately
1126: preserved by the embedding. Namely, for a prescribed
1127: parameters $r > 0$, $\eps > 0$ we describe a mapping
1128: $f(\cdot)$ so that if a pair of points $p,q$ is in distance
1129: $\leq r$, then it is mapped (with high probability) into a
1130: pair $f(p),f(q)$ having distance $\leq M$, and if $p,q \geq
1131: (1+\eps)$, then the pair $f(p),f(q)$ are in distance
1132: $\geq(1+\eps')M$, where $M$ is an appropriate constant, and
1133: $\eps, \eps'$ are of the same up to the factor of
1134: $(1+O(1)/\log n)$.
1135:
1136: In this way approximate nearest neighbor
1137: in the original space with error $(1+\eps)$ is be reduced
1138: the $(1+\eps')$-approximate nearest neighbor in the
1139: resulting Hamming space. For the purpose of using the
1140: nearest neighbor algorithms of~\cite{im-anntr-98} this
1141: ``threshold embedding'' is sufficient,
1142: see~\cite{im-anntr-98} for details.
1143: }
1144:
1145: \begin{proof}
   For the sake of simplicity of exposition, we assume that
   $m / r \geq \log{n}$, where $m=|L|$. If this does not
   hold, we can add ``fictitious'' lines to $L$ that have
   all the points of $P$ on one side of them. If such a
   line is picked into one of the sets of $\R$, we ignore
   it when computing the face IDs.
1152:
   For a parameter $\alpha$ to be specified shortly, let
   $k= \alpha m/r$, let $R$ be a sample of $k$ lines out of
   $L$ (performed with replacement), and let $p,q$ be two
   points of $P$. Let $\rho = \IDL(p,q)/m$ be the fraction
   of the lines of $L$ separating $p$ and $q$. The
   probability that $p,q$ will be in two different faces of
   $\Arr(R)$ is
   \[
      U(\rho) = 1 - (1-\rho)^k,
   \]
   as this is the probability that at least one of the $k$
   sampled lines crosses the segment connecting $p$ and $q$.
1164:
   Our goal is to approximate the value of $U(\rho)$, so
   that we can decide whether $p,q$ are close or far.
   Indeed, since $U(\cdot)$ is increasing, if $U(\rho) \geq
   U( (1+\eps)r/m )$ then $\IDL(p,q) \geq (1+\eps)r$, and if
   $U(\rho) \leq U( r/m )$ then $\IDL(p,q) \leq r$.
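   The formula for $U(\rho)$ can be sanity-checked by
   simulation. In the Python sketch below (an illustration with
   hypothetical names), the $m$ lines are identified with
   $0,\ldots,m-1$, of which the first $d = \IDL(p,q)$ are
   assumed to be the ones crossing the segment $pq$; we draw $k$
   lines with replacement and record whether a crossing line was
   drawn. The empirical frequency should approach
   $1-(1-d/m)^k$.

```python
import random


def separation_probability(rho, k):
    """U(rho) = 1 - (1 - rho)^k for k lines drawn with replacement, rho = D_L(p, q) / m."""
    return 1.0 - (1.0 - rho) ** k


def empirical_separation(d, m, k, trials, rng):
    """Monte Carlo estimate; lines 0..d-1 are the (hypothetically labeled) crossing lines."""
    hits = 0
    for _ in range(trials):
        # p and q are separated iff at least one of the k draws is a crossing line
        if any(rng.randrange(m) < d for _ in range(k)):
            hits += 1
    return hits / trials
```

   The monotonicity of $U$ in $\rho$, which the threshold test
   above relies on, is also visible directly from the formula.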
1170:
1171: To do so, we generate a set of subsets $\R = (R_1,
1172: \ldots, R_{\mu})$, by random sampling as described
1173: above, where $\mu$ would be specified shortly. Now we
1174: consider the quality of the distance approximation
1175: provided by the embedding\footnote{A similar analysis
1176: (in the context of Hamming spaces) appeared already
1177: in~\cite{i-drtpp-00}; in our case, however, we have
1178: to put more care into the analysis, since we want
1179: $\eps$ and $\eps'$ to be very close.}. Let
   $X(p,q)$ denote the random variable which is the number
   of arrangements of $\Arr(R_1), \ldots, \Arr(R_\mu)$ that
   have $p,q$ in different faces. Note that $X(p,q)$ is
   equal to the Hamming distance between $f_\R(p)$ and $f_\R(q)$,
   and is thus the distance between the images of $p$ and
   $q$ in the new space. Clearly, as $\mu$ tends to
   infinity, $X(p,q)/\mu$ tends to $U(\rho)$. Using the
   Chernoff inequality, we can quantify the quality of
   approximation provided by $\mu$. Specifically, let
   $\plow=U(r/m)$ and $\phigh=U((1+\eps)r/m)$; in the
   following we will make sure that $\phigh<1/2$. Then,
   from the Chernoff bound~\cite{mr-ra-95,mps-lpvaa-98} it
   follows that for any $\alpha>0$, if $\mu=C \frac{\log
      n}{\plow \alpha^2}$ for a sufficiently large constant
   $C$, then with high probability:
1195: \begin{itemize}
1196: \item if $\IDL(p,q) \le r$ then $X(p,q)/\mu \le
1197: \plow(1+\alpha)$
1198:
1199: \item if $\IDL(p,q) \ge r(1+\eps)$ then $X(p,q)/\mu
1200: \ge \phigh(1-\alpha)$
1201: \end{itemize}
1202: Therefore, the mapping $f_\R$ converts the distance gap
1203: $r:(1+\eps)r$ into the gap $\plow(1+\alpha)\mu :
   \phigh(1-\alpha)\mu$. We next fine-tune $k$ (the size
1205: of each sample) so that the resulting gap will be as
1206: large as possible. (Intuitively, the larger the target
1207: gap is, the easier it is to detect it in later stages.)
1208: Therefore, in the following we focus on finding $k$ such
1209: that the ratio
1210: \[
1211: \Delta = \frac{\phigh(1-\alpha)\mu}{\plow(1+\alpha)\mu}
1212: \]
1213: is as large as possible. To this end, we observe that
1214: \begin{eqnarray*}
1215: \plow &=& U\pth{\frac{r}{m}}=1-\pth{1-\frac{r}{m}}^k
1216: \leq 1 - e^{-r k/m}\pth{ 1- \frac{(r k/m)^2}{k}}
1217: = 1 - e^{-\alpha}\pth{ 1- \frac{\alpha^2}{k}} \\
1218: &\leq&
1219: 1 - e^{-\alpha}\pth{ 1 - \alpha^2}
1220: \leq \alpha^2 + (1 - e^{-\alpha}) \pth{ 1 -
1221: \alpha^2}
1222: \leq \alpha^2 + \alpha \pth{ 1 -
1223: \alpha^2}
1224: \leq \alpha(1+\alpha)
1225: \end{eqnarray*}
1226: since $\displaystyle \pth{ 1 - \frac{t}{n}}^{n} \geq
1227: e^{-t} \pth{ 1 - \frac{t^2}{n}}$~\cite{mr-ra-95},
1228: $k=\frac{\alpha m}{r}$, and $x \geq 1 -e^{-x}$.
1229: Furthermore,
1230: \begin{eqnarray*}
1231: \phigh &=& U((1+\eps)r/m) =1-(1-(1+\eps)r/m)^k
1232: \geq 1-e^{-(1+\eps)r k/m}
1233: = 1-e^{-(1+\eps)\alpha}\\
1234: & \geq & (1+\eps)\alpha - ((1+\eps)\alpha)^2
1235: \geq (1+\eps)\alpha(1 - (1+\eps)\alpha)
1236: \end{eqnarray*}
1237: since $(1-t/n)^{n} \leq e^{-t}$~\cite{mr-ra-95} and
1238: $1-e^{-x} \ge x-x^2/2 \geq x - x^2$.
1239:
1240: Therefore
1241: \begin{eqnarray*}
1242: \frac{\phigh}{\plow} &\geq&
1243: \frac{(1+\eps)\alpha(1 -
1244: (1+\eps)\alpha)}{\alpha(1+\alpha)}
1245: \geq (1+\eps)(1 - (1+\eps)\alpha)(1-\alpha)
1246: \geq (1+\eps)(1 - (2+\eps)\alpha),
1247: \end{eqnarray*}
1248: since $1/(1+x) \geq (1-x)$. Thus, if we set $\alpha$ to
1249: be $1/\log n$, then the distance gap becomes (at least)
1250: \[
1251: \Delta = \frac{\phigh(1-\alpha)\mu}{\plow(1+\alpha)\mu} \geq
1252: (1+\eps)(1-(2+\eps)\alpha)(1-\alpha)^2 \geq
1253: (1+\eps)\pth{1 - \frac{a}{\log{n}}},
1254: \]
   where $a$ is an appropriate constant. Also, note that
   the resulting value of $\plow$ is
   \[
      \plow =
      1-(1-r/m)^k
      \geq 1 - e^{-r k/m} = 1 -e^{-\alpha} \geq \alpha - \alpha^2/2 =
      \Omega(1/\log n)
   \]
   and $\mu=(C\log{n})/(\plow\alpha^2) = O(\log^2{n}/\alpha^2) =
   O(\log^4 n)$. Finally, since $m/r \geq \log{n}$, we
   have that $k= \alpha(m/r) = (1/\log{n}) (m/r) \geq 1$
   (i.e., the sample size $k$ is at least $1$).
1267: \end{proof}
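The parameter choice in the proof above can also be checked
numerically. The Python sketch below (names are ours) sets
$\alpha = 1/\log n$ with the natural logarithm, keeps $k =
\alpha m/r$ real-valued as in the derivation, evaluates $\plow$
and $\phigh$ exactly, and tests $\phigh/\plow \geq
(1+\eps)(1-(2+\eps)\alpha)$. It also tests $\phigh/\plow \leq
1+\eps$, an observation of ours rather than a claim of the text:
it holds because $x \mapsto 1-(1-x)^k$ is concave for $k \geq 1$
and vanishes at $0$, so the embedding can at best preserve the
$(1+\eps)$ gap, never enlarge it.

```python
import math


def gap_parameters(m, r, eps, n):
    """Return (alpha, k, z, Z) for alpha = 1/log n and k = alpha * m / r."""
    alpha = 1.0 / math.log(n)  # natural logarithm assumed
    k = alpha * m / r          # kept real-valued, as in the derivation
    z = 1.0 - (1.0 - r / m) ** k                # z = U(r/m)
    Z = 1.0 - (1.0 - (1.0 + eps) * r / m) ** k  # Z = U((1+eps) r/m)
    return alpha, k, z, Z


def check_gap(m, r, eps, n):
    """Check (1+eps)(1-(2+eps)alpha) <= Z/z <= 1+eps for the given parameters."""
    alpha, k, z, Z = gap_parameters(m, r, eps, n)
    lower = (1.0 + eps) * (1.0 - (2.0 + eps) * alpha)
    return lower <= Z / z <= 1.0 + eps
```

For instance, with $m = 10^6$, $r = 10^3$, $\eps = 1/2$, and
$n = 10^4$, the ratio $\phigh/\plow$ is roughly $1.46$, inside
the guaranteed range.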
1268:
1269: \begin{lemma}
   Given a set $P$ of $n$ points and a set $L$ of $m$
   lines, one can compute the function $f_\R(\cdot)$ of
   \lemref{good:embed} for all the points of $P$ in
   $\Ot(m^{2/3}n^{2/3} + m + n)$ expected time.
1274: \end{lemma}
1275:
1276: \begin{proof}
   We have to compute, for each point of $P$, the face that
   contains it in each of the arrangements $\Arr(R_1),
   \ldots, \Arr(R_\mu)$, where $\mu = O( \log^4 n )$;
   equivalently, we compute all the faces of $\Arr(R_1),
   \ldots, \Arr(R_\mu)$ that contain points of $P$. Since
   $|R_i| \leq m$, for a single arrangement $\Arr(R_i)$ this
   can be done in $O(m^{2/3}n^{2/3} \log^{2/3}(m/\sqrt{n}) +
   (m + n)\log{m})$ expected time \cite{ams-cmfal-98}. Since
   there are $\mu$ coordinates (i.e., arrangements), the
   result follows.
1287: \end{proof}
1288:
Thus, we showed how to embed $\IDL$ into the
$\mu$-dimensional Hamming space $\Sigma^{\mu}$ in
$\Ot(n+m+n^{2/3}m^{2/3})$ time, mapping a $(1+\eps)$ gap
between close and far points into a gap of size
$(1+\eps)(1-O(1)/\log{n})$, where $\mu = O(\log^4 n)$ and
$\Sigma \subseteq \ZZ$ is the set of face labels we use
(i.e., $|\Sigma| = O(m^2)$). By using standard embedding
techniques (e.g., see~\cite{kor-esann-00}), we can embed the
Hamming space $\Sigma^{\mu}$ into $\{0,1\}^D$ with $D=O(\mu
\log |\Sigma| \log^2 n) = O(\log^{6}{n} \log{m})$,
preserving the gap up to another factor $(1-O(1)/\log n)$.
This gives an embedding of $\IDL$ into the $D$-dimensional
binary Hamming cube, with error $(1-O(1)/\log n)$; for $m =
n^{O(1)}$ we have $\log m = O(\log n)$, and thus $D =
O(\log^\EmbedDim n)$. Thus it is sufficient for us to
maintain a $c$-approximate nearest neighbor in $\{0,1\}^D$,
where $c=(1+\eps) (1-O(1)/\log n)$, which takes
$\Ot(n^{1/c})=\Ot(n^{1/(1+\eps/2)})$ time per operation
\cite{im-anntr-98}, as $c \geq 1+\eps/2$ for $n$ large
enough.
1306:
1307: We conclude:
1308: \begin{theorem}
1309: By performing a $\Ot(n+m+n^{2/3}m^{2/3})$-time
1310: preprocessing, one can reduce the problem of maintaining
1311: dynamic $(1+\eps)$-approximate nearest neighbor for any
1312: $n$-point crossing metric over $m$ lines, to the problem
1313: of maintaining dynamic $(1+\eps)(1-O(1)/\log
1314: n)$-approximate nearest neighbor in Hamming space with
1315: $O(\log^\EmbedDim n)$ dimensions (assuming $m=
1316: n^{O(1)}$). The latter can be solved in
1317: $\Ot(n^{1/(1+\eps/2)})$ time per operation.
1318: \end{theorem}
1319:
1344:
1345:
1346:
1347: \subsection{Embedding of the Crossing Metric over $\Re^d$}
1348:
In this section, we extend the methods from the previous
section to the crossing metric defined by
$(d-1)$-dimensional hyperplanes in $\Re^d$, for any fixed $d
\ge 2$. To this end, it is sufficient to design an
efficient procedure which, given a set of $n$ points
$P=\brc{p_1, \ldots, p_n}$ and a set of $m$ hyperplanes
$\HX = \brc{H_1, \ldots, H_m}$, assigns a symbol $\sigma_i
\in \Sigma \subset \ZZ$ to each $p_i$ in such a way that
$\sigma_i \neq \sigma_j$ iff there exists $H_k$ which
separates $p_i$ from $p_j$. Unfortunately, the idea from the
previous section does not give a subquadratic time algorithm
for $d>2$, since even in $d=3$ the complexity of $n$ cells
in an arrangement formed by $n$ planes could be
$\Omega(n^2)$. Fortunately, for our purpose, we do not need
to compute the actual cells containing the points $p_i$.
Rather, it is sufficient to find the {\em labels} of those
cells, or more specifically, a function $h: P \to \Sigma$
such that $h(p)=h(q)$ iff $p$ and $q$ belong to the same
arrangement cell.
1367:
Abusing notation, we denote by $H_k(p)$ the function
returning $1$ if $p$ lies on a fixed (positive) side of $H_k$
and $0$ otherwise. We use the following hashing function
\[
   h(x)= \sum_i a_i H_i(x),
\]
where $a_1, \ldots, a_m$ are independent and identically
distributed random variables with uniform distribution over
$\brc{0, \ldots ,n^c}$, where $c$ is a constant to be
specified shortly. Note that if $p,q \in \Re^d$ lie in two
different full-dimensional faces of $\Arr(\HX)$, then, as
noted above, there must be a hyperplane $H_k \in \HX$ such
that $H_k(p) \neq H_k(q)$; say $H_k(p) = 1$ and $H_k(q) = 0$.
That is, $h(p) = h'(p) + a_k$ and $h(q) = h'(q)$, where
$h'(x) = \sum_{i\neq k} a_i H_i(x)$. It follows that
$h(p)=h(q)$ only if $a_k = h'(q) - h'(p)$. Since $a_k$ is
independent of $h'(\cdot)$, the probability of this happening
is at most $1/(n^c+1) < 1/n^c$. We conclude that the
probability of two points belonging to two different faces
being mapped to the same value by $h(\cdot)$ is at most
$1/n^c$. Thus, since we have $O(n^2)$ pairs of points to
consider in our algorithm, it follows by the union bound that
the probability of the hashing failing is $O(n^{2-c})$, which
can be made arbitrarily small by picking $c$ to
be large enough.
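The hashing scheme can be sketched in a few lines of Python.
In this illustration (not an efficient implementation), each
hyperplane is a hypothetical pair $(a, c)$ describing $\langle
a, x\rangle = c$, the positive side is $\langle a, x\rangle > c$,
the exponent $c$ of the text is the parameter
\texttt{c\_exp}, and the weights are drawn with Python's
\texttt{random.Random.randint}, which samples uniformly from the
closed range $\brc{0,\ldots,n^c}$. The sums are computed by
brute force; the intersection-searching data-structure
discussed next is what replaces this step.

```python
import random


def halfspace_bits(p, hyperplanes):
    """H_i(p) for every hyperplane (a, c), i.e. <a, x> = c: 1 iff <a, p> > c."""
    return [1 if sum(ai * xi for ai, xi in zip(a, p)) > c else 0
            for a, c in hyperplanes]


def cell_labels(points, hyperplanes, c_exp, rng):
    """Label each point by h(p) = sum_i a_i H_i(p), with a_i uniform on {0, ..., n^c_exp}."""
    n = len(points)
    weights = [rng.randint(0, n ** c_exp) for _ in hyperplanes]  # randint is inclusive
    labels = []
    for p in points:
        bits = halfspace_bits(p, hyperplanes)
        labels.append(sum(w * b for w, b in zip(weights, bits)))
    return labels
```

Points in the same cell always receive equal labels; points in
different cells collide only with the small probability bounded
above.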
1392:
Namely, we associate a weight $a_i$ with each half-space
induced by a hyperplane $H_i$. For each point $p_j$, we
compute the total weight of all the half-spaces that contain
it, and all the points having the same total weight are
associated with the same label. Computing the weight of a
point $p_j$ falls into the class of problems known as
intersection-searching \cite{a-rs-97}. In particular, for
$n$ hyperplanes and a storage parameter $s$ with $n \leq s
\leq n^d$, one can construct a data-structure in
$O(s^{1+\delta})$ time, so that one can answer
intersection-searching queries in $O( (n/s^{1/d})
\log^{d+1} n )$ time, where $\delta >0$ is an arbitrarily
small constant. As the algorithm needs to perform $n$ such
queries, we set $s= n^{2d/(d+1)}$, which balances the
preprocessing time against the total query time
$n^2/s^{1/d}$. Thus, the algorithm computes the required
labels in $O(n^{2d/(d+1) + \delta})$ time.
1407: We conclude:
1408: \begin{theorem}
1409: By performing a $O(n^{2d/(d+1)+\delta})$-time
1410: preprocessing, where $\delta >0$ is arbitrary constant,
1411: one can reduce the problem of maintaining dynamic
1412: $(1+\eps)$-approximate nearest neighbor for any
1413: $n$-point crossing metric over $n$ hyperplanes in
1414: $\Re^d$, to the problem of maintaining dynamic
1415: $(1+\eps)(1-O(1)/\log n)$-approximate nearest neighbor
1416: in Hamming space with $O(\log^\EmbedDim n)$ dimensions.
1417:
1418: \theolab{reduction}
1419: \end{theorem}
1420:
1421: \begin{remark}
   Note that the constants in the bounds of Theorem
   \ref{theo:reduction} depend exponentially (or worse) on
   the dimension $d$.
1425: \end{remark}
1426:
1427:
1428: \begin{remark}
   As indicated in the introduction, such an embedding
   enables one to use a large collection of
   subquadratic approximation algorithms for the
1432: intersection metric, including dynamic amortized
1433: $\Ot(n^{4/3} + n^{1+1/c})$-time (for $d=2$)
1434: $c$-approximation algorithms for bichromatic closest
1435: pair~\cite{e-demst-95} and $\Ot(n^{4/3} +
1436: n^{1+1/c})$-time algorithms for: $c$-approximate
1437: diameter and discrete minimum enclosing ball
1438: \cite{giv-rahdp-01}, $O(c)$-approximate facility
1439: location and bottleneck matching~\cite{giv-rahdp-01}.
1440: Similar (i.e., subquadratic time) results hold for any
1441: $d>2$.
1442: \end{remark}
1443:
1444: \subsection{Computing an MST Using the Embedding}
1445:
We next describe how to use the embedding of the previous
two subsections to obtain a $(1+\eps)$-approximation
algorithm for the MST under the crossing metric. Note that
everything described in this
1450: section is well known \cite{im-anntr-98}, and we provide it
1451: only for the sake of completeness. Also, the resulting
1452: algorithm is slower in the planar case than the algorithm of
1453: Section \ref{sec:speedup}.
1454:
1455:
Computing the minimum spanning tree under the intersection
metric using Kruskal's algorithm boils down to
maintaining the bichromatic nearest-neighbor pair (under
the intersection metric) between two sets $P_1, P_2
\subseteq P$, under insertions and deletions. A consequence
of Eppstein's result \cite{e-demst-95} is the following:
1462:
1463: \begin{theorem}[\cite{e-demst-95}]
1464: Given a dynamic data-structure for nearest-neighbor
1465: queries, where each insertion / deletion / query operation
1466: takes $T(n)$ time, then one can compute the MST in
1467: $O(n T(n)\log^2 n)$ time.
1468: \end{theorem}
1469:
It is easy to verify that we get a
$(1+\eps)$-approximation to the MST if we use a
$(1+\eps)$-approximate dynamic nearest-neighbor
data-structure (Eppstein, personal communication, 1999).
1474:
Namely, we need a data-structure that supports dynamic
approximate nearest-neighbor queries. After applying the
embedding described above, we use the $\eps'$-PLEB
data-structure of \cite{im-anntr-98} to maintain a
$(1+\eps')$-approximate nearest neighbor in the embedded
space. Specifically, we construct an $\eps$-PLEB on the
embedded points. Since the embedding maps a distance gap to
a distance gap, a close point in the embedded space
corresponds to a close point in the crossing metric, and we
thus obtain an $\eps$-PLEB data-structure for the original
points: for a query $p$, it returns a point $q \in P$ such
that $\IDL(p,q) \leq (1+\eps)r$, provided there exists a
point $q^* \in P$ with $\IDL(p,q^*) \leq r$.
1488:
Thus, since every crossing distance is an integer between
$0$ and $n$, by constructing $O(\log_{1+\eps}n)$ such
data-structures, for radii $r$ growing geometrically by a
factor of $1+\eps$ up to $n$, we can use binary search on
those data-structures to find a $(1+\eps)$-approximate
nearest neighbor to a query point. Namely, this
data-structure can be used to answer approximate
nearest-neighbor queries for the intersection metric. For
the whole scheme to work, we need those data-structures to
be dynamic; i.e., to support insertions and deletions of
points. Fortunately, the only part of the algorithm that
needs to be dynamic is the second stage, which uses the
data-structure of \cite{im-anntr-98}, and that
data-structure is dynamic.
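
The binary search over radii can be sketched in Python as
follows, under our own assumptions. Here \texttt{pleb(r, q)}
is a hypothetical callback standing in for the $\eps$-PLEB
structure built for radius $r$, and crossing distances are
assumed to be integers in $[0,n]$. The radii are
$\rho_0 = 0$, $\rho_1 = 1$, and $\rho_{j+1} = (1+\eps)\rho_j$,
up to the first value $\geq n$. If the search stops at
$\rho_j$ with $j \geq 1$, then the query failed at
$\rho_{j-1}$, so the nearest neighbor is at distance greater
than $\rho_{j-1}$; hence the returned point is within
$(1+\eps)^2$ times the nearest-neighbor distance, and running
the search with $\eps/3$ in place of $\eps$ (for $\eps \leq 1$)
gives a $(1+\eps)$-approximation. The brute-force
\texttt{make\_brute\_pleb} is a hypothetical helper for
testing only.

\begin{verbatim}
def approx_nearest(q, n, eps, pleb):
    # pleb(r, q) plays the role of an eps-PLEB structure for
    # radius r: if some point lies within distance r of q, it
    # must return a point within distance (1 + eps) * r;
    # otherwise it may return None.
    # Distances are assumed to be integers in [0, n].
    radii = [0.0, 1.0]
    while radii[-1] < n:          # radii 0, 1, (1+eps), (1+eps)^2, ...
        radii.append(radii[-1] * (1 + eps))
    top = pleb(radii[-1], q)      # every point is within distance n
    if top is None:
        return None
    # invariant: query fails at index lo (lo = -1 is virtual),
    # query succeeds at index hi, and found is its answer
    lo, hi, found = -1, len(radii) - 1, top
    while hi - lo > 1:
        mid = (lo + hi) // 2
        cand = pleb(radii[mid], q)
        if cand is not None:
            hi, found = mid, cand
        else:
            lo = mid
    return found

def make_brute_pleb(points, dist):
    # Brute-force stand-in for a PLEB structure; it is exact,
    # so it also satisfies the (1 + eps) guarantee.
    def pleb(r, q):
        best = min(points, key=lambda x: dist(q, x))
        return best if dist(q, best) <= r else None
    return pleb

# Usage on a toy 1-D metric with integer distances in [0, 10].
pleb = make_brute_pleb([0, 3, 7, 10], lambda a, b: abs(a - b))
print(approx_nearest(5, 10, 0.5, pleb))  # 3
\end{verbatim}

Only $O(\log\log_{1+\eps} n)$ of the $O(\log_{1+\eps} n)$
structures are queried per search.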
1500:
1501: We conclude:
1502: \begin{theorem}
Given a set $P$ of $n$ points in the plane and a set
$L$ of $n$ lines, one can compute, in $\Ot \pth{ n^{4/3}
+ n^{1+ 1/(1+\eps)} }$ time, a spanning tree of $P$
of weight $\leq (1+\eps)\Wopt(P,L)$. The result returned
by the algorithm is correct with high probability. In
1508: $d>2$ dimensions, such an MST can be approximated in
1509: $\Ot \pth{ n^{2d/(d+1) + \delta} + n^{1+ 1/(1+\eps)} }$
1510: time, where $\delta>0$ is an arbitrary constant.
1511: \end{theorem}
1512:
1513:
1514:
1515:
1516: %----------------------------------------------------------------
1517: %----------------------------------------------------------------
1518: %----------------------------------------------------------------
1519: %----------------------------------------------------------------
1520: \section{Conclusions}
1521: \seclab{conc}
1522:
We presented the first $(1+\eps)$-approximation algorithm
for the minimum spanning tree under the crossing metric in
the plane. We also presented subquadratic-time
approximation algorithms for a variety of other problems,
obtained by embedding the crossing metric into a
higher-dimensional space. The techniques used in our paper
seem to be new to low-dimensional computational geometry,
and we believe that they might be useful for other
problems in computational geometry.
1532:
1533: There are several interesting open problems for further
1534: research:
1535: \begin{itemize}
\item Can the result be extended to other families of
curves, such as segments or arcs, instead of lines?

\item Can a similar approximation algorithm be found
for the minimum weight triangulation under the
crossing metric?
1542: \end{itemize}
1543:
1544: \subsection*{Acknowledgments}
1545:
1546: The authors wish to thank Pankaj Agarwal, Boris Aronov and
1547: Micha Sharir for helpful discussions concerning the problems
1548: studied in this paper and related problems.
1549:
1550: %-------------------------------------------------------------------------
1551: % Bibliography
1552: %-------------------------------------------------------------------------
1553: \bibliographystyle{salpha}
1554: \bibliography{shortcuts,geometry}
1555:
1556:
1557:
1558: %-------------------------------------------------------
1559:
1560: \appendix
1561: \section{A Rough Approximation to the Weight of the
1562: MST in Near Linear Time}
1563: \seclab{fast:approx}
1564:
In this appendix, we show how to approximate the weight of
the minimum spanning tree up to a factor of roughly
$O(\alpha(n)\log{n})$, provided its weight is at least
linear. The near-linear time $(1+\eps)$-approximation
algorithm for the minimum spanning tree, presented in
Section \ref{sec:speedup}, relies on this approximation
algorithm.
1571:
Underlying the approximation algorithm is the observation
that the MST of $P$ with respect to a random sample of the
lines of $L$ provides a rough approximation to the weight
of the MST of $P$ with respect to $L$. If the weight of the
MST for the sample is near linear, we can approximate it up
to a factor of $O(\alpha(n)\log{n})$, using the following
algorithm.
1578:
1579: \begin{lemma}
Given a set $R$ of $r$ lines, a set $P$ of $n$ points,
and a prescribed parameter $W$, one can decide whether
$\Wopt(P,R)$ is large; namely, whether $\Wopt(P,R) =
\Omega( (r + n + W) \alpha(n)\log{n} )$. The algorithm
takes $O( (r + n + W) \alpha(n)\log^2{n} )$ expected
time. Furthermore, if $\Wopt(P,R) \leq W$, the algorithm
reports that its weight is large with probability at most
$n^{-c}$, where $c$ is an appropriate constant.
1588:
1589: \lemlab{brute:estimate}
1590: \end{lemma}
1591:
1592: \begin{proof}
Use the algorithm of Theorem~\ref{theo:hs}, and execute
it $O(\log{n})$ times on $P$ and $R$. If the running
time of the $i$-th execution of the algorithm exceeds a
suitable constant multiple of $(r + n + W)\alpha(n)
\log{n}$, abort it and move on to the next execution. If
$\Wopt(P,R) \leq W$, then the algorithm of
\cite{hs-oplpa-01-dcg} provides a spanning tree of
expected weight $O( (r+n+ W)\alpha(n) \log{n})$, with the
same bound on the expected running time; hence, by
Markov's inequality, each execution is aborted with
probability at most $1/2$, independently of the other
executions. Thus, if all the $O(\log{n})$ executions are
aborted, we report that $\Wopt(P,R)$ is large; if
$\Wopt(P,R) \leq W$, this happens with probability at most
$n^{-c}$.
1605: \end{proof}
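
To make the amplification explicit (a short calculation of
ours): if $\Wopt(P,R) \leq W$ and each execution is aborted
with probability at most $1/2$, independently of the other
executions, then $m = \lceil c\log_2 n\rceil$ executions
suffice, since
\[
\Pr\left[\text{all $m$ executions are aborted}\right]
\leq 2^{-m} \leq 2^{-c\log_2 n} = n^{-c}.
\]
The total running time is then $m \cdot O((r+n+W)\alpha(n)
\log{n}) = O((r+n+W)\alpha(n)\log^2{n})$, matching the
bound of \lemref{brute:estimate}.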
1606:
1607:
1608: \lemref{brute:estimate} shows that we can
1609: approximate the weight of the MST in near linear time if its
1610: weight is near linear. However, if it is heavier, we will
1611: use random sampling to keep the running time under control.
1612:
Let $R \subseteq L$ be a random sample of the lines of $L$,
where each line is picked independently with probability
$r/n$. Clearly, an intersection point $u$ (between a
connected set $\gamma$ and a line of $L$) is present in
$\Arr(R)$ with probability $r/n$; this is exactly the
probability that the line of $L$ passing through $u$ is
chosen into the random sample.
1620:
1621: \begin{defn}
1622: For a curve $\gamma$, and a set of lines $L$, let
1623: $\weight(\gamma,L)$ denote the {\em weight} of $\gamma$
1624: in the arrangement $\Arr(L)$. This is the number of
1625: intersections of $\gamma$ with the lines of $L$.
1626: \end{defn}
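
The sampling step and the definition above can be
illustrated by the following Python sketch (our own; the
encoding of a line as a triple $(a,b,c)$ representing
$\{(x,y) : ax+by=c\}$ and all function names are
assumptions). The curve $\gamma$ is taken to be polygonal
and in general position with respect to $L$. Since each
intersection point of $\gamma$ with a line of $L$ survives
in $\Arr(R)$ with probability $r/n$, the rescaled count
$\frac{n}{r}\weight(\gamma,R)$ is an unbiased estimate of
$\weight(\gamma,L)$; this is the fact used in the proof of
\lemref{wopt:sample}.

\begin{verbatim}
import random

def crosses(line, u, v):
    # line = (a, b, c) encodes {(x, y) : a*x + b*y = c}; True iff
    # u and v lie strictly on opposite sides, i.e., uv crosses it
    a, b, c = line
    fu = a * u[0] + b * u[1] - c
    fv = a * v[0] + b * v[1] - c
    return fu * fv < 0

def weight(gamma, lines):
    # weight(gamma, L) for a polygonal curve gamma (vertex list):
    # total number of crossings of its edges with the lines
    return sum(crosses(ln, u, v) for u, v in zip(gamma, gamma[1:])
               for ln in lines)

def sample_lines(lines, r, rng):
    # keep every line independently with probability r/n
    n = len(lines)
    return [ln for ln in lines if rng.random() < r / n]

def estimate_weight(gamma, lines, r, rng):
    # each crossing survives with probability r/n, so this
    # rescaled count is an unbiased estimate of weight(gamma, lines)
    n = len(lines)
    return (n / r) * weight(gamma, sample_lines(lines, r, rng))

# Usage: a two-edge path crossing each of three lines once.
L = [(1, 0, 1), (0, 1, 1), (1, 1, 3)]
gamma = [(0, 0), (2, 0), (2, 2)]
print(weight(gamma, L))  # 3
\end{verbatim}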
1627:
1628: \begin{lemma}
Let $R$ be a sample of lines of $L$ (chosen as described
above). Then, with high probability,
\[
\Wopt(P,L) \leq \frac{n}{r} \pth{ c_0 n \log{n} + 2
\Wopt(P,R)},
\]
and, with probability $\geq 0.9$, we have $\frac{n}{r}
\cdot \frac{\Wopt(P,R)}{10} \leq \Wopt(P,L)$, where $c_0$ is an
appropriately large constant.
1638: \lemlab{wopt:sample}
1639: \end{lemma}
1640:
1641: \begin{proof}
Let $\Topt^L = \Topt(P,L)$, and let $W_R =
\weight(\Topt^L, R)$ be the weight of $\Topt^L$ under
the crossing metric of $R$. Since every intersection point
of $\Topt^L$ with a line of $L$ survives in $\Arr(R)$ with
probability $r/n$, we have $E[ W_R] =
\Wopt(P,L)\frac{r}{n}$. Thus, by Markov's inequality,
with probability $\geq 0.9$ we have $W_R \leq 10
\Wopt(P,L) \frac{r}{n}$. Since $\Topt^L$ is also a
spanning tree of $P$, this implies that, with probability
$\geq 0.9$, $\displaystyle \Wopt^R = \Wopt(P,R) \leq W_R
\leq 10 \Wopt(P, L) \frac{r}{n}$.
1651:
Let $p,q \in P$ be two points, and let $X_{p q} =
\IDX{R}(p,q)$ be the distance between $p$ and $q$ in the
arrangement $\Arr(R)$. If the distance between $p$ and $q$
is large, that is, $U = \IDL(p,q) \geq c_0 (n/r) \log{n}$
(where $c_0$ is a large enough constant), then one can
show, using the Chernoff inequality, that with high
probability we have
1658: \[
1659: \frac{U}{2} \leq X_{p q} \frac{n}{r} \leq 2 U.
1660: \]
1661:
On the other hand, by the above argument (applied, via a
union bound, to all pairs of points of $P$
simultaneously), each edge $e =p q$ of $\Topt^R =
\Topt(P,R)$ either intersects at most $c_0
(n/r)\log{n}$ lines of $L$, or, alternatively, the
number of lines of $L$ intersected by $e$ is at most
$2(n/r)X_e$, where $X_e$ is the number of lines of
$R$ that $e$ intersects. Since $\Topt^R$ has $n-1 < n$
edges and $\sum_{e \in \Topt^R} X_e = \Wopt(P,R)$, with
high probability we have
\begin{eqnarray*}
\Wopt(P,L) &\leq & \weight( \Topt^R, L) = \sum_{e =
p q \in \Topt^{R}} \IDL(p,q)
\leq \sum_{e \in
\Topt^{R}} \pth{ c_0 \frac{n}{r}\log{n} +
2X_e\frac{n}{r}}\\
&\leq& c_0 \frac{n^2 \log{n}}{r}
+ \Wopt(P,R) \frac{2n}{r}.
\end{eqnarray*}
1678: \end{proof}
1679:
1680: \begin{remark}
We can make both bounds of \lemref{wopt:sample} hold with
high probability by repeating the experiment $O(\log{n})$
times, with independent samples, and picking the smallest
value of $\Wopt(P,R)$ computed: the upper bound holds for
all the samples simultaneously with high probability,
while the lower bound holds for at least one of them (and
hence for the one with the smallest value) with
probability at least $1 - 0.1^{O(\log n)}$. Thus, with
high probability, we have
\[
\frac{n}{r} \cdot \frac{\Wopt(P,R)}{10}
\leq \Wopt(P,L) \leq
\frac{n}{r} \pth{ c_0 n \log{n} + 2 \Wopt(P,R)}.
\]
In particular, if $\Wopt(P,R) > c_0 n \log{n}$, then
$3\Wopt(P,R)\frac{n}{r}$ is a constant-factor
approximation to $\Wopt(P,L)$.
1693: \end{remark}
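
Spelling out the last claim of the remark: if
$\Wopt(P,R) > c_0 n\log{n}$, then $c_0 n\log{n} +
2\Wopt(P,R) < 3\Wopt(P,R)$, and the two bounds above
combine into
\[
\frac{1}{30}\cdot 3\Wopt(P,R)\frac{n}{r}
\;\leq\; \Wopt(P,L) \;\leq\;
3\Wopt(P,R)\frac{n}{r},
\]
so $3\Wopt(P,R)\frac{n}{r}$ overestimates $\Wopt(P,L)$ by a
factor of at most $30$.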
1694:
1695: \begin{lemma}
Let $r$ be a prescribed parameter, and let $\Wopt =
\Wopt(P,L)$. Then, one can decide which of the following
cases holds:
\begin{itemize}
\item $\Wopt$ is small; namely, $\Wopt \leq
\frac{10c_0 n^2\log{n}}{r} $.

\item $\Wopt$ is large; namely, $\Wopt = \Omega(
\frac{n^2}{r} \alpha(n)\log^2{n})$.

\item $\Wopt$ is in between, in which case either of the
two answers above is valid.
\end{itemize}
The algorithm takes $O( n \alpha(n)\log^4{n})$ time, and
returns a correct answer with high probability.
1710:
1711: \lemlab{fine:estimate}
1712: \end{lemma}
1713:
1714: \begin{proof}
We pick $m = O(\log{n})$ samples $R_1, \ldots, R_m$, each
formed by picking every line of $L$ independently with
probability $r/n$. For each sample $R_i$, we run the
algorithm of \lemref{brute:estimate} with $W = 10 c_0
n\log{n}$ to decide whether $\Wopt(P,R_i)$ is large. This
requires $O(n\alpha(n)\log^{3}{n})$ time for each sample,
and $O(n\alpha(n)\log^{4}{n})$ time overall.
1722:
If, for some sample $R_i$, the algorithm of
\lemref{brute:estimate} reports that $\Wopt(P,R_i)$ is
{\em not large}, then $\Wopt(P,R_i) =
O(n\alpha(n)\log^2{n})$. Thus, by \lemref{wopt:sample},
$\Wopt(P,L) = O\pth{ \frac{n^2 \alpha(n)\log^{2}n }{r} }$
with high probability. \remove{On the other hand, if all the
spanning trees are ``long'' for all the samples, we
know that $\Wopt(P,L) \geq \frac{n}{10r}\cdot 10 c_0
n\log{n} = \frac{c_0n^2 \log{n}}{r}$ with high
probability by \lemref{wopt:sample}. }
1733: \end{proof}
1734:
Now, we can perform an exponential search over $r$ to
approximate $\Wopt(P,L)$.
1737:
1738: \begin{lemma}
One can compute, in $O(n\alpha(n)\log^{5}{n})$ time, a
value $M$ such that
1741: \[
1742: \Wopt(P,L) \leq M = O(n \alpha(n) \log^2{n} + \Wopt(P,L)
1743: \alpha(n) \log n).
1744: \]
1745:
1746: \lemlab{rough}
1747: \end{lemma}
1748:
1749: \begin{proof}
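Set $r_0 = n$. In the $i$-th iteration, check whether
$\Wopt = \Omega\pth{ \frac{n^2}{r_i} \alpha(n)\log^2{n}}$,
by using the algorithm of \lemref{fine:estimate} with
parameter $r_i$. If it is, we set $r_{i+1} = r_i/2$, and
repeat the process. We stop as soon as this check fails.
Since $\Wopt(P,L) \leq n(n-1) < n^2$, with high
probability the check fails once $r_i \leq 1$, so there
are $O(\log{n})$ iterations, each taking
$O(n\alpha(n)\log^4{n})$ time. When we stop, we know that
with high probability
\[
\frac{10c_0 n^2\log{n}}{r_{i-1}}
\leq
\Wopt(P, L) \leq M, \qquad \text{where } M = O\pth{
\frac{n^2 \alpha(n)\log^{2} n }{r_i} }
\]
is the upper bound provided by \lemref{fine:estimate} (the
lower bound is vacuous if $i = 0$). Since $r_{i-1} =
2r_i$, we have $M = O(\Wopt(P,L)\alpha(n)\log{n})$ if
$i \geq 1$, and $M = O(n\alpha(n)\log^2{n})$ if $i = 0$,
implying that $M$ is the required approximation.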
1764: \end{proof}
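
The control flow of this proof can be sketched in Python as
follows. This is an illustration under our own assumptions,
not an implementation of the lemma:
\texttt{answer\_is\_large(r)} is a hypothetical oracle
standing in for the procedure of \lemref{fine:estimate}
with parameter $r$, and \texttt{upper\_bound\_for(r)} is a
hypothetical function returning the upper bound that holds
once that procedure answers {\em small}.

\begin{verbatim}
import math

def rough_upper_bound(n, answer_is_large, upper_bound_for):
    # answer_is_large(r): hypothetical oracle playing the role of
    #   the decision procedure of Lemma fine:estimate for parameter r
    # upper_bound_for(r): hypothetical bound O(n^2 alpha(n) log^2 n / r)
    #   that holds once the oracle answers "small" for r
    r = float(n)                                   # r_0 = n
    for _ in range(math.ceil(math.log2(n)) + 2):   # O(log n) rounds
        if not answer_is_large(r):
            return upper_bound_for(r)
        r /= 2                                     # r_{i+1} = r_i / 2
    # Not reached in the real setting: W < n^2 forces "small" once r <= 1.
    return upper_bound_for(r)

# Toy usage: the true weight is W = 50000, the oracle answers
# "large" iff W >= n^2 / r, and the bound is 4 n^2 / r.
n, W = 1024, 50000
M = rough_upper_bound(n, lambda r: W >= n * n / r,
                      lambda r: 4 * n * n / r)
print(M)  # 262144.0
\end{verbatim}

In the toy usage, the oracle answered {\em large} at $2r$,
so $W \geq n^2/(2r)$, and hence $M = 4n^2/r \leq 8W$; this
mirrors the factor-of-two argument in the proof above.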
1765:
1766: \begin{remark}
Note that if the algorithm of \lemref{rough} stops after
the first iteration, then $\Wopt = O(n
\alpha(n)\log^2{n})$. In such a case, the approximation
factor we get may be much worse than logarithmic.
However, this is, to some extent, the easiest case: even
without any sampling, we get a spanning tree of
near-linear (or sublinear) weight.
1774: \end{remark}
1775:
1776: \end{document}
1777:
1778: %--------------------------------------------------------
1779: %
1780: % mst.tex - end of file
1781: %-------------------------------------------------------
1782:
1783: