%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% mst.tex -
%   Approximating the minimum spanning tree under the
%   intersection metric.
%
% Sariel Har-Peled and Piotr Indyk
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\documentclass[12pt]{article}
\usepackage{amstext}
\usepackage{amsmath,amssymb}
\usepackage{theorem}
\usepackage{enumerate}
\usepackage{tabularx}
\usepackage{graphicx}
\usepackage{sariel,wide}
\usepackage{url,hyperref}


\newcommand{\Wopt}{{\cal W}_{opt}}
\newcommand{\Topt}{{\cal T}_{opt}}
\newcommand{\Wf}{{\cal W}}
\newcommand{\weight}{{\mathop{\mathrm{weight}}}}
\newcommand{\IDL}{\D_L}
\newcommand{\IDX}[1]{\D_{{#1}}}
\newcommand{\MSTA}{\widehat{\MST}}
\newcommand{\ApproxMST}{{\tt ApproxMST}}
\newcommand{\PropagateWavefront}{{\tt PropagateWavefront}}
\newcommand{\PropagateApproxWavefront}{{\tt PropagateApproxWavefront}}
\newcommand{\cAnother}{c_1}
\newcommand{\cmindist}{c_5}
\newcommand{\cSampleProb}{c_6}
\newcommand{\cFarEnough}{c_7}
\newcommand{\cSample}{c_{samp}}
\newcommand{\Gadj}{G_{adj}}
\newcommand{\Ot}{\widetilde{O}}
\newcommand{\RS}{{\cal RS}}
\newcommand{\polylog}{\mathop{\mathrm{polylog}}}
\newcommand{\vL}{\vec{v}_L}
\newcommand{\plow}{z}
\newcommand{\phigh}{Z}
\newcommand{\HX}{{\cal H}}
\newcommand{\R}{{\cal R}}
\newcommand{\lshort}{l_{short}}
\newcommand{\out}{\mathrm{out}}
\newcommand{\Oe}{O_\eps}
\newcommand{\EmbedDim}{7}

% Title
\title{When Crossings Count --- Approximating the Minimum
   Spanning Tree\thanks{A preliminary version of the paper
      appeared in the {\em 16th ACM Symposium on
         Computational Geometry}, 166--175, 2000.}}
%\remove{
\author{Sariel Har-Peled\sarielthanks{}
   \and
   Piotr Indyk\thanks{MIT Laboratory for Computer Science;
         545 Technology Square, NE43-373;
         Cambridge, Massachusetts 02139-3594;
         {{\tt indyk\atgen{}theory.lcs.mit.edu}}}}
%}
\date{\today}

\begin{document}
%\let\ps@plain=\ps@empty
%\nopagenumber{}
\maketitle
%\renewcommand{\thefootnote}{}
%\copyrightspace{}
%\renewcommand\thefootnote{\arabic{footnote}}%
\begin{abstract}
    We present a $(1+\eps)$-approximation algorithm for
    computing the minimum spanning tree of points in a
    planar arrangement of lines, where the metric is the
    number of crossings between the spanning tree and the
    lines. The expected running time of the algorithm is
    near linear. We also show how to embed such a crossing
    metric of hyperplanes in $d$ dimensions, in subquadratic
    time, into a high-dimensional space so that the
    distances are approximately preserved.  As a result, we
    can deploy a large collection of subquadratic
    approximation algorithms \cite{im-anntr-98,giv-rahdp-01}
    for problems involving points with the crossing metric
    as a distance function. Applications include MST,
    matching, clustering, nearest-neighbor, and
    furthest-neighbor.
\end{abstract}

\begin{figure}
    \centerline{\includegraphics{figs/crossing-mst}}

    \caption{A set of lines and points, and the resulting
       crossing MST. Note that in this case the crossing MST
       is different from the Euclidean MST.}

    \figlab{mst}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Introduction}

Given a set of lines in the plane, a natural measure of the
distance between two points is the number of lines one has
to cross to get from one point to the other.  This is a
discrete distance measure that can be used to approximate
the Euclidean distance and other distance measures.
However, since this measure is defined by an arrangement of
lines, it is not locally defined and is thus computationally
cumbersome. The minimum spanning tree (MST) of a set of
points, chosen so that the number of intersections between
the tree and the given set of lines is minimized, quantifies
how the set of points interacts with the set of lines; see
Figure \ref{fig:mst}. In fact, when the set of lines is the
set of all possible lines, this MST is the standard
Euclidean MST \cite{af-acrsm-97} (here one minimizes the
average number of edges of the MST crossed by a random
line).  Such an MST is related to a spanning tree of low
stabbing number (STLSN) \cite{w-stlcn-92,a-idapa-91}. While
a spanning tree of low stabbing number guarantees that any
line intersects at most $O(\sqrt{n})$ edges of the spanning
tree, the MST guarantees that the overall number of
intersections between the tree and a given set of lines is
minimized.  Thus, if the set of lines is known in advance,
the MST has fewer intersections overall than the STLSN.
The spanning tree of low stabbing number has been used in
several applications; see for example
\cite{a-idapa-91,mww-deabv-91}.  In particular, having such
an MST enables one to: (i) answer half-plane range queries
efficiently using near linear space \cite{ghs-citds-91},
(ii) bound the complexity of the faces of the arrangement of
lines that contain the points \cite{hs-oplpa-01-dcg}, and
(iii) traverse the points efficiently, so that the number of
updates needed is minimized. (Imagine traversing among the
points while maintaining the set of half-planes that contain
the current point. Each time one crosses a line, an update
operation is performed.)

Computing the MST for the general case of arcs can be done
in $O(n^2\log{n})$ time by performing wavefront propagation
from all the points simultaneously (see Section
\ref{sec:cont:dijkstra}). As for approximation algorithms,
Har-Peled and Sharir \cite{hs-oplpa-01-dcg} recently gave an
approximation algorithm for the case of arcs that computes a
Steiner tree in expected running time $w=O(
\lambda_{t+2}(n+\Wopt) \log(n))$, where $t$ is the maximum
number of intersections between a pair of arcs,
$\lambda_{t+2}(\cdot)$ is the maximum length of a
Davenport-Schinzel sequence of order $t+2$, and $\Wopt$ is
the weight of the optimal Steiner tree.\footnote{It is easy
   to verify that if the triangle inequality holds, then the
   Steiner tree weight is at least half the weight of the
   MST.} The algorithm outputs a tree of weight $w$ (and
thus gives roughly an $O(\log n)$-approximation).

In this paper, we present two results:
\begin{itemize}
    \item A near linear time $(1+\eps)$-approximation algorithm for the
    minimum spanning tree under the crossing metric in the
    planar case.

    \item We show how to embed the crossing metric among
    hyperplanes into a Hamming space of high dimension.
    As a result, we show how one can apply known
    subquadratic approximation algorithms for problems
    involving point-sets and hyperplanes in high dimensions
    (MST, clustering, matching, etc.).

    \remove{ Intuitively, all those applications rely on a
       black-box (called $\eps$-PLEB in \cite{im-anntr-98})
       that decides whether, for a query point $q$, there is
       no point close to $q$ in our point-set (i.e., $\geq
       r$ for all points), or alternatively finds a nearby
       point (i.e., $\leq (1+\eps)r$) under the crossing
       metric, where $r$ is a prespecified threshold.

       Using $\eps$-PLEB for the embedded points
       \cite{im-anntr-98} we construct the required
       $\eps$-PLEB for the points we started with.  For
       $d>2$ dimensions, we show how to embed the crossing
       metric induced by $n$ hyperplanes over a set of $n$
       points in $\Re^d$, in $O(n^{2d/(d+1) + \delta})$
       time, where $\delta>0$ is arbitrary.

       We can maintain the dynamic $c$-approximate nearest
       neighbor problem over this $n$-point metric in
       $\Ot( n^{1/c})$ time per
       operation~\cite{im-anntr-98}.  This in turn implies
       dynamic amortized $\Ot(n^{4/3} +
       n^{1+1/c})$-time $c$-approximation algorithms for
       bichromatic closest pair~\cite{e-fhcoa-98} and
       $\Ot(n^{4/3} + n^{1+1/c})$-time algorithms for:
       $c$-approximate diameter and discrete minimum
       enclosing ball \cite{giv-rahdp-01},
       $O(c)$-approximate facility location and bottleneck
       matching (all for $d=2$) \cite{giv-rahdp-01}.  }

    The connection between the crossing metric and points
    in high dimension follows by interpreting the input
    points as points in the abstract VC-space \cite{pa-cg-95}
    induced by the lines. Namely, we associate with each
    point in the plane an $n$-dimensional binary vector,
    whose $i$-th coordinate indicates on which side of the
    $i$-th line the point lies.  In this way, we map our
    input points to vertices of the $n$-dimensional
    hypercube. The crossing metric is exactly the Hamming
    distance between the mapped points, since the segment
    between two points crosses a line if and only if the two
    points lie on opposite sides of it.  We can now deploy
    the techniques of \cite{im-anntr-98} on those mapped
    points, yielding an approximation algorithm for the MST
    problem.  Bringing the running time down to subquadratic
    requires some additional work.

    Specifically, we show how to compute a mapping of the
    points into a space of dimension $O(\log^\EmbedDim n)$;
    this embedding can be computed in $\Ot(n^{4/3})$
    time\footnote{Here and in the rest of this paper
       $f(n)=\Ot(g(n))$ iff $f(n)=g(n)
       (1/\eps)^{O(1)}\log^{O(1)}n$, and
       $f(n)={O_\eps}(g(n))$ iff $f(n)=O( g(n)
       /\eps^{O(1)})$.} for $n$ points, so that a
    $(1+\eps)$ gap property is preserved for a specified
    range of distances.

    As a result, we can solve several approximation problems
    for this metric, among them the MST problem.  In
    fact, our near-linear approximate MST algorithm in the
    plane can be roughly viewed as an unraveling of the
    corresponding MST approximation algorithm in high
    dimensions. Similar bounds can be derived for $d > 2$
    dimensions. See \secref{embed} for details.
\end{itemize}
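
The following Python sketch illustrates the embedding
described above for lines in the plane. It assumes (as our
own convention, not the paper's) that a line is given as a
triple $(a,b,c)$ representing $ax+by=c$, and that no input
point lies on a line. Each point is mapped to its side
vector, packed into an integer, and the crossing distance is
recovered as the number of differing bits. The names
\texttt{side\_vector}, \texttt{hamming}, and
\texttt{crossing\_distance\_via\_embedding} are illustrative.

\begin{verbatim}
```python
from typing import List, Tuple

Line = Tuple[float, float, float]   # (a, b, c) encodes the line a*x + b*y = c
Point = Tuple[float, float]


def side_vector(p: Point, lines: List[Line]) -> int:
    """Map p to an n-bit integer; bit i is 1 iff a_i*x + b_i*y > c_i.

    Assumes p does not lie on any line (general position).
    """
    x, y = p
    bits = 0
    for i, (a, b, c) in enumerate(lines):
        if a * x + b * y > c:
            bits |= 1 << i
    return bits


def hamming(u: int, v: int) -> int:
    """Hamming distance between two packed bit vectors."""
    return bin(u ^ v).count("1")


def crossing_distance_via_embedding(p: Point, q: Point, lines: List[Line]) -> int:
    """Crossing distance D_L(p, q), computed as a Hamming distance."""
    return hamming(side_vector(p, lines), side_vector(q, lines))


if __name__ == "__main__":
    lines = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 3.0)]  # x=0, y=0, x+y=3
    print(crossing_distance_via_embedding((-1.0, -1.0), (2.0, 0.5), lines))  # 2
```
\end{verbatim}

Packing the vector into a single integer makes each distance
evaluation one XOR and one population count, but it still
costs $\Theta(n)$ bits per point; reducing the dimension is
what the embedding of \secref{embed} is for.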

The paper is organized as follows: In Section
\ref{sec:cont:dijkstra}, we describe how one can compute the
exact MST using wavefront propagation.  In \secref{speedup},
we present the planar $(1+\eps)$-approximation algorithm for
the MST.  Next, in Section \ref{sec:embed}, we
describe the embedding into points in high dimension and
demonstrate its use for computing an approximate MST.
Concluding remarks are given in Section \ref{sec:conc}.

\begin{figure}
    \begin{center}
        \begin{tabular}{cc}
            {\includegraphics{figs/mst0}}
            &
            {\includegraphics{figs/mst1}}\\
            (i) & (ii)\\
            \\
            {\includegraphics{figs/mst2}}
            &
            {\includegraphics{figs/mst3}}\\
            (iii) & (iv)\\
        \end{tabular}
    \end{center}

    \caption{Computing the crossing MST by doing wavefront
            propagation. The thick lines denote the boundary
            of the current connected components of the
            spanning forest.}

    \figlab{mst:wavefront}
\end{figure}

%-------------------------------------------------------------
%-------------------------------------------------------------

\section{Minimum Spanning Tree by Continuous Dijkstra}
\seclab{cont:dijkstra}

In this section, we present a simple algorithm for computing
the crossing MST, obtained by interpreting a simple graph
algorithm geometrically. We also present a
``weight sensitive'' algorithm (\lemref{propagate}) that
computes portions of the MST in time proportional to its
overall weight.

In the following, we assume that we are given a set $L$ of
lines and a set $P$ of points in the plane. For simplicity,
we assume $|P| = |L| = n$.

\begin{defn}
    For a set $L$ of lines, the {\em crossing metric} is
    defined to be the minimum number of lines of $L$ that
    one has to cross when moving between two prespecified
    points. Thus, for a pair of points $p,q \in \Re^2$, the
    crossing distance between $p$ and $q$, denoted by
    $\IDL(p,q)$, is the number of lines of $L$ that
    intersect the segment $p q$.  If $L$ is a set of arcs, a
    similar crossing metric is defined, although the
    ``shortest path'' in this case is no longer necessarily
    a straight segment.
\end{defn}

\begin{defn}
    For a set $L$ of lines and a set $P$ of points in the
    plane, let $\Topt(P,L)$ denote a minimum spanning tree
    of $P$ under the crossing metric induced by $L$, and let
    $\Wopt(P,L)$ denote the weight of $\Topt(P,L)$.
\end{defn}

Let $\Arr = \Arr(L)$ denote the planar arrangement induced
by the lines of $L$.  Let $\Gadj = \Gadj(\Arr)$ be the
adjacency graph of $\Arr$; namely, each face of $\Arr$ is a
vertex, and two vertices are connected if the two
corresponding faces share an edge.  Let $V$ be the set of
vertices of $\Gadj$ that correspond to the faces of $\Arr$
that contain points of $P$. Clearly, the crossing MST of
$P$ in $\Arr$ corresponds to the MST of $V$ in the graph
$\Gadj$, where each edge of $\Gadj$ has weight $1$ and the
distance between two vertices of $V$ is their shortest-path
distance in $\Gadj$ (points of $P$ lying in the same face are
connected by edges of weight $0$).

Computing the MST of $V$ in $\Gadj$ can be done by performing
a simultaneous flooding of $\Gadj$ from the vertices of $V$.
Indeed, in the $i$-th iteration we compute all the vertices
of $\Gadj$ that are at distance $\leq i$ from some vertex of
$V$.  This can easily be done using a modified BFS. In the
beginning, the flood front is made out of $|V|$ connected
components.  Every time two connected components of the
flood front collide, we have discovered a new edge of the
MST. This edge connects the two vertices that induced the two
parts of the wavefront that collided. This is a somewhat
non-standard algorithm for computing the MST, but one can
easily verify that it indeed computes the MST of $V$ in
$\Gadj$.
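
To make the flooding argument concrete, the following Python
sketch computes the MST of a set of terminal vertices (such
as $V$) under shortest-path distances in an unweighted graph
(such as $\Gadj$). Instead of tracking wavefront collisions
one iteration at a time, it uses an equivalent reformulation
(used, e.g., by Mehlhorn for Steiner trees in graphs): a
multi-source BFS assigns every vertex to the terminal whose
wavefront reaches it first, every graph edge joining two
different regions yields a candidate edge between their
terminals, and Kruskal's algorithm on the candidates returns
an MST of the terminals. The adjacency-list input and the
name \texttt{terminal\_mst} are assumptions made for
illustration; building $\Gadj$ from the arrangement is not
shown.

\begin{verbatim}
```python
from collections import deque
from typing import Dict, List, Tuple


def terminal_mst(adj: Dict[int, List[int]], terminals: List[int]) -> List[Tuple[int, int, int]]:
    """MST of `terminals` under shortest-path distance in an unweighted graph.

    Returns (weight, s, t) edges between terminals. Assumes the graph is
    connected, `adj` is symmetric, and the terminals are distinct vertices.
    """
    # Multi-source BFS: the flood started simultaneously from all terminals.
    dist = {t: 0 for t in terminals}
    owner = {t: t for t in terminals}   # terminal whose wavefront arrived first
    queue = deque(terminals)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                owner[v] = owner[u]
                queue.append(v)

    # An edge whose endpoints were reached by different wavefronts is where
    # two wavefronts collide; it induces a candidate edge between terminals.
    candidates = []
    for u in adj:
        for v in adj[u]:
            if u < v and owner[u] != owner[v]:
                candidates.append((dist[u] + 1 + dist[v], owner[u], owner[v]))
    candidates.sort()

    # Kruskal's algorithm with union-find over the terminals.
    parent = {t: t for t in terminals}

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst = []
    for w, s, t in candidates:
        rs, rt = find(s), find(t)
        if rs != rt:
            parent[rs] = rt
            mst.append((w, s, t))
    return mst
```
\end{verbatim}

For instance, on the path $0,1,\ldots,5$ with terminals
$0,2,5$, the sketch returns the edges $(0,2)$ of weight $2$
and $(2,5)$ of weight $3$.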

This flooding algorithm has a natural geometric
interpretation: Let $\F_{2i}$ denote the set of all faces of
$\Arr$ that are at (crossing) distance at most $i$ from some
point of $P$ (the index is $2i$ because two wavefronts of
radius $i$ that meet correspond to points at distance up to
$2i$).  Clearly, $\F_0$ is the set of faces of $\Arr$ that
contain points of $P$.  The algorithm works in $n/2$ phases.
We do a wavefront propagation in $\Gadj$, starting from all
the vertices that correspond to the marked faces (i.e.,
faces of $\Arr$ that contain points of $P$). In the $i$-th
iteration, we propagate the wavefront from the faces of
$\F_{2i-2}$ into the faces of $\F_{2i}$.  It is easy to
verify that a connected component of the flood corresponds
to a connected component of the wavefront of $\F_{2i}$.
(Note that two faces of $\F_{2i}$ might be adjacent but
belong to different wavefronts, as the wavefronts have not
yet crossed the separating edge and thus have not been
merged into a single wavefront.)  The connected components
are maintained implicitly by a union-find data structure. In
particular, during the $i$-th iteration of the wavefront
propagation in $\Gadj$, when two different connected
components of the wavefront collide, this corresponds to two
points of $P$ at crossing distance $2i-1$ or $2i$ from each
other.

In particular, if the MST has an edge of weight $2i-1$ or
$2i$, it is discovered when the corresponding wavefronts
collide.  Thus, the $i$-th iteration of the wavefront
propagation corresponds to the detection of edges of weight
$2i-1$ and $2i$ in the MST. To compute the MST, we first
handle all relevant edges of weight $2i-1$, and only then
all such edges of weight $2i$.  This requires a somewhat
careful implementation, and we omit the technical but
straightforward details. See Figure \ref{fig:mst:wavefront}.

Note that the wavefront propagation can be done without
constructing $\Gadj$ in advance, and one can compute parts
of $\Gadj$ on the fly as needed (i.e., we need to compute
only the parts of $\Gadj$ that are covered by the wavefront,
or are about to be covered). Of course, in the worst case,
the whole graph $\Gadj$ would be computed, which takes
$O(n^2 \log{n})$ time (this corresponds to computing the
whole arrangement $\Arr(L)$).

\begin{lemma}
    Given a set $L$ of $n$ lines, and a set $P$ of $n$ points, a
    minimum spanning tree $\Topt(P,L)$ of $P$ under the crossing
    metric $\IDL$ can be computed in $O(n^2\log{n})$ time.

    \lemlab{bf:prop}
\end{lemma}

\begin{remark}
    In the algorithm of \lemref{bf:prop} we did not use the fact
    that $L$ is a set of lines. The same algorithm works for the
    case where $L$ is a set of arcs. However, in this case the
    straight segment between two points is no longer
    necessarily a shortest path, so the edges of the MST are
    no longer line segments, but rather Jordan arcs. (For
    example, imagine that the set $L$ is a single segment and
    we would like to connect two points that are separated by
    this segment. This can be done with no crossing by going
    ``around'' this segment.)
\end{remark}

To be able to generate parts of $\Gadj$ incrementally as we
perform the wavefront propagation, we need a way to compute
the relevant portions of $\Arr(L)$ on the fly.

\begin{theorem}[\cite{hs-oplpa-01-dcg}]
    Let $L$ be a set of $n$ lines, as above, and $P$ a set
    of $m$ points in the plane. Then one can compute, in
    expected $O(\pth{ n+w +m}\alpha(n)\log{n})$ time,
    a Steiner tree $\MSTA$ of $P$, so that the expected
    weight of $\MSTA$ is $O((n+w)\alpha(n)\log{n})$,
    where $w= \Wopt(P,L)$ and $\alpha(n)$ is the inverse of
    the Ackermann function. Alternatively, one can compute
    the $m$ faces that contain the points of $P$ within the
    same time bound.

    \theolab{hs}
\end{theorem}


\begin{lemma}[\cite{a-idapa-91}]
    There exists a spanning tree $\MST'$ of $P$ whose weight
    under $\IDL$ is $O(n\sqrt{n})$; in particular,
    $\Wopt(P,L) = O(n\sqrt{n})$. This bound is tight in the
    worst case (even for the case where the arcs are lines).
    \lemlab{span:tree}
\end{lemma}

In the worst case, the algorithm of \theoref{hs} is inferior
to implicit point-location data structures
\cite{ams-cmfal-98}, which can perform the required implicit
point location in roughly $O(n^{4/3})$ time for $m=n$.
Indeed, by \lemref{span:tree}, the weight of the MST can be
$\Omega(n^{3/2})$ in the worst case, and this weight governs
the time the algorithm of \theoref{hs} needs to compute the
relevant portions of the arrangement.  However, the running
time of the algorithm of \theoref{hs} is sensitive to the
overall weight of the MST, and this is crucial for our
algorithm.

\begin{lemma}
    Given a set $L$ of $n$ lines, a set $P$ of $n$ points,
    and a parameter $i$, one can compute, in expected
    $O(i(n+\Wopt)\alpha^2(n)\log{n})$ time, a minimum
    spanning forest of $P$ under the crossing metric $\IDL$
    that connects all the points of $P$ at distance at most
    $2i$ from each other, where $\Wopt = \Wopt(P,L)$.

    \lemlab{propagate}
\end{lemma}

\begin{proof}
    The wavefront propagation on $\Gadj$ can be done using
    an implicit representation of the arrangement
    $\Arr(L)$.  Namely, we compute the set $\F_i$ of faces
    of $\Arr(L)$ at distance $i$ from the points of $P$.
    Observe that the complexity of $\F_i$ is $O((n+\Wopt)
    i\alpha( n/i))$. Indeed, the points of $P$ can be
    connected by an arc $\gamma = \Topt(P,L)$ having
    $O(\Wopt)$ intersections with the lines of $L$. Let
    $\Arr'$ be the arrangement resulting from $\Arr$ by
    creating a tiny gate at each intersection of $\gamma$
    with the lines of $L$. The zone of $\gamma$ in $\Arr(L)$
    corresponds to a single face $F$ of $\Arr'$, and the
    faces of $\F_i$ are contained in the set of faces at
    distance $\leq i$ from $F$. By \cite{bds-lric-95}, the
    complexity of this region is $O((n+\Wopt) i\alpha(
    n/i))$ (this bounds the total number of vertices at
    distance $\leq i$ from the face $F$).

    Clearly, the faces of $\F_i$ have a spanning tree of
    weight $O((n+\Wopt) i\alpha( n/i))$, and so they can be
    computed in an online fashion in $O((n+\Wopt) i\alpha^2
    ( n) \log{n})$ expected time, by \theoref{hs}.
\end{proof}

\begin{figure*}[tb]
    \vspace{-0.5cm}
    \begin{center}
        \fbox{
           \begin{program}
               \> \>{\large{\sc{Algorithm}}}\ \ \
               \Proc{\ApproxMST{}($P, L, \eps$)} \\
               \> \>{\tt Input:} {\rm{A set of points $P$,
                     a set of lines $L$, and an approximation
                     parameter $\eps$}}\\
               \> \>{\tt Output:} A spanning tree of $P$ of
               weight $\leq (1+\eps)\Wopt(P,L)$\\
               \> \Procbegin \\
               \>\> $M \leftarrow $ Approximate the weight
               of the MST
               using the algorithm of \lemref{rough}.\\
               \> \> $l_0 \leftarrow \max \pth{ \frac{\eps
                     M}{\cmindist n \alpha(n)\log^2{n}}, 1}
               $\\
               \> \> Set $F = (P, \emptyset)$ to be
               an empty spanning forest of $P$.\\

               \>\>\PropagateApproxWavefront{}( $P$, $L$,
               $l_0$, $F$ )\\

               \>\> $i \leftarrow 1$\\
               \> \> \While $F$ is not a single connected
               component \Do\\
               \>\>\> $l_i \leftarrow l_{i-1} \cdot 2$\\
               \>\>\>\PropagateApproxWavefront{}( $P$, $L$,
               $l_i$, $F$ )\\
               \>\>\> $i \leftarrow i + 1$\\
               \> \> \End \While\\
               \>\>\\
               \>\>\Return $F$ \\
               \>\Endproc{\ApproxMST{}}
           \end{program}
        }
    \end{center}
    \vspace{-0.5cm}
    \caption{Approximating the MST in the Plane}
    \figlab{alg:mst2}
%    \vspace{-0.5cm}
\end{figure*}

\begin{figure*}[tb]
    \vspace{-0.5cm}
    \begin{center}
        \fbox{
        \begin{program}
            \> \>{\large{\sc{Algorithm}}}\ \ \
            \Proc{\PropagateWavefront{}( $P$, $R$, $l$, $F$ )}\\
            \> \>{\tt Input:} ~
            {\rm{$P$ -  set of points}}\\
            \>\>\>\>\>{\rm{$R$ - set of lines}}\\
            \>\>\>\>\>{\rm{$l$ - propagation distance}}\\
            \>\>\>\>\>{\rm{$F$ - current spanning forest}}\\
            \> \>{\tt Output:} An updated forest $F$ with
            any pair of points of distance $\leq 2l$ in a\\
            \>\>\>\>\> single
            connected component\\
            \> \Procbegin \\
            \>\> Initialize the data-structure $D(R)$ of
            \cite{hs-oplpa-01-dcg} for online point-location.\\
            \> \> Set $W_0$ to be the set of faces of
            $\Arr(R)$ that contain points of $P$.\\
            \>\>\>\>Use
            $D(R)$ to compute those faces.\\
            \> \> \For $i=1, \ldots, l$ \Do\\
            \>\>\> $W_i \leftarrow$
            Set of faces of
            $\Arr(R)$ of distance $=i$
            from points of $P$.\\
            \>\>\>\>\>Do wavefront propagation from
            $W_{i-1}$, and use $D(R)$ to retrieve
            \\
            \>\>\>\>\>the faces of interest in $\Arr(R)$. \\
            \>\>\> \If two different wavefronts collide
            \Then\\
            \>\>\>\> Add
            an edge connecting the two corresponding points
            to $F$\\
            \>\>\>\>Merge the corresponding connected
            components.\\
            \>\> \Endfor\\

            \>\Endproc{\PropagateWavefront{}}
        \end{program}
        }
    \end{center}
    \vspace{-0.5cm}
    \caption{Doing the wavefront propagation}
    \figlab{alg:propagate}
    \vspace{0.5cm}
\end{figure*}

\begin{figure*}[tb]
    \vspace{-0.5cm}
    \begin{center}
        \fbox{
        \begin{program}
            \> \>{\large{\sc{Algorithm}}}\ \ \
            \Proc{\PropagateApproxWavefront{}( $P$, $L$, $l$, $F$ )}\\
            \> \>{\tt Input:} ~
            {\rm{$P$ -  set of points}}\\
            \>\>\>\>\>{\rm{$L$ - set of lines}}\\
            \>\>\>\>\>{\rm{$l$ - starting propagation distance}}\\
            \>\>\>\>\>{\rm{$F$ - current spanning forest}}\\
            \> \>{\tt Output:} An updated forest $F$ with
            any pair of points of distance $\leq 2l$ in a\\
            \>\>\>\>\> single
            connected component\\
            \> \Procbegin \\
            \>\>\> Compute a random sample $R$ by choosing
            each line of $L$ into the sample with
            \\
            \>\>\> \> \> probability $f(l) = 128
            \cSampleProb
            \frac{\log{n}}{l\eps^2}$\\
            \> \> \> {\tt /* Approximate the wavefront propagation
               in $\Arr(L)$ by doing}\\
            \>\>\>\> {\tt it (exactly) in $\Arr(R)$ */}\\
            \> \> \> \PropagateWavefront{}( $P$, $R$, $\cFarEnough
            \log{n}/\eps^2$, $F$ )\\
            \>\>\>\>\>\>{\tt /*
            $\cFarEnough$  is an appropriate constant */}\\
            \>\Endproc{\PropagateApproxWavefront{}}
        \end{program}
        }
    \end{center}
    \vspace{-0.5cm}
    \caption{Doing the approximate wavefront propagation}
    \figlab{alg:propagate:x}
    \vspace{0.5cm}
\end{figure*}

%-------------------------------------------------------
%-------------------------------------------------------
\section{Approximation Algorithm for the Planar Case}
\seclab{speedup}

The algorithm is depicted in \figref{alg:mst2},
\figref{alg:propagate}, and \figref{alg:propagate:x}. We next
describe the algorithm and its analysis in more detail.

\lemref{propagate} provides us with an algorithm for
approximating the MST in roughly quadratic time in the worst
case.  To get a near linear running time, we simulate the
continuous Dijkstra algorithm by performing the wavefront
propagation in an approximate fashion.
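
The control flow of \ApproxMST{} can be sketched in Python as
follows. This is a simplified illustration under stated
assumptions, not the algorithm of the figures: the rough
weight estimate $M$, the constant $\cmindist$, and the
$\alpha(n)$ factor are replaced by a caller-supplied starting
distance \texttt{l0}, and the hypothetical callback
\texttt{propagate} stands in for \PropagateApproxWavefront{}.
For testing, \texttt{exact\_propagate} uses exact pairwise
distances and merges every pair at distance at most $2l$,
lightest first, so the doubling loop then reproduces
Kruskal's algorithm.

\begin{verbatim}
```python
import itertools
from typing import Callable, List, Tuple

Edge = Tuple[int, int]


class UnionFind:
    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.components = n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x: int, y: int) -> bool:
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        self.parent[rx] = ry
        self.components -= 1
        return True


Propagate = Callable[[int, UnionFind, List[Edge]], None]


def approx_mst_driver(n: int, l0: int, propagate: Propagate) -> List[Edge]:
    """Doubling loop of ApproxMST: l0, 2*l0, 4*l0, ... until F is connected."""
    forest = UnionFind(n)
    edges: List[Edge] = []
    l = max(l0, 1)
    propagate(l, forest, edges)
    while forest.components > 1:
        l *= 2                      # l_i <- 2 * l_{i-1}
        propagate(l, forest, edges)
    return edges


def exact_propagate(dist: List[List[int]]) -> Propagate:
    """Test oracle (hypothetical): connect all pairs at distance <= 2l, lightest first."""
    n = len(dist)
    pairs = sorted(itertools.combinations(range(n), 2), key=lambda e: dist[e[0]][e[1]])

    def propagate(l: int, forest: UnionFind, edges: List[Edge]) -> None:
        for p, q in pairs:
            if dist[p][q] > 2 * l:
                break               # pairs are sorted, so no later pair qualifies
            if forest.union(p, q):
                edges.append((p, q))
    return propagate
```
\end{verbatim}

The loop runs $O(\log(n/l_0))$ rounds, since crossing
distances never exceed $n$.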

\begin{defn}
    A metric $\D'$ {\em $\eps$-approximates} a metric $\D$
    if for any $p,q,r,s \in P$ such that $\D'(p,q) \leq
    \D'(r,s)$, we have $\D(p,q) \leq (1+\eps)\D(r,s)$.
\end{defn}

\begin{defn}
    For a set $F$ of segments in the plane and a metric $\D$,
    let $\weight_\D(F) = \sum_{e \in F} \D(e)$ denote the
    total weight of $F$ under the metric $\D$.
\end{defn}

The proof of the following lemma is straightforward, and is
included only for the sake of completeness.
\begin{lemma}
    Let the metric $\D'$ be an $\eps$-approximation to the
    metric $\D$ over a point-set $P$. Let $T'$ be an MST of
    $P$ under $\D'$. Then, $\weight_\D(T') \leq
    (1+\eps)\weight_\D(T)$, where $T$ is an MST of $P$
    under $\D$.

    \lemlab{approx:mst}
\end{lemma}

\begin{proof}
    Let $e_1', \ldots, e_{n-1}'$ be the edges of $T'$
    sorted by their weight $\D'(e_1') \leq \ldots \leq
    \D'(e_{n-1}')$. Let $T_0 = T$, and let $T_i$ be the tree
    resulting from removing the heaviest edge (according to
    $\D'$) from the cycle present in $T_{i-1} \cup
    \brc{e_i'}$ (if $e_i'$ is already in $T_{i-1}$ we do
    nothing, and set $e_i = e_i'$). Let $e_i$ denote this
    removed edge. Clearly, $\D'(e_i') \leq \D'(e_i)$ and, by
    definition, $\D(e_i') \leq (1+\eps) \D(e_i)$. Namely, we
    replaced an edge $e_i$ by an edge $e_i'$ which is heavier
    by at most a factor of $(1+\eps)$. At the end of the
    process $T_{n-1}$ is just $T'$, and $\weight_\D(T') \leq
    \sum_{i=1}^{n-1}(1+\eps)\D(e_i) \leq
    (1+\eps)\weight_\D(T)$.
\end{proof}
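
\lemref{approx:mst} can be checked numerically on small
instances. The Python sketch below (the function names are
ours) runs Kruskal's algorithm under an approximating
distance $\D'$ and evaluates the resulting tree under $\D$.
As an illustrative choice, $\D'(p,q) = \lfloor \D(p,q)/k
\rfloor$; if all distances are at least $l$, then
$\D'(p,q) \leq \D'(r,s)$ implies $\D(p,q) < \D(r,s) + k \leq
(1+k/l)\D(r,s)$, so the lemma applies with $\eps = k/l$.
This $\D'$ need not be a metric, but the proof above only
compares $\D'$ values.

\begin{verbatim}
```python
import itertools
from typing import Callable, List, Tuple

Dist = Callable[[int, int], int]


def kruskal(n: int, dist: Dist) -> List[Tuple[int, int]]:
    """MST of the complete graph on {0..n-1}, using only comparisons of dist."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # sorted() is stable, so ties keep the combinations() order.
    for p, q in sorted(itertools.combinations(range(n), 2), key=lambda e: dist(*e)):
        rp, rq = find(p), find(q)
        if rp != rq:
            parent[rp] = rq
            tree.append((p, q))
    return tree


def weight(tree: List[Tuple[int, int]], dist: Dist) -> int:
    """weight_D(F): total weight of the edge set under dist."""
    return sum(dist(p, q) for p, q in tree)


def bucketed(dist: Dist, k: int) -> Dist:
    """D'(p, q) = floor(D(p, q) / k): consistent with D up to an additive k."""
    return lambda p, q: dist(p, q) // k
```
\end{verbatim}

For three points with $\D(0,1)=9$, $\D(1,2)=10$, and
$\D(0,2)=11$, all three $\D'$ values with $k=3$ coincide, so
the tree chosen under $\D'$ may have weight $20$ instead of
the optimal $19$, which is still within the factor
$1 + 3/9$.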

\lemref{approx:mst} suggests that if we can find an
approximate metric that is computationally cheaper than
$\IDL(\cdot,\cdot)$, then we can use it to compute the MST.
A natural way to do so is to randomly sample a subset $R
\subseteq L$, and use $\IDX{R}( \cdot, \cdot )$ as the
approximate metric. However, it is easy to verify that, in
general, $\IDX{R}$ is an $\eps$-approximate metric to $\IDL$
only if $L = R$.

\begin{defn}
    Let $\D',\D$ be two metrics, and let $\eps > 0$ and $l$
    be prescribed parameters.  The metric $\D'$ is an {\em
       $(\eps,l)$-approximation} to $\D$ if for any
    $p,q,r,s \in P$ such that (i) $\D( p,q), \D(r,s) \geq
    l$, and (ii) $\D'(p,q) \leq \D'(r,s)$, we have $\D(p,q)
    \leq (1+\eps)\D(r,s)$.

    Namely, $\D'$ $\eps$-approximates $\D$ for distances not
    smaller than $l$.
\end{defn}
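
For small point sets, the two approximation notions defined
above can be tested by brute force. The following Python
sketch (the name \texttt{is\_eps\_l\_approx} is ours) checks
the $(\eps,l)$-approximation property over all pairs of
pairs; with $l=0$ it checks the plain $\eps$-approximation.
It assumes that pairs are unordered pairs of distinct points.

\begin{verbatim}
```python
import itertools
from typing import Callable, Iterable

Dist = Callable[[int, int], float]


def is_eps_l_approx(points: Iterable[int], d: Dist, d_prime: Dist,
                    eps: float, l: float = 0.0) -> bool:
    """Brute-force O(m^2) test over the m pairs with d >= l.

    Returns False iff some pairs (p, q), (r, s) with d(p,q), d(r,s) >= l
    satisfy d_prime(p, q) <= d_prime(r, s) but d(p, q) > (1 + eps) * d(r, s).
    """
    pairs = [pq for pq in itertools.combinations(points, 2) if d(*pq) >= l]
    for (p, q), (r, s) in itertools.product(pairs, repeat=2):
        if d_prime(p, q) <= d_prime(r, s) and d(p, q) > (1 + eps) * d(r, s):
            return False
    return True
```
\end{verbatim}

With the trivial $\D' \equiv 0$, the property can hold only
when all distances of at least $l$ lie within a factor of
$1+\eps$ of each other, which is why the threshold $l$
matters.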

681:

682: \begin{defn}

683:     For $l, \eps$, let $\nu(l, \eps ) = \min \pth{ 128

684:        \cSample \frac{\log{n}}{l\eps^2}, 1 }$, where

685:     $\cSample$ is an appropriate constant. Let $\RS( L, l,

686:     \eps)$ be a random subset of $L$ generated by picking

687:     each line of $L$ independently with probability

688:     $\nu(l,\eps)$.

689:

690:     Let $\rho(l, \eps) = \nu(l, \eps) l$; whenever $\nu(l,\eps) < 1$, we have $\rho(l,\eps) = 128 \cSample

691:     \frac{\log{n}}{\eps^2}$. The value $\rho(l,\eps)$ is the

692:     expected crossing distance in $\Arr(\RS(L, l, \eps))$

693:     between two points $p, q \in P$ such that $\IDL(p,q) =

694:     l$.

695:     \deflab{def:sample}

696: \end{defn}
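The sampling step can be sketched as follows, assuming Python and the natural logarithm. The text does not fix the constant $\cSample$, so it appears here as a hypothetical parameter {\tt c\_samp} with default $1$. The sampling probability is capped at $1$ so that it is a valid probability.

```python
import math
import random


def nu(l, eps, n, c_samp=1.0):
    """Sampling probability nu(l, eps) = min(128 c_samp log n / (l eps^2), 1)."""
    return min(128 * c_samp * math.log(n) / (l * eps ** 2), 1.0)


def rho(l, eps, n, c_samp=1.0):
    """rho(l, eps) = nu(l, eps) * l, the expected sampled distance at distance l."""
    return nu(l, eps, n, c_samp) * l


def random_subset(lines, l, eps, c_samp=1.0, rng=random):
    """RS(L, l, eps): keep each line independently with probability nu(l, eps)."""
    p = nu(l, eps, len(lines), c_samp)
    return [line for line in lines if rng.random() < p]
```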

697:

698: \begin{lemma}

699:     Let $L$ be a set of $n$ lines in the plane, $l$ a

700:     positive integer, $\eps >0$, and let $R = \RS(L,

701:     l, \eps)$ be a random subset of $L$.

702:

703:     For any two points $p,q \in P$ with $\IDL(p,q) \geq l$,

704:     we have

705:     \[

706:     \IDL(p,q) \leq \frac{1}{\nu(l,\eps)(1-\eps/4)}\cdot \IDX{R}(p,q)

707:     \leq (1+\eps)\IDL(p,q),

708:     \]

709:     with probability $\geq 1-n^{-\cSample}$.

710:

711:     Furthermore, $\IDX{R}( \cdot, \cdot)$ is an

712:     $(\eps,l)$-approximation to $\IDL(\cdot, \cdot)$ with

713:     high probability.

714:

715:     \lemlab{good:estimate}

716: \end{lemma}

717:

718:

719: \begin{proof}

720:     Assume $\nu(l,\eps) < 1$ (otherwise $R = L$ and the claim is immediate), and let $X_{p q} = \IDX{R}(p,q)$. We have

721:     \begin{eqnarray*}

722:         \mu = E[ X_{p q} ] = \IDL(p,q)\cdot \nu( l, \eps )

723:         = 128 \IDL(p,q) \cSample \frac{\log{n}}{l \eps^2}

724:         \geq \frac{128 \cSample

725:            \log{n}}{\eps^2}.

726:     \end{eqnarray*}

727:

728:     By the Chernoff inequality \cite{mr-ra-95,mps-lpvaa-98}, we have that

729:     \begin{eqnarray*}

730:         \Pr \pbrc{ \cardin{X_{p q} - \mu} > \frac{\eps}{4}\mu} &\leq& 2

731:         \pth{ \frac{e^{\eps/4}}{{\pth{1 + \frac{\eps}{4}}^{1+

732:                     \eps/4}}}}^\mu

733:         = 2 \exp \pth{\mu \pth{ \frac{\eps}{4} -

734:               \pth{1+\frac{\eps}{4}}\log \pth{ 1 + \frac{\eps}{4}}}} \\

735:         &\leq& 2 \exp \pth{\mu \pth{

736:               \frac{\eps}{4} - \pth{1+\frac{\eps}{4}}

737:               \pth{ \frac{\eps}{4} - \frac{\eps^2}{32}}}}\\

738:         &\leq& 2

739:         \exp \pth{-\mu \frac{\eps^2}{64}}

740:         \leq

741:         2\exp \pth{- \frac{128 \cSample

742:               \log{n}}{\eps^2} \cdot \frac{\eps^2}{64}}

743:         = 2n^{-2\cSample} \leq n^{-\cSample},

744:     \end{eqnarray*}

745:     since $\log(1+x) \geq x - x^2/2$ for $0 \leq x \leq 1$, and $0 < \eps \leq 1$. In

746:     particular, this implies that with high probability

747:     $\mu(1-\eps/4) \leq X_{p q} \leq \mu

748:     (1+\eps/4)$. Namely, with high probability we have

749:     \begin{eqnarray*}

750:         \IDL(p,q) &\leq& \frac{X_{p q}}{\nu(l, \eps )(1-\eps/4)}

751:         \leq

752:         \frac{\nu(l, \eps )(1+\eps/4)}{\nu(l, \eps )(1-\eps/4)} \IDL(p,q)

753:         =

754:         \frac{1+\eps/4}{1-\eps/4} \IDL(p,q) \\

755:         &\leq &

756:         (1+\eps)\IDL(p,q).

757:     \end{eqnarray*}

758:

759:     Consider now four points $p,q,r,s \in P$, such that

760:     $\IDL(p,q), \IDL(r,s) \geq l$ and $\IDX{R}(p,q) \leq

761:     \IDX{R}(r,s)$. By the above discussion, we have with

762:     high probability

763:     \[

764:     \IDL(p,q) \cdot \nu(l,\eps) (1-\eps/4) \leq \IDX{R}(p,q)

765:     \leq \IDX{R}(r,s) \leq (1+\eps) \IDL(r,s) \cdot

766:     \nu(l,\eps)(1-\eps/4).

767:     \]

768:     Dividing by $\nu(l,\eps)(1-\eps/4)$ yields $\IDL(p,q) \leq (1+\eps)\IDL(r,s)$. Hence, by the union bound over all pairs,

769:     $\IDX{R}(\cdot, \cdot)$ is an $(\eps,l)$-approximation

770:     to $\IDL(\cdot, \cdot)$ with probability $\geq 1 - {n

771:        \choose 2} n^{-\cSample}$.

772: \end{proof}
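The estimator analyzed in \lemref{good:estimate} can be simulated directly. The following is a minimal sketch with three assumptions: it is written in Python, lines are represented by hypothetical triples $(a,b,c)$ for the line $ax+by=c$, and no point lies on a line. The crossing distance is the number of lines separating the two points. The estimate is $\IDX{R}(p,q)/\nu$, where the sample $R$ keeps each line independently with probability $\nu$.

```python
import random


def side(line, pt):
    """True iff pt is on the positive side of the line a*x + b*y = c."""
    a, b, c = line
    return a * pt[0] + b * pt[1] > c


def crossing_distance(lines, p, q):
    """Number of lines separating p and q, i.e., the crossing distance."""
    return sum(side(ln, p) != side(ln, q) for ln in lines)


def estimate_distance(lines, p, q, prob, rng=random):
    """Draw R by keeping each line with probability prob; return D_R(p, q) / prob."""
    sample = [ln for ln in lines if rng.random() < prob]
    return crossing_distance(sample, p, q) / prob
```

As an example, take $20000$ random vertical lines, about half of which separate $p=(0,0)$ from $q=(1,0)$, and set $\nu = 0.2$. The scaled estimate is then typically within a few percent of the true crossing distance, in line with the Chernoff bound above.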

773:

774: \lemref{good:estimate} and \lemref{approx:mst} suggest computing

775: the MST by drawing an appropriate random sample

776: $R$ (using a threshold $l$), and deploying the algorithms of

777: \secref{cont:dijkstra} to compute the MST of $P$ in

778: $\Arr(R)$. Such an MST would be an approximate MST. There

779: are two main problems with this approach: (i) for short

780: distances (e.g., $l=1$), just starting the wavefront

781: propagation (i.e., \lemref{propagate}) is prohibitively

782: expensive (it takes roughly $O(\Wopt(P,L))$ time, which might

783: be $\Omega(n^{3/2})$); and (ii) for long distances (i.e., $\geq

784: i \cdot l$), the wavefront propagation again becomes

785: prohibitively expensive (i.e., $\Ot(ni)$ time) by

787:

788: \begin{corollary}

789:     Let $U$ be the total weight of all the edges of $\T$

790:     having weight less than $\eps\Wopt(P,L)/(10n)$. Then $U

791:     \leq \eps\Wopt(P,L)/10$, since $\T$ has fewer than $n$ edges.

792:     \corlab{idiotic}

793: \end{corollary}

794:

795: \lemref{rough} describes how we can approximate $\Wopt(P,L)$

796: to within a polylogarithmic factor using random sampling in

797: near linear time.  Since the algorithm of this lemma is very

798: similar to the techniques used below, we defer its

799: description to the appendix.  Equipped with such an

800: approximation $M$, we know by \corref{idiotic} that we do

801: not ``care'' about edges of the MST of length smaller than

802: $l_0 = O(\eps M/(n \polylog(n)))$.  In particular, we can

803: generate a random sample $R_0$ which provides an

804: $(\eps,l_0)$-approximation to $\IDL(\cdot, \cdot)$. Thus, we

805: can approximate the MST by computing $\Topt(P,R_0)$, the MST of

806: $P$ in $\Arr(R_0)$.

807:

808: This, however, does not address the second problem. Indeed,

809: computing $\Topt(P,R_0)$ might still be too

810: expensive, as the following lemma testifies.

811:

812:

813: \begin{lemma}

814:     Given a set $L$ of $n$ lines, a set $P$ of $n$ points,

815:     and parameters $l, i, \eps, U$ such that $l

816:     =\Omega\pth{ \Wopt(P,L)/(n U) }$, let $R =\RS(L, l,

817:     \eps)$ be a random sample of $L$.  Then, one can

818:     compute, in expected $\Ot (i U n)$ time, a minimum

819:     spanning forest of $P$ under the crossing metric

820:     $\IDX{R}$ that connects all the points of $P$ at

821:     distance at most $2i$ from each other.

822:

823:     \lemlab{propagate:ext}

824: \end{lemma}

825:

826: \begin{proof}

827:     Let $X$ denote the size of $R$. Clearly, the expected

828:     value of $X$ is

829:     \[

830:     E[X] = n \nu(l, \eps) = 128 n \cSample

831:     \frac{\log{n}}{l\eps^2} = O \pth{ \frac{U n^2 \log

832:           n}{\eps^2\Wopt(P,L)}},

833:     \]

834:     by \defref{def:sample}.  Let $\gamma = \Topt(P,L)$, and

835:     let $Y = \weight(\gamma, R)$. Since each line of $L$ survives in $R$ independently with probability $\nu(l,\eps)$,

836:     \[

837:     E[Y] = \weight(\gamma,L)\, \nu(l, \eps) = \Wopt(P,L) \nu(l, \eps) = O \pth{

838:        \frac{U n \log n}{\eps^2}}.

839:     \]

840:     Since $\gamma$ is a spanning tree of $P$, we have $\Wopt(P,R) \leq Y$, and thus $E[ \Wopt(P,R) ] \leq E[ Y] = O \pth{ \frac{ U n

841:           \log n}{\eps^2}}$.  The running time bound now

842:     follows immediately by applying the algorithm of

843:     \lemref{propagate} to $P$ and $R$.

844: \end{proof}

845:

846: The algorithm of \lemref{propagate:ext} first performs

847: wavefront propagation for distances in $\Arr(R)$ which are

848: smaller than $\rho( l, \eps)$.  For such distances $\Arr(R)$

849: {\em does not provide} a reliable estimate (i.e., ordering) of

850: the crossing distances between points.  However, once the

851: distances propagated exceed $\rho(l,\eps)$, we know by

852: \lemref{good:estimate} that the distances are now

853: $(\eps,l)$-approximated correctly. The main point of

854: the algorithm of \lemref{propagate:ext} is that it

855: has near linear running time for small values of

856: $U$ and $i$.

857:

858: Combining \lemref{propagate:ext} with \corref{idiotic},

859: we can compute a spanning forest for the

860: ``short'' edges of $\Topt(P,L)$ in near linear time.

861:

862: \begin{lemma}

863:     Given a set $P$ of $n$ points in the plane and a set

864:     $L$ of $n$ lines in the plane, one can compute a

865:     spanning forest $F$ of $P$ such that the weight of $F$

866:     is $\leq \eps\Wopt(P,L)/10$. Furthermore, for some $l = \Omega( \Wopt(P,L)\eps/(n

867:     \log^3 n) )$, every pair of points of $P$ at distance at most $l$

868:     from each other belongs to the same connected component of

869:     $F$. The running time of this algorithm is $\Ot \pth{ n

870:     }$.

871:

872:     \lemlab{start:forest}

873: \end{lemma}

874: \begin{proof}

875:     Using the algorithm of \lemref{rough}, compute, in

876:     $\Ot(n)$ time, a number $M$ such that $\Wopt(P,L) \leq M

877:     = O(n \alpha(n) \log^2{n} + \Wopt(P,L) \alpha(n) \log

878:     n)$. In particular, let

879:     \begin{equation}

880:         \lshort = \frac{\eps M}{\cAnother n\log^3{n}} \leq

881:         \frac{\eps}{40n}\Wopt(P,L),

882:         \eqlab{specify:l}

883:     \end{equation}

884:     for $\cAnother$ large enough. On the other hand,

885:     $\lshort = \Omega( \Wopt(P,L)/ ( U n) )$, where $U =

886:     O((\log^3{n})/\eps)$.

887:

888:     We now compute a spanning forest for $P$, using

889:     \lemref{propagate:ext} with $\lshort$ and $U$ as specified and

890:     $i= 2\rho(\lshort,\eps)$. The running time of this algorithm

891:     is

892:     \[

893:     \Ot \pth{i U n} = \Ot \pth{ \rho(\lshort,\eps)\, U n } = \Ot \pth{

894:        \frac{\log^4{n}}{\eps^3} \cdot n } = \Ot\pth{n}.

895:     \]

896:

897:     Clearly, $F$ has at most $n-1$ edges, and every pair of points

898:     of $P$ at distance $\leq \lshort$ lies in the same connected

899:     component of $F$ by \lemref{good:estimate}.

900:

901:     Furthermore, for any edge $p q$ of $F$, we have that

902:     with high probability

903:     $\IDL(p,q) \leq 2(1+\eps)\lshort \leq 4\lshort$ by

904:     \lemref{good:estimate}. In particular, $\weight(F, L)

905:     \leq 4n\lshort \leq (\eps/10)\Wopt(P,L)$.

906: \end{proof}

907:

908: \lemref{start:forest} implies that we can compute a cheap

909: spanning forest of $P$ in near linear time that ``captures''

910: all the light edges of the MST. Next, we can compute the

911: rest of the edges of the MST using \lemref{propagate:ext}

912: repeatedly.

913: \begin{lemma}

914:     Given a set $P$ of $n$ points in the plane, a set

915:     $L$ of $n$ lines in the plane, a parameter $\eps>0$, and

916:     a spanning forest $F$ of $P$ such that every pair of

917:     points of $P$ at distance $\leq l$ belongs to the same

918:     connected component of $F$, where $l = \Omega(

919:     \Wopt(P,L)\eps/(n \log^3 n) )$, one can compute a

920:     spanning forest $F'$ of $P$ such that every pair of points of

921:     $P$ at distance $\leq 2l$ belongs to the same connected

922:     component of $F'$.  The forest $F'$ can be computed in

923:     $\Ot \pth{ n}$ expected time.

924:

925:     \lemlab{round:forest}

926: \end{lemma}

927:

928: \begin{proof}

929:     We use the same algorithm as in \lemref{start:forest}, with

930:     the modification that, when calling the algorithm of

931:     \lemref{propagate:ext}, we pass it $F$, so that the

932:     algorithm ignores generated edges whose endpoints lie in the same

933:     connected component of $F$. Again, it is clear that only

934:     edges of length between $l$ and $2 (1+\eps)l$ would be

935:     added to the spanning forest. The exact details of how

936:     to specify $U$ and $i$ are similar to

937:     \lemref{start:forest}, and are omitted.

938: \end{proof}

939:

940: Our algorithm for computing the MST starts by applying

941: \lemref{start:forest}. This results in a spanning forest

942: $F_0$ of the points of $P$, and a value $\lshort$ as

943: specified by \eqref{specify:l}. We then apply

944: \lemref{round:forest} repeatedly, $O(\log{n})$ times; the

945: $i$-th iteration handles distances between $2^{i-1}\lshort$

946: and $2 \cdot 2^i (1+\eps)\lshort$, for $i=1, \ldots,

947: O(\log{n})$, until all distances $\leq n$ are handled. Namely,

948: in the $i$-th iteration, we compute a spanning forest $F_i$

949: that connects all points at distance $\leq 2^i\lshort$ from each other,

950: by applying \lemref{round:forest} with $F_{i-1}$ as our

951: ``starting'' spanning forest.
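The schedule of the iterations can be made concrete as follows. This is a minimal sketch, assuming Python, and its names are hypothetical. It only lists the distance ranges handled by each round and does not build the forests. It also makes explicit that covering all distances up to $n$ takes $\lceil \log_2(n/\lshort) \rceil$ rounds when $\lshort < n$.

```python
def iteration_ranges(l_short, max_dist, eps):
    """Distance ranges handled by the doubling rounds (sketch).

    Round i (i = 1, 2, ...) handles crossing distances from 2**(i-1) * l_short
    up to 2 * 2**i * (1 + eps) * l_short and yields a forest F_i connecting every
    pair at distance <= 2**i * l_short.  Rounds stop once that threshold reaches
    max_dist (at most n when there are n lines).
    """
    if l_short <= 0:
        raise ValueError("l_short must be positive")
    ranges = []
    i = 1
    while 2 ** (i - 1) * l_short < max_dist:  # F_{i-1} does not yet cover max_dist
        ranges.append((i, 2 ** (i - 1) * l_short, 2 * 2 ** i * (1 + eps) * l_short))
        i += 1
    return ranges
```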

952:

953: Clearly, the expected running time of the resulting

954: algorithm is $\Ot \pth{ n }$.  What is less clear is that

955: the resulting tree is indeed an $\eps$-approximate MST.

956:

957: \begin{lemma}

958:     With high probability, the tree $T$ computed by the above

959:     algorithm is an $\eps$-MST of $P$ in $\Arr(L)$.

960: \end{lemma}

961:

962: \begin{proof}

963:     All the edges generated by the algorithm of

964:     \lemref{start:forest}, in the first stage of the

965:     algorithm, have total weight $\leq (\eps/10) \Wopt(P,L)$

966:     with high probability, by \lemref{start:forest}.

967:

968:     Let $\Topt(P,L)$ be the optimal spanning tree. If $T$ is

969:     not an $\eps$-approximate MST, then $\weight_{\IDL}(T)>

970:     (1+\eps)\weight_{\IDL}(\Topt)$. In particular, there

971:     must be an edge of $\Topt$ whose insertion into $T$

972:     would result in a substantially lighter spanning tree.

973:     Formally, for an edge $e$, let $T(e)$ be the tree

974:     resulting from $T$ by inserting $e$ into $T$, and

975:     removing from $T$ the heaviest (according to $\IDL$)

976:     edge on the new cycle that was created, and let

977:     $\out(T,e)$ denote this ``ejected'' edge.

978:

979:     Arguing as in the proof of \lemref{approx:mst}, it must

980:     be that there exists an edge $\phi=p q$ of $\Topt$ such that

981:     \[

982:     (1+\eps)\IDL(\phi) < \IDL( \out(T,\phi) ),

983:     \]

984:     and $\IDL(\phi) > \Wopt(P,L)/(20n)$.

985:

986:     Let $i$ be the index such that $2^{i-1} \lshort \leq

987:     \IDL(\phi) \leq 2^i \lshort$. With high probability, we

988:     know that after the $i$-th iteration $p$ and $q$ are in

989:     the same connected component of $F_i$. Assume that $p$

990:     and $q$ were not in the same connected component of

991:     $F_{i-1}$ (the other case is easier and as such is

992:     omitted).

993:

994:     Let $T''$ be the spanning forest maintained by the

995:     algorithm just after $p$ and $q$ first belong to the

996:     same connected component.  With high probability, for

997:     any edge $e''$ of $T''$, we have $\IDL(e'') \leq

998:     (1+\eps)\IDL(\phi)$, since the random sample $R_i$ we

999:     used in the $i$-th iteration is a $(2^{i-1}

1000:     \lshort,\eps)$-approximation to $\IDL$.

1001:

1002:     But then, the algorithm could not have added

1003:     $\out(T,\phi)$ to the spanning forest $T''$, as all the

1004:     edges on the cycle in $T'' \cup \brc{\phi}$ are lighter

1005:     than $(1+\eps)\IDL( \phi)$. A contradiction.

1006: \end{proof}

1007:

1008: We summarize our result:

1009: \begin{theorem}

1010:     Given a set $P$ of $n$ points in the plane, a set $L$ of

1011:     $n$ lines, and a parameter $\eps > 0$, one can

1012:     compute a spanning tree $T$ of $P$, in $\Ot \pth{ n }$

1013:     expected time, such that $\weight(T, L) \leq

1014:     (1+\eps)\Wopt(P,L)$. The result is correct with high

1015:     probability.

1016: \end{theorem}

1017:

1018:

1019:

1020:

1021:

1022:

1023:

1024:

1025:

1026:

1027:

1028:

1029:

1030:

1031: %----------------------------------------------------------------

1032: %----------------------------------------------------------------

1033:

1034: \section{Approximation Algorithms for the Intersection

1035:    Metric via Embeddings}

1036:

1037: \seclab{embed}

1038:

1039: Let $P=\brc{p_1, \ldots, p_n}$ be a given set of $n$ points,

1040: and $L = \brc{l_1, \ldots, l_m}$ be a set of $m$ lines,

1041: where $m= n^{O(1)}$.  As mentioned earlier, the metric

1042: $\IDL$ is computationally cumbersome. One possible way to

1043: overcome this problem is to embed this metric into a more

1044: convenient metric (while introducing a small distortion

1045: error).

1046:

1047: In this section, we show a somewhat weaker result. We show

1048: how to embed the points of $P$ into $O(\log^\EmbedDim

1049: n)$-dimensional space in $\Ot(n+m+n^{2/3}m^{2/3})$ time, so

1050: that a specific distance gap in the crossing metric is

1051: mapped to a corresponding gap in the target space.

1052:

1053: We first observe that the crossing distance between two

1054: points $p$ and $q$ can be computed by interpreting this

1055: distance as a Hamming distance on the hypercube in $m$

1056: dimensions induced by the lines. Namely, each line $l$

1057: contributes a coordinate: a point gets a `1' in this

1058: coordinate if it is on one side of $l$, and a `0' if it is

1059: on the other side of $l$. Formally, assuming (after a translation, if needed) that no line of $L$ passes through the origin, let $l^+$ denote the

1060: open half-plane defined by a line $l$ that contains the

1061: origin, and let $l^-$ denote the other open half-plane.  For a point

1062: $p \in \Re^2$, let $\vL(p) = (b_1, \ldots, b_m)$ be an

1063: $m$-bit vector so that $b_i=1$ {\bf iff} $p \in l_i^+$.  It

1064: is easy to verify that $\IDL(p,q) = d_H(\vL(p), \vL(q))$,

1065: where $d_H$ is the Hamming distance.
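The identity $\IDL(p,q) = d_H(\vL(p), \vL(q))$ can be checked mechanically. The following is a minimal sketch, assuming Python and lines given as hypothetical triples $(a,b,c)$ for $ax+by=c$ with $c \neq 0$, so that no line passes through the origin. Here {\tt sign\_vector} computes $\vL(p)$. For each line, the bit of a point records whether the point is on the origin's side. Two points therefore disagree in that bit exactly when the line separates them.

```python
def sign_vector(lines, pt):
    """v_L(pt): bit i is 1 iff pt lies in l_i^+, the open side of line i holding the origin.

    Lines are hypothetical triples (a, b, c) for a*x + b*y = c with c != 0,
    so that no line passes through the origin.
    """
    bits = []
    for a, b, c in lines:
        origin_positive = 0 > c                  # a*0 + b*0 > c
        point_positive = a * pt[0] + b * pt[1] > c
        bits.append(1 if point_positive == origin_positive else 0)
    return bits


def hamming(u, v):
    """Number of coordinates in which u and v differ."""
    return sum(x != y for x, y in zip(u, v))


def crossing_distance(lines, p, q):
    """Number of lines separating p and q."""
    return sum(
        (a * p[0] + b * p[1] > c) != (a * q[0] + b * q[1] > c) for a, b, c in lines
    )
```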

1066:

1067: \remove{

1068: On this mapped set, we can now deploy several approximation

1069: algorithms for points in high-dimension. However, all those

1070: algorithms first need to read all their input, which

1071: requires $\Omega(nm)$ time. A standard technique to reduce

1072: the dimension of the input (and thus its size), while

1073: preserving distances between points, is to use dimension

1074: reduction techniques \cite{jl-elmih-84,im-anntr-98}.  We

1075: next show how one performs a (somewhat restricted) dimension

1076: reduction in an implicit way, by using the underlying

1077: geometry in $o(m n)$ time.

1078: }

1079:

1080: \begin{defn}

1081:     Let $R \subseteq L$, let $f_R:\Re^2 \rightarrow \ZZ$ be

1082:     the mapping that maps a point $p$ in the plane to its

1083:     face ID in the arrangement $\Arr(R)$. Formally, we

1084:     assign for each face in the arrangement $\Arr(R)$ a

1085:     unique integer (say, an integer between $1$ and

1086:     $O(|R|^2)$). The mapping $f_R$ maps a point $p$ in the

1087:     plane to the integer identifying the face that contains

1088:     $p$. (Note that this does not uniquely define

1089:     $f_R(\cdot)$ as we did not specify how we assign the IDs

1090:     to the faces.)

1091:

1092:     For a set $\R = (R_1, \ldots, R_\mu)$ of subsets of $L$,

1093:     let $f_\R:\Re^2 \rightarrow \ZZ^\mu$ be the mapping

1094:     $f_\R(p) = ( f_{R_1}(p), f_{R_2}(p), \ldots,

1095:     f_{R_\mu}(p))$. For two points $p,q \in \Re^2$, let

1096:     $d_H(f_\R(p),f_\R(q))$ be the Hamming distance between

1097:     $f_\R(p)$ and $f_\R(q)$. Namely, this is the number of

1098:     coordinates where the two vectors $f_\R(p)$ and

1099:     $f_\R(q)$ disagree.

1100:

1101:     One can view $f_\R$ as an embedding of the crossing

1102:     metric $\IDL$ into the Hamming space $\ZZ^\mu$.

1103: \end{defn}
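For lines in the plane, each face of $\Arr(R)$ is an intersection of open half-planes and is thus convex. So two points not on any line of $R$ lie in the same face exactly when they are on the same side of every line of $R$. This gives a simple, if slow, way to realize $f_R$ and $f_\R$ on a finite point set. The following is a minimal sketch, assuming Python and lines given as hypothetical triples $(a,b,c)$ for $ax+by=c$. IDs are assigned in order of first appearance, which is one of the many labelings the definition allows. The brute force costs $O(|R|)$ time per point, rather than the arrangement-based bound used below.

```python
def face_ids(lines, points):
    """Hypothetical f_R on a finite point set: one integer ID per face of Arr(lines).

    Points off the lines share a face iff they share the sign vector w.r.t. every
    line, so sign vectors serve as face keys; IDs follow order of first appearance.
    """
    ids, labels = {}, []
    for x, y in points:
        key = tuple(a * x + b * y > c for a, b, c in lines)
        labels.append(ids.setdefault(key, len(ids)))
    return labels


def embed(subsets, points):
    """Hypothetical f_calR: one face-ID coordinate per subset R_1, ..., R_mu."""
    columns = [face_ids(R, points) for R in subsets]
    return [tuple(col[j] for col in columns) for j in range(len(points))]


def hamming(u, v):
    """Number of coordinates in which the two label vectors disagree."""
    return sum(a != b for a, b in zip(u, v))
```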

1104:

1105: \begin{lemma}

1106:     Given a set $P$ of $n$ points in the plane, a set $L$ of

1107:     $m$ lines in the plane, a parameter $\eps > 0$, and a

1108:     parameter $r$, one can compute a set $\R$ of $\mu$

1109:     subsets of $L$ such that, for the embedding $f_\R:\Re^2

1110:     \rightarrow \ZZ^\mu$, with high

1111:     probability, the following holds for any $p,q \in P$:

1112:     \begin{itemize}

1113:         \item If $\IDL(p,q) \leq r$, then $d_H(f_\R(p), f_\R(q))

1114:         \leq M$.

1115:         \item If $\IDL(p,q) \geq (1+\eps)r$, then $d_H(f_\R(p),

1116:         f_\R(q)) \geq (1+\eps)(1-a/\log{n})M$.

1117:     \end{itemize}

1118:     Here, $a$ is an appropriate constant, $M$ is a value that does not depend on $p$ and $q$, and $\mu

1119:     =O(\log^4 n)$.

1120:

1121:     \lemlab{good:embed}

1122: \end{lemma}

1123: \remove{

1124: In the following, we restrict ourselves to the case where

1125: only distances in a certain range are approximately

1126: preserved by the embedding.  Namely, for a prescribed

1127: parameters $r > 0$, $\eps > 0$ we describe a mapping

1128: $f(\cdot)$ so that if a pair of points $p,q$ is in distance

1129: $\leq r$, then it is mapped (with high probability) into a

1130: pair $f(p),f(q)$ having distance $\leq M$, and if $p,q \geq

1131: (1+\eps)$, then the pair $f(p),f(q)$ are in distance

1132: $\geq(1+\eps')M$, where $M$ is an appropriate constant, and

1133: $\eps, \eps'$ are of the same up to the factor of

1134: $(1+O(1)/\log n)$.

1135:

1136:   In this way approximate nearest neighbor

1137: in the original space with error $(1+\eps)$ is be reduced

1138: the $(1+\eps')$-approximate nearest neighbor in the

1139: resulting Hamming space.  For the purpose of using the

1140: nearest neighbor algorithms of~\cite{im-anntr-98} this

1141: ``threshold embedding'' is sufficient,

1142: see~\cite{im-anntr-98} for details.

1143: }

1144:

1145: \begin{proof}

1146:     For the sake of simplicity of exposition, we assume that $m

1147:     / r \geq \log{n}$, where $m=|L|$.  If this does not

1148:     hold, we can add ``fictitious'' lines to $L$ that have

1149:     all the points of $P$ on one side of them. If such a line

1150:     is picked into one of the sets of $\R$, we can ignore it when we

1151:     compute the face IDs.

1152:

1153:     For a parameter $\alpha$ to be specified shortly, let

1154:     $k= \alpha m/r$, let $R$ be a sample of $k$ lines out of $L$

1155:     (performed with replacement), and let $p,q$ be two

1156:     points of $P$.  Let $\rho = \IDL(p,q) /m$.  The

1157:     probability that $p,q$ will be in two different faces of

1158:     $\Arr(R)$ is

1159:     \[

1160:     U(\rho) = 1 - (1-\rho)^k,

1161:     \]

1162:     as this is the probability that at least one of the $k$ sampled lines

1163:     crosses the segment connecting $p$ and $q$.

1164:

1165:     Our target is to approximate the value of $U(\rho)$ so

1166:     that we can decide whether $p,q$ are close or far. Indeed,

1167:     if $U(\rho) \geq U( (1+\eps)r/m )$ then $\IDL(p,q) \geq

1168:     (1+\eps)r$, and if $U(\rho) \leq U( r/m )$ then

1169:     $\IDL(p,q) \leq r$.

1170:

1171:     To do so, we generate a set of subsets $\R = (R_1,

1172:     \ldots, R_{\mu})$, by random sampling as described

1173:     above, where $\mu$ will be specified shortly.  Now we

1174:     consider the quality of the distance approximation

1175:     provided by the embedding\footnote{A similar analysis

1176:        (in the context of Hamming spaces) appeared already

1177:        in~\cite{i-drtpp-00}; in our case, however, we have

1178:        to take more care in the analysis, since we want

1179:        the resulting gap to be very close to $1+\eps$.}.  Let

1180:     $X(p,q)$ denote the random variable which is the number

1181:     of arrangements of $\Arr(R_1), \ldots, \Arr(R_\mu)$ that

1182:     have $p,q$ in different faces.  Note that $X(p,q)$ is

1183:     equal to the Hamming distance between $f_\R(p)$ and $f_\R(q)$,

1184:     and thus it is the distance between the images of $p$ and

1185:     $q$ in the new space.  Clearly, as $\mu$ tends to

1186:     infinity, $X(p,q)/\mu$ tends to $U(\rho)$. Using the

1187:     Chernoff inequality, we can quantify the quality of

1188:     approximation provided by $\mu$.  Specifically, let

1189:     $\plow=U(r/m)$ and $\phigh=U((1+\eps)r/m)$; in the

1190:     following we will make sure that $\phigh<1/2$.  Then,

1191:     from the Chernoff bound~\cite{mr-ra-95,mps-lpvaa-98} it

1192:     follows that for any $\alpha>0$ if $\mu=C \frac{\log

1193:        n}{\plow \alpha^2}$ for some constant $C$, then with

1194:     high probability:

1195:     \begin{itemize}

1196:         \item if $\IDL(p,q) \le r$ then $X(p,q)/\mu \le

1197:         \plow(1+\alpha)$

1198:

1199:         \item if $\IDL(p,q) \ge r(1+\eps)$ then $X(p,q)/\mu

1200:         \ge \phigh(1-\alpha)$

1201:     \end{itemize}

1202:     Therefore, the mapping $f_\R$ converts the distance gap

1203:     $r:(1+\eps)r$ into the gap $\plow(1+\alpha)\mu :

1204:     \phigh(1-\alpha)\mu$.  We next fine-tune $k$ (the size

1205:     of each sample) so that the resulting gap will be as

1206:     large as possible. (Intuitively, the larger the target

1207:     gap is, the easier it is to detect it in later stages.)

1208:     Therefore, in the following we focus on finding $k$ such

1209:     that the ratio

1210:     \[

1211:     \Delta = \frac{\phigh(1-\alpha)\mu}{\plow(1+\alpha)\mu}

1212:     \]

1213:     is as large as possible.  To this end, we observe that

1214:     \begin{eqnarray*}

1215:         \plow &=& U\pth{\frac{r}{m}}=1-\pth{1-\frac{r}{m}}^k

1216:         \leq  1 - e^{-r k/m}\pth{ 1- \frac{(r k/m)^2}{k}}

1217:         =  1 - e^{-\alpha}\pth{ 1- \frac{\alpha^2}{k}} \\

1218:         &\leq&

1219:         1 - e^{-\alpha}\pth{ 1 - \alpha^2}

1220:         \leq \alpha^2 + (1 - e^{-\alpha}) \pth{ 1 -

1221:         \alpha^2}

1222:         \leq \alpha^2 + \alpha \pth{ 1 -

1223:            \alpha^2}

1224:         \leq \alpha(1+\alpha)

1225:     \end{eqnarray*}

1226:     since $\displaystyle \pth{ 1 - \frac{t}{n}}^{n} \geq

1227:     e^{-t} \pth{ 1 - \frac{t^2}{n}}$~\cite{mr-ra-95},

1228:     $k=\frac{\alpha m}{r}$, and $x \geq 1 -e^{-x}$.

1229:     Furthermore,

1230:     \begin{eqnarray*}

1231:         \phigh  &=&  U((1+\eps)r/m) =1-(1-(1+\eps)r/m)^k

1232:         \geq 1-e^{-(1+\eps)r k/m}

1233:         = 1-e^{-(1+\eps)\alpha}\\

1234:         & \geq & (1+\eps)\alpha - ((1+\eps)\alpha)^2

1235:          \geq  (1+\eps)\alpha(1 - (1+\eps)\alpha)

1236:     \end{eqnarray*}

1237:     since $(1-t/n)^{n} \leq e^{-t}$~\cite{mr-ra-95} and

1238:     $1-e^{-x} \ge x-x^2/2 \geq x - x^2$.

1239:

1240:     Therefore

1241:     \begin{eqnarray*}

1242:         \frac{\phigh}{\plow} &\geq&

1243:         \frac{(1+\eps)\alpha(1 -

1244:            (1+\eps)\alpha)}{\alpha(1+\alpha)}

1245:         \geq (1+\eps)(1 - (1+\eps)\alpha)(1-\alpha)

1246:         \geq (1+\eps)(1 - (2+\eps)\alpha),

1247:     \end{eqnarray*}

1248:     since $1/(1+x) \geq (1-x)$.  Thus, if we set $\alpha$ to

1249:     be $1/\log n$, then the distance gap becomes (at least)

1250:     \[

1251:     \Delta = \frac{\phigh(1-\alpha)\mu}{\plow(1+\alpha)\mu} \geq

1252:     (1+\eps)(1-(2+\eps)\alpha)(1-\alpha)^2 \geq

1253:     (1+\eps)\pth{1 - \frac{a}{\log{n}}},

1254:     \]

1255:     where $a$ is an appropriate constant.  Also, note that

1256:     the resulting value of $\plow$ is

1257:     \[

1258:     \plow =

1259:     1-(1-r/m)^k

1260:     \geq 1 - e^{-r k/m} = 1 -e^{-\alpha} \geq \alpha - \alpha^2/2 =

1261:     \Omega(1/\log n)

1262:     \]

1263:     and $\mu=(C\log{n})/(\plow\alpha^2) = O(\log^2{n}/\alpha^2) =

1264:     O(\log^4 n)$.  Finally, since $m/r \geq \log{n}$, we

1265:     have that $k= \alpha(m/r) = (1/\log{n}) (m/r) \geq 1$

1266:     (i.e., the sample size $k$ is at least $1$).

1267: \end{proof}
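The parameter choices in this proof can be evaluated numerically. The following is a minimal sketch, assuming Python and natural logarithms. Rounding $k = \alpha m/r$ down to an integer is an assumption of this sketch. It only lowers the effective value $rk/m$ of $\alpha$, and the derivation above holds verbatim with that effective value. The sketch computes $\plow$, $\phigh$, and the ratio $\phigh/\plow$, which the proof bounds from below by $(1+\eps)(1-(2+\eps)\alpha)$.

```python
import math


def separation_probability(rho, k):
    """U(rho) = 1 - (1 - rho)**k for a sample of k lines drawn with replacement."""
    return 1.0 - (1.0 - rho) ** k


def gap_parameters(n, m, r, eps):
    """Return (alpha, k, z, Z, Z / z) as chosen in the proof (natural logs).

    alpha = 1 / log n and k = alpha * m / r, rounded down here (an assumption of
    this sketch); z = U(r/m) and Z = U((1 + eps) r / m) are the separation
    probabilities of a close pair and of a far pair, respectively.
    """
    alpha = 1.0 / math.log(n)
    k = int(alpha * m / r)
    if k < 1:
        raise ValueError("need m / r >= log n so that each sample has a line")
    z = separation_probability(r / m, k)
    Z = separation_probability((1 + eps) * r / m, k)
    return alpha, k, z, Z, Z / z
```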

1268:

1269: \begin{lemma}

1270:     Given a set $P$ of $n$ points, and a set $L$ of $m$

1271:     lines, one can compute the function $f_\R(\cdot)$, of

1272:     \lemref{good:embed}, for all the points of $P$ in

1273:     $\Ot(m^{2/3}n^{2/3} + m + n)$ expected time.

1274: \end{lemma}

1275:

1276: \begin{proof}

1277:     We have to compute, for each point of $P$, the face that

1278:     contains it in each of the arrangements $\Arr(R_1),

1279:     \ldots, \Arr(R_\mu)$, where $\mu = O( \log^4 n )$;

1280:     equivalently, we compute all the faces of $\Arr(R_1),

1281:     \ldots, \Arr(R_\mu)$ that contain points of $P$. For a

1282:     single arrangement $\Arr(R_i)$, with $|R_i| \leq m$, this can be done in

1283:     $O(m^{2/3}n^{2/3} \log^{2/3}(m/\sqrt{n}) + (m +

1284:     n)\log{m})$ expected time \cite{ams-cmfal-98}. Since

1285:     there are $\mu$ coordinates (i.e., arrangements), the

1286:     result follows.

1287: \end{proof}

1288:

1289: Thus, we have shown how to embed $\IDL$ into the $\mu$-dimensional

1290: Hamming space $\Sigma^{\mu}$ in $\Ot(n+m+n^{2/3}m^{2/3})$

1291: time, mapping a $(1+\eps)$ gap between close and far points

1292: into a gap of size $(1+\eps)(1-O(1)/\log{n})$, where $\mu =

1293: O(\log^4 n)$ and $\Sigma \subseteq \ZZ$ is the set of face

1294: labels we use (i.e., $|\Sigma| = O(m^2)$).  By using standard

1295: embedding techniques (see, e.g.,~\cite{kor-esann-00}), we can

1296: embed the Hamming space $\Sigma^{\mu}$ into $\{0,1\}^D$ with

1297: $D=O(\mu \log |\Sigma| \log^2 n) = O(\log^{6}{n} \log{m})$,

1298: preserving the gap up to another factor of $(1-O(1)/\log n)$.

1299: This gives an embedding of $\IDL$ into a $D=O(\mu \log m

1300: \log^2 n)$-dimensional binary Hamming cube that preserves the gap up to a factor of

1301: $(1-O(1)/\log n)$.  Thus, it is sufficient for us to maintain a

1302: $c$-approximate nearest-neighbor structure in $\{0,1\}^D$, where $c=(1+\eps)

1303: (1-O(1)/\log n)$, which takes

1304: $\Ot(n^{1/c})=\Ot(n^{1/(1+\eps/2)})$ time per operation

1305: \cite{im-anntr-98}.

1306:

1307: We conclude:

1308: \begin{theorem}

1309:     By performing $\Ot(n+m+n^{2/3}m^{2/3})$-time

1310:     preprocessing, one can reduce the problem of maintaining

1311:     dynamic $(1+\eps)$-approximate nearest neighbor for any

1312:     $n$-point crossing metric over $m$ lines, to the problem

1313:     of maintaining dynamic $(1+\eps)(1-O(1)/\log

1314:     n)$-approximate nearest neighbor in Hamming space with

1315:     $O(\log^\EmbedDim n)$ dimensions (assuming $m=

1316:     n^{O(1)}$).  The latter can be solved in

1317:     $\Ot(n^{1/(1+\eps/2)})$ time per operation.

1318: \end{theorem}

1319:

1320: \remove{

1321:    \subsection{Embedding of the Crossing Metric over $\Re^d$}

1322:

1323:    In this Section, we extend the methods from the previous

1324:    section to the crossing metric defined by

1325:    $d-1$-dimensional hyperplanes in $\Re^d$, for any fixed

1326:    $d \ge 2$.  To this end, it is sufficient to design an

1327:    efficient procedure, which given a set of $n$ points

1328:    $p_1, \ldots, p_n$ and $m$ hyperplanes $H_1, \ldots,

1329:    H_m$, assigns a symbol $a_i \in \Sigma$ to each $p_i$ in

1330:    such a way that $a_i \neq a_j$ iff there exists $H_k$

1331:    which separates $p_i$ from $p_j$.  Unfortunately, the

1332:    idea from the previous section does not give subquadratic

1333:    time algorithm for $d>2$, since even in $d=3$ the

1334:    complexity of $n$ arrangement cells from an arrangement

1335:    formed by $n$ planes could be $\Omega(n^2)$.

1336:    Fortunately, for our purpose, we do not need to compute

1337:    the actual cells containing $p_i$s; rather, it is just

1338:    sufficient to find {\em labels} of those cells.

1339:

1340:

1341:    The algorithm for finding the labels is based on {\em

1342:       partition trees} by \matousek{}~\cite{m-ept-92}, which

1343:    are defined as follows.  }

1344:

1345:

1346:

1347: \subsection{Embedding of the Crossing Metric over $\Re^d$}

1348:

1349: In this section, we extend the methods from the previous

1350: section to the crossing metric defined by

1351: $(d-1)$-dimensional hyperplanes in $\Re^d$, for any fixed $d

1352: \ge 2$.  To this end, it is sufficient to design an

1353: efficient procedure that, given a set of $n$ points $P=\brc{p_1,

1354: \ldots, p_n}$ and a set of $m$ hyperplanes $\HX = \brc{H_1,

1355:    \ldots, H_m}$, assigns a symbol $\sigma_i \in \Sigma

1356: \subset \ZZ$ to each $p_i$ in such a way that $\sigma_i \neq

1357: \sigma_j$ iff there exists $H_k$ which separates $p_i$ from

1358: $p_j$.  Unfortunately, the idea from the previous section

1359: does not give a subquadratic-time algorithm for $d>2$, since

1360: even in $d=3$ the complexity of $n$ cells in an arrangement

1361: formed by $n$ planes could be $\Omega(n^2)$.  Fortunately,

1362: for our purpose, we do not need to compute the actual cells

1363: containing the points $p_i$.  Rather, it suffices to find

1364: the {\em labels} of those cells or, more specifically, a

1365: function $h: P \to \Sigma$ such that $h(p)=h(q)$ iff $p$ and

1366: $q$ belong to the same arrangement cell.

1367:

1368: Abusing notation, we denote by $H_k(p)$ the function

1369: returning $1$ if $p$ lies on a fixed side (say, the positive side) of $H_k$, and $0$

1370: otherwise. We use the following hashing function

1371: \[

1372: h(x)= \sum_{i=1}^{m} a_i H_i(x),

1373: \]

1374: where $a_1 \ldots a_m$ are independent and identically

1375: distributed random variables with uniform distribution over

1376: $\brc{0, \ldots ,n^c}$, where $c$ is a constant to be

1377: specified shortly.  Note, that if $p,q \in \Re^d$ lie in two

1378: different full-dimensional faces of $\Arr(\HX)$, then, as

1379: noted above, there must be a hyperplane $H_k \in \HX$, so

1380: that $H_k(p) \neq H_k(q)$, and say that $H_k(p) = 1$. That

1381: is, $h(p) = h'(p) + a_k$ and $h(q) = h'(q)$, where $h'(x) =

1382: \sum_{i\neq k} a_i H_i(x)$. Since the $a_i$ were picked

1383: independently, it follows that $h(p)=h(q)$ only if $h'(p) -

1384: h'(q) = a_k$. But the probability of that to happen is

1385: $1/n^c$. We conclude, that the probability of two points

1386: belonging to two different faces to be mapped to the same

1387: value by $h(\cdot)$ is $1/n^c$. Thus, since we have $O(n^2)$

1388: pairs of points to consider in our algorithm, it follows

1389: that the probability of the hashing to fail is $n^{2-c}$

1390: which can be made to be arbitrarily small by picking $c$ to

1391: be large enough.
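
The hashing scheme above can be illustrated by the following
brute-force Python sketch. It is our own illustration, not the
intersection-searching algorithm described next: it evaluates
$h(p) = \sum_i a_i H_i(p)$ directly, in $O(nmd)$ time. A
hyperplane is assumed to be given as a pair $(u, b)$
representing $\brc{x \mid \langle u, x\rangle = b}$, with
$H_i(x) = 1$ iff $\langle u, x\rangle > b$; the function names
({\tt positive\_side}, {\tt random\_weights}, {\tt hash\_labels})
are hypothetical.

\begin{verbatim}
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
# Hypothetical representation: (u, b) stands for the hyperplane {x : <u, x> = b}.
Hyperplane = Tuple[Vector, float]


def positive_side(h: Hyperplane, x: Vector) -> int:
    """H(x) in the text: 1 if <u, x> > b, and 0 otherwise."""
    normal, offset = h
    return 1 if sum(ui * xi for ui, xi in zip(normal, x)) > offset else 0


def random_weights(m: int, n: int, c: int, rng: random.Random) -> List[int]:
    """i.i.d. weights a_1, ..., a_m, each uniform over {0, ..., n^c}."""
    return [rng.randint(0, n ** c) for _ in range(m)]


def hash_labels(points: Sequence[Vector], hyperplanes: Sequence[Hyperplane],
                weights: Sequence[int]) -> List[int]:
    """Label each point p by h(p) = sum_i a_i H_i(p), by brute force.

    Points in the same cell of the arrangement always get the same label.
    With random weights, two points in different cells get the same label
    with probability less than n^(-c).
    """
    return [sum(a * positive_side(h, p) for a, h in zip(weights, hyperplanes))
            for p in points]
\end{verbatim}

For example, {\tt hash\_labels(pts, hs, random\_weights(len(hs),
len(pts), 3, random.Random(1)))} labels the points {\tt pts}
with respect to the hyperplanes {\tt hs}. Passing the
deterministic weights $1, 2, 4, \ldots, 2^{m-1}$ instead would
make the labels collision-free (they encode the sign vectors in
binary), but each label would then need $m$ bits, whereas the
random weights keep every label at most $m n^c$.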

Concretely, we associate a weight $a_i$ with the (positive)
half-space induced by the hyperplane $H_i$. For each point
$p_j$, we compute the total weight of all the half-spaces that
contain it, and all the points having the same total weight
are assigned the same label. Computing the weight of a point
$p_j$ falls into the class of problems known as
intersection searching \cite{a-rs-97}. Assume, as in the
theorem below, that $m = n$. For a parameter $s$ with
$n \leq s \leq n^d$, one can construct a data-structure in
$O(s^{1+\delta})$ time, such that an intersection-searching
query can be answered in $O( (n/s^{1/d}) \log^{d+1} n )$ time,
where $\delta >0$ is an arbitrarily small constant. As the
algorithm needs to perform $n$ such queries, one per point of
$P$, we set $s= n^{2d/(d+1)}$, which balances the preprocessing
time against the total query time. Thus, the algorithm
computes the required labels in $O(n^{2d/(d+1) + \delta})$
time.
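
To see where the exponent comes from, one can balance the
preprocessing cost against the total cost of the $n$ queries,
ignoring the $s^{\delta}$ and polylogarithmic factors. Here $s$
denotes the parameter governing the $O(s^{1+\delta})$
preprocessing time and the $O((n/s^{1/d})\log^{d+1} n)$ query
time of the intersection-searching data-structure over $n$
hyperplanes; this is a sketch of the arithmetic only.
\[
s = n\cdot\frac{n}{s^{1/d}}
\iff s^{(d+1)/d} = n^2
\iff s = n^{2d/(d+1)},
\qquad\text{and then}\qquad
n\cdot\frac{n}{s^{1/d}} = n^{2 - 2/(d+1)} = n^{2d/(d+1)}.
\]
For $d=2$, this bound is $n^{4/3}$.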

We conclude:
\begin{theorem}
    By performing an $O(n^{2d/(d+1)+\delta})$-time
    preprocessing, where $\delta >0$ is an arbitrary constant,
    one can reduce the problem of maintaining dynamic
    $(1+\eps)$-approximate nearest neighbor for any
    $n$-point crossing metric over $n$ hyperplanes in
    $\Re^d$ to the problem of maintaining dynamic
    $(1+\eps)(1-O(1)/\log n)$-approximate nearest neighbor
    in Hamming space with $O(\log^\EmbedDim n)$ dimensions.

    \theolab{reduction}
\end{theorem}

\begin{remark}
    Note that the constants in the bounds of Theorem
    \ref{theo:reduction} depend exponentially (or worse) on
    the dimension $d$.
\end{remark}

\begin{remark}
    As indicated in the introduction, such an embedding
    enables one to use a large collection of
    subquadratic approximation algorithms for the
    intersection metric, including dynamic amortized
    $\Ot(n^{4/3} + n^{1+1/c})$-time (for $d=2$)
    $c$-approximation algorithms for bichromatic closest
    pair~\cite{e-demst-95}, and $\Ot(n^{4/3} +
    n^{1+1/c})$-time algorithms for $c$-approximate
    diameter and discrete minimum enclosing
    ball~\cite{giv-rahdp-01}, and for $O(c)$-approximate
    facility location and bottleneck
    matching~\cite{giv-rahdp-01}.
    Similar (i.e., subquadratic-time) results hold for any
    $d>2$.
\end{remark}

\subsection{Computing an MST Using the Embedding}

We next describe how to use the embedding described in the
previous two sections to obtain a
$(1+\eps)$-approximation algorithm for the MST under the
crossing metric. Everything described in this section is
well known \cite{im-anntr-98}, and we provide it only for the
sake of completeness. Also, in the planar case, the resulting
algorithm is slower than the algorithm of
Section \ref{sec:speedup}.
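
As a point of reference for the approximation scheme of this
section, the following Python sketch computes an exact MST
under the crossing metric by brute force, using Prim's
algorithm (a swapped-in textbook method, not the algorithm of
this paper), in $O(n^2 m)$ time for $n$ points and $m$ lines.
It assumes general position (no point lies on a line), and
represents a line as a triple $(a, b, c)$ standing for
$\brc{(x,y) \mid ax + by = c}$; the names
{\tt crossing\_distance} and {\tt exact\_mst\_weight} are
hypothetical. Such a baseline may be useful for checking an
implementation on small inputs.

\begin{verbatim}
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical representation: (a, b, c) stands for the line a*x + b*y = c.
Line = Tuple[float, float, float]


def side(line: Line, p: Point) -> bool:
    """True iff p lies strictly on the positive side a*x + b*y > c."""
    a, b, c = line
    return a * p[0] + b * p[1] > c


def crossing_distance(p: Point, q: Point, lines: Sequence[Line]) -> int:
    """Number of lines separating p and q (no point may lie on a line)."""
    return sum(1 for ln in lines if side(ln, p) != side(ln, q))


def exact_mst_weight(points: List[Point], lines: Sequence[Line]) -> int:
    """Exact MST weight under the crossing metric via Prim's algorithm.

    Brute force: O(n^2 m) time for n points and m lines.
    """
    n = len(points)
    if n <= 1:
        return 0
    in_tree = [False] * n
    # best[v] = cheapest known edge weight connecting v to the current tree.
    best = [float("inf")] * n
    best[0] = 0
    total = 0
    for _ in range(n):
        # Add the cheapest point that is not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        total += best[u]
        # Relax the edges from u to the points outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = crossing_distance(points[u], points[v], lines)
                if d < best[v]:
                    best[v] = d
    return int(total)
\end{verbatim}

For example, for the two lines $x = 1/2$ and $y = 1/2$ and the
four points $(0,0)$, $(1,0)$, $(0,1)$, $(1,1)$, every point lies
in its own cell, neighboring cells are at distance $1$, and the
exact MST weight is $3$.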

Computing the minimum spanning tree under the intersection
metric using Kruskal's algorithm boils down to
maintaining the bichromatic closest pair (under
the intersection metric) between two sets $P_1, P_2
\subseteq P$, under insertions and deletions. A consequence
of Eppstein's result \cite{e-demst-95} is the following:

\begin{theorem}[\cite{e-demst-95}]
    Given a dynamic data-structure for nearest-neighbor
    queries, where each insertion, deletion, or query
    operation takes $T(n)$ time, one can compute the MST in
    $O(n T(n)\log^2 n)$ time.
\end{theorem}

It is easy to verify that we get a
$(1+\eps)$-approximation to the MST if we use a
$(1+\eps)$-approximate dynamic nearest-neighbor
data-structure (Eppstein, personal communication, 1999).

Namely, we need a data-structure that supports dynamic
approximate nearest-neighbor queries.  After applying the
embedding described above, we use the $\eps'$-PLEB
data-structure of \cite{im-anntr-98} to maintain a
$(1+\eps')$-approximate nearest neighbor in the embedded
space. Specifically, we construct an $\eps'$-PLEB on the
embedded points, where $1+\eps' = (1+\eps)(1-O(1)/\log n)$,
as in Theorem~\ref{theo:reduction}. In this way, we obtain an
$\eps$-PLEB data-structure for the original points (i.e., the
embedding maps a gap to a gap, so that a close point in the
embedded space corresponds to a close point in the crossing
metric). For a query point $p$, this data-structure returns a
point $q \in P$ such that $\IDL(p,q) \leq (1+\eps)r$, provided
that there exists a point $q^* \in P$ such that
$\IDL(p,q^*) \leq r$.

Thus, by constructing $O(\log_{1+\eps}n)$ such
data-structures, one for each radius $r = (1+\eps)^i$, for
$i = 0, \ldots, \lceil \log_{1+\eps} n \rceil$ (the crossing
distance between any two points is at most $n$), we can use
binary search on those data-structures to find a
$(1+\eps)$-approximate nearest neighbor of a query point.
Namely, this data-structure can be used to answer approximate
nearest-neighbor queries for the intersection metric.  For the
whole scheme to work, we need those data-structures to be
dynamic; i.e., to support insertions and deletions of points.
Fortunately, the only part of the algorithm that needs to be
dynamic is the second stage, which uses the data-structure of
\cite{im-anntr-98}, and this data-structure is dynamic. (The
embedding itself is computed once, in advance, for all the
points of $P$.)
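
The search over radii described above can be sketched in
Python as follows. The PLEB structures of \cite{im-anntr-98}
are replaced here by a hypothetical brute-force stand-in
({\tt brute\_force\_pleb}) that reports some point within
distance $r$, if one exists; {\tt build\_plebs} builds one
stand-in per radius $(1+\eps)^i$, and {\tt approx\_nearest}
performs the binary search over the radius index. All names
are illustrative assumptions.

\begin{verbatim}
import math
from typing import Callable, List, Optional, Sequence

# Hypothetical interface: a PLEB structure for a fixed radius r is modeled as
# a function mapping a query point to a reported point, or None.
PlebQuery = Callable[[object], Optional[object]]


def brute_force_pleb(points: Sequence[object],
                     dist: Callable[[object, object], float],
                     r: float) -> PlebQuery:
    """Exact stand-in for an eps-PLEB: reports some point within distance r."""
    def query(p: object) -> Optional[object]:
        for q in points:
            if dist(p, q) <= r:
                return q
        return None
    return query


def build_plebs(points: Sequence[object],
                dist: Callable[[object, object], float],
                eps: float, max_dist: float) -> List[PlebQuery]:
    """One structure per radius (1+eps)^i, i = 0, ..., ceil(log_{1+eps} max_dist)."""
    k = math.ceil(math.log(max_dist, 1 + eps)) if max_dist > 1 else 0
    return [brute_force_pleb(points, dist, (1 + eps) ** i) for i in range(k + 1)]


def approx_nearest(p: object, plebs: Sequence[PlebQuery]) -> Optional[object]:
    """Binary search for the smallest radius index whose structure reports a point."""
    if not plebs or plebs[-1](p) is None:
        return None
    lo, hi = 0, len(plebs) - 1  # invariant: plebs[hi] reports a point for p
    while lo < hi:
        mid = (lo + hi) // 2
        if plebs[mid](p) is not None:
            hi = mid
        else:
            lo = mid + 1
    return plebs[hi](p)
\end{verbatim}

The binary search assumes that if the structure for some radius
reports a point, then so do the structures for all larger
radii. This holds for the exact stand-in; with genuinely
approximate PLEB structures, this step deserves a more careful
analysis.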

We conclude:
\begin{theorem}
    Given a set $P$ of $n$ points in the plane and a set
    $L$ of $n$ lines, one can compute, in $\Ot \pth{ n^{4/3}
       + n^{1+ 1/(1+\eps)} }$ time, a spanning tree of $P$
    of weight $\leq (1+\eps)\Wopt(P,L)$. The result returned
    by the algorithm is correct with high probability.  In
    $d>2$ dimensions (i.e., for $n$ points and $n$ hyperplanes
    in $\Re^d$), such an MST can be approximated in
    $\Ot \pth{ n^{2d/(d+1) + \delta} + n^{1+ 1/(1+\eps)} }$
    time, where $\delta>0$ is an arbitrary constant.
\end{theorem}

%----------------------------------------------------------------
%----------------------------------------------------------------
%----------------------------------------------------------------
%----------------------------------------------------------------
\section{Conclusions}
\seclab{conc}

We presented the first $(1+\eps)$-approximation algorithm for
the minimum spanning tree under the crossing metric in the
plane.  We also presented subquadratic-time approximation
algorithms for a variety of other problems, obtained by
embedding the crossing metric into a higher-dimensional space.
The techniques used in this paper seem to be new to
low-dimensional computational geometry, and we believe that
they might be useful for other problems in computational
geometry.

There are several interesting open problems for further
research:
\begin{itemize}
    \item Can the result be extended to other cases:
    segments or arcs instead of lines?

    \item Can a similar approximation algorithm be found
    for the case of minimum weight triangulation under the
    crossing metric?
\end{itemize}

\subsection*{Acknowledgments}

The authors wish to thank Pankaj Agarwal, Boris Aronov and
Micha Sharir for helpful discussions concerning the problems
studied in this paper and related problems.

%-------------------------------------------------------------------------
% Bibliography
%-------------------------------------------------------------------------
\bibliographystyle{salpha}
\bibliography{shortcuts,geometry}



%-------------------------------------------------------

\appendix
\section{A Rough Approximation to the Weight of the
   MST in Near Linear Time}
\seclab{fast:approx}

In this appendix, we show how to approximate the weight of
the minimum spanning tree up to roughly a factor of
$O(\alpha(n)\log{n})$ if its weight is at least linear.  In
Section \ref{sec:speedup}, we presented a near-linear-time
$(1+\eps)$-approximation algorithm for the minimum spanning
tree that relies on this approximation algorithm.

Underlying the approximation algorithm is the observation
that an MST for a random sample of the lines of $L$ provides
a rough approximation to the weight of the MST of $P$ under
the crossing metric induced by $L$. If the weight of the MST
of the sample is near linear, we can approximate it up to a
factor of $O(\alpha(n)\log{n})$, using the following
algorithm.

\begin{lemma}
    Given a set $R$ of $r$ lines, a set $P$ of $n$ points,
    and a prescribed parameter $W$, one can decide whether
    $\Wopt(P,R)$ is large; namely, whether $\Wopt(P,R) =
    \Omega( (r + n + W) \alpha(n)\log{n} )$.  The algorithm
    takes $O( (r + n + W) \alpha(n)\log^2{n} )$ expected time.
    Furthermore, if $\Wopt(P,R) \leq W$, the algorithm
    reports that this weight is large with probability at most
    $n^{-c}$, where $c$ is an appropriate constant.

    \lemlab{brute:estimate}
\end{lemma}

\begin{proof}
    Use the algorithm of Theorem~\ref{theo:hs} and execute
    it $O(\log{n})$ times on $P$ and $R$, with independent
    random choices. If the running time of the $i$-th
    execution of the algorithm exceeds
    $c'(r + n + W)\alpha(n) \log{n}$, for a sufficiently large
    constant $c'$, abort it and move on to the next execution.
    If $\Wopt(P,R) \leq W$, then the algorithm of
    \cite{hs-oplpa-01-dcg} provides a spanning tree of expected
    weight $O( (r+n+ W)\alpha(n) \log{n})$, with the same bound
    on the expected running time. Hence, in this case, by
    Markov's inequality, each execution is aborted with
    probability at most $1/2$, and all the $O(\log{n})$
    executions are aborted with probability at most $n^{-c}$.
    Thus, if all the executions are aborted, we report that
    $\Wopt(P,R)$ is large, and we can conclude that, with
    probability $\geq 1 - n^{-c}$, $\Wopt(P,R) > W$. The total
    running time is $O((r + n + W)\alpha(n)\log^2{n})$.
\end{proof}

\lemref{brute:estimate} shows that we can
approximate the weight of the MST in near-linear time if its
weight is near linear. If it is heavier, however, we
use random sampling to keep the running time under control.

Let $R \subseteq L$ be a random sample of lines of $L$,
where each line is picked independently with probability
$r/n$.  Clearly, an intersection point $u$ between a connected
set $\gamma$ and a line of $L$ is present in $\Arr(R)$ with
probability $r/n$ (this is the probability that the line of
$L$ passing through $u$ is chosen to be in the random sample).

\begin{defn}
    For a curve $\gamma$ and a set of lines $L$, let
    $\weight(\gamma,L)$ denote the {\em weight} of $\gamma$
    in the arrangement $\Arr(L)$; that is, the number of
    intersections of $\gamma$ with the lines of $L$.
\end{defn}
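
The definition, together with the sampling scheme described
above, can be illustrated by the following Python sketch for
the special case where $\gamma$ is a segment $pq$, for which
$\weight(\gamma, L)$ is simply the number of lines crossed by
$pq$. Lines are assumed to be triples $(a,b,c)$ representing
$ax+by=c$, endpoints are assumed not to lie on any line, and
the function names are hypothetical. Dividing the weight in the
sample by the sampling probability $r/n$ gives an unbiased
estimate of the weight in $\Arr(L)$, since every intersection
point survives with probability $r/n$.

\begin{verbatim}
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical representation: (a, b, c) stands for the line a*x + b*y = c.
Line = Tuple[float, float, float]


def segment_weight(p: Point, q: Point, lines: Sequence[Line]) -> int:
    """weight(pq, L): the number of lines crossed by the segment pq.

    Assumes that neither endpoint lies on a line.
    """
    count = 0
    for a, b, c in lines:
        sp = a * p[0] + b * p[1] - c
        sq = a * q[0] + b * q[1] - c
        if sp * sq < 0:  # p and q lie on opposite sides of the line
            count += 1
    return count


def sample_lines(lines: Sequence[Line], prob: float,
                 rng: random.Random) -> List[Line]:
    """Keep each line independently with probability prob (r/n in the text)."""
    return [ln for ln in lines if rng.random() < prob]


def scaled_sample_weight(p: Point, q: Point, lines: Sequence[Line],
                         prob: float, rng: random.Random) -> float:
    """Unbiased estimate of weight(pq, L): the weight in a sample, divided by prob."""
    return segment_weight(p, q, sample_lines(lines, prob, rng)) / prob
\end{verbatim}

For instance, for $100$ vertical lines crossed by a horizontal
segment, averaging {\tt scaled\_sample\_weight} over many
independent samples with {\tt prob} $= 0.1$ gives values close
to $100$, which is the identity $E[W_R] = \Wopt(P,L)\, r/n$ used
in the proof of \lemref{wopt:sample}, specialized to a single
segment.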

\begin{lemma}
    Let $R$ be a sample of lines of $L$ (chosen as described
    above). Then, with high probability,
    \[
    \Wopt(P,L) \leq \frac{n}{r} \pth{ c_0 n \log{n} + 2
       \Wopt(P,R)},
    \]
    and with probability $\geq 0.9$ we have $\frac{n}{r}
    \cdot \frac{\Wopt(P,R)}{10} \leq \Wopt(P,L)$, where $c_0$ is an
    appropriately large constant.
    \lemlab{wopt:sample}
\end{lemma}

\begin{proof}
    Let $\Topt^L = \Topt(P,L)$, and let $W_R =
    \weight(\Topt^L, R)$ be the weight of $\Topt^L$ under
    the crossing metric of $R$. Since every intersection point
    of $\Topt^L$ with a line of $L$ survives in $\Arr(R)$ with
    probability $r/n$, linearity of expectation implies
    $E[ W_R] = \Wopt(P,L)\frac{r}{n}$. Thus, by Markov's
    inequality, with probability $\geq 0.9$ we have
    $W_R \leq 10 \Wopt(P,L) \frac{r}{n}$, and, since $\Topt^L$
    is a spanning tree of $P$, also $\displaystyle
    \Wopt^R = \Wopt(P,R) \leq W_R \leq 10 \Wopt(P, L)
    \frac{r}{n}$.

    Let $p,q \in P$ be two points, and let $X_{p q}$ be
    the distance between $p$ and $q$ in the arrangement
    $\Arr(R)$. If the distance between $p$ and $q$ is large,
    that is, $U = \IDL(p,q) \geq c_0 (n/r) \log{n}$ (where
    $c_0$ is a large enough constant), then
    $E[X_{pq}] = U r/n \geq c_0 \log n$, and one can show, using
    the Chernoff inequality, that with high probability
    \[
    \frac{U}{2} \leq X_{p q} \frac{n}{r} \leq 2 U.
    \]
    By the union bound, this holds, with high probability,
    simultaneously for all the $O(n^2)$ pairs of points of $P$.

    Consequently, each edge $e =p q$ of $\Topt^R = \Topt(P,R)$
    either intersects at most $c_0 (n/r)\log{n}$ lines of $L$,
    or, alternatively, the number of lines of $L$ intersected
    by $e$ is at most $2(n/r)X_e$, where $X_e$ is the number of
    lines of $R$ that $e$ intersects.  Thus, since $\Topt^R$ has
    $n-1$ edges, with high probability we have
    \begin{eqnarray*}
        \Wopt(P,L) &\leq & \weight( \Topt^R, L) = \sum_{e =
           p q \in \Topt^{R}} \IDL(p,q)
        \leq \sum_{e \in
           \Topt^{R}} \pth{ c_0 \frac{n}{r}\log{n} +
           2X_e\frac{n}{r}}\\
        &\leq& c_0 \frac{n^2 \log{n}}{r}
        + \Wopt(P,R) \frac{2n}{r}.
    \end{eqnarray*}
\end{proof}
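
The Chernoff step in the proof above can be made explicit
using the standard multiplicative Chernoff bounds
$\Pr[X \leq (1-\beta)\mu] \leq e^{-\beta^2\mu/2}$ and
$\Pr[X \geq (1+\beta)\mu] \leq e^{-\beta^2\mu/3}$, for
$0 < \beta \leq 1$, where $X$ is a sum of independent indicator
variables with mean $\mu$. We apply them to $X = X_{pq}$, with
$\mu = E[X_{pq}] = U r/n \geq c_0 \log n \geq c_0 \ln n$, taking
$\beta = 1/2$ for the lower tail and $\beta = 1$ for the upper
tail (the symbols $\beta$ and $\mu$ are introduced only for
this calculation):
\[
\Pr\left[X_{pq}\frac{n}{r} < \frac{U}{2}\right]
+ \Pr\left[X_{pq}\frac{n}{r} > 2U\right]
= \Pr\left[X_{pq} < \frac{\mu}{2}\right] + \Pr\left[X_{pq} > 2\mu\right]
\leq e^{-\mu/8} + e^{-\mu/3}
\leq 2n^{-c_0/8}.
\]
A union bound over the at most $n^2$ pairs of points shows that
the two-sided bound holds for all pairs simultaneously with
probability at least $1 - 2n^{2-c_0/8}$.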

\begin{remark}
    We can make both probabilities in \lemref{wopt:sample}
    large by repeating the experiment $O(\log{n})$ times,
    and picking the smallest value of $\Wopt(P,R)$ computed.
    With high probability, we then have
    \[
    \frac{n}{r} \cdot \frac{\Wopt(P,R)}{10}
    \leq \Wopt(P,L) \leq
    \frac{n}{r} \pth{ c_0 n \log{n} + 2 \Wopt(P,R)}.
    \]
    In particular, if $\Wopt(P,R) > c_0 n \log{n}$, then
    $3\Wopt(P,R)\frac{n}{r}$ is a constant-factor
    approximation to $\Wopt(P,L)$.
\end{remark}
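
Spelling out the last claim of the remark: if
$\Wopt(P,R) > c_0 n\log n$, then $c_0 n \log n + 2\Wopt(P,R) <
3\Wopt(P,R)$, and the two bounds combine as follows, so the
estimate $3\Wopt(P,R)\frac{n}{r}$ is within a factor of $30$ of
$\Wopt(P,L)$:
\[
\frac{1}{30}\cdot 3\Wopt(P,R)\frac{n}{r}
= \frac{n}{r}\cdot\frac{\Wopt(P,R)}{10}
\leq \Wopt(P,L)
\leq \frac{n}{r}\pth{ c_0 n\log n + 2\Wopt(P,R)}
< 3\Wopt(P,R)\frac{n}{r}.
\]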

\begin{lemma}
    Let $r$ be a prescribed parameter, and let $\Wopt =
    \Wopt(P,L)$.  Then, an algorithm can decide whether
    $\Wopt$ is small or large, in the following sense:
    \begin{itemize}
        \item $\Wopt$ is small, namely, $\Wopt \leq
        \frac{10c_0 n^2\log{n}}{r} $.

        \item $\Wopt$ is large, namely, $\Wopt = \Omega(
        \frac{n^2}{r} \alpha(n)\log^2{n})$.

        \item $\Wopt$ is in between. In this case, either of
        the two answers above is valid.
    \end{itemize}
    The algorithm takes $O( n \alpha(n)\log^4{n})$ time, and
    returns a correct result with high probability.

    \lemlab{fine:estimate}
\end{lemma}

\begin{proof}
    We pick $m = O(\log{n})$ samples $R_1, \ldots, R_m$, by
    picking each line into each sample independently with
    probability $r/n$.  For each sample, we check whether
    $\Wopt(P,R_i) \leq 10 c_0 n\log{n}$, using the algorithm of
    \lemref{brute:estimate} with $W = 10 c_0 n\log{n}$. Since
    $r \leq n$, this requires $O(n\alpha(n)\log^{3}{n})$ time
    for each sample, and $O(n\alpha(n)\log^{4}{n})$ time
    overall.

    If the algorithm of \lemref{brute:estimate}
    returned {\em not large} for some sample $R$, we know
    that $\Wopt(P,R) = O(n\alpha(n)\log^2{n})$, and, by
    \lemref{wopt:sample}, $\Wopt(P,L)
    = O\pth{ \frac{n^2 \alpha(n)\log^{2}n }{r} }$ with high
    probability.  \remove{On the other hand, if all the
       spanning trees are ``long'' for all the samples, we
       know that $\Wopt(P,L) \geq \frac{n}{10r}\cdot 10 c_0
       n\log{n} = \frac{c_0n^2 \log{n}}{r}$ with high
          probability by \lemref{wopt:sample}.  }
\end{proof}

Now, we can search over the values $r = n, n/2, n/4, \ldots$
to approximate $\Wopt(P,L)$.

\begin{lemma}
    One can compute, in $O(n\alpha(n)\log^{5}{n})$ time, a
    value $M$ such that, with high probability,
    \[
    \Wopt(P,L) \leq M = O(n \alpha(n) \log^2{n} + \Wopt(P,L)
    \alpha(n) \log n).
    \]

    \lemlab{rough}
\end{lemma}

\begin{proof}
    Set $r_0 = n$.  In the $i$-th iteration, check whether
    $\Wopt = \Omega\pth{ \frac{n^2}{r_i} \alpha(n)\log^2{n}}$,
    using the algorithm of \lemref{fine:estimate}. If it is,
    set $r_{i+1} = r_i/2$, and repeat the process.  We
    stop as soon as this check fails, say in the $i$-th
    iteration.  Then, we know that with high probability
    \[
    \frac{10c_0 n^2\log{n}}{r_{i-1}}
    \leq
    \Wopt(P, L) \leq M = O\pth{ \frac{n^2 \alpha(n)\log^{2} n
    }{r_i} },
    \]
    where the left inequality applies only if $i \geq 1$.
    Since $r_i = r_{i-1}/2$, it follows that
    $M = O(\Wopt(P,L)\alpha(n)\log n)$ if $i \geq 1$, and
    $M = O(n\alpha(n)\log^2 n)$ if $i = 0$, implying that $M$
    is the required approximation. There are $O(\log n)$
    iterations, each taking $O(n\alpha(n)\log^4 n)$ time, so
    the overall running time is $O(n\alpha(n)\log^5 n)$.
\end{proof}
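
The search in the proof above can be sketched in Python as
follows. The randomized test of \lemref{fine:estimate} is
replaced by a hypothetical oracle {\tt is\_large(r)}, and
{\tt bound(r)} stands for the upper bound
$O(n^2\alpha(n)\log^2 n / r)$ with its hidden constant supplied
by the caller; both, as well as the extra stopping guard at
$r \leq 1$, are assumptions made for illustration only.

\begin{verbatim}
from typing import Callable, Tuple


def rough_estimate(n: int,
                   is_large: Callable[[float], bool],
                   bound: Callable[[float], float]) -> Tuple[float, int]:
    """Search over r = n, n/2, n/4, ... as in the proof above.

    is_large(r): hypothetical oracle standing in for the randomized test of the
        fine-estimate lemma (True means "Wopt is large for this value of r").
    bound(r): the upper bound O(n^2 alpha(n) log^2 n / r), with its constant
        supplied by the caller.
    Returns the estimate M and the number of oracle calls made.
    """
    r = float(n)
    calls = 0
    while True:
        calls += 1
        # Stop at the first r for which the check fails; the r <= 1 guard is
        # an extra safeguard added for this sketch.
        if not is_large(r) or r <= 1:
            return bound(r), calls
        r /= 2
\end{verbatim}

For example, with $n = 1024$, a true weight of $50000$, and an
exact oracle that reports {\em large} iff the weight is at least
$n^2/r$, the search stops at $r = 16$ after $7$ oracle calls and
returns $M = 65536$, which is within a factor of $2$ of the true
weight.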

\begin{remark}
    Note that if the algorithm of \lemref{rough} stops after
    the first iteration, then $\Wopt = O(n
    \alpha(n)\log^2{n})$. In such a case, the approximation
    we get may be much worse than logarithmic.  However, this
    is, to some extent, the easiest case: even without any
    sampling, we get a spanning tree of near-linear (or
    sublinear) weight.
\end{remark}

\end{document}

%--------------------------------------------------------
%
% mst.tex - end of file
%-------------------------------------------------------