% cs0302022/arxiv.tex

\documentclass[11pt]{article}

\usepackage{epsfig}
\usepackage{multirow}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{comment}
\usepackage{fullpage}
\usepackage{bigstrut}
\usepackage{subfigure}
\usepackage{color}

\setcounter{secnumdepth}{5}
\setcounter{tocdepth}{5}

\newcommand{\flrtwok}[1]{2^{\lfloor \lg k\rfloor #1}}
\newcommand{\flrbk}[1]{b^{\lfloor \log_b k\rfloor #1}}

\newcommand{\alert}[1]{\typeout{ALERT: #1}\textbf{[[[ #1 ]]]}}
\newcommand{\buzz}[1]{\emph{#1}}

\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{conjecture}[theorem]{Conjecture}

\renewcommand{\multirowsetup}{\centering}

\newcommand{\prob}[1]{{\rm Prob}[#1]}

\newcommand{\newloglike}[2]{\newcommand{#1}{\mathop{\rm #2}\nolimits}}
\newloglike{\E}{E}
\newloglike{\sgn}{sgn}

\newcommand{\calF}{{\cal F}}

\newcommand{\etal}{{\it et al.\/}}

\newenvironment{proof}{\par\noindent{\bf Proof: }}{\nopagebreak\rule{1 ex}{0.8 em}\medskip}

\newcommand{\ceil}[1]{\left\lceil{#1}\right\rceil}
\newcommand{\floor}[1]{\left\lfloor{#1}\right\rfloor}

\interfootnotelinepenalty=10000

\begin{document}

\title{Fault-tolerant Routing in Peer-to-peer Systems\footnote{This
is an extended version of the paper appearing in the proceedings of the
\emph{Twenty-First ACM Symposium on Principles of Distributed Computing},
2002}}

\author{James Aspnes\thanks{
Department of Computer Science, Yale University,
New Haven, CT 06520-8285, USA.
Email: {\tt aspnes@cs.yale.edu}.
Supported by NSF grants CCR-9820888 and CCR-0098078.}
\and Zo\"{e} Diamadi\thanks{
Department of Computer Science, Yale University,
New Haven, CT 06520-8285, USA.
Email: {\tt diamadi@cs.yale.edu}.
Supported in part by ONR grant N00014-01-1-0795.}
\and Gauri Shah\thanks{
Department of Computer Science, Yale University,
New Haven, CT 06520-8285, USA.
Email: {\tt gauri.shah@yale.edu}.
Supported by NSF grants CCR-9820888 and CCR-0098078.}
}

\maketitle

\begin{abstract}
We consider the problem of designing an overlay network and routing
mechanism that permits finding resources efficiently in a peer-to-peer
system. We argue that many existing approaches to this problem can be
modeled as the construction of a random graph embedded in a metric
space whose points represent resource identifiers, where the
probability of a connection between two nodes depends only on the
distance between them in the metric space.  We study the performance of
a peer-to-peer system where nodes are embedded at grid points in a simple
metric space: a one-dimensional real line. We prove upper and lower bounds
on the message complexity of locating particular resources in such a system,
under a variety of assumptions about failures of either nodes or the
connections between them. Our lower bounds in particular show that the
use of inverse power-law distributions in routing, as suggested by
Kleinberg~\cite{KL99}, is close to optimal. We also give efficient
heuristics to dynamically maintain such a system as new nodes arrive and
old nodes depart. Finally, we give experimental results that suggest
promising directions for future work.
\end{abstract}

\section{Introduction}
\label{sec:INTRODCUTION}

Peer-to-peer systems are distributed systems without any central
authority and with varying computational power at each machine.
We study the problem of locating resources in such a large network
of heterogeneous machines that are subject to crash failures. We
describe how to construct distributed data structures that have
certain desirable properties and allow efficient resource location.

Decentralization is a critical feature of such a system,
as any central server not only provides a vulnerable point of
failure but also wastes the power of the clients. Equally important
is scalability: the cost borne by each node must not depend too much
on the network size and should ideally be proportional, within
polylogarithmic factors, to the amount of data the node seeks or
provides. Since we expect nodes to arrive and depart at a high rate,
the system should be resilient to both link and node failures.
Furthermore, damage to parts of the data structure should be
repaired automatically, so that the system is self-stabilizing.

Our approach provides hash-table-like functionality, based on
keys that uniquely identify the system resources. To accomplish this,
we map resources to points in a metric space either directly from
their keys or from the keys' hash values. This mapping dictates an
assignment of nodes to metric-space points. We construct and maintain
a random graph linking these points and use greedy routing to
traverse its edges to find data items. The principle we
rely on is that failures leave behind yet another (smaller) random
graph, ensuring that the system is robust even in the face of
considerable damage. Another compelling advantage of random graphs
is that they eliminate the need for global coordination. Thus, we
get a fully distributed, egalitarian, scalable system with no
bottlenecks.

We measure performance in terms of the number of messages sent by
the system for a search or an insert operation. The self-repair
mechanism may generate additional traffic, but we expect to amortize
these costs over the search and insert operations. Given the growing
storage capacity of machines, we are less concerned with minimizing
the storage at each node; in any case, the space requirements are
small, since the information stored at a node consists only of a
network address for each neighbor.

The rest of the paper is organized as follows.
Section~\ref{sec:APPROACH} explains our abstract model in detail,
and Section~\ref{sec:RELATED} describes some existing
peer-to-peer systems. We prove our results for routing in
Section~\ref{sec:ROUTING}. In Section~\ref{sec:RANDOMGRAPHS},
we present a heuristic method for constructing the random graph and
provide experimental results that show its performance in practice.
Section~\ref{sec:EXPERIMENTS} describes results of experiments
we performed to test the routing performance of our constructed
distributed data structure. Conclusions and future work
are discussed in Section~\ref{sec:CONCLUSIONS}.

\section{Our approach}
\label{sec:APPROACH}

The idea underlying our approach consists of three basic parts:
(1) embed resources as points in a metric space, (2) construct a
random graph by appropriately linking these points, and
(3) efficiently locate resources by routing greedily along the
edges of the graph. Let $R$ be a set of resources spread over a large,
heterogeneous network $N$. For each resource $r \in R$,
$owner(r)$ denotes the node in $N$ that provides $r$ and
$key(r)$ denotes the resource's key. Let $K$ be the set of all
possible keys.
We assume a hash function $h: K \rightarrow V$ such that
resource $r$ maps to the point
$v=h(key(r))$ in a metric space $(V,d)$, where $V$ is the point set
and $d$ is the distance metric (see Figure~\ref{mapping}).
The hash function is assumed to populate the metric space evenly.
Note that via this resource embedding, a node
$n$ is mapped onto the set $V_n=\{v \in V: \exists r \in R,
\: v=h(key(r)) \wedge (owner(r)=n)\}$, namely the set of
metric-space points assigned to the resources the node provides.
\begin{figure}
   \centerline{\epsfig{figure=net.eps, width=400pt}}
   \caption{An example of the metric-space embedding.}
   \label{mapping}
\end{figure}

Our next step is to carefully construct a directed random graph
from the points embedded in $V$.
We assume that each newly arrived node $n$ is initially connected
to some other node in $N$.
Each node $n$ generates the outgoing links for each vertex $v
\in V_n$ independently.
A link $(v,u) \in V_n \times V_m$ simply denotes that $n$ knows that
$m$ is the network node that provides the resource mapped to
$u$; hence, we can view the graph as a virtual overlay network
of information, pieces of which are stored locally at each node.
Node $n$ constructs each link by executing the search algorithm to locate
the resource that is mapped to the sink of that link. If the metric
space is not populated densely enough, the chosen sink may be a
vertex corresponding to an absent resource. In that
case, $n$ links instead to the present vertex closest to the original sink.
Moving to nearby vertices will introduce some bias in the link
distribution, but the resulting error does not appear to be large.
A more detailed description of the graph construction
is given in Section~\ref{sec:RANDOMGRAPHS}.

Having constructed the overlay network of information, we
can now use it for resource location. As new nodes arrive,
old nodes depart, and existing ones alter the set of
resources they provide or even crash, the resources available
in the distributed database change. At any time $t$, let
$R^t \subseteq R$ be the set of available resources and $I^t$ be
the corresponding overlay network. A request by node
$n$ to locate resource $r$ at time $t$ is served in a simple,
localized manner: $n$ calculates the metric-space point $v$ that
corresponds to $r$, and a request message is then routed over
$I^t$ to $v$, starting from the vertex in $V_n$ that is closest
to $v$.\footnote{Note that since $R^t$ generally changes with
time, and may specifically change while the request is being
served, the request message may be routed over a series of
different overlay networks $I^{t_1},\:I^{t_2},\: \ldots,\:
I^{t_k}$.}
Each node needs only local information, namely its set
of neighbors in $I^t$, to participate in the resource location.
Routing is done greedily by forwarding the message to the node mapped
to a metric-space point as close to $v$ as possible. The problem of
resource location is thus translated into routing on random graphs
embedded in a metric space.
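
Written out, one greedy step from the current vertex $w$ toward the
target point $v$ can be sketched as follows. Here $\Gamma(w)$ is
notation introduced only for this illustration, denoting the set of
vertices to which $w$ has links in $I^t$, and we assume that the
message stays at $w$ when no neighbor is closer to $v$:
\begin{displaymath}
w' = \arg\min_{u \in \Gamma(w) \cup \{w\}} d(u, v).
\end{displaymath}
Since $w$ itself is among the candidates, $d(w', v) \le d(w, v)$, so the
distance to the target never increases along the route, and routing
stops at the first vertex with $w' = w$.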

To a first approximation, our approach is similar to the
``small-world'' routing work by Kleinberg~\cite{KL99}, in which points
in a two-dimensional grid are connected by links drawn from a normalized
power-law distribution (with exponent 2), and routing is done by having
each node route a packet to its neighbor closest to the packet's
destination.
Kleinberg's approach is somewhat brittle because it assumes a
constant number of links leaving each node. Getting good
performance using his technique
depends both on having a complete two-dimensional
grid of nodes and on very carefully adjusting the exponent of
the random link distribution. We are less interested in keeping
the degree down, and accept a larger degree to get more
robustness. We also cannot assume a complete grid: since
fault tolerance is one of our main objectives, and since nodes
are mapped to points in the metric space based on what resources
they provide, there may be missing nodes.
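
For reference, in Kleinberg's two-dimensional model each long-range
link from a node $u$ goes to a node $v$ with probability
\begin{displaymath}
\Pr[u \rightarrow v] = \frac{d(u,v)^{-2}}{\sum_{w \ne u} d(u,w)^{-2}},
\end{displaymath}
where $d$ is the lattice distance in the grid. Kleinberg showed that
with this exponent, which equals the dimension of the grid, greedy
routing takes $O(\log^2 n)$ expected steps on a grid of $n$ nodes,
while any other exponent makes the expected routing time polynomial
in $n$. This is the sensitivity to the exponent mentioned above.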

The use of random graphs is partly motivated by a desire to keep
the data structure scalable and the routing algorithm as
decentralized as possible, as random graphs can be constructed
locally without global coordination. Another important reason
is that random graphs are by nature robust against node failures:
a node-induced subgraph of a random graph is generally still a
random graph. Therefore, the disappearance of a vertex, along with
all its incident links (due to failure of one of the machines
implementing the data structure), will still allow routing
while the repair mechanism is trying to heal the damage.
The repair mechanism also benefits from the use of random graphs:
random structures typically have much weaker invariants than more
organized data structures, and so require less work to maintain.

Embedding the graph in a metric space has the very important
property that the only information needed to locate a resource
is the location of its corresponding metric-space point. That
location is permanent, in the sense that it is unaffected by
disruption of the data structure, and it is easily computable by any
node that seeks the resource. So, while the pattern of links
between nodes may be damaged or destroyed by failure of nodes or
of the underlying communication network, the metric space forms
an invulnerable foundation over which to build the ephemeral
parts of the data structure.

\section{Current peer-to-peer systems}
\label{sec:RELATED}

Most of the peer-to-peer systems in widespread use are
not scalable. Napster~\cite{NP} has a central server that services
requests for shared resources, even though the actual
resource transfer takes place between the peer requesting
the resource and the peer providing it, without involving
the central authority. This design has several
disadvantages, including a vulnerable single point of
failure, wasted computational power at the clients, and
poor scalability. Gnutella~\cite{GN}
floods the network to locate a resource. Flooding creates a
trade-off between overloading every node in the network
for each request and cutting off searches before completion.
While the use of super-peers~\cite{MOR} ameliorates the problem
somewhat in practice, it does not improve performance in the limit.

Some of these first-generation systems
have inspired the development of more sophisticated ones
like CAN~\cite{SR01}, Chord~\cite{CH01} and Tapestry~\cite{TP01}.
CAN partitions a $d$-dimensional metric space into {\em zones}.
Each key is mapped to a point in some zone and stored at the node
that owns the zone.
Each node stores $O(d)$ information, and resource location,
done by greedy routing, takes $O(dn^{1/d})$ time. Chord maps nodes
to identities of $m$ bits placed around a {\em modulo
$2^m$ identifier circle}. Each resource is also
mapped onto the identifier circle and stored at its {\em successor},
the first node succeeding the location that it maps to. Each node
stores a routing table with $m$ entries such that the $i$-th entry
stores the identifier of the first node succeeding it by at least
$2^{i-1}$ on the identifier circle. Routing is
done greedily to the farthest node in the
routing table that does not pass the target, and it is not hard
to see that this gives an
$O(\log n)$ delivery time with $n$ nodes in the system.
Tapestry uses Plaxton's algorithm~\cite{PL97}, a form of
suffix-based, hypercube routing, as the routing mechanism:
in this algorithm, the message is forwarded deterministically
to a node whose identifier is one digit closer to the
target identifier. To this end, each node maintains $O(\log n)$
pieces of information and delivery time is also $O(\log n)$.
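
As a small illustration of the Chord routing table (the parameters are
chosen arbitrarily), take $m = 6$ and a node with identifier $8$. Its
$i$-th entry, for $i = 1, \ldots, 6$, is the first node at or after
position $(8 + 2^{i-1}) \bmod 2^6$, that is, the first node at or after
each of the positions
\begin{displaymath}
9,\ 10,\ 12,\ 16,\ 24,\ 40.
\end{displaymath}
If the target lies at clockwise distance $D$ from the current node, the
entry with the largest $2^{i-1} \le D$ either reaches the node
responsible for the target or moves at least $2^{i-1} > D/2$ closer to
it. Each hop therefore more than halves the remaining distance, so at
most $m$ hops are needed, and $O(\log n)$ hops suffice with high
probability when identifiers are random.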

Although these systems seem vastly different, there is a recurring
underlying theme: the use of some variant of an overlay
metric space in which the nodes are embedded. The location
of a resource in this metric space is determined by its key.
Each node maintains some information about its neighbors
in the metric space, and routing is then simply done by
forwarding packets to neighbors {\em closer} to the target
node with respect to the metric.
In CAN, the metric space is explicitly defined
as the coordinate space covered by the zones, and
the distance metric used is simply the Euclidean distance.
In Chord, the nodes can be thought of as being
embedded on grid points on a real circle, with distances measured
along the circumference of the circle providing the required
distance metric. In Tapestry, we can think of the nodes as being
embedded on a real line, with the identifiers simply the
locations of the nodes on the real line. Euclidean distance
is used as the metric distance for greedy forwarding
to nodes with identifiers closest to the target node.
This inherent common structure leads to similar results
for the performance of such networks. In this paper, we
explain why most of these systems achieve similar
performance guarantees by
describing a general setting for such overlay metric spaces,
although most of our results apply only in one-dimensional
spaces.

In general, the fault-tolerance properties of these systems are
not well-defined. Each system provides a repair mechanism for
failures but makes no performance guarantees until this mechanism
takes effect. For large systems, where nodes join and
leave frequently, resilience to repeated and concurrent failures
is a desirable and important property. Our experiments show that with
our overlay space and linking strategies, the system performs
reasonably well even with a large number of failures.

\section{Routing}
\label{sec:ROUTING}

In this section, we present our lower and upper bounds on routing.
We consider greedy routing in a graph embedded in a line where
each node is connected to its immediate neighbors and to multiple
long-distance neighbors chosen according to a fixed link distribution.
We give lower bounds for greedy routing for \buzz{any} link
distribution satisfying certain properties
(Theorem~\ref{theorem-lower-bound}). We also present upper bounds
in the same model where the long-distance links are chosen according to
the inverse power-law distribution with exponent $1$, and analyze
the effects on performance in the presence of failures.

\subsection{Tools}

Some of our upper bounds will be proved using
a well-known upper bound of Karp~\etal~\cite{KarpUW1988}
on probabilistic recurrence relations.
We will restate this bound as
Lemma~\ref{lemma-probabilistic-recurrence-ub}, and then show how
a similar technique can be used to get \emph{lower bounds} with some
additional conditions in Theorem~\ref{theorem-mean-lower-bound}.

\begin{lemma}[\cite{KarpUW1988}]
\label{lemma-probabilistic-recurrence-ub}
The expected time $\E[T(X_0)]$
needed for a nonincreasing real-valued Markov chain
$X_0, X_1, X_2, X_3, \ldots$ to drop to $1$ is bounded by
\begin{equation}
\label{eq-karp}
\E[T(X_0)] \le \int_{1}^{X_0} \frac{1}{\mu_z} dz,
\end{equation}
when $\mu_z = \E[X_t - X_{t+1} : X_t = z]$ is a nondecreasing
function of $z$.
\end{lemma}

This bound has a nice physical interpretation. If it takes one
second to jump down $\mu_x$ meters from $x$, then we are traveling at a
rate of $\mu_x$ meters per second during that interval. When we zip
past some position $z$, we are traveling at the average speed $\mu_x$
determined by our starting point $x \ge z$ for the interval. Since
$\mu$ is nondecreasing, using $\mu_z$ as our estimated speed
underestimates our actual speed when passing $z$. The integral
computes the time to get all the way down to $1$ if we use
$\mu_z$ as our instantaneous speed when passing position $z$. Since
our estimate of our speed is low (on average), our estimate of our time
will be high, giving an upper bound on the actual expected time.
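
As a standard illustration of Lemma~\ref{lemma-probabilistic-recurrence-ub},
suppose that each step replaces $X_t$ by a value drawn uniformly from
$[0, X_t]$. Then $\mu_z = z - z/2 = z/2$, which is nondecreasing, and
\begin{displaymath}
\E[T(X_0)] \le \int_{1}^{X_0} \frac{2}{z}\, dz = 2 \ln X_0,
\end{displaymath}
so the expected number of steps to drop to $1$ is logarithmic in $X_0$.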

We would like to get lower bounds on such processes in
addition to upper bounds, and we will not necessarily be able to
guarantee that $\mu_z$, as defined in
Lemma~\ref{lemma-probabilistic-recurrence-ub}, will be a
nondecreasing function of $z$. But we will still use the same basic
intuition: the average speed at which we pass $z$ is at most the
maximum average speed of any jump that takes us past $z$. We can find
this maximum speed by taking the maximum over all $x > z$;
unfortunately, this may give us too large an estimate. Instead, we
choose a threshold $U$ for ``short'' jumps,
compute the maximum speed of short jumps of less than $U$ for
all $x$ between $z$ and $z+U$, and
handle the (hopefully rare) long jumps of at least $U$ by
conditioning against them. Subject to this conditioning, we can
define an upper bound $m_z$ on the average speed passing $z$, and
use essentially the same integral as in (\ref{eq-karp}) to get a lower
bound on the time. Some additional tinkering to account for the
effect of the conditioning then gives us our real lower bound,
which appears in Theorem~\ref{theorem-mean-lower-bound} below, as
Inequality (\ref{eq-mean-lower-bound}).

\newcommand{\dft}{f(X_t) - f(X_{t+1})}
\newcommand{\dyt}{Y_t - Y_{t+1}}
\newcommand{\dzt}{Z_t - Z_{t+1}}
\newcommand{\ftat}{\calF_t, A_t}
\newcommand{\ef}[1]{\E\left[{#1}:\ftat\right]}
\newcommand{\muf}{\mu_{X_t}}
\newcommand{\ydenom}{\epsilon Y_0 + (1-\epsilon)}
\newcommand{\yydenom}{\left(\epsilon Y_0 + (1-\epsilon)\right)}

\begin{theorem}
\label{theorem-mean-lower-bound}
Let $X_0, X_1, X_2, \ldots$ be a
Markov process with state space $S$, where
$X_0$ is a constant.
Let $f$ be a non-negative real-valued function on $S$
such that, for all $t$,
\begin{equation}
\label{eq-nonincreasing}
\Pr[\dft \ge 0 : X_t] = 1.
\end{equation}
Let $U$ and $\epsilon$ be constants such that for any $x \in S$ with $f(x) > 0$,
\begin{equation}
\label{eq-mean-lower-bound-U-epsilon}
\Pr[\dft \ge U : X_t = x] \le \epsilon.
\end{equation}
Let
\begin{equation}
\label{eq-mean-lower-bound-tau}
\tau = \min \{ t: f(X_t) = 0 \}.
\end{equation}
For each $x$ with $f(x) > 0$, let $\mu_x > 0$ satisfy
\begin{equation}
\label{eq-mean-lower-bound-mu}
\mu_x \ge \E[\dft : X_t = x, \dft < U].
\end{equation}
Now define
\begin{equation}
\label{eq-mean-lower-bound-m}
m_z = \sup \left\{ \mu_x: x\in S, f(x) \in [z, z+U) \right\},
\end{equation}
and define
\begin{equation}
\label{eq-mean-lower-bound-T}
T(x) = \int_{0}^{f(x)} \frac{1}{m_z} dz.
\end{equation}
Then
\begin{equation}
\label{eq-mean-lower-bound}
\E[\tau] \ge \frac{T(X_0)}{\epsilon T(X_0) + (1-\epsilon)}.
\end{equation}
\end{theorem}

\begin{proof}
Define
\begin{equation}
\label{eq-mean-lower-bound-y}
Y_t = \left\{
\begin{array}{cl}
T(X_t) & \mbox{, if $f(X_{t'}) - f(X_{t'+1}) < U$ for all $t' < t$,} \\
0      & \mbox{, otherwise.}
\end{array}
\right.
\end{equation}
The idea is that $Y_t$ drops to zero immediately if a long jump
occurs. We will show that even with such overeager jumping, $Y_t$
does not drop too quickly on average. The intuition is that the chance of a
long jump reduces
$Y_t$ by at most an expected $\epsilon Y_t \le \epsilon Y_0$, while
the effect of short jumps can be bounded by applying the definition of
$T$.

Let $\calF_t$ be the $\sigma$-algebra
generated by $X_0, X_1, \ldots, X_t$. Let $A_t$ be the event that
$\dft < U$, that is, that the jump from $f(X_t)$ to $f(X_{t+1})$ is a
short jump.
Now compute
\begin{eqnarray}
\E\left[\dyt:\calF_t\right]
&=&
\Pr\left[\,\overline{A_t}:\calF_t\right] (Y_t - 0)
+ (1 - \Pr\left[\,\overline{A_t}:\calF_t\right]) \ef{\dyt}\nonumber
\\
&\le&
\Pr\left[\,\overline{A_t}:\calF_t\right] Y_0
+ (\epsilon - \Pr\left[\,\overline{A_t}:\calF_t\right]) Y_0
+ (1-\epsilon) \ef{\dyt}\nonumber
\\
&=&
\epsilon Y_0 + (1-\epsilon) \ef{\dyt}.\label{eq-mean-lower-bound-dyt}
\end{eqnarray}
The inequality uses $Y_t \le Y_0$, $\ef{\dyt} \le Y_0$, and
$\Pr\left[\,\overline{A_t}:\calF_t\right] \le \epsilon$, where the last
follows from (\ref{eq-mean-lower-bound-U-epsilon}) and the Markov property.

Now let us bound $\ef{\dyt}$. If a long jump occurred before time $t$,
then $Y_t = Y_{t+1} = 0$ and the bound (\ref{eq-mean-lower-bound-dytat})
below holds trivially. Otherwise, expanding the definitions
(\ref{eq-mean-lower-bound-T}) and (\ref{eq-mean-lower-bound-y})
gives
\begin{equation}
\label{eq-mean-lower-bound-integral-expansion}
\ef{\dyt}
=
\ef{\int_{f(X_{t+1})}^{f(X_t)} \frac{1}{m_z} dz}.
\end{equation}

Now, conditioning on $A_t$ means that
$f(X_{t+1}) > f(X_t) - U$
and thus
$z > f(X_t) - U$ for the entire range of the integral.
It follows that $f(X_t)$ lies in the half-open interval $[z,z+U)$ for
each such $z$, from which we have $m_z \ge \muf$
from (\ref{eq-mean-lower-bound-m}).
Inverting gives $\frac{1}{m_z} \le \frac{1}{\muf}$,
and plugging this inequality into
(\ref{eq-mean-lower-bound-integral-expansion}) gives
\begin{eqnarray}
\ef{\dyt}
&\le&
\ef{\int_{f(X_{t+1})}^{f(X_t)} \frac{1}{\muf} dz}
\nonumber
\\
&=&
\frac{1}{\muf} \ef{\dft}
\nonumber
\\
&\le&
\frac{1}{\muf} \muf
\nonumber
\\
&=&
1,
\label{eq-mean-lower-bound-dytat}
\end{eqnarray}
where the last inequality is (\ref{eq-mean-lower-bound-mu}) with $x = X_t$.

Applying (\ref{eq-mean-lower-bound-dytat})
to
(\ref{eq-mean-lower-bound-dyt}) gives
\begin{equation}
\label{eq-mean-lower-bound-dyt-final}
\E[\dyt : \calF_t ]
\le
\ydenom.
\end{equation}

We have now shown that $Y_t$ drops slowly on average. To turn this
into a lower bound on the time at which it first reaches zero, define
$Z_t = Y_t + \min(t, \tau) \yydenom$.
Conditioning on $t < \tau$, observe that
\begin{eqnarray*}
\E[\dzt:\calF_t, t < \tau]
&=&
\E[\dyt:\calF_t, t < \tau] - \yydenom
\\
&\le&
\yydenom - \yydenom
\\
&=&
0.
\end{eqnarray*}

Alternatively, if $t \ge \tau$, then $Y_t = Y_{t+1} = 0$ and we have
\begin{displaymath}
\E[\dzt:\calF_t, t \ge \tau] = 0.
\end{displaymath}

In either case, $\E[\dzt:\calF_t] \le 0$, implying
$Z_t \le \E[Z_{t+1}:\calF_t]$.
In other words, $\{Z_t, \calF_t\}$ is a submartingale.

If $\E[\tau] = \infty$ there is nothing to prove. Otherwise, because
$\{Z_t, \calF_t\}$ is a submartingale whose increments are bounded in
absolute value by $Y_0 + \yydenom$, and $\tau$ is a
stopping time relative to $\{\calF_t\}$ with finite expectation, the
optional stopping theorem gives
$Z_0 = Y_0
\le \E[Z_\tau]
= \E\left[ 0 + \tau \yydenom \right]
= \yydenom \E[\tau]$.
Solving for $\E[\tau]$ then gives
\begin{displaymath}
\E[\tau] \ge \frac{Y_0}{\ydenom}
= \frac{T(X_0)}{\epsilon T(X_0) + (1-\epsilon)}.
\end{displaymath}
\end{proof}
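
One simple way to read Inequality~(\ref{eq-mean-lower-bound}) is the
following. If $U$ can be chosen so that $\epsilon \le 1/T(X_0)$, then
$\epsilon T(X_0) \le 1$, the denominator is at most $2 - \epsilon \le 2$,
and
\begin{displaymath}
\E[\tau] \;\ge\; \frac{T(X_0)}{\epsilon T(X_0) + (1-\epsilon)} \;\ge\; \frac{T(X_0)}{2}.
\end{displaymath}
In the extreme case $\epsilon = 0$, where long jumps never occur, the
bound reduces to $\E[\tau] \ge T(X_0)$, the lower-bound counterpart of
(\ref{eq-karp}).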

\subsection{Lower bounds on greedy routing}

We will now show a lower bound on the expected time taken by greedy
routing on a random graph embedded in a line. Each node in the
graph has expected outdegree at most $\ell$ and is connected to its
immediate neighbor on either side. For polylogarithmic values of $\ell$,
we consider two variants of the greedy routing algorithm and derive lower
bounds for them of $\Omega(\log^2 n / (\ell^2 \log \log n))$ (two-sided
routing) and $\Omega(\log^2 n / (\ell \log \log n))$ (one-sided routing),
as stated in Theorem~\ref{theorem-lower-bound}.
The routing variants, along with the machinery and proofs of the
associated lower bounds, are presented in Sections~\ref{Section-lower-bound}
through~\ref{Section-putting-the-pieces-together}. For large values
of $\ell$, a lower bound of $\Omega(\frac{\lg n}{\lg \ell})$
on the worst-case routing time can be
derived very simply, as follows.

\begin{theorem}
\label{theorem-tree-lower-bound}
Let $\ell \in (\lg n, n^c]$. Then for any link distribution and any
routing strategy, the
delivery time $T = \Omega(\frac{\log n}{\log \ell})$.
\end{theorem}
\begin{proof}
With at most $\ell$ links for each node, we can reach at most $\ell^k$
nodes in exactly $k$ steps, and hence fewer than
$\sum_{k=0}^{T} \ell^k < \ell^{T+1}$ nodes (for $\ell \ge 2$) within
$T$ steps. If all $n$ nodes can be reached within $T$ steps, then
$\ell^{T+1} > n$, so $T > \frac{\log n}{\log \ell} - 1$. Since also
$T \ge 1$ (some target differs from the source), we get
$T \ge \max(1, \frac{\log n}{\log \ell} - 1) \ge \frac{1}{2} \cdot
\frac{\log n}{\log \ell}$, a lower bound of
$\Omega(\frac{\log n}{\log \ell})$ on $T$.
\end{proof}
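
To see the counting argument with concrete numbers (chosen only for
illustration), take $n = 2^{20}$ and $\ell = 32 = 2^5$, which lies in
$(\lg n, n^{1/4}]$. The number of nodes reachable within $T$ steps is at
most $\sum_{k=0}^{T} 32^k$, and
\begin{eqnarray*}
\sum_{k=0}^{3} 32^k &=& 1 + 32 + 1024 + 32768 \;=\; 33825 \;<\; 2^{20} = 1048576,\\
\sum_{k=0}^{4} 32^k &=& 33825 + 1048576 \;=\; 1082401 \;\ge\; 2^{20},
\end{eqnarray*}
so at least $4 = \lg n / \lg \ell$ steps are needed before every node
can be reached.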

\subsubsection{Lower bound for polylogarithmic number of links}
\label{Section-lower-bound}

We consider the case in which the expected outdegree of each node falls in
the range $[1,\lg n]$. The probability that a node at
position $x$ is connected to positions $x-\Delta_1, x-\Delta_2,
\ldots, x-\Delta_k$ depends only on the set $\Delta=\{\Delta_1, \ldots,
\Delta_k\}$ and not on $x$, and is independent of the choice of outgoing
links for other nodes.\footnote{We assume that nodes are labeled by
integers and identify each node with its label to avoid excessive
notation.} Since we assume that each node is connected to its immediate
neighbors, we require that both $1$ and $-1$ appear in $\Delta$.

We consider two variants of the greedy routing algorithm. Without
loss of generality, we assume that the target of the search is labeled
$0$. In \buzz{one-sided greedy routing}, the algorithm never traverses a
link that would take it past its target. So if the algorithm is currently
at $x$ and is trying to reach $0$, it will move to the node $x-\Delta_i$
with the smallest non-negative label. In \buzz{two-sided greedy routing},
the algorithm chooses a link that minimizes the distance to the target,
without regard to which side of the target the other end of the link lies on.
In the two-sided case the algorithm will move to a node $x-\Delta_i$
whose label has the smallest absolute value, with ties broken
arbitrarily. One-sided greedy routing can be thought of as modeling
algorithms on a graph with a boundary when the target lies on the
boundary, or algorithms where all links point in only one direction
(as in Chord).
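
As a small illustration (the link set is made up for this example),
suppose the algorithm is at $x = 10$ and the current node has
$\Delta = \{-4, -1, 1, 3, 12\}$, so that its links lead to
\begin{displaymath}
\{x - \delta : \delta \in \Delta\} = \{14,\ 11,\ 9,\ 7,\ -2\}.
\end{displaymath}
One-sided greedy routing moves to $7$, the smallest non-negative label,
while two-sided greedy routing moves to $-2$, whose absolute value $2$ is
smallest, overshooting the target $0$.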

658:

659: Our results are stronger for the one-sided case than for the two-sided

660: case.  With one-sided greedy routing, we show a lower bound of

661: $\Omega(\log^2 n / (\ell \log \log n))$ on the time to reach $0$ from a

662: point chosen uniformly from the range $1$ to $n$ that applies to any link

663: distribution.  For two-sided routing, we show a lower bound of

664: $\Omega(\log^2 n / (\ell^2 \log \log n))$, with some constraints on the

665: distribution.  We conjecture that these constraints are unnecessary, and

666: that $\Omega(\log^2 n / (\ell \log \log n))$ is the correct lower bound

667: for both models. A formal statement of these results appears as

668: Theorem~\ref{theorem-lower-bound} in

669: Section~\ref{Section-putting-the-pieces-together}, but before we can

670: prove it we must develop machinery that will be useful in the

671: proofs of both the one-sided and two-sided lower bounds.

672:

673: \subsubsection{Link sets: notation and distributions}

674:

675: First we describe some notation for $\Delta$ sets.

676: Write each $\Delta$ as

\[\{\Delta_{-s}, \ldots, \Delta_{-2}, \Delta_{-1} = -1,
 \Delta_{1} = 1, \Delta_{2}, \ldots, \Delta_{t}\},\]

679: where $\Delta_{i} < \Delta_{j}$ whenever $i < j$.

680: Each $\Delta$ is a random variable drawn from some distribution on

681: finite sets; the individual $\Delta_i$ are thus in general \emph{not}

682: independent.

683: Let $\Delta^-$ consist of the $s$ negative elements of $\Delta$

684: and $\Delta^+$ consist of the $t$ positive elements.

685: Formally define $\Delta_{-i} = -\infty$ when $i > s$

686: and $\Delta_{i} = +\infty$ when $i > t$.

687:

688: For one-sided routing, we make no assumptions about the distribution

689: of $\Delta$ except that $|\Delta|$ must have finite expectation and

690: $\Delta$ always contains $1$.  For two-sided routing, we assume that

691: $\Delta$ is generated by including each possible $\delta$ in $\Delta$

692: with probability $p_\delta$, where $p$ is symmetric about the origin

693: (i.e., $p_\delta = p_{-\delta}$ for all $\delta$),

694: $p_1 = p_{-1} = 1$, and $p$ is

695: unimodal, i.e. nonincreasing for positive $\delta$ and nondecreasing

696: for negative $\delta$.\footnote{These constraints imply

697: that $p_0 = 1$;

698: formally, we imagine that $0$ is present in each $\Delta$ but is

699: ignored by the routing algorithm.}  We also require that the events

700: $[\delta \in \Delta]$

701: and

702: $[\delta' \in \Delta]$

703: are pairwise independent for distinct $\delta,\delta'$.
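As an illustration of the two-sided model, the following Python sketch samples a single $\Delta$. It assumes, hypothetically, that offsets are truncated at some \texttt{max\_offset} and that $p_\delta = 1/|\delta|$. Including each offset independently is stronger than the pairwise independence the model requires, but it is the simplest way to satisfy that requirement. The function names are illustrative.

\begin{verbatim}
```python
import random


def sample_delta(p, max_offset, rng=random):
    """Draw one offset set Delta under the two-sided model (illustrative sketch).

    Each offset d with 1 <= |d| <= max_offset is included independently with
    probability p(|d|).  p(1) must be 1 so that -1 and +1 are always present;
    p is assumed nonincreasing, making p_delta symmetric and unimodal.
    Truncating at max_offset is an assumption of this sketch.
    """
    delta = set()
    for m in range(1, max_offset + 1):
        prob = p(m)
        for d in (m, -m):
            if rng.random() < prob:  # random() < 1.0 always, so p(1) = 1 works
                delta.add(d)
    return delta


def harmonic(m):
    """Hypothetical link probability p_delta = 1/|delta|."""
    return 1.0 / m
```
\end{verbatim}

With $p_\delta = 1/|\delta|$ and offsets up to $50$, $\E[|\Delta|] = 2\sum_{m=1}^{50} 1/m \approx 9$.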

704:

705: \subsubsection{The aggregate chain $S^t$}

706:

707: For a fixed distribution on $\Delta$, the trajectory

708: of a single initial point $X^0$ is a Markov chain $X^0, X^1, X^2, \ldots$,

709: with $X^{t+1} = s(X^t, \Delta^t)$,

710: where $\Delta^t$ determines the outgoing links from the node reached

711: at time $t$ and $s$ is a \buzz{successor function} that selects the next node

712: $X^{t+1} = X^t - \Delta^t_i$

713: according to the routing algorithm.

Note that the chain is Markov: because $\pm 1$ is always available, each
greedy step strictly decreases the distance to the target, so no node ever
appears twice in the sequence and each new node corresponds to a new,
independent choice of links.

717:

718: \newcommand{\Di}{{\Delta i}}

719: \newcommand{\Dis}{{\Delta i \sigma}}

720:

721: From the $X^t$ chain we can derive an \buzz{aggregate chain}

722: that describes the

723: collective behavior of all nodes in some

724: range.

Each state of the aggregate chain is a contiguous set of nodes whose
labels all have the same sign;
we define the sign of the state to be the common sign of all of its
elements.
For one-sided routing each state is either $\{0\}$ or an interval
of the form $\{1\ldots k\}$ for some $k$.  For two-sided routing the
states are more general: any interval of integers whose elements all
have the same sign.

732: The aggregate states are characterized formally in

733: Lemma~\ref{lemma-aggregate-ranges}.

734:

735: Given a contiguous set of nodes $S$ and a set $\Delta$,

736: define

737: \begin{displaymath}

738: S_{\Di} = \{ x \in S : s(x, \Delta) = x - \Delta_i \}.

739: \end{displaymath}

740: The intuition is that $S_{\Di}$ consists of all those nodes for which

741: the algorithm will choose $\Delta_i$ as the outgoing link.

742: Note that $S_{\Di}$ will always be a contiguous range because of the

743: greediness of the algorithm.

744: Now define, for each $\sigma \in \{-, 0, +\}$:

745: \begin{displaymath}

746: S_{\Dis} = \{ x \in S_{\Di} : \sgn s(x, \Delta) = \sigma \}.

747: \end{displaymath}

748: Here we have simply split $S_{\Di}$ into those nodes with

749: negative, zero, or positive successors.

750:

751: For any set $A$ and integer $\delta$ write $A-\delta$

752: for $\{x-\delta : x \in A\}$.

753:

754: We will now build our aggregate chain by letting

755: the successors of a range $S$ be the ranges $S_{\Dis}-\Delta_i$

756: for all possible

757: $\Delta$, $i$, and $\sigma$.

758: As a special case, we define $S^{t+1} = \{0\}$ when $S^{t} = \{0\}$;

759: once we arrive at the target, we do not leave it.

760: For all other $S^t$, we let

761: \begin{equation}

762: \label{eq-stdis-prob}

763: \Pr\left[S^{t+1} = S^t_{\Dis} - \Delta_i : \Delta\right]

764: = \frac{|S^t_{\Dis}|}{|S^t|},

765: \end{equation}

766: and define the unconditional transition probabilities by averaging

767: over all $\Delta$.
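The transition rule (\ref{eq-stdis-prob}) can be sketched in Python for a single fixed $\Delta$. The sketch below assumes one-sided routing and uses hypothetical function names. It groups the nodes of $S = \{\texttt{lo}\ldots\texttt{hi}\}$ by the offset they use and the sign of their successor, and returns each successor range $S_{\Dis} - \Delta_i$ with its probability $|S_{\Dis}|/|S|$.

\begin{verbatim}
```python
from collections import defaultdict
from fractions import Fraction


def one_sided_step(x, delta):
    # Smallest non-negative candidate x - d (never overshoot the target 0).
    return min(x - d for d in delta if x - d >= 0)


def aggregate_successors(lo, hi, delta, step):
    """Split S = {lo..hi} into the subranges S_{Delta i sigma} for a fixed delta.

    Returns ((new_lo, new_hi), prob) pairs: the successor range
    S_{Delta i sigma} - Delta_i and its probability |S_{Delta i sigma}| / |S|.
    """
    groups = defaultdict(list)  # (offset d, sign of successor) -> nodes x
    for x in range(lo, hi + 1):
        y = step(x, delta)
        sign = (y > 0) - (y < 0)
        groups[(x - y, sign)].append(x)
    size = hi - lo + 1
    result = []
    for (d, sign), xs in sorted(groups.items()):
        # Greediness makes each group a contiguous range; check it.
        assert max(xs) - min(xs) + 1 == len(xs)
        result.append(((min(xs) - d, max(xs) - d), Fraction(len(xs), size)))
    return result
```
\end{verbatim}

For $S = \{1\ldots 10\}$ and $\Delta = \{-1, 1, 4\}$, this gives the successors $\{0\}$, $\{1, 2\}$, $\{0\}$, and $\{1\ldots 6\}$ with probabilities $1/10$, $1/5$, $1/10$, and $3/5$. All of them have the forms allowed by Lemma~\ref{lemma-aggregate-ranges}.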

768:

769: Lemma~\ref{lemma-aggregate-chain-works} shows that moving to the

770: aggregate chain does not misrepresent the underlying single-point

771: chain:

772:

773: \begin{lemma}

774: \label{lemma-aggregate-chain-works}

775: Let $X^0$ be drawn uniformly from the range $S^0$.  Let $Y^t$ be a

776: uniformly chosen element of $S^t$.  Then for all $x$ and $t$,

777: $\Pr[X^t = x] = \Pr[Y^t = x]$.

778: \end{lemma}

779: \begin{proof}

780: Clearly the lemma holds for $t=0$.

781: Fix $S^{t-1}$, and consider two methods for generating $Y^{t}$.

The first generates $Y^t$ directly from $Y^{t-1}$; we show that $Y^t$
generated in this way has the same distribution as $X^t$.
The second generates $Y^t$ from $S^t$ as described in the lemma, and
we show that it produces the same distribution on $Y^t$ as the first.

788:

789: In the first method,

790: we choose $Y^{t-1}$ uniformly from $S^{t-1}$, choose a

random $\Delta^{t-1}$, and compute $s(Y^{t-1}, \Delta^{t-1})$.

792: Here the transition rule applied to $Y^{t-1}$ is the same as for

793: $X^{t-1}$, so under the induction hypothesis that $Y^{t-1}$ and

794: $X^{t-1}$ are equal in distribution, so are $Y^t$ and $X^t$.

795:

796: In the second method, we again choose a random $\Delta^{t-1}$

797: and then choose $S^{t}$ by choosing some $S^{t-1}_{\Dis}$ in proportion

798: to its size, let $S^{t} = S^{t-1}_\Dis - \Delta_i$, and then let $Y^t$

799: be a uniformly chosen element of $S^t$.

800: We can implement the choice of $S^{t-1}_\Dis$ by choosing some $Y^{t-1}$

801: uniformly from $S^{t-1}$ and picking $S^{t-1}_\Dis$ as the subrange

802: that contains $Y^{t-1}$; and we can simplify the task of choosing

803: $Y^{t}$ by setting it equal to $Y^{t-1} - \Delta_i$, since

804: conditioning on $Y^{t-1} \in S^{t-1}_\Dis$ leaves $Y^{t-1}$ with a

805: uniform distribution.  But by implementing the second method in this

806: way, we have reduced it to the first, and the lemma is proved.

807: \end{proof}
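Lemma~\ref{lemma-aggregate-chain-works} can be checked exactly on small cases. The Python sketch below assumes, for illustration only, that $\Delta$ is drawn uniformly from a short list of offset sets. It propagates the exact laws of $X^t$ and $Y^t$ using rational arithmetic. All names are hypothetical, and two-sided ties are broken toward the negative candidate.

\begin{verbatim}
```python
from collections import defaultdict
from fractions import Fraction


def one_sided_step(x, delta):
    # Smallest non-negative candidate x - d.
    return min(x - d for d in delta if x - d >= 0)


def two_sided_step(x, delta):
    # Candidate with the smallest |label|; ties go to the negative one.
    return min((x - d for d in delta), key=lambda y: (abs(y), y))


def split(lo, hi, delta, step):
    """Successor ranges S_{Delta i sigma} - Delta_i of S = {lo..hi}, with probabilities."""
    groups = defaultdict(list)
    for x in range(lo, hi + 1):
        y = step(x, delta)
        groups[(x - y, (y > 0) - (y < 0))].append(x)
    return [((min(xs) - d, max(xs) - d), Fraction(len(xs), hi - lo + 1))
            for (d, _), xs in groups.items()]


def point_law(n, deltas, step, t):
    """Exact law of X^t, X^0 uniform on {1..n}; each hop draws Delta uniformly from deltas."""
    law = {x: Fraction(1, n) for x in range(1, n + 1)}
    for _ in range(t):
        new = defaultdict(Fraction)
        for x, px in law.items():
            if x == 0:
                new[0] += px  # the target is absorbing
                continue
            for delta in deltas:
                new[step(x, delta)] += px / len(deltas)
        law = new
    return dict(law)


def aggregate_law(n, deltas, step, t):
    """Exact law of Y^t, a uniform element of S^t, starting from S^0 = {1..n}."""
    ranges = {(1, n): Fraction(1)}
    for _ in range(t):
        new = defaultdict(Fraction)
        for (lo, hi), pr in ranges.items():
            if (lo, hi) == (0, 0):
                new[(0, 0)] += pr  # S^{t+1} = {0} once S^t = {0}
                continue
            for delta in deltas:
                for succ, q in split(lo, hi, delta, step):
                    new[succ] += pr * q / len(deltas)
        ranges = new
    law = defaultdict(Fraction)
    for (lo, hi), pr in ranges.items():
        for y in range(lo, hi + 1):
            law[y] += pr / (hi - lo + 1)
    return dict(law)
```
\end{verbatim}

The check covers only finitely many $t$ and one distribution, so it illustrates the lemma rather than proving it.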

808:

809: Lemma~\ref{lemma-aggregate-ranges} justifies our earlier

810: characterization of the aggregate state spaces:

811:

812: \begin{lemma}

813: \label{lemma-aggregate-ranges}

814: Let $S^0 = \{ 1 \ldots n \}$ for some $n$.

815: Then with one-sided routing,

816: every $S^t$ is either $\{0\}$ or of the form $\{1\ldots k\}$ for some

817: $k$;

818: and with two-sided routing,

819: every $S^t$ is an interval of integers in which every element has the

820: same sign.

821: \end{lemma}

822: \begin{proof}

823: By induction on $t$.  For one-sided routing, observe that

824: $S^{t-1}_{\Di -}$ is always empty, as the routing algorithm is not

825: allowed to jump to negative nodes.  If $S^t = S^{t-1}_{\Di 0} -

826: \Delta_i$, then

827: $S^t = \{\Delta_i\} - \Delta_i = \{0\}$.

828: Otherwise $S^t = S^{t-1}_{\Di +} - \Delta_i$; but since

829: $S^{t-1} = \{1 \ldots k\}$ for some $k$,

830: if it contains any point $x$ greater than $\Delta_i$ it must contain

$\Delta_i + 1$; thus $\min(S^{t-1}_{\Di +}) = \Delta_i + 1$

832: and so $\min(S^t)$ becomes $1$.

833:

834: The result for the two-sided case is immediate from the fact that

835: $S^{t} = S^{t-1}_\Dis - \Delta_i$

836: combined with the definition of $S^{t-1}_\Dis$.

837: \end{proof}

838:

839: The advantage of the aggregate chain over the single-point chain is

840: that, while we cannot do much to bound the progress of a single point

841: with an arbitrary distribution on $\Delta$, we can show that the size

842: of $S^t$ does not drop too quickly given a bound $\ell$ on

843: $\E[|\Delta|]$.

844: The intuition is that each successor

845: set of size $a^{-1} |S^t|$ or less occurs

846: with probability at most $a^{-1}$, and there are at most $3\ell$ such

847: sets on average.

848:

849: \newcommand{\Prsta}{\Pr\left[|S^{t+1}| \le a^{-1} |S^t| : S^t\right]}

850: \begin{lemma}

851: \label{lemma-aggregate-max-drop}

852: Let $\E[|\Delta|] \le \ell$.  Then for any $a \ge 1$,

853: in either the one-sided or two-sided model,

854: \begin{equation}

855: \label{eq-aggregate-max-drop}

856: \Prsta \le 3\ell a^{-1}.

857: \end{equation}

858: \end{lemma}

859: \begin{proof}

860: \begin{sloppypar}

861: Fix $S^t$.

862: First note that if $a^{-1} |S^t| < 1$, then $\Prsta = 0$.

863: So we can assume that $a^{-1} |S^t| \ge 1$ and in particular that

864: $a \le |S^t|$.

865: \end{sloppypar}

866:

867: Conditioning on $\Delta$, there are at most $3|\Delta|$ non-empty sets

868: $S^t_{\Dis}$.

If $|S^t_\Dis| \le a^{-1} |S^t|$, then $S^t_\Dis$ is chosen with

870: probability at most $a^{-1}$ by (\ref{eq-stdis-prob}).

871: Thus the probability of choosing any of the at most $3|\Delta|$ sets

872: $S^t_\Dis$ of size at most $a^{-1}|S^t|$ is at most $3|\Delta|a^{-1}$.

873:

874: Now observe that

875: \begin{eqnarray*}

876: \Prsta &\le&

877:     \sum_d

878:         \Pr\left[ |\Delta| = d \right] 3da^{-1} \\

879:     &=& 3a^{-1} \E\left[|\Delta|\right] \\

880:     &\le& 3 \ell a^{-1}.

881: \end{eqnarray*}

882: \end{proof}
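A small exact computation illustrates (\ref{eq-aggregate-max-drop}). The sketch assumes one-sided routing and a hypothetical $\Delta$ distribution that is uniform over a few offset sets. For a fixed $S^t$, it computes $\Prsta$ by summing $|S^t_\Dis|/|S^t|$ over the small subranges. The function names are illustrative.

\begin{verbatim}
```python
from fractions import Fraction


def one_sided_step(x, delta):
    # Smallest non-negative candidate x - d.
    return min(x - d for d in delta if x - d >= 0)


def drop_probability(lo, hi, deltas, step, a):
    """Exact Pr[|S^{t+1}| <= |S^t| / a] for S^t = {lo..hi}.

    Delta is assumed uniform over the list `deltas`.  Given Delta, the
    successor S_{Delta i sigma} - Delta_i has |S_{Delta i sigma}| elements
    and is chosen with probability |S_{Delta i sigma}| / |S^t|.
    """
    size = hi - lo + 1
    total = Fraction(0)
    for delta in deltas:
        counts = {}
        for x in range(lo, hi + 1):
            y = step(x, delta)
            key = (x - y, (y > 0) - (y < 0))
            counts[key] = counts.get(key, 0) + 1
        for c in counts.values():
            if c * a <= size:  # a small subrange
                total += Fraction(c, size)
    return total / len(deltas)
```
\end{verbatim}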

883:

884: \begin{sloppypar}

885: Another way to write (\ref{eq-aggregate-max-drop}) is to say that

886: $\Pr\left[ \ln |S^t| - \ln |S^{t+1}| \ge \ln a : S^t \right] \le 3 \ell

887: a^{-1}$, which will give the bound

888: (\ref{eq-mean-lower-bound-U-epsilon}) on the probability of large

889: jumps when it comes time to apply

890: Theorem~\ref{theorem-mean-lower-bound}.

891: \end{sloppypar}

892:

893: \subsubsection{Boundary points}

894: \label{section-boundary-points}

895:

896: Lemma~\ref{lemma-aggregate-max-drop} says that $|S^t|$ seldom drops by

897: too large a ratio at once, but it doesn't tell us much about how

898: quickly $|S^t|$ drops in short hops.  To bound this latter quantity,

899: we need to get a bound on how many subranges $S^t$ splinters into

900: through the action of $s(\cdot, \Delta)$.

901: We will do so by showing that only certain points can appear as the

902: boundaries of these subranges in the direction of $0$.

903:

904: For fixed $\Delta$, define for each $i > 0$

905: \begin{displaymath}

906: \beta_i = \ceil{\frac{\Delta_i+\Delta_{i+1}}{2}}

907: \end{displaymath}

908: and

909: \begin{displaymath}

910: \beta_{-i} = \floor{\frac{\Delta_{-i}+\Delta_{-i-1}}{2}}.

911: \end{displaymath}

912: Let $\beta$ be the set of all finite $\beta_i$ and $\beta_{-i}$.
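The midpoints are easy to compute directly. In this Python sketch, whose function name is hypothetical, the ceiling and floor of $(u+v)/2$ are computed with integer floor division, which avoids any floating-point rounding.

\begin{verbatim}
```python
def beta_points(delta):
    """Finite midpoints beta_i (i > 0) and beta_{-i} for the offset set delta.

    beta_i    = ceil((Delta_i + Delta_{i+1}) / 2)     for consecutive positive offsets,
    beta_{-i} = floor((Delta_{-i} + Delta_{-i-1}) / 2) for consecutive negative offsets.
    Pairs involving the formal +/- infinity entries are skipped.
    """
    pos = sorted(d for d in delta if d > 0)                  # Delta_1 < Delta_2 < ...
    neg = sorted((d for d in delta if d < 0), reverse=True)  # Delta_{-1} > Delta_{-2} > ...
    beta = set()
    for u, v in zip(pos, pos[1:]):
        beta.add(-((-(u + v)) // 2))  # integer ceiling of (u + v) / 2
    for u, v in zip(neg, neg[1:]):
        beta.add((u + v) // 2)        # integer floor of (u + v) / 2
    return beta
```
\end{verbatim}

For example, $\Delta = \{-5, -1, 1, 4, 9\}$ gives $\beta = \{-3, 3, 7\}$.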

913:

914: \begin{lemma}

915: \label{lemma-boundary-points}

916: Fix $S$ and $\Delta$ and let $\beta$ be defined as above.

917: Suppose that $S$ is positive.

918: Let $M = \{ \min(S_\Dis) : S_\Dis \ne \emptyset \}$ be the set of

919: minimum elements of subranges $S_\Dis$ of $S$.

920: Then $M$ is a subset of $S$ and contains no elements other than

921: \begin{enumerate}

922: \item $\min(S)$,

923: \item $\Delta_i$ for each $i > 0$,

924: \item $\Delta_i+1$ for each $i > 0$, and

925: \item at most one of $\beta_i$ or $\beta_i+1$ for each $i > 0$,

926: \end{enumerate}

927: where the last case holds only with two-sided routing.

928:

929: If $S$ is negative, the symmetric condition holds for

930: $M = \{ \max(S_\Dis) : S_\Dis \ne \emptyset \}$.

931: \end{lemma}

932: \begin{proof}

933: Consider some subrange $S_\Dis$ of $S$.  If $S_\Dis$ contains

934: $\min(S)$, the first case holds.  Otherwise:

935: (a) if $S_\Dis = S_{\Di 0}$, the second

936: case holds; (b) if $S_\Dis = S_{\Di +}$, the third case holds;

937: (c) if $S_\Dis = S_{\Di -}$, the fourth case holds, with $\min(S_{\Di

938: -}) = \beta_{i-1}$ if $\Delta_{i-1} + \Delta_i$ is odd, and either

939: $\beta_{i-1}$ or $\beta_{i-1}+1$ if $\Delta_{i-1} + \Delta_i$ is even,

940: depending on whether the tie-breaking rule assigns $\beta_{i-1}$ to

941: $S_{\Delta(i-1)+}$ or $S_{\Di -}$.

942: \end{proof}

943:

944: We will call the elements of $M$ \buzz{boundary points} of $S$.
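Lemma~\ref{lemma-boundary-points} can be spot-checked for two-sided routing on random positive ranges. The sketch below makes several assumptions. Ties are broken toward the negative successor. Offsets come from a hypothetical random test distribution. The candidate set includes both $\beta_i$ and $\beta_i+1$, which is weaker than the lemma's ``at most one'' clause.

\begin{verbatim}
```python
import random


def two_sided_step(x, delta):
    # Candidate with the smallest |label|; ties go to the negative one.
    return min((x - d for d in delta), key=lambda y: (abs(y), y))


def beta_points(delta):
    # Midpoints between consecutive positive (ceiling) and negative (floor) offsets.
    pos = sorted(d for d in delta if d > 0)
    neg = sorted((d for d in delta if d < 0), reverse=True)
    return ({-((-(u + v)) // 2) for u, v in zip(pos, pos[1:])}
            | {(u + v) // 2 for u, v in zip(neg, neg[1:])})


def subrange_minima(lo, hi, delta, step):
    """min(S_{Delta i sigma}) for every nonempty subrange of S = {lo..hi}."""
    minima = {}
    for x in range(lo, hi + 1):  # x increases, so the first x seen is the minimum
        y = step(x, delta)
        minima.setdefault((x - y, (y > 0) - (y < 0)), x)
    return set(minima.values())


def allowed_boundaries(lo, delta):
    """Candidate boundary points for a positive S with min(S) = lo."""
    pos = {d for d in delta if d > 0}
    b = beta_points(delta)
    return {lo} | pos | {d + 1 for d in pos} | b | {v + 1 for v in b}


def random_delta(rng, max_offset=40, extra=6):
    """Hypothetical test distribution: -1, +1 plus a few random offsets."""
    k = rng.randint(0, extra)
    return {-1, 1} | {rng.choice((-1, 1)) * rng.randint(2, max_offset) for _ in range(k)}
```
\end{verbatim}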

945:

946: \subsubsection{Bounding changes in $\ln |S^t|$}

947:

Now we would like to use Lemmas~\ref{lemma-aggregate-max-drop}
and~\ref{lemma-boundary-points} to get an upper bound on the rate at
which $\ln |S^t|$ drops as a function of the $\Delta$ distribution.

951:

952: The following lemma is used to bound a sum that arises in

953: Lemma~\ref{lemma-log-drop}.

954:

955: \begin{lemma}

956: \label{lemma-conditional-convex}

957: Let $c \ge 0$.

958: Let $\sum_{i=1}^{n} x_i = M$ where each $x_i \ge 0$ and at least one

$x_i$ is greater than $c$.

960: Let $B$ be the set of all $i$ for which $x_i$ is greater than $c$.

961: Then

962: \begin{equation}

963: \label{eq-condition-convex}

964: \frac{

965:   \sum_{i \in B} x_i \ln x_i

966: }{

967:   \sum_{i \in B} x_i

968: }

969: \ge

970: \ln\left( \max \left(c, \frac{M}{n}\right)\right).

971: \end{equation}

972: \end{lemma}

973: \begin{proof}

If $\frac{M}{n} \le c$,

975: we still have $x_i > c$ for all

976: $i \in B$, so the left-hand side cannot be

977: less than $\ln c$.

978: So the interesting

979: case is when $\frac{M}{n} > c$.

980:

Let $B$ have $b$ elements.  Then $\sum_{i \notin B} x_i \le (n-b)c$
and $\sum_{i \in B} x_i \ge M - (n-b)c = M-nc+bc$.

Because $x \ln x$ is convex, for a fixed value of $\sum_{i\in B} x_i$ the sum
$\sum_{i \in B} x_i \ln x_i$ is minimized by setting all such $x_i$ equal, in
which case the left-hand side of (\ref{eq-condition-convex}) becomes simply
$\ln(x_i)$ for any $i \in B$, that is, $\ln\left(\frac{1}{b}\sum_{i\in B} x_i\right)$.

Now observe that setting all $x_i$ in $B$ equal gives
$x_i = \frac{1}{b}\sum_{i \in B} x_i
\ge \frac{M-nc+bc}{b}
= \frac{M-nc}{b} + c
\ge \frac{M-nc}{n} + c
= \frac{M}{n}$,
where the last inequality uses $M - nc > 0$ and $b \le n$.

993: \end{proof}
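A randomized numerical check of (\ref{eq-condition-convex}) is sketched below in Python. It uses floating-point arithmetic with a small tolerance, and the function names and random test distribution are hypothetical. It is a sanity check, not a substitute for the proof.

\begin{verbatim}
```python
import math
import random


def weighted_log_mean(xs, c):
    """Left side of the lemma: sum of x ln x over x > c, divided by the sum of those x."""
    big = [x for x in xs if x > c]
    return sum(x * math.log(x) for x in big) / sum(big)


def lemma_bound(xs, c):
    """Right side of the lemma: ln(max(c, M / n)) with M = sum(xs) and n = len(xs)."""
    return math.log(max(c, sum(xs) / len(xs)))


def check_lemma(trials, seed=0):
    """Random instances with at least one x_i > c; returns (count, smallest lhs - rhs)."""
    rng = random.Random(seed)
    count, worst = 0, math.inf
    for _ in range(trials):
        xs = [rng.uniform(0, 10) for _ in range(rng.randint(1, 8))]
        c = rng.uniform(0, 9)
        if max(xs) <= c:
            continue  # the lemma requires some x_i > c
        count += 1
        worst = min(worst, weighted_log_mean(xs, c) - lemma_bound(xs, c))
    return count, worst
```
\end{verbatim}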

994:

995: \newcommand{\aS}{a^{-1}|S|}

996: \newcommand{\lndrop}{\ln|S^t|-\ln|S^{t+1}|}

997: \newcommand{\constdrop}{\ln\frac{1}{1 - a^{-1}}}

998: \begin{lemma}

999: \label{lemma-log-drop}

1000: Fix $a > 1$, and

1001: let $S = S^{t}$ be a positive range with $|S| \ge a$.

1002: Define $\beta$ as in Lemma~\ref{lemma-boundary-points}.

1003: Let $S' = [\min(S) + \ceil{\aS} - 1, \max(S)-1]$.

1004: Let $A$ be the event $\left[\lndrop < \ln a\right]$.

1005: Then

1006: \begin{equation}

1007: \label{eq-log-drop}

1008: \E \left[ \lndrop : S^t, A \right]

1009: \le

1010: \constdrop + \frac{\ln \E[1+Z : S^t]}{\Pr[A: S^t]},

1011: \end{equation}

1012: where $Z = 2|\Delta \cap S'|$ with one-sided routing

1013: and $Z=2|\Delta \cap S'| + |\beta \cap S'|$ with two-sided routing.

1014: \end{lemma}

1015: \begin{proof}

1016: Call a subrange $S_\Dis$ \buzz{large} if $|S_\Dis| > \aS$ and

1017: \buzz{small} otherwise; the intent is that the large ranges are

1018: precisely those that yield $\lndrop < \ln a$.

1019: Observe that for any large $S_\Dis$, $|S_\Dis| > \aS \ge 1$,

1020: implying any large set has at least two elements.

1021:

1022: For any large $S_\Dis$,

1023: $\max(S_\Dis)

1024:  \ge \min(S) + \ceil{\aS} - 1$.

1025: Similarly

1026: $\min(S_\Dis)

1027:  \le \max(S) - 1$.

1028: So any large $S_\Dis$ intersects $S'$ in at least one point.

1029:

1030: Let $T = \{T_1, T_2, \ldots, T_k\}$

1031: be the set of subranges $S_\Dis$, large or small, that

1032: intersect $S'$.  It is immediate from this definition

1033: that $\bigcup T \supseteq S'$ and thus $\sum |T_j| \ge |S'|$.

1034:

1035: Using Lemma~\ref{lemma-boundary-points}, we can characterize the

1036: elements of $T$ as follows.

1037: \begin{enumerate}

\item There is at most one set $T_j$ that contains $\min(S')$; every other
$T_j$ intersects $S'$ without containing $\min(S')$ and so has $\min(T_j) \in S'$.

1039: \item There is at most one set $T_j$ that has $\min(T_j) = \Delta_i$ for each

1040: $\Delta_i$ in $S'$.

1041: \item There is at most one set $T_j$ that has $\min(T_j) = \Delta_i+1$ for

1042: each $\Delta_i$ in $S'$.

1043: \item With two-sided routing,

1044: there is at most one set $T_j$ that has $\min(T_j) = \beta_i$ or

1045: $\min(T_j) = \beta_i+1$ for each $\beta_i$ in $S'$.  Note that there

1046: may be a set whose minimum element is $\beta_i+1$ where $\beta_i =

1047: \min(S') - 1$, but this set is already accounted for by the first

1048: case.

1049: \end{enumerate}

1050:

1051: Thus $T$ has at most $1+Z = 1+2|\Delta \cap S'|$ elements with one-sided

1052: routing and at most $1+Z = 1+2|\Delta \cap S'| + |\beta \cap S'|$ elements

1053: with two-sided routing.

1054:

1055: Conditioning on $|S^{t+1}| > \aS$,

1056: $|S^{t+1}|$ is equal to $|S_\Dis|$ for some large $S_\Dis$ and thus

1057: for some large $T_j \in T$.

The probability that a particular large $T_j$ is chosen is proportional
to its size, so for fixed $T$, we have
\begin{eqnarray*}
\E[\ln |S^{t+1}| : T, A] &=&
\frac{
  \sum_{j : |T_j| > \aS} |T_j| \ln |T_j|
}{
  \sum_{j : |T_j| > \aS} |T_j|
} \\
&\ge& \ln\left(\max\left(\aS, \frac{|\bigcup T|}{|T|}\right)\right) \\
&\ge& \ln\left(\frac{|S'|}{|T|}\right),
\end{eqnarray*}
where the first inequality follows from
Lemma~\ref{lemma-conditional-convex} with $c = \aS$, $n = |T|$, and
$x_j = |T_j|$ (the $T_j$ are disjoint, so $M = \sum_j |T_j| = |\bigcup T|$),
and the second follows from $|\bigcup T| \ge |S'|$.

1072:

1073: Now let us compute

1074: \begin{eqnarray*}

1075: \E[\lndrop : S^t, A ]

1076: &=& \ln|S^t| - \E[\ln|S^{t+1}| : S^t, A] \\

1077: &\le& \ln|S^t| - \E[\ln |S'| - \ln |T| : S^t, A] \\

1078: &=& \ln \frac{|S^t|}{|S'|} + \E[\ln |T| : S^t, A] \\

1079: &\le& \ln \frac{|S^t|}{|S'|} + \frac{\E[\ln |T| : S^t]}{\Pr[A: S^t]} \\

1080: &\le& \constdrop + \frac{\ln \E[|T| : S^t]}{\Pr[A: S^t]}.

1081: \end{eqnarray*}

1082: In the second-to-last step, we use

1083: $\E[\ln |T| : S^t, A] \le \E[\ln |T| : S^t] / \Pr[A: S^t]$,

which follows from
$\E[\ln |T| : S^t]
=
 \E[\ln |T| : S^t, A] \Pr[A: S^t]
+\E[\ln |T| : S^t, \neg A] \Pr[\neg A: S^t]$
together with $\ln |T| \ge 0$ (the subranges in $T$ cover $S'$, so $|T| \ge 1$).
In the last step, we use $\E[\ln |T| : S^t] \le \ln \E[|T| : S^t]$,
which follows from the concavity of $\ln$ and Jensen's inequality.
Since $|T| \le 1+Z$, this gives (\ref{eq-log-drop}).

1091: \end{proof}
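The counting step $|T| \le 1 + Z$ in the proof of Lemma~\ref{lemma-log-drop} can also be checked on random instances. This Python sketch assumes two-sided routing with ties broken toward the negative successor, and an integer $a$. It builds $S'$, collects the subranges that meet $S'$, and compares their number with $1+Z$. The names and the random test distribution are hypothetical.

\begin{verbatim}
```python
import random


def two_sided_step(x, delta):
    # Candidate with the smallest |label|; ties go to the negative one.
    return min((x - d for d in delta), key=lambda y: (abs(y), y))


def beta_points(delta):
    # Midpoints between consecutive positive (ceiling) and negative (floor) offsets.
    pos = sorted(d for d in delta if d > 0)
    neg = sorted((d for d in delta if d < 0), reverse=True)
    return ({-((-(u + v)) // 2) for u, v in zip(pos, pos[1:])}
            | {(u + v) // 2 for u, v in zip(neg, neg[1:])})


def subranges(lo, hi, delta):
    """(min, max) of each nonempty subrange S_{Delta i sigma} of S = {lo..hi}."""
    parts = {}
    for x in range(lo, hi + 1):
        y = two_sided_step(x, delta)
        key = (x - y, (y > 0) - (y < 0))
        parts[key] = (parts.get(key, (x, x))[0], x)
    return list(parts.values())


def count_T_and_Z(lo, hi, delta, a):
    """(|T|, Z) for positive S = {lo..hi} and integer a with |S| >= a (two-sided case)."""
    size = hi - lo + 1
    s_lo, s_hi = lo + (-((-size) // a)) - 1, hi - 1  # S' = [s_lo, s_hi]
    T = [(m, M) for m, M in subranges(lo, hi, delta) if m <= s_hi and M >= s_lo]
    in_s = lambda v: s_lo <= v <= s_hi
    Z = 2 * sum(map(in_s, delta)) + sum(map(in_s, beta_points(delta)))
    return len(T), Z


def random_instance(rng):
    """Hypothetical test instance: (lo, hi, delta, a) with |S| > a."""
    delta = {-1, 1} | {rng.choice((-1, 1)) * rng.randint(2, 50)
                       for _ in range(rng.randint(0, 8))}
    a = rng.randint(2, 6)
    lo = rng.randint(1, 20)
    return lo, lo + rng.randint(a, 80), delta, a
```
\end{verbatim}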

1092:

1093: \subsubsection{Putting the pieces together}

1094: \label{Section-putting-the-pieces-together}

1095:

1096: We now have all the tools we need to prove our lower bound.

1097:

1098: \newcommand{\ZZ}{\mathbf{Z}} % the integers

1099: \newcommand{\bh}{\hat{\beta}}

1100: \newloglike{\roundfromzero}{absceil}

1101: \newcommand{\rfz}[1]{\roundfromzero\left({#1}\right)}

1102: \begin{theorem}

1103: \label{theorem-lower-bound}

1104: Let $G$ be a random graph whose nodes are labeled by the integers.

1105: Let $\Delta_x$ for each $x$ be a set of integer offsets chosen

1106: independently from some common distribution, subject to the constraint

1107: that $-1$ and $+1$ are present in every $\Delta_x$,

1108: and let node $x$ have an outgoing link to $x-\delta$ for each

1109: $\delta\in\Delta_x$.  Let $\ell = \E[|\Delta|]$.

1110: Consider a greedy routing trajectory in $G$ starting at a point chosen

1111: uniformly from $1 \ldots n$ and ending at $0$.

1112:

1113: With one-sided routing, the expected time to reach $0$ is

1114: \begin{equation}

1115: \label{eq-lower-bound-one-sided}

1116: \Omega\left(

1117:       \frac{\log^2 n}{\ell \log \log n}

1118: \right).

1119: \end{equation}

1120:

1121: With two-sided routing, the expected time to reach $0$ is

1122: \begin{equation}

1123: \label{eq-lower-bound-two-sided}

1124: \Omega\left(

1125:       \frac{\log^2 n}{\ell^2 \log \log n}

1126: \right),

1127: \end{equation}

1128: provided $\Delta$ is generated by including each

1129: $\delta$ in $\Delta$ with probability $p_\delta$, where (a) $p$ is

1130: unimodal, (b) $p$ is symmetric about $0$, and (c) the choices to

1131: include particular $\delta, \delta'$ are pairwise independent.

1132: \end{theorem}

1133: \begin{proof}

1134: Let $S^0 = \{ 1 \ldots n \}$.

1135:

1136: We are going to apply Theorem~\ref{theorem-mean-lower-bound} to the

1137: sequence $S^0, S^1, S^2, \ldots$ with $f(S) = \ln |S|$.

We have chosen $f$ so that $f(S)=0$ once we reach the target; thus
a lower bound on $\tau$ gives a lower bound on the expected time of
the routing algorithm.

1141: To apply the theorem,

1142: we need to

1143: show that (a) the probability that $\ln |S|$ drops by a large amount

is small, and (b) the integral in

1145: (\ref{eq-mean-lower-bound-T}) is large.

1146:

1147: \begin{sloppypar}

1148: Let $a = 3 \ell \ln^3 n$.

1149: By Lemma~\ref{lemma-aggregate-max-drop},

1150: for all $t$,

1151: $\Prsta \le 3 \ell a^{-1} = \ln^{-3} n$,

1152: and thus

1153: $\Pr[\lndrop \ge \ln a : S^t] \le \ln^{-3} n$.

1154: This satisfies (\ref{eq-mean-lower-bound-U-epsilon})

1155: with $U = \ln a$ and $\epsilon = \ln^{-3} n$.

1156: \end{sloppypar}

1157:

1158: For the second step,

1159: Theorem~\ref{theorem-mean-lower-bound} requires that we bound the

1160: speed of the change in $f(S)$ solely as a function of $f(S)$.  For

1161: one-sided routing this is not a problem, as

1162: Lemma~\ref{lemma-aggregate-ranges} shows that $f(S)$, which reveals

1163: $|S|$, characterizes $S$ exactly except when $|S| = 1$ and the lower

1164: bound argument is done.  For two-sided routing, the situation is more

1165: complicated; there may be some $S^t$ which is not of the form

1166: $\{1\ldots |S^t|\}$ or $\{0\}$, and we need a bound on the speed at

1167: which $\ln |S^t|$ drops that applies equally to all sets of the same

1168: size.

1169:

1170: \begin{sloppypar}

1171: It is for this purpose (and only for this purpose)

1172: that we use our conditions on $\Delta$ for

1173: two-sided routing.

1174: Suppose that each $\delta$

appears in $\Delta$ with probability $p_\delta$, that the events
$[\delta \in \Delta]$ are pairwise independent, and that the sequence $p$ is

1177: symmetric and unimodal.

1178: Let $\bh = \left\{ \rfz{\frac{x+y}{2}} : x, y \in \Delta, x \ne y \right\}$,

1179: where $\rfz{z}$, the \buzz{absolute ceiling} of $z$,

1180: is $\ceil{z}$ when $z \ge 0$ and $\floor{z}$ when $z \le 0$.

Observe that $\bh \supseteq \beta$; in effect, we are counting in
$\bh$ all
midpoints of pairs of distinct elements of $\Delta$ without regard to
whether the elements are adjacent.

For each $k$, by pairwise independence, the expected number of distinct
pairs $x$, $y$ with $x+y=k$ and
$x,y \in \Delta$ is at most

1188: $b_k = \sum_{i=-\infty}^{\infty} p_{k-i} p_i$;

1189: this is a convolution of the non-negative, symmetric, and unimodal

1190: $p$ sequence with itself and so it is also symmetric and unimodal.

1191: It follows that for all $0 \le k < k'$, $b_k \ge b_{k'}$, and similarly

1192: $b_{-k} \ge b_{-k'}$.

1193: \end{sloppypar}

1194:

1195: Now for the punch line: for each $\delta \ne 0$,

1196: $q_\delta = b_{2\delta - \sgn \delta} + b_{2\delta}$

1197: is an upper bound on the expected number of distinct pairs $x,y$ that

1198: put $\delta$ in $\beta$, which is in turn an upper bound on

$\Pr[\delta \in \beta]$, and from the unimodality of $b$ we have
that $q_\delta \ge q_{\delta'}$ and $q_{-\delta} \ge q_{-\delta'}$
whenever $0 < \delta < \delta'$.  Though $q$ grossly overcounts

1202: the elements of $\beta$ (in particular, it gives a bound on $\E[|\beta|]$

1203: of $\ell^2$), its ordering property means that we can bound the

1204: expected number of elements of $\beta$ that appear in some subrange

1205: of any positive $S^t$

1206: by using $q$ to bound the expected number of elements that

1207: appear in the corresponding subrange of $\{ 1 \ldots |S^t| \}$, and

1208: similarly for negative $S^t$ and $\{-1 \ldots - |S^t| \}$.

1209: Because $p_i$ already satisfies a similar ordering property, we

1210: can thus bound the number of elements of both $\Delta$ and $\beta$

1211: that hit a fixed subrange of $S^t$ given only $|S^t|$.  We do this next.
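The ordering property of $q$ can be illustrated using exact arithmetic. The sketch below assumes a hypothetical link distribution with $p_0 = p_{\pm 1} = 1$ and $p_\delta = 1/|\delta|$ for $2 \le |\delta| \le 20$. It computes $b_k = \sum_i p_{k-i}p_i$ and $q_\delta$, so that one can confirm both are symmetric and nonincreasing away from $0$. The function names are illustrative.

\begin{verbatim}
```python
from fractions import Fraction


def link_probs(max_offset):
    """Hypothetical symmetric unimodal p: p_0 = p_{+-1} = 1, p_d = 1/|d| up to max_offset."""
    return {d: Fraction(1, max(abs(d), 1)) for d in range(-max_offset, max_offset + 1)}


def self_convolution(p):
    """b_k = sum_i p_{k-i} p_i (an upper bound on expected pairs in Delta summing to k)."""
    b = {}
    for i, pi in p.items():
        for j, pj in p.items():
            b[i + j] = b.get(i + j, 0) + pi * pj
    return b


def q_bound(b, delta):
    """q_delta = b_{2 delta - sgn delta} + b_{2 delta} for delta != 0."""
    sgn = 1 if delta > 0 else -1
    return b.get(2 * delta - sgn, 0) + b.get(2 * delta, 0)
```
\end{verbatim}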

1212:

1213: For convenience, formally define $p_i = \Pr[i \in \Delta]$ and

1214: $q_i=0$ for one-sided routing.

1215: We will simplify some of the summations by first summing the $p_i$ and $q_i$

1216: over certain pre-defined intervals.

For each integer $i \ge 0$ let
$A_i = \{ k \in \ZZ : a^i-1 \le k < a^{i+1}-1\}
 = \{k \in \ZZ : \floor{\log_a (k+1)} = i \}$.
Let $\gamma_i = \sum_{k \in A_i} (2p_k + q_k)$.  Note that $\gamma_i \ge 2\E[|A_i \cap \Delta|]$

1222: for one-sided routing and

1223: $\gamma_i \ge 2\E[|A_i \cap \Delta|] + \E[|A_i \cap \beta|]$

1224: for two-sided routing.

1225: Observe also that

1226: $\sum_{i=0}^{\infty} \gamma_i$ is at most

1227: $2\ell$ for one-sided routing and at most $2\ell + \ell^2$ for two-sided

1228: routing.

1229:

1230: Consider some $S=S^t$.

1231: Let $A$ be the event $\left[\lndrop < \ln a\right]$.

1232: If $|S| \ge a$,

1233: then by Lemma~\ref{lemma-log-drop} we have

1234: \begin{equation}

1235: \label{eq-log-drop-revisited}

1236: \E \left[ \lndrop : S^t, A \right]

1237: \le \constdrop + \frac{\ln \E\left[1+Z : S^t\right]}{\Pr[A: S^t]},

1238: \end{equation}

1239: where

1240: $Z = 2|\Delta \cap S'|$ with one-sided routing and

1241: $Z = 2|\Delta \cap S'| + |\beta \cap S'|$ with two-sided routing,

1242: with $S' = [\min(S) + \ceil{\aS} -1 , \max(S)-1]$ in each case, as in

1243: Lemma~\ref{lemma-log-drop}.

1244:

1245: As we observed earlier, our choice of $a$ and

1246: Lemma~\ref{lemma-aggregate-max-drop} imply

1247: $\Pr[\lndrop \ge \ln a : S^t] \le \ln^{-3} n$, so

1248: $\Pr[A: S^t] = 1-\Pr[\lndrop \ge \ln a: S^t]

1249: \ge 1 - \ln^{-3} n \ge \frac{1}{2}$ for sufficiently

1250: large $n$.

1251: So we can replace (\ref{eq-log-drop-revisited}) with

1252: \begin{equation}

1253: \label{eq-log-drop-revisited-simple}

1254: \E \left[ \lndrop : S^t, A \right]

\le \constdrop + 2 \ln \E\left[1+Z : S^t\right].

1256: \end{equation}

1257:

1258: Let us now obtain a bound on $\ln \E[1+Z]$ in terms of $|S|$ and

1259: the $p_i$ and $q_i$.

1260: For one-sided routing, we use the fact that $|S| > 1$ implies

1261: $S=\{1\ldots|S|\}$.  For two-sided routing, we use monotonicity of the

1262: $p_i$ and $q_i$ to replace $S$ with $\{1\ldots|S|\}$;

1263: in particular, to replace a sum of $2p_i+q_i$ over a subrange of $S$

with a sum over the corresponding subrange of $\{1\ldots|S|\}$ that is at least as

1265: large.

1266: In either case, we get that

1267: \begin{equation}

1268: \label{eq-bound-one-plus-z}

\ln \E[1+Z] \le \ln\left(1 + \sum_{i = \ceil{\aS}-1}^{|S|-1} (2p_i + q_i)\right),

1270: \end{equation}

1271: and thus

1272: $\E \left[ \lndrop : S^t, A \right]$ is bounded by

1273: \begin{equation}

1274: \label{eq-lower-bound-mu}

1275: \mu_{\ln |S|} =

1276: \constdrop +

2\ln\left(1 + \sum_{i = \ceil{\aS}-1}^{|S|-1} (2p_i + q_i)\right),

1278: \end{equation}

1279: provided $|S| \ge a$.

1280: For $|S| < a$, set $\mu_{\ln|S|} = \ln a$.

1281:

1282: \newcommand{\gzs}{\gamma_{z'} + \gamma_{z'+1} + \gamma_{z'+2}}

1283: \begin{sloppypar}

1284: Let us now compute $m_z$, as defined in (\ref{eq-mean-lower-bound-m}).

1285: For $z < \ln a$, $m_z = \ln a$.

1286: For larger $z$, observe that

$m_z = \sup \left\{ \mu_{\ln |S|} : e^z \le |S| < a e^z \right\}$.

1288: Now if $e^z \le |S| < a e^z$, then the bounds on the sum in

1289: (\ref{eq-lower-bound-mu}) both lie between $\ceil{a^{-1} e^z}-1$ and

1290: $a e^z -1$, so that

1291: \begin{eqnarray*}

1292: \label{eq-lower-bound-m}

1293: m_z &\le&

1294: \constdrop +

2\ln\left(1 + \sum_{i = \ceil{a^{-1}e^z}-1}^{\floor{ae^z-1}} (2p_i + q_i)\right)

1296: \\

1297: &\le&

1298: \constdrop +

1299: 2\ln(1 + \gzs),

1300: \end{eqnarray*}

1301: where $z' = \floor{z/\ln a} - 1$.

1302: \end{sloppypar}
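The claim that the summation range in the bound on $m_z$ lies within the three blocks $A_{z'}$, $A_{z'+1}$, $A_{z'+2}$ can be checked with exact rational arithmetic. This sketch writes $E = e^z$. For simplicity it assumes an integer $a \ge 2$, whereas the proof uses the non-integer value $a = 3\ell\ln^3 n$. The function names are hypothetical.

\begin{verbatim}
```python
import math
from fractions import Fraction


def block_index(k, a):
    """The i with k in A_i = {k : a^i - 1 <= k < a^(i+1) - 1}, for integers k >= 0."""
    i = 0
    while a ** (i + 1) <= k + 1:
        i += 1
    return i


def floor_log(E, a):
    """Largest integer j with a^j <= E, for E >= 1 (exact, no floating point)."""
    j = 0
    while a ** (j + 1) <= E:
        j += 1
    return j


def window_in_three_blocks(E, a):
    """Check that indices ceil(E/a) - 1 .. floor(a E - 1) lie in A_{z'}, A_{z'+1}, A_{z'+2}.

    Here E = e^z >= a and z' = floor(z / ln a) - 1 = floor(log_a E) - 1.
    """
    lo = math.ceil(E / a) - 1
    hi = math.floor(a * E - 1)
    z_prime = floor_log(E, a) - 1
    return z_prime <= block_index(lo, a) and block_index(hi, a) <= z_prime + 2
```
\end{verbatim}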

1303:

1304: Finally, compute

1305: \begin{eqnarray*}

1306: T(\ln n)  &=&

1307: \int_{0}^{\ln n} \frac{1}{m_z} dz \\

1308: &\ge&

1309: \int_{\ln a}^{\ln n} \frac{1}{\constdrop+2\ln(1+\gzs)} dz \\

1310: &\ge&

1311:   \sum_{i = 0}^{\floor{\ln n / \ln a} - 1}

1312:     \frac{\ln a}{\constdrop+2\ln(1+\gamma_i + \gamma_{i+1} + \gamma_{i+2})}.

1313: \end{eqnarray*}

1314:

1315: To get a lower bound on the sum,

1316: note that

1317: \[\sum_{i = 0}^{\floor{\ln n / \ln a} - 1}

1318:   (\gamma_i + \gamma_{i+1} + \gamma_{i+2})

1319:  \le 3 \sum_{i=0}^{\floor{\ln n / \ln a} + 1} \gamma_i

1320:  \le 3 \sum_{i=0}^{\infty} \gamma_i,

1321:  \]

1322: which is at most $L = 6\ell$ for one-sided routing and at most

1323: $L = 6\ell+3\ell^2$ for two-sided routing.

1324: In either case, because $\frac{1}{c+2\ln(1+x)}$ is convex and decreasing,

1325: we have

1326: \begin{eqnarray}

1327: T(\ln n) &\ge&

1328:   \sum_{i = 0}^{\floor{\ln n / \ln a} - 1}

1329:     \frac{\ln a}{\constdrop + 2\ln(1+\gamma_i + \gamma_{i+1} + \gamma_{i+2})}

1330: \nonumber\\

1331: &\ge&

1332: \sum_{i = 0}^{\floor{\ln n / \ln a} - 1}

1333:     \frac{\ln a}{\constdrop+2\ln\left(1 + \frac{L}{\floor{\ln n / \ln a}}\right)}

1334: \nonumber\\

1335: &=&

1336: \frac{ \ln a

1337:     \floor{\ln n / \ln a}

1338: }{

1339:     \constdrop +

1340:     2\ln\left(1 + \frac{L}{\floor{\ln n / \ln a}}\right)}.

1341:     \label{eq-mean-lower-bound-ugly-T}

1342: \end{eqnarray}
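The convexity step and the resulting expression (\ref{eq-mean-lower-bound-ugly-T}) can be explored numerically. The Python sketch below uses hypothetical names. Here \texttt{block\_term} is the summand $\ln a / (c + 2\ln(1+x))$, whose convexity and monotonicity justify replacing each $\gamma_i + \gamma_{i+1} + \gamma_{i+2}$ by the average $L/\floor{\ln n / \ln a}$. The function \texttt{t\_lower\_bound} evaluates the right-hand side of (\ref{eq-mean-lower-bound-ugly-T}) with $a = 3\ell\ln^3 n$, as in the proof. The function \texttt{jensen\_slack} probes that replacement step on random inputs.

\begin{verbatim}
```python
import math
import random


def block_term(x, c, log_a):
    """Summand ln a / (c + 2 ln(1 + x)); convex and decreasing in x >= 0 when c > 0."""
    return log_a / (c + 2 * math.log1p(x))


def t_lower_bound(n, ell, L):
    """Right-hand side of the final bound on T(ln n), with a = 3 ell ln^3 n."""
    ln_n = math.log(n)
    a = 3 * ell * ln_n ** 3
    k = math.floor(ln_n / math.log(a))  # number of blocks, floor(ln n / ln a)
    if k == 0:
        raise ValueError("n too small relative to a: no complete block")
    c = -math.log1p(-1 / a)             # ln(1 / (1 - 1/a))
    return k * block_term(L / k, c, math.log(a))


def jensen_slack(trials, seed=0):
    """Smallest observed sum_i f(x_i) - k f(L / k) with L >= sum_i x_i (should be >= 0)."""
    rng = random.Random(seed)
    worst = math.inf
    for _ in range(trials):
        xs = [rng.uniform(0, 20) for _ in range(rng.randint(1, 10))]
        L = sum(xs) + rng.uniform(0, 5)
        c = rng.uniform(1e-6, 1.0)
        k = len(xs)
        lhs = sum(block_term(x, c, 3.0) for x in xs)
        worst = min(worst, lhs - k * block_term(L / k, c, 3.0))
    return worst
```
\end{verbatim}

For example, with $n = 10^{100}$, $\ell = 2$, and $L = 6\ell$, the expression evaluates to about $157$. Since the $\Omega$ bounds hide constants, this number shows the shape of the expression rather than an actual routing time.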

1343:

1344: We will now rewrite our bound on $T(\ln n)$ in a more convenient asymptotic

1345: form.  We will ignore the $1$ and concentrate on the large fraction.

1346: Recall that $a = 3 \ell \ln^3 n$,

1347: so $\ln a = \Theta(\ln \ell + \ln \ln n)$.

1348: Unless $\ell$ is polynomial in $n$, we have $\ln n / \ln a =

1349: \omega(1)$ and the numerator simplifies to $\Theta(\ln n)$.

1350:

1351: Now let us look at the denominator.

1352: Consider first the term $\constdrop$.

1353: We can rewrite this term as $-\ln(1-a^{-1})$; since $a^{-1}$ goes to

1354: zero as $\ell$ and $n$ grow we have

1355: $-\ln(1-a^{-1}) = \Theta(a^{-1}) = \Theta(\ell^{-1} \ln^{-3} n)$.

This is much smaller than the bound on the second term derived below,
so it does not affect the asymptotics.

1357:

1358: \begin{sloppypar}

1359: Turning to the second term, let us use the fact that

1360: $\ln(1+x) \le x$ for $x \ge 0$.

1361: Thus

1362: \begin{eqnarray*}

1363: 2\ln\left(1+\frac{L}{\floor{\ln n / \ln a}}\right)

1364: &\le& 2\,\frac{L}{\floor{\ln n/\ln a}}\\

&=& O\left(\frac{L(\log \ell + \log \log n)}{\log n}\right),

1366: \end{eqnarray*}

1367: and the bound in (\ref{eq-mean-lower-bound-ugly-T}) simplifies to

1368: $\Omega\left(\log^2 n / \left( L (\log \ell + \log \log n)\right)\right)$.

1369: We can further assume that $\ell = O(\log^2 n)$, since otherwise the

1370: bound degenerates to $\Omega(1)$, and

1371: rewrite it simply as $\Omega\left(\log^2 n / \left(L \log \log n\right)\right).$

1372: \end{sloppypar}

1373:

1374: For large $L$, the approximation

1375: $\ln(1+x) \le 1+\ln x$ for $x \ge 0.59$ is more useful.

1376: In this case (\ref{eq-mean-lower-bound-ugly-T}) simplifies to

1377: $T(\ln n) = \Omega(\ln n / \ln \ell)$, which has a natural

1378: interpretation in terms of the tree of successor nodes of some single

1379: starting node and gives essentially the same bound as

1380: Theorem~\ref{theorem-tree-lower-bound}.
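The constant $0.59$ in this approximation comes from a short computation. The inequality $\ln(1+x) \le 1 + \ln x$ holds exactly when $x/(1+x) \ge e^{-1}$, that is, when $x \ge 1/(e-1) \approx 0.582$. A minimal Python check of this threshold follows; the names are illustrative.

\begin{verbatim}
```python
import math

# ln(1 + x) <= 1 + ln(x)  <=>  x / (1 + x) >= 1/e  <=>  x >= 1 / (e - 1)
THRESHOLD = 1 / (math.e - 1)  # about 0.582, so x >= 0.59 suffices


def approx_holds(x):
    """True when ln(1 + x) <= 1 + ln(x) (requires x > 0)."""
    return math.log1p(x) <= 1 + math.log(x)
```
\end{verbatim}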

1381:

1382: We are not quite done with Theorem~\ref{theorem-mean-lower-bound} yet,

1383: as we still need to plug our $T$ and $\epsilon$ into

1384: (\ref{eq-mean-lower-bound}) to get a lower bound on $\E[\tau]$.

1385: But here we can simply observe that

1386: $\epsilon T = O(1/\log n)$, so the denominator in

1387: (\ref{eq-mean-lower-bound}) goes rapidly to $1$.

1388: Our stated bounds are thus finally obtained by substituting $O(\ell)$

1389: or $O(\ell^2)$ for $L$.

1390: \end{proof}

1391:

1392: \subsubsection{Possible strengthening of the lower bound}

1393:

1394: Examining the proof of Theorem~\ref{theorem-lower-bound},

1395: both the $\ell^2$ that appears in the bound

1396: (\ref{eq-lower-bound-two-sided}) for two-sided

1397: routing and the extra conditions imposed on the $\Delta$ distribution

1398: arise only as artifacts of our need to project each range $S$ onto

1399: $\{1\ldots|S|\}$ and thus reduce the problem to tracking a single

1400: parameter.  We believe that a more sophisticated argument that does

1401: not collapse ranges together would show a stronger result:

1402: \begin{conjecture}

1403: Let $G$, $\Delta$, and $\ell$

1404: be as in Theorem~\ref{theorem-lower-bound}.

1405: Consider a greedy routing trajectory starting at a point chosen

1406: uniformly from $1 \ldots n$ and ending at $0$.

1407:

1408: Then the expected time to reach $0$ is

1409: \begin{displaymath}

1410: \Omega\left(

1411:       \frac{\log^2 n}{\ell \log \log n}

1412: \right),

1413: \end{displaymath}

1414: with either one-sided or two-sided routing, and no constraints on the

1415: $\Delta$ distribution.

1416: \end{conjecture}

1417:

We also believe that the bound continues to hold in dimensions higher
than $1$.  Unfortunately, the fact that we can embed the line in, say,

1420: a two-dimensional grid is not enough to justify this belief;

1421: divergence to one side or the other of the line may change the

1422: distribution of boundaries between segments and break the proof of

1423: Theorem~\ref{theorem-lower-bound}.

1424: \subsection{Upper Bounds}

1425: \label{sec:UPPERBNDS}

1426:

In this section, we present upper bounds on the
delivery time of messages in a simple metric
space: the one-dimensional real line. To simplify
the analysis, the system
is set up as follows.

1432: \begin{itemize}

1433:   \item Nodes are embedded at grid points on the real

1434:         line.

1435:   \item Each node $u$ is connected to its nearest

1436:         neighbor on either side and to one or more

1437:         long-distance neighbors.

1438:   \item The long-distance neighbors are chosen as

1439:         per the inverse power-law distribution with

1440:         exponent $1$, i.e.,

1441:         each long-distance neighbor $v$ is chosen

1442:         with probability inversely proportional to

        the distance between $u$ and $v$. Formally,
        $\prob{v \mbox{ is the $i$th long-distance neighbor of } u} =
        d(u,v)^{-1} / \sum_{v'\neq u} d(u,v')^{-1}$,
        where $d(u,v)$ is the distance between nodes
        $u$ and $v$ in the metric space.

1448:   \item Routing is done greedily by forwarding the

1449:         message to the neighbor closest to the target

1450:         node.

1451: \end{itemize}
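
For concreteness, the link distribution of this model can be tabulated
exactly and sampled. The Python sketch below is our own illustration, not
code from the system studied here; it assumes $n$ nodes at the integer points
$0,\ldots,n-1$ of a finite line without wraparound, and its function names
are hypothetical.

\begin{verbatim}
from fractions import Fraction
import random

def long_link_distribution(n, u):
    """Exact Pr[v is a given long-distance neighbor of u]: proportional
    to 1/d(u, v) over all nodes v != u on the line 0..n-1."""
    weights = {v: Fraction(1, abs(u - v)) for v in range(n) if v != u}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}

def sample_long_links(n, u, ell, rng):
    """Draw ell long-distance neighbors of u independently, with replacement."""
    dist = long_link_distribution(n, u)
    nodes = list(dist)
    return rng.choices(nodes, weights=[float(dist[v]) for v in nodes], k=ell)

dist = long_link_distribution(10, 0)
print(dist[1] / dist[2])  # prints 2: distance 1 is twice as likely as distance 2
\end{verbatim}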

1452:

1453: We analyze the performance for the cases of a single

1454: long-distance link and of multiple ones, both in a failure-free network

1455: and in a network with link and node failures. Note that when

1456: we say {\em node}, we actually refer to a vertex in the

1457: virtual overlay network and not a {\em physical} node as

1458: in the earlier sections.

1459:

1460:

1461: \subsubsection{Single Long-Distance Link}

1462: \label{sec:INVERSE}

1463:

1464: We first analyze the delivery time in an idealized model with no

1465: failures and with one long-distance link per node.

1466: Kleinberg \cite{KL99} proved that with $n^d$ nodes embedded at grid

1467: points in a $d$-dimensional grid, with each node $u$ connected to its

1468: immediate neighbors and one long-distance neighbor $v$ chosen with

1469: probability proportional to $1/d(u, v)^d$, any message can be

1470: delivered in time polynomial in $\log n$ using greedy routing.

While this result can be applied directly to our model with $d=1$
and $\ell=1$ to give an $O(\log^2 n)$ delivery time, a much simpler
proof follows from Lemma~\ref{lemma-probabilistic-recurrence-ub}.
We include this proof below for completeness.

1475:

1476: \begin{theorem}

1477: \label{thm:UPPER-SINGLE}

1478: Let each node be connected to its immediate neighbors (at distance 1)

1479: and $1$ long-distance neighbor chosen with probability inversely

1480: proportional to its distance from the node. Then the expected delivery

1481: time with $n$ nodes in the network is $T(n)=O(H_n^2)$.

1482: \end{theorem}

1483:

1484: \begin{proof}

Let $\mu_k$ be the expected number of nodes crossed in a single step (that is,
the expected decrease in the distance to the destination) when the message is
at a node at distance $k$ from the destination.

1488:

1489: \begin{figure}[htb]

1490: \centerline{\epsfig{figure=1D.eps, height=75pt}}

1491: \caption{All the possible distances that can be

1492: covered from source node $s$.}

1493: \end{figure}

1494:

\noindent
We have
$$\mu_k = \frac{\sum_{i=1}^k \frac{1}{i} \cdot i}{S}
        + \frac{\sum_{i=1}^{k-1} \frac{1}{2k-i} \cdot i}{S}
        + \frac{\sum_{i=1}^{n_1-k} \frac{1}{i} \cdot 1}{S}
        + \frac{\sum_{i=2k}^{n_2+k} \frac{1}{i} \cdot 1}{S},$$
where
\begin{eqnarray*}
S &=& \sum_{i=1}^{n_1-k} \frac{1}{i} + \sum_{i=1}^{n_2+k} \frac{1}{i}\\
  &=& H_{n_1-k}+H_{n_2+k} < 2H_n.
\end{eqnarray*}
Then
\begin{eqnarray*}
\mu_k &>& \frac{1}{S} \left[ k + 0 + H_{n_1-k} + H_{n_2+k} - H_{2k}\right]\\
      &>& \frac{k}{S} > \frac{k}{2H_n}.
\end{eqnarray*}
Clearly, $\mu_k$ is non-decreasing, and thus
using Lemma~\ref{lemma-probabilistic-recurrence-ub}, we get
$$T(n) \leq \sum_{k=1}^n \frac{1}{\mu_k}
\leq \sum_{k=1}^n \frac{2H_n}{k} = 2H_n^2 = O(H_n^2).$$
Thus with this distribution, the delivery time is $O(\log^2 n)$, that is,
polylogarithmic in the number of nodes.

1517: \end{proof}
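
The bound can also be explored empirically. The following Python sketch is
our own hedged illustration, not the simulator used in our experiments: it
assumes a finite line $0,\ldots,n-1$ without wraparound, gives each node one
long-distance link chosen with probability proportional to $1/d(u,v)$, routes
greedily between uniformly random source and target pairs, and reports the
average hop count next to $2H_n^2$. All function names are hypothetical.

\begin{verbatim}
import random

def build_links(n, rng):
    """Give each node u on the line 0..n-1 one long-distance link, chosen
    with probability proportional to 1/|u - v| over all v != u."""
    links = []
    for u in range(n):
        others = [v for v in range(n) if v != u]
        weights = [1.0 / abs(u - v) for v in others]
        links.append(rng.choices(others, weights=weights, k=1)[0])
    return links

def greedy_route(links, source, target):
    """Forward to the neighbor (left, right, or long link) closest to target;
    return the number of hops. Each hop reduces the distance by at least 1."""
    n, cur, hops = len(links), source, 0
    while cur != target:
        candidates = [links[cur]]
        if cur > 0:
            candidates.append(cur - 1)
        if cur < n - 1:
            candidates.append(cur + 1)
        cur = min(candidates, key=lambda v: abs(v - target))
        hops += 1
    return hops

def mean_hops(n, trials, seed=0):
    """Average hop count over random source/target pairs on one random graph."""
    rng = random.Random(seed)
    links = build_links(n, rng)
    pairs = [(rng.randrange(n), rng.randrange(n)) for _ in range(trials)]
    return sum(greedy_route(links, s, t) for s, t in pairs) / trials

n = 500
h_n = sum(1.0 / k for k in range(1, n + 1))
print(mean_hops(n, 200), 2 * h_n ** 2)  # empirical mean vs. the bound 2 H_n^2
\end{verbatim}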

1518:

1519:

1520: \subsubsection{Multiple Long-Distance Links}

1521: \label{sec:MULT-LINKS}

1522:

The next question is whether we can improve on the $O(\log^2 n)$
delivery time by using multiple links instead of a single one. In
addition to improving performance, multiple links also provide
robustness in the face of failures. We first look at the
improvement in performance obtained by using multiple links
and then turn to the analysis of failures in Section~\ref{sec:LINK-FAIL}.
Suppose that there are $\ell$ links from each node.
We consider different strategies for generating links and routing,
depending on whether the number of links $\ell$ lies in $[1,\lg n]$
or in $(\lg n, n^c]$.

1533:

1534: In \cite{KL01}, Kleinberg uses a group structure to get a delivery time

1535: of $O(\log n)$ for the case of a polylogarithmic number of links.

1536: However, he uses a more complicated algorithm for routing while we

1537: obtain the same bound (for the case of a line) using only greedy routing.

1538:

1539: \begin{figure}[h]

1540: \begin{center}

1541: \input{qlinks.pstex_t}

1542: \caption{Multiple long-distance links for each node.}

1543: \end{center}

1544: \end{figure}

1545:

1546: \paragraph{Upper Bound}

1547: Let us first consider a randomized strategy for link distribution

1548: when $\ell \in [1, \lg n]$.

1549:

1550: \begin{theorem}

1551: \label{thm:UPPER-RANDOMIZED-MULTIPLE}

1552: Let each node be connected to its immediate neighbors (at distance 1)

and $\ell$ long-distance neighbors chosen independently with replacement,
each with probability inversely proportional to its distance from the node.

1555: Let $\ell \in [1, \lg n]$.  Then the expected delivery time

1556: $T(n)=O(\log^2 n/\ell)$.

1557: \end{theorem}

1558:

1559: \begin{proof}

1560: The basic idea for this proof comes from Kleinberg's model~\cite{KL99}.

1561: Kleinberg considers a two-dimensional grid with nodes at every grid point.

1562: The delivery of the message is divided into phases. A message is said to

1563: be in phase $j$ if the distance from the current node to the destination

1564: node is between $2^j$ and $2^{j+1}$. There are at most ($\lg n+1$) such

1565: phases. He proves that the expected time spent in each phase is at most

1566: $O(\log n)$, thus giving a total upper bound of $O(\log^2 n)$ on the delivery

1567: time. We use the same phase structure in our model, and this proof

1568: is along similar lines.

1569:

1570: In our multiple-link model, each node has $\ell$ long-distance neighbors

chosen with replacement.  The probability that $u$ chooses a node $v$ as its
long-distance neighbor is
$1-(1-q)^\ell$, where $q=\frac{d(u, v)^{-1}}{\sum_{v'\ne u}d(u, v')^{-1}}$.
Since $(1-q)^\ell \le 1 - q\ell + \binom{\ell}{2}q^2$ for $0 \le q \le 1$,
we can get a lower bound on this probability as follows:
\begin{eqnarray*}
1-(1-q)^\ell &\geq& 1 - \left(1 - q\ell + \frac{\ell(\ell-1)}{2}q^2\right)\\

1577: &=&q\ell - \frac{\ell(\ell-1)}{2}q^2 = q\ell\left[1-\frac{(\ell-1)q}{2}\right]\\

1578: &=&q\ell\left[1 -\frac{\ell q}{2} +\frac{q}{2}\right]\\

1579: &\geq&q\ell\left[1-\frac{\ell q}{2}\right].\\

1580: \end{eqnarray*}

1581:

1582: Notice that $\ell q < 1$, because $q < \frac{1}{\lg n}$ and $\ell \leq \lg n$.

1583: So, the probability that $u$ chooses $v$ as its long-distance

1584: neighbor is at least

1585:

1586: \begin{eqnarray*}

1587: q\ell\left[1-\frac{\ell q}{2}\right]

1588: &\geq&q\ell\left[1-\frac{1}{2}\right]=\frac{q\ell}{2}

1589: =\ell [2d(u,v)H_n]^{-1}.

1590: \end{eqnarray*}

1591:

Now suppose that the message is currently at a node $u$ in phase $j$.
To end phase $j$ at this step, the message should enter the set $B_j$ of nodes
at distance $\leq 2^j$ from the destination node $t$. There are at least $2^j$
nodes in $B_j$, each within distance $2^{j+1} + 2^j < 2^{j+2}$ of $u$. So the
message enters $B_j$ with probability
$\geq 2^j\ell\frac{1}{2H_n2^{j+2}} = \frac{\ell}{8H_n}$.

1598:

1599: Let $X_j$ be the total number of steps spent in phase $j$. Then

1600: $$

1601: E[X_j] = \sum_{i=1}^\infty Pr[X_j \geq i]

1602: \leq \sum_{i=1}^\infty\left( 1 - \frac{\ell}{8H_n} \right)^{i-1}

1603: = \frac{8H_n}{\ell}.

1604: $$

1605:

Now if $X$ denotes the total number of steps, then
$X=\sum_{j=0}^{\lg n}X_j$, and by linearity of expectation, we get
$E[X]\leq(1+\lg n)(8H_n/\ell)=O(\log^2n/\ell)$.

1609: \end{proof}
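
The two inequalities used in this proof, $1-(1-q)^\ell \geq q\ell[1-\ell q/2]$
and $q\ell[1-\ell q/2] \geq q\ell/2$ whenever $\ell q < 1$, are easy to
spot-check numerically. The Python sketch below is such a check, not a proof;
the grid of $q$ and $\ell$ values is an arbitrary choice of ours and the
function names are hypothetical.

\begin{verbatim}
def choice_prob(q, ell):
    """Pr[a fixed node v appears among ell independent draws,
    each of which equals v with probability q]."""
    return 1.0 - (1.0 - q) ** ell

def truncated_bound(q, ell):
    """The lower bound q*ell*(1 - ell*q/2) derived in the proof."""
    return q * ell * (1.0 - ell * q / 2.0)

def check(ells, qs):
    """Return the (ell, q) pairs with ell*q < 1 that violate either inequality."""
    bad = []
    for ell in ells:
        for q in qs:
            if ell * q < 1:
                exact, bound = choice_prob(q, ell), truncated_bound(q, ell)
                if not (exact >= bound >= q * ell / 2):
                    bad.append((ell, q))
    return bad

print(check(range(1, 21), [1e-4, 1e-3, 0.01, 0.05]))  # expected: []
\end{verbatim}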

1610:

1611:

For $\ell \in (\lg n, n^c]$, we use a deterministic strategy. We represent
the location of each node as a number in base $b\geq 2$, and
generate links to nodes at distances $1x, 2x, 3x, \ldots, (b-1)x$, for each
$x \in \{b^0, b^1, \ldots, b^{\lceil\log_b n\rceil -1} \}$.
Routing is
done by eliminating the most significant digit of the distance at each step.
As this distance is less than $b^{\lceil\log_b n\rceil}$, it has at most
$\lceil\log_b n\rceil$ digits in base $b$, so at most that many steps are
needed and $T(n)=O(\log_b n)$. This strategy is similar in spirit to Plaxton's

1620: algorithm \cite{PL97}.

1621:

Some special cases are instructive.
Let $b=2$, so that $\ell=O(\log n)$, and let each node link to nodes in both
directions at distances $2^i$, $0 \leq i \leq \lceil \lg n\rceil - 1$, provided
nodes are present at those distances. This gives $T(n)=O(\log n)$. Similarly,
let $\ell=O(\sqrt{n})$, with links established in both directions to existing
nodes at distances $1, 2, 3, \ldots, \sqrt{n}, 2\sqrt{n}, 3\sqrt{n}, \ldots,
\sqrt{n}(\sqrt{n}-1)$; this gives $T(n)=O(1)$. In fact, $T(n)=O(1)$ when
$b={n^c}$, for any fixed $c>0$.

1629:

1630: \begin{theorem}

1631: \label{thm:UPPER-BOUND-DETERMINISTIC-MULTIPLE}

1632: Choose an integer $b>1$. With $\ell=(b-1)\lceil\log_b n\rceil$, let

1633: each node link

1634: to nodes at distances $1x, 2x, 3x, \ldots, (b-1)x$, for each $x \in \{b^0, b^1,

1635: \ldots, b^{\lceil \log_b n\rceil -1} \}$. Then the delivery time $T(n)

1636:  = O(\log_b n)$.

1637: \end{theorem}

1638:

1639: \begin{proof}

Let $d_1, d_2, \ldots, d_m$ be the distances of the successive nodes in the
delivery path from the target $t$, where $d_1$ is the distance of the source node
and $d_m=0$.  For each $d_i > 0$, there exists $k_i \in \{0, 1, \ldots,

1643: \lfloor \log_b n\rfloor\}$ such that

1644: $$b^{k_i} \leq d_i < b^{k_i+1}.$$

1645: Hence

1646: $$1 \leq \lfloor \frac{d_i}{b^{k_i}} \rfloor < b.$$

Now the $i$th node on the path is connected to the node at distance $b^{k_i} \lfloor
\frac{d_i}{b^{k_i}} \rfloor$ from it in the direction of $t$. We get

1649: $$

1650: d_{i+1} = d_i - b^{k_i} \lfloor \frac{d_i}{b^{k_i}} \rfloor

1651:         = d_i\mod b^{k_i}

1652:         < b^{k_i}.

1653: $$

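Thus $k_{i+1} \leq k_i - 1$ whenever $d_{i+1} > 0$, that is, $k_i$ drops by at
least 1 at every step. As $k_1 \leq \lfloor \log_b n\rfloor$, the path has at most
$\lfloor \log_b n\rfloor + 1$ steps, and we get
$T(n)=O(\log_b n)$.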

1657: \end{proof}
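
The digit-elimination argument in this proof can be traced mechanically. The
Python sketch below (a hypothetical helper of ours) follows the proof rather
than pure greedy routing: at each step it uses the link of length
$b^{k}\lfloor d/b^{k}\rfloor$, where $b^k \leq d < b^{k+1}$ for the remaining
distance $d$. The number of hops therefore equals the number of nonzero
base-$b$ digits of the initial distance, which is at most
$\lfloor \log_b d\rfloor + 1$.

\begin{verbatim}
def digit_route(d, b):
    """Remaining distances to the target when each step uses the link of
    length b**k * (d // b**k), where b**k <= d < b**(k+1)."""
    path = [d]
    while d > 0:
        k = 0
        while b ** (k + 1) <= d:      # find k with b**k <= d < b**(k+1)
            k += 1
        d -= (d // b ** k) * b ** k   # strip the most significant base-b digit
        path.append(d)
    return path

def nonzero_digits(d, b):
    """Number of nonzero digits of d written in base b."""
    count = 0
    while d:
        if d % b:
            count += 1
        d //= b
    return count

print(digit_route(347, 10))  # [347, 47, 7, 0]
\end{verbatim}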

1658:

1659:

1660: \subsubsection{Failure of Links}

1661: \label{sec:LINK-FAIL}

1662:

Our linking strategies may not achieve the same delivery time
when links fail. However, we show that we still get reasonable
performance with link failures.  In our model, we assume
that each link is present independently with probability $p$.
Let us first look at
the randomized strategy with $\ell \in [1, \lg n]$ links.

1670:

1671: \begin{figure}[htb]

1672: \centerline{\epsfig{figure=absent_link.eps, height=75pt}}

1673: \caption{Each long-distance link is present with probability $p$.}

1674: \end{figure}

1675:

Our proof follows the same lines as our proof for the case of no failures.
Intuitively, since some of the links fail, we
expect to spend more time in each phase, and this time
should be inversely proportional to the probability with
which the links are present. We prove that the expected time
spent in one phase is $O(\log n/(p\ell))$, which gives a total
delivery time of $O(\log^2 n/(p \ell))$. We assume that the links
to the immediate neighbors are always present so that a message
is always delivered, even if it takes very long.

1686:

1687: \begin{theorem}

1688: Let the model be as in Theorem~\ref{thm:UPPER-RANDOMIZED-MULTIPLE}.

1689: Assume that the links to the immediate neighbors are always present.

1690: If the probability of a long-distance link being present is $p$,

then the expected delivery time is $O(\log^2 n/(p\ell))$.

1692: \end{theorem}

1693:

1694: \begin{proof}

Recall that in the case of no link failures, the probability that $u$
chooses a node $v$ as its long-distance neighbor is at
least $q\ell/2$,
where $q=\frac{d(u, v)^{-1}}{\sum_{v'\ne u}d(u, v')^{-1}}$.

Now when we consider link failures, given that $u$ chose
$v$ as its long-distance neighbor, the probability that
the link between $u$ and $v$ is present is $p$.
So, the probability that $u$ has a working long-distance link to $v$
is at least $pq\ell/2 = p\ell[2d(u,v)H_n]^{-1}$.

1705:

The rest of the proof is the same as the proof of
Theorem~\ref{thm:UPPER-RANDOMIZED-MULTIPLE}. Let $X_j$ be the
number of steps spent in phase $j$. Then
$$E[X_j]=\sum_{i=1}^\infty Pr[X_j \geq i]
\leq \sum_{i=1}^\infty\left( 1 - \frac{p\ell}{8H_n} \right)^{i-1}
= \frac{8H_n}{p\ell}.$$

If $X$ denotes the total number of steps, then by linearity of
expectation, we get
$E[X]\leq(1+\lg n)(8H_n/(p\ell))=O(\log^2n/(p\ell))$.

1715: \end{proof}

1716:

1717:

1718:

We turn to the deterministic strategy with $\ell \in (\lg n, n^c]$
links, where a similar intuition applies. If a
link fails, then the node has to take a shorter long-distance link,
which will not take the message as close to the target as the
failed link would have. As $p$ decreases, the message has to take
shorter and shorter links, which increases the delivery time.

1725:

To make the analysis simpler, we
modify the link model slightly and let each node be

1728: connected to other nodes at distances $b^0, b^1, b^2, \ldots,

1729: b^{\lfloor \log_b n \rfloor}$.

1730: Once again, we compute the expected distance covered from the

1731: current node and use Lemma~\ref{lemma-probabilistic-recurrence-ub}

1732: to get a delivery time of $O(b \log n/p)$.  As $p$ decreases,

1733: the delivery time increases; whereas as $b$ decreases,

1734: the delivery time decreases, but the

1735: information stored at each node increases.

1736:

1737: \begin{theorem}

1738: Let the number of links be $O(\log_b n)$, and let each node have a link

1739: to distances $b^0, b^1, b^2, \ldots, b^{\lfloor \log_b n \rfloor}$.

1740: Assume that the links to

1741: the nearest neighbors are always present. If the probability of

1742: a link being present is $p$, then the delivery time

1743: $T(n)= O(bH_n/p)$.

1744: \end{theorem}

1745:

1746: \begin{proof}

Let the distance of the current node
from the destination be $k$, and let $\mu_k$ be the expected distance covered
in one step starting from this node. With probability $p$, there is a
link covering distance $\flrbk{}$. This link is absent with
probability $q=1-p$, in which case we can cover a distance $\flrbk{-1}$
with a single link with probability $pq$, and so on. In general,
the expected distance $\mu_k$ covered when the message is at distance $k$
from the destination is
\begin{eqnarray*}
\mu_k&=&p\flrbk{} + pq\flrbk{-1} + \ldots
           + pq^{\lfloor \log_b k \rfloor-1}b^1
           + q^{\lfloor \log_b k \rfloor}b^0 \\
&\geq& \sum_{i=0}^{\lfloor \log_b k \rfloor}
       p\flrbk{-i}q^i\\
&=&p\flrbk{} \sum_{i=0}^{\lfloor \log_b k \rfloor} \left(\frac{q}{b}\right)^i\\
&=&p\flrbk{} \frac{1-\left(q/b\right)^{\lfloor \log_b k \rfloor+1}}{1-(q/b)}\\
&=&\frac{p(\flrbk{+1}-q^{\lfloor \log_b k \rfloor+1})}{b-q}\\
&\geq&\frac{p(k-1)}{b-q}\\
&\geq&\frac{p(k-1)}{2(b-q)},
\end{eqnarray*}
where the second-to-last step uses $\flrbk{+1} > k$ and
$q^{\lfloor \log_b k \rfloor+1} \leq 1$.
Using Lemma~\ref{lemma-probabilistic-recurrence-ub}, and noting that
$\mu_1 = 1$ because the link to the nearest neighbor is always present, we get
$$
T(n) \leq \sum_{k=1}^n\frac{1}{\mu_k}
\leq 1+\sum_{k=2}^n\frac{2(b-q)}{p(k-1)}
=1+\frac{2(b-q)}{p}\left[\sum_{k=2}^n\frac{1}{k-1}\right]
=O(bH_n/p).
$$

1774: \end{proof}
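
The chain of inequalities for $\mu_k$ can be spot-checked numerically. The
following Python sketch is ours, and its parameter grid is arbitrary: it
evaluates the exact expression in the first line of the derivation, in which
the message takes the longest live link $b^m \leq k$ and otherwise falls back
to the always-present nearest-neighbor link, and compares it with the final
lower bound $p(k-1)/(2(b-q))$. The function names are hypothetical.

\begin{verbatim}
def expected_progress(k, b, p):
    """Exact mu_k = p*b^K + p*q*b^(K-1) + ... + p*q^(K-1)*b + q^K,
    where K = floor(log_b k) and q = 1 - p."""
    K = 0
    while b ** (K + 1) <= k:   # integer computation of floor(log_b k)
        K += 1
    q = 1.0 - p
    # term i: links b^K, ..., b^(K-i+1) failed and link b^(K-i) is present
    mu = sum(p * q ** i * b ** (K - i) for i in range(K))
    return mu + q ** K         # all longer links failed: step of length 1

def progress_bound(k, b, p):
    """Final lower bound p(k-1) / (2(b-q)) from the derivation."""
    q = 1.0 - p
    return p * (k - 1) / (2 * (b - q))

violations = [(k, b, p) for b in (2, 3, 5, 10) for p in (0.1, 0.5, 0.9)
              for k in range(1, 5000)
              if expected_progress(k, b, p) < progress_bound(k, b, p)]
print(violations)  # expected: []
\end{verbatim}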

1775:

1776:

1777: \subsubsection{Failure of Nodes}

1778: \label{sec:NODE-FAIL}

1779:

We study the effect of node failures on system performance in two
different cases. In the first case, as described in

1782: Section~\ref{sec:BIN-NODE-FAIL}, some of the nodes may fail

1783: and then the remaining nodes will link to each other as

1784: per the link distribution. In the second case, as explained

1785: in Section~\ref{sec:GEN-NODE-FAIL}, the nodes first link to

1786: their neighbors and then some of the nodes may fail.

1787:

1788: \paragraph{Binomially Distributed Nodes}

1789: \label{sec:BIN-NODE-FAIL}

1790:

Let $p$ be the
probability that a node is present at any given grid point. Here also, each node is

1793: connected to its nearest neighbors and one long-distance neighbor. In

1794: addition, the probability of choosing a particular node as a long-distance

1795: neighbor is conditioned on the existence of that node.

1796:

1797: \begin{theorem}

1798: \label{thm:UPPER-BINOMIAL}

1799: Let the model be as in Theorem~\ref{thm:UPPER-SINGLE}.

1800: Let each node be present with probability $p$ and all nodes

1801: link only to existing nodes. Then the worst-case expected delivery time

1802: is $O(\log^2 n)$.

1803: \end{theorem}

1804:

1805: \begin{proof}

1806: We bound the expected drop $\mu_k$ as follows:

1807:

1808: \begin{eqnarray*}

1809: \mu_k &=& \frac{\sum_{i=1}^k \frac{1}{i} \cdot i \cdot p}{p \cdot S}

1810:        +  \frac{\sum_{i=1}^{k-1} \frac{1}{2k-i} \cdot i \cdot p}{p \cdot S}

1811:        +  \frac{\sum_{i=1}^{n_1-k} \frac{1}{i} \cdot 1 \cdot p}{p \cdot S}

1812:        +  \frac{\sum_{i=2k}^{n_2+k} \frac{1}{i} \cdot 1 \cdot p}{p \cdot S}\\

1813:       &>& \frac{1}{S} [ k + 0 + H_{n_1-k} + H_{n_2+k} - H_{2k}]\\

1814:       &>& \frac{k}{S} > \frac{k}{2H_n}.\\

1815: \end{eqnarray*}

1816:

1817: Using Lemma~\ref{lemma-probabilistic-recurrence-ub},

1818: we get $T(n)\leq \sum_{k=1}^n 1/\mu_k

1819: =O(H_n^2)$. This is exactly the same result that we get in

1820: Section~\ref{sec:INVERSE} where all the nodes are present.

1821: \end{proof}

1822:

This result is not
surprising: if nodes link only to other existing nodes, the
only difference is that we get a smaller random graph, which
changes neither the routing algorithm nor the asymptotic delivery time.

1827:

1828:

1829: \paragraph{General Failures}

1830: \label{sec:GEN-NODE-FAIL}

1831:

1832: We observe that the analysis for node failures is not as simple as that

1833: for link failures because we no longer

1834: have the important property of independence that we have

1835: in the latter case. In the case of link failures,

1836: the nodes first choose their neighbors and then it is possible that

1837: some of these links fail; thus, the event that a node is connected

1838: to another node is completely independent of the event that, say, its

1839: neighbor is connected to the same node. Each link fails independently, and

1840: so the accessibility of a target node from any other node depends only

1841: on the presence of the link between the two nodes in question.

1842:

1843: In case of node failures, this important independence

1844: property is no longer true. Suppose that a

1845: node $u$ cannot communicate with some other node $v$ (because $v$

1846: failed), even though there may be a functional link between $u$

and $v$. Now the event that some other node $w$ is able
to communicate with $v$ is not independent of the event
that $u$ can communicate with $v$, because both events depend on
whether $v$ is present. This complicates

1851: the analysis of the performance because it is no longer the case

1852: that if one node cannot communicate with some other node, it has a

1853: good chance of doing so by passing the message to its neighbor.

1854:

1855: In order to analyze this situation, we consider jumps only to one

1856: phase lower rather than jumping over several phases.  The idea is

1857: that the jumps between phases are independent, so once we move from

1858: phase $j$ to phase $j-1$, further routing no longer depends on

1859: any nodes in phase $j$. We can condition on the number of nodes being

1860: alive in the lower phase and estimate the time spent in each phase.

1861: Intuitively, if a node is present with probability $p$, we would expect

1862: to wait for a time inversely proportional to $p$ in anticipation of

1863: finding a node in the lower phase to jump to.

1864:

1865: \begin{theorem}

1866: Let the model be as in Theorem~\ref{thm:UPPER-RANDOMIZED-MULTIPLE}

1867: and let each node fail with probability $p$.

Then the expected delivery time is $O(\log^2n/((1-p)\ell))$.

1869: \end{theorem}

1870:

1871: \begin{proof}

1872: Let $T$ be the time taken to drop down from layer

1873: $j$ to layer $j-1$. Let $l$ out of $N$ nodes be alive

1874: in layer $j-1$ and let $q$ be the probability that

1875: a node in layer $j$ is connected to some node in

1876: layer $j-1$. Then the expected time to drop to

1877: layer $j-1$, given that there are $l$ live nodes

1878: in it, is given by

1879: \begin{eqnarray*}

1880: E[T|l] &=& 1 + \left[ (1-q) + \frac{q(N-l)}{N} \right] E[T|l]\\

1881: &=& \frac{N}{ql}.

1882: \end{eqnarray*}

1883:

1884: Now $l$ can vary between $1$ and $N$. (Note that $l$

1885: cannot be $0$ because if there are no live nodes in the

1886: lower layer, the routing fails at this point.)

1887: We get

1888: \begin{eqnarray*}

1889: E[T] &=&

1890: \sum_{l=1}^{N}\frac{N}{ql}\left[ p^{N-l}(1-p)^l{{N}\choose{l}} \right]\\

1891: &=&\frac{N}{q}\sum_{l=1}^{N}\frac{1}{l}p^{N-l}(1-p)^{l}{N\choose l}\\

1892: &\leq&\frac{N}{q}\sum_{l=1}^{N}\frac{2}{l+1}p^{N-l}(1-p)^{l}{N\choose l}\\

1893: &=&\frac{2N}{q(N+1)(1-p)}\sum_{l=1}^{N}p^{N-l}(1-p)^{l+1}{N+1\choose l+1}\\

1894: &\leq&\frac{2N}{q(N+1)(1-p)}\left[p+(1-p)\right]^{N+1}\\

1895: &=&\frac{2N}{q(N+1)(1-p)}.

1896: \end{eqnarray*}

1897:

1898: Not surprisingly, the expected waiting time in a layer

1899: is inversely proportional to the probability of being

1900: connected to a node in the lower layer and to the probability

1901: of such a node being alive.

1902:

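For our randomized routing strategy with $\ell \in [1, \lg n]$ links,
the same argument as in the proof of
Theorem~\ref{thm:UPPER-RANDOMIZED-MULTIPLE} gives $q = \Omega(\ell/H_n)$,
so the expected time spent in each layer is $O(H_n/((1-p)\ell))$.
Since there are at most $(\lg n +1)$ layers,
we get an expected delivery time of $O(\log^2 n/((1-p)\ell))$.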

1906: \end{proof}
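
The algebra bounding $E[T]$ in this proof can be spot-checked numerically.
The Python sketch below is ours: it evaluates the sum in the first line of the
derivation exactly as written, with $p$ the failure probability, and compares
it with the closed form $2N/(q(N+1)(1-p))$. The values of $N$, $p$, and $q$
are arbitrary test choices, and the function names are hypothetical.

\begin{verbatim}
from math import comb

def expected_wait(N, p, q):
    """sum_{l=1}^{N} N/(q l) * C(N, l) * p^(N-l) * (1-p)^l."""
    return sum(N / (q * l) * comb(N, l) * p ** (N - l) * (1 - p) ** l
               for l in range(1, N + 1))

def wait_bound(N, p, q):
    """Closed-form upper bound 2N / (q (N+1) (1-p))."""
    return 2 * N / (q * (N + 1) * (1 - p))

for N in (1, 5, 20, 60):
    for p in (0.1, 0.5, 0.9):
        print(N, p, expected_wait(N, p, 0.25), wait_bound(N, p, 0.25))
\end{verbatim}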

1907:

1908: In contrast, for our deterministic routing strategy, certain

1909: carefully chosen node failures can lead to dismal situations where a

1910: message can get stuck in a local neighborhood with no hope of getting

1911: out of it or eventually reaching the destination node. We conjecture

1912: that this should be a very low probability event, so its occurrence

1913: will not affect the delivery time considerably. We have not yet analyzed

1914: this situation formally.

1915:

1916: \section{Construction of Graphs}

1917: \label{sec:RANDOMGRAPHS}

1918:

1919: As the group of nodes present in the network changes, so does the

1920: graph of the virtual overlay network. In order for our routing

1921: techniques to be effective, the graph must always exhibit the

1922: property that the likelihood of any two vertices $v,u$ being connected

1923: is $\Omega(1/d(v,u))$. We describe a heuristic approach to

1924: construct and maintain a random graph with such an invariant.

1925:

1926: Since the choice of links leaving each vertex is independent of

1927: the choices of other vertices, we can assume that points

1928: in the metric space are added one at a time. Let $v$ be the $k$-th

1929: point to be added. Point $v$ chooses the sinks of its outgoing links

1930: according to the inverse power law distribution with exponent $1$

1931: and connects to them by

1932: running the search algorithm. If a desired sink $u$ is not present, $v$

1933: connects to $u$'s closest live neighbor. In effect, each of the

1934: $k-1$ points already present before $v$ is surrounded by a basin of

1935: attraction, collecting probability mass in proportion to its length.

Since we assume the hash function populates the metric space evenly,
by symmetry the basin length $L$ has the same
distribution for all points. Moreover, $L$ is unlikely to be much smaller
than its expectation: for $0 < c < 1$, $\prob{L \leq c
\cdot k^{-1}}=1-(1-c\cdot k^{-1})^{k-1} \leq c$. A lower bound on the

1941: probability that the link $(v,u)$ is present is $c' \cdot k^{-1}

1942: \cdot d(v,f)^{-1}$, where $f$ is the point in $u$'s basin that is the

1943: farthest from $v$.\footnote{The constant $c'$ has absorbed $c$ and the

1944: normalizing constant for the distribution.} However, the bound holds

1945: only if $u$ is among the $k-1$ points added before $v$. Otherwise,

1946: the aforementioned probability is $0$, which means that we need to amend

1947: our linking strategy to transfer probability mass from the case of

1948: $u$ having arrived before $v$ to the case of $u$ having arrived after $v$.

1949: We describe next how to accomplish this task.

1950:

1951: Let $v$ be a new point.  We give earlier points the opportunity

1952: to obtain outgoing links to $v$ by having $v$ (1) calculate the

1953: number of incoming links it ``should'' have from points added before it

1954: arrived, and (2) choose such points according to the inverse

1955: power-law distribution with exponent 1.\footnote{All this can be easily

1956: calculated by $v$ since the link probabilities are symmetric.} If $\ell$

1957: is the number of outgoing links for each point, then $\ell$ will also be

1958: the expected number of incoming links that $v$ has to estimate in step

1959: (1).

We approximate the number of links
ending at $v$ by a Poisson distribution with
mean $\ell$; that is, the probability that $v$ has $m$ incoming links is
$\frac{e^{-\ell}\ell^m}{m!}$.
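
Step (1) only needs a Poisson sample with mean $\ell$. As a hedged
illustration, not necessarily how an implementation would do it, the Python
sketch below uses Knuth's classical multiplication method, which is adequate
for moderate means such as $\ell = \lg n$; the function name is hypothetical.

\begin{verbatim}
import math
import random

def sample_incoming_count(ell, rng):
    """Poisson(ell) sample via Knuth's method: multiply uniforms together
    until the product drops to exp(-ell) or below; return that count minus one."""
    threshold = math.exp(-ell)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

rng = random.Random(0)
samples = [sample_incoming_count(14, rng) for _ in range(20000)]
print(sum(samples) / len(samples))  # close to 14
\end{verbatim}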

1964:

1965: After step (2) is completed by $v$, each chosen point $u$ responds to

1966: $v$'s request by choosing one of its existing links to be replaced by

1967: a link to $v$. The choice of the link to replace can vary. We use a

1968: strategy that

1969: builds on the work of Sarshar~\etal~\cite{SarsharR02}. In that work, the

1970: authors use ideas of Zhang~\etal~\cite{ZhangGG02} to build a graph where each

1971: node has a single long-distance link to a node at distance $d$ with probability

1972: $1/d$. When a node with a long-distance link at distance $d_1$ encounters a

1973: new node at distance $d_2$, either due to its arrival or due to a data request,

1974: it replaces its existing link with probability $p_2/(p_1+p_2)$, where

1975: $p_i=1/d_i$, and links to the new node. We extend this idea to our case of

1976: multiple long-distance links. Consider a node $u$ with $k$ neighbors at distances

1977: $d_1, d_2, \ldots, d_k$. When a new node $v$ at distance $d_{k+1}$

1978: requests an incoming link from $u$, $u$ replaces one of its existing links

1979: with a link to $v$ with probability $p_{k+1}/\sum_{j=1}^{k+1}p_j$. This is

1980: a trivial extension of the formula $p_2/(p_1+p_2)$ of \cite{SarsharR02}.

1981: However, this probability must now be distributed among $u$'s $k$ existing long-distance

1982: links since $u$ needs to choose one of them to redirect to $v$. We choose to

1983: do that according to the inverse power-law distribution with exponent 1, that is,

1984: $u$ chooses to replace its link to the node at distance $d_i$, $1\leq i \leq k$,

1985: with probability $(p_i/\sum_{j=1}^{k} p_j)$. Hence, the probability that $u$

1986: decides to link to $v$ and decides to replace its existing link to the node at

1987: distance $d_i$ with a link to $v$ is equal to $(p_i/\sum_{j=1}^{k} p_j) \cdot

1988: (p_{k+1}/\sum_{j=1}^{k+1}p_j)$. Notice that $u$ may decide not to redirect

1989: any of its existing links to $v$ with probability $1-p_{k+1}/\sum_{j=1}^{k+1}p_j$.

The intuition for using such a replacement strategy comes from the invariant that we

1991: want to maintain dynamically as new nodes arrive: $u$ has a link to a node

1992: $i$ at distance $d_i$ with probability inversely proportional to $d_i$; hence,

1993: conditioning on $u$ having $k$ long-distance links, the following equation must hold.

1994: \begin{eqnarray*}

1995: \prob{\mbox{$u$ replaces link to $i$ with link to $v$}} & = &

1996: \prob{\mbox{$u$ has a link to $i$ before $v$ arrives}} \\

1997: & - & \prob{\mbox{$u$ has a link to $i$ after $v$ arrives}} \\

1998: & = & \frac{p_i}{\sum_{j=1}^k p_j} - \frac{p_{i}}{\sum_{j=1}^{k+1} p_j} \\

1999: & = & \frac{p_i}{\sum_{j=1}^k p_j} \cdot \frac{p_{k+1}}{\sum_{j=1}^{k+1}p_j}. \\

2000: \end{eqnarray*}

2001: The same heuristic can be used for regeneration of links when a node crashes.
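
The replacement rule can be sketched as follows. This Python fragment is our
own illustration under the assumptions stated in the text: $p_i = 1/d_i$, and
a node $u$ with $k$ long-distance links at distances $d_1,\ldots,d_k$. The
function names are hypothetical. The sketch computes the probability of
redirecting each existing link to the newcomer $v$, checks the identity
derived above, and performs one randomized replacement.

\begin{verbatim}
import random

def replacement_probs(dists, d_new):
    """Probability that u redirects its link at distance dists[i] to the new
    node at distance d_new, plus the probability that u keeps all its links."""
    p = [1.0 / d for d in dists]
    s_k = sum(p)
    s_k1 = s_k + 1.0 / d_new
    accept = (1.0 / d_new) / s_k1              # p_{k+1} / sum_{j<=k+1} p_j
    replace = [(pi / s_k) * accept for pi in p]
    return replace, 1.0 - accept

def maybe_replace(dists, d_new, rng):
    """Apply the rule once and return the (possibly updated) list of distances."""
    replace, _ = replacement_probs(dists, d_new)
    r, acc = rng.random(), 0.0
    for i, prob in enumerate(replace):
        acc += prob
        if r < acc:
            return dists[:i] + [d_new] + dists[i + 1:]
    return list(dists)                         # u keeps all of its links

dists = [1, 3, 8, 40]
replace, keep = replacement_probs(dists, 5)
s_k = sum(1.0 / d for d in dists)
s_k1 = s_k + 1.0 / 5
# Identity from the text: p_i/S_k - p_i/S_{k+1} equals the replacement probability.
print(all(abs(r - ((1.0 / d) / s_k - (1.0 / d) / s_k1)) < 1e-12
          for r, d in zip(replace, dists)))  # True
\end{verbatim}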

2002:

2003: To analyze the performance of the heuristic in practice, we used it to construct a

2004: network of $2^{14}$ nodes with $14$ links each, ten separate times. After

2005: averaging the results over the ten networks, we plotted the distribution of

2006: long-distance links derived from the heuristic, along with the ideal inverse

2007: power-law distribution with exponent 1, as shown in

2008: Figure~\ref{fig:DISTRIBUTION}. We see that the derived distribution tracks the

2009: ideal one very closely, with the largest absolute error being roughly equal to

2010: $0.022$ for links of length $2$, as shown in the graph of

2011: Figure~\ref{fig:ERROR}.


We also experimented with an alternative replacement strategy, in which
a node replaces its {\em oldest} link with a link to the new node.
This strategy performs almost as well as the replacement strategy
described above. We omit its results because, at the scale used for
our graphs, the two strategies are difficult to distinguish.


Other related work~\cite{PRU01} shows how to construct,
with the support of a central server, random graphs with many desirable
properties, such as small diameter and connectivity with
high probability. Although it is not clear what fault-tolerance
properties this approach offers if the central server crashes, or how
the constructed graph can be used for efficient routing, similar
techniques are likely to be useful in our setting.


\begin{figure}
\centering
  \mbox{\subfigure[The derived distribution.
    \label{fig:DISTRIBUTION}]
       {\epsfig{figure=dist.ps, width=0.45\textwidth}}\quad
       \subfigure[Absolute error.
        \label{fig:ERROR}]
       {\epsfig{figure=error.ps, width=0.45\textwidth}}}
\caption{(a) The distribution of long-distance links produced by the
inverse-distance heuristic (DERIVED) compared to the ideal inverse power-law
distribution with exponent $1$ (IDEAL). (b) The absolute error between the
derived distribution and the ideal inverse power-law distribution with
exponent $1$.}
\end{figure}


\section{Experimental Results}
\label{sec:EXPERIMENTS}

We simulated a network of $n=2^{17}$ nodes at the application level. Each
node is connected to its immediate neighbors and has $\lg n=17$ long-distance
links chosen according to the inverse power-law distribution with
exponent~$1$, as explained in Section~\ref{sec:UPPERBNDS}. Routing is greedy:
each node forwards a message to its neighbor closest to the message's target.
In each simulation, the network is built afresh and a fraction $p$ of the
nodes fail. We then repeatedly choose a random source and a random destination
among the nodes that have not failed and route a message between them. For
each value of $p$, we ran $1000$ simulations, delivering $100$ messages
in each, and averaged both the number of hops taken by successful searches
and the number of failed searches.
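The experimental setup above can be sketched as a scaled-down Python simulation. It is an illustration under our own assumptions, not the simulator used for the reported results: nodes occupy a ring of $n$ positions with $n$ a power of two, each long-distance link has a length $d \in \{1,\dots,n/2\}$ chosen with probability proportional to $1/d$ and a random direction, and a search is abandoned as soon as the best neighbor (the one closest to the target) has failed, which corresponds to the first strategy listed below. The names \texttt{build\_network}, \texttt{greedy\_route}, and \texttt{run} are hypothetical.

```python
import random
from itertools import accumulate


def ring_dist(a, b, n):
    """Distance between positions a and b on a ring of n positions."""
    d = abs(a - b)
    return min(d, n - d)


def build_network(n, num_links, rng):
    """Each node links to its two ring neighbours plus num_links long-distance
    links whose length d in 1..n//2 has probability proportional to 1/d."""
    lengths = range(1, n // 2 + 1)
    cum = list(accumulate(1.0 / d for d in lengths))
    links = []
    for u in range(n):
        nbrs = {(u - 1) % n, (u + 1) % n}  # immediate neighbours
        for d in rng.choices(lengths, cum_weights=cum, k=num_links):
            nbrs.add((u + rng.choice((-1, 1)) * d) % n)  # random direction
        links.append(nbrs)
    return links


def greedy_route(src, dst, links, failed, n):
    """Forward to the neighbour closest to dst; if that neighbour has failed,
    terminate.  Returns the number of hops, or None if the search fails."""
    cur, hops = src, 0
    while cur != dst:
        best = min(links[cur], key=lambda v: ring_dist(v, dst, n))
        if best in failed:
            return None
        cur, hops = best, hops + 1
    return hops


def run(n, p, sims, msgs, seed=0):
    """Average hops of successful searches and fraction of failed searches."""
    rng = random.Random(seed)
    hop_total = successes = failures = 0
    for _ in range(sims):
        links = build_network(n, n.bit_length() - 1, rng)  # lg n links, n = 2^k
        failed = set(rng.sample(range(n), int(p * n)))
        alive = [v for v in range(n) if v not in failed]
        for _ in range(msgs):
            src, dst = rng.sample(alive, 2)
            hops = greedy_route(src, dst, links, failed, n)
            if hops is None:
                failures += 1
            else:
                successes += 1
                hop_total += hops
    avg_hops = hop_total / successes if successes else float("nan")
    return avg_hops, failures / (sims * msgs)


print(run(n=1024, p=0.0, sims=5, msgs=100))
print(run(n=1024, p=0.5, sims=5, msgs=100))
```

With $p=0$ every search succeeds, because the immediate ring neighbor in the direction of the target is always strictly closer to it. The numbers printed for $p>0$ depend on the assumptions above and are not the paper's measurements.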


When nodes fail, a node may be unable to find a live
neighbor that is closer to the target than itself. We studied
three strategies for handling this situation:

\begin{enumerate}
 \item Terminate the search.
 \item Choose a random node, deliver the message to
       this intermediate node, and then try to deliver the message from it
       to the original destination (similar to
       the hypercube routing strategy described in~\cite{LV82}).
 \item Keep track of a fixed number (five, in our simulations)
       of the nodes through which the message was most recently routed,
       and backtrack: when the search reaches a node from which it cannot
       proceed, it returns to the most recently visited node on this list,
       which forwards the message to its next-best neighbor.
\end{enumerate}


In all three strategies, once a node has chosen its best neighbor,
it does not try any of its other links if it discovers that this
neighbor has failed.
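The three strategies can be sketched in Python on the same kind of ring network. The following details are our assumptions, not taken from the paper: a node's best neighbor is its neighbor closest to the target among those strictly closer than itself, and the node is stuck when that neighbor has failed; random rerouting tolerates at most \texttt{max\_detours} detours and gives up if a detour itself gets stuck; backtracking remembers five nodes, charges one hop per step back, forgets a remembered node once its next-best neighbor has also failed, and stops after a safety cap of $10n$ hops. All function and parameter names are hypothetical.

```python
import random
from collections import deque
from itertools import accumulate


def ring_dist(a, b, n):
    d = abs(a - b)
    return min(d, n - d)


def build_network(n, num_links, rng):
    """Ring neighbours plus num_links links with Pr[length d] proportional to 1/d."""
    lengths = range(1, n // 2 + 1)
    cum = list(accumulate(1.0 / d for d in lengths))
    links = []
    for u in range(n):
        nbrs = {(u - 1) % n, (u + 1) % n}
        for d in rng.choices(lengths, cum_weights=cum, k=num_links):
            nbrs.add((u + rng.choice((-1, 1)) * d) % n)
        links.append(nbrs)
    return links


def ranked(cur, dst, links, n):
    """Neighbours of cur strictly closer to dst, best (closest) first."""
    here = ring_dist(cur, dst, n)
    closer = [v for v in links[cur] if ring_dist(v, dst, n) < here]
    return sorted(closer, key=lambda v: ring_dist(v, dst, n))


def greedy_leg(src, dst, links, failed, n):
    """Strategy 1: forward to the best neighbour; stop where it has failed."""
    cur, hops = src, 0
    while cur != dst:
        best = ranked(cur, dst, links, n)[0]
        if best in failed:
            break
        cur, hops = best, hops + 1
    return cur, hops  # cur == dst means success


def route_random(src, dst, links, failed, n, rng, alive, max_detours=3):
    """Strategy 2: when stuck, detour through a random live node."""
    cur, hops = greedy_leg(src, dst, links, failed, n)
    for _ in range(max_detours):
        if cur == dst:
            break
        mid = rng.choice(alive)
        cur, h = greedy_leg(cur, mid, links, failed, n)
        hops += h
        if cur != mid:
            return None  # the detour itself got stuck
        cur, h = greedy_leg(mid, dst, links, failed, n)
        hops += h
    return hops if cur == dst else None


def route_backtrack(src, dst, links, failed, n, history_len=5):
    """Strategy 3: remember the last history_len nodes and back up when stuck."""
    history = deque(maxlen=history_len)  # entries: [node, ranked nbrs, next index]
    cur, hops = src, 0
    while cur != dst:
        if hops > 10 * n:  # hypothetical safety cap on wandering
            return None
        cands = ranked(cur, dst, links, n)
        if cands[0] not in failed:
            history.append([cur, cands, 1])
            cur, hops = cands[0], hops + 1
            continue
        while history:  # stuck: step back along the remembered path
            entry = history[-1]
            hops += 1  # the message returns to entry[0]
            if entry[2] < len(entry[1]) and entry[1][entry[2]] not in failed:
                cur = entry[1][entry[2]]  # next-best neighbour of entry[0]
                entry[2] += 1
                hops += 1
                break
            history.pop()  # its next-best neighbour is dead or absent
        else:
            return None  # nothing left to backtrack to
    return hops


def failure_rate(strategy, n=1024, p=0.5, sims=3, msgs=100, seed=0):
    """Fraction of failed searches for one strategy."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(sims):
        links = build_network(n, n.bit_length() - 1, rng)
        failed = set(rng.sample(range(n), int(p * n)))
        alive = [v for v in range(n) if v not in failed]
        for _ in range(msgs):
            src, dst = rng.sample(alive, 2)
            if strategy == "terminate":
                ok = greedy_leg(src, dst, links, failed, n)[0] == dst
            elif strategy == "random":
                ok = route_random(src, dst, links, failed, n, rng, alive) is not None
            else:  # "backtrack"
                ok = route_backtrack(src, dst, links, failed, n) is not None
            if not ok:
                failures += 1
    return failures / (sims * msgs)


for s in ("terminate", "random", "backtrack"):
    print(s, failure_rate(s))
```

Since the terminate and backtrack strategies use no randomness while routing, running both with the same seed routes the same messages over the same networks. Backtracking follows the greedy path until it first gets stuck, so in that comparison it can never fail on a search that termination completes.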


\begin{figure}
\centering
  \mbox{\subfigure[Fraction of failed searches.]
       {\epsfig{figure=f.ps, width=0.45\textwidth}}\quad
       \subfigure[Average delivery time for successful searches.]
       {\epsfig{figure=o.ps, width=0.45\textwidth}}}
\caption{(a) The fraction of messages that fail to be delivered
as a function of the fraction of failed nodes. (b) The average delivery
time (number of hops) of successful searches as a function of the fraction
of failed nodes.}
\label{fig:RESULTS}
\end{figure}


Figure~\ref{fig:RESULTS} plots the fraction of messages that
fail to be delivered, and the number of hops taken by successful
searches, against the fraction of failed nodes. The
system behaves well even when a large fraction of the nodes have failed.
Backtracking substantially reduces the number of failed searches
compared with the other two strategies, although delivery may take
longer. With random rerouting, the average delivery time grows only
slightly as the probability of node failure increases.
This happens because many searches fail, and the ones
that succeed tend to need only a few hops, which keeps the average
delivery time small.


Our results may not be directly comparable to those of CAN~\cite{SR01}
and Chord~\cite{CH01}, since those systems were evaluated with different
simulators. To the extent that the results are comparable, however,
our methods appear to perform as well as theirs.
Even if we simply terminate the search, fewer than a fraction $p$ of the
searches fail when a fraction $p$ of the nodes have failed. Chord~\cite{CH01}
achieves roughly the same performance {\em after} its network stabilizes
under its repair mechanism. Furthermore, with backtracking, fewer than
$30\%$ of the searches fail even when $80\%$ of the nodes have failed.
These results are very promising, and it would be interesting to
analyze backtracking formally.


We also compared the ideal network with the
network constructed using the heuristic of Section~\ref{sec:RANDOMGRAPHS}.
We constructed a network of $16384 = 2^{14}$ nodes ten times, both
ideally and with the heuristic, and delivered $1000$ messages
between randomly chosen nodes.
Figure~\ref{fig:COMPARE} shows the fraction of failed searches as the
probability of node failure increases. Although the network constructed
with the heuristic does not perform quite as well as the ideal network,
its fraction of failed searches is comparable.


\begin{figure}[ht]
\centering
  \mbox{\epsfig{figure=failed.ps, width=0.5\textwidth}}
  \caption{Fraction of failed searches in the ideal network and in the
  network constructed by the heuristic, as a function of the probability
  of node failure.}
\label{fig:COMPARE}
\end{figure}


\section{Conclusions and Future Work}
\label{sec:CONCLUSIONS}

\begin{table}[ht]
\begin{center}
\begin{tabular}{|c|c|c|c|}

\hline
Model&
Number of Links $\ell$
&Upper Bound
&Lower Bound\\

\hline
\multirow{3}*{No failures}
&1\bigstrut
&$O(\log^2 n)$\bigstrut
&$\Omega(\frac{\log^2 n}{\log \log n})$\bigstrut\\

&$[1, \lg n]$\bigstrut
&$O(\frac{\log^2 n}{\ell})$\bigstrut
&$\Omega(\frac{\log^2 n}{\ell \log \log n})$\bigstrut\\

&$[\lg n, n^c]$\bigstrut
&$O(\frac{\log n}{\log b})$\bigstrut
&$\Omega(\frac{\log n}{\log \ell})$\bigstrut\\

\hline
\hline

\multirow{2}*{Pr[Link present]=$p$}
&$[1, \lg n]$\bigstrut
&$O(\frac{\log^2 n}{p\ell})$\bigstrut
&-\bigstrut\\

&$[\lg n, n^c]$\bigstrut
&$O(\frac{b\log n}{p})$\bigstrut
&-\bigstrut\\

\hline
\hline

\multirow{2}*{Pr[Node present]=$p$}
&\multirow{2}*{$[1, \lg n]$}
&\multirow{2}*{$O(\frac{\log^2 n}{p\ell})$}
&\multirow{2}*{-}\\

&&&\\

\hline
\end{tabular}
\end{center}
\caption{Summary of upper and lower bounds for routing.\protect\footnotemark}
\label{table-results}
\end{table}
\footnotetext{In the upper bound with $\ell \in (\lg n, n^c]$ links, the
number of links is $\ell=O(b\log_b n)$. The deterministic strategy used for
$\ell \in (\lg n, n^c]$ with link failures differs slightly from the one
used with no failures, and in that case $\ell=O(\log_b n)$.
In the lower bound column, the bound for $\ell \in [1,\lg n]$ is for
one-sided routing.}


Table~\ref{table-results} summarizes our upper and lower bounds.
We have shown that greedy routing in an overlay network organized as a
random graph in a metric space can be a nearly optimal mechanism for
searching in a peer-to-peer system, even in the presence of
many faults.  We see this as an important first step in the design of
efficient algorithms for such networks, but many issues remain to
be addressed.  Our results mostly apply to one-dimensional metric
spaces such as the line or the circle.  One interesting question is
whether similar strategies would work in higher-dimensional spaces,
particularly ones in which some of the dimensions represent the actual
physical locations of the nodes; good
network-building and search mechanisms for this model might allow
efficient location of nearby instances of a resource without having to
resort to local flooding (as in~\cite{KKD01}).
Another promising direction is to study the security properties
of greedy routing schemes, to see how they can be adapted to provide
desirable properties such as anonymity or robustness against Byzantine
failures.



\section{Acknowledgments}

The authors are grateful to Ben Reichardt for pointing out an error in
an earlier version of Lemma~\ref{lemma-log-drop}.

\bibliographystyle{abbrv}
\bibliography{paper}

\end{document}