cs0302022/arxiv.tex
1: \documentclass[11pt]{article}
2: 
3: \usepackage{epsfig}
4: \usepackage{multirow}
5: \usepackage{amsmath}
6: \usepackage{amssymb}
7: \usepackage{comment}
8: \usepackage{fullpage}
9: \usepackage{bigstrut}
10: \usepackage{subfigure}
11: \usepackage{color}
12: 
13: \setcounter{secnumdepth}{5}
14: \setcounter{tocdepth}{5} 
15: 
16: \newcommand{\flrtwok}[1]{2^{\lfloor \lg k\rfloor #1}}
17: \newcommand{\flrbk}[1]{b^{\lfloor \log_b k\rfloor #1}}
18: 
19: \newcommand{\alert}[1]{\typeout{ALERT: #1}\textbf{[[[ #1 ]]]}}
20: \newcommand{\buzz}[1]{\emph{#1}}
21: 
22: \newtheorem{theorem}{Theorem}
23: \newtheorem{lemma}[theorem]{Lemma}
24: \newtheorem{corollary}[theorem]{Corollary}
25: \newtheorem{definition}[theorem]{Definition}
26: \newtheorem{conjecture}[theorem]{Conjecture}
27: 
28: \renewcommand{\multirowsetup}{\centering}
29: 
30: \newcommand{\prob}[1]{{\rm Prob}[#1]}
31: 
32: \newcommand{\newloglike}[2]{\newcommand{#1}{\mathop{\rm #2}\nolimits}}
33: \newloglike{\E}{E}
34: \newloglike{\sgn}{sgn}
35: 
36: \newcommand{\calF}{{\cal F}}
37: 
38: \newcommand{\etal}[1]{{\it et al.\/}}
39: 
40: \newenvironment{proof}{\noindent\par{\bf Proof: }}{\nopagebreak\rule{1 ex}{0.8 em}\medskip} 
41: 
42: \newcommand{\ceil}[1]{\left\lceil{#1}\right\rceil}
43: \newcommand{\floor}[1]{\left\lfloor{#1}\right\rfloor}
44: 
45: \interfootnotelinepenalty=10000
46: 
47: 
48: 
49: \begin{document}
50: 
51: \title{Fault-tolerant Routing in Peer-to-peer Systems\footnote{This
52: is an extended version of the paper appearing in the proceedings of the
53: \emph{Twenty-First ACM Symposium on Principles of Distributed Computing}, 
54: 2002}}
55: 
56: \author{James Aspnes\thanks{
57: Department of Computer Science, Yale University,
58: New Haven, CT 06520-8285, USA.
59: Email: {\tt aspnes@cs.yale.edu}.
60: Supported by NSF grants CCR-9820888 and CCR-0098078.}
61: \and Zo\"{e} Diamadi\thanks{
62: Department of Computer Science, Yale University,
63: New Haven, CT 06520-8285, USA.
64: Email: {\tt diamadi@cs.yale.edu}.
65: Supported in part by ONR grant N00014-01-1-0795.}
66: \and Gauri Shah\thanks{
67: Department of Computer Science, Yale University,
68: New Haven, CT 06520-8285, USA.
69: Email: {\tt gauri.shah@yale.edu}.
70: Supported by NSF grants CCR-9820888 and CCR-0098078.}
71: }
72: 
73: \maketitle
74: 
75: \begin{abstract}
76: We consider the problem of designing an overlay network and routing
77: mechanism that permits finding resources efficiently in a peer-to-peer
78: system. We argue that many existing approaches to this problem can be
79: modeled as the construction of a random graph embedded in a metric 
80: space whose points represent resource identifiers, where the 
81: probability of a connection between two nodes depends only on the 
82: distance between them in the metric space.  We study the performance of 
83: a peer-to-peer system where nodes are embedded at grid points in a simple 
84: metric space: a one-dimensional real line. We prove upper and lower bounds 
85: on the message complexity of locating particular resources in such a system, 
86: under a variety of assumptions about failures of either nodes or the 
87: connections between them. Our lower bounds in particular show that the 
88: use of inverse power-law distributions in routing, as suggested by
89: Kleinberg~\cite{KL99}, is close to optimal. We also give efficient 
90: heuristics to dynamically maintain such a system as new nodes arrive and 
91: old nodes depart. Finally, we give experimental results that suggest 
92: promising directions for future work.
93: \end{abstract}
94: 
95: 
96: 
97: 
98: \section{Introduction}
99: \label{sec:INTRODCUTION}
100: 
101: Peer-to-peer systems are distributed systems without any central
102: authority and with varying computational power at each machine.
103: We study the problem of locating resources in such a large network
104: of heterogeneous machines that are subject to crash failures. We
105: describe how to construct distributed data structures that have
106: certain desirable properties and allow efficient resource location.
107: 
108: Decentralization is a critical feature of such a system
109: as any central server not only provides a vulnerable point of
110: failure but also wastes the power of the clients. Equally important
111: is scalability: the cost borne by each node must not depend too much
112: on the network size and should ideally be proportional, within
113: polylogarithmic factors, to the amount of data the node seeks or
114: provides. Since we expect nodes to arrive and depart at a high rate,
115: the system should be resilient to both link and node failures.
116: Furthermore, disruptions to parts of the data structure should
117: self-heal to provide self-stabilization.
118: 
119: Our approach provides a hash table-like functionality, based on 
120: keys that uniquely identify the system resources. To accomplish this, 
121: we map resources to points in a metric space either directly from 
122: their keys or from the keys' hash values. This mapping dictates an 
123: assignment of nodes to metric-space points. We construct and maintain 
124: a random graph linking these points and use greedy routing to 
125: traverse its edges to find data items. The principle we
126: rely on is that failures leave behind yet another (smaller) random 
127: graph, ensuring that the system is robust even in the face of 
128: considerable damage. Another compelling advantage of random graphs 
129: is that they eliminate the need for global coordination. Thus, we 
130: get a fully-distributed, egalitarian, scalable system with no 
131: bottlenecks.
132: 
133: We measure performance in terms of the number of messages sent by 
134: the system for a search or an insert operation. The self-repair 
135: mechanism may generate additional traffic, but we expect to amortize 
136: these costs over the search and insert operations.  Given the growing 
137: storage capacity of machines, we are less concerned with minimizing 
138: the storage at each node; but in any case
139: the space requirements are small. The 
140: information stored at a node consists only of a network address for
141: each neighbor.
142: 
143: The rest of the paper is organized as follows. 
144: Section~\ref{sec:APPROACH} explains our abstract model in detail,
145: and Section~\ref{sec:RELATED} describes some existing 
146: peer-to-peer systems. We prove our results for routing in 
147: Section~\ref{sec:ROUTING}. In Section~\ref{sec:RANDOMGRAPHS}, 
148: we present a heuristic method for constructing the random graph and
149: provide experimental results that show its performance in practice.
150: Section~\ref{sec:EXPERIMENTS} describes results of experiments
151: we performed to test the routing performance of our constructed 
152: distributed data structure. Conclusions and future work 
153: are discussed in Section~\ref{sec:CONCLUSIONS}.
154: 
155: 
156: \section{Our approach}
157: \label{sec:APPROACH}
158: 
159: The idea underlying our approach consists of three basic parts:
160: (1) embed resources as points in a metric space, (2) construct a
161: random graph by appropriately linking these points, and
162: (3) efficiently locate resources by routing greedily along the
163: edges of the graph. Let $R$ be a set of resources spread over a large,
164: heterogeneous network $N$. For each resource $r \in R$,
165: $owner(r)$ denotes the node in $N$ that provides $r$ and
166: $key(r)$ denotes the resource's key. Let $K$ be the set of all
167: possible keys.
168: We assume a hash function $h: K \rightarrow V$ such that
169: resource $r$ maps to the point
170: $v=h(key(r))$ in a metric space $(V,d)$, where $V$ is the point set
171: and $d$ is the distance metric as shown in Figure~\ref{mapping}.
172: The hash function is assumed to populate the metric space evenly.
173: Note that via this resource embedding, a node
174: $n$ is mapped onto the set $V_n=\{v \in V: \exists r \in R,
175: \: v=h(key(r)) \wedge (owner(r)=n)\}$, namely the set of
176: metric-space points assigned to the resources the node provides.
177: \begin{figure}
178:    \centerline{\epsfig{figure=net.eps, width=400pt}}
179:    \caption{An example of the metric-space embedding.}
180:    \label{mapping}
181: \end{figure}
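
As a small illustration of this embedding (the specific resources, nodes, and hash values below are hypothetical and chosen only for concreteness), suppose $V = \{0, 1, \ldots, 15\}$ with $d(u,v) = |u-v|$, and let $r_1, r_2, r_3$ be three resources with
\begin{displaymath}
h(key(r_1)) = 3, \quad h(key(r_2)) = 11, \quad h(key(r_3)) = 12,
\qquad owner(r_1) = owner(r_3) = n, \quad owner(r_2) = m.
\end{displaymath}
Then $V_n = \{3, 12\}$ and $V_m = \{11\}$, so a single network node may occupy several, possibly distant, points of the metric space.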
182: 
183: Our next step is to carefully construct a directed random graph
184: from the points embedded in $V$.
185: We assume that each newly-arrived node $n$ is initially connected
186: to some other node in $N$.
187: Each node $n$ generates the outgoing links for each vertex $v
188: \in V_n$ independently.
189: A link $(v,u) \in V_n \times V_m$ simply denotes that $n$ knows that
190: $m$ is the network node that provides the resource mapped to
191: $u$; hence, we can view the graph as a virtual overlay network  
192: of information, pieces of which are stored locally at each node.
193: Node $n$ constructs each link by executing the search algorithm to locate
194: the resource that is mapped to the sink of that link. If the metric
195: space is not populated densely enough, the choice of a sink may
196: result in a vertex corresponding to an absent resource. In that
197: case, $n$ chooses the vertex that is present and closest to the original sink.
198: Moving to nearby vertices will introduce some bias in the link 
199: distribution, but the magnitude of error does not appear to be large.
200: A more detailed description of the graph construction
201: is given in Section~\ref{sec:RANDOMGRAPHS}.
202: 
203: Having constructed the overlay network of information, we 
204: can now use it for resource location. As new nodes arrive, 
205: old nodes depart, and existing ones alter the set of 
206: resources they provide or even crash, the resources available 
207: in the distributed database change. At any time $t$, let
208: $R^t \subseteq R$ be the set of available resources and $I^t$ be
209: the corresponding overlay network.  A request by node
210: $n$ to locate resource $r$ at time $t$ is served in a simple,
211: localized manner: $n$ calculates the metric-space point $v$ that   
212: corresponds to $r$, and a request message is then routed over  
213: $I^t$ from the vertex in $V_n$ that is closest to $v$ to $v$   
214: itself.\footnote{Note that since $R^t$ generally changes with 
215: time, and may specifically change while the request is being 
216: served, the request message may be routed over a series of 
217: different overlay networks $I^{t_1},\:I^{t_2},\: \ldots,\:
218: I^{t_k}$.}
219: Each node needs only local information, namely its set
220: of neighbors in $I^t$, to participate in the resource location.
221: Routing is done greedily by forwarding the message to the node mapped
222: to a metric-space point as close to $v$ as possible. The problem of
223: resource location is thus translated into routing on random graphs
224: embedded in a metric space.
225: 
226: To a first approximation, our approach is similar to the 
227: ``small-world'' routing work by Kleinberg~\cite{KL99}, in which points 
228: in a two-dimensional grid are connected by links drawn from a normalized
229: power-law distribution (with exponent 2), and routing is done by having 
230: each node route a packet to its neighbor closest to the packet's 
231: destination.
232: Kleinberg's approach is somewhat brittle because it assumes a 
233: constant number of links leaving each node. Getting good 
234: performance using his technique
235: depends both on having a complete two-dimensional
236: grid of nodes and on very carefully adjusting the exponent of
237: the random link distribution. We are not as interested in keeping
238: the degree down and accept a larger degree to get more 
239: robustness. We also cannot assume a complete grid: since 
240: fault-tolerance is one of our main objectives, and since nodes
241: are mapped to points in the metric space based on what resources
242: they provide, there may be missing nodes.
243: 
244: The use of random graphs is partly motivated by a desire to keep 
245: the data structure scalable and the routing algorithm as 
246: decentralized as possible, as random graphs can be constructed 
247: locally without global coordination. Another important reason 
248: is that random graphs are by nature robust against node failures: 
249: a node-induced subgraph of a random graph is generally still a 
250: random graph; therefore, the disappearance of a vertex, along with 
251: all its incident links (due to failure of one of the machines 
252: implementing the data structure) will still allow routing
253: while the repair mechanism is trying to heal the damage. 
254: The repair mechanism also benefits from the use of random graphs, 
255: since most random structures require less work to maintain their 
256: much weaker invariants compared to more organized data structures.
257: 
258: Embedding the graph in a metric space has the very important
259: property that the only information needed to locate a resource
260: is the location of its corresponding metric-space point. That
261: location is permanent, both in the sense of being unaffected by
262: disruption of the data structure, and easily computable by any
263: node that seeks the resource. So, while the pattern of links
264: between nodes may be damaged or destroyed by failure of nodes or
265: of the underlying communication network, the metric space forms
266: an invulnerable foundation over which to build the ephemeral
267: parts of the data structure.
268: 
269: \section{Current peer-to-peer systems}
270: \label{sec:RELATED}
271: 
272: Most of the peer-to-peer systems in widespread use are
273: not scalable. Napster~\cite{NP} has a central server that services
274: requests for shared resources even though the actual
275: resource transfer takes place between the peer requesting
276: the resource and the peer providing it, without involving
277: the central authority. However, this design has several
278: disadvantages: the server is a vulnerable single point of
279: failure, the computational power of the clients is wasted,
280: and the system does not scale. Gnutella~\cite{GN} 
281: floods the network to locate a resource.  Flooding creates a
282: trade-off between overloading every node in the network
283: for each request and cutting off searches before completion.
284: While the use of super-peers \cite{MOR} ameliorates the problem
285: somewhat in practice, it does not improve performance in the limit.
286: 
287: Some of these first-generation systems
288: have inspired the development of more sophisticated ones
289: like CAN~\cite{SR01}, Chord~\cite{CH01} and Tapestry~\cite{TP01}.
290: CAN partitions a $d$-dimensional metric space into {\em zones}.
291: Each key is mapped to a point in some zone and stored at the node
292: that owns the zone.
293: Each node stores $O(d)$ information, and resource location,
294: done by greedy routing, takes $O(dn^{1/d})$ time. Chord maps nodes
295: to identities of $m$ bits placed around a {\em modulo
296: $2^m$ identifier circle}. Resources are stored at existing
297: {\em successor} nodes of the nodes they are mapped to. Each node
298: stores a routing table with $m$ entries such that the $i$-th entry 
299: stores the key of the first node succeeding it by at least
300: $2^{i-1}$ on the identifier circle. Each resource is also 
301: mapped onto the identifier circle and stored at the first
302: node succeeding the location that it maps to. Routing is 
303: done greedily to the farthest possible node in the 
304: routing table, and it is not hard to see that this gives an
305: $O(\log n)$ delivery time with $n$ nodes in the system.
306: Tapestry uses Plaxton's algorithm~\cite{PL97}, a form of
307: suffix-based, hypercube routing, as the routing mechanism:
308: in this algorithm, the message is forwarded deterministically 
309: to a node whose identifier is one digit closer to the 
310: target identifier. To this end, each node maintains $O(\log n)$ 
311: pieces of information and delivery time is also $O(\log n)$.
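
To make the Chord routing table concrete, consider a hypothetical instance (the identifiers are chosen only for illustration) with $m = 4$, so that identifiers live on a circle modulo $16$, and with nodes at $1, 4, 9, 11, 14$. The $i$-th entry of the table at node $1$ is the first node at or after $1 + 2^{i-1}$:
\begin{displaymath}
\begin{array}{c|cccc}
i & 1 & 2 & 3 & 4 \\ \hline
1 + 2^{i-1} & 2 & 3 & 5 & 9 \\
\mbox{entry} & 4 & 4 & 9 & 9
\end{array}
\end{displaymath}
Under this reading of the rule, a greedy search from node $1$ for a resource mapped to $13$ first jumps to node $9$, the farthest table entry that does not pass $13$, and continues from there. For a sense of scale, with $n = 2^{20}$ nodes and $d = 2$, the CAN bound $d n^{1/d}$ evaluates to $2 \cdot 2^{10} = 2048$, while $\lg n = 20$; the constants hidden in the $O(\cdot)$ bounds are ignored in this comparison.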
312: 
313: Although these systems seem vastly different, there is a recurrent
314: underlying theme in the use of some variant of an overlay
315: metric space in which the nodes are embedded. The location
316: of a resource in this metric space is determined by its key.
317: Each node maintains some information about its neighbors
318: in the metric space, and routing is then simply done by
319: forwarding packets to neighbors {\em closer} to the target
320: node with respect to the metric.
321: In CAN, the metric space is explicitly defined 
322: as the coordinate space which is covered by the zones and 
323: the distance metric used is simply the Euclidean distance.
324: In Chord, the nodes can be thought of as being 
325: embedded on grid points on a real circle, with distances measured 
326: along the circumference of the circle providing the required 
327: distance metric. In Tapestry, we can think of the nodes as being
328: embedded on a real line and the identifiers are simply the
329: locations of the nodes on the real line. Euclidean distance
330: is used as the metric distance for greedy forwarding
331: to nodes with identifiers closest to the target node.
332: This inherent common structure leads to similar results
333: for the performance of such networks. In this paper, we 
334: explain why most of these systems achieve similar
335: performance guarantees by 
336: describing a general setting for such overlay metric spaces,
337: although most of our results apply only in one-dimensional
338: spaces.
339: 
340: In general, the fault-tolerance properties of these systems are
341: not well-defined. Each system provides a repair mechanism for
342: failures but makes no performance guarantees until this mechanism
343: takes effect.  For large systems, where nodes appear and
344: leave frequently, resilience to repeated and concurrent failures
345: is a desirable and important property. Our experiments show that with
346: our overlay space and linking strategies, the system performs
347: reasonably well even with a large number of failures.
348: 
349: 
350: 
351: \section{Routing}
352: \label{sec:ROUTING}
353: 
354: In this section, we present our lower and upper bounds on routing.
355: We consider greedy routing in a graph embedded in a line where  
356: each node is connected to its immediate neighbors and to multiple
357: long-distance neighbors chosen according to a fixed link distribution. 
358: We give lower bounds for greedy routing for \buzz{any} link 
359: distribution satisfying certain properties 
360: (Theorem~\ref{theorem-lower-bound}). We also present upper bounds 
361: in the same model where the long-distance links are chosen as per 
362: the inverse power-law distribution with exponent $1$ and analyze
363: the effects on performance in the presence of failures. 
364: 
365: \subsection{Tools}
366: 
367: Some of our upper bounds will be proved using 
368: a well-known upper bound of Karp~\etal~\cite{KarpUW1988} 
369: on probabilistic recurrence relations.  
370: We will restate this bound as
371: Lemma~\ref{lemma-probabilistic-recurrence-ub}, and then show how
372: a similar technique can be used to get \emph{lower bounds} with some
373: additional conditions in Theorem~\ref{theorem-mean-lower-bound}.
374: 
375: \begin{lemma}[\cite{KarpUW1988}]
376: \label{lemma-probabilistic-recurrence-ub}
377: The expected time $T(X_0)$ 
378: needed for a nonincreasing real-valued Markov chain
379: $X_0, X_1, X_2, X_3\ldots$ to drop to $1$ is bounded by
380: \begin{equation}
381: \label{eq-karp}
382: T(X_0) \le \int_{1}^{X_0} \frac{1}{\mu_z} dz,
383: \end{equation}
384: when $\mu_z = \E[X_t - X_{t+1} : X_t = z]$ is a nondecreasing
385: function of $z$.
386: \end{lemma}
387: 
388: This bound has a nice physical interpretation.  If it takes one
389: second to jump down $\mu_x$ meters from $x$, then we are traveling at a
390: rate of $\mu_x$ meters per second during that interval.  When we zip
391: past some position $z$, we are traveling at the average speed $\mu_x$
392: determined by our starting point $x \ge z$ for the interval.  Since
393: $\mu$ is nondecreasing, using $\mu_z$ as our estimated speed
394: underestimates our actual speed when passing $z$.  The integral
395: computes the time to get all the way down to $1$ if we use
396: $\mu_z$ as our instantaneous speed when passing position $z$.  Since
397: our estimate of our speed is low (on average), our estimate of our time
398: will be high, giving an upper bound on the actual expected time.
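
As a worked instance of (\ref{eq-karp}), assume purely for illustration that each step removes half of the current value in expectation, i.e., $\mu_z = z/2$. This $\mu_z$ is nondecreasing, and
\begin{displaymath}
T(X_0) \le \int_{1}^{X_0} \frac{2}{z}\, dz = 2 \ln X_0,
\end{displaymath}
so such a chain started at $X_0 = n$ reaches $1$ in $O(\log n)$ expected steps. If instead every step makes only unit progress, $\mu_z = 1$, the same integral gives the bound $X_0 - 1$.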
399: 
400: We would like to get lower bounds on such processes in
401: addition to upper bounds, and we will not necessarily be able to
402: guarantee that $\mu_z$, as defined in
403: Lemma~\ref{lemma-probabilistic-recurrence-ub}, will be a
404: nondecreasing function of $z$.  But we will still use the same basic
405: intuition: The average speed at which we pass $z$ is at most the
406: maximum average speed of any jump that takes us past $z$.  We can find
407: this maximum speed by taking the maximum over all $x > z$;
408: unfortunately, this may give us too large an estimate.  Instead, we
409: choose a threshold $U$ for ``short'' jumps,
410: compute the maximum speed of short jumps of at most $U$ for 
411: all $x$ between $z$ and $z+U$, and
412: handle the (hopefully rare) long jumps of more than $U$ by
413: conditioning against them.  Subject to this conditioning, we can
414: define an upper bound $m_z$ on the average speed passing $z$, and
415: use essentially the same integral as in (\ref{eq-karp}) to get a lower
416: bound on the time.  Some additional tinkering to account for the
417: effect of the conditioning then gives us our real lower bound,
418: which appears in Theorem~\ref{theorem-mean-lower-bound} below, as 
419: Inequality (\ref{eq-mean-lower-bound}).
420: 
421: \newcommand{\dft}{f(X_t) - f(X_{t+1})}
422: \newcommand{\dyt}{Y_t - Y_{t+1}}
423: \newcommand{\dzt}{Z_t - Z_{t+1}}
424: \newcommand{\ftat}{\calF_t, A_t}
425: \newcommand{\ef}[1]{\E\left[{#1}:\ftat\right]}
426: \newcommand{\muf}{\mu_{f(X_t)}}
427: \newcommand{\ydenom}{\epsilon Y_0 + (1-\epsilon)}
428: \newcommand{\yydenom}{\left(\epsilon Y_0 + (1-\epsilon)\right)}
429: 
430: \begin{theorem}
431: \label{theorem-mean-lower-bound}
432: Let $X_0, X_1, X_2, \ldots$ be 
433: a Markov process with state space $S$, where
434: $X_0$ is a constant.
435: Let $f$ be a non-negative real-valued function on $S$
436: such that, for all $t$,
437: \begin{equation}
438: \label{eq-nonincreasing}
439: \Pr[\dft \ge 0 : X_t] = 1.
440: \end{equation}
441: Let $U$ and $\epsilon$ be constants such that for any $x > 0$,
442: \begin{equation}
443: \label{eq-mean-lower-bound-U-epsilon}
444: \Pr[\dft \ge U : X_t = x] \le \epsilon.
445: \end{equation}
446: Let 
447: \begin{equation}
448: \label{eq-mean-lower-bound-tau}
449: \tau = \min \{ t: f(X_t) = 0 \}.
450: \end{equation}
451: For each $x$ with $f(x) > 0$, let $\mu_x > 0$ satisfy
452: \begin{equation}
453: \label{eq-mean-lower-bound-mu}
454: \mu_x \ge \E[\dft : X_t = x, \dft < U].
455: \end{equation}
456: Now define
457: \begin{equation}
458: \label{eq-mean-lower-bound-m}
459: m_z = \sup \left\{ \mu_x: x\in S, f(x) \in [z, z+U) \right\},
460: \end{equation}
461: and define
462: \begin{equation}
463: \label{eq-mean-lower-bound-T}
464: T(x) = \int_{0}^{f(x)} \frac{1}{m_z} dz.
465: \end{equation}
466: Then
467: \begin{equation}
468: \label{eq-mean-lower-bound}
469: \E[\tau] \ge \frac{T(X_0)}{\epsilon T(X_0) + (1-\epsilon)}.
470: \end{equation}
471: \end{theorem}
472: \begin{proof}
473: Define
474: \begin{equation}
475: \label{eq-mean-lower-bound-y}
476: Y_t = \left\{
477: \begin{array}{cl}
478: T(X_t) & \mbox{, if $f(X_{t'}) - f(X_{t'+1}) < U$ for all $t' < t$,} \\
479: 0      & \mbox{, otherwise.}
480: \end{array}
481: \right.
482: \end{equation}
483: The idea is that $Y_t$ drops to zero immediately if a long jump
484: occurs.  We will show that even with such overeager jumping, $Y_t$
485: does not drop too quickly on average.  The intuition is that the chance of a 
486: long jump reduces
487: $Y_t$ by at most an expected $\epsilon Y_t \le \epsilon Y_0$, while
488: the effect of short jumps can be bounded by applying the definition of
489: $T$.
490: 
491: Let $\calF_t$ be the $\sigma$-algebra
492: generated by $X_0, X_1, \ldots X_t$.  Let $A_t$ be the event that
493: $\dft < U$, that is, that the jump from $f(X_t)$ to $f(X_{t+1})$ is a
494: short jump.
495: Now compute
496: \begin{eqnarray}
497: \E\left[\dyt:\calF_t\right]
498: &=&
499: \Pr\left[\,\overline{A_t}:\calF_t\right] (Y_t - 0)
500: + (1 - \Pr\left[\,\overline{A_t}:\calF_t\right]) \ef{\dyt}\nonumber
501: \\
502: &\le&
503: \Pr\left[\,\overline{A_t}:\calF_t\right] Y_0
504: + (\epsilon - \Pr\left[\,\overline{A_t}:\calF_t\right]) Y_0
505: + (1-\epsilon) \ef{\dyt}\nonumber
506: \\
507: &=&
508: \epsilon Y_0 + (1-\epsilon) \ef{\dyt}.\label{eq-mean-lower-bound-dyt}
509: \end{eqnarray}
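
One way to unpack the middle inequality is as follows (the shorthand $p = \Pr[\,\overline{A_t}:\calF_t]$ is introduced only for this remark). Since $f(X_t)$ is nonincreasing and $T$ is nondecreasing in $f$, we have $0 \le Y_{t+1} \le Y_t \le Y_0$, and so both $Y_t \le Y_0$ and $\ef{\dyt} \le Y_0$. By (\ref{eq-mean-lower-bound-U-epsilon}) and the Markov property, $p \le \epsilon$. Splitting $1 - p = (\epsilon - p) + (1-\epsilon)$ gives
\begin{displaymath}
(1-p)\, \ef{\dyt}
= (\epsilon - p)\, \ef{\dyt} + (1-\epsilon)\, \ef{\dyt}
\le (\epsilon - p)\, Y_0 + (1-\epsilon)\, \ef{\dyt},
\end{displaymath}
and the terms $p\, Y_0$ and $(\epsilon - p)\, Y_0$ then combine to $\epsilon Y_0$.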
510: 
511: Now let us bound $\ef{\dyt}$.  Expanding the definitions
512: (\ref{eq-mean-lower-bound-T}) and (\ref{eq-mean-lower-bound-y})
513: gives
514: \begin{equation}
515: \label{eq-mean-lower-bound-integral-expansion}
516: \ef{\dyt}
517: =
518: \ef{\int_{f(X_{t+1})}^{f(X_t)} \frac{1}{m_z} dz}.
519: \end{equation}
520: 
521: Now, conditioning on $A_t$ means that 
522: $f(X_{t+1}) > f(X_t) - U$
523: and thus
524: $z > f(X_t) - U$ for the entire range of the integral. 
525: It follows that $f(X_t)$ lies in the half-open interval $[z,z+U)$ for
526: each such $z$, from which we have $m_z \ge \muf$
527: from (\ref{eq-mean-lower-bound-m}).
528: Inverting gives $\frac{1}{m_z} \le \frac{1}{\muf}$,
529: and plugging this inequality into
530: (\ref{eq-mean-lower-bound-integral-expansion}) gives
531: \begin{eqnarray}
532: \ef{\dyt}
533: &\le&
534: \ef{\int_{f(X_{t+1})}^{f(X_t)} \frac{1}{\muf} dz}
535: \nonumber
536: \\
537: &=&
538: \frac{1}{\muf} \ef{\dft}
539: \nonumber
540: \\
541: &\le&
542: \frac{1}{\muf} \muf
543: \nonumber
544: \\
545: &=&
546: 1.
547: \label{eq-mean-lower-bound-dytat}
548: \end{eqnarray}
549: 
550: Applying (\ref{eq-mean-lower-bound-dytat})
551: to
552: (\ref{eq-mean-lower-bound-dyt}) gives
553: \begin{equation}
554: \label{eq-mean-lower-bound-dyt-final}
555: \E[\dyt : \calF_t ]
556: \le
557: \ydenom.
558: \end{equation}
559: 
560: We have now shown that $Y_t$ drops slowly on average.  To turn this
561: into a lower bound on the time at which it first reaches zero, define
562: $Z_t = Y_t + \min(t, \tau) \yydenom$.
563: Conditioning on $t < \tau$, observe that
564: \begin{eqnarray*}
565: \E[\dzt:\calF_t, t < \tau]
566: &=&
567: \E[\dyt:\calF_t, t < \tau] - \yydenom
568: \\
569: &\le&
570: \yydenom - \yydenom 
571: \\
572: &=&
573: 0.
574: \end{eqnarray*}
575: 
576: Alternatively, if $t \ge \tau$ we have
577: \begin{displaymath}
578: \E[\dzt:\calF_t, t \ge \tau] = 0.
579: \end{displaymath}
580: 
581: In either case, $\E[\dzt:\calF_t] \le 0$, implying
582: $Z_t \le \E[Z_{t+1}:\calF_t]$.
583: In other words, $\{Z_t, \calF_t\}$ is a submartingale.
584: 
585: Because $\{Z_t, \calF_t\}$ is a submartingale, and $\tau$ is a
586: stopping time relative to $\{\calF_t\}$, we have
587: $Z_0 = Y_0 
588: \le \E[Z_\tau] 
589: = \E\left[ 0 + \tau \yydenom \right]
590: = \yydenom \E[\tau]$.
591: Solving for $\E[\tau]$ then gives
592: \begin{displaymath}
593: \E[\tau] \ge \frac{Y_0}{\ydenom} 
594: = \frac{T(X_0)}{\epsilon T(X_0) + (1-\epsilon)}.
595: \end{displaymath}
596: \end{proof}
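
As a sanity check on (\ref{eq-mean-lower-bound}), consider the deterministic chain $X_{t+1} = \max(X_t - 1, 0)$ on the non-negative integers, started at a positive integer $X_0 = n$, with $f(x) = x$; this is a toy example used here only for illustration. Taking $U = 2$ and $\epsilon = 0$ satisfies (\ref{eq-mean-lower-bound-U-epsilon}), and $\mu_x = 1$ satisfies (\ref{eq-mean-lower-bound-mu}), so $m_z = 1$ and $T(X_0) = n$. The theorem then gives $\E[\tau] \ge n$, which is tight since $\tau = n$. Note also that, for $0 < \epsilon \le 1$,
\begin{displaymath}
\frac{T(X_0)}{\epsilon T(X_0) + (1-\epsilon)} \le \frac{1}{\epsilon},
\end{displaymath}
so the bound is close to $T(X_0)$ only when $\epsilon T(X_0)$ is small.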
597: 
598: 
599: \subsection{Lower bounds on greedy routing}
600: 
601: We will now show a lower bound on the expected time taken by greedy
602: routing on a random graph embedded in a line. Each node in the
603: graph has expected outdegree at most $\ell$ and is connected to its 
604: immediate neighbor on either side. For polylogarithmic values of $\ell$,
605: we consider two variants of the greedy routing algorithm and derive lower 
606: bounds for them equal to $\Omega(\log^2 n / (\ell^2 \log \log n))$ and to
607: $\Omega(\log^2 n / (\ell \log \log n))$, as stated in 
608: Theorem~\ref{theorem-lower-bound}.
609: The routing variants, along with the machinery and proofs of the
610: associated lower bounds, are presented in Sections~\ref{Section-lower-bound} 
611: through \ref{Section-putting-the-pieces-together}. For large values
612: of $\ell$, a lower bound of $\Omega(\frac{\lg n}{\lg \ell})$
613: on the worst-case routing time can be 
614: derived very simply, as follows.
615: 
616: \begin{theorem}
617: \label{theorem-tree-lower-bound}
618: Let $\ell \in (\lg n, n^c]$. Then for any link distribution and any
619: routing strategy, the
620: delivery time $T = \Omega(\frac{\log n}{\log \ell})$.
621: \end{theorem}
622: \begin{proof}
623: With $\ell$ links for each node, we can reach at 
624: most $\ell^k$ new nodes at step $k$. If $T$ is the minimum time needed to 
625: reach all $n$ nodes, then $\sum_{k=0}^{T} \ell^k \ge n$, and hence $\ell^{T+1} > n$. This gives a lower bound of 
626: $\Omega(\frac{\log n}{\log \ell})$ on $T$.
627: \end{proof}
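
For a sense of scale, take the illustrative values $n = 2^{20}$ and $\ell = 32$ (so that $\ell > \lg n = 20$). Then
\begin{displaymath}
\frac{\lg n}{\lg \ell} = \frac{20}{5} = 4,
\end{displaymath}
so, up to the constant hidden in the $\Omega(\cdot)$, some destination needs about four hops no matter how the links are chosen.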
628: 
629: 
630: 
631: \subsubsection{Lower bound for polylogarithmic number of links}
632: \label{Section-lower-bound}
633: 
634: We consider the case of the expected outdegree of each node falling in
635: the range $[1,\lg n]$. The probability that a node at
636: position $x$ is connected to positions $x-\Delta_1, x-\Delta_2,
637: \ldots, x-\Delta_k$ depends only on the set $\Delta=\{\Delta_1, \ldots,
638: \Delta_k\}$ and not on $x$ and is independent of the choice of outgoing
639: links for other nodes.\footnote{We assume that nodes are labeled by
640: integers and identify each node with its label to avoid excessive
641: notation.} Since we assume that each node is connected to its immediate
642: neighbors, we require that $\pm 1$ appears in $\Delta$. 
643: 
644: We consider two variants of the greedy routing algorithm. Without
645: loss of generality, we assume that the target of the search is labeled
646: $0$. In \buzz{one-sided greedy routing}, the algorithm never traverses a 
647: link that would take it past its target.  So if the algorithm is currently
648: at $x$ and is trying to reach $0$, it will move to the node $x-\Delta_i$ 
649: with the smallest non-negative label.  In \buzz{two-sided greedy routing}, 
650: the algorithm chooses a link that minimizes the distance to the target, 
651: without regard to which side of the target the other end of the link is.  
652: In the two-sided case the algorithm will move to a node $x-\Delta_i$ 
653: whose label has the smallest absolute value, with ties broken 
654: arbitrarily. One-sided greedy routing can be thought of as modeling 
655: algorithms on a graph with a boundary when the target lies on the 
656: boundary, or algorithms where all links point in only one direction 
657: (as in Chord).
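
For example, suppose the search is at $x = 10$ and the node there has the (hypothetical) link set $\Delta = \{-6, -1, 1, 3, 12\}$, so that its neighbors $x - \Delta_i$ are
\begin{displaymath}
16, \quad 11, \quad 9, \quad 7, \quad -2 .
\end{displaymath}
One-sided greedy routing moves to $7$, the smallest non-negative label, while two-sided greedy routing moves to $-2$, since $|-2| = 2$ is the smallest absolute value.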
658: 
659: Our results are stronger for the one-sided case than for the two-sided
660: case.  With one-sided greedy routing, we show a lower bound of
661: $\Omega(\log^2 n / (\ell \log \log n))$ on the time to reach $0$ from a 
662: point chosen uniformly from the range $1$ to $n$ that applies to any link
663: distribution.  For two-sided routing, we show a lower bound of
664: $\Omega(\log^2 n / (\ell^2 \log \log n))$, with some constraints on the 
665: distribution.  We conjecture that these constraints are unnecessary, and 
666: that $\Omega(\log^2 n / (\ell \log \log n))$ is the correct lower bound 
667: for both models. A formal statement of these results appears as 
668: Theorem~\ref{theorem-lower-bound} in 
669: Section~\ref{Section-putting-the-pieces-together}, but before we can 
670: prove it we must develop machinery that will be useful in the
671: proofs of both the one-sided and two-sided lower bounds.
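
To see how far apart the two bounds are, consider the illustrative values $n = 2^{16}$ and $\ell = 4$, taking logarithms base $2$ and ignoring the constants hidden in the $\Omega(\cdot)$ notation. Then $\log^2 n = 256$ and $\log \log n = 4$, so
\begin{displaymath}
\frac{\log^2 n}{\ell \log \log n} = \frac{256}{4 \cdot 4} = 16,
\qquad
\frac{\log^2 n}{\ell^2 \log \log n} = \frac{256}{16 \cdot 4} = 4.
\end{displaymath}
The two expressions differ by exactly a factor of $\ell$.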
672: 
673: \subsubsection{Link sets: notation and distributions}
674: 
675: First we describe some notation for $\Delta$ sets.
676: Write each $\Delta$ as 
677: \[\{\Delta_{-s}, \ldots \Delta_{-2}, \Delta_{-1} = -1, 
678:  \Delta_{1} = 1, \Delta_{2}, \ldots  \Delta_{t}\},\]
679: where $\Delta_{i} < \Delta_{j}$ whenever $i < j$.
680: Each $\Delta$ is a random variable drawn from some distribution on
681: finite sets; the individual $\Delta_i$ are thus in general \emph{not}
682: independent.
683: Let $\Delta^-$ consist of the $s$ negative elements of $\Delta$
684: and $\Delta^+$ consist of the $t$ positive elements.
685: Formally define $\Delta_{-i} = -\infty$ when $i > s$ 
686: and $\Delta_{i} = +\infty$ when $i > t$.
687: 
688: For one-sided routing, we make no assumptions about the distribution
689: of $\Delta$ except that $|\Delta|$ must have finite expectation and
690: $\Delta$ always contains $1$.  For two-sided routing, we assume that
691: $\Delta$ is generated by including each possible $\delta$ in $\Delta$
692: with probability $p_\delta$, where $p$ is symmetric about the origin
693: (i.e., $p_\delta = p_{-\delta}$ for all $\delta$),
694: $p_1 = p_{-1} = 1$, and $p$ is
695: unimodal, i.e. nonincreasing for positive $\delta$ and nondecreasing
696: for negative $\delta$.\footnote{These constraints imply
697: that $p_0 = 1$;
698: formally, we imagine that $0$ is present in each $\Delta$ but is
699: ignored by the routing algorithm.}  We also require that the events
700: $[\delta \in \Delta]$ 
701: and
702: $[\delta' \in \Delta]$
703: are pairwise independent for distinct $\delta,\delta'$.
704: 
705: \subsubsection{The aggregate chain $S^t$}
706: 
707: For a fixed distribution on $\Delta$, the trajectory
708: of a single initial point $X^0$ is a Markov chain $X^0, X^1, X^2, \ldots$,
709: with $X^{t+1} = s(X^t, \Delta^t)$,
710: where $\Delta^t$ determines the outgoing links from the node reached
711: at time $t$ and $s$ is a \buzz{successor function} that selects the next node
712: $X^{t+1} = X^t - \Delta^t_i$
713: according to the routing algorithm.
714: Note that the chain is Markov, because the presence of $\pm 1$ links
715: guarantees that no node ever appears twice in the sequence, and so
716: each new node corresponds to a new choice of links.
717: 
718: \newcommand{\Di}{{\Delta i}}
719: \newcommand{\Dis}{{\Delta i \sigma}}
720: 
721: From the $X^t$ chain we can derive an \buzz{aggregate chain}
722: that describes the
723: collective behavior of all nodes in some
724: range.  
Each state of the aggregate chain is a contiguous set of nodes whose
726: labels all have the same sign;
727: we define the sign of the state to be the common sign of all of its
728: elements.
729: For one-sided routing each state is either $\{0\}$ or an interval
730: of the form $\{1\ldots k\}$ for some $k$.  For two-sided routing the
states are more general.
732: The aggregate states are characterized formally in
733: Lemma~\ref{lemma-aggregate-ranges}.
734: 
735: Given a contiguous set of nodes $S$ and a set $\Delta$,
736: define
737: \begin{displaymath}
738: S_{\Di} = \{ x \in S : s(x, \Delta) = x - \Delta_i \}.
739: \end{displaymath}
740: The intuition is that $S_{\Di}$ consists of all those nodes for which
741: the algorithm will choose $\Delta_i$ as the outgoing link.
742: Note that $S_{\Di}$ will always be a contiguous range because of the
743: greediness of the algorithm.
744: Now define, for each $\sigma \in \{-, 0, +\}$:
745: \begin{displaymath}
746: S_{\Dis} = \{ x \in S_{\Di} : \sgn s(x, \Delta) = \sigma \}.
747: \end{displaymath}
748: Here we have simply split $S_{\Di}$ into those nodes with
749: negative, zero, or positive successors.
750: 
751: For any set $A$ and integer $\delta$ write $A-\delta$
752: for $\{x-\delta : x \in A\}$.
753: 
754: We will now build our aggregate chain by letting 
755: the successors of a range $S$ be the ranges $S_{\Dis}-\Delta_i$ 
756: for all possible
757: $\Delta$, $i$, and $\sigma$.  
758: As a special case, we define $S^{t+1} = \{0\}$ when $S^{t} = \{0\}$;
759: once we arrive at the target, we do not leave it.
760: For all other $S^t$, we let
761: \begin{equation}
762: \label{eq-stdis-prob}
763: \Pr\left[S^{t+1} = S^t_{\Dis} - \Delta_i : \Delta\right] 
764: = \frac{|S^t_{\Dis}|}{|S^t|},
765: \end{equation}
766: and define the unconditional transition probabilities by averaging
767: over all $\Delta$.
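
To make the construction concrete, here is a small worked instance; the
particular range and link set are illustrative assumptions of ours.
Take one-sided routing, $S^t = \{1 \ldots 10\}$, and condition on
$\Delta = \{-1, 1, 4\}$, so that $\Delta_1 = 1$ and $\Delta_2 = 4$.
The greedy choice for nodes $1$ through $3$ is $\Delta_1$ (since
$x - 4$ would be negative), and for nodes $4$ through $10$ it is
$\Delta_2$.  Splitting by the sign of the successor gives
\begin{displaymath}
\begin{array}{lll}
S^t_{\Delta 1\,0} = \{1\}, &
S^t_{\Delta 1\,0} - \Delta_1 = \{0\}, &
\mbox{probability } 1/10, \\
S^t_{\Delta 1\,+} = \{2, 3\}, &
S^t_{\Delta 1\,+} - \Delta_1 = \{1, 2\}, &
\mbox{probability } 2/10, \\
S^t_{\Delta 2\,0} = \{4\}, &
S^t_{\Delta 2\,0} - \Delta_2 = \{0\}, &
\mbox{probability } 1/10, \\
S^t_{\Delta 2\,+} = \{5 \ldots 10\}, &
S^t_{\Delta 2\,+} - \Delta_2 = \{1 \ldots 6\}, &
\mbox{probability } 6/10.
\end{array}
\end{displaymath}
Two of the four transitions lead to $\{0\}$, so conditioned on this
$\Delta$ we get $S^{t+1} = \{0\}$ with probability $2/10$,
$S^{t+1} = \{1, 2\}$ with probability $2/10$, and
$S^{t+1} = \{1 \ldots 6\}$ with probability $6/10$.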
768: 
769: Lemma~\ref{lemma-aggregate-chain-works} shows that moving to the
770: aggregate chain does not misrepresent the underlying single-point
771: chain:
772: 
773: \begin{lemma}
774: \label{lemma-aggregate-chain-works}
775: Let $X^0$ be drawn uniformly from the range $S^0$.  Let $Y^t$ be a
776: uniformly chosen element of $S^t$.  Then for all $x$ and $t$,
777: $\Pr[X^t = x] = \Pr[Y^t = x]$.
778: \end{lemma}
779: \begin{proof}
780: Clearly the lemma holds for $t=0$.
781: Fix $S^{t-1}$, and consider two methods for generating $Y^{t}$.  
782: The first generates $Y^t$ directly from $Y^{t-1}$ and
783: shows that $Y^t$ generated in this way has the same distribution as
784: $X^t$.
The second generates $Y^t$ from $S^t$ as described in the lemma
786: and produces the same
787: distribution on $Y^t$ as the first.
788: 
789: In the first method, 
790: we choose $Y^{t-1}$ uniformly from $S^{t-1}$, choose a
random $\Delta^{t-1}$, and compute $s(Y^{t-1}, \Delta^{t-1})$.
792: Here the transition rule applied to $Y^{t-1}$ is the same as for
793: $X^{t-1}$, so under the induction hypothesis that $Y^{t-1}$ and
794: $X^{t-1}$ are equal in distribution, so are $Y^t$ and $X^t$.
795: 
796: In the second method, we again choose a random $\Delta^{t-1}$ 
797: and then choose $S^{t}$ by choosing some $S^{t-1}_{\Dis}$ in proportion
798: to its size, let $S^{t} = S^{t-1}_\Dis - \Delta_i$, and then let $Y^t$
799: be a uniformly chosen element of $S^t$.
800: We can implement the choice of $S^{t-1}_\Dis$ by choosing some $Y^{t-1}$
801: uniformly from $S^{t-1}$ and picking $S^{t-1}_\Dis$ as the subrange
802: that contains $Y^{t-1}$; and we can simplify the task of choosing
803: $Y^{t}$ by setting it equal to $Y^{t-1} - \Delta_i$, since
804: conditioning on $Y^{t-1} \in S^{t-1}_\Dis$ leaves $Y^{t-1}$ with a
805: uniform distribution.  But by implementing the second method in this
806: way, we have reduced it to the first, and the lemma is proved.
807: \end{proof}
808: 
809: Lemma~\ref{lemma-aggregate-ranges} justifies our earlier
810: characterization of the aggregate state spaces:
811: 
812: \begin{lemma}
813: \label{lemma-aggregate-ranges}
814: Let $S^0 = \{ 1 \ldots n \}$ for some $n$.
815: Then with one-sided routing,
816: every $S^t$ is either $\{0\}$ or of the form $\{1\ldots k\}$ for some
817: $k$; 
818: and with two-sided routing,
819: every $S^t$ is an interval of integers in which every element has the
820: same sign.
821: \end{lemma}
822: \begin{proof}
823: By induction on $t$.  For one-sided routing, observe that
824: $S^{t-1}_{\Di -}$ is always empty, as the routing algorithm is not
825: allowed to jump to negative nodes.  If $S^t = S^{t-1}_{\Di 0} -
826: \Delta_i$, then 
827: $S^t = \{\Delta_i\} - \Delta_i = \{0\}$.
828: Otherwise $S^t = S^{t-1}_{\Di +} - \Delta_i$; but since 
829: $S^{t-1} = \{1 \ldots k\}$ for some $k$,
830: if it contains any point $x$ greater than $\Delta_i$ it must contain
$\Delta_i + 1$; thus $\min(S^{t-1}_{\Di +}) = \Delta_i + 1$
832: and so $\min(S^t)$ becomes $1$.
833: 
834: The result for the two-sided case is immediate from the fact that
835: $S^{t} = S^{t-1}_\Dis - \Delta_i$
836: combined with the definition of $S^{t-1}_\Dis$.
837: \end{proof}
838: 
839: The advantage of the aggregate chain over the single-point chain is
840: that, while we cannot do much to bound the progress of a single point
841: with an arbitrary distribution on $\Delta$, we can show that the size
842: of $S^t$ does not drop too quickly given a bound $\ell$ on
843: $\E[|\Delta|]$.
844: The intuition is that each successor
845: set of size $a^{-1} |S^t|$ or less occurs
846: with probability at most $a^{-1}$, and there are at most $3\ell$ such
847: sets on average.
848: 
849: \newcommand{\Prsta}{\Pr\left[|S^{t+1}| \le a^{-1} |S^t| : S^t\right]}
850: \begin{lemma}
851: \label{lemma-aggregate-max-drop}
852: Let $\E[|\Delta|] \le \ell$.  Then for any $a \ge 1$,
853: in either the one-sided or two-sided model,
854: \begin{equation}
855: \label{eq-aggregate-max-drop}
856: \Prsta \le 3\ell a^{-1}.
857: \end{equation}
858: \end{lemma}
859: \begin{proof}
860: \begin{sloppypar}
861: Fix $S^t$.
862: First note that if $a^{-1} |S^t| < 1$, then $\Prsta = 0$.
863: So we can assume that $a^{-1} |S^t| \ge 1$ and in particular that
864: $a \le |S^t|$.
865: \end{sloppypar}
866: 
867: Conditioning on $\Delta$, there are at most $3|\Delta|$ non-empty sets
868: $S^t_{\Dis}$.  
If $|S^t_\Dis| \le a^{-1} |S^t|$, then $S^t_\Dis$ is chosen with
870: probability at most $a^{-1}$ by (\ref{eq-stdis-prob}).
871: Thus the probability of choosing any of the at most $3|\Delta|$ sets
872: $S^t_\Dis$ of size at most $a^{-1}|S^t|$ is at most $3|\Delta|a^{-1}$.
873: 
874: Now observe that
875: \begin{eqnarray*}
876: \Prsta &\le&
877:     \sum_d
878:         \Pr\left[ |\Delta| = d \right] 3da^{-1} \\
879:     &=& 3a^{-1} \E\left[|\Delta|\right] \\
880:     &\le& 3 \ell a^{-1}.
881: \end{eqnarray*}
882: \end{proof}
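
As a sanity check of (\ref{eq-aggregate-max-drop}) on a toy instance of
our own choosing, take one-sided routing with $S^t = \{1 \ldots 10\}$
and $\Delta = \{-1, 1, 4\}$ with probability $1$, so that $\ell = 3$.
Nodes $1$ through $3$ use the link $1$ and nodes $4$ through $10$ use
the link $4$; splitting by the sign of the successor, the non-empty
subranges are $\{1\}$, $\{2,3\}$, $\{4\}$, and $\{5 \ldots 10\}$, of
sizes $1$, $2$, $1$, and $6$.  With $a = 10$ we have
$a^{-1}|S^t| = 1$, and
\begin{displaymath}
\Pr\left[|S^{t+1}| \le 1 : S^t\right]
 = \frac{1}{10} + \frac{1}{10}
 = \frac{1}{5}
 \le \frac{9}{10}
 = 3 \ell a^{-1}.
\end{displaymath}
For $a \le 9$ the right-hand side is at least $1$ and the bound is
vacuous in this instance; the lemma is informative only when $a$ is
large compared to $\ell$.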
883: 
884: \begin{sloppypar}
885: Another way to write (\ref{eq-aggregate-max-drop}) is to say that
886: $\Pr\left[ \ln |S^t| - \ln |S^{t+1}| \ge \ln a : S^t \right] \le 3 \ell
887: a^{-1}$, which will give the bound 
888: (\ref{eq-mean-lower-bound-U-epsilon}) on the probability of large
889: jumps when it comes time to apply
890: Theorem~\ref{theorem-mean-lower-bound}.
891: \end{sloppypar}
892: 
893: \subsubsection{Boundary points}
894: \label{section-boundary-points}
895: 
896: Lemma~\ref{lemma-aggregate-max-drop} says that $|S^t|$ seldom drops by
897: too large a ratio at once, but it doesn't tell us much about how
898: quickly $|S^t|$ drops in short hops.  To bound this latter quantity,
899: we need to get a bound on how many subranges $S^t$ splinters into
900: through the action of $s(\cdot, \Delta)$.
901: We will do so by showing that only certain points can appear as the
902: boundaries of these subranges in the direction of $0$.
903: 
904: For fixed $\Delta$, define for each $i > 0$
905: \begin{displaymath}
906: \beta_i = \ceil{\frac{\Delta_i+\Delta_{i+1}}{2}}
907: \end{displaymath}
908: and
909: \begin{displaymath}
910: \beta_{-i} = \floor{\frac{\Delta_{-i}+\Delta_{-i-1}}{2}}.
911: \end{displaymath}
912: Let $\beta$ be the set of all finite $\beta_i$ and $\beta_{-i}$.
913: 
914: \begin{lemma}
915: \label{lemma-boundary-points}
916: Fix $S$ and $\Delta$ and let $\beta$ be defined as above.
917: Suppose that $S$ is positive.
918: Let $M = \{ \min(S_\Dis) : S_\Dis \ne \emptyset \}$ be the set of
919: minimum elements of subranges $S_\Dis$ of $S$.
920: Then $M$ is a subset of $S$ and contains no elements other than
921: \begin{enumerate}
922: \item $\min(S)$,
923: \item $\Delta_i$ for each $i > 0$, 
924: \item $\Delta_i+1$ for each $i > 0$, and
925: \item at most one of $\beta_i$ or $\beta_i+1$ for each $i > 0$,
926: \end{enumerate}
927: where the last case holds only with two-sided routing.
928: 
929: If $S$ is negative, the symmetric condition holds for
930: $M = \{ \max(S_\Dis) : S_\Dis \ne \emptyset \}$.
931: \end{lemma}
932: \begin{proof}
933: Consider some subrange $S_\Dis$ of $S$.  If $S_\Dis$ contains
934: $\min(S)$, the first case holds.  Otherwise:
935: (a) if $S_\Dis = S_{\Di 0}$, the second
936: case holds; (b) if $S_\Dis = S_{\Di +}$, the third case holds;
937: (c) if $S_\Dis = S_{\Di -}$, the fourth case holds, with $\min(S_{\Di
938: -}) = \beta_{i-1}$ if $\Delta_{i-1} + \Delta_i$ is odd, and either
939: $\beta_{i-1}$ or $\beta_{i-1}+1$ if $\Delta_{i-1} + \Delta_i$ is even,
940: depending on whether the tie-breaking rule assigns $\beta_{i-1}$ to
941: $S_{\Delta(i-1)+}$ or $S_{\Di -}$.
942: \end{proof}
943: 
944: We will call the elements of $M$ \buzz{boundary points} of $S$.
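
For a concrete instance (the values are ours, picked so that no ties
arise), take two-sided routing with $S = \{1 \ldots 12\}$ and
$\Delta = \{-1, 1, 4, 9\}$, so that $\Delta_1 = 1$, $\Delta_2 = 4$,
$\Delta_3 = 9$, and
\begin{displaymath}
\beta_1 = \ceil{\frac{1+4}{2}} = 3,
\qquad
\beta_2 = \ceil{\frac{4+9}{2}} = 7.
\end{displaymath}
Choosing for each $x \in S$ the link that minimizes $|x - \Delta_i|$
gives the non-empty subranges
\begin{displaymath}
\{1\},\ \{2\},\ \{3\},\ \{4\},\ \{5,6\},\ \{7,8\},\ \{9\},\ \{10,11,12\},
\end{displaymath}
which are $S_{\Delta 1\,0}$, $S_{\Delta 1\,+}$, $S_{\Delta 2\,-}$,
$S_{\Delta 2\,0}$, $S_{\Delta 2\,+}$, $S_{\Delta 3\,-}$,
$S_{\Delta 3\,0}$, and $S_{\Delta 3\,+}$, respectively.
Their minima are $M = \{1, 2, 3, 4, 5, 7, 9, 10\}$: here
$1 = \min(S)$; $4$ and $9$ are of the form $\Delta_i$; $2$, $5$, and
$10$ are of the form $\Delta_i + 1$; and $3$ and $7$ are $\beta_1$ and
$\beta_2$, in agreement with Lemma~\ref{lemma-boundary-points}.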
945: 
946: \subsubsection{Bounding changes in $\ln |S^t|$}
947: 
Now we would like to use Lemmas~\ref{lemma-aggregate-max-drop}
and~\ref{lemma-boundary-points} to get an upper bound on the rate at
950: which $\ln |S^t|$ drops as a function of the $\Delta$ distribution.
951: 
952: The following lemma is used to bound a sum that arises in
953: Lemma~\ref{lemma-log-drop}.
954: 
955: \begin{lemma}
956: \label{lemma-conditional-convex}
957: Let $c \ge 0$.
958: Let $\sum_{i=1}^{n} x_i = M$ where each $x_i \ge 0$ and at least one
$x_i$ is greater than $c$.
960: Let $B$ be the set of all $i$ for which $x_i$ is greater than $c$.
961: Then
962: \begin{equation}
963: \label{eq-condition-convex}
964: \frac{
965:   \sum_{i \in B} x_i \ln x_i
966: }{
967:   \sum_{i \in B} x_i
968: }
969: \ge
970: \ln\left( \max \left(c, \frac{M}{n}\right)\right).
971: \end{equation}
972: \end{lemma}
973: \begin{proof}
If $\frac{M}{n} \le c$, 
975: we still have $x_i > c$ for all
976: $i \in B$, so the left-hand side cannot be
977: less than $\ln c$.
978: So the interesting
979: case is when $\frac{M}{n} > c$.
980: 
Let $B$ have $b$ elements.  Then $\sum_{i \notin B} x_i \le (n-b)c$
and $\sum_{i \in B} x_i \ge M - (n-b)c = M-nc+bc$.
983: Because $x_i \ln x_i$ is convex, its sum over $B$ is minimized for fixed
984: $\sum_{i\in B} x_i$ by setting all such $x_i$ equal, in which case the
985: left-hand side of (\ref{eq-condition-convex}) becomes simply
986: $\ln(x_i)$ for any $i \in B$.
987: 
988: Now observe that setting all $x_i$ in $B$ equal gives
$x_i \ge \frac{M-nc+bc}{b} 
990: = \frac{M-nc}{b} + c 
991: \ge \frac{M-nc}{n} + c
992: = \frac{M}{n}$.
993: \end{proof}
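
A quick numerical check of (\ref{eq-condition-convex}), with values
chosen by us purely for illustration: let $n = 4$,
$(x_1, x_2, x_3, x_4) = (1, 1, 4, 10)$, and $c = 2$, so that $M = 16$
and $B = \{3, 4\}$.  Then
\begin{displaymath}
\frac{4 \ln 4 + 10 \ln 10}{4 + 10}
\approx \frac{5.545 + 23.026}{14}
\approx 2.04
\ge
\ln\left(\max\left(2, \frac{16}{4}\right)\right)
= \ln 4
\approx 1.39.
\end{displaymath}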
994: 
995: \newcommand{\aS}{a^{-1}|S|}
996: \newcommand{\lndrop}{\ln|S^t|-\ln|S^{t+1}|}
997: \newcommand{\constdrop}{\ln\frac{1}{1 - a^{-1}}}
998: \begin{lemma}
999: \label{lemma-log-drop}
1000: Fix $a > 1$, and 
1001: let $S = S^{t}$ be a positive range with $|S| \ge a$.
1002: Define $\beta$ as in Lemma~\ref{lemma-boundary-points}.
1003: Let $S' = [\min(S) + \ceil{\aS} - 1, \max(S)-1]$.
1004: Let $A$ be the event $\left[\lndrop < \ln a\right]$.
1005: Then
1006: \begin{equation}
1007: \label{eq-log-drop}
1008: \E \left[ \lndrop : S^t, A \right]
1009: \le
1010: \constdrop + \frac{\ln \E[1+Z : S^t]}{\Pr[A: S^t]},
1011: \end{equation}
1012: where $Z = 2|\Delta \cap S'|$ with one-sided routing
1013: and $Z=2|\Delta \cap S'| + |\beta \cap S'|$ with two-sided routing.
1014: \end{lemma}
1015: \begin{proof}
1016: Call a subrange $S_\Dis$ \buzz{large} if $|S_\Dis| > \aS$ and
1017: \buzz{small} otherwise; the intent is that the large ranges are
1018: precisely those that yield $\lndrop < \ln a$.
1019: Observe that for any large $S_\Dis$, $|S_\Dis| > \aS \ge 1$, 
1020: implying any large set has at least two elements.
1021: 
1022: For any large $S_\Dis$, 
1023: $\max(S_\Dis) 
1024:  \ge \min(S) + \ceil{\aS} - 1$.
1025: Similarly
1026: $\min(S_\Dis)
1027:  \le \max(S) - 1$.
1028: So any large $S_\Dis$ intersects $S'$ in at least one point.
1029: 
1030: Let $T = \{T_1, T_2, \ldots, T_k\}$ 
1031: be the set of subranges $S_\Dis$, large or small, that
1032: intersect $S'$.  It is immediate from this definition
1033: that $\bigcup T \supseteq S'$ and thus $\sum |T_j| \ge |S'|$.
1034: 
1035: Using Lemma~\ref{lemma-boundary-points}, we can characterize the
1036: elements of $T$ as follows.
1037: \begin{enumerate}
\item There is at most one set $T_j$ that contains $\min(S')$.
1039: \item There is at most one set $T_j$ that has $\min(T_j) = \Delta_i$ for each
1040: $\Delta_i$ in $S'$.
1041: \item There is at most one set $T_j$ that has $\min(T_j) = \Delta_i+1$ for
1042: each $\Delta_i$ in $S'$.
1043: \item With two-sided routing, 
1044: there is at most one set $T_j$ that has $\min(T_j) = \beta_i$ or
1045: $\min(T_j) = \beta_i+1$ for each $\beta_i$ in $S'$.  Note that there
1046: may be a set whose minimum element is $\beta_i+1$ where $\beta_i =
1047: \min(S') - 1$, but this set is already accounted for by the first
1048: case.
1049: \end{enumerate}
1050: 
1051: Thus $T$ has at most $1+Z = 1+2|\Delta \cap S'|$ elements with one-sided
1052: routing and at most $1+Z = 1+2|\Delta \cap S'| + |\beta \cap S'|$ elements
1053: with two-sided routing.
1054: 
1055: Conditioning on $|S^{t+1}| > \aS$, 
1056: $|S^{t+1}|$ is equal to $|S_\Dis|$ for some large $S_\Dis$ and thus
1057: for some large $T_j \in T$.
1058: Which large $T_j$ is chosen is proportional to its size, so
1059: for fixed $T$, we have
1060: \begin{eqnarray*}
\E[\ln |S^{t+1}| : T, A] &=&
1062: \frac{
1063:   \sum_{j=1}^{|T|} |T_j| \ln |T_j|
1064: }{
1065:   \sum_{j=1}^{|T|} |T_j|
1066: } \\
1067: &\ge& \ln\left(\max\left(\aS, \frac{|\bigcup T|}{|T|}\right)\right) \\
1068: &\ge& \ln\left(\frac{|S'|}{|T|}\right),
1069: \end{eqnarray*}
1070: where the first inequality follows from
1071: Lemma~\ref{lemma-conditional-convex}.
1072: 
1073: Now let us compute
1074: \begin{eqnarray*}
1075: \E[\lndrop : S^t, A ]
1076: &=& \ln|S^t| - \E[\ln|S^{t+1}| : S^t, A] \\
1077: &\le& \ln|S^t| - \E[\ln |S'| - \ln |T| : S^t, A] \\
1078: &=& \ln \frac{|S^t|}{|S'|} + \E[\ln |T| : S^t, A] \\
1079: &\le& \ln \frac{|S^t|}{|S'|} + \frac{\E[\ln |T| : S^t]}{\Pr[A: S^t]} \\
1080: &\le& \constdrop + \frac{\ln \E[|T| : S^t]}{\Pr[A: S^t]}.
1081: \end{eqnarray*}
1082: In the second-to-last step, we use
1083: $\E[\ln |T| : S^t, A] \le \E[\ln |T| : S^t] / \Pr[A: S^t]$,
1084: which follows from
1085: $\E[\ln |T| : S^t] 
1086: = 
1087:  \E[\ln |T| : S^t, A] \Pr[A: S^t]
1088: +\E[\ln |T| : S^t, \neg A] \Pr[\neg A: S^t]$.
In the last step, we use $\E[\ln |T| : S^t] \le \ln \E[|T| : S^t]$,
which follows from the concavity of $\ln$ and Jensen's inequality;
the bound (\ref{eq-log-drop}) then follows from $|T| \le 1+Z$.
1091: \end{proof}
1092: 
1093: \subsubsection{Putting the pieces together}
1094: \label{Section-putting-the-pieces-together}
1095: 
1096: We now have all the tools we need to prove our lower bound.
1097: 
1098: \newcommand{\ZZ}{\mathbf{Z}} % the integers
1099: \newcommand{\bh}{\hat{\beta}}
1100: \newloglike{\roundfromzero}{absceil}
1101: \newcommand{\rfz}[1]{\roundfromzero\left({#1}\right)}
1102: \begin{theorem}
1103: \label{theorem-lower-bound}
1104: Let $G$ be a random graph whose nodes are labeled by the integers.
1105: Let $\Delta_x$ for each $x$ be a set of integer offsets chosen
1106: independently from some common distribution, subject to the constraint
1107: that $-1$ and $+1$ are present in every $\Delta_x$,
1108: and let node $x$ have an outgoing link to $x-\delta$ for each
1109: $\delta\in\Delta_x$.  Let $\ell = \E[|\Delta|]$.
1110: Consider a greedy routing trajectory in $G$ starting at a point chosen
1111: uniformly from $1 \ldots n$ and ending at $0$.
1112: 
1113: With one-sided routing, the expected time to reach $0$ is
1114: \begin{equation}
1115: \label{eq-lower-bound-one-sided}
1116: \Omega\left(
1117:       \frac{\log^2 n}{\ell \log \log n}
1118: \right).
1119: \end{equation}
1120: 
1121: With two-sided routing, the expected time to reach $0$ is
1122: \begin{equation}
1123: \label{eq-lower-bound-two-sided}
1124: \Omega\left(
1125:       \frac{\log^2 n}{\ell^2 \log \log n}
1126: \right),
1127: \end{equation}
1128: provided $\Delta$ is generated by including each
1129: $\delta$ in $\Delta$ with probability $p_\delta$, where (a) $p$ is
1130: unimodal, (b) $p$ is symmetric about $0$, and (c) the choices to
1131: include particular $\delta, \delta'$ are pairwise independent.
1132: \end{theorem}
1133: \begin{proof}
1134: Let $S^0 = \{ 1 \ldots n \}$.
1135: 
1136: We are going to apply Theorem~\ref{theorem-mean-lower-bound} to the
1137: sequence $S^0, S^1, S^2, \ldots$ with $f(S) = \ln |S|$.
We have chosen $f$ so that $f(S)=0$ when we reach the target; thus
a lower bound on $\tau$ gives a lower bound on the expected time of
the routing algorithm.
1141: To apply the theorem,
1142: we need to
1143: show that (a) the probability that $\ln |S|$ drops by a large amount
1144: is small, and (b) that the integral in
1145: (\ref{eq-mean-lower-bound-T}) is large.
1146: 
1147: \begin{sloppypar}
1148: Let $a = 3 \ell \ln^3 n$.
1149: By Lemma~\ref{lemma-aggregate-max-drop},
1150: for all $t$,
1151: $\Prsta \le 3 \ell a^{-1} = \ln^{-3} n$,
1152: and thus
1153: $\Pr[\lndrop \ge \ln a : S^t] \le \ln^{-3} n$.
1154: This satisfies (\ref{eq-mean-lower-bound-U-epsilon})
1155: with $U = \ln a$ and $\epsilon = \ln^{-3} n$.
1156: \end{sloppypar}
1157: 
1158: For the second step,
1159: Theorem~\ref{theorem-mean-lower-bound} requires that we bound the
1160: speed of the change in $f(S)$ solely as a function of $f(S)$.  For
1161: one-sided routing this is not a problem, as
1162: Lemma~\ref{lemma-aggregate-ranges} shows that $f(S)$, which reveals
1163: $|S|$, characterizes $S$ exactly except when $|S| = 1$ and the lower
1164: bound argument is done.  For two-sided routing, the situation is more
1165: complicated; there may be some $S^t$ which is not of the form
1166: $\{1\ldots |S^t|\}$ or $\{0\}$, and we need a bound on the speed at
1167: which $\ln |S^t|$ drops that applies equally to all sets of the same
1168: size.
1169: 
1170: \begin{sloppypar}
1171: It is for this purpose (and only for this purpose)
1172: that we use our conditions on $\Delta$ for
1173: two-sided routing.
1174: Suppose that each $\delta$
1175: appears in $\Delta$ with probability $p_\delta$, that these
1176: probabilities are pairwise-independent, and that the sequence $p$ is
1177: symmetric and unimodal.
1178: Let $\bh = \left\{ \rfz{\frac{x+y}{2}} : x, y \in \Delta, x \ne y \right\}$,
1179: where $\rfz{z}$, the \buzz{absolute ceiling} of $z$,
1180: is $\ceil{z}$ when $z \ge 0$ and $\floor{z}$ when $z \le 0$.
1181: Observe that $\bh \supseteq \beta$; in effect, we are counting in
1182: $\bh$ all
midpoints of pairs of distinct elements of $\Delta$ without regard to 
1184: whether the elements are adjacent.
1185: For each $k$, the expected number of distinct 
pairs $x$, $y$ with $x+y=k$ and
1187: $x,y \in \Delta$ is at most
1188: $b_k = \sum_{i=-\infty}^{\infty} p_{k-i} p_i$;
1189: this is a convolution of the non-negative, symmetric, and unimodal 
1190: $p$ sequence with itself and so it is also symmetric and unimodal.  
1191: It follows that for all $0 \le k < k'$, $b_k \ge b_{k'}$, and similarly
1192: $b_{-k} \ge b_{-k'}$.
1193: \end{sloppypar}
1194: 
1195: Now for the punch line: for each $\delta \ne 0$, 
1196: $q_\delta = b_{2\delta - \sgn \delta} + b_{2\delta}$
1197: is an upper bound on the expected number of distinct pairs $x,y$ that
1198: put $\delta$ in $\beta$, which is in turn an upper bound on 
$\Pr[\delta \in \beta]$, and from the unimodality of $b$ we have
that $q_\delta \ge q_{\delta'}$ and $q_{-\delta} \ge q_{-\delta'}$
whenever $0 < \delta < \delta'$.  Though $q$ grossly overcounts
1202: the elements of $\beta$ (in particular, it gives a bound on $\E[|\beta|]$
1203: of $\ell^2$), its ordering property means that we can bound the
1204: expected number of elements of $\beta$ that appear in some subrange 
1205: of any positive $S^t$
1206: by using $q$ to bound the expected number of elements that
1207: appear in the corresponding subrange of $\{ 1 \ldots |S^t| \}$, and
1208: similarly for negative $S^t$ and $\{-1 \ldots - |S^t| \}$.
1209: Because $p_i$ already satisfies a similar ordering property, we
1210: can thus bound the number of elements of both $\Delta$ and $\beta$
1211: that hit a fixed subrange of $S^t$ given only $|S^t|$.  We do this next.
1212: 
1213: For convenience, formally define $p_i = \Pr[i \in \Delta]$ and 
1214: $q_i=0$ for one-sided routing.
1215: We will simplify some of the summations by first summing the $p_i$ and $q_i$
1216: over certain pre-defined intervals.
For each integer $i \ge 0$ let 
$A_i = \{ k \in \ZZ : a^i-1 \le k < a^{i+1}-1\}
 = \{k \in \ZZ : \floor{\log_a (k+1)} = i \}$.
Let $\gamma_i = \sum_{k \in A_i} (2p_k +
q_k)$.  Note that $\gamma_i \ge 2\E[|A_i \cap \Delta|]$
1222: for one-sided routing and 
1223: $\gamma_i \ge 2\E[|A_i \cap \Delta|] + \E[|A_i \cap \beta|]$ 
1224: for two-sided routing.  
1225: Observe also that
1226: $\sum_{i=0}^{\infty} \gamma_i$ is at most
1227: $2\ell$ for one-sided routing and at most $2\ell + \ell^2$ for two-sided
1228: routing.
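
To see how the intervals $A_i$ tile the non-negative integers, pretend
for illustration only that $a = 3$ (the proof itself uses the much
larger value $a = 3 \ell \ln^3 n$).  Then
\begin{displaymath}
A_0 = \{0, 1\},
\qquad
A_1 = \{2 \ldots 7\},
\qquad
A_2 = \{8 \ldots 25\},
\end{displaymath}
and indeed $\floor{\log_3 (7+1)} = 1$ while
$\floor{\log_3 (8+1)} = 2$.
Each non-negative integer $k$ lies in exactly one $A_i$, so summing the
$\gamma_i$ over all $i$ counts each term $2p_k + q_k$ exactly once.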
1229: 
1230: Consider some $S=S^t$.  
1231: Let $A$ be the event $\left[\lndrop < \ln a\right]$.
1232: If $|S| \ge a$,
1233: then by Lemma~\ref{lemma-log-drop} we have
1234: \begin{equation}
1235: \label{eq-log-drop-revisited}
1236: \E \left[ \lndrop : S^t, A \right]
1237: \le \constdrop + \frac{\ln \E\left[1+Z : S^t\right]}{\Pr[A: S^t]},
1238: \end{equation}
1239: where
1240: $Z = 2|\Delta \cap S'|$ with one-sided routing and
1241: $Z = 2|\Delta \cap S'| + |\beta \cap S'|$ with two-sided routing,
1242: with $S' = [\min(S) + \ceil{\aS} -1 , \max(S)-1]$ in each case, as in
1243: Lemma~\ref{lemma-log-drop}.
1244: 
1245: As we observed earlier, our choice of $a$ and
1246: Lemma~\ref{lemma-aggregate-max-drop} imply
1247: $\Pr[\lndrop \ge \ln a : S^t] \le \ln^{-3} n$, so 
1248: $\Pr[A: S^t] = 1-\Pr[\lndrop \ge \ln a: S^t] 
1249: \ge 1 - \ln^{-3} n \ge \frac{1}{2}$ for sufficiently
1250: large $n$.
1251: So we can replace (\ref{eq-log-drop-revisited}) with
1252: \begin{equation}
1253: \label{eq-log-drop-revisited-simple}
1254: \E \left[ \lndrop : S^t, A \right]
\le \constdrop + 2 \ln \E\left[1+Z : S^t\right].
1256: \end{equation}
1257: 
1258: Let us now obtain a bound on $\ln \E[1+Z]$ in terms of $|S|$ and
1259: the $p_i$ and $q_i$.
1260: For one-sided routing, we use the fact that $|S| > 1$ implies
1261: $S=\{1\ldots|S|\}$.  For two-sided routing, we use monotonicity of the
1262: $p_i$ and $q_i$ to replace $S$ with $\{1\ldots|S|\}$;
1263: in particular, to replace a sum of $2p_i+q_i$ over a subrange of $S$
1264: with a sum over subrange of $\{1\ldots|S|\}$ that is at least as
1265: large. 
1266: In either case, we get that
1267: \begin{equation}
1268: \label{eq-bound-one-plus-z}
1269: \ln \E[1+Z] \le \ln\left(1 + \sum_{i = \ceil{\aS}-1}^{|S|-1} 2p_i + q_i\right),
1270: \end{equation}
1271: and thus
1272: $\E \left[ \lndrop : S^t, A \right]$ is bounded by
1273: \begin{equation}
1274: \label{eq-lower-bound-mu}
1275: \mu_{\ln |S|} = 
1276: \constdrop + 
1277: 2\ln\left(1 + \sum_{i = \ceil{\aS}-1}^{|S|-1} 2p_i + q_i\right),
1278: \end{equation}
1279: provided $|S| \ge a$.
1280: For $|S| < a$, set $\mu_{\ln|S|} = \ln a$.
1281: 
1282: \newcommand{\gzs}{\gamma_{z'} + \gamma_{z'+1} + \gamma_{z'+2}}
1283: \begin{sloppypar}
1284: Let us now compute $m_z$, as defined in (\ref{eq-mean-lower-bound-m}).
1285: For $z < \ln a$, $m_z = \ln a$.
1286: For larger $z$, observe that
$m_z = \sup \left\{ \mu_{\ln |S|} : e^z \le |S| < a e^z \right\}$.
1288: Now if $e^z \le |S| < a e^z$, then the bounds on the sum in
1289: (\ref{eq-lower-bound-mu}) both lie between $\ceil{a^{-1} e^z}-1$ and
1290: $a e^z -1$, so that
1291: \begin{eqnarray*}
1292: \label{eq-lower-bound-m}
1293: m_z &\le& 
1294: \constdrop +
1295: 2\ln\left(1 + \sum_{i = \ceil{a^{-1}e^z}-1}^{\floor{ae^z-1}} 2p_i + q_i\right)
1296: \\
1297: &\le&
1298: \constdrop +
1299: 2\ln(1 + \gzs),
1300: \end{eqnarray*}
1301: where $z' = \floor{z/\ln a} - 1$.
1302: \end{sloppypar}
1303: 
1304: Finally, compute
1305: \begin{eqnarray*}
1306: T(\ln n)  &=&
1307: \int_{0}^{\ln n} \frac{1}{m_z} dz \\
1308: &\ge&
1309: \int_{\ln a}^{\ln n} \frac{1}{\constdrop+2\ln(1+\gzs)} dz \\
1310: &\ge&
1311:   \sum_{i = 0}^{\floor{\ln n / \ln a} - 1}
1312:     \frac{\ln a}{\constdrop+2\ln(1+\gamma_i + \gamma_{i+1} + \gamma_{i+2})}.
1313: \end{eqnarray*}
1314: 
1315: To get a lower bound on the sum, 
1316: note that
1317: \[\sum_{i = 0}^{\floor{\ln n / \ln a} - 1}
1318:   (\gamma_i + \gamma_{i+1} + \gamma_{i+2})
1319:  \le 3 \sum_{i=0}^{\floor{\ln n / \ln a} + 1} \gamma_i
1320:  \le 3 \sum_{i=0}^{\infty} \gamma_i,
1321:  \]
1322: which is at most $L = 6\ell$ for one-sided routing and at most
1323: $L = 6\ell+3\ell^2$ for two-sided routing.
1324: In either case, because $\frac{1}{c+2\ln(1+x)}$ is convex and decreasing,
1325: we have
1326: \begin{eqnarray}
1327: T(\ln n) &\ge&
1328:   \sum_{i = 0}^{\floor{\ln n / \ln a} - 1}
1329:     \frac{\ln a}{\constdrop + 2\ln(1+\gamma_i + \gamma_{i+1} + \gamma_{i+2})}
1330: \nonumber\\
1331: &\ge&
1332: \sum_{i = 0}^{\floor{\ln n / \ln a} - 1}
1333:     \frac{\ln a}{\constdrop+2\ln\left(1 + \frac{L}{\floor{\ln n / \ln a}}\right)}
1334: \nonumber\\
1335: &=&
1336: \frac{ \ln a
1337:     \floor{\ln n / \ln a}
1338: }{
1339:     \constdrop +
1340:     2\ln\left(1 + \frac{L}{\floor{\ln n / \ln a}}\right)}.
1341:     \label{eq-mean-lower-bound-ugly-T}
1342: \end{eqnarray}
1343: 
1344: We will now rewrite our bound on $T(\ln n)$ in a more convenient asymptotic
1345: form.  We will ignore the $1$ and concentrate on the large fraction.
1346: Recall that $a = 3 \ell \ln^3 n$,
1347: so $\ln a = \Theta(\ln \ell + \ln \ln n)$.
1348: Unless $\ell$ is polynomial in $n$, we have $\ln n / \ln a =
1349: \omega(1)$ and the numerator simplifies to $\Theta(\ln n)$.
1350: 
1351: Now let us look at the denominator.
1352: Consider first the term $\constdrop$.
1353: We can rewrite this term as $-\ln(1-a^{-1})$; since $a^{-1}$ goes to
1354: zero as $\ell$ and $n$ grow we have 
1355: $-\ln(1-a^{-1}) = \Theta(a^{-1}) = \Theta(\ell^{-1} \ln^{-3} n)$.
1356: It is unlikely that this term will contribute much.
1357: 
1358: \begin{sloppypar}
1359: Turning to the second term, let us use the fact that 
1360: $\ln(1+x) \le x$ for $x \ge 0$.
1361: Thus
1362: \begin{eqnarray*}
1363: 2\ln\left(1+\frac{L}{\floor{\ln n / \ln a}}\right)
1364: &\le& 2\,\frac{L}{\floor{\ln n/\ln a}}\\
&=& O\left(\frac{L(\log \ell + \log \log n)}{\log n}\right),
1366: \end{eqnarray*}
1367: and the bound in (\ref{eq-mean-lower-bound-ugly-T}) simplifies to
1368: $\Omega\left(\log^2 n / \left( L (\log \ell + \log \log n)\right)\right)$.
1369: We can further assume that $\ell = O(\log^2 n)$, since otherwise the
1370: bound degenerates to $\Omega(1)$, and
1371: rewrite it simply as $\Omega\left(\log^2 n / \left(L \log \log n\right)\right).$
1372: \end{sloppypar}
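
The simplification can be traced in a single chain.  Writing
$N = \floor{\ln n / \ln a}$ (a shorthand introduced only for this
remark) and assuming, as above, that $N = \Theta(\ln n / \ln a)$, we
have
\begin{eqnarray*}
T(\ln n)
&\ge& \frac{N \ln a}{\constdrop + 2\ln\left(1 + \frac{L}{N}\right)} \\
&\ge& \frac{N \ln a}{\constdrop + \frac{2L}{N}} \\
&=& \Omega\left(\frac{\ln n}{L \ln a / \ln n}\right) \\
&=& \Omega\left(\frac{\ln^2 n}{L \ln a}\right),
\end{eqnarray*}
where the third line uses $N \ln a = \Theta(\ln n)$ and the fact that
$\constdrop = \Theta(a^{-1})$ is of lower order than $2L/N$.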
1373: 
1374: For large $L$, the approximation
1375: $\ln(1+x) \le 1+\ln x$ for $x \ge 0.59$ is more useful.
1376: In this case (\ref{eq-mean-lower-bound-ugly-T}) simplifies to 
1377: $T(\ln n) = \Omega(\ln n / \ln \ell)$, which has a natural
1378: interpretation in terms of the tree of successor nodes of some single
1379: starting node and gives essentially the same bound as 
1380: Theorem~\ref{theorem-tree-lower-bound}. 
1381: 
1382: We are not quite done with Theorem~\ref{theorem-mean-lower-bound} yet,
1383: as we still need to plug our $T$ and $\epsilon$ into
1384: (\ref{eq-mean-lower-bound}) to get a lower bound on $\E[\tau]$.
1385: But here we can simply observe that 
1386: $\epsilon T = O(1/\log n)$, so the denominator in
1387: (\ref{eq-mean-lower-bound}) goes rapidly to $1$.
1388: Our stated bounds are thus finally obtained by substituting $O(\ell)$
1389: or $O(\ell^2)$ for $L$.
1390: \end{proof}
1391: 
1392: \subsubsection{Possible strengthening of the lower bound}
1393: 
1394: Examining the proof of Theorem~\ref{theorem-lower-bound}, 
1395: both the $\ell^2$ that appears in the bound
1396: (\ref{eq-lower-bound-two-sided}) for two-sided
1397: routing and the extra conditions imposed on the $\Delta$ distribution
1398: arise only as artifacts of our need to project each range $S$ onto
1399: $\{1\ldots|S|\}$ and thus reduce the problem to tracking a single
1400: parameter.  We believe that a more sophisticated argument that does
1401: not collapse ranges together would show a stronger result:
1402: \begin{conjecture}
1403: Let $G$, $\Delta$, and $\ell$
1404: be as in Theorem~\ref{theorem-lower-bound}.
1405: Consider a greedy routing trajectory starting at a point chosen
1406: uniformly from $1 \ldots n$ and ending at $0$.
1407: 
1408: Then the expected time to reach $0$ is
1409: \begin{displaymath}
1410: \Omega\left(
1411:       \frac{\log^2 n}{\ell \log \log n}
1412: \right),
1413: \end{displaymath}
1414: with either one-sided or two-sided routing, and no constraints on the
1415: $\Delta$ distribution.
1416: \end{conjecture}
1417: 
1418: We also believe that the bound continues to hold in higher dimensions
1419: than $1$.  Unfortunately, the fact that we can embed the line in, say,
1420: a two-dimensional grid is not enough to justify this belief;
1421: divergence to one side or the other of the line may change the
1422: distribution of boundaries between segments and break the proof of
1423: Theorem~\ref{theorem-lower-bound}.
1424: \subsection{Upper Bounds}
1425: \label{sec:UPPERBNDS}
1426: 
1427: In this section, we present upper bounds on the 
1428: delivery time of messages in a simple metric 
1429: space: a one-dimensional real line. To simplify
1430: theoretical analysis, the system 
1431: is set up as follows.
1432: \begin{itemize}
1433:   \item Nodes are embedded at grid points on the real
1434:         line.
1435:   \item Each node $u$ is connected to its nearest 
1436:         neighbor on either side and to one or more
1437:         long-distance neighbors.
1438:   \item The long-distance neighbors are chosen as
1439:         per the inverse power-law distribution with
1440:         exponent $1$, i.e.,
1441:         each long-distance neighbor $v$ is chosen 
1442:         with probability inversely proportional to 
1443:         the distance between $u$ and $v$. Formally,
        $\prob{\mbox{$v$ is the $i$th neighbor of $u$}} = 
        \frac{1/d(u,v)}{\sum_{v'\neq u} 1/d(u,v')}$,
1446:         where $d(u,v)$ is the distance between nodes
1447:         $u$ and $v$ in the metric space.
1448:   \item Routing is done greedily by forwarding the 
1449:         message to the neighbor closest to the target 
1450:         node.
1451: \end{itemize}
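
\noindent
As a small illustration of this distribution (the instance is ours, chosen
only for concreteness), suppose that $u$ has exactly three grid points on
each side. The normalizing sum is then
\begin{displaymath}
\sum_{v'\neq u}\frac{1}{d(u,v')}
= 2\left(1+\frac{1}{2}+\frac{1}{3}\right)
= \frac{11}{3},
\end{displaymath}
so a given long-distance link goes to a particular node at distance
$1$, $2$, or $3$ with probability $3/11$, $3/22$, or $1/11$,
respectively. Summing over both sides gives
$2(3/11+3/22+1/11)=1$, as required.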
1452: 
1453: We analyze the performance for the cases of a single 
1454: long-distance link and of multiple ones, both in a failure-free network
1455: and in a network with link and node failures. Note that when
1456: we say {\em node}, we actually refer to a vertex in the 
1457: virtual overlay network and not a {\em physical} node as 
1458: in the earlier sections. 
1459: 
1460: 
1461: \subsubsection{Single Long-Distance Link}
1462: \label{sec:INVERSE}
1463: 
1464: We first analyze the delivery time in an idealized model with no 
1465: failures and with one long-distance link per node.
1466: Kleinberg \cite{KL99} proved that with $n^d$ nodes embedded at grid 
1467: points in a $d$-dimensional grid, with each node $u$ connected to its 
1468: immediate neighbors and one long-distance neighbor $v$ chosen with 
1469: probability proportional to $1/d(u, v)^d$, any message can be 
1470: delivered in time polynomial in $\log n$ using greedy routing. 
While this result can be directly applied to our model with $d=1$
and $\ell=1$ to give an $O(\log^2 n)$ delivery time, we get a much simpler 
proof by using Lemma~\ref{lemma-probabilistic-recurrence-ub}. 
1474: We include the proof below for completeness.
1475: 
1476: \begin{theorem}
1477: \label{thm:UPPER-SINGLE}
1478: Let each node be connected to its immediate neighbors (at distance 1)
and one long-distance neighbor chosen with probability inversely 
1480: proportional to its distance from the node. Then the expected delivery 
1481: time with $n$ nodes in the network is $T(n)=O(H_n^2)$.
1482: \end{theorem}
1483: 
1484: \begin{proof}
Let $\mu_k$ be the expected number of nodes crossed in one step when the 
message is at a node that is at distance $k$ from the destination. 
1488: 
1489: \begin{figure}[htb]
1490: \centerline{\epsfig{figure=1D.eps, height=75pt}}
1491: \caption{All the possible distances that can be
1492: covered from source node $s$.}
1493: \end{figure}
1494: 
\noindent
Then
1497: $$\mu_k = \frac{\sum_{i=1}^k \frac{1}{i} \cdot i}{S}
1498:         + \frac{\sum_{i=1}^{k-1} \frac{1}{2k-i} \cdot i}{S}
1499:         + \frac{\sum_{i=1}^{n_1-k} \frac{1}{i} \cdot 1}{S}
1500:         + \frac{\sum_{i=2k}^{n_2+k} \frac{1}{i} \cdot 1}{S},$$
1501: where
1502: $$
S = \sum_{i=1}^{n_1-k} \frac{1}{i} + \sum_{i=1}^{n_2+k} \frac{1}{i}
  = H_{n_1-k}+H_{n_2+k} < 2H_n.
1505: $$
1506: Then
1507: $$
\mu_k > \frac{1}{S} \left[ k + 0 + H_{n_1-k} + H_{n_2+k} - H_{2k} \right]
      > \frac{k}{S} > \frac{k}{2H_n}.
1510: $$
1511: Clearly, $\mu_k$ is non-decreasing, and thus
1512: using Lemma~\ref{lemma-probabilistic-recurrence-ub}, we get
$$T(n) \leq \sum_{k=1}^n \frac{1}{\mu_k} 
\leq \sum_{k=1}^n \frac{2H_n}{k}= 2H_n^2 = O(H_n^2).$$
Thus with this distribution, the delivery time is 
polylogarithmic in the number of nodes. 
1517: \end{proof}
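
To get a feel for the constants (this back-of-the-envelope calculation is
an illustration only and is not needed for the proof), note that the final
sum in the proof can be evaluated exactly, and that $H_n \leq 1+\ln n$:
\begin{displaymath}
\sum_{k=1}^n \frac{2H_n}{k} = 2H_n^2 \leq 2(1+\ln n)^2 .
\end{displaymath}
For example, for $n=2^{17}$ we have $\ln n \approx 11.78$, so the bound is
roughly $2(12.78)^2 \approx 327$ steps, compared with up to
$n-1 = 131071$ steps when only nearest-neighbor links are available.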
1518: 
1519: 
1520: \subsubsection{Multiple Long-Distance Links}
1521: \label{sec:MULT-LINKS}
1522: 
1523: The next interesting question is whether we can improve the $O(\log^2 n)$
1524: delivery time by using multiple links instead of a single one. In 
1525: addition to improvement in performance, multiple links  also give the
benefit of robustness in the face of failures. We first look at the
improvement in performance from using multiple links in the system
and then go on to the analysis of failures in Section~\ref{sec:LINK-FAIL}.
Suppose that there are $\ell$ links from each node. 
We consider different strategies for generating links and routing
depending on the number of links $\ell$, in two ranges: $\ell \in [1,\lg n]$ 
1532: and $\ell \in (\lg n, n^c]$. 
1533: 
1534: In \cite{KL01}, Kleinberg uses a group structure to get a delivery time
1535: of $O(\log n)$ for the case of a polylogarithmic number of links.
1536: However, he uses a more complicated algorithm for routing while we 
1537: obtain the same bound (for the case of a line) using only greedy routing. 
1538: 
1539: \begin{figure}[h]
1540: \begin{center}
1541: \input{qlinks.pstex_t}
1542: \caption{Multiple long-distance links for each node.}
1543: \end{center}
1544: \end{figure}
1545: 
1546: \paragraph{Upper Bound}
1547: Let us first consider a randomized strategy for link distribution 
1548: when $\ell \in [1, \lg n]$.
1549: 
1550: \begin{theorem}
1551: \label{thm:UPPER-RANDOMIZED-MULTIPLE}
1552: Let each node be connected to its immediate neighbors (at distance 1)
and $\ell$ long-distance neighbors chosen independently with replacement
with probability inversely proportional to their distances from the node.
1555: Let $\ell \in [1, \lg n]$.  Then the expected delivery time 
1556: $T(n)=O(\log^2 n/\ell)$.
1557: \end{theorem}
1558: 
1559: \begin{proof}
1560: The basic idea for this proof comes from Kleinberg's model~\cite{KL99}.
1561: Kleinberg considers a two-dimensional grid with nodes at every grid point.
1562: The delivery of the message is divided into phases. A message is said to 
1563: be in phase $j$ if the distance from the current node to the destination 
1564: node is between $2^j$ and $2^{j+1}$. There are at most ($\lg n+1$) such 
1565: phases. He proves that the expected time spent in each phase is at most
1566: $O(\log n)$, thus giving a total upper bound of $O(\log^2 n)$ on the delivery 
1567: time. We use the same phase structure in our model, and this proof 
1568: is along similar lines.
1569: 
1570: In our multiple-link model, each node has $\ell$ long-distance neighbors
1571: chosen with replacement.  The probability that $u$ chooses a node $v$ as its 
1572: long-distance neighbor is
$1-(1-q)^\ell$, where $q=\frac{d(u, v)^{-1}}{\sum_{v'\ne u}d(u, v')^{-1}}$.
1574: We can get a lower bound on this probability as follows:
1575: \begin{eqnarray*}
1-(1-q)^\ell &\geq& 1 - \left(1 - q\ell + \frac{\ell(\ell-1)}{2}q^2\right)\\
1577: &=&q\ell - \frac{\ell(\ell-1)}{2}q^2 = q\ell\left[1-\frac{(\ell-1)q}{2}\right]\\
1578: &=&q\ell\left[1 -\frac{\ell q}{2} +\frac{q}{2}\right]\\
1579: &\geq&q\ell\left[1-\frac{\ell q}{2}\right].\\
1580: \end{eqnarray*}
1581: 
1582: Notice that $\ell q < 1$, because $q < \frac{1}{\lg n}$ and $\ell \leq \lg n$.
1583: So, the probability that $u$ chooses $v$ as its long-distance 
1584: neighbor is at least 
1585: 
1586: \begin{eqnarray*}
1587: q\ell\left[1-\frac{\ell q}{2}\right]
1588: &\geq&q\ell\left[1-\frac{1}{2}\right]=\frac{q\ell}{2}
1589: =\ell [2d(u,v)H_n]^{-1}.
1590: \end{eqnarray*}
1591: 
1592: Now suppose that the message is currently in phase $j$. 
1593: To end phase $j$ at this step, the message should enter a set of nodes $B_j$
1594: at a distance $\leq 2^j$ of the destination node $t$. There are at least $2^j$
1595: nodes in $B_j$, each within distance $2^{j+1} + 2^j < 2^{j+2}$ of $u$. So the
message enters $B_j$ with probability at least
$2^j\ell\frac{1}{2H_n2^{j+2}} = \frac{\ell}{8H_n}$.
1598: 
1599: Let $X_j$ be the total number of steps spent in phase $j$. Then
1600: $$
1601: E[X_j] = \sum_{i=1}^\infty Pr[X_j \geq i]
1602: \leq \sum_{i=1}^\infty\left( 1 - \frac{\ell}{8H_n} \right)^{i-1}
1603: = \frac{8H_n}{\ell}.
1604: $$
1605: 
1606: Now if $X$ denotes the total number of steps, then
1607: $X=\sum_{j=0}^{\lg n}X_j$, and by linearity of expectation, we get 
1608: $EX\leq(1+\lg n)(8H_n/\ell)=O(\log^2n/\ell)$.
1609: \end{proof}
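
Two endpoint cases illustrate the theorem (both are direct substitutions
into the bound $(1+\lg n)(8H_n/\ell)$ obtained in the proof, included here
only as an illustration). For $\ell=1$ the bound is
$8H_n(1+\lg n)=O(\log^2 n)$, matching Theorem~\ref{thm:UPPER-SINGLE} up to
constants. At the other end of the range, $\ell=\lg n$ gives
\begin{displaymath}
(1+\lg n)\,\frac{8H_n}{\lg n}
= 8H_n\left(1+\frac{1}{\lg n}\right)
= O(\log n).
\end{displaymath}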
1610: 
1611: 
1612: For $\ell \in (\lg n, n^c]$, we use a deterministic strategy. We represent 
1613: the location of each node as a number in a base $b\geq 2$, and 
1614: generate links to nodes at distances $1x, 2x, 3x, \ldots, (b-1)x$, for each 
1615: $x \in \{b^0, b^1, \ldots, b^{\lceil\log_b n\rceil -1} \}$.
1616: Routing is 
1617: done by eliminating the most significant digit of the distance at each step. 
1618: As this distance can be at most $b^{\lceil\log_b n\rceil}$, we get 
1619: $T(n)=O(\log_b n)$. This strategy is similar in spirit to Plaxton's
1620: algorithm \cite{PL97}.
1621: 
1622: Some special cases are instructive.
1623: Let $\ell=O(\log n)$ and let each node link to nodes in both directions at 
distances $2^i$, $0 \leq i \leq \lg n-1$, provided nodes are present at
1625: those distances. This gives $T(n)=O(\log n)$. Similarly let $\ell=O(\sqrt{n})$. 
1626: Links are established in both directions to existing nodes at distances $1, 2, 
1627: 3, \ldots, \sqrt{n}, 2\sqrt{n}, 3\sqrt{n}, \ldots, \sqrt{n}(\sqrt{n}-1)$, 
1628: giving $T(n)=O(1)$. In fact, $T(n)=O(1)$ when $b={n^c}$, for any fixed $c$.
1629: 
1630: \begin{theorem}
1631: \label{thm:UPPER-BOUND-DETERMINISTIC-MULTIPLE}
1632: Choose an integer $b>1$. With $\ell=(b-1)\lceil\log_b n\rceil$, let 
1633: each node link 
1634: to nodes at distances $1x, 2x, 3x, \ldots, (b-1)x$, for each $x \in \{b^0, b^1, 
1635: \ldots, b^{\lceil \log_b n\rceil -1} \}$. Then the delivery time $T(n) 
1636:  = O(\log_b n)$.
1637: \end{theorem}
1638: 
1639: \begin{proof}
Let $d_1, d_2, \ldots, d_m$ be the distances from the target of the successive
nodes in the delivery path, where $d_1$ is the distance of the source node 
and $d_m=0$.  For each $d_i > 0$, there exists $k_i \in \{0, 1, \ldots, 
\lfloor \log_b n\rfloor\}$ such that 
$$b^{k_i} \leq d_i < b^{k_i+1}.$$
Hence
$$1 \leq \floor{\frac{d_i}{b^{k_i}}} < b.$$
Now each node is connected to the node at distance $b^{k_i} \lfloor 
\frac{d_i}{b^{k_i}} \rfloor$. We get
$$
d_{i+1} = d_i - b^{k_i} \floor{\frac{d_i}{b^{k_i}}}
        = d_i \bmod b^{k_i}
        < b^{k_i}.
$$
1654: Thus $k_i$ drops by at least 1 at every step. As $k_1 \leq \lceil
1655: \log_b n\rceil$, we get 
1656: $T(n)=O(\log_b n)$.
1657: \end{proof}
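
As a worked instance of the digit-elimination argument (the values
$b=10$, $n=1000$, and starting distance $347$ are ours, chosen only for
illustration), the links have lengths $1x, 2x, \ldots, 9x$ for
$x \in \{1, 10, 100\}$, and the route proceeds as follows:
\begin{eqnarray*}
d_1 = 347, \quad k_1 = 2: && d_2 = 347 - 10^2\floor{347/10^2} = 347 - 300 = 47,\\
d_2 = 47, \quad k_2 = 1: && d_3 = 47 - 10^1\floor{47/10^1} = 47 - 40 = 7,\\
d_3 = 7, \quad k_3 = 0: && d_4 = 7 - 10^0\floor{7/10^0} = 7 - 7 = 0.
\end{eqnarray*}
The message arrives after three hops, one per decimal digit of the
initial distance, in agreement with $\lceil \log_{10} 1000 \rceil = 3$.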
1658: 
1659: 
1660: \subsubsection{Failure of Links}
1661: \label{sec:LINK-FAIL}
1662: 
At first glance, our
linking strategies might fail to give the same delivery time
when links fail. However, we show that we get reasonable
performance even with link failures.  In our model, we assume
that each link is present independently with probability $p$.
Let us first look at
the randomized strategy for $\ell \in [1, \lg n]$ links.
1670: 
1671: \begin{figure}[htb]
1672: \centerline{\epsfig{figure=absent_link.eps, height=75pt}}
1673: \caption{Each long-distance link is present with probability $p$.}
1674: \end{figure}
1675: 
Our proof follows the same
lines as our proof for the case of no failures. 
Intuitively, since some of the links fail, we
expect to spend more time in each phase, and this time 
should be inversely proportional to the probability with
which the links are present. We prove that the expected time
spent in one phase is $O(\log n/(p\ell))$, which gives a total
delivery time of $O(\log^2 n/(p \ell))$. We assume that the links
to the immediate neighbors are always present so that a message
is always delivered, even if delivery takes a very long time.
1686: 
1687: \begin{theorem}
1688: Let the model be as in Theorem~\ref{thm:UPPER-RANDOMIZED-MULTIPLE}.
1689: Assume that the links to the immediate neighbors are always present.
1690: If the probability of a long-distance link being present is $p$,
then the expected delivery time is $O(\log^2 n/(p\ell))$.
1692: \end{theorem}
1693: 
1694: \begin{proof}
Recall that in the case of no link failures, the probability that $u$ 
chooses a node $v$ as its long-distance neighbor is at 
least $q\ell/2$, 
where $q=\frac{d(u, v)^{-1}}{\sum_{v'\ne u}d(u, v')^{-1}}$.

Now when we consider link failures, given that $u$ chose
$v$ as its long-distance neighbor, the probability that
the link between $u$ and $v$ is present is $p$.
So, the probability that $u$ has a working long-distance link to $v$
is at least $pq\ell/2 = p\ell[2d(u,v)H_n]^{-1}$.

The rest of the proof is the same as the proof of 
Theorem~\ref{thm:UPPER-RANDOMIZED-MULTIPLE}. Let $X_j$ be the
number of steps spent in phase $j$. Then
$$E[X_j]=\sum_{i=1}^\infty Pr[X_j \geq i]
\leq \sum_{i=1}^\infty\left( 1 - \frac{p\ell}{8H_n} \right)^{i-1}
= \frac{8H_n}{p\ell}.$$

If $X$ denotes the total number of steps, then by linearity of
expectation, we get 
$EX\leq(1+\lg n)(8H_n/(p\ell))=O(\log^2n/(p\ell))$.
1715: \end{proof}
1716: 
1717: 
1718: 
We turn to the deterministic strategy with $\ell \in (\lg n, n^c]$
links, where a similar intuition applies. If a 
link fails, then the node has to take a shorter long-distance link,
which will not take the message as close to the target as the 
failed link would have. As $p$ decreases, the message has to take
shorter and shorter links, which increases the delivery time.
1725: 
1726: To make the analysis simpler, we
1727: change the link model a bit and let each node be
1728: connected to other nodes at distances $b^0, b^1, b^2, \ldots, 
1729: b^{\lfloor \log_b n \rfloor}$.
1730: Once again, we compute the expected distance covered from the
1731: current node and use Lemma~\ref{lemma-probabilistic-recurrence-ub}
1732: to get a delivery time of $O(b \log n/p)$.  As $p$ decreases,
1733: the delivery time increases; whereas as $b$ decreases,
1734: the delivery time decreases, but the
1735: information stored at each node increases.
1736: 
1737: \begin{theorem}
Let the number of links be $O(\log_b n)$, and let each node have links
to nodes at distances $b^0, b^1, b^2, \ldots, b^{\lfloor \log_b n \rfloor}$.
1740: Assume that the links to
1741: the nearest neighbors are always present. If the probability of 
1742: a link being present is $p$, then the delivery time 
1743: $T(n)= O(bH_n/p)$.
1744: \end{theorem}
1745: 
1746: \begin{proof}
Let the distance of the current node
from the destination be $k$, and let $\mu_k$ be the expected distance covered 
in one step from this node. With probability $p$, there is a 
link covering distance $\flrbk{}$. If this link is absent (which happens with 
probability $q=1-p$), then we can cover a distance $\flrbk{-1}$ 
with a single link with probability $pq$, and so on. In general,
the expected distance $\mu_k$ covered when the message is at distance $k$ 
from the destination is
1755: \begin{eqnarray*}
1756: \mu_k&=&p\flrbk{} + pq\flrbk{-1} + \ldots 
1757:            + pq^{\lfloor \log_b k \rfloor-1}b^1
1758:            + q^{\lfloor \log_b k \rfloor}b^0 \\
1759: &\geq& \sum_{i=0}^{\lfloor \log_b k \rfloor}
1760:        p\flrbk{-i}q^i\\
1761: &=&p\flrbk{} \sum_{i=0}^{\lfloor \log_b k \rfloor} \left(\frac{q}{b}\right)^i\\
1762: &=&p\flrbk{} \frac{1-\left(q/b\right)^{\lfloor \log_b k \rfloor+1}}{1-(q/b)}\\
1763: &=&\frac{p(\flrbk{+1}-q^{\lfloor \log_b k \rfloor+1})}{b-q}\\
&\geq&\frac{p(k-1)}{b-q}\\
1765: &\geq&\frac{p(k-1)}{2(b-q)}.\\
1766: \end{eqnarray*}
1767: Using Lemma~\ref{lemma-probabilistic-recurrence-ub}, we get
1768: $$
T(n) \leq \sum_{k=1}^n\frac{1}{\mu_k}
\leq 1+\sum_{k=2}^n\frac{2(b-q)}{p(k-1)}
=1+\frac{2(b-q)}{p}\left[\sum_{k=2}^n\frac{1}{k-1}\right]
=O(bH_n/p).
1773: $$
1774: \end{proof}
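
As a numerical check of the chain of inequalities in the proof (the values
$b=2$, $p=q=1/2$, and $k=8$ are ours, chosen only for illustration), we have
$\lfloor \log_b k \rfloor = 3$ and
\begin{eqnarray*}
\mu_8 &=& \frac{1}{2}\cdot 8 + \frac{1}{4}\cdot 4 + \frac{1}{8}\cdot 2
          + \frac{1}{8}\cdot 1 = \frac{43}{8} = 5.375,\\
\sum_{i=0}^{3} p\,b^{3-i}q^i &=& 4 + 1 + \frac{1}{4} + \frac{1}{16}
          = \frac{85}{16} = \frac{p(b^4-q^4)}{b-q},\\
\frac{p(k-1)}{2(b-q)} &=& \frac{7}{6} \approx 1.17.
\end{eqnarray*}
The three quantities are in decreasing order, as the proof requires. The
final bound is loose by a factor of about $4.6$ in this instance, which
affects only the constant hidden in $O(bH_n/p)$.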
1775: 
1776: 
1777: \subsubsection{Failure of Nodes}
1778: \label{sec:NODE-FAIL}
1779: 
1780: We consider two different cases of node failures when  we study 
1781: their effect on system performance. In the first case, as described in 
1782: Section~\ref{sec:BIN-NODE-FAIL}, some of the nodes may fail 
1783: and then the remaining nodes will link to each other as 
1784: per the link distribution. In the second case, as explained
1785: in Section~\ref{sec:GEN-NODE-FAIL}, the nodes first link to 
1786: their neighbors and then some of the nodes may fail. 
1787: 
1788: \paragraph{Binomially Distributed Nodes}
1789: \label{sec:BIN-NODE-FAIL}
1790: 
1791: Let $p$ be the 
1792: probability that a node is present at any point. Here also, each node is
1793: connected to its nearest neighbors and one long-distance neighbor. In
1794: addition, the probability of choosing a particular node as a long-distance 
1795: neighbor is conditioned on the existence of that node. 
1796: 
1797: \begin{theorem}
1798: \label{thm:UPPER-BINOMIAL}
1799: Let the model be as in Theorem~\ref{thm:UPPER-SINGLE}.
1800: Let each node be present with probability $p$ and all nodes
1801: link only to existing nodes. Then the worst-case expected delivery time 
1802: is $O(\log^2 n)$.
1803: \end{theorem}
1804: 
1805: \begin{proof}
1806: We bound the expected drop $\mu_k$ as follows:
1807: 
1808: \begin{eqnarray*}
1809: \mu_k &=& \frac{\sum_{i=1}^k \frac{1}{i} \cdot i \cdot p}{p \cdot S}
1810:        +  \frac{\sum_{i=1}^{k-1} \frac{1}{2k-i} \cdot i \cdot p}{p \cdot S}
1811:        +  \frac{\sum_{i=1}^{n_1-k} \frac{1}{i} \cdot 1 \cdot p}{p \cdot S}
1812:        +  \frac{\sum_{i=2k}^{n_2+k} \frac{1}{i} \cdot 1 \cdot p}{p \cdot S}\\
1813:       &>& \frac{1}{S} [ k + 0 + H_{n_1-k} + H_{n_2+k} - H_{2k}]\\
1814:       &>& \frac{k}{S} > \frac{k}{2H_n}.\\
1815: \end{eqnarray*}
1816: 
1817: Using Lemma~\ref{lemma-probabilistic-recurrence-ub}, 
1818: we get $T(n)\leq \sum_{k=1}^n 1/\mu_k
1819: =O(H_n^2)$. This is exactly the same result that we get in
1820: Section~\ref{sec:INVERSE} where all the nodes are present. 
1821: \end{proof}
1822: 
1823: This result is not
1824: surprising because if nodes link only to other existing nodes, the
1825: only difference is that we get a smaller random graph. This does
1826: not affect the routing algorithm or the delivery time. 
1827: 
1828: 
1829: \paragraph{General Failures}
1830: \label{sec:GEN-NODE-FAIL}
1831: 
1832: We observe that the analysis for node failures is not as simple as that
1833: for link failures because we no longer
1834: have the important property of independence that we have 
1835: in the latter case. In the case of link failures, 
1836: the nodes first choose their neighbors and then it is possible that
1837: some of these links fail; thus, the event that a node is connected
1838: to another node is completely independent of the event that, say, its
1839: neighbor is connected to the same node. Each link fails independently, and
1840: so the accessibility of a target node from any other node depends only
1841: on the presence of the link between the two nodes in question.
1842: 
1843: In case of node failures, this important independence
1844: property is no longer true. Suppose that a
1845: node $u$ cannot communicate with some other node $v$ (because $v$
1846: failed), even though there may be a functional link between $u$
1847: and $v$. Now the probability of some other node $w$ being able
1848: to communicate with $v$ is not independent of the probability 
1849: that $u$ can communicate with $v$ because the probability of 
1850: $v$ being absent is common for both the cases. This complicates 
1851: the analysis of the performance because it is no longer the case 
1852: that if one node cannot communicate with some other node, it has a 
1853: good chance of doing so by passing the message to its neighbor. 
1854: 
1855: In order to analyze this situation, we consider jumps only to one 
1856: phase lower rather than jumping over several phases.  The idea is 
1857: that the jumps between phases are independent, so once we move from 
1858: phase $j$ to phase $j-1$, further routing no longer depends on
1859: any nodes in phase $j$. We can condition on the number of nodes being 
1860: alive in the lower phase and estimate the time spent in each phase. 
1861: Intuitively, if a node is present with probability $p$, we would expect 
1862: to wait for a time inversely proportional to $p$ in anticipation of 
1863: finding a node in the lower phase to jump to.
1864: 
1865: \begin{theorem}
1866: Let the model be as in Theorem~\ref{thm:UPPER-RANDOMIZED-MULTIPLE}
1867: and let each node fail with probability $p$.
Then the expected delivery time is $O(\log^2n/((1-p)\ell))$.
1869: \end{theorem}
1870: 
1871: \begin{proof}
1872: Let $T$ be the time taken to drop down from layer
1873: $j$ to layer $j-1$. Let $l$ out of $N$ nodes be alive 
1874: in layer $j-1$ and let $q$ be the probability that
1875: a node in layer $j$ is connected to some node in 
1876: layer $j-1$. Then the expected time to drop to
1877: layer $j-1$, given that there are $l$ live nodes
1878: in it, is given by
1879: \begin{eqnarray*}
1880: E[T|l] &=& 1 + \left[ (1-q) + \frac{q(N-l)}{N} \right] E[T|l]\\
1881: &=& \frac{N}{ql}.
1882: \end{eqnarray*}
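
\noindent
The second line follows by collecting the $E[T|l]$ terms (we spell out
the algebra for completeness): since
\begin{displaymath}
1 - (1-q) - \frac{q(N-l)}{N} = q\left(1 - \frac{N-l}{N}\right) = \frac{ql}{N},
\end{displaymath}
the recurrence reduces to $\frac{ql}{N}\,E[T|l] = 1$.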
1883: 
1884: Now $l$ can vary between $1$ and $N$. (Note that $l$
1885: cannot be $0$ because if there are no live nodes in the
1886: lower layer, the routing fails at this point.) 
1887: We get
1888: \begin{eqnarray*}
1889: E[T] &=& 
1890: \sum_{l=1}^{N}\frac{N}{ql}\left[ p^{N-l}(1-p)^l{{N}\choose{l}} \right]\\
1891: &=&\frac{N}{q}\sum_{l=1}^{N}\frac{1}{l}p^{N-l}(1-p)^{l}{N\choose l}\\
1892: &\leq&\frac{N}{q}\sum_{l=1}^{N}\frac{2}{l+1}p^{N-l}(1-p)^{l}{N\choose l}\\
1893: &=&\frac{2N}{q(N+1)(1-p)}\sum_{l=1}^{N}p^{N-l}(1-p)^{l+1}{N+1\choose l+1}\\
1894: &\leq&\frac{2N}{q(N+1)(1-p)}\left[p+(1-p)\right]^{N+1}\\
1895: &=&\frac{2N}{q(N+1)(1-p)}.
1896: \end{eqnarray*}
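
\noindent
For the reader's convenience, we spell out the steps used above. The
first inequality uses $\frac{1}{l} \leq \frac{2}{l+1}$, which holds for
all $l \geq 1$. The following equality uses
$(1-p)^l = (1-p)^{l+1}/(1-p)$ together with the identity
\begin{displaymath}
\frac{1}{l+1}{N \choose l}
= \frac{N!}{(l+1)!\,(N-l)!}
= \frac{1}{N+1}{N+1 \choose l+1}.
\end{displaymath}
Finally, writing $m=l+1$, each term
$p^{(N+1)-m}(1-p)^{m}{N+1 \choose m}$ with $2 \leq m \leq N+1$ is one of
the nonnegative terms in the binomial expansion of $[p+(1-p)]^{N+1}=1$,
which gives the last inequality.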
1897: 
1898: Not surprisingly, the expected waiting time in a layer
1899: is inversely proportional to the probability of being
1900: connected to a node in the lower layer and to the probability
1901: of such a node being alive.
1902: 
For our randomized routing strategy with $\ell \in [1, \lg n]$ links,
$q = \Omega(\ell/H_n)$. Since there are at most $(\lg n +1)$ layers, 
we get an expected delivery time of $O(\log^2 n/((1-p)\ell))$.
1906: \end{proof}
1907: 
1908: In contrast, for our deterministic routing strategy, certain 
1909: carefully chosen node failures can lead to dismal situations where a
1910: message can get stuck in a local neighborhood with no hope of getting 
1911: out of it or eventually reaching the destination node. We conjecture 
1912: that this should be a very low probability event, so its occurrence 
1913: will not affect the delivery time considerably. We have not yet analyzed
1914: this situation formally.
1915: 
1916: \section{Construction of Graphs}
1917: \label{sec:RANDOMGRAPHS}
1918: 
1919: As the group of nodes present in the network changes, so does the
1920: graph of the virtual overlay network. In order for our routing  
1921: techniques to be effective, the graph must always exhibit the 
1922: property that the likelihood of any two vertices $v,u$ being connected
1923: is $\Omega(1/d(v,u))$. We describe a heuristic approach to 
1924: construct and maintain a random graph with such an invariant.
1925: 
1926: Since the choice of links leaving each vertex is independent of 
1927: the choices of other vertices, we can assume that points
1928: in the metric space are added one at a time. Let $v$ be the $k$-th
1929: point to be added. Point $v$ chooses the sinks of its outgoing links
1930: according to the inverse power law distribution with exponent $1$ 
1931: and connects to them by
1932: running the search algorithm. If a desired sink $u$ is not present, $v$
1933: connects to $u$'s closest live neighbor. In effect, each of the 
1934: $k-1$ points already present before $v$ is surrounded by a basin of 
1935: attraction, collecting probability mass in proportion to its length. 
1936: Since we assume the hash function populates the metric space evenly, 
1937: and because of absolute symmetry, the basin length $L$ has the same 
1938: distribution for all points. It is easy to see that with high probability, 
1939: $L$ will not be much smaller than its expectation: $\prob{L \leq c 
1940: \cdot k^{-1}}=1-(1-c\cdot k^{-1})^{k-1}$. A lower bound on the 
1941: probability that the link $(v,u)$ is present is $c' \cdot k^{-1} 
1942: \cdot d(v,f)^{-1}$, where $f$ is the point in $u$'s basin that is the 
1943: farthest from $v$.\footnote{The constant $c'$ has absorbed $c$ and the 
1944: normalizing constant for the distribution.} However, the bound holds 
1945: only if $u$ is among the $k-1$ points added before $v$. Otherwise, 
1946: the aforementioned probability is $0$, which means that we need to amend 
1947: our linking strategy to transfer probability mass from the case of 
1948: $u$ having arrived before $v$ to the case of $u$ having arrived after $v$.
1949: We describe next how to accomplish this task.
1950: 
1951: Let $v$ be a new point.  We give earlier points the opportunity
1952: to obtain outgoing links to $v$ by having $v$ (1) calculate the
1953: number of incoming links it ``should'' have from points added before it 
1954: arrived, and (2) choose such points according to the inverse 
1955: power-law distribution with exponent 1.\footnote{All this can be easily 
1956: calculated by $v$ since the link probabilities are symmetric.} If $\ell$ 
1957: is the number of outgoing links for each point, then $\ell$ will also be 
1958: the expected number of incoming links that $v$ has to estimate in step 
1959: (1).  
1960: We approximate the number of links 
1961: ending at $v$ by using a Poisson distribution with 
rate $\ell$, that is, the probability that $v$ has $k$ incoming links is 
$\frac{e^{-\ell}\ell^k}{k!}$, and the expectation of the distribution is $\ell$. 
1964: 
1965: After step (2) is completed by $v$, each chosen point $u$ responds to 
1966: $v$'s request by choosing one of its existing links to be replaced by 
1967: a link to $v$. The choice of the link to replace can vary. We use a 
1968: strategy that 
1969: builds on the work of Sarshar~\etal~\cite{SarsharR02}. In that work, the 
1970: authors use ideas of Zhang~\etal~\cite{ZhangGG02} to build a graph where each 
1971: node has a single long-distance link to a node at distance $d$ with probability 
1972: $1/d$. When a node with a long-distance link at distance $d_1$ encounters a 
1973: new node at distance $d_2$, either due to its arrival or due to a data request, 
1974: it replaces its existing link with probability $p_2/(p_1+p_2)$, where
1975: $p_i=1/d_i$, and links to the new node. We extend this idea to our case of 
1976: multiple long-distance links. Consider a node $u$ with $k$ neighbors at distances 
1977: $d_1, d_2, \ldots, d_k$. When a new node $v$ at distance $d_{k+1}$ 
1978: requests an incoming link from $u$, $u$ replaces one of its existing links
1979: with a link to $v$ with probability $p_{k+1}/\sum_{j=1}^{k+1}p_j$. This is 
1980: a trivial extension of the formula $p_2/(p_1+p_2)$ of \cite{SarsharR02}.
1981: However, this probability must now be distributed among $u$'s $k$ existing long-distance
1982: links since $u$ needs to choose one of them to redirect to $v$. We choose to
1983: do that according to the inverse power-law distribution with exponent 1, that is,
1984: $u$ chooses to replace its link to the node at distance $d_i$, $1\leq i \leq k$, 
1985: with probability $(p_i/\sum_{j=1}^{k} p_j)$. Hence, the probability that $u$ 
1986: decides to link to $v$ and decides to replace its existing link to the node at 
1987: distance $d_i$ with a link to $v$ is equal to $(p_i/\sum_{j=1}^{k} p_j) \cdot
1988: (p_{k+1}/\sum_{j=1}^{k+1}p_j)$. Notice that $u$ may decide not to redirect 
1989: any of its existing links to $v$ with probability $1-p_{k+1}/\sum_{j=1}^{k+1}p_j$.
The intuition for using such a replacement strategy comes from the invariant that we
1991: want to maintain dynamically as new nodes arrive: $u$ has a link to a node 
1992: $i$ at distance $d_i$ with probability inversely proportional to $d_i$; hence, 
1993: conditioning on $u$ having $k$ long-distance links, the following equation must hold.
1994: \begin{eqnarray*}
1995: \prob{\mbox{$u$ replaces link to $i$ with link to $v$}} & = &
1996: \prob{\mbox{$u$ has a link to $i$ before $v$ arrives}} \\
1997: & - & \prob{\mbox{$u$ has a link to $i$ after $v$ arrives}} \\
1998: & = & \frac{p_i}{\sum_{j=1}^k p_j} - \frac{p_{i}}{\sum_{j=1}^{k+1} p_j} \\
1999: & = & \frac{p_i}{\sum_{j=1}^k p_j} \cdot \frac{p_{k+1}}{\sum_{j=1}^{k+1}p_j}. \\
2000: \end{eqnarray*}
2001: The same heuristic can be used for regeneration of links when a node crashes.
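
As a concrete check of the replacement probabilities (the distances are
ours, chosen only for illustration), let $u$ have $k=2$ long-distance links
at distances $d_1=1$ and $d_2=2$, and let a new node $v$ arrive at distance
$d_3=4$, so that $p_1=1$, $p_2=1/2$, and $p_3=1/4$. Then $u$ redirects
some link to $v$ with probability $p_3/(p_1+p_2+p_3) = 1/7$, and
\begin{eqnarray*}
\prob{\mbox{$u$ replaces link to $1$ with link to $v$}}
& = & \frac{1}{3/2}\cdot\frac{1}{7} = \frac{2}{21}
  = \frac{2}{3}-\frac{4}{7}, \\
\prob{\mbox{$u$ replaces link to $2$ with link to $v$}}
& = & \frac{1/2}{3/2}\cdot\frac{1}{7} = \frac{1}{21}
  = \frac{1}{3}-\frac{2}{7}.
\end{eqnarray*}
In each line the last expression is
$p_i/\sum_{j=1}^{2}p_j - p_i/\sum_{j=1}^{3}p_j$, and the two replacement
probabilities sum to $1/7$, as they should.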
2002: 
2003: To analyze the performance of the heuristic in practice, we used it to construct a 
2004: network of $2^{14}$ nodes with $14$ links each, ten separate times. After 
2005: averaging the results over the ten networks, we plotted the distribution of 
2006: long-distance links derived from the heuristic, along with the ideal inverse 
2007: power-law distribution with exponent 1, as shown in 
2008: Figure~\ref{fig:DISTRIBUTION}. We see that the derived distribution tracks the 
2009: ideal one very closely, with the largest absolute error being roughly equal to 
2010: $0.022$ for links of length $2$, as shown in the graph of 
2011: Figure~\ref{fig:ERROR}. 
2012: 
2013: We also performed experiments for an alternative link replacement strategy:
2014: a node chooses its {\em oldest} link to replace with a link to the new node. 
2015: The performance of this strategy is almost as good as the performance 
2016: of our replacement strategy described previously. We omit 
2017: those results because it is difficult to distinguish between the results 
2018: of the two strategies on the scale used for our graphs.
2019: 
2020: There has also been other related work~\cite{PRU01} on how to construct,
2021: with the support of a central server, random graphs with many desirable 
2022: properties, such as small diameter and guaranteed connectivity with 
2023: high probability. Although it is not clear what kind of fault-tolerance 
2024: properties this approach offers if the central server crashes, or how 
2025: the constructed graph can be used for efficient routing, it is likely 
2026: that similar techniques could be useful in our setting.
2027: 
2028: \begin{figure}
2029: \centering
2030:   \mbox{\subfigure[The derived distribution.
2031:     \label{fig:DISTRIBUTION}]
2032:        {\epsfig{figure=dist.ps, width=0.45\textwidth}}
2033:        \subfigure[Absolute error.
2034:         \label{fig:ERROR}]
2035:        {\epsfig{figure=error.ps, width=0.45\textwidth}}}
2036: \caption{(a) The distribution of long-distance links produced by the 
2037: inverse-distance
2038: heuristic (DERIVED) compared to the ideal inverse power-law distribution with 
2039: exponent $1$ (IDEAL). (b) The absolute error between the derived 
2040: distribution and the ideal inverse power-law distribution with exponent 
2041: $1$.}
2042: \end{figure}
2043: 
2044: \section{Experimental Results}
2045: \label{sec:EXPERIMENTS}
2046: 
2047: We simulated a network of $n=2^{17}$ nodes at the application level. Each 
2048: node is connected to its immediate neighbors and has $\lg n=17$ long-distance 
2049: links chosen according to the inverse power-law distribution with exponent $1$ as 
2050: explained in Section~\ref{sec:UPPERBNDS}. Routing is done greedily by forwarding 
2051: a message to the neighbor closest to its target node. In each simulation, the 
2052: network is set up afresh, and a fraction $p$ of the nodes fail. 
2053: We then repeatedly choose random source and destination nodes that have not 
2054: failed and route a message between them. For each value of 
2055: $p$, we ran $1000$ simulations, delivering $100$ messages
2056: in each simulation, and averaged the number of hops 
2057: for successful searches and the number of failed searches.
2058: 
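As a rough reading of this setup (a sketch, not taken from the simulator itself), each value of $p$ accounts for $1000 \times 100 = 10^5$ routed messages. If one assumes, purely for illustration, that link lengths range over $d = 1, \ldots, n-1$ (the exact range and normalization used in Section~\ref{sec:UPPERBNDS} may differ), the inverse power-law distribution with exponent $1$ gives
\[
\prob{\mbox{link has length } d} = \frac{1/d}{H_{n-1}},
\qquad
H_{n-1} = \sum_{i=1}^{n-1} \frac{1}{i} \approx \ln n + 0.577,
\]
so for $n = 2^{17}$ the normalizing constant is about $17 \ln 2 + 0.577 \approx 12.4$.
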
2059: With node failures, a node may not be able to find a live
2060: neighbor that is closer to the target node than itself. We studied
2061: the following three strategies for overcoming this problem.
2062: 
2063: \begin{enumerate}
2064:  \item Terminate the search.
2065:  \item Randomly choose another node, deliver the message to 
2066:        this new node and then try to deliver the message from this
2067:        node to the original destination node (similar to 
2068:        the hypercube routing strategy explained in~\cite{LV82}).
2069:  \item Keep a list of a fixed number (in our simulations, $5$) 
2070:        of the nodes most recently visited by the message, and backtrack. 
2071:        When the search reaches a node from which it cannot proceed, it 
2072:        backtracks to the most recently visited node on this list and 
2073:        chooses the next best neighbor to route the message to. 
2074: \end{enumerate}
2075: 
2076: In all three strategies, once a node has chosen its best neighbor, 
2077: it does not try any other link if it finds that this 
2078: neighbor has failed.
2079: 
2080: \begin{figure}
2081: \centering
2082:   \mbox{\subfigure[Fraction of failed searches.]
2083:        {\epsfig{figure=f.ps, width=0.45\textwidth}}\quad
2084:        \subfigure[Average delivery time for successful searches.]
2085:        {\epsfig{figure=o.ps, width=0.45\textwidth}}}
2086: \caption{(a) The fraction of messages that fail to be delivered
2087: as a function of the fraction of failed nodes. (b) The average delivery
2088: time for successful searches as a function of the fraction of
2089: failed nodes.}
2090: \label{fig:RESULTS}
2091: \end{figure}
2092: 
2093: Figure~\ref{fig:RESULTS} shows the fraction of messages that 
2094: fail to be delivered and the number of hops for successful 
2095: searches versus the fraction of failed nodes. We see that the 
2096: system behaves well even with a large number of failed nodes.
2097: In addition, backtracking 
2098: gives a significant improvement in reducing the number of failures as 
2099: compared to the other two methods, although it may take a longer time 
2100: for delivery. With random rerouting,
2101: the average delivery time increases only slightly as the probability 
2102: of node failure increases.
2103: This happens because many of the searches fail, and those that
2104: succeed tend to need only a few hops, which keeps the average delivery time small.
2105: 
2106: Our results may not be directly comparable to those of CAN~\cite{SR01}
2107: and Chord~\cite{CH01}, since they use different simulators for 
2108: their experiments. However, to the extent that the results are comparable, 
2109: our methods appear to perform as well as theirs.
2110: Even if we just terminate the search, less than a fraction $p$
2111: of the searches fail when a fraction $p$ of the nodes have failed. Chord~\cite{CH01} 
2112: has roughly the same performance {\em after} its network stabilizes
2113: using its repair mechanism. Further, with backtracking, even with
2114: $80\%$ failed nodes, less than $30\%$ of the searches fail.
2115: These results are very promising, and it would be interesting to 
2116: study backtracking analytically.
2117: 
2118: We also compared the performance of the ideal network with that of the
2119: network constructed using the heuristics given in Section~\ref{sec:RANDOMGRAPHS}.
2120: We ran $10$ iterations of constructing a network of $16384$ nodes, both
2121: ideally and according to the heuristic, and delivered $1000$ messages
2122: between randomly chosen nodes.
2123: Figure~\ref{fig:COMPARE} shows the number of failed searches as the probability 
2124: of node failure increases. We see that although the network constructed
2125: using the heuristic does not perform as well as the ideal network, the
2126: number of failed searches is comparable. 
2127: 
2128: \begin{center}
2129: \begin{figure}[ht]
2130:   \centerline{\mbox{\epsfig{figure=failed.ps, width=0.5\textwidth}}}
2131:   \caption{Fraction of failed searches.}
2132: \label{fig:COMPARE}
2133: \end{figure}
2134: \end{center}
2135: 
2136: \section{Conclusions and Future Work}
2137: \label{sec:CONCLUSIONS}
2138: 
2139: \begin{table}[ht]
2140: \begin{center}
2141: \begin{tabular}{|c|c|c|c|}
2142: 
2143: \hline
2144: Model&
2145: Number of Links $\ell$
2146: &Upper Bound 
2147: &Lower Bound\\
2148: 
2149: \hline
2150: \multirow{3}*{No failures}
2151: &1\bigstrut
2152: &$O(\log^2 n)$\bigstrut
2153: &$\Omega(\frac{\log^2 n}{\log \log n})$\bigstrut\\
2154: 
2155: &$[1, \lg n]$\bigstrut
2156: &$O(\frac{\log^2 n}{\ell})$\bigstrut
2157: &$\Omega(\frac{\log^2 n}{\ell \log \log n})$\bigstrut\\
2158: 
2159: &$[\lg n, n^c]$\bigstrut
2160: &$O(\frac{\log n}{\log b})$\bigstrut
2161: &$\Omega(\frac{\log n}{\log \ell})$\bigstrut\\
2162: 
2163: \hline
2164: \hline
2165: 
2166: \multirow{2}*{Pr[Link present]=$p$}
2167: &$[1, \lg n]$\bigstrut
2168: &$O(\frac{\log^2 n}{p\ell})$\bigstrut
2169: &-\bigstrut\\
2170: 
2171: &$[\lg n, n^c]$\bigstrut
2172: &$O(\frac{b\log n}{p})$\bigstrut
2173: &-\bigstrut\\
2174: 
2175: \hline
2176: \hline
2177: 
2178: \multirow{2}*{Pr[Node present]=$p$}
2179: &\multirow{2}*{$[1, \lg n]$}
2180: &\multirow{2}*{$O(\frac{\log^2 n}{p\ell})$}
2181: &\multirow{2}*{-}\\
2182: 
2183: &&&\\
2184: 
2185: \hline
2186: \end{tabular}
2187: \end{center}
2188: \caption{Summary of upper and lower bounds for routing.\protect\footnotemark}
2189: \label{table-results}
2190: \end{table}
2191: \footnotetext{In the upper bound with 
2192: $(\lg n, n^c]$ links, the number of links
2193: $\ell=O(b\log_b n)$. Also, the deterministic strategy 
2194: used for $\ell \in (\lg n, n^c]$ links
2195: with link failures is 
2196: slightly different from the one with no failures, 
2197: and $\ell=O(\log_b n)$.
2198: In the lower bound column, the bound for $[1,\lg n]$ links is for
2199: one-sided routing.}
2200: 
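As an illustration of how the entries of Table~\ref{table-results} specialize (a worked instance, not an additional result), taking $\ell = \lg n$ links in the $[1, \lg n]$ rows, as in the simulations of Section~\ref{sec:EXPERIMENTS}, gives
\[
O\left(\frac{\log^2 n}{\ell}\right) = O(\log n)
\qquad\mbox{and}\qquad
O\left(\frac{\log^2 n}{p\ell}\right) = O\left(\frac{\log n}{p}\right).
\]
Note that $p$ in the table is the probability that a link or node is \emph{present}, whereas in Section~\ref{sec:EXPERIMENTS} $p$ denotes the fraction of nodes that have \emph{failed}. If one assumes that nodes fail independently, a failed fraction $q$ there corresponds roughly to $p = 1-q$ here.
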
2201: Table~\ref{table-results} summarizes our upper and lower bounds.
2202: We have shown that greedy routing in an overlay network organized as a
2203: random graph in a metric space can be a nearly optimal mechanism for
2204: searching in a peer-to-peer system, even in the presence of
2205: many faults.  We see this as an important first step in the design of
2206: efficient algorithms for such networks, but many issues still need to
2207: be addressed.  Our results mostly apply to one-dimensional metric
2208: spaces like the line or a circle.  One interesting question is
2209: whether similar strategies would work for higher-dimensional spaces,
2210: particularly ones in which some of the dimensions represent the actual
2211: physical distribution of the nodes in real space; good
2212: network-building and search mechanisms for this model might allow
2213: efficient location of nearby instances of a resource without having to
2214: resort to local flooding (as in~\cite{KKD01}).
2215: Another promising direction would be to study the security properties
2216: of greedy routing schemes to see how they can be adapted to provide
2217: desirable properties like anonymity or robustness against Byzantine
2218: failures.
2219: 
2220: 
2221: \section{Acknowledgments}
2222: 
2223: The authors are grateful to Ben Reichardt for pointing out an error in
2224: an earlier version of Lemma~\ref{lemma-log-drop}.
2225: 
2226: \bibliographystyle{abbrv}
2227: \bibliography{paper}
2228: 
2229: \end{document}
2230: