cs0206012/race.tex
1: \documentclass[11pt]{article}
2: 
3: \usepackage{cite}
4: \usepackage{xspace}
5: 
6: \newtheorem{theorem}{Theorem}
7: \newtheorem{lemma}[theorem]{Lemma}
8: \newtheorem{corollary}[theorem]{Corollary}
9: \newenvironment{proof}{\noindent\par{\bf Proof:
10: }}{\nopagebreak\rule{1 ex}{0.8 em}\medskip}
11: 
12: \newcommand{\algname}{{\sc{lean-consensus}}\xspace}
13: 
14: \newcommand{\bigN}{\mbox{\bf N}}
15: 
16: \newcommand{\newloglike}[2]{\newcommand{#1}{\mathop{\rm #2}\nolimits}}
17: \newloglike{\E}{E}
18: 
19: \newcommand{\remove}[1]{}
20: 
21: \newcommand{\floor}[1]{\left\lfloor{#1}\right\rfloor}
22: 
23: \newcommand{\rmax}{r_{\max}}
24: 
25: \begin{document}
26: 
27: \title{Fast Deterministic Consensus in a Noisy Environment}
28: 
29: \author{James Aspnes\thanks{
30: Yale University, Department~of
31: Computer Science, 51 Prospect Street/P.O. Box 208285, New
32: Haven CT 06520-8285.
33: Email: \texttt{aspnes@cs.yale.edu}.
34: This work was supported in part by NSF grants CCR-9820888 and
35: CCR-0098078.}}
36: 
37: \maketitle
38: 
39: \begin{abstract}
40: It is well known that the consensus problem cannot be solved
41: deterministically in an asynchronous environment, but that
42: randomized solutions are possible.
43: We propose a new model, called \emph{noisy scheduling}, in which an
44: adversarial schedule is perturbed randomly, and show
45: that in this model randomness in the environment can substitute for
46: randomness in the algorithm.
47: In particular, we show that
48: a simplified, \emph{deterministic} version of 
49: Chandra's wait-free shared-memory consensus algorithm
50: (PODC, 1996, pp.~166--175)
51: solves consensus 
52: in time at most logarithmic in the number of \emph{active} processes.
53: The proof of termination
54: is based on showing that a race between independent
55: delayed renewal processes produces a winner quickly.
56: In addition, we show that the protocol finishes in constant time using
57: quantum and priority-based scheduling on a uniprocessor, suggesting
58: that it is robust against the choice of model over a wide range.
59: \end{abstract}
60: 
61: \section{Introduction}
62: 
63: Perhaps the single most dramatic result in the theory of distributed
64: computing is Fischer, Lynch, and Paterson's proof of the impossibility
65: of deterministic consensus in an asynchronous environment with
66: failures \cite{FischerLP85}.  This result and its extensions 
67: \cite{LouiA1987,DolevDS87}
68: show that the consensus
69: problem, in which a group of processes must collectively agree on a
70: bit, cannot be solved deterministically in an asynchronous message-passing
71: or shared-memory model if an unrestricted adversary
72: controls scheduling.  
73: Solutions to the shared-memory version of this fundamental
74: problem have
75: thus taken the approach of restricting the adversary, either by
76: allowing randomization that limits the adversary's knowledge
77: \cite{ChorIL1994,Abrahamson1988,AttiyaDS1989,AspnesH90,SaksSW91,Aspnes93,BrachaR1991,Chandra1996,AumannB1996,Aumann1997} 
78: or by
79: imposing timing constraints that limit the adversary's control
80: \cite{DolevDS87,DworkLS88,AlurAT1997}.  
81: As a corollary to granting less power to the
82: adversary, these solutions often involve granting more power to the
83: algorithm, in the form of the ability to obtain random
84: bits or explicitly delay steps.
85: By using these additional powers an algorithm can escape the FLP bound
86: and reach agreement.
87: 
88: These additional powers come at a cost.  Randomization alone is
89: not powerful enough to allow sublinear consensus protocols
90: \cite{Aspnes1998}, so
91: efficient randomized solutions have required additional constraints on
92: the ability of the adversary to observe the arguments to operations and
93: the contents of unread memory locations
94: \cite{Chandra1996,AumannB1996,Aumann1997}.
95: These algorithms carefully manage common pools of unread
96: random bits for future use, a clever but odd-looking practice that
97: is justified primarily by the specific details of the model.  The delay-based
98: algorithm of \cite{AlurAT1997} is less convoluted, but still depends on
99: using explicit delays that at the minimum require 
100: that a process has the power to
101: invoke them and at worst may add unnecessary delay when
102: few processes participate.
103: 
104: As an alternative to designing an algorithm specifically to exploit
105: the weaknesses of a particular adversary model, we consider the
106: approach of using a simple algorithm that guarantees agreement but
107: relies on good luck to terminate.
108: Our \algname{} algorithm, described in Section
109: \ref{section-algorithm}, is obtained by removing all of the
110: randomized parts of a similar algorithm due to Chandra
111: \cite{Chandra1996}.  
112: The essential idea (which is the core of many consensus protocols in the
113: literature) is to stage a race between those processes that
114: prefer $0$ and those that prefer $1$, with the rule that
115: if a slow process sees that
116: faster processes are all in agreement it adopts their common
117: preference.
118: The race is implemented using two arrays of atomic read/write bits.
119: The algorithm terminates when the fastest processes are all in
120: agreement and can decide on their preferred value
121: safely, knowing that other processes will
122: adopt the same preference before they catch up.
123: As shown in Section~\ref{section-agreement}, this mechanism is enough
124: to ensure that if any one process
125: decides then all other processes soon decide on the same value,
126: no matter how the adversary arranges the schedule.
127: 
128: In effect, the race framework allows the processes to detect agreement
129: once it occurs.
130: But unlike other consensus algorithms, \algname{} makes no attempt to
131: cajole the processes into reaching agreement; it relies
132: entirely on the hope that some process eventually pulls ahead of the
133: others.  In order to dash this hope, the adversary must exercise enough
134: control to ensure that the fastest processes run in lockstep.  We
135: believe that in many natural system models it will be difficult for the
136: adversary to exercise this much control.
137: 
138: One such model is what we call the
139: \emph{noisy scheduling} model, 
140: described in Section
141: \ref{section-noisy}.  In this model, the
142: adversary proposes a schedule 
143: that specifies the order in which read and write operations occur,
144: but this schedule is perturbed by random
145: noise drawn from some arbitrary non-constant distribution.
146: This noise corresponds to random factors in a system that might not
147: be
148: strongly correlated with the algorithm's behavior, such as network
149: delays, clock skew, or bus or memory contention.
150: 
151: We show in Section~\ref{section-noisy-termination} 
152: that, in the noisy scheduling model, \algname{} terminates with expected
153: $\Theta(\log n)$
154: work per process, where $n$ is the number of active
155: processes.
156: This result is distribution-independent, in the sense that the
157: algorithm's asymptotic performance does not depend on the noise
158: distribution in the model (though the constant factor does), and it 
159: holds even if processes are subject to random halting failures.
160: Because the algorithm's performance depends only on the
161: number of processes actually executing the protocol and not on 
162: the total number of processes in the system, 
163: it is \emph{adaptive} in the sense of \cite{AttiyaF1998},
164: which implies
165: it is \emph{fast} in the
166: sense of 
167: \cite{Lamport87,AfekDT95}.
168: Thus it is well-suited to situations where only one or a few processes
169: attempt to run the algorithm at the same time.
170: 
171: Our noisy scheduling model
172: is similar to the model used by Gafni and Mitzenmacher
173: \cite{GafniM1999}
174: in their analysis of mutual exclusion protocols with random timing, but is
175: extended to include constant delays inserted by the adversary in
176: addition to random delays.
177: Another source of inspiration is Koutsoupias and Papadimitriou's
178: \emph{diffuse adversary}
179: \cite{KoutsoupiasP94}, 
180: which chooses a distribution over executions in which no branch at any
181: decision point can occur with probability more than some fixed
182: $\epsilon$.
183: Our model is not the first in which an adversary chooses
184: parameters for a stochastic process that then controls scheduling; a
185: sophisticated model of this type, based on asynchronous PRAMs, has been
186: proposed by Cole and Zajicek \cite{ColeZ95}.
187: 
188: To give support to our intuition that many possible restrictions
189: on the adversary make \algname{} work, we also consider what happens
190: with a hybrid quantum and priority-based scheduler on a uniprocessor,
191: following the approach of \cite{AndersonM1999}.  (The details of this
192: model, which subsumes both quantum scheduling and priority-based
193: scheduling, are sketched in Section \ref{section-hybrid}.)  We show in
194: Section \ref{section-hybrid-termination} that \algname{} terminates in
195: $O(1)$ steps in the hybrid-scheduling
196: model, as long as the quantum is at least 8.
197: The restriction to a uniprocessor is
198: necessary because \cite{AndersonM1999} shows that no
199: deterministic algorithm can solve consensus with multiple processors,
200: even with hybrid scheduling,
201: without using stronger primitives than atomic read/write registers.
202: 
203: Our basic consensus algorithm requires infinitely long arrays.
204: Obviously this is
205: undesirable in a real system.  In order to bound the required space,
206: we adopt a technique from \cite{Chandra1996} and
207: cut off the algorithm after consuming $O(\log^2 n)$ bits
208: of space,
209: using the preference each undecided process has at that point as input to a
210: more expensive, 
211: bounded-memory consensus algorithm satisfying the validity
212: property.\footnote{An early example of this approach 
213: is found in
214: the bounded-rounds randomized Byzantine agreement
215: protocol of Goldreich and Petrank~\cite{GoldreichP1990}, which
216: switches from a randomized to a deterministic protocol if the
217: randomized protocol does not terminate quickly enough.}
218: Since the more expensive algorithm is only run with low probability,
219: its higher costs do not increase the expected time for the algorithm as
220: a whole by more than a small constant factor.
221: Details are given in Section~\ref{section-bounded-memory}.
222: 
223: Section~\ref{section-simulations} 
224: describes some simulation results
225: that show that the constant factors in the noisy scheduling
226: analysis are in fact quite small for plausible noise distributions,
227: suggesting that the good theoretical performance of \algname{} might actually
228: translate into fast execution in a real system.
229: 
230: In Section \ref{section-extensions}, we suggest a number of directions
231: in which the current work could be extended, including extensions to
232: the noisy scheduling model.
233: One interesting possibility is the inclusion of adaptive crash
234: failures.  We
235: argue briefly that 
236: because \algname{} recovers quickly from such failures,
237: it terminates after $O(f \log n)$ work per process even if up to
238: $f$ processes fail.  However, there remains an interesting open
239: question whether noisy scheduling is enough to get $O(\log n)$
240: performance even with $\Theta(n)$ crash failures.
241: 
242: \section{The Consensus Problem}
243: 
244: In the \emph{binary consensus problem}, a group of $n$
245: processes, possibly subject to halting failures,
246: must agree on a bit.\footnote{Some authors 
247: consider the stronger problem of \emph{id consensus},
248: in which the decision value is the id of
249: some active process.  In many cases, id consensus can be
250: solved in a natural way using a $(\lg n)$-depth tree of binary
251: consensus protocols; examples of this approach can be found in
252: \cite{Chandra1996,Aumann1997}.}
253: A \emph{consensus protocol} is a distributed algorithm in which each
254: non-faulty
255: process starts with an input bit and eventually terminates by deciding
256: on an output bit.  
257: It must satisfy
258: the following three conditions with probability 1:
259: 
260: \begin{itemize}
261: \item \emph{Agreement.} All non-faulty processes decide on the same bit.
262: \item \emph{Termination.} All non-faulty processes finish the protocol in
263: a finite number of steps.
264: \item \emph{Validity.} If all processes start with the same input bit,
265: all non-faulty processes decide on that bit.\footnote{
266: Some definitions of consensus replace the validity condition with a
267: weaker \emph{non-triviality} condition that says that there must exist
268: executions in which different decision values occur.  
269: }
270: \end{itemize}
271: 
272: \section{Model}
273: 
274: We assume a shared-memory system consisting of an unbounded number of
275: processes that communicate only through shared atomic read/write
276: registers.
277: We use the usual interleaving model, in which operations are assumed to
278: occur in a sequence $\pi_1,\pi_2,\ldots$, and in which each read
279: operation returns the value of the last previous write to the same
280: location.  The order in which operations occur is determined by a
281: stochastic process that is partially under the control of an
282: adversary (Section \ref{section-noisy}), or directly by the adversary
283: subject to certain regularity constraints (Section
284: \ref{section-hybrid}).
285: 
286: \subsection{Noisy Scheduling}
287: \label{section-noisy}
288: 
289: In the \emph{noisy scheduling} model, we assume that the adversary
290: specifies when operations occur (subject to an upper bound on the time
291: between successive operations by the same process), but that this
292: specification is perturbed by random noise.
293: 
294: Formally, the adversary chooses:
295: 
296: \begin{enumerate}
297: \item An arbitrary starting time $\Delta_{i0}$ for each process $p_i$,
298: 
299: \item A non-negative 
300: delay $\Delta_{ij}$ between process $p_i$'s $(j-1)$-th and $j$-th
301: operations, bounded by some fixed constant $M$, and
302: 
303: \item A fixed common distribution $F_{\pi}$ of the random delay 
304: added to
305: each type of operation
306: $\pi$ (e.g., read or write).
307: If process $p_i$'s $j$-th operation is of type $\pi$,
308: it suffers an additional delay $X_{ij}$ whose distribution is $F_\pi$.
309: There is no restriction on the choice of 
310: the $F_\pi$,
311: except that they must not be concentrated on a point and must produce
312: only non-negative values
313: $X_{ij}$.\footnote{In fact, the $F_\pi$ distributions can be quite
314: bizarre; it is not required, for example, that the $X_{ij}$ have finite
315: expectation.}
316: \end{enumerate}
317: 
318: The time of process $p_i$'s $j$-th operation is given by
319: \begin{displaymath}
320: S_{ij} = 
321: \Delta_{i0} + \sum_{k = 1}^{j} \left(\Delta_{ik} + X_{ik}\right).
322: \end{displaymath}
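
As an illustration with hypothetical values (chosen only for this example
and not used elsewhere), suppose $\Delta_{i0} = 0$, the adversary picks
$\Delta_{i1} = 1$ and $\Delta_{i2} = 0$, and the noise happens to come out
as $X_{i1} = 0.5$ and $X_{i2} = 2$.  Then
\begin{displaymath}
S_{i1} = 0 + (1 + 0.5) = 1.5,
\qquad
S_{i2} = S_{i1} + (0 + 2) = 3.5,
\end{displaymath}
so the second operation is pushed later by noise alone, even though the
adversary requested no delay before it.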
323: 
324: Since we are using interleaving semantics, the effect of executing two
325: operations at exactly the same time is not well-defined.  To avoid
326: ill-defined executions, we
327: impose the additional technical constraint on the adversary's choices that
328: the probability that any two operations occur
329: simultaneously must be zero.  This is automatic if, for example, the
330: noise distributions $F_\pi$ are continuous.  Alternatively, it can
331: be arranged by dithering the starting times of each process by some
332: small epsilon.  This technical constraint does not qualitatively change
333: our results.
334: 
335: Below we discuss the unfairness 
336: of noisy scheduling and extensions to
337: allow random failures.
338: 
339: \subsubsection{Unfairness}
340: 
341: The upper bound on the $\Delta_{ij}$
342: and the common distribution on the $X_{ij}$ might suggest that the
343: noisy scheduling model produces fair schedules.  This is not entirely
344: true for sufficiently pathological distributions.
345: \begin{theorem}
346: There exists a choice of $F_\pi$ and $\Delta_{ij}$ such that for any
347: distinct processes $p_i$ and $p_{i'}$, and any operation $j$,
348: the expected number of operations $p_{i'}$ completes between
349: $p_i$'s $j$-th and $(j+1)$-th operations is infinite.
350: \end{theorem}
351: \begin{proof}
352: Set each $F_\pi$ so that
353: $X_{ij}$ takes on the value $2^{k^2}$ with probability
354: $2^{-k}$ for $k = 1, 2, \ldots$.  For simplicity, let us suppose 
355: that $\Delta_{ij}=0$ for $j > 0$.  We will also assume that $p_i$ and $p_{i'}$
356: execute no operations before time $0$.
357: 
358: Let $X$ be the number of operations completed by $p_{i'}$ between
359: $S_{ij}$ and $S_{i,j+1}$.  We will show that the expectation of $X$ is
360: infinite conditioned on the value of
361: $t = \lceil S_{ij} \rceil$ (the ceiling is so that we have countably
362: many cases).
363: 
364: The idea is this: for each $k$ we have probability $2^{-k}$ that
365: $S_{i,j+1} \ge X_{i,j+1} = 2^{k^2}$.  
366: Condition on this event occurring for some particular $k$ and
367: consider how many operations $p_{i'}$ must execute to reach time
368: $2^{k^2}$.  Either (a) one of these operations
369: takes time $2^{k^2}$ or more (with
370: probability $2^{-k+1}$ per operation); or 
371: (b) a total of at least $2^{2k-1}$ faster
372: operations, each of which
373: takes at most $2^{(k-1)^2}$ time, must occur.
374: If we wait only for event (a), we expect to see $2^{k-1}$ operations;
375: to get the actual expected number, we must subtract off the expected
376: number of operations until (a) occurs after (b) occurs ($2^{k-1}$
377: again) multiplied by the probability that (b) occurs.  This latter
378: probability is at most 
379: $(1-\frac{1}{2^{k-1}})^{2^{2k-1}} \le (1-\frac{1}{2^{k-1}})^{2^{k}}$, which goes to $e^{-2}$ in the limit as
380: $k$ grows; it follows that $p_{i'}$ executes $\Omega(2^k)$ operations
381: on average before time $2^{k^2}$.  Of these, at most $t/2$ can occur
382: before time $S_{ij}$, so if $k \gg \lg t$, we have $\Omega(2^k)$
383: operations on average between $t$ and $2^{k^2}$, and thus also between
384: $S_{ij}$ and $S_{i,j+1}$, since $S_{ij} \le t < 2^{k^2} \le S_{i,j+1}$.
385: 
386: To get the full result, we must remove two layers of conditioning.
387: First compute the expectation conditioned only on $t$ by
388: summing $2^{-k} \Omega(2^k)$ for each of the infinitely many
389: sufficiently large $k$.  It is
390: not difficult to see that this sum diverges and the expectation is
391: infinite.  Summing over all values of $t$ doesn't make it any less
392: infinite, and we are done.
393: \end{proof}
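
Three facts about the distribution used in the proof above can be checked
directly; the short derivation below is only a reader's aid and assumes
nothing beyond the definition of $X_{ij}$ given there.  The probabilities
form a valid distribution, the tail probability matches the per-operation
probability quoted for event (a), and the expectation of a single delay is
already infinite:
\begin{eqnarray*}
\sum_{k=1}^{\infty} 2^{-k} &=& 1, \\
\Pr\left[X_{ij} \ge 2^{k^2}\right] &=& \sum_{m=k}^{\infty} 2^{-m} = 2^{-k+1}, \\
\E[X_{ij}] &=& \sum_{k=1}^{\infty} 2^{-k} \cdot 2^{k^2}
 = \sum_{k=1}^{\infty} 2^{k^2-k} = \infty.
\end{eqnarray*}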
394: 
395: \subsubsection{Failures}
396: 
397: We can extend the noisy scheduling model to allow halting failures.
398: For each $i$ and each $j > 0$ let $H_{ij} = \infty$ if process $p_i$
399: halts before its $j$-th operation and $0$ otherwise.  Define
400: \begin{displaymath}
401: S'_{ij} = 
402: \Delta_{i0} + \sum_{k = 1}^{j} \left(\Delta_{ik} + X_{ik} +
403: H_{ik}\right),
404: \end{displaymath}
405: with the usual convention for the extended real line 
406: that $x+\infty = \infty+x = \infty$ for any finite $x$.
407: If $S'_{ij} = \infty$, $p_i$'s $j$-th operation does not occur.
408: 
409: We do not include failures in the noise distributions $F_\pi$ because
410: these distributions do not depend on $n$, and a constant probability of
411: failure would mean that all processes die after $O(\log n)$ steps.
412: Instead, we assume that failures occur
413: independently with probability $h(n)$ per operation, where $h$ is some
414: function chosen by the adversary.  The effect of stronger failure
415: models is discussed in Section \ref{section-extensions}.
416: 
417: \subsection{Quantum and Priority-Based Scheduling}
418: \label{section-hybrid}
419: 
420: Our intuition is that \algname{} should perform well in any setting
421: that prevents lockstep executions.  One such setting is the
422: hybrid-scheduled
423: uniprocessor model of \cite{AndersonM1999}, which combines the
424: priority-based scheduling model of \cite{RamamurthyMA1996}
425: with the quantum-based
426: scheduling model of \cite{AndersonJO1998}.  In this model, processes are
427: assumed to be time-sharing a uniprocessor under the control of a
428: pre-emptive scheduler.  Each process has a priority, and a process may
429: be pre-empted at any time by a process of higher priority.  A process
430: may only be pre-empted by a process of the same priority if it has
431: exhausted its \emph{quantum}, a minimum number of operations it must
432: complete between the time it wakes up and the time at which it becomes
433: vulnerable to pre-emption.  There is no requirement that a process
434: start the protocol at the beginning of a quantum; it may have used up
435: some or all of its quantum performing other work before starting the
436: protocol.  We do not consider failures in the hybrid-scheduling model; instead,
437: a process may be arbitrarily delayed subject to the constraints on the
438: scheduler.
439: 
440: \section{The \algname{} Algorithm}
441: \label{section-algorithm}
442: 
443: In this section, we describe the \algname{} algorithm.
444: The algorithm is very simple, because we are relying on 
445: randomness in the environment
446: to guarantee termination and thus the algorithm itself must
447: only guarantee correctness and provide the opportunity for the
448: underlying system to quickly jostle it into a decision state.
449: Structurally, it is essentially identical
450: to the 
451: multi-writer register consensus protocol of Chandra
452: \cite{Chandra1996}
453: with the shared coins removed,
454: leaving only the implementation from multi-writer bits
455: of the ``racing counters'' technique that has
456: been used in many shared-memory consensus protocols.
457: It also bears some similarities
458: to the Time-Adaptive Consensus algorithm of Alur et
459: al.~\cite{AlurAT1997} with the delays removed.
460: 
461: At each step of the algorithm, each process \emph{prefers} either 0 or 1 as
462: its decision value.  The conflict between the 0-preferring processes and
463: the 1-preferring processes is settled by a race implemented using two
464: arrays $a_0$ and $a_1$ of atomic read/write bits, each initialized to
465: zero.  Each process
466: carries out a sequence of rounds, each consisting of a fixed sequence
467: of operations.  During round $r$, a process that prefers $b$ marks
468: location $a_b[r]$ with a one
469: and looks to see if either (a) it has fallen behind
470: its rivals who prefer $(1-b)$, in which case it abandons its former
471: preference and joins the winning team, or (b) it and its fellows have
472: sped far enough
473: ahead of any rival processes that they can safely decide $b$ knowing
474: that those rivals will give up and join the $b$ team before they catch
475: up.
476: The algorithm finishes fastest when the pack of processes disperses
477: quickly, so that a clear winner emerges as early as possible.
478: 
479: Let us look more closely at the details of the algorithm.
480: A process with input $b$ sets its preference $p$ to $b$ 
481: and its round number $r$ to $1$.
482: (We say that a process is \emph{at
483: round $r$} if its round number is set to $r$; processes thus start at round
484: $1$.)
485: It then repeatedly executes the following
486: sequence of steps.  To simplify the description of the algorithm, we
487: assume that while $a_0$ and $a_1$ are initialized to zeroes,
488: they are prefixed with (effectively read-only)
489: locations $a_0[0]$ and
490: $a_1[0]$,
491: both set to $1$.
492: 
493: \begin{enumerate}
494: \item Read $a_0[r]$ and $a_1[r]$.  If for some $b$,
495: $a_b[r]$ is $1$ and $a_{1-b}[r]$
496: is $0$, set $p$ to $b$.
497: \item Write $1$ to $a_p[r]$.
498: \item Read $a_{1-p}[r-1]$.  If this value is $0$, decide $p$ and exit.
499: \item Otherwise, set $r$ to $r+1$ and repeat.
500: \end{enumerate}
501: 
502: Note that in each round the process carries out exactly four
503: operations in the same sequence: two reads, a write, and another read.
504: It is tempting to optimize the algorithm by eliminating the write when
505: it is already evident from the previous step that $a_p[r]$ is set or
506: eliminating the last read when it can be deduced from the value of
507: $a_{1-p}[r]$ that $a_{1-p}[r-1]$ is set.  
508: However, this optimization reduces the work done by slow processes
509: (whom we'd like to have fall still further behind) while maintaining the same
510: per-round cost for fast processes (whom we'd like to have pull 
511: ahead).  So we must paradoxically carry out operations that 
512: might appear to be superfluous in order to minimize the actual
513: total cost.
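
To make the round structure concrete, the following trace (an
illustrative sketch, not part of the analysis) follows a single process
running alone with input $b$, so that every location other than $a_0[0]$
and $a_1[0]$ is still zero when it is first read.

\begin{center}
\begin{tabular}{lll}
Round & Operation & Result \\
\hline
1 & read $a_0[1]$ and $a_1[1]$ & both $0$; $p$ stays $b$ \\
1 & write $1$ to $a_b[1]$ & $a_b[1] = 1$ \\
1 & read $a_{1-b}[0]$ & $1$; advance to round 2 \\
2 & read $a_0[2]$ and $a_1[2]$ & both $0$; $p$ stays $b$ \\
2 & write $1$ to $a_b[2]$ & $a_b[2] = 1$ \\
2 & read $a_{1-b}[1]$ & $0$; decide $b$ \\
\end{tabular}
\end{center}

In this trace the process decides after two rounds of four operations
each, for a total of eight operations.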
514: 
515: 
516: \section{Agreement and Validity}
517: \label{section-agreement}
518: 
519: If we ignore the termination requirement, the correctness of the
520: algorithm does not depend on the behavior of the scheduler.  The
521: following lemmas show that the validity and agreement properties
522: hold whenever the algorithm terminates.  The proofs are very similar
523: in spirit to those of Lemmas 1--4 in \cite{Chandra1996}.
524: 
525: \begin{lemma}
526: \label{lemma-converging-preferences}
527: No process sets $a_{b}[r]$ unless (a) $r = 1$ and $b$ is an input
528: value, or (b) $r > 1$ and $a_{b}[r-1]$ has already been set.
529: \end{lemma}
530: \begin{proof}
531: Consider the first process $P$ that sets $a_{b}[r]$.  
532: Then $P$ does not read $1$ from $a_{b}[r]$ at round $r$ and does not
533: change its preference during round $r$.  If $r=1$, $P$'s preference
534: equals its input, establishing case (a); if $r > 1$, $P$ must have set
535: $a_{b}[r-1]$ at round $r-1$, establishing case (b).
536: \end{proof}
537: 
538: \begin{lemma}
539: \label{lemma-validity}
540: If every process starts with the same input bit $b$, every process
541: decides $b$ after executing 8 operations.
542: \end{lemma}
543: \begin{proof}
544: From Lemma~\ref{lemma-converging-preferences},
545: if no process has input $1-b$, no process ever sets $a_{1-b}[1]$.  It
546: follows that every process sees a zero in $a_{1-b}[1]$ at round 2 and
547: decides $b$.
548: \end{proof}
549: 
550: \begin{lemma}
551: \label{lemma-agreement}
552: If some process decides $b$ at round $r$, 
553: then 
554: (a) no process ever writes $a_{1-b}[r]$,
555: and
556: (b) every process decides $b$ at
557: or before round $r+1$.
558: \end{lemma}
559: \begin{proof}
560: Let $P$ decide $b$ at round $r$.
561: We will show that this implies that no process ever sets $a_{1-b}[r]$.  
562: 
563: Suppose some process sets $a_{1-b}[r]$; let $Q$ be the first such
564: process.  Because $Q$ is the first process to set $a_{1-b}[r]$, it must read a
565: $0$ from $a_{1-b}[r]$ at the start of round $r$.  Thus $Q$ can only set
566: $a_{1-b}[r]$ if it already prefers $1-b$ at the start of round
567: $r$, implying that it set $a_{1-b}[r-1]$ during round $r-1$; 
568: and if it reads a $0$ from $a_{b}[r]$ at the start of round $r$,
569: preventing it from changing its preference after seeing a $0$ in
570: $a_{1-b}[r]$.
571: But $Q$'s read of $a_{b}[r]$ occurs after $Q$'s write to
572: $a_{1-b}[r-1]$, which occurs after $P$'s read of $a_{1-b}[r-1]$ at
573: round $r$ (because $P$ reads $0$), 
574: which in turn occurs after $P$'s write to $a_b[r]$.  Thus
575: $Q$ reads $1$ from $a_b[r]$, and changes its preference to $b$ at
576: round $r$.  This contradicts our
577: assumption that $Q$ is the first to set $a_{1-b}[r]$. 
578: It follows that if any process decides $b$ in round $r$, no process sets
579: $a_{1-b}[r]$.
580: 
581: Since no process sets $a_{1-b}[r]$, any process that reaches round
582: $r+1$ must set $a_{b}[r+1]$ (by Lemma~\ref{lemma-converging-preferences}),
583: and will decide $b$ after reading $0$ from $a_{1-b}[r]$.
584: Thus no process runs past round $r+1$ without deciding $b$.
585: 
586: To show agreement in earlier rounds, let $P'$ decide $b'$ at round $r'
587: \le r$.  By the preceding argument, if $P'$ decides $b'$ at round
588: $r'$, then no process sets $a_{1-b'}[r']$ and thus (by
589: Lemma~\ref{lemma-converging-preferences} again) no process sets
590: $a_{1-b'}[r]$.  But since $P$ sets $a_{b}[r]$, we must have $b'=b$.
591: \end{proof}
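
As a sanity check on Lemma~\ref{lemma-agreement}, consider a hypothetical
two-process execution (an illustrative schedule, not one used in the
analysis) in which $P$ has input $0$, $Q$ has input $1$, and $P$ completes
rounds 1 and 2 before $Q$ takes its first step.
\begin{itemize}
\item $P$ reads zeroes from $a_0[1]$ and $a_1[1]$, writes $a_0[1]$, reads
$1$ from $a_1[0]$, and advances.  In round 2 it reads zeroes from $a_0[2]$
and $a_1[2]$, writes $a_0[2]$, reads $0$ from $a_1[1]$, and decides $0$.
\item $Q$ then reads $a_0[1] = 1$ and $a_1[1] = 0$, so it switches its
preference to $0$, writes $a_0[1]$, reads $1$ from $a_1[0]$, and advances.
In round 2 it reads $a_0[2] = 1$ and $a_1[2] = 0$, writes $a_0[2]$, reads
$0$ from $a_1[1]$, and decides $0$.
\end{itemize}
Here $P$ decides $b = 0$ at round $r = 2$, no process ever writes
$a_1[2]$, and $Q$ decides $0$ at round $2 \le r + 1$, as the lemma
requires.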
592: 
593: \section{Termination with Noisy Scheduling}
594: \label{section-noisy-termination}
595: 
596: In this section, we show that \algname{} terminates in $\Theta(\log n)$
597: rounds with noisy scheduling and random failures.  (This analysis includes the
598: core model without random failures as well, since the adversary can
599: always choose $h(n)=0$.)
600: We show that
601: either all processes die (in which case we treat the algorithm as
602: terminating in the last round in which some process takes a step), or some
603: group of processes with a common preference eventually gets two
604: rounds ahead of the other processes.  To avoid analyzing the
605: details of how processes shift preferences, we will show the even
606: stronger result that unless all processes die,
607: a \emph{single} process eventually gets two rounds
608: ahead of the other processes.  
609: 
610: To simplify the argument, we abstract away from the individual
611: sequence of operations in each round and look only at the times at
612: which rounds are completed.  We can thus assume that the adversary
613: provides a single noise distribution $F$ (corresponding to the
614: distribution of the sum of the delays on three reads and one write)
615: and that the values $\Delta_{ij}$, $X_{ij}$, and $H_{ij}$ 
616: provide the delay not
617: on the $j$-th \emph{operation} but on the $j$-th \emph{round}.
618: Since this abstraction merely involves summing together the underlying
619: variables on operations, it does not reduce the adversary's control
620: over the protocol.  We will scale $M$ appropriately so that it is
621: still the case that $0 \le \Delta_{ij} \le M$ when $j > 0$.
622: 
623: Using this approach, the increment $\Delta_{ij} + X_{ij} + H_{ij}$ 
624: is the time taken for
625: process $i$ to move from the end of round $j-1$ to the end
626: of round $j$.  The
627: constant $\Delta_{i0}$ represents the process's starting time, and 
628: $S'_{ir} = \Delta_{i0} + \sum_{j=1}^{r} \left(\Delta_{ij} +
629: X_{ij} + H_{ij}\right)$
630: gives the time at which the process finishes round $r$.  A process $i$ wins
631: the race with a lead of $c$ rounds
632: at round $r+c$ if it finishes round $r+c$ before any other
633: process finishes round $r$, i.e., if $S'_{i,r+c} < S'_{i',r}$ for all
634: $i' \ne i$.
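
For instance, with hypothetical finishing times and $c = 2$, suppose
process $p_1$ finishes round 3 at time $4$ while every other process
$p_{i'}$ finishes round 1 no earlier than time $5$.  Then, taking $r = 1$,
\begin{displaymath}
S'_{1,3} = 4 < 5 \le S'_{i',1}
\qquad \mbox{for all $i' \ne 1$,}
\end{displaymath}
so $p_1$ wins the race with a lead of 2 rounds at round 3.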
635: 
636: We would like to show a bound on how the expected round at which some
637: process wins by $c$ scales as a function of the number of processes $n$,
638: keeping $c$, $M$, and $F$ fixed.  
639: This bound is given in Corollary \ref{corollary-race} below.
640: We will assume that $h(n) = o(1)$, as otherwise all processes die after
641: $O(\log n)$ rounds on average.
642: The proof proceeds in two steps: first we show that for \emph{any} $r$
643: which some process finishes with at least constant probability, there
644: exists a critical time $t$ that gives at least a constant probability
645: that $S'_{ir} \le t$ for exactly one $i$.  We then show that if $r$ is
646: large enough, $\Pr[S'_{i,r+c} \le t \mid S'_{ir} \le t]$ is also at least a
647: constant.  It then follows that the probability that $S'_{i,r+c} \le t$
648: while
649: $S'_{i'r} > t$ for all $i' \ne i$ is at least the product of these two
650: constants and the constant probability that $p_i$ is not killed
651: between rounds $r$ and $r+c$.  
652: Thus after a constant number of phases each consisting
653: of $r+c$ rounds we expect some process to win.
654: 
655: \subsection{Existence of a winner}
656: 
657: In this section, we build up the tools needed to show that for each
658: round there exists a fixed time at which there is likely to be a
659: unique winner.
660: 
661: \begin{lemma}
662: \label{lemma:one-vs-zero}
663: Let $A_1, \ldots, A_n$ be independent events.
664: If the probability that no $A_i$ occurs is $x$,
665: where $x$ is not zero,
666: then the probability that exactly one $A_i$ occurs is
667: at least $-x \ln x$.
668: \end{lemma}
669: \begin{proof}
670: Let $q_i$ be the probability that $A_i$ does not occur.
671: The probability $x$ that no $A_i$ occurs is the product of the $q_i$.
672: Since $x$ is nonzero, each $q_i$ must also be nonzero.
673: The probability that exactly one $A_i$ occurs is given by
674: \begin{eqnarray}
675: \left(\prod_{i=1}^{n} q_i\right) \sum_{i=1}^{n} \frac{1-q_i}{q_i}
676: &=&
677: x \sum_{i=1}^{n} \left(\frac{1}{q_i} - 1 \right)
678: \nonumber
679: \\
680: &=&
681: x \left(-n + \sum_{i=1}^{n} \frac{1}{q_i}\right).
682: \label{eq:one-vs-zero}
683: \end{eqnarray}
684: 
Let $G$ be the geometric mean of the $q_i$ and let $H$ be their
harmonic mean.  By the theorem of the means, $G \ge H$,
with equality when all of the $q_i$ are equal.
Observe that $G = x^{1/n}$ and
\begin{displaymath}
\sum_{i=1}^{n} \frac{1}{q_i} = n/H
\ge n/G = n x^{-1/n} = n \exp\left(-\frac{\ln x}{n}\right)
\ge n \left(1 - \frac{\ln x}{n}\right)
= n - \ln x.
\end{displaymath}
693: \end{displaymath}
694: Plugging this inequality into (\ref{eq:one-vs-zero}) gives the result.
695: \end{proof}
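
As a quick sanity check of Lemma \ref{lemma:one-vs-zero} (an illustration
only, with values chosen by us), take $n = 2$ and $q_1 = q_2 =
\frac{1}{2}$.  Then $x = \frac{1}{4}$, the probability that exactly one
event occurs is $2 \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{2}$,
and the bound gives
\begin{displaymath}
-x \ln x = \frac{\ln 4}{4} \approx 0.347 \le \frac{1}{2}.
\end{displaymath}
The bound is approached when the events are many and individually
unlikely: if $q_i = x^{1/n}$ for all $i$, the probability of exactly one
event is $n x \left(x^{-1/n} - 1\right)$, which tends to $-x \ln x$ as
$n$ grows.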
696: 
Suppose $X_1, \ldots, X_n$ are random times.  The following lemma
shows that, unless all of the $X_i$ are likely to be infinite, there
exists a fixed time $t_0$
such that, with constant probability, exactly one of the $X_i$ is less
than or equal to $t_0$:
702: 
703: \begin{lemma}
704: \label{lemma-winner}
705: Let $X_1, \ldots, X_n$ be
706: independent random variables such that for
707: all finite values $t$ and all distinct $i, j$, the probability that
708: $X_i = X_j = t$ is zero.
709: Then either $\Pr[\forall i X_i = \infty]$ is greater than $e^{-1}$ or
710: there exists $t_0$ such that the
711: probability that exactly one of the $X_i$
712: is less than or equal to $t_0$
713: is at least $1/5$.
714: \end{lemma}
715: \begin{proof}
716: For each $t$, let $q_i(t)$ be the probability that $X_i$ is not
717: less than or equal to $t$.
718: Let $q(t) = \prod_{i=1}^{n} q_i(t)$
719: be the probability that none of the $X_i$
720: are
721: less than or equal to $t$.
722: Note that each $q_i(t)$ is
723: a decreasing right-continuous left-limited function with
724: $\lim_{t \rightarrow -\infty} q_i(t) = 1$
725: and
726: $\lim_{t \rightarrow \infty} q_i(t) = \Pr[X_i = \infty]$.
727: Similarly, $q(t) = \prod_{i} q_i(t)$ is right-continuous,
728: left-limited, and has
729: $\lim_{t \rightarrow -\infty} q(t) = 1$
730: and
731: $\lim_{t \rightarrow \infty} q(t) = \Pr[\forall i X_i = \infty]$.
732: 
733: Suppose that this latter quantity is less than or equal to $e^{-1}$.
734: (If not, the first case of the lemma holds.)
735: Then for some finite $t$, $q(t) \le e^{-1}$.
736: Let $t_0$ be the least such $t$.
737: 
738: Now suppose $q(t_0) \ge e^{-2}$.
739: Then, by Lemma \ref{lemma:one-vs-zero},
740: the probability that exactly one $X_i$ is less than or equal to $t_0$
741: is at least $2 e^{-2} \approx 0.27\ldots$.
742: 
743: Otherwise, we have $q(t_0) < e^{-2}$ but
744: $q(t_0-) = \lim_{t \rightarrow t_0-} q(t) > e^{-1}$.
745: (We are using the usual convention
746: that $f(x-)$ denotes the left limit
747: of $f$ at $x$.)
748: This discontinuity must correspond to a discontinuity
749: in $q_i$ for some $i$.
750: At most one $q_i$ has a discontinuity at $t_0$,
751: by the assumption that the probability that distinct $X_i$, $X_j$
752: both equal $t_0$ is zero.
753: Hence, for all $j \ne i$ we have
754: $q_j(t_0-) = q_j(t_0)$
755: and thus
$q_i(t_0)/q_i(t_0-) = q(t_0)/q(t_0-) \le e^{-1}$.
757: 
758: Since $q_i(t_0-) \le 1$, it follows immediately that
759: $q_i(t_0) \le e^{-1}$
760: and thus the probability that $X_i$
761: is less than or equal to $t_0$
762: is at least $1 - e^{-1}$.
763: Now the probability that no other $X_j$
764: is less than or equal to $t_0$
765: is at least $q(t_0)/q_i(t_0) \ge q(t_0-) > e^{-1}$.
766: Since the variables are independent, the probability that
767: only $X_i$
768: is less than or equal to $t_0$
769: is thus at least $(1-e^{-1})e^{-1} \approx 0.23\ldots$.
770: \end{proof}
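
To see the first case of the proof in action, consider the following
illustrative instance (our own choice of distribution, not one required
by the lemma): let the $X_i$ be independent exponential random
variables, each with mean $n$.  Then $q_i(t) = e^{-t/n}$ and
$q(t) = e^{-t}$ for $t \ge 0$, so $t_0 = 1$ and $q(t_0) = e^{-1} \ge
e^{-2}$.  The probability that exactly one $X_i$ is less than or equal
to $t_0$ is
\begin{displaymath}
n \left(1 - e^{-1/n}\right) e^{-(n-1)/n}
= n \left(e^{1/n} - 1\right) e^{-1}
\ge e^{-1} \approx 0.368,
\end{displaymath}
which matches the bound $-q(t_0) \ln q(t_0) = e^{-1}$ from Lemma
\ref{lemma:one-vs-zero} and comfortably exceeds $1/5$.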
771: 
772: \subsection{Size of the lead}
773: 
774: In this section, we show that if enough rounds have passed, a
775: process that is likely to be ahead of the others is in fact likely to
776: be several rounds ahead.  The proof is somewhat complicated by the
777: lack of restrictions on the noise distribution, but the following
778: lemma shows how the Strong Law of Large Numbers can be used to smooth
779: the noise terms out a bit.
780: 
781: \begin{lemma}
782: \label{lemma-gap}
783: Let $X_1, X_2, \ldots$ be finite non-negative independent identically
784: distributed random variables whose common distribution is
785: not concentrated on a point.  Define $S_n =
786: \sum_{i=1}^{n} X_i$.  For any $c$, there exist
787: $n, t$ such that $\Pr[S_n < t] < \frac{1}{2}$
788: but $\Pr[S_n < t-c] > 0$.
789: \end{lemma}
790: \begin{proof}
791: Let us first consider the case where $X_i$ has a finite expectation
792: $m$.  Then the Strong Law of Large Numbers says that 
793: $S_n/n$ converges to $m$ in the limit with probability $1$.  
So for any $\epsilon > 0$, the probability that $S_n/n$ is less than
$m-\epsilon$ goes to zero and thus drops below $1/2$ for all $n$
greater than some $n_0$.

Let $t_n = n(m-\epsilon)$.  As long as $n > n_0$, we have 
$\Pr[S_n < t_n] < \frac{1}{2}$.
800: Now suppose that $\Pr[S_n < t_n-c] = 0$ whenever $n > n_0$.  Since the
801: $X_i$ are independent, this event can only occur if for each $X_i$,
802: $X_i < \frac{t_n - c}{n} = m-\epsilon-\frac{c}{n}$ with probability $0$.
803: Taking the union of countably many 
804: such bad events for each rational $\epsilon$ and
each $n > n_0$ shows that the event $X_i < m$
also has probability $0$.  It follows that $X_i \ge E[X_i]$ almost
807: surely and thus the distribution of $X_i$ is concentrated on
808: $E[X_i]$, a contradiction.
809: 
810: If $X_i$ does not have a finite expectation, then $S_n/n$ grows without
811: bound with probability $1$ (see the corollary to Theorem 22.1 in
812: \cite{Billingsley}).  So for any $x$, there exists $n_0$, such that
813: $\Pr[S_n/n < x] < \frac{1}{2}$ for $n > n_0$.  We repeat the above
814: analysis for $t = nx$; if $\Pr[S_n < t-c] = 0$ for all such $t$, we get
815: $X_i \ge x - \frac{c}{n}$ almost surely, implying $X_i$ exceeds any
816: finite bound $x$.  Again, a contradiction.
817: \end{proof}
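
For a concrete illustration of Lemma \ref{lemma-gap} (the distribution
here is our own example), let each $X_i$ be $1$ or $2$ with equal
probability and let $c = 1$.  Then $S_n = n + B_n$, where $B_n$ is
binomial with parameters $n$ and $\frac{1}{2}$.  Taking $n = 4$ and
$t = 6$ gives
\begin{displaymath}
\Pr[S_4 < 6] = \Pr[B_4 \le 1] = \frac{5}{16} < \frac{1}{2}
\qquad \mbox{and} \qquad
\Pr[S_4 < 5] = \Pr[B_4 = 0] = \frac{1}{16} > 0.
\end{displaymath}
No choice of $t$ works for $n \le 3$: having $\Pr[S_n < t-1] > 0$
requires $t > n+1$, and then $\Pr[S_n < t] \ge \Pr[B_n \le 1] \ge
\frac{1}{2}$.  This shows why the lemma must be allowed to choose
$n$ as well as $t$.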
818: 
819: Once the noise terms have been smoothed, it is not hard to show that
820: they eventually accumulate enough to push a winner ahead:
821: 
822: \begin{lemma}
823: \label{lemma-gap-relative}
824: Fix $c > 0$.  
825: Let $X_1, X_2, \ldots$ be finite independent identically distributed random
826: variables such that there exists a threshold $t_0$ for which
827: $\Pr[X < t_0] < \frac{1}{2}$ but
828: $\Pr[X < t_0-c] = \delta_0 > 0$.
829: Define $S_n = \sum_{i=1}^{n} X_i$.
830: 
831: Then for any $\epsilon > 0$, there exists an $n =
832: O(\log(1/\epsilon))$,
833: such that for any $t$,
834: $\Pr[S_n < t] > \epsilon$
835: implies
836: $\Pr[S_n < t - c | S_n < t] > \frac{1}{7}\delta_0$.
837: \end{lemma}
838: \begin{proof}
839: Set $n = 8(\ln(1/\epsilon)+1)$.  Each $X_i$ has probability at most
840: $1/2$ of being less than $t_0$, so a simple application of Chernoff
841: bounds shows that the probability that 3/4 or more of the $X_i$ are
842: less than $t_0$ is at most
843: $e^{-n/8} = \epsilon/e$.
844: 
845: We will use this fact to argue that even when conditioning on $S_n <
846: t$, there is nearly one chance in four that $X_n$ in particular is
847: greater than $t_0$.  In this case, $S_{n-1}$ is less than $t-t_0$ and
848: we can use independence to replace $X_n$ with a new value less than
849: $t_0 - c$, giving a sum $S_n$ less than $t-c$, all without reducing the
850: probability by much.
851: 
852: Formally, we have the following sequence of inequalities, each of
853: which is implied by the previous one.  Let 
854: $\Pr[S_n < t] = p$ and suppose $p > \epsilon$.  Then we have:
855: 
856: \begin{eqnarray*}
857: \Pr[S_n < t] &=& p \\
858: \Pr[S_n < t \wedge 
859: \mbox{at least $\frac{1}{4}$ of $X_i$ are
860: greater than $t_0$}]
861: &>& p-\epsilon/e \\
862: \Pr[S_n < t \wedge X_n > t_0] &>& \frac{1}{4}(p-\epsilon/e) \\
863: \Pr[S_{n-1} < t - t_0] &>& \frac{1}{4}(p-\epsilon/e) \\
864: \Pr[S_{n-1} < t - t_0 \wedge X_n < t_0-c] 
865: &>&
866: \frac{1}{4}(p-\epsilon/e)\delta_0 \\
867: \Pr[S_n < t - c]
868: &>&
869: \frac{1}{4}(p-\epsilon/e)\delta_0 \\
870: \Pr[S_n < t - c | S_n < t]
871: &>&
872: \frac{1}{4}(p-\epsilon/e)\delta_0/p
873: \end{eqnarray*}
874: 
875: Since $p > \epsilon$, this last quantity is at least
876: $\frac{1}{4}(1-1/e)\delta_0$, which is in turn greater than
877: $\frac{1}{7}\delta_0$.
878: \end{proof}
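
To give a feel for the constants (an illustrative calculation only),
take $\epsilon = 10^{-2}$.  The proof sets
\begin{displaymath}
n = 8\left(\ln(1/\epsilon) + 1\right) = 8(\ln 100 + 1) \approx 44.8,
\end{displaymath}
so, assuming we round up to an integer, $n = 45$ terms suffice, and
$e^{-n/8} \le \epsilon/e \approx 3.7 \times 10^{-3}$.  The final
constant is
\begin{displaymath}
\frac{1}{4}\left(1 - \frac{1}{e}\right) \approx 0.158
> \frac{1}{7} \approx 0.143,
\end{displaymath}
which is where the factor $\frac{1}{7}$ in the statement of the lemma
comes from.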
879: 
880: We can now combine Lemmas \ref{lemma-gap} and
881: \ref{lemma-gap-relative} into the following:
882: 
883: \begin{lemma}
884: \label{lemma-gap-combined}
885: Let $X_1, X_2, \ldots$ be finite
886: non-negative independent identically distributed
887: random variables whose common distribution is not
888: concentrated on a point.  Define $S_n = \sum_{i=1}^{n} X_i$.
889: Fix $c > 0$.  Then there is a constant $\delta$,
890: such that for any $\epsilon > 0$, there exists $n =
891: O(\log(1/\epsilon))$,
892: such that $\Pr[S_n < t - c | S_n < t] > \delta$
893: whenever $\Pr[S_n < t] > \epsilon$.
894: \end{lemma}
895: \begin{proof}
896: Use Lemma \ref{lemma-gap} to group the $X_i$ together into partial
897: sums $Y_i = \sum_{j=i n_0+1}^{i n_0 + n_0} X_j$ with the property that
898: for some $t$
899: $\Pr[Y_i < t] < \frac{1}{2}$ but $\Pr[Y_i < t - c] = \delta_0 > 0$.
900: (Note that $n_0$ does not depend on $\epsilon$, so it disappears into
901: the constant factor.)
902: Then apply Lemma \ref{lemma-gap-relative} to sums of these $Y_i$
903: variables to get the full result.
904: \end{proof}
905: 
906: \subsection{When the Race Ends}
907: 
908: In this section, we show that a race between $n$ independent delayed
909: renewal processes with bounded added delays ends in $O(\log n)$
910: rounds with at least constant probability.  In the
911: following section, we
912: translate this result, which appears as Corollary
913: \ref{corollary-race}, back into terms of the \algname{} algorithm 
914: to get Theorem \ref{theorem-noisy}.
915: 
916: \begin{theorem}
917: \label{theorem-race}
918: Let $\{X_{ij}\}$, where $i, j \ge 1$,
919: be a two-dimensional array of finite non-negative
920: independent identically
921: distributed random variables with a common distribution function $F$
922: that is not concentrated on a point.  
923: Let $\{\Delta_{ij}\}$, where $i \ge 1, j \ge 0$, be a
924: two-dimensional array of constants with $0 \le \Delta_{ij} \le M$ 
925: when $j \ge 1$.
926: Let $\{H_{ij}\}$, where $i, j \ge 1$, be a two-dimensional array of
927: independent random variables, each of which is equal to $\infty$ with
928: probability $h(n)$ and $0$ otherwise.
929: Define 
930: \[S'_{ir} = \Delta_{i0} + \sum_{j=1}^{r} \left(\Delta_{ij} +
931: X_{ij} + H_{ij}\right).\]
932: Assume that for any finite $t$, integer $r$, and $i\ne j$, 
933: $\Pr[S'_{ir} = S'_{jr} = t] = 0$.
934: Let $c$ be any integer constant greater than $0$.
935: 
936: Then there exists a constant $\delta > 0$, such that 
937: for any $n$, there exists $r=O(\log n)$ and $t$,
938: such that
939: \begin{displaymath}
940: \Pr
941: \left[
942: \forall i \> S'_{ir} = \infty
943: \vee
944: \left(
945: \exists i \le n: S'_{i,r+c} < t \wedge \forall i' \ne i, i' \le n: S'_{i'r} > t
946: \right)
947: \right]
948: >
949: \delta.
950: \end{displaymath}
951: 
952: The constant factor in $r=O(\log n)$ and
953: the constant $\delta$ may depend on $c$, $F$, $M$, and $h$; but
954: neither constant
955: depends on $n$.
956: \end{theorem}
957: \begin{proof}
958: Since each $X_{ij}$ is finite with probability $1$, there exists some
959: constant $c_1$ such that 
960: $\Pr[\sum_{j=r+1}^{r+c} X_{ij} < c_1] > \frac{1}{2}$.
961: Let $T_{ir} = \sum_{j=1}^{r} X_{ij}$
and let $S_{ir} = T_{ir} + \sum_{j=0}^{r} \Delta_{ij}$.
963: Apply Lemma \ref{lemma-gap-combined} to the sequence
$X_{ij}$, with the constant $c$ of the lemma set to $cM + c_1$
965: and $\epsilon = n^{-2}$ 
966: to obtain $r = O(\log n)$ and a constant $\delta_0$ for which
967: $\Pr[T_{ir} < t-cM - c_1 | T_{ir} < t] > \delta_0$
968: whenever
969: $\Pr[T_{ir} < t] > n^{-2}$.
970: Adding the missing constant terms $\sum_{j=0}^{r} \Delta_{ij}$ 
971: to $T_{ir}$ to get $S_{ir}$
972: is equivalent to subtracting these same terms from each occurrence of
973: $t$,
974: so we in fact have
975: $\Pr[S_{ir} < t-cM - c_1 | S_{ir} < t] > \delta_0$
976: whenever
977: $\Pr[S_{ir} < t] > n^{-2}$.
978: This gives us our target round $r$.
979: 
980: Now apply Lemma \ref{lemma-winner} 
981: to $S'_{ir}$, for all $i \le n$, to show that with probability at least $1/5$
982: either $\forall i S'_{ir} = \infty$ or
983: there exists a time $t_0$,
984: such that there is a unique winner $i \le n$ 
985: for which $S'_{ir}$ is less than $t_0$.
986: Let us assume without loss of generality that $n$ is at least
987: $6$.
988: Throw out all cases where $i$ has $\Pr[S'_{ir} < t_0] \le n^{-2}$;
989: this leaves a probability of at least $1/5 - 1/n \ge 1/30$ that
990: (a) there is a unique winner $i$, and (b) $i$ satisfies the condition
991: $\Pr[S'_{ir} < t_0] > n^{-2}$, implying
992: $\Pr[S_{ir} = S'_{ir} < t_0] > n^{-2}$ and thus
993: $\Pr[S_{ir} < t_0-cM - c_1 | S_{ir} < t_0] > \delta_0$.
994: So with probability at least $\frac{1}{30}\delta_0$,
995: we have $S_{ir} < t_0 - cM - c_1$,
996: and thus
997: with probability at least $\frac{1}{60}\delta_0$
998: we have
999: $S_{i,r+c} < S_{ir} + cM + c_1 = S'_{ir} + cM + c_1 < t_0$.
1000: 
1001: Suppose that this event holds.  It is still possible for $S'_{i,r+c}$
1002: to be infinite if $\sum_{j=r+1}^{r+c} H_{ij} = \infty$.  Call this
1003: event $I$; if $\Pr[I] = 1 - (1-h(n))^c > \frac{1}{120}\delta_0$, then
1004: $h(n)$ is bounded below by a constant and there exists $r' = O(\log
1005: n)$ such that $\Pr[\forall i S'_{ir'} = \infty]$ is at least a
1006: constant.  Alternatively, we have 
1007: \(
1008: \Pr[S'_{i,r+c} = S_{i,r+c} < S'_{ir} + cM + c_1] 
1009: > \delta = \frac{1}{120}\delta_0.
1010: \)
1011: In either case, the theorem holds.
1012: \end{proof}
1013: 
1014: \begin{corollary} 
1015: \label{corollary-race}
1016: Let 
1017: $R$ be the first round for which either
1018: \begin{itemize}
1019: \item
1020: There
1021: exists $i$, such that 
1022: $S'_{i,R+c} < S'_{i'R}$ for all $i' \ne i$, or
1023: \item
1024: For all $i$, $S'_{i,R+c} = \infty$.
1025: \end{itemize}
1026: Under the conditions of the preceding theorem, 
1027: $\E[R] = O(\log n)$, and, for any $k \ge 0$,
1028: $\Pr[R > k] \le e^{-\floor{k/O(\log n)}}$.
1029: \end{corollary}
1030: \begin{proof}
1031: Theorem \ref{theorem-race} says that the
1032: desired event occurs with constant probability $\delta$ after 
1033: a phase consisting of $r = O(\log n)$
1034: rounds.  If it does not occur, we can apply the theorem again to the
1035: subset of the $i$'s for which $S'_{i,r+c}$ is finite,
1036: starting with round $r+c+1$ and setting
1037: the initial delay $\Delta_{i0}$ to the value of
1038: $S'_{i,r+c}$ from the previous phase.
1039: 
1040: On average, at most $1/\delta = O(1)$ such phases are needed, giving
1041: $\E[R] \le (1/\delta) r = O(\log n)$.
For the exponential tail bound, observe that the probability that the
algorithm runs for more than $m$ phases of $r$ rounds each is at most
$(1-\delta)^{m} = \left((1-\delta)^{1/\delta}\right)^{m \delta}
\le \left(e^{-1}\right)^{m \delta} = e^{-m \delta}$.
1046: So the probability that the algorithm runs for more than $k$ rounds is
1047: at most $e^{-\floor{k/r}\delta} \le e^{-\floor{k/O(\log n)}}$.
1048: \end{proof}
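
As an illustration of how the tail bound behaves (the value of
$\delta$ here is hypothetical; the theorem only guarantees that some
positive constant exists), suppose $\delta = 0.01$.  Then the expected
number of phases is at most $1/\delta = 100$, and the probability that
more than $500$ phases are needed is at most
\begin{displaymath}
(1 - 0.01)^{500} \le e^{-500 \cdot 0.01} = e^{-5} \approx 0.0067.
\end{displaymath}
Each additional $1/\delta$ phases thus reduces the failure probability
by a factor of at least $e$.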
1049: 
1050: \subsection{When \algname{} Ends}
1051: 
1052: Translating Corollary \ref{corollary-race} back into terms of the
1053: \algname{} algorithm gives:
1054: 
1055: \begin{theorem}
1056: \label{theorem-noisy}
1057: Under the noisy scheduling model with random failures,
1058: starting from any reachable state in the \algname{} algorithm in which
1059: the largest round number of any process is $r$, the algorithm running
1060: with $n$ active processes
1061: terminates by round $r+r'$, where $r'$ has expected value $O(\log n)$
1062: and $\Pr[r' > k] \le e^{-\floor{k/O(\log n)}}$ for any $k \ge 0$.
1063: \end{theorem}
1064: \begin{proof}
1065: Apply Corollary \ref{corollary-race} with $c=2$ and the initial delays
1066: $\Delta_{i0}$ set to the times at which each process completes round
1067: $r$.  This shows that after $R$ additional 
1068: rounds, where $\E[R] = O(\log n)$ and
1069: $\Pr[R > k] \le e^{-\floor{k/O(\log n)}}$, either some process
1070: $P$
1071: finishes some round $s$ before any other process finishes round $s-2$,
1072: or all processes fail.
1073: In the first case, if $P$ prefers $b$, it is the only process
1074: to have written to $a_b[s-1]$ or $a_{1-b}[s-1]$ by the time it reads
1075: $a_{1-b}[s-1]$ as part of round $s$.  
1076: Thus it reads a zero from $a_{1-b}[s-1]$ and decides.  All other
1077: processes decide at most one round later by Lemma
1078: \ref{lemma-agreement}.  We thus get $r' \le R+1$, and the single extra round
1079: disappears into the constant factors.
1080: \end{proof}
1081: 
1082: It is not hard to see that an $O(\log n)$ bound
1083: is the best possible, up to
1084: constant factors.
1085: 
1086: \begin{theorem}
1087: \label{theorem-lower-bound}
1088: There exists a noise distribution $F$ and a set of delays $\Delta$
1089: such that the \algname{} algorithm requires expected $\Omega(\log n)$
1090: rounds in the noisy scheduling model, even without failures.
1091: \end{theorem}
1092: \begin{proof}
1093: Let all $\Delta_{ij} = 0$ for $j > 0$, and let $F$ have each
1094: operation take either $1$ or $2$ time units with equal probability.
Then any single process completes its first $\log n$
1096: operations in $1$ time unit each with probability $1/n$.
1097: To avoid simultaneous operations, let $\Delta_{i0}$ be some small
1098: distinct epsilon value for each $i$.
1099: 
1100: Start $n/2$ processes with input $0$ and $n/2$ with input $1$.
1101: The probability that there exists at least one $0$-input process and
1102: at least one $1$-input process that both complete their first $\log n$
1103: operations in $1$ time unit each is given by
1104: \begin{displaymath}
1105: \left(1-\left(1-\frac{1}{n}\right)^{n/2}\right)^2
1106: \end{displaymath}
1107: which goes to $(1-e^{-1/2})^2 = \Theta(1)$ in the limit as $n$ grows.
1108: So there is a constant probability that at least one
1109: process with each input runs for $\log n$ operations without ever
1110: changing its preference to that of
1111: a faster process with the opposite preference, and we get expected
1112: $\Omega(\log n)$ rounds of disagreement.
1113: \end{proof}
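
For a sense of scale (an illustrative calculation, assuming that $\log$
denotes the base-$2$ logarithm so that $2^{-\log n} = 1/n$), take $n =
1024$.  A single process completes its first $10$ operations in $1$
time unit each with probability $2^{-10} = 1/1024$, and
\begin{displaymath}
\left(1-\left(1-\frac{1}{1024}\right)^{512}\right)^2
\approx (1 - 0.606)^2 \approx 0.155,
\end{displaymath}
which is already very close to the limiting value
$(1-e^{-1/2})^2 \approx 0.155$.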
1114: 
1115: \section{Termination with Quantum and Priority-Based Scheduling}
1116: \label{section-hybrid-termination}
1117: 
1118: In this section, we consider the question of termination subject to
1119: hybrid quantum and priority-based scheduling on a uniprocessor.  The
1120: required quantum size is 8 operations; curiously, this is the same
1121: size required for the specialized algorithm given in
1122: \cite{AndersonM1999}.  We see this coincidence as hinting at the
1123: possibility 
1124: that all shared-memory
1125: consensus algorithms may ultimately converge to a single ideal
1126: algorithm (though such an ideal algorithm, if it exists, 
1127: is probably not identical to \algname{}).
1128: 
1129: \begin{theorem}
1130: When running \algname{}
1131: in a hybrid-scheduled system with
1132: a quantum of at least 8 operations, 
1133: every process decides after executing at
1134: most 12 operations.
1135: \end{theorem}
1136: \begin{proof}
1137: We will show that at most one of $a_0[1]$ and $a_1[1]$ is set before
1138: some process finishes round $2$ and decides.
1139: Consider an execution in which $a_0[1]$ and $a_1[1]$ are each set at
1140: some point.  Let $P_0$ and $P_1$ be the first processes to set
1141: $a_0[1]$ and $a_1[1]$, respectively.  Neither $P_0$ nor $P_1$ can have
1142: observed the round-$1$ write of the other, or it would have changed
1143: its preference.  Thus both processes' round-$1$ reads of $a_0[1]$ and
1144: $a_1[1]$ must have occurred before either performed its round-$1$
1145: write.  Since we are on a uniprocessor, this can only occur if one of
1146: the processes was pre-empted before its write occurred.
1147: 
1148: Assume without loss of generality that $P_0$ is this unlucky process.
1149: Since $P_0$ is the first process to write to $a_0[1]$, if we can show
1150: that $P_0$ is not rescheduled before some process completes round $2$,
1151: then that process decides $1$ (and by Lemma \ref{lemma-agreement}, all
1152: processes eventually decide $1$) as soon as it observes a zero in
1153: $a_0[1]$.  So we need only show that $P_0$ is not rescheduled until
1154: some other process completes eight operations.
1155: 
1156: Let $Q_1$ be the process that pre-empts $P_0$.  At the time of
1157: pre-emption, $Q_1$ is at the start of a quantum; it either finishes
1158: eight operations without being pre-empted or is pre-empted by a
1159: higher-priority process $Q_2$.  But $Q_2$ in turn can only be
1160: pre-empted before completing its quantum by some higher-priority
1161: process $Q_3$.  After at most $n$ such pre-emptions, we run out of
1162: higher-priority processes, and the last process runs to the end of its
1163: quantum and decides.  Note that all of the processes in this chain
1164: (except possibly $Q_1$) have a higher priority than $P_0$ and thus
1165: cannot be equal to $P_0$.  It follows that some process finishes round
1166: $2$ before $P_0$ is rescheduled, and thus every process decides $1$ by
1167: the end of round $3$.
1168: \end{proof}
1169: 
1170: \section{Bounded space \algname}
1171: \label{section-bounded-memory}
1172: 
1173: The \algname algorithm as described in Section~\ref{section-algorithm}
1174: requires infinite space.  In this section, we describe how to modify
1175: the algorithm to use bounded space.  We assume that we have available
1176: a \emph{backup protocol}, which is a
1177: bounded-space consensus protocol that requires polynomial work per
1178: process (for example, the $O(n^4)$ protocol in \cite{Aspnes93} works).
1179: We will build a protocol that combines \algname with the backup
1180: protocol in a way that only uses the backup protocol rarely, so that
1181: its high cost adds only a constant to the $O(\log n)$ cost of the
1182: combined protocol.
1183: 
1184: Note that such a combined protocol is not necessary in the model of
1185: Section~\ref{section-hybrid-termination}, as in that model we only
1186: need space for $3$ rounds of \algname.
1187: 
1188: The combined protocol operates as follows:
1189: \begin{enumerate}
1190: \item Run \algname through round $\rmax$.
1191: \item At round $\rmax+1$, switch to the backup protocol, using the
1192: preference at the end of round $\rmax$ of \algname as input to the backup
1193: protocol.
1194: \end{enumerate}
1195: 
1196: If $\rmax$ is large enough, most of the time we will expect that
1197: \algname{} terminates before reaching $\rmax$ and the backup
1198: algorithm will not be used.  But in the case where $\rmax$ is
1199: reached (say, because the scheduler is nastier than we have assumed),
1200: the backup algorithm guarantees termination using bounded space and
1201: bounded (but possibly very large) expected time.  
1202: 
1203: \begin{theorem}
1204: \label{theorem-bounded-memory}
1205: For any polynomial-work consensus protocol chosen as a backup
1206: algorithm and any noise distribution, 
1207: there is a choice of $\rmax = O(\log^2 n)$ such that the combined
1208: algorithm described above is a consensus protocol that
1209: requires $O(\log n)$ expected operations per
1210: process and $O(\log^2 n)$ bits in the $a_0$ and $a_1$ arrays.
1211: \end{theorem}
1212: \begin{proof}
1213: First let us show that the combined algorithm solves consensus.
1214: Validity is immediate from Lemma~\ref{lemma-validity}; when all inputs
1215: are equal, we never get past round $2$ and the combined algorithm
1216: behaves identically to \algname.
1217: For agreement, the only tricky case is when some processes decide
1218: during \algname and others decide during the backup protocol.  But if
1219: some process $P$ decides $b$ at or before round $r$, then by
1220: Lemmas~\ref{lemma-converging-preferences} and~\ref{lemma-agreement}
1221: no process writes $a_{1-b}[r]$ and every process that executes the
1222: backup protocol has $b$ as input.  Thus the validity condition for the
1223: backup protocol implies that all processes decide $b$.
1224: 
1225: Now let us show that there is a choice of $\rmax$ that gives the
1226: desired performance bound.  Suppose each process finishes the backup
1227: protocol in $O(n^c)$ expected operations.  By
1228: Theorem~\ref{theorem-noisy}, there is a value $T = O(\log n)$ such
1229: that the probability that \algname does not finish by round $k$ is at
1230: most $e^{-\floor{k/T}}$.  
Setting $\rmax = T\cdot c\cdot \ln n =
O(\log^2 n)$,
the backup protocol is run with probability at most $e^{-c \ln n} =
n^{-c}$, and thus it contributes at most $n^{-c} O(n^c) = O(1)$ to the
expected cost.
1236: 
1237: Finally, the size of the $a_0$ and $a_1$ arrays is clearly equal to
1238: $\rmax = O(\log^2 n)$.
1239: \end{proof}
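
To make the choice of $\rmax$ concrete (an illustrative calculation,
reading the logarithm in the exponent as a natural logarithm and
leaving the constant $T$ unspecified), suppose the backup protocol is
the $O(n^4)$ protocol of \cite{Aspnes93}, so that $c = 4$.  Then
\begin{displaymath}
\rmax = 4 T \ln n,
\qquad
\Pr[\mbox{backup protocol is run}] \le e^{-\floor{\rmax/T}}
\le e \cdot n^{-4},
\end{displaymath}
where the extra factor of $e$ accounts for the floor when $4 \ln n$ is
not an integer.  The expected contribution of the backup protocol is
therefore at most $e \cdot n^{-4} \cdot O(n^4) = O(1)$.  For $n =
1024$, this gives $\rmax \approx 27.7\,T$ rounds.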
1240: 
1241: \section{Simulation Results}
1242: \label{section-simulations}
1243: 
1244: \begin{figure*}
1245: \begin{center}
1246: \input{rounds.ps}
1247: \end{center}
1248: \caption{Results of simulating \algname{} with various interarrival
1249: distributions.}
1250: \label{figure-rounds}
1251: \end{figure*}
1252: 
1253: Figure \ref{figure-rounds} gives the results of simulating \algname{}
1254: with various interarrival distributions.  These simulations are of the
1255: model as described in Section \ref{section-noisy}; in
1256: particular it is assumed that all operations take zero time and that
1257: there are no contention effects or synchronization issues.
1258: 
1259: The X axis is plotted on a logarithmic scale and represents the number
1260: of processes.
1261: The Y axis is plotted on a linear scale and represents the round at
1262: which the first process terminates (which may be one less than the round
1263: at which the last process terminates).
1264: Each point in the graph represents an average termination round in
1265: 10,000 trials with the given distribution and number of processes.
1266: The starting times for all processes are the same except for a small
1267: random epsilon, generated uniformly in the range $(0,10^{-8})$.
1268: In each case, half the processes are started 
1269: with input 0 and half with
1270: input 1.
1271: There are no failures.
1272: 
1273: The random number generator used was {\tt drand48}. 
1274: The distributions used were:
1275: \begin{enumerate}
1276: \item Normal distribution with mean 1 and standard deviation 0.2
1277: (variance 0.04), 
1278: rejecting points outside $(0,2)$.
1279: \item $2/3$ or $4/3$ with equal probability.
1280: \item $0.5$ plus an exponential random variable with mean $0.5$.  This
1281: corresponds to a delayed Poisson process.
1282: \item Geometric with $p = 0.5$.
1283: \item Uniform in $(0,2)$.
1284: \item Exponential with mean $1$.  This corresponds to a Poisson process
1285: with no initial delay; it is also equivalent to generating a schedule
1286: by choosing one process uniformly at random for each time unit.
1287: \end{enumerate}
1288: 
1289: It is worth noting that while the expected number of rounds grows
1290: logarithmically for most distributions, both the rate of growth and
1291: the initial value are small.
1292: These small constant factors may be the result of most processes adopting
1293: the values of early leaders, so that termination can be reached by
1294: agreement among leaders rather than the emergence of a single leader.
1295: 
1296: The inverted behavior with a normal
1297: distribution is intriguing; it suggests that with large numbers of
1298: processes there are more chances for one particularly speedy process
1299: to leap ahead of its competitors, and that for some distributions this effect
1300: overshadows the effect of having more competitors to leap ahead of.
1301: It is not clear from the data
1302: whether this curve eventually turns around and starts rising
1303: again, or whether it converges to some constant asymptote.
1304: 
1305: \section{Conclusions, Extensions, and Future Work}
1306: \label{section-extensions}
1307: 
1308: We see this paper as making two main contributions.  The first is the
1309: extraction of the adaptive $\Theta(\log n)$ time 
1310: \algname{} protocol from its more sophisticated
1311: predecessors and the demonstration that this simplified algorithm can
1312: solve consensus in models that are less extreme than those
1313: predecessors were designed to survive but that are perhaps closer to
1314: capturing the scheduling behavior an algorithm is likely to experience
1315: in practice.  Although \algname{} does not really contain any new
1316: ideas, we believe that ripping out features
1317: that practitioners might
1318: balk at implementing is a valuable task in its own right.
1319: 
1320: The second is the noisy scheduling model.  This model limits the
1321: adversary not by covering its eyes but by making its hands shake.
1322: It allows us to express the understanding that in the real world
1323: failures and timing
1324: are usually not fully under the control of intelligent
1325: demons, while still retaining a healthy respect for the subtlety and
1326: unpredictability of the world.
1327: We believe that this ``perturbed worst-case analysis'' approach 
1328: is likely to have applications in many areas both in and outside of
1329: distributed computing.
1330: 
1331: There are still many questions left unanswered and many ways in
1332: which the noisy scheduling model could be extended.  We
1333: discuss some of these issues below.
1334: 
1335: \paragraph*{Non-random failures.}
1336: It would be nice to understand how \algname{} fares with failures that
1337: are not random.
1338: We can get an upper bound in this situation
1339: by restarting Theorem \ref{theorem-noisy} whenever a process dies.
1340: Since the adversary must kill at least one process every expected $O(\log n)$
1341: rounds, the algorithm terminates in expected $O(f \log n)$ rounds
1342: where $f$ is the number of failures.  This bound compares favorably with the
1343: $O(n \log^2 n)$ work per processor needed by the best known randomized
1344: algorithm that solves consensus with a fully-adaptive adversary
1345: and up to $n-1$ failures
1346: \cite{AspnesW1996},
1347: but the fully-adaptive adversary is 
1348: much stronger than 
1349: one limited to noisy scheduling.
1350: It seems likely that a better upper bound than $O(f \log n)$ could be
1351: obtained by a more careful analysis that includes how processes change
1352: preferences.  We conjecture that the real bound is in fact $O(\log n)$.
1353: 
1354: \paragraph*{Statistical adversaries.}
1355: We would also like to do away with
1356: the fixed bound $M$ on the delay between operations under the control
1357: of the adversary.  The technical reason for including this
1358: bound in the model is that it provides a scale for the noise
1359: introduced by the $X_{ij}$ variables; if the adversary can increase
1360: $\Delta_{ij}$ without limit, it can construct a steadily slower and
1361: slower execution in which the
1362: noise, relative to the gap between rounds, never accumulates enough to
1363: affect the schedule.  But a weaker statistical constraint, such as
1364: requiring $\sum_{j=1}^{r} \Delta_{ij} \le rM$, might avoid such
1365: Zeno-like pathologies while allowing more variation in the gaps
1366: between operations.\footnote{This is a bit like using the
1367: \emph{statistical adversary} of
1368: \cite{ChouCEKL95}.} 
1369: The present proof does not work with just this
1370: statistical constraint (the particular step that breaks down is
1371: the use of Lemma
1372: \ref{lemma-gap-combined} to show that being ahead at round $r$ often
1373: means being ahead by $c$ at round $r$), but we conjecture that 
1374: the statistical constraint is in fact enough 
1375: to get
1376: termination in $O(\log n)$ rounds.
1377: 
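To illustrate the difference between the two constraints, consider two
hypothetical schedules, assuming that each $\Delta_{ij}$ is nonnegative
and reading the statistical constraint as holding for every $r$ (the
parameter $k$ is introduced here only for this example).
A steadily slowing schedule with $\Delta_{ij} = 2^j$ has
\[
\sum_{j=1}^{r} \Delta_{ij} = 2^{r+1} - 2,
\]
which exceeds $rM$ for all sufficiently large $r$, so it is ruled out
by the statistical constraint just as it is by the fixed bound.
A bursty schedule with $\Delta_{ij} = kM$ when $j$ is a multiple of $k$
and $\Delta_{ij} = 0$ otherwise has
\[
\sum_{j=1}^{r} \Delta_{ij} = \floor{r/k} \cdot kM \le rM
\]
for every $r$, so it satisfies the statistical constraint even though
individual delays exceed $M$ whenever $k > 1$.
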
1378: \paragraph*{Synchronization and contention.}
1379: Though the present work was motivated by a desire to move away from
1380: powerful theoretical adversaries toward a model more closely
1381: reflecting the non-maliciousness of misbehavior in real systems, we
1382: cannot claim that the model accurately describes the behavior of any
1383: real shared-memory system.  One difficulty is that real shared-memory
1384: systems generally
1385: do not guarantee full serializability of memory operations in
1386: the absence of additional synchronization operations
1387: (see \cite[Section 8.6]{PattersonHG1996}).
1388: We can overcome this difficulty by adding
1389: synchronization barriers to each round of \algname{}; in principle
1390: this does not affect the analysis since the structure of each round is
1391: still the same as all other rounds.  
1392: A second problem is memory contention, which we have not analyzed.
1393: The difficulty with both explicit synchronization and memory
1394: contention is that their effects are unlikely to be consistent with
1395: the assumption that the timings of different processes' operations are
1396: independent.  To the extent that this lack of independence disperses
1397: processes (say, by slowing down laggards fighting over congested early-round
1398: registers while allowing the speedy to sail through relatively
1399: clear late-round registers), it helps the algorithm.  Whether such
1400: an effect would occur in practice cannot easily be predicted without
1401: experimentation.
1402: 
1403: \paragraph*{Lower bounds.} The noisy scheduling model is friendly
1404: enough that an $O(\log n)$ running time for consensus might not be
1405: the best possible.  A counterexample like the one given in the proof of
1406: Theorem \ref{theorem-lower-bound} might be able to show that no
1407: deterministic algorithm with certain strong symmetry properties (such
1408: as no dependence on process identity and a mirror-image handling of
1409: the different inputs) can do better, 
1410: but it is not obvious where to look for a more general lower bound.
1411: It is not out of the question that a clever algorithm could solve
1412: consensus with noisy scheduling in as little as $O(1)$ time.
1413: 
1414: \paragraph*{Message passing.}
1415: All of our results are set in a shared-memory model.  It would
1416: be interesting to see whether a noisy scheduling assumption can be
1417: used to solve consensus quickly in an asynchronous message-passing
1418: model.
1419: 
1420: \paragraph*{Other problems.}
1421: Finally, though we have concentrated on a particularly simplified
1422: protocol for solving a single fundamental problem, it would be
1423: interesting to see how other algorithms fare in the
1424: noisy scheduling model.  It seems likely, for example, that
1425: algorithms designed for unknown-delay models such as Alur et al.'s
1426: \cite{AlurAT1997} should continue to work in the noisy scheduling
1427: model, perhaps with some constraint on the noise distribution to
1428: exclude random delays with unbounded expectations.  Similarly, the line
1429: of inquiry started by Gafni and Mitzenmacher \cite{GafniM1999}, on analyzing
1430: the behavior of timing-based algorithms 
1431: for mutual exclusion and related problems
1432: with random scheduling,
1433: could naturally extend to the more general model of noisy scheduling.
1434: 
1435: \section{Acknowledgments}
1436: 
1437: I would like to thank Faith Fich and Maurice Herlihy for
1438: insightful comments on the plausibility of an early version of the
1439: noisy scheduling
1440: model; 
1441: the remaining implausibility
1442: is my fault and not theirs.  
1443: I am also
1444: indebted to Robbert van Renesse for pointing out the ``narrowness'' of
1445: the bad execution paths needed to prevent consensus as a reason for the
1446: relative lack of concern for asynchronous impossibility results
1447: among practitioners.
1448: 
1449: \bibliographystyle{plain}
1450: 
1451: \bibliography{race}
1452: 
1453: \end{document}
1454: