
1: \documentclass[11pt]{article}

2:

3: \usepackage{cite}

4: \usepackage{xspace}

5:

6: \newtheorem{theorem}{Theorem}

7: \newtheorem{lemma}[theorem]{Lemma}

8: \newtheorem{corollary}[theorem]{Corollary}

9: \newenvironment{proof}{\noindent\par{\bf Proof:

10: }}{\nopagebreak\rule{1 ex}{0.8 em}\medskip}

11:

12: \newcommand{\algname}{{\sc{lean-consensus}}\xspace}

13:

14: \newcommand{\bigN}{\mbox{\bf N}}

15:

16: \newcommand{\newloglike}[2]{\newcommand{#1}{\mathop{\rm #2}\nolimits}}

17: \newloglike{\E}{E}

18:

19: \newcommand{\remove}[1]{}

20:

21: \newcommand{\floor}[1]{\left\lfloor{#1}\right\rfloor}

22:

23: \newcommand{\rmax}{r_{\max}}

24:

25: \begin{document}

26:

27: \title{Fast Deterministic Consensus in a Noisy Environment}

28:

29: \author{James Aspnes\thanks{

30: Yale University, Department~of

31: Computer Science, 51 Prospect Street/P.O. Box 208285, New

32: Haven CT 06520-8285.

33: Email: \texttt{aspnes@cs.yale.edu}.

34: This work was supported in part by NSF grants CCR-9820888 and

35: CCR-0098078.}}

36:

37: \maketitle

38:

39: \begin{abstract}

40: It is well known that the consensus problem cannot be solved

41: deterministically in an asynchronous environment, but that

42: randomized solutions are possible.

43: We propose a new model, called \emph{noisy scheduling}, in which an

44: adversarial schedule is perturbed randomly, and show

45: that in this model randomness in the environment can substitute for

46: randomness in the algorithm.

47: In particular, we show that

48: a simplified, \emph{deterministic} version of

49: Chandra's wait-free shared-memory consensus algorithm

50: (PODC, 1996, pp.~166--175)

51: solves consensus

52: in time at most logarithmic in the number of \emph{active} processes.

53: The proof of termination

54: is based on showing that a race between independent

55: delayed renewal processes produces a winner quickly.

56: In addition, we show that the protocol finishes in constant time using

57: quantum and priority-based scheduling on a uniprocessor, suggesting

58: that it is robust against the choice of model over a wide range.

59: \end{abstract}

60:

61: \section{Introduction}

62:

63: Perhaps the single most dramatic result in the theory of distributed

64: computing is Fischer, Lynch, and Paterson's proof of the impossibility

65: of deterministic consensus in an asynchronous environment with

66: failures \cite{FischerLP85}.  This result and its extensions

67: \cite{LouiA1987,DolevDS87}

68: show that the consensus

69: problem, in which a group of processes must collectively agree on a

70: bit, cannot be solved deterministically in an asynchronous message-passing

71: or shared-memory model if an unrestricted adversary

72: controls scheduling.

73: Solutions to the shared-memory version of this fundamental

74: problem have

75: thus taken the approach of restricting the adversary, either by

76: allowing randomization that limits the adversary's knowledge

77: \cite{ChorIL1994,Abrahamson1988,AttiyaDS1989,AspnesH90,SaksSW91,Aspnes93,BrachaR1991,Chandra1996,AumannB1996,Aumann1997}

78: or by

79: imposing timing constraints that limit the adversary's control

80: \cite{DolevDS87,DworkLS88,AlurAT1997}.

81: As a corollary to granting less power to the

82: adversary, these solutions often involve granting more power to the

83: algorithm, in the form of the ability to obtain random

84: bits or explicitly delay steps.

85: By using these additional powers an algorithm can escape the FLP bound

86: and reach agreement.

87:

88: These additional powers come at a cost.  Randomization alone is

89: not powerful enough to allow sublinear consensus protocols

90: \cite{Aspnes1998}, so

91: efficient randomized solutions have required additional constraints on

92: the ability of the adversary to observe the arguments to operations and

93: the contents of unread memory locations

94: \cite{Chandra1996,AumannB1996,Aumann1997}.

95: These algorithms carefully manage common pools of unread

96: random bits for future use, a clever but odd-looking practice that

97: is justified primarily by the specific details of the model.  The delay-based

98: algorithm of \cite{AlurAT1997} is less convoluted, but still depends on

99: using explicit delays that at the minimum require

100: that a process has the power to

101: invoke them and at worst may add unnecessary delay when

102: few processes participate.

103:

104: As an alternative to designing an algorithm specifically to exploit

105: the weaknesses of a particular adversary model, we consider the

106: approach of using a simple algorithm that guarantees agreement but

107: relies on good luck to terminate.

108: Our \algname{} algorithm, described in Section

109: \ref{section-algorithm}, is obtained by removing all of the

110: randomized parts of a similar algorithm due to Chandra

111: \cite{Chandra1996}.

112: The essential idea (which is the core of many consensus protocols in the

113: literature) is to stage a race between those processes that

114: prefer $0$ and those that prefer $1$, with the rule that

115: if a slow process sees that

116: faster processes are all in agreement it adopts their common

117: preference.

118: The race is implemented using two arrays of atomic read/write bits.

119: The algorithm terminates when the fastest processes are all in

120: agreement and can decide on their preferred value

121: safely, knowing that other processes will

122: adopt the same preference before they catch up.

123: As shown in Section~\ref{section-agreement}, this mechanism is enough

124: to ensure that if any one process

125: decides then all other processes soon decide on the same value,

126: no matter how the adversary arranges the schedule.

127:

128: In effect, the race framework allows the processes to detect agreement

129: once it occurs.

130: But unlike other consensus algorithms, \algname{} makes no attempt to

131: cajole the processes into reaching agreement; it relies

132: entirely on the hope that some process eventually pulls ahead of the

133: others.  In order to dash this hope, the adversary must exercise enough

134: control to ensure that the fastest processes run in lockstep.  We

135: believe that in many natural system models it will be difficult for the

136: adversary to exercise this much control.

137:

138: One such model is what we call the

139: \emph{noisy scheduling} model,

140: described in Section

141: \ref{section-noisy}.  In this model, the

142: adversary proposes a schedule

143: that specifies the order in which read and write operations occur,

144: but this schedule is perturbed by random

145: noise drawn from some arbitrary non-constant distribution.

146: This noise corresponds to random factors in a system that might not

147: be

148: strongly correlated with the algorithm's behavior, such as network

149: delays, clock skew, or bus or memory contention.

150:

151: We show in Section~\ref{section-noisy-termination}

152: that, in the noisy scheduling model, \algname{} terminates with expected

153: $\Theta(\log n)$

154: work per process, where $n$ is the number of active

155: processes.

156: This result is distribution-independent, in the sense that the

157: algorithm's asymptotic performance does not depend on the noise

158: distribution in the model (though the constant factor does), and it

159: holds even if processes are subject to random halting failures.

160: Because the algorithm's performance depends only on the

161: number of processes actually executing the protocol and not on

162: the total number of processes in the system,

163: it is \emph{adaptive} in the sense of \cite{AttiyaF1998},

164: which implies

165: it is \emph{fast} in the

166: sense of

167: \cite{Lamport87,AfekDT95}.

168: Thus it is well-suited to situations where only one or a few processes

169: attempt to run the algorithm at the same time.

170:

171: Our noisy scheduling model

172: is similar to the model used by Gafni and Mitzenmacher

173: \cite{GafniM1999}

174: in their analysis of mutual exclusion protocols with random timing, but is

175: extended to include constant delays inserted by the adversary in

176: addition to random delays.

177: Another source of inspiration is Koutsoupias and Papadimitriou's

178: \emph{diffuse adversary}

179: \cite{KoutsoupiasP94},

180: which chooses a distribution over executions in which no branch at any

181: decision point can occur with probability more than some fixed

182: $\epsilon$.

183: Our model is not the first in which an adversary chooses

184: parameters for a stochastic process that then controls scheduling; a

185: sophisticated model of this type, based on asynchronous PRAMs, has been

186: proposed by Cole and Zajicek \cite{ColeZ95}.

187:

188: To give support to our intuition that many possible restrictions

189: on the adversary make \algname{} work, we also consider what happens

190: with a hybrid quantum and priority-based scheduler on a uniprocessor,

191: following the approach of \cite{AndersonM1999}.  (The details of this

192: model, which subsumes both quantum scheduling and priority-based

193: scheduling, are sketched in Section \ref{section-hybrid}.)  We show in

194: Section \ref{section-hybrid-termination} that \algname{} terminates in

195: $O(1)$ steps in the hybrid-scheduling

196: model, as long as the quantum is at least 8.

197: The restriction to a uniprocessor is

198: necessary because \cite{AndersonM1999} shows that no

199: deterministic algorithm can solve consensus with multiple processors,

200: even with hybrid scheduling,

201: without using stronger primitives than atomic read/write registers.

202:

203: Our basic consensus algorithm requires infinitely long arrays.

204: Obviously this is

205: undesirable in a real system.  In order to bound the required space,

206: we adopt a technique from \cite{Chandra1996} and

207: cut off the algorithm after consuming $O(\log^2 n)$ bits

208: of space,

209: using the preference each undecided process has at that point as input to a

210: more expensive,

211: bounded-memory consensus algorithm satisfying the validity

212: property.\footnote{An early example of this approach

213: is found in

214: the bounded-rounds randomized Byzantine agreement

215: protocol of Goldreich and Petrank~\cite{GoldreichP1990}, which

216: switches from a randomized to a deterministic protocol if the

217: randomized protocol does not terminate quickly enough.}

218: Since the more expensive algorithm is only run with low probability,

219: its higher costs do not increase the expected time for the algorithm as

220: a whole by more than a small constant factor.

221: Details are given in Section~\ref{section-bounded-memory}.

222:

223: Section~\ref{section-simulations}

224: describes some simulation results

225: that show that the constant factors in the noisy scheduling

226: analysis are in fact quite small for plausible noise distributions,

227: suggesting that the good theoretical performance of \algname{} might actually

228: translate into fast execution in a real system.

229:

230: In Section \ref{section-extensions}, we suggest a number of directions

231: in which the current work could be extended, including extensions to

232: the noisy scheduling model.

233: One interesting possibility is the inclusion of adaptive crash

234: failures.  We

235: argue briefly that

236: because \algname{} recovers quickly from such failures,

237: it terminates in at most $O(f \log n)$ work per process even if up to

238: $f$ processes fail.  However, there remains an interesting open

239: question whether noisy scheduling is enough to get $O(\log n)$

240: performance even with $\Theta(n)$ crash failures.

241:

242: \section{The Consensus Problem}

243:

244: In the \emph{binary consensus problem}, a group of $n$

245: processes, possibly subject to halting failures,

246: must agree on a bit.\footnote{Some authors

247: consider the stronger problem of \emph{id consensus},

248: in which the decision value is the id of

249: some active process.  In many cases, id consensus can be

250: solved in a natural way using a $(\lg n)$-depth tree of binary

251: consensus protocols; examples of this approach can be found in

252: \cite{Chandra1996,Aumann1997}.}

253: A \emph{consensus protocol} is a distributed algorithm in which each

254: non-faulty

255: process starts with an input bit and eventually terminates by deciding

256: on an output bit.

257: It must satisfy

258: the following three conditions with probability 1:

259:

260: \begin{itemize}

261: \item \emph{Agreement.} All non-faulty processes decide on the same bit.

262: \item \emph{Termination.} All non-faulty processes finish the protocol in

263: a finite number of steps.

264: \item \emph{Validity.} If all processes start with the same input bit,

265: all non-faulty processes decide on that bit.\footnote{

266: Some definitions of consensus replace the validity condition with a

267: weaker \emph{non-triviality} condition that says that there must exist

268: executions in which different decision values occur.

269: }

270: \end{itemize}

271:

272: \section{Model}

273:

274: We assume a shared-memory system consisting of an unbounded number of

275: processes that communicate only through shared atomic read/write

276: registers.

277: We use the usual interleaving model, in which operations are assumed to

278: occur in a sequence $\pi_1,\pi_2,\ldots$, and in which each read

279: operation returns the value of the most recent preceding write to the same

280: location.  The order in which operations occur is determined by a

281: stochastic process that is partially under the control of an

282: adversary (Section \ref{section-noisy}), or directly by the adversary

283: subject to certain regularity constraints (Section

284: \ref{section-hybrid}).

285:

286: \subsection{Noisy Scheduling}

287: \label{section-noisy}

288:

289: In the \emph{noisy scheduling} model, we assume that the adversary

290: specifies when operations occur (subject to an upper bound on the time

291: between successive operations by the same process), but that this

292: specification is perturbed by random noise.

293:

294: Formally, the adversary chooses:

295:

296: \begin{enumerate}

297: \item An arbitrary starting time $\Delta_{i0}$ for each process $p_i$,

298:

299: \item A non-negative

300: delay $\Delta_{ij}$ between process $p_i$'s $(j-1)$-th and $j$-th

301: operations, bounded by some fixed constant $M$, and

302:

303: \item A fixed common distribution $F_{\pi}$ of the random delay

304: added to

305: each type of operation

306: $\pi$ (e.g., read or write).

307: If process $p_i$'s $j$-th operation is of type $\pi$,

308: it suffers an additional delay $X_{ij}$ whose distribution is $F_\pi$.

309: There is no restriction on the choice of

310: the $F_\pi$,

311: except that they must not be concentrated on a point and must produce

312: only non-negative values

313: $X_{ij}$.\footnote{In fact, the $F_\pi$ distributions can be quite

314: bizarre; it is not required, for example, that the $X_{ij}$ have finite

315: expectation.}

316: \end{enumerate}

317:

318: The time of process $p_i$'s $j$-th operation is given by

319: \begin{displaymath}

320: S_{ij} =

321: \Delta_{i0} + \sum_{k = 1}^{j} \left(\Delta_{ik} + X_{ik}\right).

322: \end{displaymath}

323:

324: Since we are using interleaving semantics, the effect of executing two

325: operations at exactly the same time is not well-defined.  To avoid

326: ill-defined executions, we

327: impose the additional technical constraint on the adversary's choices that

328: the probability that any two operations occur

329: simultaneously must be zero.  This is automatic if, for example, the

330: noise distributions $F_\pi$ are continuous.  Alternatively, it can

331: be arranged by dithering the starting times of each process by some

332: small epsilon.  This technical constraint does not qualitatively change

333: our results.
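As an illustration of the definitions above, the following Python sketch computes the operation times $S_{ij}$ of each process from given delays and merges them into a single interleaved schedule. The function names \texttt{operation\_times} and \texttt{interleave}, and the representation of delays as plain lists, are assumptions made for this example only; they are not part of the model.

```python
from itertools import accumulate


def operation_times(start, delays, noise):
    """Return [S_i1, ..., S_im] for one process.

    start  -- starting time Delta_i0 chosen by the adversary
    delays -- adversary delays Delta_i1..Delta_im (non-negative, at most M)
    noise  -- random delays X_i1..X_im (non-negative)
    """
    if len(delays) != len(noise):
        raise ValueError("need one noise value per adversary delay")
    increments = [d + x for d, x in zip(delays, noise)]
    # S_ij = Delta_i0 + sum_{k=1..j} (Delta_ik + X_ik)
    return [start + s for s in accumulate(increments)]


def interleave(times_by_process):
    """Merge per-process operation times into one schedule.

    Returns a list of (time, process index, operation number) sorted by time.
    Raises ValueError if two operations coincide, since the model requires
    that this happens with probability zero.
    """
    events = sorted(
        (t, i, j + 1)
        for i, times in enumerate(times_by_process)
        for j, t in enumerate(times)
    )
    for (t1, _, _), (t2, _, _) in zip(events, events[1:]):
        if t1 == t2:
            raise ValueError("simultaneous operations are not well-defined")
    return events
```

In a simulation the \texttt{noise} lists would be drawn from the chosen $F_\pi$; here they are passed in explicitly so that the arithmetic of $S_{ij}$ stays visible.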

334:

335: Below we discuss the unfairness

336: of noisy scheduling and extensions to

337: allow random failures.

338:

339: \subsubsection{Unfairness}

340:

341: The upper bound on the $\Delta_{ij}$

342: and the common distribution on the $X_{ij}$ might suggest that the

343: noisy scheduling model produces fair schedules.  This is not entirely

344: true for sufficiently pathological distributions.

345: \begin{theorem}

346: There exists a choice of $F_\pi$ and $\Delta_{ij}$ such that for any

347: distinct processes $p_i$ and $p_{i'}$, and any operation $j$,

348: the expected number of operations $p_{i'}$ completes between

349: $p_i$'s $j$-th and $(j+1)$-th operations is infinite.

350: \end{theorem}

351: \begin{proof}

352: Set each $F_\pi$ so that

353: $X_{ij}$ takes on the value $2^{k^2}$ with probability

354: $2^{-k}$ for $k = 1, 2, \ldots$.  For simplicity, let us suppose

355: that $\Delta_{ij}=0$ for $j > 0$.  We will also assume that $p_i$ and $p_{i'}$

356: execute no operations before time $0$.

357:

358: Let $X$ be the number of operations completed by $p_{i'}$ between

359: $S_{ij}$ and $S_{i,j+1}$.  We will show that the expectation of $X$ is

360: infinite conditioned on the value of

361: $t = \lceil S_{ij} \rceil$ (the ceiling is so that we have countably

362: many cases).

363:

364: The idea is this: for each $k$ we have probability $2^{-k}$ that

365: $S_{i,j+1} \ge X_{i,j+1} = 2^{k^2}$.

366: Condition on this event occurring for some particular $k$ and

367: consider how many operations $p_{i'}$ must execute to reach time

368: $2^{k^2}$.  Either (a) one of these operations

369: takes time $2^{k^2}$ or more (with

370: probability $2^{-k+1}$ per operation); or

371: (b) a total of at least $2^{2k-1}$ faster

372: operations, each of which

373: takes at most $2^{(k-1)^2}$ time, must occur.

374: If we wait only for event (a), we expect to see $2^{k-1}$ operations;

375: to get the actual expected number, we must subtract off the expected

376: number of operations until (a) occurs after (b) occurs ($2^{k-1}$

377: again) multiplied by the probability that (b) occurs.  This latter

378: probability is at most

379: $(1-\frac{1}{2^{k-1}})^{2^{2k-1}} \le e^{-2^k}$, which is at most $e^{-2}$ for all

380: $k \ge 1$; it follows that $p_{i'}$ executes $\Omega(2^k)$ operations

381: on average before time $2^{k^2}$.  Of these, at most $t/2$ can occur

382: before time $S_{ij}$, so if $k \gg \lg t$, we have $\Omega(2^k)$

383: operations on average between $t$ and $2^{k^2}$, and thus also between

384: $S_{ij}$ and $S_{i,j+1}$, since $S_{ij} \le t < 2^{k^2} \le S_{i,j+1}$.

385:

386: To get the full result, we must remove two layers of conditioning.

387: First compute the expectation conditioned only on $t$ by

388: summing $2^{-k} \Omega(2^k)$ for each of the infinitely many

389: sufficiently large $k$.  It is

390: not difficult to see that this sum diverges and the expectation is

391: infinite.  Summing over all values of $t$ doesn't make it any less

392: infinite, and we are done.

393: \end{proof}
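The two facts about the noise distribution that the proof relies on can be checked directly. The following calculation is a worked check, not part of the original argument; $X$ denotes a single delay drawn from the distribution $F_\pi$ chosen in the proof.
\begin{displaymath}
\Pr\left[X \ge 2^{k^2}\right] = \sum_{m=k}^{\infty} 2^{-m} = 2^{-k+1},
\qquad
\E[X] = \sum_{k=1}^{\infty} 2^{-k} \cdot 2^{k^2}
= \sum_{k=1}^{\infty} 2^{k^2-k} = \infty.
\end{displaymath}
The first identity gives the per-operation probability $2^{-k+1}$ used for event (a), and hence the expected wait of $2^{k-1}$ operations.  For event (b), covering an interval of length $2^{k^2}$ with operations of length at most $2^{(k-1)^2}$ takes at least $2^{k^2}/2^{(k-1)^2} = 2^{2k-1}$ of them.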

394:

395: \subsubsection{Failures}

396:

397: We can extend the noisy scheduling model to allow halting failures.

398: For each $i$ and each $j > 0$ let $H_{ij} = \infty$ if process $p_i$

399: halts before its $j$-th operation and $0$ otherwise.  Define

400: \begin{displaymath}

401: S'_{ij} =

402: \Delta_{i0} + \sum_{k = 1}^{j} \left(\Delta_{ik} + X_{ik} +

403: H_{ik}\right),

404: \end{displaymath}

405: with the usual convention for the extended real line

406: that $x+\infty = \infty+x = \infty$ for any finite $x$.

407: If $S'_{ij} = \infty$, $p_i$'s $j$-th operation does not occur.

408:

409: We do not include failures in the noise distributions $F_\pi$ because

410: these distributions do not depend on $n$, and a constant probability of

411: failure would mean that all processes die after $O(\log n)$ steps.

412: Instead, we assume that failures occur

413: independently with probability $h(n)$ per operation, where $h$ is some

414: function chosen by the adversary.  The effect of stronger failure

415: models is discussed in Section \ref{section-extensions}.
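To see why a constant failure probability is too strong, suppose (as an illustration only) that each operation independently kills its process with a fixed probability $h \in (0,1)$ that does not depend on $n$.  A process then survives $j$ operations with probability $(1-h)^j$, so the expected number of survivors among $n$ processes is
\begin{displaymath}
n (1-h)^j \le n e^{-hj},
\end{displaymath}
which drops below $1$ once $j > (\ln n)/h$.  The expected number of surviving processes is therefore less than one after $O(\log n)$ steps, which is why the failure probability $h(n)$ is allowed to shrink with $n$.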

416:

417: \subsection{Quantum and Priority-Based Scheduling}

418: \label{section-hybrid}

419:

420: Our intuition is that \algname{} should perform well in any setting

421: that prevents lockstep executions.  One such setting is the

422: hybrid-scheduled

423: uniprocessor model of \cite{AndersonM1999}, which combines the

424: priority-based scheduling model of \cite{RamamurthyMA1996}

425: with the quantum-based

426: scheduling model of \cite{AndersonJO1998}.  In this model, processes are

427: assumed to be time-sharing a uniprocessor under the control of a

428: pre-emptive scheduler.  Each process has a priority, and a process may

429: be pre-empted at any time by a process of higher priority.  A process

430: may only be pre-empted by a process of the same priority if it has

431: exhausted its \emph{quantum}, a minimum number of operations it must

432: complete between the time it wakes up and the time at which it becomes

433: vulnerable to pre-emption.  There is no requirement that a process

434: start the protocol at the beginning of a quantum; it may have used up

435: some or all of its quantum performing other work before starting the

436: protocol.  We do not consider failures in the hybrid-scheduling model; instead,

437: a process may be arbitrarily delayed subject to the constraints on the

438: scheduler.

439:

440: \section{The \algname{} Algorithm}

441: \label{section-algorithm}

442:

443: In this section, we describe the \algname{} algorithm.

444: The algorithm is very simple, because we are relying on

445: randomness in the environment

446: to guarantee termination and thus the algorithm itself must

447: only guarantee correctness and provide the opportunity for the

448: underlying system to quickly jostle it into a decision state.

449: Structurally, it is essentially identical

450: to the

451: multi-writer register consensus protocol of Chandra

452: \cite{Chandra1996}

453: with the shared coins removed,

454: leaving only the implementation from multi-writer bits

455: of the ``racing counters'' technique that has

456: been used in many shared-memory consensus protocols.

457: It also bears some similarities

458: to the Time-Adaptive Consensus algorithm of Alur et

459: al.~\cite{AlurAT1997} with the delays removed.

460:

461: At each step of the algorithm, each process \emph{prefers} either 0 or 1 as

462: its decision value.  The conflict between the 0-preferring processes and

463: the 1-preferring processes is settled by a race implemented using two

464: arrays $a_0$ and $a_1$ of atomic read/write bits, each initialized to

465: zero.  Each process

466: carries out a sequence of rounds, each consisting of a fixed sequence

467: of operations.  During round $r$, a process that prefers $b$ marks

468: location $a_b[r]$ with a one

469: and looks to see if either (a) it has fallen behind

470: its rivals who prefer $(1-b)$, in which case it abandons its former

471: preference and joins the winning team, or (b) it and its fellows have

472: sped far enough

473: ahead of any rival processes that they can safely decide $b$ knowing

474: that those rivals will give up and join the $b$ team before they catch

475: up.

476: The algorithm finishes fastest when the pack of processes disperses

477: quickly, so that a clear winner emerges as early as possible.

478:

479: Let us look more closely at the details of the algorithm.

480: A process with input $b$ sets its preference $p$ to $b$

481: and its round number $r$ to $1$.

482: (We say that a process is \emph{at

483: round $r$} if its round number is set to $r$; processes thus start at round

484: $1$.)

485: It then repeatedly executes the following

486: sequence of steps.  To simplify the description of the algorithm, we

487: assume that while $a_0$ and $a_1$ are initialized to zeroes,

488: they are prefixed with (effectively read-only)

489: locations $a_0[0]$ and

490: $a_1[0]$,

491: both set to $1$.

492:

493: \begin{enumerate}

494: \item Read $a_0[r]$ and $a_1[r]$.  If for some $b$,

495: $a_b[r]$ is $1$ and $a_{1-b}[r]$

496: is $0$, set $p$ to $b$.

497: \item Write $1$ to $a_p[r]$.

498: \item Read $a_{1-p}[r-1]$.  If this value is $0$, decide $p$ and exit.

499: \item Otherwise, set $r$ to $r+1$ and repeat.

500: \end{enumerate}
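The four steps above can be sketched in Python as follows.  This is a minimal illustration under stated assumptions, not a reference implementation: the names \texttt{make\_arrays}, \texttt{lean\_consensus}, and \texttt{run} are hypothetical, the unbounded arrays are modeled by dictionaries, and each process is a generator that performs exactly one shared-memory operation per resumption so that an explicit schedule (a sequence of process indices) can play the role of the adversary.

```python
from collections import defaultdict


def make_arrays():
    """Two unbounded bit arrays a_0 and a_1, all zero except a_b[0] = 1."""
    a = (defaultdict(int), defaultdict(int))
    a[0][0] = a[1][0] = 1
    return a


def lean_consensus(a, input_bit):
    """One process of the protocol, as a generator.

    Each resumption performs one shared-memory operation.  The generator
    returns the decided bit (available as StopIteration.value).
    """
    p, r = input_bit, 1
    while True:
        v0 = a[0][r]              # step 1: read a_0[r]
        yield
        v1 = a[1][r]              # step 1: read a_1[r]
        yield
        if v0 == 1 and v1 == 0:   # only team 0 has reached round r
            p = 0
        elif v1 == 1 and v0 == 0: # only team 1 has reached round r
            p = 1
        a[p][r] = 1               # step 2: write 1 to a_p[r]
        yield
        if a[1 - p][r - 1] == 0:  # step 3: read a_{1-p}[r-1]
            return p              # rivals are far enough behind: decide p
        yield
        r += 1                    # step 4: advance to the next round


def run(inputs, schedule):
    """Run one process per input bit under an explicit schedule.

    schedule is a sequence of process indices; each entry lets that process
    perform one operation.  Returns {process index: decided bit}.
    """
    a = make_arrays()
    procs = [lean_consensus(a, b) for b in inputs]
    decided = {}
    for i in schedule:
        if i in decided:
            continue
        try:
            next(procs[i])
        except StopIteration as stop:
            decided[i] = stop.value
    return decided
```

For example, \texttt{run([0, 1], [0] * 8 + [1] * 8)} lets the first process run alone and decide, after which the second process adopts the same value, while the strictly alternating schedule \texttt{[0, 1] * 100} keeps the two processes in lockstep so that neither ever decides.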

501:

502: Note that in each round the process carries out exactly four

503: operations in the same sequence: two reads, a write, and another read.

504: It is tempting to optimize the algorithm by eliminating the write when

505: it is already evident from the previous step that $a_p[r]$ is set or

506: eliminating the last read when it can be deduced from the value of

507: $a_{1-p}[r]$ that $a_{1-p}[r-1]$ is set.

508: However, this optimization reduces the work done by slow processes

509: (whom we'd like to have fall still further behind) while maintaining the same

510: per-round cost for fast processes (whom we'd like to have pull

511: ahead).  So we must paradoxically carry out operations that

512: might appear to be superfluous in order to minimize the actual

513: total cost.

514:

515:

516: \section{Agreement and Validity}

517: \label{section-agreement}

518:

519: If we ignore the termination requirement, the correctness of the

520: algorithm does not depend on the behavior of the scheduler.  The

521: following lemmas show that the validity and agreement properties

522: hold whenever the algorithm terminates.  The proofs are very similar

523: in spirit to those of Lemmas 1--4 in \cite{Chandra1996}.

524:

525: \begin{lemma}

526: \label{lemma-converging-preferences}

527: No process sets $a_{b}[r]$ unless (a) $r = 1$ and $b$ is an input

528: value, or (b) $r > 1$ and $a_{b}[r-1]$ has already been set.

529: \end{lemma}

530: \begin{proof}

531: Consider the first process $P$ that sets $a_{b}[r]$.

532: Then $P$ does not read $1$ from $a_{b}[r]$ at round $r$ and does not

533: change its preference during round $r$.  If $r=1$, $P$'s preference

534: equals its input, establishing case (a); if $r > 1$, $P$ must have set

535: $a_{b}[r-1]$ at round $r-1$, establishing case (b).

536: \end{proof}

537:

538: \begin{lemma}

539: \label{lemma-validity}

540: If every process starts with the same input bit $b$, every process

541: decides $b$ after executing 8 operations.

542: \end{lemma}

543: \begin{proof}

544: From Lemma~\ref{lemma-converging-preferences},

545: if no process has input $1-b$, no process ever sets $a_{1-b}[1]$.  It

546: follows that every process sees a zero in $a_{1-b}[1]$ at round 2 and

547: decides $b$.

548: \end{proof}
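The count of 8 operations can be read off from a trace.  The following walk-through is illustrative; it follows a process with input $b$ when no process has input $1-b$, so that every location $a_{1-b}[r]$ with $r \ge 1$ stays $0$.
\begin{enumerate}
\item Round 1: read $a_0[1]$ and $a_1[1]$ (2 operations; the preference stays $b$), write $1$ to $a_b[1]$ (1 operation), and read $a_{1-b}[0] = 1$ (1 operation), so the process continues to round 2.
\item Round 2: read $a_0[2]$ and $a_1[2]$ (2 operations), write $1$ to $a_b[2]$ (1 operation), and read $a_{1-b}[1] = 0$ (1 operation), so the process decides $b$.
\end{enumerate}
The total is $4 + 4 = 8$ operations.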

549:

550: \begin{lemma}

551: \label{lemma-agreement}

552: If some process decides $b$ at round $r$,

553: then

554: (a) no process ever writes $a_{1-b}[r]$,

555: and

556: (b) every process decides $b$ at

557: or before round $r+1$.

558: \end{lemma}

559: \begin{proof}

560: Let $P$ decide $b$ at round $r$.

561: We will show that this implies that no process ever sets $a_{1-b}[r]$.

562:

563: Suppose some process sets $a_{1-b}[r]$; let $Q$ be the first such

564: process.  Because $Q$ is the first process to set $a_{1-b}[r]$, it must read a

565: $0$ from $a_{1-b}[r]$ at the start of round $r$.  Thus $Q$ can only set

566: $a_{1-b}[r]$ if it already prefers $1-b$ at the start of round

567: $r$, implying that it set $a_{1-b}[r-1]$ during round $r-1$;

568: and if it reads a $0$ from $a_{b}[r]$ at the start of round $r$,

569: preventing it from changing its preference after seeing a $0$ in

570: $a_{1-b}[r]$.

571: But $Q$'s read of $a_{b}[r]$ occurs after $Q$'s write to

572: $a_{1-b}[r-1]$, which occurs after $P$'s read of $a_{1-b}[r-1]$ at

573: round $r$ (because $P$ reads $0$),

574: which in turn occurs after $P$'s write to $a_b[r]$.  Thus

575: $Q$ reads $1$ from $a_b[r]$, and changes its preference to $b$ at

576: round $r$.  This contradicts our

577: assumption that $Q$ is the first to set $a_{1-b}[r]$.

578: It follows that if any process decides $b$ in round $r$, no process sets

579: $a_{1-b}[r]$.

580:

581: Since no process sets $a_{1-b}[r]$, any process that reaches round

582: $r+1$ must set $a_{b}[r+1]$ (by Lemma~\ref{lemma-converging-preferences}),

583: and will decide $b$ after reading $0$ from $a_{1-b}[r]$.

584: Thus no process runs past round $r+1$ without deciding $b$.

585:

586: To show agreement in earlier rounds, let $P'$ decide $b'$ at round $r'

587: \le r$.  By the preceding argument, if $P'$ decides $b'$ at round

588: $r'$, then no process sets $a_{1-b'}[r']$ and thus (by

589: Lemma~\ref{lemma-converging-preferences} again) no process sets

590: $a_{1-b'}[r]$.  But since $P$ sets $a_{b}[r]$, we must have $b'=b$.

591: \end{proof}

592:

593: \section{Termination with Noisy Scheduling}

594: \label{section-noisy-termination}

595:

596: In this section, we show that \algname{} terminates in $\Theta(\log n)$

597: rounds with noisy scheduling and random failures.  (This analysis includes the

598: core model without random failures as well, since the adversary can

599: always choose $h(n)=0$.)

600: We show that

601: either all processes die (in which case we treat the algorithm as

602: terminating in the last round in which some process takes a step), or some

603: group of processes with a common preference eventually gets two

604: rounds ahead of the other processes.  To avoid analyzing the

605: details of how processes shift preferences, we will show the even

606: stronger result that unless all processes die,

607: a \emph{single} process eventually gets two rounds

608: ahead of the other processes.

609:

610: To simplify the argument, we abstract away from the individual

611: sequence of operations in each round and look only at the times at

612: which rounds are completed.  We can thus assume that the adversary

613: provides a single noise distribution $F$ (corresponding to the

614: distribution of the sum of the delays on three reads and one write)

615: and that the values $\Delta_{ij}$, $X_{ij}$, and $H_{ij}$

616: provide the delay not

617: on the $j$-th \emph{operation} but on the $j$-th \emph{round}.

618: Since this abstraction merely involves summing together the underlying

619: variables on operations, it does not reduce the adversary's control

620: over the protocol.  We will scale $M$ appropriately so that it is

621: still the case that $0 \le \Delta_{ij} \le M$ when $j > 0$.

622:

623: Using this approach, the increment $\Delta_{ij} + X_{ij} + H_{ij}$

624: is the time taken for

625: process $i$ to move from the end of round $j-1$ to the end of

626: round $j$.  The

627: constant $\Delta_{i0}$ represents the process's starting time, and

628: $S'_{ir} = \Delta_{i0} + \sum_{j=1}^{r} \left(\Delta_{ij} +

629: X_{ij} + H_{ij}\right)$

630: gives the time at which the process finishes round $r$.  A process $i$ wins

631: the race with a lead of $c$ rounds

632: at round $r+c$ if it finishes round $r+c$ before any other

633: process finishes round $r$, i.e., if $S'_{i,r+c} \le S'_{i',r}$ for all

634: $i' \ne i$.
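The winning condition can be checked mechanically on a table of round-completion times.  The Python sketch below is an illustration under assumptions: the name \texttt{first\_win} is hypothetical, \texttt{finish[i][r]} stands for $S'_{ir}$ (with index $0$ holding the starting time $\Delta_{i0}$), and a halted process is represented by infinite completion times.

```python
import math


def first_win(finish, c):
    """Find the earliest win with a lead of c rounds.

    finish[i][r] is the time at which process i finishes round r
    (index 0 is the starting time; math.inf marks rounds never finished).
    Returns (r + c, i) for the smallest r >= 1 such that process i finishes
    round r + c no later than every other process finishes round r, or None
    if no process wins within the rounds supplied.
    """
    n = len(finish)
    rounds = min(len(row) for row in finish)
    for r in range(1, rounds - c):
        for i in range(n):
            mine = finish[i][r + c]
            if not math.isfinite(mine):
                continue  # process i halted before finishing round r + c
            if all(mine <= finish[k][r] for k in range(n) if k != i):
                return r + c, i
    return None
```

A process that is steadily faster than its rivals wins after a few rounds, while processes that advance at exactly the same rate never produce a winner, which matches the intuition that the adversary must keep the leaders in lockstep to prevent termination.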

635:

636: We would like to show a bound on how the expected round at which some

637: process wins by $c$ scales as a function of the number of processes $n$,

638: keeping $c$, $M$, and $F$ fixed.

639: This bound is given in Corollary \ref{corollary-race} below.

640: We will assume that $h(n) = o(1)$, as otherwise all processes die after

641: $O(\log n)$ rounds on average.

642: The proof proceeds in two steps: first we show that for \emph{any} $r$

643: which some process finishes with at least constant probability, there

644: exists a critical time $t$ that gives at least a constant probability

645: that $S'_{ir} \le t$ for exactly one $i$.  We then show that if $r$ is

646: large enough, $\Pr[S'_{i,r+c} \le t | S'_{ir} \le t]$ is also at least a

647: constant.  It then follows that the probability that $S'_{i,r+c} \le t$

648: while

649: $S'_{i'r} > t$ for all $i' \ne i$ is at least the product of these two

650: constants and the constant probability that $p_i$ is not killed

651: between rounds $r$ and $r+c$.

652: Thus after a constant number of phases each consisting

653: of $r+c$ rounds we expect some process to win.

654:

655: \subsection{Existence of a winner}

656:

657: In this section, we build up the tools needed to show that for each

658: round there exists a fixed time at which there is likely to be a

659: unique winner.

660:

661: \begin{lemma}

662: \label{lemma:one-vs-zero}

663: Let $A_1, \ldots, A_n$ be independent events.

664: If the probability that no $A_i$ occurs is $x$,

665: where $x$ is not zero,

666: then the probability that exactly one $A_i$ occurs is

667: at least $-x \ln x$.

668: \end{lemma}

669: \begin{proof}

670: Let $q_i$ be the probability that $A_i$ does not occur.

671: The probability $x$ that no $A_i$ occurs is the product of the $q_i$.

672: Since $x$ is nonzero, each $q_i$ must also be nonzero.

673: The probability that exactly one $A_i$ occurs is given by

674: \begin{eqnarray}

675: \left(\prod_{i=1}^{n} q_i\right) \sum_{i=1}^{n} \frac{1-q_i}{q_i}

676: &=&

677: x \sum_{i=1}^{n} \left(\frac{1}{q_i} - 1 \right)

678: \nonumber

679: \\

680: &=&

681: x \left(-n + \sum_{i=1}^{n} \frac{1}{q_i}\right).

682: \label{eq:one-vs-zero}

683: \end{eqnarray}

684:

685: Let $G$ be the geometric mean of the $q_i$ and let $H$ be their

686: harmonic mean.  By the theorem of the means, $G \ge H$.

687: Observe that $G = x^{1/n}$ and

688: \begin{displaymath}

689: \sum_{i=1}^{n} \frac{1}{q_i} = n/H

690: \ge n/G = n x^{-1/n} = n \exp\left(-\frac{\ln x}{n}\right)

691: \ge n \left(1 - \frac{\ln x}{n}\right)

692: = n - \ln x.

693: \end{displaymath}

694: Plugging this inequality into (\ref{eq:one-vs-zero}) gives the result.

695: \end{proof}
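As a sanity check on Lemma~\ref{lemma:one-vs-zero} (the numbers are an illustrative choice, not part of the argument), take $n = 2$ independent events, each of probability $\frac{1}{2}$.  Then $q_1 = q_2 = \frac{1}{2}$ and $x = \frac{1}{4}$, and
\begin{displaymath}
\Pr[\mbox{exactly one $A_i$ occurs}]
= x \left(-2 + \frac{1}{q_1} + \frac{1}{q_2}\right)
= \frac{1}{4}(-2 + 4)
= \frac{1}{2}
\ge -x \ln x = \frac{\ln 4}{4} \approx 0.35.
\end{displaymath}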

696:

697: Suppose $X_1, \ldots, X_n$ are random times.  The following lemma

698: shows that under certain conditions there exists a fixed time

699: $t_0$,

700: such that, with constant probability, exactly one of the $X_i$ is at most

701: $t_0$:

702:

703: \begin{lemma}

704: \label{lemma-winner}

705: Let $X_1, \ldots, X_n$ be

706: independent random variables such that for

707: all finite values $t$ and all distinct $i, j$, the probability that

708: $X_i = X_j = t$ is zero.

709: Then either $\Pr[\forall i X_i = \infty]$ is greater than $e^{-1}$ or

710: there exists $t_0$ such that the

711: probability that exactly one of the $X_i$

712: is less than or equal to $t_0$

713: is at least $1/5$.

714: \end{lemma}

715: \begin{proof}

716: For each $t$, let $q_i(t)$ be the probability that $X_i$ is not

717: less than or equal to $t$.

718: Let $q(t) = \prod_{i=1}^{n} q_i(t)$

719: be the probability that none of the $X_i$

720: are

721: less than or equal to $t$.

722: Note that each $q_i(t)$ is

723: a nonincreasing right-continuous left-limited function with

724: $\lim_{t \rightarrow -\infty} q_i(t) = 1$

725: and

726: $\lim_{t \rightarrow \infty} q_i(t) = \Pr[X_i = \infty]$.

727: Similarly, $q(t) = \prod_{i} q_i(t)$ is right-continuous,

728: left-limited, and has

729: $\lim_{t \rightarrow -\infty} q(t) = 1$

730: and

731: $\lim_{t \rightarrow \infty} q(t) = \Pr[\forall i X_i = \infty]$.

732:

733: Suppose that this latter quantity is less than or equal to $e^{-1}$.

734: (If not, the first case of the lemma holds.)

735: Then for some finite $t$, $q(t) \le e^{-1}$.

736: Let $t_0$ be the least such $t$.

737:

738: Now suppose $q(t_0) \ge e^{-2}$.

739: Then, by Lemma \ref{lemma:one-vs-zero},

740: the probability that exactly one $X_i$ is less than or equal to $t_0$

741: is at least $2 e^{-2} \approx 0.27\ldots$.

742:

743: Otherwise, we have $q(t_0) < e^{-2}$ but

744: $q(t_0-) = \lim_{t \rightarrow t_0-} q(t) > e^{-1}$.

745: (We are using the usual convention

746: that $f(x-)$ denotes the left limit

747: of $f$ at $x$.)

748: This discontinuity must correspond to a discontinuity

749: in $q_i$ for some $i$.

750: At most one $q_i$ has a discontinuity at $t_0$,

751: by the assumption that the probability that distinct $X_i$, $X_j$

752: both equal $t_0$ is zero.

753: Hence, for all $j \ne i$ we have

754: $q_j(t_0-) = q_j(t_0)$

755: and thus

756: $q_i(t_0)/q_i(t_0-) = q(t_0)/q(t_0-) < e^{-2}/e^{-1} = e^{-1}$.

757:

758: Since $q_i(t_0-) \le 1$, it follows immediately that

759: $q_i(t_0) \le e^{-1}$

760: and thus the probability that $X_i$

761: is less than or equal to $t_0$

762: is at least $1 - e^{-1}$.

763: Now the probability that no other $X_j$

764: is less than or equal to $t_0$

765: is at least $q(t_0)/q_i(t_0) \ge q(t_0-) > e^{-1}$.

766: Since the variables are independent, the probability that

767: only $X_i$

768: is less than or equal to $t_0$

769: is thus at least $(1-e^{-1})e^{-1} \approx 0.23\ldots$.

770: \end{proof}
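To see the second case of the proof in action, consider the following hypothetical pair of variables (chosen only for illustration): $X_1$ equals $1$ with probability $0.9$ and $\infty$ otherwise, while $X_2$ is uniform on $(0,10)$.  Then $q(t) = 1 - t/10 \ge 0.9$ for $0 \le t < 1$, but
\begin{displaymath}
q(1) = q_1(1)\, q_2(1) = 0.1 \cdot 0.9 = 0.09 < e^{-2} \approx 0.135,
\end{displaymath}
so $t_0 = 1$ and $q$ jumps from $q(t_0-) = 0.9$ past the whole interval $[e^{-2}, e^{-1}]$.  The probability that only $X_1$ is less than or equal to $t_0$ is $0.9 \cdot 0.9 = 0.81$, comfortably above the bound $(1-e^{-1})e^{-1}$.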

771:

772: \subsection{Size of the lead}

773:

774: In this section, we show that if enough rounds have passed, a

775: process that is likely to be ahead of the others is in fact likely to

776: be several rounds ahead.  The proof is somewhat complicated by the

777: lack of restrictions on the noise distribution, but the following

778: lemma shows how the Strong Law of Large Numbers can be used to smooth

779: the noise terms out a bit.

780:

781: \begin{lemma}

782: \label{lemma-gap}

783: Let $X_1, X_2, \ldots$ be finite non-negative independent identically

784: distributed random variables whose common distribution is

785: not concentrated on a point.  Define $S_n =

786: \sum_{i=1}^{n} X_i$.  For any $c$, there exist

787: $n, t$ such that $\Pr[S_n < t] < \frac{1}{2}$

788: but $\Pr[S_n < t-c] > 0$.

789: \end{lemma}

790: \begin{proof}

791: Let us first consider the case where $X_i$ has a finite expectation

792: $m$.  Then the Strong Law of Large Numbers says that

793: $S_n/n$ converges to $m$ in the limit with probability $1$.

794: So for any $\epsilon > 0$, the probability that $S_n/n$ is less than

795: $m-\epsilon$ goes to zero and thus drops below $1/2$ for all $n$

796: greater than some $n_0$.

797:

798: Let $t_n = n(m-\epsilon)$.  As long as $n > n_0$, we have

799: $\Pr[S_n < t_n] < \frac{1}{2}$.

800: Now suppose that $\Pr[S_n < t_n-c] = 0$ whenever $n > n_0$.  Since the

801: $X_i$ are independent, this event can only occur if for each $X_i$,

802: $X_i < \frac{t_n - c}{n} = m-\epsilon-\frac{c}{n}$ with probability $0$.

803: Taking the union of countably many

804: such bad events for each rational $\epsilon$ and

805: each $n > n_0$ shows that the event $X_i < m$

806: also has probability $0$.  It follows that $X_i \ge E[X_i]$ almost

807: surely and thus the distribution of $X_i$ is concentrated on

808: $E[X_i]$, a contradiction.

809:

810: If $X_i$ does not have a finite expectation, then $S_n/n$ grows without

811: bound with probability $1$ (see the corollary to Theorem 22.1 in

812: \cite{Billingsley}).  So for any $x$, there exists $n_0$, such that

813: $\Pr[S_n/n < x] < \frac{1}{2}$ for $n > n_0$.  We repeat the above

814: analysis for $t = nx$; if $\Pr[S_n < t-c] = 0$ for all such $t$, we get

815: $X_i \ge x - \frac{c}{n}$ almost surely, implying $X_i$ exceeds any

816: finite bound $x$.  Again, a contradiction.

817: \end{proof}
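For example (an illustrative distribution, not one used elsewhere in the paper), let each $X_i$ be $1$ or $2$ with equal probability and take $c = 1$.  With $n = 2$ no threshold works, since $\Pr[S_2 < t] < \frac{1}{2}$ forces $t \le 3$ and then $\Pr[S_2 < t - 1] = 0$.  With $n = 4$, however, $S_4 - 4$ is binomial with parameters $4$ and $\frac{1}{2}$, and choosing $t = 6$ gives
\begin{displaymath}
\Pr[S_4 < 6] = \frac{1}{16} + \frac{4}{16} = \frac{5}{16} < \frac{1}{2},
\qquad
\Pr[S_4 < 5] = \frac{1}{16} > 0.
\end{displaymath}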

818:

819: Once the noise terms have been smoothed, it is not hard to show that

820: they eventually accumulate enough to push a winner ahead:

821:

822: \begin{lemma}

823: \label{lemma-gap-relative}

824: Fix $c > 0$.

825: Let $X_1, X_2, \ldots$ be finite independent identically distributed random

826: variables such that there exists a threshold $t_0$ for which

827: $\Pr[X < t_0] < \frac{1}{2}$ but

828: $\Pr[X < t_0-c] = \delta_0 > 0$.

829: Define $S_n = \sum_{i=1}^{n} X_i$.

830:

831: Then for any $\epsilon > 0$, there exists an $n =

832: O(\log(1/\epsilon))$,

833: such that for any $t$,

834: $\Pr[S_n < t] > \epsilon$

835: implies

836: $\Pr[S_n < t - c | S_n < t] > \frac{1}{7}\delta_0$.

837: \end{lemma}

838: \begin{proof}

839: Set $n = 8(\ln(1/\epsilon)+1)$.  Each $X_i$ has probability at most

840: $1/2$ of being less than $t_0$, so a simple application of Chernoff

841: bounds shows that the probability that 3/4 or more of the $X_i$ are

842: less than $t_0$ is at most

843: $e^{-n/8} = \epsilon/e$.

844:

845: We will use this fact to argue that even when conditioning on $S_n <

846: t$, there is nearly one chance in four that $X_n$ in particular is

847: greater than $t_0$.  In this case, $S_{n-1}$ is less than $t-t_0$ and

848: we can use independence to replace $X_n$ with a new value less than

849: $t_0 - c$, giving a sum $S_n$ less than $t-c$, all without reducing the

850: probability by much.

851:

852: Formally, we have the following sequence of inequalities, each of

853: which is implied by the previous one.  Let

854: $\Pr[S_n < t] = p$ and suppose $p > \epsilon$.  Then we have:

855:

856: \begin{eqnarray*}

857: \Pr[S_n < t] &=& p \\

858: \Pr[S_n < t \wedge

859: \mbox{at least $\frac{1}{4}$ of $X_i$ are

860: greater than $t_0$}]

861: &>& p-\epsilon/e \\

862: \Pr[S_n < t \wedge X_n > t_0] &>& \frac{1}{4}(p-\epsilon/e) \\

863: \Pr[S_{n-1} < t - t_0] &>& \frac{1}{4}(p-\epsilon/e) \\

864: \Pr[S_{n-1} < t - t_0 \wedge X_n < t_0-c]

865: &>&

866: \frac{1}{4}(p-\epsilon/e)\delta_0 \\

867: \Pr[S_n < t - c]

868: &>&

869: \frac{1}{4}(p-\epsilon/e)\delta_0 \\

870: \Pr[S_n < t - c | S_n < t]

871: &>&

872: \frac{1}{4}(p-\epsilon/e)\delta_0/p

873: \end{eqnarray*}

874:

875: Since $p > \epsilon$, this last quantity is at least

876: $\frac{1}{4}(1-1/e)\delta_0$, which is in turn greater than

877: $\frac{1}{7}\delta_0$.

878: \end{proof}
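To get a feel for the constants (the value of $\epsilon$ here is an arbitrary illustrative choice), take $\epsilon = 10^{-2}$.  Then
\begin{displaymath}
n = 8(\ln 100 + 1) \approx 8 \cdot 5.61 \approx 44.8,
\end{displaymath}
so about $45$ summands suffice, and the final bound evaluates to $\frac{1}{4}(1-1/e)\delta_0 \approx 0.158\,\delta_0$, which exceeds $\frac{1}{7}\delta_0 \approx 0.143\,\delta_0$.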

879:

880: We can now combine Lemmas \ref{lemma-gap} and

881: \ref{lemma-gap-relative} into the following:

882:

883: \begin{lemma}

884: \label{lemma-gap-combined}

885: Let $X_1, X_2, \ldots$ be finite

886: non-negative independent identically distributed

887: random variables whose common distribution is not

888: concentrated on a point.  Define $S_n = \sum_{i=1}^{n} X_i$.

889: Fix $c > 0$.  Then there is a constant $\delta$,

890: such that for any $\epsilon > 0$, there exists $n =

891: O(\log(1/\epsilon))$,

892: such that $\Pr[S_n < t - c | S_n < t] > \delta$

893: whenever $\Pr[S_n < t] > \epsilon$.

894: \end{lemma}

895: \begin{proof}

896: Use Lemma \ref{lemma-gap} to group the $X_i$ together into partial

897: sums $Y_i = \sum_{j=i n_0+1}^{i n_0 + n_0} X_j$ with the property that

898: for some $t$

899: $\Pr[Y_i < t] < \frac{1}{2}$ but $\Pr[Y_i < t - c] = \delta_0 > 0$.

900: (Note that $n_0$ does not depend on $\epsilon$, so it disappears into

901: the constant factor.)

902: Then apply Lemma \ref{lemma-gap-relative} to sums of these $Y_i$

903: variables to get the full result.

904: \end{proof}

905:

906: \subsection{When the Race Ends}

907:

908: In this section, we show that a race between $n$ independent delayed

909: renewal processes with bounded added delays ends in $O(\log n)$

910: rounds with at least constant probability.  In the

911: following section, we

912: translate this result, which appears as Corollary

913: \ref{corollary-race}, back into terms of the \algname{} algorithm

914: to get Theorem \ref{theorem-noisy}.

915:

916: \begin{theorem}

917: \label{theorem-race}

918: Let $\{X_{ij}\}$, where $i, j \ge 1$,

919: be a two-dimensional array of finite non-negative

920: independent identically

921: distributed random variables with a common distribution function $F$

922: that is not concentrated on a point.

923: Let $\{\Delta_{ij}\}$, where $i \ge 1, j \ge 0$, be a

924: two-dimensional array of constants with $0 \le \Delta_{ij} \le M$

925: when $j \ge 1$.

926: Let $\{H_{ij}\}$, where $i, j \ge 1$, be a two-dimensional array of

927: independent random variables, each of which is equal to $\infty$ with

928: probability $h(n)$ and $0$ otherwise.

929: Define

930: \[S'_{ir} = \Delta_{i0} + \sum_{j=1}^{r} \left(\Delta_{ij} +

931: X_{ij} + H_{ij}\right).\]

932: Assume that for any finite $t$, integer $r$, and $i\ne j$,

933: $\Pr[S'_{ir} = S'_{jr} = t] = 0$.

934: Let $c$ be any integer constant greater than $0$.

935:

936: Then there exists a constant $\delta > 0$, such that

937: for any $n$, there exists $r=O(\log n)$ and $t$,

938: such that

939: \begin{displaymath}

940: \Pr

941: \left[

942: \forall i \> S'_{ir} = \infty

943: \vee

944: \left(

945: \exists i \le n: S'_{i,r+c} < t \wedge \forall i' \ne i, i' \le n: S'_{i'r} > t

946: \right)

947: \right]

948: >

949: \delta.

950: \end{displaymath}

951:

952: The constant factor in $r=O(\log n)$ and

953: the constant $\delta$ may depend on $c$, $F$, $M$, and $h$; but

954: neither constant

955: depends on $n$.

956: \end{theorem}

957: \begin{proof}

958: Since each $X_{ij}$ is finite with probability $1$, there exists some

959: constant $c_1$ such that

960: $\Pr[\sum_{j=r+1}^{r+c} X_{ij} < c_1] > \frac{1}{2}$.

961: Let $T_{ir} = \sum_{j=1}^{r} X_{ij}$

962: and let $S_{ir} = T_{ir} + \sum_{j=0}^{r} \Delta_{ij}$.

963: Apply Lemma \ref{lemma-gap-combined} to the sequence

964: $X_{ij}$ with the lemma's $c$ set to $cM + c_1$

965: and $\epsilon = n^{-2}$

966: to obtain $r = O(\log n)$ and a constant $\delta_0$ for which

967: $\Pr[T_{ir} < t-cM - c_1 | T_{ir} < t] > \delta_0$

968: whenever

969: $\Pr[T_{ir} < t] > n^{-2}$.

970: Adding the missing constant terms $\sum_{j=0}^{r} \Delta_{ij}$

971: to $T_{ir}$ to get $S_{ir}$

972: is equivalent to subtracting these same terms from each occurrence of

973: $t$,

974: so we in fact have

975: $\Pr[S_{ir} < t-cM - c_1 | S_{ir} < t] > \delta_0$

976: whenever

977: $\Pr[S_{ir} < t] > n^{-2}$.

978: This gives us our target round $r$.

979:

980: Now apply Lemma \ref{lemma-winner}

981: to $S'_{ir}$, for all $i \le n$, to show that with probability at least $1/5$

982: either $\forall i S'_{ir} = \infty$ or

983: there exists a time $t_0$,

984: such that there is a unique winner $i \le n$

985: for which $S'_{ir}$ is less than $t_0$.

986: Let us assume without loss of generality that $n$ is at least

987: $6$.

988: Throw out all cases where $i$ has $\Pr[S'_{ir} < t_0] \le n^{-2}$;

989: this leaves a probability of at least $1/5 - 1/n \ge 1/30$ that

990: (a) there is a unique winner $i$, and (b) $i$ satisfies the condition

991: $\Pr[S'_{ir} < t_0] > n^{-2}$, implying

992: $\Pr[S_{ir} = S'_{ir} < t_0] > n^{-2}$ and thus

993: $\Pr[S_{ir} < t_0-cM - c_1 | S_{ir} < t_0] > \delta_0$.

994: So with probability at least $\frac{1}{30}\delta_0$,

995: we have $S_{ir} < t_0 - cM - c_1$,

996: and thus

997: with probability at least $\frac{1}{60}\delta_0$

998: we have

999: $S_{i,r+c} < S_{ir} + cM + c_1 = S'_{ir} + cM + c_1 < t_0$.

1000:

1001: Suppose that this event holds.  It is still possible for $S'_{i,r+c}$

1002: to be infinite if $\sum_{j=r+1}^{r+c} H_{ij} = \infty$.  Call this

1003: event $I$; if $\Pr[I] = 1 - (1-h(n))^c > \frac{1}{120}\delta_0$, then

1004: $h(n)$ is bounded below by a constant and there exists $r' = O(\log

1005: n)$ such that $\Pr[\forall i S'_{ir'} = \infty]$ is at least a

1006: constant.  Alternatively, we have

1007: \(

1008: \Pr[S'_{i,r+c} = S_{i,r+c} < S'_{ir} + cM + c_1]

1009: > \delta = \frac{1}{120}\delta_0.

1010: \)

1011: In either case, the theorem holds.

1012: \end{proof}

1013:

1014: \begin{corollary}

1015: \label{corollary-race}

1016: Let

1017: $R$ be the first round for which either

1018: \begin{itemize}

1019: \item

1020: There

1021: exists $i$, such that

1022: $S'_{i,R+c} < S'_{i'R}$ for all $i' \ne i$, or

1023: \item

1024: For all $i$, $S'_{i,R+c} = \infty$.

1025: \end{itemize}

1026: Under the conditions of the preceding theorem,

1027: $\E[R] = O(\log n)$, and, for any $k \ge 0$,

1028: $\Pr[R > k] \le e^{-\floor{k/O(\log n)}}$.

1029: \end{corollary}

1030: \begin{proof}

1031: Theorem \ref{theorem-race} says that the

1032: desired event occurs with constant probability $\delta$ after

1033: a phase consisting of $r = O(\log n)$

1034: rounds.  If it does not occur, we can apply the theorem again to the

1035: subset of the $i$'s for which $S'_{i,r+c}$ is finite,

1036: starting with round $r+c+1$ and setting

1037: the initial delay $\Delta_{i0}$ to the value of

1038: $S'_{i,r+c}$ from the previous phase.

1039:

1040: On average, at most $1/\delta = O(1)$ such phases are needed, giving

1041: $\E[R] \le (1/\delta) r = O(\log n)$.

1042: For the exponential tail bound, observe that the probability that the

1043: algorithm runs for more than $m$ phases of $r$ rounds each is at most

1044: $(1-\delta)^{m} = \left((1-\delta)^{1/\delta}\right)^{m \delta}

1045: \le \left(e^{-1}\right)^{m \delta} = e^{-m \delta}$.

1046: So the probability that the algorithm runs for more than $k$ rounds is

1047: at most $e^{-\floor{k/r}\delta} \le e^{-\floor{k/O(\log n)}}$.

1048: \end{proof}
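As an illustration of the tail bound (the value of $\delta$ is assumed purely for the sake of the example), suppose $\delta = \frac{1}{10}$.  Then the expected number of phases is at most $1/\delta = 10$, and the probability that more than $10$ phases are needed is at most
\begin{displaymath}
\left(1 - \frac{1}{10}\right)^{10} \approx 0.349 \le e^{-1} \approx 0.368.
\end{displaymath}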

1049:

1050: \subsection{When \algname{} Ends}

1051:

1052: Translating Corollary \ref{corollary-race} back into terms of the

1053: \algname{} algorithm gives:

1054:

1055: \begin{theorem}

1056: \label{theorem-noisy}

1057: Under the noisy scheduling model with random failures,

1058: starting from any reachable state in the \algname{} algorithm in which

1059: the largest round number of any process is $r$, the algorithm running

1060: with $n$ active processes

1061: terminates by round $r+r'$, where $r'$ has expected value $O(\log n)$

1062: and $\Pr[r' > k] \le e^{-\floor{k/O(\log n)}}$ for any $k \ge 0$.

1063: \end{theorem}

1064: \begin{proof}

1065: Apply Corollary \ref{corollary-race} with $c=2$ and the initial delays

1066: $\Delta_{i0}$ set to the times at which each process completes round

1067: $r$.  This shows that after $R$ additional

1068: rounds, where $\E[R] = O(\log n)$ and

1069: $\Pr[R > k] \le e^{-\floor{k/O(\log n)}}$, either some process

1070: $P$

1071: finishes some round $s$ before any other process finishes round $s-2$,

1072: or all processes fail.

1073: In the first case, if $P$ prefers $b$, it is the only process

1074: to have written to $a_b[s-1]$ or $a_{1-b}[s-1]$ by the time it reads

1075: $a_{1-b}[s-1]$ as part of round $s$.

1076: Thus it reads a zero from $a_{1-b}[s-1]$ and decides.  All other

1077: processes decide at most one round later by Lemma

1078: \ref{lemma-agreement}.  We thus get $r' \le R+1$, and the single extra round

1079: disappears into the constant factors.

1080: \end{proof}

1081:

1082: It is not hard to see that an $O(\log n)$ bound

1083: is the best possible, up to

1084: constant factors.

1085:

1086: \begin{theorem}

1087: \label{theorem-lower-bound}

1088: There exists a noise distribution $F$ and a set of delays $\Delta$

1089: such that the \algname{} algorithm requires expected $\Omega(\log n)$

1090: rounds in the noisy scheduling model, even without failures.

1091: \end{theorem}

1092: \begin{proof}

1093: Let all $\Delta_{ij} = 0$ for $j > 0$, and let $F$ have each

1094: operation take either $1$ or $2$ time units with equal probability.

1095: Then any single process completes its first $\log n$

1096: operations in $1$ time unit each with probability $2^{-\log n} = 1/n$ (taking logarithms base $2$).

1097: To avoid simultaneous operations, let $\Delta_{i0}$ be some small

1098: distinct epsilon value for each $i$.

1099:

1100: Start $n/2$ processes with input $0$ and $n/2$ with input $1$.

1101: The probability that there exists at least one $0$-input process and

1102: at least one $1$-input process that both complete their first $\log n$

1103: operations in $1$ time unit each is given by

1104: \begin{displaymath}

1105: \left(1-\left(1-\frac{1}{n}\right)^{n/2}\right)^2

1106: \end{displaymath}

1107: which goes to $(1-e^{-1/2})^2 = \Theta(1)$ in the limit as $n$ grows.

1108: So there is a constant probability that at least one

1109: process with each input runs for $\log n$ operations without ever

1110: changing its preference to that of

1111: a faster process with the opposite preference, and we get expected

1112: $\Omega(\log n)$ rounds of disagreement.

1113: \end{proof}
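For a concrete instance of this bound (taking logarithms base $2$ and $n = 16$ as an illustrative choice), each process completes its first $4$ operations in $1$ time unit each with probability $2^{-4} = \frac{1}{16}$, and
\begin{displaymath}
\left(1 - \left(1 - \frac{1}{16}\right)^{8}\right)^2
\approx (1 - 0.597)^2 \approx 0.163,
\end{displaymath}
already close to the limiting value $(1 - e^{-1/2})^2 \approx 0.155$.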

1114:

1115: \section{Termination with Quantum and Priority-Based Scheduling}

1116: \label{section-hybrid-termination}

1117:

1118: In this section, we consider the question of termination subject to

1119: hybrid quantum and priority-based scheduling on a uniprocessor.  The

1120: required quantum size is 8 operations; curiously, this is the same

1121: size required for the specialized algorithm given in

1122: \cite{AndersonM1999}.  We see this coincidence as hinting at the

1123: possibility

1124: that all shared-memory

1125: consensus algorithms may ultimately converge to a single ideal

1126: algorithm (though such an ideal algorithm, if it exists,

1127: is probably not identical to \algname{}).

1128:

1129: \begin{theorem}

1130: When running \algname{}

1131: in a hybrid-scheduled system with

1132: a quantum of at least 8 operations,

1133: every process decides after executing at

1134: most 12 operations.

1135: \end{theorem}

1136: \begin{proof}

1137: We will show that at most one of $a_0[1]$ and $a_1[1]$ is set before

1138: some process finishes round $2$ and decides.

1139: Consider an execution in which $a_0[1]$ and $a_1[1]$ are each set at

1140: some point.  Let $P_0$ and $P_1$ be the first processes to set

1141: $a_0[1]$ and $a_1[1]$, respectively.  Neither $P_0$ nor $P_1$ can have

1142: observed the round-$1$ write of the other, or it would have changed

1143: its preference.  Thus both processes' round-$1$ reads of $a_0[1]$ and

1144: $a_1[1]$ must have occurred before either performed its round-$1$

1145: write.  Since we are on a uniprocessor, this can only occur if one of

1146: the processes was pre-empted before its write occurred.

1147:

1148: Assume without loss of generality that $P_0$ is this unlucky process.

1149: Since $P_0$ is the first process to write to $a_0[1]$, if we can show

1150: that $P_0$ is not rescheduled before some process completes round $2$,

1151: then that process decides $1$ (and by Lemma \ref{lemma-agreement}, all

1152: processes eventually decide $1$) as soon as it observes a zero in

1153: $a_0[1]$.  So we need only show that $P_0$ is not rescheduled until

1154: some other process completes eight operations.

1155:

1156: Let $Q_1$ be the process that pre-empts $P_0$.  At the time of

1157: pre-emption, $Q_1$ is at the start of a quantum; it either finishes

1158: eight operations without being pre-empted or is pre-empted by a

1159: higher-priority process $Q_2$.  But $Q_2$ in turn can only be

1160: pre-empted before completing its quantum by some higher-priority

1161: process $Q_3$.  After at most $n$ such pre-emptions, we run out of

1162: higher-priority processes, and the last process runs to the end of its

1163: quantum and decides.  Note that all of the processes in this chain

1164: (except possibly $Q_1$) have a higher priority than $P_0$ and thus

1165: cannot be equal to $P_0$.  It follows that some process finishes round

1166: $2$ before $P_0$ is rescheduled, and thus every process decides $1$ by

1167: the end of round $3$.

1168: \end{proof}
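The operation counts in the theorem line up with round boundaries, assuming (as in the round abstraction used for the noisy scheduling analysis) that each round of \algname{} consists of three reads and one write:
\begin{displaymath}
8 = 2 \cdot (3 + 1), \qquad 12 = 3 \cdot (3 + 1).
\end{displaymath}
Under that assumption, a quantum of $8$ operations is exactly enough to execute two full rounds without interruption, and $12$ operations correspond to the end of round $3$.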

1169:

1170: \section{Bounded space \algname}

1171: \label{section-bounded-memory}

1172:

1173: The \algname algorithm as described in Section~\ref{section-algorithm}

1174: requires infinite space.  In this section, we describe how to modify

1175: the algorithm to use bounded space.  We assume that we have available

1176: a \emph{backup protocol}, which is a

1177: bounded-space consensus protocol that requires polynomial work per

1178: process (for example, the $O(n^4)$ protocol in \cite{Aspnes93} works).

1179: We will build a protocol that combines \algname with the backup

1180: protocol in a way that only uses the backup protocol rarely, so that

1181: its high cost adds only a constant to the $O(\log n)$ cost of the

1182: combined protocol.

1183:

1184: Note that such a combined protocol is not necessary in the model of

1185: Section~\ref{section-hybrid-termination}, as in that model we only

1186: need space for $3$ rounds of \algname.

1187:

1188: The combined protocol operates as follows:

1189: \begin{enumerate}

1190: \item Run \algname through round $\rmax$.

1191: \item At round $\rmax+1$, switch to the backup protocol, using the

1192: preference at the end of round $\rmax$ of \algname as input to the backup

1193: protocol.

1194: \end{enumerate}

1195:

1196: If $\rmax$ is large enough, most of the time we will expect that

1197: \algname{} terminates before reaching $\rmax$ and the backup

1198: algorithm will not be used.  But in the case where $\rmax$ is

1199: reached (say, because the scheduler is nastier than we have assumed),

1200: the backup algorithm guarantees termination using bounded space and

1201: bounded (but possibly very large) expected time.

1202:

1203: \begin{theorem}

1204: \label{theorem-bounded-memory}

1205: For any polynomial-work consensus protocol chosen as a backup

1206: algorithm and any noise distribution,

1207: there is a choice of $\rmax = O(\log^2 n)$ such that the combined

1208: algorithm described above is a consensus protocol that

1209: requires $O(\log n)$ expected operations per

1210: process and $O(\log^2 n)$ bits in the $a_0$ and $a_1$ arrays.

1211: \end{theorem}

1212: \begin{proof}

1213: First let us show that the combined algorithm solves consensus.

1214: Validity is immediate from Lemma~\ref{lemma-validity}; when all inputs

1215: are equal, we never get past round $2$ and the combined algorithm

1216: behaves identically to \algname.

1217: For agreement, the only tricky case is when some processes decide

1218: during \algname and others decide during the backup protocol.  But if

1219: some process $P$ decides $b$ at or before round $r$, then by

1220: Lemmas~\ref{lemma-converging-preferences} and~\ref{lemma-agreement}

1221: no process writes $a_{1-b}[r]$ and every process that executes the

1222: backup protocol has $b$ as input.  Thus the validity condition for the

1223: backup protocol implies that all processes decide $b$.

1224:

1225: Now let us show that there is a choice of $\rmax$ that gives the

1226: desired performance bound.  Suppose each process finishes the backup

1227: protocol in $O(n^c)$ expected operations.  By

1228: Theorem~\ref{theorem-noisy}, there is a value $T = O(\log n)$ such

1229: that the probability that \algname does not finish by round $k$ is at

1230: most $e^{-\floor{k/T}}$.

1231: Setting $\rmax = T\cdot \lceil c \ln n \rceil =

1232: O(\log^2 n)$,

1233: the backup protocol is run with probability at most $e^{-\lceil c \ln n \rceil} \le

1234: n^{-c}$, and thus it contributes at most $n^{-c} O(n^c) = O(1)$ to the

1235: expected cost.

1236:

1237: Finally, the size of the $a_0$ and $a_1$ arrays is clearly equal to

1238: $\rmax = O(\log^2 n)$.

1239: \end{proof}
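For instance, with the $O(n^4)$ backup protocol mentioned above we have $c = 4$.  Taking the logarithm to be natural and ignoring rounding to integers (both assumptions made here only so that the exponent comes out exactly), the calculation becomes
\begin{displaymath}
\rmax = 4 T \ln n,
\qquad
\Pr[\mbox{backup protocol is run}] \le e^{-4 \ln n} = n^{-4},
\qquad
n^{-4} \cdot O(n^4) = O(1).
\end{displaymath}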

1240:

1241: \section{Simulation Results}

1242: \label{section-simulations}

1243:

1244: \begin{figure*}

1245: \begin{center}

1246: \input{rounds.ps}

1247: \end{center}

1248: \caption{Results of simulating \algname{} with various interarrival

1249: distributions.}

1250: \label{figure-rounds}

1251: \end{figure*}

1252:

1253: Figure \ref{figure-rounds} gives the results of simulating \algname{}

1254: with various interarrival distributions.  These simulations are of the

1255: model as described in Section \ref{section-noisy}; in

1256: particular it is assumed that all operations take zero time and that

1257: there are no contention effects or synchronization issues.

1258:

1259: The X axis is plotted on a logarithmic scale and represents the number

1260: of processes.

1261: The Y axis is plotted on a linear scale and represents the round at

1262: which the first process terminates (which may be one less than the round

1263: at which the last process terminates).

1264: Each point in the graph represents an average termination round in

1265: 10,000 trials with the given distribution and number of processes.

1266: The starting times for all processes are the same except for a small

1267: random epsilon, generated uniformly in the range $(0,10^{-8})$.

1268: In each case, half the processes are started

1269: with input 0 and half with

1270: input 1.

1271: There are no failures.

1272:

1273: The random number generator used was {\tt drand48}.

1274: The distributions used were:

1275: \begin{enumerate}

1276: \item Normal distribution with mean 1 and standard deviation 0.2

1277: (variance 0.04),

1278: rejecting points outside $(0,2)$.

1279: \item $2/3$ or $4/3$ with equal probability.

1280: \item $0.5$ plus an exponential random variable with mean $0.5$.  This

1281: corresponds to a delayed Poisson process.

1282: \item Geometric with $p = 0.5$.

1283: \item Uniform in $(0,2)$.

1284: \item Exponential with mean $1$.  This corresponds to a Poisson process

1285: with no initial delay; it is also equivalent to generating a schedule

1286: by choosing one process uniformly at random for each time unit.

1287: \end{enumerate}

1288:

1289: It is worth noting that while the expected number of rounds grows

1290: logarithmically for most distributions, both the rate of growth and

1291: the initial value are small.

1292: These small constant factors may be the result of most processes adopting

1293: the values of early leaders, so that termination can be reached by

1294: agreement among leaders rather than the emergence of a single leader.

1295:

1296: The inverted behavior with a normal

1297: distribution is intriguing; it suggests that with large numbers of

1298: processes there are more chances for one particularly speedy process

1299: to leap ahead of its competitors, and that for some distributions this effect

1300: overshadows the effect of having more competitors to leap ahead of.

1301: It is not clear from the data

1302: whether this curve eventually turns around and starts rising

1303: again, or whether it converges to some constant asymptote.

1304:

1305: \section{Conclusions, Extensions, and Future Work}

1306: \label{section-extensions}

1307:

1308: We see this paper as making two main contributions.  The first is the

1309: extraction of the adaptive $\Theta(\log n)$ time

1310: \algname{} protocol from its more sophisticated

1311: predecessors and the demonstration that this simplified algorithm can

1312: solve consensus in models that are less extreme than those

1313: predecessors were designed to survive but that are perhaps closer to

1314: capturing the scheduling behavior an algorithm is likely to experience

1315: in practice.  Although \algname{} does not really contain any new

1316: ideas, we believe that ripping out features

1317: that practitioners might

1318: balk at implementing is a valuable task in its own right.

1319:

1320: The second is the noisy scheduling model.  This model limits the

1321: adversary not by covering its eyes but by making its hands shake.

1322: It allows us to express the understanding that in the real world

1323: failures and timing

1324: are usually not fully under the control of intelligent

1325: demons, while still retaining a healthy respect for the subtlety and

1326: unpredictability of the world.

1327: We believe that this ``perturbed worst-case analysis'' approach

1328: is likely to have applications in many areas both in and outside of

1329: distributed computing.

1330:

1331: There are still many questions left unanswered and many ways in

1332: which the noisy scheduling model could be extended.  We

1333: discuss some of these issues below.

1334:

1335: \paragraph*{Non-random failures.}

1336: It would be nice to understand how \algname{} fares with failures that

1337: are not random.

1338: We can get an upper bound in this situation

1339: by restarting the analysis of Theorem~\ref{theorem-noisy} whenever a process dies.

1340: Since the adversary must kill at least one process every expected $O(\log n)$

1341: rounds, the algorithm terminates in expected $O(f \log n)$ rounds

1342: where $f$ is the number of failures.  This bound compares favorably with the

1343: $O(n \log^2 n)$ work per processor needed by the best known randomized

1344: algorithm that solves consensus with a fully-adaptive adversary

1345: and up to $n-1$ failures

1346: \cite{AspnesW1996},

1347: but the fully-adaptive adversary is

1348: much stronger than

1349: one limited to noisy scheduling.

1350: It seems likely that a better upper bound than $O(f \log n)$ could be

1351: obtained by a more careful analysis that includes how processes change

1352: preferences.  We conjecture that the real bound is in fact $O(\log n)$.
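
The restart argument can be sketched as follows; the constant $C$ and the epoch lengths $T_k$ are notation introduced here for illustration only and do not appear in the analysis above.
Suppose that, from any point after which no further failures occur, Theorem~\ref{theorem-noisy} gives termination within an expected $C \log n$ rounds.
Split the execution into at most $f+1$ epochs, where epoch $k$ ends at the $k$-th failure or at termination, whichever comes first, and let $T_k$ be the number of rounds in epoch $k$ (with $T_k = 0$ if the epoch never starts).
If each epoch satisfies $\E[T_k] \le C \log n$, then by linearity of expectation
\[
\E\left[\sum_{k=1}^{f+1} T_k\right]
\le (f+1)\, C \log n
= O(f \log n)
\quad\mbox{for $f \ge 1$.}
\]
This is only a sketch: it assumes that the hypotheses of the theorem hold afresh at the start of each epoch, and it ignores any progress carried over from earlier epochs, which is why a sharper bound seems plausible.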

1353:

1354: \paragraph*{Statistical adversaries.}

1355: We would also like to do away with

1356: the fixed bound $M$ on the delay between operations under the control

1357: of the adversary.  The technical reason for including this

1358: bound in the model is that it provides a scale for the noise

1359: introduced by the $X_{ij}$ variables; if the adversary can increase

1360: $\Delta_{ij}$ without limit, it can construct a steadily slower and

1361: slower execution in which the

1362: noise, relative to the gap between rounds, never accumulates enough to

1363: affect the schedule.  But a weaker statistical constraint, such as

1364: requiring $\sum_{j=1}^{r} \Delta_{ij} \le rM$, might avoid such

1365: Zeno-like pathologies while allowing more variation in the gaps

1366: between operations.\footnote{This is a bit like using the

1367: \emph{statistical adversary} of

1368: \cite{ChouCEKL95}.}

1369: The present proof does not work with just this

1370: statistical constraint (the particular step that breaks down is

1371: the use of Lemma

1372: \ref{lemma-gap-combined} to show that being ahead at round $r$ often

1373: means being ahead by $c$ at round $r$), but we conjecture that

1374: the statistical constraint is in fact enough

1375: to get

1376: termination in $O(\log n)$ rounds.
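
As a small illustration of how the two constraints differ (assuming that each $\Delta_{ij}$ is nonnegative and that the statistical constraint is imposed for every $r$; the index $k$ below is introduced only for this example), note first that the fixed bound implies the statistical one: if $\Delta_{ij} \le M$ for all $j$, then
\[
\sum_{j=1}^{r} \Delta_{ij} \le rM .
\]
The converse fails.  For any $k > 1$, the schedule with $\Delta_{ik} = kM$ and $\Delta_{ij} = 0$ for $j \ne k$ satisfies the statistical constraint for every $r$, since the sum is $0$ for $r < k$ and $kM \le rM$ for $r \ge k$, yet it violates the fixed bound at $j = k$.
A steadily slowing schedule such as $\Delta_{ij} = 2^j$ is excluded by either constraint, since
\[
\sum_{j=1}^{r} 2^j = 2^{r+1} - 2 ,
\]
which exceeds $rM$ once $r$ is large enough.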

1377:

1378: \paragraph*{Synchronization and contention.}

1379: Though the present work was motivated by a desire to move away from

1380: powerful theoretical adversaries toward a model more closely

1381: reflecting the non-maliciousness of misbehavior in real systems, we

1382: cannot claim that the model accurately describes the behavior of any

1383: real shared-memory system.  One difficulty is that real shared-memory

1384: systems generally

1385: do not guarantee full serializability of memory operations in

1386: the absence of additional synchronization operations

1387: (see \cite[Section 8.6]{PattersonHG1996}).

1388: We can overcome this difficulty by adding

1389: synchronization barriers to each round of \algname{}; in principle

1390: this does not affect the analysis since the structure of each round is

1391: still the same as all other rounds.

1392: A second problem is memory contention, which we have not analyzed.

1393: The difficulty with both explicit synchronization and memory

1394: contention is that their effects are unlikely to be consistent with

1395: the assumption that the timings of different processes' operations are

1396: independent.  To the extent that this lack of independence disperses

1397: processes (say, by slowing down laggards fighting over congested early-round

1398: registers while allowing the speedy to sail through relatively

1399: clear late-round registers), it helps the algorithm.  Whether such

1400: an effect would occur in practice cannot easily be predicted without

1401: experimentation.

1402:

1403: \paragraph*{Lower bounds.} The noisy scheduling model is friendly

1404: enough that an $O(\log n)$ running time for consensus might not be

1405: the best possible.  A counterexample like the one given in the proof of

1406: Theorem \ref{theorem-lower-bound} might be able to show that no

1407: deterministic algorithm with certain strong symmetry properties (such

1408: as no dependence on process identity and a mirror-image handling of

1409: the different inputs) can do better,

1410: but it is not obvious where to look for a more general lower bound.

1411: It is not out of the question that a clever algorithm could solve

1412: consensus with noisy scheduling in as little as $O(1)$ time.

1413:

1414: \paragraph*{Message passing.}

1415: All of our results are set in a shared-memory model.  It would

1416: be interesting to see whether a noisy scheduling assumption can be

1417: used to solve consensus quickly in an asynchronous message-passing

1418: model.

1419:

1420: \paragraph*{Other problems.}

1421: Finally, though we have concentrated on a particularly simplified

1422: protocol for solving a single fundamental problem, it would be

1423: interesting to see how other algorithms fare in the

1424: noisy scheduling model.  It seems likely, for example, that

1425: algorithms designed for unknown-delay models such as Alur et al.'s

1426: \cite{AlurAT1997} should continue to work in the noisy scheduling

1427: model, perhaps with some constraint on the noise distribution to

1428: exclude random delays with unbounded expectations.  Similarly, the line

1429: of inquiry started by Gafni and Mitzenmacher \cite{GafniM1999}, on analyzing

1430: the behavior of timing-based algorithms

1431: for mutual exclusion and related problems

1432: with random scheduling,

1433: could naturally extend to the more general model of noisy scheduling.

1434:

1435: \section{Acknowledgments}

1436:

1437: I would like to thank Faith Fich and Maurice Herlihy for

1438: insightful comments on the plausibility of an early version of the

1439: noisy scheduling

1440: model;

1441: the remaining implausibility

1442: is my fault and not theirs.

1443: I am also

1444: indebted to Robbert van Renesse for pointing out the ``narrowness'' of

1445: the bad execution paths needed to prevent consensus as a reason for the

1446: relative lack of concern for asynchronous impossibility results

1447: among practitioners.

1448:

1449: \bibliographystyle{plain}

1450:

1451: \bibliography{race}

1452:

1453: \end{document}

1454: