0504:cs0504029/arxiv.tex

1: %\documentclass{sig-alternate}

2:

3: \documentclass[11pt,onecolumn]{IEEEtran}

4: \usepackage{subfigure}

5: \usepackage{epsfig,graphicx,psfrag}

6: \usepackage{amsfonts,amsmath,amssymb}

7:

8:

9: \newtheorem{theorem}{Theorem}

10: \newtheorem{lemma}{Lemma}

11: \newtheorem{corollary}{Corollary}

12:

13: \newtheorem{definition}{Definition}

14: \newtheorem{note}{Note}

15: \newtheorem{property}{Property}

16:

17: \newcommand{\rf}{\right}

18: \newcommand{\lf}{\left}

19: \renewcommand{\epsilon}{\varepsilon}

20: \newcommand{\Reals}{\mathbf{R}}

21: \newcommand{\hy}{\hat{y}}

22: \newcommand{\cmp}{\text{cmp}}

23: \newcommand{\spr}{\text{spr}}

24: \newcommand{\bW}{\bar{W}}

25: \newcommand{\RealsP}{\Reals_{+}}

26: \newcommand{\hW}{\hat{W}}

27: \newcommand{\beq}{\begin{eqnarray}}

28: \newcommand{\eeq}{\end{eqnarray}}

29:

30: \begin{document}

31: %

32: % --- Author Metadata here ---

33:

34: \title{Fast Distributed Algorithms for Computing Separable Functions}

35:

36:

37: % Put no more than the first THREE authors in the \author command

38: \author{Damon Mosk-Aoyama and Devavrat Shah

39: %

40: % The command \alignauthor (no curly braces needed) should

41: % precede each author name, affiliation/snail-mail address and

42: % e-mail address. Additionally, tag each line of

43: % affiliation/address with \affaddr, and tag the

44: %% e-mail address with \email.

45: \thanks{D. Mosk-Aoyama is with the department of

46: Computer Science, Stanford University. D. Shah is

47: with the department of Electrical Engineering and Computer Science,

48: MIT.  Emails:~\{damonma@cs.stanford.edu,devavrat@mit.edu\}

49: }

50: }

51:

52: \date{}

53:

54: \maketitle

55:

56: \begin{abstract}

57:

58: The problem of computing functions of values at the nodes in a network

59: in a totally distributed manner, where nodes do not have unique

60: identities and make decisions based only on local information, has

61: applications in sensor, peer-to-peer, and ad-hoc networks.  The task

62: of computing separable functions, which can be written as linear

63: combinations of functions of individual variables, is studied in this

64: context.  Known iterative algorithms for averaging can be used to

65: compute the normalized values of such functions, but these algorithms

66: do not extend in general to the computation of the actual values of

67: separable functions.

68:

69: The main contribution of this paper is the design of a distributed

70: randomized algorithm for computing separable functions.  The running

71: time of the algorithm is shown to depend on the running time of a

72: minimum computation algorithm used as a subroutine.  Using a

73: randomized gossip mechanism for minimum computation as the subroutine

74: yields a complete totally distributed algorithm for computing

75: separable functions.  For a class of graphs with small spectral gap,

76: such as grid graphs, the time used by the algorithm to compute

77: averages is of a smaller order than the time required by a known

78: iterative averaging scheme.

79:

80: \end{abstract}

81:

82: % A category with the (minimum) three required fields

83: %\category{C.2.2}{Computer-Communication Networks}{Network Protocols}

84: %\category{F.2.2}{Analysis of Algorithms and Problem Complexity}{Nonnumerical Algorithms and Problems}

85: %A category including the fourth, optional field follows...

86: %\category{D.2.8}{Software Engineering}{Metrics}[complexity measures, performance measures]

87:

88: %\terms{Algorithms, Theory}

89:

90: %\keywords{~Data aggregation, randomized algorithms, gossip}

91:

92: \section{Introduction}

93:

94: The development of sensor, peer-to-peer, and ad hoc wireless networks

95: has stimulated interest in distributed algorithms for data

96: aggregation, in which nodes in a network compute a function of local

97: values at the individual nodes.  These networks typically do not have

98: centralized agents that organize the computation and communication

99: among the nodes.  Furthermore, the nodes in such a network may not

100: know the complete topology of the network, and the topology may change

101: over time as nodes are added and other nodes fail.  In light of the

102: preceding considerations, distributed computation is of vital

103: importance in these modern networks.

104:

105: We consider the problem of computing separable functions in a

106: distributed fashion in this paper.  A separable function can be

107: expressed as the sum of the values of individual functions.  Given a

108: network in which each node has a number, we seek a distributed

109: protocol for computing the value of a separable function of the

110: numbers at the nodes.  Each node has its own estimate of the value of

111: the function, which evolves as the protocol proceeds.  Our goal is to

112: minimize the amount of time required for all of these estimates to be

113: close to the actual function value.

114:

115: In this work, we are interested in {\em totally distributed}

116: computations, in which nodes have a local view of the state of the

117: network.  Specifically, an individual node does not have information

118: about nodes in the network other than its neighbors.  To accurately

119: estimate the value of a separable function that depends on the numbers

120: at all of the nodes, each node must obtain information about the other

121: nodes in the network.  This is accomplished through communication

122: between neighbors in the network.  Over the course of the protocol,

123: the global state of the network effectively diffuses to each

124: individual node via local communication among neighbors.

125:

126: More concretely, we assume that each node in the network knows only

127: its neighbors in the network topology, and can contact any neighbor to

128: initiate a communication.  On the other hand, we assume that the nodes

129: do not have unique identities (i.e., a node has no unique identifier

130: that can be attached to its messages to identify the source of the

131: messages).  This constraint is natural in ad-hoc and mobile networks,

132: where there is a lack of infrastructure (such as IP addresses or

133: static GPS locations), and it limits the ability of a distributed

134: algorithm to recreate the topology of the network at each node.  In

135: this sense, the constraint also provides a formal way to distinguish

136: distributed algorithms that are truly local from algorithms that

137: operate by gathering enormous amounts of global information at all the

138: nodes.

139:

140: The absence of identifiers for nodes makes it difficult, without

141: global coordination, to simply transmit every node's value throughout

142: the network so that each node can identify the values at all the

143: nodes.  As such, we develop an algorithm for computing separable

144: functions that relies on an {\em order- and duplicate-insensitive}

145: statistic \cite{ngsa} of a set of numbers, the minimum.  The algorithm

146: is based on properties of exponential random variables, and reduces

147: the problem of computing the value of a separable function to the

148: problem of determining the minimum of a collection of numbers, one for

149: each node.

150:

151: This reduction leads us to study the problem of

152: {\em information spreading} or {\em information dissemination} in a

153: network.  In this problem, each node starts with a message, and the

154: nodes must spread the messages throughout the network using local

155: communication so that every node eventually has every message.

156: Because the minimum of a collection of numbers is not affected by the

157: order in which the numbers appear, nor by the presence of duplicates

158: of an individual number, the minimum computation required by our

159: algorithm for computing separable functions can be performed by any

160: information spreading algorithm.  Our analysis of the algorithm for

161: computing separable functions establishes an upper bound on its

162: running time in terms of the running time of the information spreading

163: algorithm it uses as a subroutine.

164:

165: In view of our goal of distributed computation, we analyze a

166: {\em gossip} algorithm for information spreading.  Gossip algorithms

167: are a useful tool for achieving fault-tolerant and scalable

168: distributed computations in large networks.  In a gossip algorithm,

169: each node repeatedly iniatiates communication with a small number of

170: neighbors in the network, and exchanges information with those

171: neighbors.

172:

173: The gossip algorithm for information spreading that we study is

174: randomized, with the communication partner of a node at any time

175: determined by a simple probabilistic choice.  We provide an upper

176: bound on the running time of the algorithm in terms of the

177: {\em conductance} of a stochastic matrix that governs how nodes choose

178: communication partners.  By using the gossip algorithm to compute

179: minima in the algorithm for computing separable functions, we obtain

180: an algorithm for computing separable functions whose performance on

181: certain graphs compares favorably with that of known iterative

182: distributed algorithms \cite{bgps} for computing averages in a

183: network.

184:

185: \subsection{Related work}

186: \label{sec:related}

187:

188: In this section, we present a brief summary of related work.

189: Algorithms for computing the number of distinct elements in a multiset

190: or data stream \cite{fm, streamdistinct} can be adapted to compute

191: separable functions using information spreading \cite{clkb}.  We are

192: not aware, however, of a previous analysis of the amount of time

193: required for these algorithms to achieve a certain accuracy in the

194: estimates of the function value when the computation is totally

195: distributed (i.e., when nodes do not have unique identities).  These

196: adapted algorithms require the nodes in the network to make use of a

197: common hash function.  In addition, the discreteness of the counting

198: problem makes the resulting algorithms for computing separable

199: functions suitable only for functions in which the terms in the sum

200: are integers.  Our algorithm is simpler than these algorithms, and can

201: compute functions with non-integer terms.

202:

203: %% Although there has

204: %% been much interest in the topic of distributed computation of

205: %% functions, we are not aware of previous work on the problem of

206: %% computing separable functions in a totally distributed fashion (i.e.,

207: %% when nodes do not have unique identities).

208:

209: %% More importantly, there exists a small

210: %% $\epsilon_{0} > 0$ such that the analysis does not guarantee that

211: %% $T_{\cal{C}}^{\cmp}(\epsilon, \delta) < \infty$ for

212: %% $\epsilon < \epsilon_{0}$.  That is, the algorithm does not offer

213: %% provable accuracy in its estimates beyond a certain precision.  Our

214: %% algorithm is, to our knowledge, the first algorithm that computes

215: %% separable functions (with positive terms) in a totally distributed

216: %% manner up to any desired accuracy.

217:

218: There has been a lot of work on the distributed computation of

219: averages, a special case of the problem of reaching agreement or

220: consensus among processors via a distributed computation.  Distributed

221: algorithms for reaching consensus under appropriate conditions have

222: been known since the classical work of Tsitsiklis

223: \cite{tsitsiklis-thesis} and Tsitsiklis, Bertsekas, and Athans

224: \cite{tba} (see also the book by Bertsekas and Tsitsiklis

225: \cite{pardiscomp}).  Averaging algorithms compute the ratio of the sum

226: of the input numbers to $n$, the number of nodes in the network, and

227: not the exact value of the sum.  Thus, such algorithms cannot be

228: extended in general to compute arbitrary separable functions.  On the

229: other hand, an algorithm for computing separable functions can be used

230: to compute averages by separately computing the sum of the input

231: numbers, and the number of nodes in the graph (using one as the input

232: at each node).

233:

234: %% Iterative load-balancing schemes \cite{pardiscomp, rabani}, in which

235: %% processors send jobs to each other in order to balance the amount of

236: %% work at the different processors, can be considered distributed

237: %% algorithms that average the load in a system.

238:

239: Recently, Kempe, Dobra, and Gehrke showed the existence of a

240: randomized iterative gossip algorithm for averaging with the optimal

241: averaging time \cite{kempe}.  This result was restricted to complete

242: graphs.  The algorithm requires that the nodes begin the computation

243: in an asymmetric initial state in order to compute separable

244: functions, a requirement that may not be convenient for large networks

245: that do not have centralized agents for global coordination.

246: Furthermore, the algorithm suffers from the possibility of oscillation

247: throughout its execution.

248:

249: In a more recent paper, Boyd, Ghosh, Prabhakar, and Shah presented a

250: simpler iterative gossip algorithm for averaging that addresses some

251: of the limitations of the Kempe et al. algorithm \cite{bgps}.

252: Specifically, the algorithm and analysis are applicable to arbitrary

253: graph topologies.  Boyd et al. showed a connection between the

254: averaging time of the algorithm and the mixing time (a property that

255: is related to the conductance, but is not the same) of an appropriate

256: random walk on the graph representing the network.  They also found an

257: optimal averaging algorithm as a solution to a semi-definite program.

258:

259: For completeness, we contrast our results for the problem of averaging

260: with known results.  As we shall see, iterative averaging, which has

261: been a common approach in the previous work, is an order slower than

262: our algorithm for many graphs, including ring and grid graphs.  In

263: this sense, our algorithm is quite different than (and has advantages

264: in comparison with) the known averaging algorithms.

265:

266: On the topic of information spreading, gossip algorithms for

267: disseminating a message to all nodes in a complete graph in which

268: communication partners are chosen uniformly at random have been

269: studied for some time \cite{frieze, rumor2, epidemic}.  Karp,

270: Schindelhauer, Shenker, and V\"{o}cking presented a

271: {\em push and pull} gossip algorithm, in which communicating nodes

272: both send and receive messages, that disseminates a message to all $n$

273: nodes in a graph in $O(\log n)$ time with high probability

274: \cite{kssv}.  In this work, we have provided an analysis of the time

275: required for a gossip algorithm to disseminate $n$ messages to $n$

276: nodes for the more general setting of arbitrary graphs and non-uniform

277: random choices of communication partners.  For other related results,

278: we refer the reader to \cite{rumor3, gossip1, gossip2}.  We take note

279: of the similar (independent) recent work of Ganesh, Massouli\'{e}, and

280: Towsley \cite{gmt}, and Berger, Borgs, Chayes, and Saberi \cite{bbcs},

281: on the spread of epidemics in a network.

282:

283:

284: \subsection{Organization}

285:

286: The rest of the paper is organized as follows.  Section

287: \ref{sec:prelim} presents the distributed computation problems we

288: study and an overview of our results.  In Section \ref{sec:comp}, we

289: develop and analyze an algorithm for computing separable functions in

290: a distributed manner.  Section \ref{sec:infdis} contains an analysis

291: of a simple randomized gossip algorithm for information spreading,

292: which can be used as a subroutine in the algorithm for computing

293: separable functions.  In Section \ref{sec:appl}, we discuss

294: applications of our results to particular types of graphs, and compare

295: our results to previous results for computing averages.  Finally, we

296: present conclusions and future directions in Section \ref{sec:conc}.

297:

298: \section{Preliminaries and Results}

299: \label{sec:prelim}

300:

301: We consider an arbitrary connected network, represented by an

302: undirected graph $G = (V, E)$, with $|V| = n$ nodes.  For notational

303: purposes, we assume that the nodes in $V$ are numbered arbitrarily so

304: that $V = \{1, \dots, n\}$.  A node, however, does not have a unique

305: identity that can be used in a computation.  Two nodes $i$ and $j$ can

306: communicate with each other if (and only if) $(i, j) \in E$.

307:

308: To capture some of the resource constraints in the networks in which

309: we are interested, we impose a {\em transmitter gossip} constraint on

310: node communication.  Each node is allowed to contact at most one other

311: node at a given time for communication.  However, a node can be

312: contacted by multiple nodes simultaneously.

313:

314: Let $2^{V}$ denote the power set of the vertex set $V$ (the set of all

315: subsets of $V$).  For an $n$-dimensional vector

316: $\vec{x} \in \Reals^{n}$, let $x_{1}, \dots, x_{n}$ be the components

317: of $\vec{x}$.

318: \begin{definition}

319: %[Separable]

320: We say that a function $f : \Reals^{n} \times 2^{V} \to \Reals$

321: is {\em separable} if there exist functions $f_{1}, \dots, f_{n}$ such

322: that, for all $S \subseteq V$,

323: \begin{equation}

324: f(\vec{x}, S) = \sum_{i \in S} f_{i}(x_{i}).

325: \label{sepsum}

326: \end{equation}

327: \label{sepfunc}

328: \end{definition}

329:

330: \noindent

331: {\bf Goal.}  Let $\cal{F}$ be the class of separable functions $f$ for

332: which $f_{i}(x) \geq 1$ for all $x \in \Reals$ and $i = 1, \dots, n$.

333: Given a function $f \in \cal{F}$, and a vector $\vec{x}$ containing

334: initial values $x_{i}$ for all the nodes, the nodes in the network are

335: to compute the value $f(\vec{x}, V)$ by a distributed computation,

336: using repeated communication between nodes.

337:

338: \begin{note}

339: Consider a function $g$ for which there exist functions

340: $g_{1}, \dots, g_{n}$ satisfying, for all $S \subseteq V$, the

341: condition $g(\vec{x}, S) = \prod_{i \in S} g_{i}(x_{i})$ in lieu of

342: (\ref{sepsum}).  Then, $g$ is {\em logarithmic separable}, i.e.,

343: $f = \log_b g$ is separable.  Our algorithm for computing separable

344: functions can be used to compute the function $f = \log_{b} g$.  The

345: condition $f_{i}(x) \geq 1$ corresponds to $g_{i}(x) \geq b$ in this

346: case.  This lower bound of $1$ on $f_{i}(x)$ is arbitrary, although

347: our algorithm does require the terms $f_{i}(x_{i})$ in the sum to be

348: positive.

349: \end{note}

350:

351: Before proceeding further, we list some practical situations where the

352: distributed computation of separable functions arises naturally.  By

353: definition, the sum of a set of numbers is a separable function.

354: \renewcommand{\labelenumi}{(\arabic{enumi})}

355: \begin{enumerate}

356: \item

357: {\em Summation.} Let the value at each node be $x_{i} = 1$.  Then, the

358: sum of the values is the number of nodes in the network.

359:

360: \item

361: {\em Averaging.} According to Definition \ref{sepfunc}, the average of

362: a set of numbers is not a separable function.  However, the nodes can

363: estimate the separable function $\sum_{i = 1}^{n} x_{i}$ and $n$

364: separately, and use the ratio between these two estimates as an

365: estimate of the mean of the numbers.

366:

367: Suppose the values at the nodes are measurements of a quantity of

368: interest.  Then, the average provides an unbiased maximum likelihood

369: estimate of the measured quantity.  For example, if the nodes are

370: temperature sensors, then the average of the sensed values at the

371: nodes gives a good estimate of the ambient temperature.

372: \end{enumerate}

373:

374: For more sophisticated applications of a distributed averaging

375: algorithm, we refer the reader to \cite{distr_eigvec} and

376: \cite{MSZ}. %McSherry and Kempe, STOC

377: Averaging is used for the distributed computation of the top $k$

378: eigenvectors of a graph in \cite{distr_eigvec}, while in \cite{MSZ}

379: averaging is used in a throughput-optimal distributed scheduling

380: algorithm in a wireless network.

381:

382: \noindent{\bf Time model.} In a distributed computation, a time model

383: determines when nodes communicate with each other.  We consider two

384: time models, one synchronous and the other asynchronous, in this

385: paper.  The two models are described as follows.

386: \begin{enumerate}

387: \item

388: {\em Synchronous time model:} Time is slotted commonly across all

389: nodes in the network.  In any time slot, each node may contact one of

390: its neighbors according to a random choice that is independent of the

391: choices made by the other nodes.  The simultaneous communication

392: between the nodes satisfies the transmitter gossip constraint.

393:

394: \item

395: {\em Asynchronous time model:} Each node has a clock that ticks at the

396: times of a rate $1$ Poisson process.  Equivalently, a common clock

397: ticks according to a rate $n$ Poisson process at times

398: $C_{k}, k \geq 1$, where $\{C_{k + 1} - C_{k}\}$ are i.i.d.

399: exponential random variables of rate $n$.  On clock tick $k$, one of

400: the $n$ nodes, say $I_{k}$, is chosen uniformly at random.  We

401: consider this global clock tick to be a tick of the clock at node

402: $I_{k}$.  When a node's clock ticks, it contacts one of its neighbors

403: at random.  In this model, time is discretized according to clock

404: ticks.  On average, there are $n$ clock ticks per one unit of absolute

405: time.

406: %% Corollary \ref{discrete-to-contc} states a precise translation of

407: %% clock ticks into absolute time.

408: \end{enumerate}

409:

410: In this paper, we measure the running times of algorithms in absolute

411: time, which is the number of time slots in the synchronous model, and

412: is (on average) the number of clock ticks divided by $n$ in the

413: asynchronous model.  To obtain a precise relationship between clock

414: ticks and absolute time in the asynchronous model, we appeal to tail

415: bounds on the probability that the sample mean of i.i.d. exponential

416: random variables is far from its expected value.  In particular, we

417: make use of the following lemma, which also plays a role in the

418: analysis of the accuracy of our algorithm for computing separable

419: functions.

420: %% is provided by Corollary \ref{discrete-to-contc},

421: %% which is stated below.  Corollary \ref{discrete-to-contc} is a

422: %% consequence of the following lemma, which follows from Cram\'{e}r's

423: %% Theorem (see \cite{dembo}, pp. $30$, $35$) and properties of

424: %% exponential random variables.

425: \begin{lemma}

426: \label{discrete-to-cont}

427: For any $k \geq 1$, let $Y_{1}, \dots, Y_{k}$ be i.i.d. exponential

428: random variables with rate $\lambda$.  Let

429: $R_{k} = \frac{1}{k} \sum_{i = 1}^{k} Y_{i}$.  Then, for any

430: $\epsilon \in (0, 1/2)$,

431: \begin{eqnarray}

432: \Pr \left(\left|R_k - \frac{1}{\lambda}\right|

433: \geq \frac{\epsilon}{\lambda} \right)

434: & \leq & 2 \exp\left(-\frac{\epsilon^{2} k}{3}\right).

435: \label{e:dtoc1}

436: \end{eqnarray}

437: \end{lemma}

438: \begin{proof}

439: By definition,

440: $E[R_{k}] = \frac{1}{k}\sum_{i = 1}^{k} \lambda^{-1} = \lambda^{-1}$.

441: The inequality in (\ref{e:dtoc1}) follows directly from Cram\'{e}r's

442: Theorem (see \cite{dembo}, pp. $30$, $35$) and properties of

443: exponential random variables.

444: \end{proof}

445:

446: A direct implication of Lemma \ref{discrete-to-cont} is the following

447: corollary, which bounds the probability that the absolute time $C_{k}$

448: at which clock tick $k$ occurs is far from its expected value.

449: \begin{corollary}\label{discrete-to-contc}

450: For $k \geq 1$, $E[C_{k}] = k/n$.  Further, for any

451: $\epsilon \in (0, 1/2)$,

452: \begin{eqnarray}

453: \Pr \left( \left|C_{k} - \frac{k}{n}\right|

454: \geq \frac{\epsilon k}{n} \right)

455: & \leq & 2 \exp\left(-\frac{\epsilon^{2} k}{3}\right).

456: \label{e:dtoc}

457: \end{eqnarray}

458: \end{corollary}

459:

460: %% \noindent

461: %% {\bf Performance Measure.}

462: Our algorithm for computing separable functions is randomized, and is

463: not guaranteed to compute the exact quantity

464: $f(\vec{x}, V) = \sum_{i = 1}^{n} f_{i}(x_{i})$ at each node in the

465: network.  To study the accuracy of the algorithm's estimates, we

466: analyze the probability that the estimate of $f(\vec{x}, V)$ at every

467: node is within a $(1 \pm \epsilon)$ multiplicative factor of the true

468: value $f(\vec{x}, V)$ after the algorithm has run for some period of

469: time.  In this sense, the error in the estimates of the algorithm is

470: relative to the magnitude of $f(\vec{x}, V)$.

471:

472: To measure the amount of time required for an algorithm's estimates to

473: achieve a specified accuracy with a specified probability, we define

474: the following quantity.  For an algorithm ${\cal C}$ that estimates

475: $f(\vec{x}, V)$, let $\hy_i(t)$ be the estimate of $f(\vec{x}, V)$ at

476: node $i$ at time $t$.  Furthermore, for notational convenience, given

477: $\epsilon > 0$, let $A_{i}^{\epsilon}(t)$ be the following event.

478: \[

479: A_{i}^{\epsilon}(t)

480: = \left\{\hy_{i}(t) \not\in

481: \left[(1 - \epsilon)f(\vec{x}, V),

482: (1 + \epsilon)f(\vec{x}, V) \right] \right\}

483: \]

484: \begin{definition}

485: For any $\epsilon > 0$ and $\delta \in (0, 1)$, the

486: ($\epsilon$, $\delta$)-computing time of $\cal{C}$, denoted

487: $T_{\cal{C}}^{\cmp}(\epsilon, \delta)$, is

488: \[

489: T_{\cal{C}}^{\cmp}(\epsilon, \delta)

490: = \sup_{f \in \cal{F}} \sup_{\vec{x} \in \Reals^{n}}

491: \inf \Big\{\tau : \forall t \geq \tau,

492: \Pr \big(\cup_{i = 1}^{n} A_{i}^{\epsilon}(t) \big)

493: \leq \delta \Big\}.

494: \]

495: \end{definition}

496:

497: \noindent

498: Intuitively, the significance of this definition of the

499: $(\epsilon, \delta)$-computing time of an algorithm $\cal{C}$ is that,

500: if $\cal{C}$ runs for an amount of time that is at least

501: $T_{\cal{C}}^{\cmp}(\epsilon, \delta)$, then the probability that the

502: estimates of $f(\vec{x}, V)$ at the nodes are all within a

503: $(1 \pm \epsilon)$ factor of the actual value of the function is at

504: least $1 - \delta$.

505:

506: %% \noindent

507: %% {\bf Information spreading.}

508: As noted before, our algorithm for computing separable functions is

509: based on a reduction to the problem of information spreading, which is

510: described as follows.  Suppose that, for $i = 1, \dots, n$, node $i$

511: has the one message $m_{i}$.  The task of information spreading is to

512: disseminate all $n$ messages to all $n$ nodes via a sequence of local

513: communications between neighbors in the graph.  In any single

514: communication between two nodes, each node can transmit to its

515: communication partner any of the messages that it currently holds.  We

516: assume that the data transmitted in a communication must be a set of

517: messages, and therefore cannot be arbitrary information.

518:

519: Consider an information spreading algorithm $\cal{D}$, which specifies

520: how nodes communicate.  For each node $i \in V$, let $S_{i}(t)$ denote

521: the set of nodes that have the message $m_{i}$ at time $t$.  While

522: nodes can gain messages during communication, we assume that they do

523: not lose messages, so that $S_{i}(t_{1}) \subseteq S_{i}(t_{2})$ if

524: $t_{1} \leq t_{2}$.  Analogous to the $(\epsilon, \delta)$-computing

525: time, we define a quantity that measures the amount of time required

526: for an information spreading algorithm to disseminate all the messages

527: $m_{i}$ to all the nodes in the network.

528: \begin{definition}

529: For $\delta \in (0, 1)$, the $\delta$-information-spreading time

530: of the algorithm $\cal{D}$, denoted $T_{\cal{D}}^{\spr}(\delta)$, is

531: \[

532: T_{\cal{D}}^{\spr}(\delta)

533: = \inf

534: \left\{t : \Pr \left(\cup_{i = 1}^{n} \{S_{i}(t) \neq V\} \right)

535: \leq \delta \right\}.

536: \]

537: \end{definition}

538:

539: In our analysis of the gossip algorithm for information spreading, we

540: assume that when two nodes communicate, each node can send all of its

541: messages to the other in a single communication.  This rather

542: unrealistic assumption of {\em infinite} link capacity is merely for

543: convenience, as it provides a simpler analytical characterization of

544: $T_{\cal{C}}^{\cmp}(\epsilon, \delta)$ in terms of

545: $T_{\cal{D}}^{\spr}(\delta)$.  Our algorithm for computing separable

546: functions requires only links of unit capacity.

547:

548: \subsection{Our contribution}

549: \label{ssec:contrib}

550:

551: The main contribution of this paper is the design of a distributed

552: algorithm to compute separable functions of node values in an

553: arbitrary connected network.  Our algorithm is randomized, and in

554: particular uses exponential random variables.  This usage of

555: exponential random variables is analogous to that in an

556: algorithm by Cohen\footnote{We thank Dahlia Malkhi for pointing

557: this reference out to us.}

558:  for estimating the sizes of sets in a graph \cite{cohen}.  The

559: basis for our algorithm is the following property of the exponential

560: distribution.

561: \begin{property}

562: \label{p1}

563: Let $W_{1}, \dots, W_{n}$ be $n$ independent random variables such

564: that, for $i = 1, \dots, n$, the distribution of $W_{i}$ is

565: exponential with rate $\lambda_{i}$.  Let $\bW$ be the minimum of

566: $W_{1}, \dots, W_{n}$.  Then, $\bW$ is distributed as an exponential

567: random variable of rate $\lambda = \sum_{i = 1}^{n} \lambda_{i}$.

568: \end{property}

569: \begin{proof}

570: For an exponential random variable $W$ with rate $\lambda$, for any

571: $z \in \RealsP$,

572: \[

573: \Pr(W > z) = \exp(-\lambda z).

574: \]

575: Using this fact and the independence of the random variables $W_{i}$,

576: we compute $\Pr(\bW > z)$ for any $z \in \RealsP$.

577: \begin{eqnarray*}

578: \Pr(\bW > z)

579: & = & \Pr \left(\cap_{i = 1}^{n} \{W_{i} > z\} \right) \\

580: & = & \prod_{i = 1}^{n} \Pr(W_{i} > z) \\

581: & = & \prod_{i = 1}^{n} \exp(-\lambda_{i} z) \\

582: & = & \exp\left(-z \sum_{i = 1}^{n} \lambda_{i} \right).

583: \end{eqnarray*}

584: This establishes the property stated above.

585: \end{proof}

586:

587: Our algorithm uses an information spreading algorithm as a subroutine,

588: and as a result its running time is a function of the running time of

589: the information spreading algorithm it uses.  The faster the

590: information spreading algorithm is, the better our algorithm performs.

591: %% In this sense, the optimality of our algorithm is

592: %% with respect to the underlying information spreading mechanism.

593: Specifically, the following result provides an upper bound on the

594: ($\epsilon$, $\delta$)-computing time of the algorithm.

595: \begin{theorem}

596: \label{thm:main1}

597: Given an information spreading algorithm $\cal{D}$ with

598: $\delta$-spreading time $T_{\cal{D}}^{\spr}(\delta)$ for

599: $\delta \in (0, 1)$, there exists an algorithm ${\cal{A}}$ for

600: computing separable functions $f \in \cal{F}$ such that, for any

601: $\epsilon \in (0, 1)$ and $\delta \in (0, 1)$,

602: \[

603: T_{\cal{A}}^{\cmp}(\epsilon, \delta)

604: = O\left( \epsilon^{-2} (1 + \log \delta^{-1})

605: T_{\cal{D}}^{\spr}(\delta/2) \right).

606: \]

607: %% Further, for any algorithm $\cal{C}$ that computes $f$,

608: %% \[

609: %% T_{\cal{C}}^{\cmp}(\epsilon, \delta)

610: %% = \Omega\left(T_{\cal{D}}^{\spr}(\delta)\right).

611: %% \]

612: \end{theorem}

613:

614: %% \vspace{.15in}

615: Motivated by our interest in decentralized algorithms, we analyze a

616: simple randomized gossip algorithm for information spreading.  When

617: node $i$ initiates a communication, it contacts each node $j \neq i$

618: with probability $P_{ij}$.  With probability $P_{ii}$, it does not

619: contact another node.  The $n \times n$ matrix $P = [P_{ij}]$

620: characterizes the algorithm; each matrix $P$ gives rise to an

621: information spreading algorithm $\cal{P}$.  We assume that $P$ is

622: stochastic, and that $P_{ij} = 0$ if $i \neq j$ and $(i, j) \notin E$,

623: as nodes that are not neighbors in the graph cannot communicate with

624: each other.  Section \ref{sec:infdis} describes the data transmitted

625: between two nodes when they communicate.

626:

627: We obtain an upper bound on the $\delta$-information-spreading time of

628: this gossip algorithm in terms of the {\em conductance} of the matrix

629: $P$, which is defined as follows.

630: \begin{definition}

631: %[Conductance]

632: For a stochastic matrix $P$, the conductance of $P$, denoted

633: $\Phi(P)$, is

634: \[

635: \Phi(P)

636: = \min_{S \subset V, \; 0 < |S| \leq n/2}

637: \frac{\sum_{ i \in S, j \notin S} P_{ij}}{ |S|}.

638: \]

639: \end{definition}

640:

641: \noindent

642: In general, the above definition of conductance is not the same as the

643: classical definition \cite{sinclair}.  However, we restrict our

644: attention in this paper to doubly stochastic matrices $P$.  When $P$

645: is doubly stochastic, these two definitions are equivalent.  Note that

646: the definition of conductance implies that $\Phi(P) \leq 1$.

647: %% \vspace{.05in}

648: \begin{theorem}

649: \label{thm:main2}

650: Consider any doubly stochastic matrix $P$ such that if $i \neq j$ and

651: $(i, j) \notin E$, then $P_{ij} = 0$.  There exists an information

652: dissemination algorithm $\cal{P}$ such that, for any

653: $\delta \in (0, 1)$,

654: \[

655: T_{\cal{P}}^{\spr}(\delta)

656: = O\left(\frac{\log n + \log \delta^{-1}}{\Phi(P)}\right).

657: \]

658: \end{theorem}

659: %% \vspace{.1in}

660: \begin{note}

661: The results of Theorems \ref{thm:main1} and \ref{thm:main2} hold for

662: both the synchronous and asynchronous time models.  Recall that the

663: quantities $T_{\cal{C}}^{\cmp}(\epsilon, \delta)$ and

664: $T_{\cal{D}}^{\spr}(\delta)$ are defined with respect to absolute time

665: in both models.

666: %% We also note here that the algorithms (described in

667: %% detail later in the paper) are very simple as well as totally

668: %% distributed as claimed.

669: \end{note}

670:

671: %% \vspace{.1in}

672:

673: \noindent

674: {\bf A comparison.} Theorems \ref{thm:main1} and \ref{thm:main2} imply

675: that, given a doubly stochastic matrix $P$, the time required for our

676: algorithm to obtain a $(1 \pm \epsilon)$ approximation with

677: probability at least $1 - \delta$ is

678: $O\lf(\frac{\epsilon^{-2} (1 + \log \delta^{-1})

679: (\log n + \log \delta^{-1})}{\Phi(P)}\rf)$.

680: When the network size $n$ and the accuracy parameters $\epsilon$ and

681: $\delta$ are fixed, the running time scales in proportion to

682: $1/\Phi(P)$, a factor that captures the dependence of the algorithm on

683: the matrix $P$.  Our algorithm can be used to compute the average of a

684: set of numbers.  For iterative averaging algorithms such as the ones

685: in \cite{tsitsiklis-thesis} and \cite{bgps}, the convergence time

686: largely depends on the mixing time of $P$, which is lower bounded by

687: $\Omega(1/\Phi(P))$ (see \cite{sinclair}, for example).  Thus, our

688: algorithm is (up to a $\log n$ factor) no slower than the fastest

689: iterative algorithm based on time-invariant linear dynamics.

690:

691:

692:

693: \section{Function Computation}

694: \label{sec:comp}

695:

696: In this section, we describe our algorithm for computing the value

697: $y = f(\vec{x}, V) = \sum_{i = 1}^{n} f_{i}(x_{i})$ of the separable

698: function $f$, where $f_{i}(x_{i}) \geq 1$.  For simplicity of

699: notation, let $y_{i} = f_{i}(x_{i})$.  Given $x_{i}$, each node can

700: compute $y_{i}$ on its own.  Next, the nodes use the algorithm shown

701: in Fig. \ref{compalg}, which we refer to as COMP, to compute estimates

702: $\hy_{i}$ of $y = \sum_{i = 1}^{n} y_{i}$.  The quantity $r$ is a

703: parameter to be chosen later.

704: \begin{figure}[htbp]

705: \centering

706: \begin{minipage}{\textwidth}

707: \hrulefill

708:

709: \noindent

710: {\bf Algorithm COMP}

711:

712: \renewcommand{\labelenumi}{{\bf \arabic{enumi}.}}

713: \renewcommand{\labelenumii}{(\alph{enumii})}

714: \begin{enumerate}

715:

716: \item[{\bf 0.}]

717: Initially, for $i = 1, \dots, n$, node $i$ has the value

718: $y_{i} \geq 1$.

719:

720: \item

721: Each node $i$ generates $r$ independent random numbers

722: $W_{1}^{i}, \dots, W_{r}^{i}$, where the distribution of each

723: $W_{\ell}^{i}$ is exponential with rate $y_{i}$ (i.e., with mean

724: $1/y_{i}$).

725:

726: \item

727: \label{minstep}

728: Each node $i$ computes, for $\ell = 1, \dots, r$, an estimate

729: $\hW_{\ell}^{i}$ of the minimum

730: $\bW_{\ell} = \min_{i = 1}^{n} W_{\ell}^{i}$.  This computation can be

731: done using an information spreading algorithm as described below.

732:

733: \item

734: Each node $i$ computes

735: $\hy_{i} = \frac{r}{\sum_{\ell = 1}^{r} \hW_{\ell}^{i}}$ as its

736: estimate of $\sum_{i = 1}^{n} y_{i}$.

737: \end{enumerate}

738: \hrulefill

739: \end{minipage}

740: \caption{An algorithm for computing separable functions.}

741: \label{compalg}

742: \end{figure}

743:

744: We describe how the minimum is computed as required by step

745: {\bf \ref{minstep}} of the algorithm in Section \ref{ssec:minima}.

746: The running time of the algorithm COMP depends on the running time of

747: the algorithm used to compute the minimum.

748:

749: Now, we show that COMP effectively estimates the function value $y$

750: when the estimates $\hW_{\ell}^{i}$ are all correct by providing a

751: lower bound on the conditional probability that the estimates produced

752: by COMP are all within a $(1 \pm \epsilon)$ factor of $y$.

753: \begin{lemma}

754: \label{lem:estaccuracy}

755: Let $y_{1}, \dots, y_{n}$ be real numbers (with $y_{i} \geq 1$ for

756: $i = 1, \dots, n$), $y = \sum_{i = 1}^{n} y_{i}$, and

757: $\bW = (\bW_{1}, \dots, \bW_{r})$, where the $\bW_{\ell}$ are as

758: defined in the algorithm COMP.  For any node $i$, let

759: $\hW^{i} = (\hW_{1}^{i}, \dots, \hW_{r}^{i})$, and let $\hy_{i}$ be

760: the estimate of $y$ obtained by node $i$ in COMP.  For any

761: $\epsilon \in (0, 1/2)$,

762: \[

763: \begin{split}

764: \Pr & \left( \cup_{i = 1}^{n} \left\{

765: \left|\hy_{i} - y \right| > 2 \epsilon y \right\}

766: \mid \forall i \in V, \: \hW^{i} = \bW \right)

767:  \leq 2\exp\left(-\frac{\epsilon^{2} r}{3} \right).

768: \end{split}

769: \]

770: \end{lemma}

771:

772: \begin{proof}

773: Observe that the estimate $\hy_{i}$ of $y$ at node $i$ is a function

774: of $r$ and $\hW^{i}$.  Under the hypothesis that $\hW^{i} = \bW$ for

775: all nodes $i \in V$, all nodes produce the same estimate

776: $\hy = \hy_{i}$ of $y$.  This estimate is

777: $\hy = r \left(\sum_{\ell = 1}^{r} \bW_{\ell} \right)^{-1}$, and so

778: $\hy^{-1} = \left(\sum_{\ell = 1}^{r} \bW_{\ell} \right)r^{-1}$.

779:

780: Property \ref{p1} implies that each of the $n$ random variables

781: $\bW_{1}, \dots, \bW_{r}$ has an exponential distribution with rate

782: $y$.  From Lemma \ref{discrete-to-cont}, it follows that for any

783: $\epsilon \in (0, 1/2)$,

784: \begin{equation}

785: \begin{split}

786: \Pr & \left( \left|\hy^{-1} - \frac{1}{y}\right|

787: > \frac{\epsilon}{y}

788: \;\Big|\; \forall i \in V, \: \hW^{i} = \bW \right)

789: ~ \leq 2\exp\left(-\frac{\epsilon^2 r}{3}\right).

790: \end{split}

791: \label{e1}

792: \end{equation}

793: This inequality bounds the conditional probability of the event

794: $\{\hy^{-1} \not\in

795: [(1 - \epsilon) y^{-1}, (1 + \epsilon)y^{-1}]\}$,

796: which is equivalent to the event

797: $\{\hy \not\in [(1 + \epsilon)^{-1}y, (1 - \epsilon)^{-1}y]\}$.

798: Now, for $\epsilon \in (0, 1/2)$,

799: \begin{equation}

800: \begin{split}

801: (1 - \epsilon)^{-1} & \in

802: \left[ 1 + \epsilon, 1 + 2\epsilon \right],

803: ~ (1 + \epsilon)^{-1}

804: ~ \in \left[1 - \epsilon, 1 - 2\epsilon/3\right].

805: \end{split}

806: \label{e2}

807: \end{equation}

808: Applying the inequalities in (\ref{e1}) and (\ref{e2}), we conclude

809: that for $\epsilon \in (0, 1/2)$,

810: \[

811: \begin{split}

812: \Pr & \left(\left| \hy - y \right| > 2 \epsilon y

813: \mid \forall i \in V, \: \hW^{i} = \bW \right) ~

814:  \leq 2 \exp\left(-\frac{\epsilon^2 r}{3}\right).

815: \end{split}

816: \]

817:

818: \noindent

819: Noting that the event

820: $\cup_{i = 1}^{n} \{|\hy_{i} - y| > 2 \epsilon y\}$ is equivalent to

821: the event $\{|\hy - y| > 2 \epsilon y\}$ when $\hW^{i} = \bW$ for all

822: nodes $i$ completes the proof of Lemma \ref{lem:estaccuracy}.

823: \end{proof}

824:

825: \subsection{Using information spreading to compute minima}

826: \label{ssec:minima}

827:

828: We now elaborate on step {\bf \ref{minstep}} of the algorithm COMP.

829: Each node $i$ in the graph starts this step with a vector

830: $W^{i} = (W_{1}^{i}, \dots, W_{r}^{i})$, and the nodes seek the vector

831: $\bW = (\bW_{1}, \dots, \bW_{r})$, where

832: $\bW_{\ell} = \min_{i = 1}^{n} W_{\ell}^{i}$.  In the information

833: spreading problem, each node $i$ has a message $m_{i}$, and the nodes

834: are to transmit messages across the links until every node has every

835: message.

836:

837: If all link capacities are infinite (i.e., in one time unit, a node

838: can send an arbitrary amount of information to another node), then an

839: information spreading algorithm $\cal{D}$ can be used directly to

840: compute the minimum vector $\bW$.  To see this, let the message

841: $m_{i}$ at the node $i$ be the vector $W^{i}$, and then apply the

842: information spreading algorithm to disseminate the vectors.  Once

843: every node has every message (vector), each node can compute $\bW$ as

844: the component-wise minimum of all the vectors.  This implies that the

845: running time of the resulting algorithm for computing $\bW$ is the

846: same as that of the information spreading algorithm.

847:

848: The assumption of infinite link capacities allows a node to transmit

849: an arbitrary number of vectors $W^{i}$ to a neighbor in one time unit.

850: A simple modification to the information spreading algorithm, however,

851: yields an algorithm for computing the minimum vector $\bW$ using links

852: of capacity $r$.  To this end, each node $i$ maintains a single

853: $r$-dimensional vector $w^{i}(t)$ that evolves in time, starting with

854: $w^{i}(0) = W^{i}$.

855:

856: Suppose that, in the information dissemination algorithm, node $j$

857: transmits the messages (vectors) $W^{i_{1}}, \dots, W^{i_{c}}$ to node

858: $i$ at time $t$.  Then, in the minimum computation algorithm, $j$

859: sends to $i$ the $r$ quantities $w_{1}, \dots, w_{r}$, where

860: $w_{\ell} = \min_{u = 1}^{c} W_{\ell}^{i_{u}}$.  The node $i$ sets

861: $w_{\ell}^{i}(t^{+}) = \min(w_{\ell}^{i}(t^{-}), w_{\ell})$ for

862: $\ell = 1, \dots, r$, where $t^{-}$ and $t^{+}$ denote the times

863: immediately before and after, respectively, the communication.  At any

864: time $t$, we will have $w^{i}(t) = \bW$ for all nodes $i \in V$ if, in

865: the information spreading algorithm, every node $i$ has all the

866: vectors $W^{1}, \dots, W^{n}$ at the same time $t$.  In this way, we

867: obtain an algorithm for computing the minimum vector $\bW$ that uses

868: links of capacity $r$ and runs in the same amount of time as the

869: information spreading algorithm.

870:

871: An alternative to using links of capacity $r$ in the computation of

872: $\bW$ is to make the time slot $r$ times larger, and impose a unit

873: capacity on all the links.  Now, a node transmits the numbers

874: $w_{1}, \dots, w_{r}$ to its communication partner over a period of

875: $r$ time slots, and as a result the running time of the algorithm for

876: computing $\bW$ becomes greater than the running time of the

877: information spreading algorithm by a factor of $r$.  The preceding

878: discussion, combined with the fact that nodes only gain messages as an

879: information spreading algorithm executes, leads to the following

880: lemma.

881: \begin{lemma}

882: \label{lem:mininfdis}

883: Suppose that the COMP algorithm is implemented using an information

884: spreading algorithm $\cal{D}$ as described above.  Let $\hW^{i}(t)$

885: denote the estimate of $\bW$ at node $i$ at time $t$.  For any

886: $\delta \in (0, 1)$, let $t_{m} = r T_{\cal{D}}^{\spr}(\delta)$.

887: Then, for any time $t \geq t_{m}$, with probability at least

888: $1 - \delta$, $\hW^{i}(t) = \bW$ for all nodes $i \in V$.

889: %% Further, under the worst-case, the algorithm COMP requires

890: %% $\Omega(T_{\spr}(\epsilon))$ time for all nodes to learn the estimate

891: %% $\hat{w}$ with probability at least $1 - \epsilon$.

892: \end{lemma}

893:

894: Note that the amount of data communicated between nodes during the

895: algorithm COMP depends on the values of the exponential random

896: variables generated by the nodes.  Since the nodes compute minima of

897: these variables, we are interested in a probabilistic lower bound on

898: the values of these variables (for example, suppose that the nodes

899: transmit the values $1/W_{\ell}^{i}$ when computing the minimum

900: $\bW_{\ell} = 1/\max_{i = 1}^{n} \{1/W_{\ell}^{i}\}$).

901: To this end, we use the fact that each $\bW_{\ell}$ is an exponential

902: random variable with rate $y$ to obtain that, for any constant

903: $c > 1$, the probability that any of the minimum values $\bW_{\ell}$

904: is less than $1/B$ (i.e., any of the inverse values $1/W_{\ell}^{i}$

905: is greater than $B$) is at most $\delta/c$, where $B$ is proportional

906: to $cry/\delta$.

907:

908: \subsection{Proof of Theorem \ref{thm:main1}}

909:

910: Now, we are ready to prove Theorem \ref{thm:main1}.  In particular, we

911: will show that the COMP algorithm has the properties claimed in

912: Theorem \ref{thm:main1}. To this end, consider using an information spreading algorithm $\cal{D}$ with

913: $\delta$-spreading time $T_{\cal{D}}^{\spr}(\delta)$ for

914: $\delta \in (0, 1)$ as the subroutine in the COMP algorithm.  For any

915: $\delta \in (0, 1)$, let $\tau_{m} = rT_{\cal{D}}^{\spr}(\delta/2)$.

916: By Lemma \ref{lem:mininfdis}, for any time $t \geq \tau_{m}$, the

917: probability that $\hW^{i} \neq \bW$ for any node $i$ at time $t$ is at

918: most $\delta/2$.

919:

920: On the other hand, suppose that $\hW^{i} = \bW$ for all nodes $i$ at

921: time $t \geq \tau_{m}$.  For any $\epsilon \in (0, 1)$, by choosing

922: $r \geq 12 \epsilon^{-2} \log (4 \delta^{-1})$ so that

923: $r = \Theta(\epsilon^{-2}(1 + \log \delta^{-1}))$, we obtain from

924: Lemma \ref{lem:estaccuracy} that

925: \begin{equation}

926: \begin{split}

927: \Pr & \left(\cup_{i = 1}^{n} \left\{\hat{y}_{i}

928: \not\in \left[ (1 - \epsilon) y, (1 + \epsilon) y \right] \right\}

929: \mid \forall i \in V, \: \hW^{i} = \bW \right) ~

930: ~ \leq \delta/2.

931: \end{split}

932: \label{e3}

933: \end{equation}

934: Recall that $T_{COMP}^{\cmp}(\epsilon, \delta)$ is the smallest time

935: $\tau$ such that, under the algorithm COMP, at any time $t \geq \tau$,

936: all the nodes have an estimate of the function value $y$ within a

937: multiplicative factor of $(1 \pm \epsilon)$ with probability at least

938: $1 - \delta$.  By a straightforward union bound of events and

939: (\ref{e3}), we conclude that, for any time $t \geq \tau_{m}$,

940: \[

941: \Pr \left(\cup_{i = 1}^{n} \left\{\hat{y}_{i} \not\in

942: \left[ (1 - \epsilon) y, (1 + \epsilon) y \right] \right\} \right)

943: \leq \delta.

944: \]

945: For any $\epsilon \in (0, 1)$ and $\delta \in (0, 1)$, we now have, by

946: the definition of $(\epsilon, \delta)$-computing time,

947: \begin{eqnarray*}

948: T_{COMP}^{\cmp}(\epsilon, \delta)

949: & \leq & \tau_{m} \\

950: & = & O \left(\epsilon^{-2} (1 + \log \delta^{-1})

951: T_{\cal{D}}^{\spr}(\delta/2) \right).

952: \end{eqnarray*}

953: This completes the proof of Theorem \ref{thm:main1}.

954: %\end{proof}

955:

956: \section{Information spreading}

957: \label{sec:infdis}

958:

959: In this section, we analyze a randomized gossip algorithm for

960: information spreading.  The method by which nodes choose partners to

961: contact when initiating a communication and the data transmitted

962: during the communication are the same for both time models defined in

963: Section \ref{sec:prelim}.  These models differ in the times at which

964: nodes contact each other: in the asynchronous model, only one node can

965: start a communication at any time, while in the synchronous model all

966: the nodes can communicate in each time slot.

967:

968: %% \noindent

969: %% {\bf Information spreading algorithm.} When a node initiates a

970: %% communication at (absolute) time $t$, it chooses another node to

971: %% contact at random as described in Section \ref{ssec:contrib}.

972: %% To specify the gossip protocol, which determines the data transmitted

973: %% during the communication, we introduce the following notation.

974:

975: The information spreading algorithm that we study is presented in

976: Fig. \ref{spreadalg}, which makes use of the following notation.  Let

977: $M_{i}(t)$ denote the set of messages node $i$ has at time $t$.

978: Initially, $M_{i}(0) = \{m_{i}\}$ for all $i \in V$.  For a

979: communication that occurs at time $t$, let $t^{-}$ and $t^{+}$

980: denote the times immediately before and after, respectively, the

981: communication occurs.

982:

983: As mentioned in Section \ref{ssec:contrib}, the nodes choose

984: communication partners according to the probability distribution

985: defined by an $n \times n$ matrix $P$.  The matrix $P$ is

986: non-negative and stochastic, and satisfies $P_{ij} = 0$ for any pair

987: of nodes $i \neq j$ such that $(i, j) \not\in E$.  For each such

988: matrix $P$, there is an instance of the information spreading

989: algorithm, which we refer to as SPREAD($P$).

990: \begin{figure}[htbp]

991: \centering

992: \begin{minipage}{\textwidth}

993: \hrulefill

994:

995: \noindent

996: {\bf Algorithm SPREAD($P$)}

997:

998: \noindent

999: When a node $i$ initiates a communication at time $t$:

1000:

1001: \renewcommand{\labelenumi}{{\bf \arabic{enumi}.}}

1002: \begin{enumerate}

1003:

1004: \item

1005: Node $i$ chooses a node $u$ at random, and contacts $u$.  The choice

1006: of the communication partner $u$ is made independently of all other

1007: random choices, and the probability that node $i$ chooses any node $j$

1008: is $P_{ij}$.

1009:

1010: \item

1011: Nodes $u$ and $i$ exchange all of their messages, so that

1012: \[

1013: M_{i}(t^{+}) = M_u(t^+) =  M_{i}(t^{-}) \cup M_{u}(t^{-}).

1014: \]

1015: \end{enumerate}

1016: \hrulefill

1017: \end{minipage}

1018: \caption{A gossip algorithm for information spreading.}

1019: \label{spreadalg}

1020: \end{figure}

1021:

1022: We note that the data transmitted between two communicating nodes in

1023: SPREAD conform to the {\em push and pull mechanism}.  That is, when node $i$

1024: contacts node $u$ at time $t$, both nodes $u$ and $i$ exchange all of

1025: their information with each other.  We also note that the

1026: description in the algorithm  assumes that the communication

1027: links in the network have infinite capacity.  As discussed in Section

1028: \ref{ssec:minima}, however, an information spreading algorithm that

1029: uses links of infinite capacity can be used to compute minima using

1030: links of unit capacity.

1031:

1032: This algorithm is simple, distributed, and satisfies the transmitter

1033: gossip constraint.  We now present analysis of the information

1034: spreading time of SPREAD($P$) for doubly stochastic matrices $P$ in

1035: the two time models.  The goal of the analysis is to prove Theorem

1036: \ref{thm:main2}.  To this end, for any $i \in V$, let

1037: $S_{i}(t) \subseteq V$ denote the set of nodes that have the message

1038: $m_{i}$ after any communication events that occur at absolute time $t$

1039: (communication events occur on a global clock tick in the asynchronous

1040: time model, and in each time slot in the synchronous time model).  At

1041: the start of the algorithm, $S_{i}(0) = \{i\}$.

1042:

1043: \subsection{Asynchronous model}

1044:

1045: As described in Section \ref{sec:prelim}, in the asynchronous time

1046: model the global clock ticks according to a Poisson process of rate

1047: $n$, and on a tick one of the $n$ nodes is chosen uniformly at random.

1048: This node initiates a communication, so the times at which the

1049: communication events occur correspond to the ticks of the clock.  On

1050: any clock tick, at most one pair of nodes can exchange messages by

1051: communicating with each other.

1052:

1053: Let $k \geq 0$ denote the index of a clock tick.  Initially, $k = 0$,

1054: and the corresponding absolute time is $0$.  For simplicity of

1055: notation, we identify the time at which a clock tick occurs with its

1056: index, so that $S_{i}(k)$ denotes the set of nodes that have the

1057: message $m_{i}$ at the end of clock tick $k$.  The following lemma

1058: provides a bound on the number of clock ticks required for every node

1059: to receive every message.

1060: \begin{lemma}

1061: \label{lem:asynchticks}

1062: For any $\delta \in (0, 1)$, define

1063: \[

1064: K(\delta)

1065: = \inf\{k \geq 0:

1066: \Pr(\cup_{i = 1}^{n} \{S_{i}(k) \neq V\}) \leq \delta \}.

1067: \]

1068: Then,

1069: \[

1070: K(\delta)

1071: = O\left( n \frac{\log n + \log \delta^{-1}}{\Phi(P)}\right).

1072: \]

1073: \end{lemma}

1074:

1075: \begin{proof}

1076: Fix any node $v \in V$.  We study the evolution of the size of the set

1077: $S_{v}(k)$.  For simplicity of notation, we drop the subscript $v$,

1078: and write $S(k)$ to denote $S_{v}(k)$.

1079:

1080: Note that $|S(k)|$ is monotonically non-decreasing over the course of

1081: the algorithm, with the initial condition $|S(0)| = 1$.  For the

1082: purpose of analysis, we divide the execution of the algorithm into two

1083: phases based on the size of the set $S(k)$.  In the first phase,

1084: $|S(k)| \leq n/2$, and in the second phase $|S(k)| > n/2$.

1085:

1086: Under the gossip algorithm, after clock tick $k + 1$, we have either

1087: $|S(k + 1)| = |S(k)|$ or $|S(k + 1)| = |S(k)| + 1$.  Further, the size

1088: increases if a node $i \in S(k)$ contacts a node $j \notin S(k)$, as

1089: in this case $i$ will push the message $m_{v}$ to $j$.  For each such

1090: pair of nodes $i$, $j$, the probability that this occurs on clock tick

1091: $k + 1$ is $P_{ij}/n$.  Since only one node is active on each clock

1092: tick,

1093: \begin{equation}

1094: E[|S(k + 1)| - |S(k)| \mid S(k)]

1095: \geq \sum_{i \in S(k), j \notin S(k)} \frac{P_{ij}}{n}.

1096: \label{expinc}

1097: \end{equation}

1098:

1099: \noindent

1100: When $|S(k)| \leq n/2$, it follows from (\ref{expinc}) and the

1101: definition of the conductance $\Phi(P)$ of $P$ that

1102: \begin{eqnarray}

1103: E[|S(k + 1)| - |S(k)| \mid S(k)]

1104: & \geq & \frac{|S(k)|}{n}

1105: \frac{\sum_{i \in S(k), j \notin S(k)} P_{ij} }{|S(k)|}

1106: \notag \\

1107: & \geq &

1108: \frac{|S(k)|}{n} \min_{S \subset V, \; 0 < |S| \leq n/2}

1109: \frac{\sum_{ i \in S, j \notin S} P_{ij}}{|S|}

1110: \notag \\

1111: & = & \frac{|S(k)|}{n} \Phi(P)

1112: \notag \\

1113: & = & |S(k)| \hat{\Phi},

1114: \label{e4}

1115: \end{eqnarray}

1116:

1117: \noindent

1118: where $\hat{\Phi} = \frac{\Phi(P)}{n}$.

1119:

1120: We seek an upper bound on the duration of the first phase.  To this

1121: end, let

1122: \begin{equation*}

1123: Z(k) = \frac{\exp \left(\frac{\hat{\Phi}}{4} k \right)}{|S(k)|}.

1124: \end{equation*}

1125:

1126: \noindent

1127: Define the stopping time $L = \inf \{k : |S(k)| > n/2\}$, and

1128: $L \land k = \min(L, k)$.  If $|S(k)| > n/2$, then

1129: $L \land (k + 1) = L \land k$, and thus

1130: $E[Z(L \land (k + 1)) \mid S(L \land k)] = Z(L \land k)$.

1131:

1132: Now, suppose that $|S(k)| \leq n/2$, in which case

1133: $L \land (k + 1) = (L \land k) + 1$.  The function $g(z) = 1/z$ is

1134: convex for $z > 0$, which implies that, for $z_{1}, z_{2} > 0$,

1135: \begin{equation}

1136: g(z_{2}) \geq g(z_{1}) + g'(z_{1})(z_{2} - z_{1}).

1137: \label{convderunder}

1138: \end{equation}

1139:

1140: \noindent

1141: Applying \eqref{convderunder} with $z_{1} = |S(k + 1)|$ and

1142: $z_{2} = |S(k)|$ yields

1143: \begin{equation*}

1144: \frac{1}{|S(k + 1)|}

1145: \leq \frac{1}{|S(k)|}

1146: - \frac{1}{|S(k + 1)|^{2}} (|S(k + 1)| - |S(k)|).

1147: \end{equation*}

1148:

1149: \noindent

1150: Since $|S(k + 1)| \leq |S(k)| + 1 \leq 2|S(k)|$, it follows that

1151: \begin{equation}

1152: \frac{1}{|S(k + 1)|}

1153: \leq \frac{1}{|S(k)|}

1154: - \frac{1}{4 |S(k)|^{2}} (|S(k + 1)| - |S(k)|).

1155: \label{invsizeupper}

1156: \end{equation}

1157:

1158: Combining \eqref{e4} and \eqref{invsizeupper}, we obtain that, if

1159: $|S(k)| \leq n/2$, then

1160: \begin{equation*}

1161: E \left[\frac{1}{|S(k + 1)|} \;\Big|\; S(k) \right]

1162: \leq \frac{1}{|S(k)|} \left(1 - \frac{\hat{\Phi}}{4} \right)

1163: \leq \frac{1}{|S(k)|} \exp \left(-\frac{\hat{\Phi}}{4} \right),

1164: \end{equation*}

1165:

1166: \noindent

1167: as $1 - z \leq \exp(-z)$ for $z \geq 0$.  This implies that

1168: \begin{eqnarray*}

1169: E[Z(L \land (k + 1)) \mid S(L \land k)]

1170: & = & E \left[

1171: \frac{\exp \left(\frac{\hat{\Phi}}{4} (L \land (k + 1)) \right)}

1172: {|S(L \land (k + 1))|} \;\bigg|\; S(L \land k) \right]

1173: \notag \\

1174: & = & \exp \left(\frac{\hat{\Phi}}{4} (L \land k) \right)

1175: \exp \left(\frac{\hat{\Phi}}{4} \right)

1176: E \left[ \frac{1}

1177: {|S((L \land k) + 1)|} \;\Big|\; S(L \land k) \right]

1178: \notag \\

1179: & \leq & \exp \left(\frac{\hat{\Phi}}{4} (L \land k) \right)

1180: \exp \left(\frac{\hat{\Phi}}{4} \right)

1181: \exp \left(-\frac{\hat{\Phi}}{4} \right)

1182: \frac{1}{|S(L \land k)|}

1183: \notag \\

1184: & = & Z(L \land k),

1185: \end{eqnarray*}

1186:

1187: \noindent

1188: and therefore $Z(L \land k)$ is a supermartingale.

1189:

1190: Since $Z(L \land k)$ is a supermartingale, we have the inequality

1191: $E[Z(L \land k)] \leq E[Z(L \land 0)] = 1$ for any $k > 0$, as

1192: $Z(L \land 0) = Z(0) = 1$.  The fact that the set $S(k)$ can contain

1193: at most the $n$ nodes in the graph implies that

1194: \begin{equation*}

1195: Z(L \land k)

1196: = \frac{\exp \left(\frac{\hat{\Phi}}{4} (L \land k) \right)}

1197: {|S(L \land k)|}

1198: \geq \frac{1}{n} \exp \left(\frac{\hat{\Phi}}{4} (L \land k) \right),

1199: \end{equation*}

1200:

1201: \noindent

1202: and so

1203: \begin{equation*}

1204: E \left[\exp \left(\frac{\hat{\Phi}}{4} (L \land k) \right) \right]

1205: \leq n E[Z(L \land k)] \leq n.

1206: \end{equation*}

1207:

1208: \noindent

1209: Because $\exp(\hat{\Phi}(L \land k)/4) \uparrow \exp (\hat{\Phi}L/4)$

1210: as $k \to \infty$, the monotone convergence theorem implies that

1211: \begin{equation*}

1212: E \left[\exp \left(\frac{\hat{\Phi}L}{4} \right) \right] \leq n.

1213: \end{equation*}

1214:

1215: \noindent

1216: Applying Markov's inequality, we obtain that, for

1217: $k_{1} = 4(\ln 2 + 2 \ln n + \ln (1/\delta))/\hat{\Phi}$,

1218: \begin{eqnarray}

1219: \Pr (L > k_{1})

1220: & = & \Pr \left(\exp \left(\frac{\hat{\Phi} L}{4} \right)

1221: > \frac{2n^{2}}{\delta} \right)

1222: \notag \\

1223: & < & \frac{\delta}{2n}.

1224: \label{e6a}

1225: \end{eqnarray}

1226:

1227: For the second phase of the algorithm, when $|S(k)| > n/2$, we study

1228: the evolution of the size of the set of nodes that do not have the

1229: message, $|S(k)^{c}|$.  This quantity will decrease as the message

1230: spreads from nodes in $S(k)$ to nodes in $S(k)^{c}$.  For simplicity,

1231: let us consider restarting the process from clock tick $0$ after $L$

1232: (i.e., when more than half the nodes in the graph have the message),

1233: so that we have $|S(0)^{c}| \leq n/2$.

1234:

1235: In clock tick $k + 1$, a node $j \in S(k)^{c}$ will receive the

1236: message if it contacts a node $i \in S(k)$ and pulls the message from

1237: $i$.  As such,

1238: \begin{equation*}

1239: E[|S(k)^{c}| - |S(k + 1)^{c}| \mid S(k)^{c}]

1240: \geq \sum_{j \in S(k)^{c}, i \notin S(k)^{c}} \frac{P_{ji}}{n},

1241: \end{equation*}

1242:

1243: \noindent

1244: and thus

1245: \begin{eqnarray}

1246: \notag

1247: E[|S(k + 1)^{c}| \mid S(k)^{c}]

1248: & \leq & |S(k)^c|

1249: - \frac{\sum_{j \in S(k)^{c}, i \notin S(k)^c} P_{ji}}{n}

1250: \notag \\

1251: & = & |S(k)^c|

1252: \lf(1

1253: - \frac{\sum_{j \in S(k)^c, i \notin S(k)^c} P_{ji}}{n |S(k)^c|} \rf)

1254: \notag \\

1255: & \leq & |S(k)^c| \lf( 1 - \hat{\Phi} \rf).

1256: \label{e:5}

1257: \end{eqnarray}

1258:

1259: We note that this inequality holds even when $|S(k)^{c}| = 0$, and as

1260: a result it is valid for all clock ticks $k$ in the second phase.

1261: Repeated application of \eqref{e:5} yields

1262: \begin{eqnarray*}

1263: E[|S(k)^{c}|]

1264: & = & E[E[|S(k)^{c}| \mid S(k - 1)^{c}]] \\

1265: & \leq & \left(1 - \hat{\Phi} \right)E[|S(k - 1)^{c}|] \\

1266: & \leq & \left(1 - \hat{\Phi} \right)^{k} E[|S(0)^{c}|] \\

1267: & \leq & \exp \left(-\hat{\Phi} k \right) \left(\frac{n}{2} \right)

1268: %\label{e7}

1269: \end{eqnarray*}

1270:

1271: For

1272: $k_{2} = \ln (n^{2}/\delta)/2\hat{\Phi} =

1273: (2\ln n + \ln (1/\delta))/\hat{\Phi}$,

1274: we have $E[|S(k_{2})^{c}|] \leq \delta/(2n)$.  Markov's inequality now

1275: implies the following upper bound on the probability that not all of

1276: the nodes have the message at the end of clock tick $k_{2}$ in the

1277: second phase.

1278: \begin{eqnarray}

1279: \Pr(|S(k_{2})^{c}| > 0) & = & \Pr(|S(k_{2})^{c}| \geq 1)

1280: \notag \\

1281: & \leq & E[|S(k_{2})^{c}|]

1282: \notag \\

1283: & \leq & \frac{\delta}{2n}.

1284: \label{e8}

1285: \end{eqnarray}

1286:

1287: Combining the analysis of the two phases, we obtain that, for

1288: $k' = k_{1} + k_{2} = O((\log n + \log \delta^{-1})/\hat{\Phi})$,

1289: $\Pr(S_{v}(k') \neq V) \leq \delta/n$.  Applying the union bound over

1290: all the nodes in the graph, and recalling that

1291: $\hat{\Phi} = \Phi(P)/n$, we conclude that

1292: \begin{eqnarray*}

1293: K(\delta)

1294: & \leq & k'

1295: ~ = ~ O\left(n \frac{\log n + \log \delta^{-1}}{\Phi(P)}\right).

1296: \end{eqnarray*}

1297: This completes the proof of Lemma \ref{lem:asynchticks}.

1298: \end{proof}

1299:

1300: To extend the bound in Lemma \ref{lem:asynchticks} to absolute time,

1301: observe that Corollary \ref{discrete-to-contc} implies that the

1302: probability that

1303: $\kappa = K(\delta/3) + 27 \ln (3/\delta) =

1304: O(n(\log n + \log \delta^{-1})/\Phi(P))$

1305: clock ticks do not occur in absolute time

1306: $(4/3) \kappa/n = O((\log n + \log \delta^{-1})/\Phi(P))$ is at most

1307: $2 \delta/3$.  Applying the union bound now yields

1308: $T_{SPREAD(P)}^{\spr}(\delta) =

1309: O((\log n + \log \delta^{-1})/\Phi(P))$,

1310: thus establishing the  upper bound in Theorem \ref{thm:main2} for the

1311: asynchronous time model.

1312:

1313: \subsection{Synchronous model}

1314:

1315: In the synchronous time model, in each time slot every node contacts a

1316: neighbor to exchange messages.  Thus, $n$ communication events may

1317: occur simultaneously.  Recall that absolute time is measured in rounds

1318: or time slots in the synchronous model.

1319:

1320: The analysis of the randomized gossip algorithm for information

1321: spreading in the synchronous model is similar to the analysis for the

1322: asynchronous model.  However, we need additional analytical arguments

1323: to reach analogous conclusions due to the technical challenges

1324: presented by multiple simultaneous transmissions.

1325:

1326: In this section, we sketch a proof of the time bound in Theorem

1327: \ref{thm:main2},

1328: $T_{SPREAD(P)}^{\spr}(\delta) =

1329: O((\log n + \log \delta^{-1})/\Phi(P))$,

1330: for the synchronous time model.  Since the proof follows a similar

1331: structure as the proof of Lemma \ref{lem:asynchticks}, we only point

1332: out the significant differences.

1333:

1334: As before, we fix a node $v \in V$, and study the evolution of the

1335: size of the set $S(t) = S_{v}(t)$.  Again, we divide the execution of

1336: the algorithm into two phases based on the evolution of $S(t)$: in the

1337: first phase $|S(t)| \leq n/2$, and in the second phase

1338: $|S(t)| > n/2$. In the first phase, we analyze the increase in

1339: $|S(t)|$, while in the second we study the decrease in $|S(t)^{c}|$.

1340: For the purpose of analysis, in the first phase we ignore the effect

1341: of the increase in $|S(t)|$ due to the {\em pull} aspect of protocol:

1342: that is, when node $i$ contacts node $j$, we assume (for the purpose

1343: of analysis) that $i$ sends the messages it has to $j$, but that $j$

1344: does not send any messages to $i$.  Clearly, an upper bound obtained

1345: on the time required for every node to receive every message under

1346: this restriction is also an upper bound for the actual algorithm.

1347:

1348: %%

1349: Consider a time slot $t + 1$ in the first phase.  For $j \notin S(t)$,

1350: let $X_{j}$ be an indicator random variable that is $1$ if node $j$

1351: receives the message $m_{v}$ via a push from some node $i \in S(t)$ in

1352: time slot $t + 1$, and is $0$ otherwise.  The probability that $j$

1353: does not receive $m_{v}$ via a push is the probability that no node

1354: $i \in S(t)$ contacts $j$, and so

1355: \begin{eqnarray}

1356: E[X_{j} \mid S(t)]

1357: & = & 1 - \Pr(X_{j} = 0 \mid S(t))

1358: \notag \\

1359: & = & 1 - \prod_{i \in S(t)} (1 - P_{ij})

1360: \notag \\

1361: & \geq & 1 - \prod_{i \in S(t)} \exp(-P_{ij})

1362: \notag \\

1363: & = & 1 - \exp \left(-\sum_{i \in S(t)} P_{ij} \right).

1364: \label{pullproblower}

1365: \end{eqnarray}

1366:

1367: \noindent

1368: The Taylor series expansion of $\exp(-z)$ about $z = 0$ implies that,

1369: if $0 \leq z \leq 1$, then

1370: \begin{equation}

1371: \exp(-z) \leq 1 - z + z^{2}/2 \leq 1 - z + z/2 = 1 - z/2.

1372: \label{taylorexpsecondterm}

1373: \end{equation}

1374:

1375: \noindent

1376: For a doubly stochastic matrix $P$, we have

1377: $0 \leq \sum_{i \in S(t)} P_{ij} \leq 1$, and so we can combine

1378: \eqref{pullproblower} and \eqref{taylorexpsecondterm} to obtain

1379: \begin{equation*}

1380: E[X_{j} \mid S(t)]

1381: \geq \frac{1}{2} \sum_{i \in S(t)} P_{ij}.

1382: \end{equation*}

1383:

1384: By linearity of expectation,

1385: \begin{eqnarray*}

1386: E[|S(t + 1)| - |S(t)| \mid S(t)]

1387: & = & \sum_{j \not\in S(t)} E[X_{j} \mid S(t)]

1388: \notag \\

1389: & \geq & \frac{1}{2} \sum_{i \in S(t), j \not\in S(t)} P_{ij}

1390: \notag \\

1391: & = & \frac{|S(t)|}{2}

1392: \frac{\sum_{i \in S(t), j \not\in S(t)} P_{ij}}{|S(t)|}.

1393: \end{eqnarray*}

1394:

1395: \noindent

1396: When $|S(t)| \leq n/2$, we have

1397: \begin{equation}

1398: E[|S(t + 1)| - |S(t)| \mid S(t)]

1399: \geq |S(t)| \frac{\Phi(P)}{2}.

1400: \label{synchexpinccond}

1401: \end{equation}

1402:

1403: Inequality \eqref{synchexpinccond} is analogous to inequality

1404: \eqref{e4} for the asynchronous time model, with $\Phi(P)/2$ in the

1405: place of $\hat{\Phi}$.  We now proceed as in the proof of Lemma

1406: \ref{lem:asynchticks} for the asynchronous model.  Note that

1407: $|S(t + 1)| \leq 2 |S(t)|$ here in the synchronous model because of

1408: the restriction in the analysis to only consider the push aspect of

1409: the protocol in the first phase, as each node in $S(t)$ can push a

1410: message to at most one other node in a single time slot.  Repeating

1411: the analysis from the asynchronous model leads to the conclusion that

1412: the first phase of the algorithm ends in

1413: $O\lf(\frac{\log n + \log \delta^{-1}}{{\Phi(P)}}\rf)$ time with

1414: probability at least $1 - \delta/2n$.

1415:

1416: The analysis of the second phase is the same as that presented for the

1417: asynchronous time model, with $\hat{\Phi}$ replaced by $\Phi$.  As a

1418: summary, we obtain that it takes at most

1419: $O\lf(\frac{\log n + \log \delta^{-1}}{{\Phi(P)}}\rf)$ time for the

1420: algorithm to spread all the messages to all the nodes with probability

1421: at least $1 - \delta$.  This completes the proof of Theorem

1422: \ref{thm:main2} for the synchronous time model.

1423:

1424:

1425: \section{Applications}

1426: \label{sec:appl}

1427:

1428: We study here the application of our preceding results to several

1429: types of graphs.  In particular, we consider complete graphs,

1430: constant-degree expander graphs, and grid graphs.  We use grid graphs

1431: as an example to compare the performance of our algorithm for

1432: computing separable functions with that of a known iterative averaging

1433: algorithm.

1434:

1435: For each of the three classes of graphs mentioned above, we are

1436: interested in the $\delta$-information-spreading time

1437: $T_{SPREAD(P)}^{\spr}(\delta)$, where $P$ is a doubly stochastic

1438: matrix that assigns equal probability to each of the neighbors of any

1439: node.  Specifically, the probability $P_{ij}$ that a node $i$ contacts

1440: a node $j \neq i$ when $i$ becomes active is $1/\Delta$, where

1441: $\Delta$ is the maximum degree of the graph, and

1442: $P_{ii} = 1 - d_{i}/\Delta$, where $d_{i}$ is the degree of $i$.

1443: Recall from Theorem \ref{thm:main1} that the information dissemination

1444: algorithm SPREAD($P$) can be used as a subroutine in an algorithm for

1445: computing separable functions, with the running time of the resulting

1446: algorithm being a function of $T_{SPREAD(P)}^{\spr}(\delta)$.

1447:

1448: \subsection{Complete graph}

1449:

1450: On a complete graph, the transition matrix $P$ has $P_{ii} = 0$ for

1451: $i = 1, \dots, n$, and $P_{ij} = 1/(n - 1)$ for $j \neq i$.  This

1452: regular structure allows us to directly evaluate the conductance of

1453: $P$, which is $\Phi(P) \approx 1/2$.  This implies that the

1454: ($\epsilon$, $\delta$)-computing time of the algorithm for computing

1455: separable functions based on SPREAD($P$) is

1456: $O(\epsilon^{-2} (1 + \log \delta^{-1})(\log n + \log \delta^{-1}))$.

1457: Thus, for a constant $\epsilon \in (0, 1)$ and $\delta = 1/n$, the

1458: computation time scales as $O(\log^{2} n)$.

1459:

1460: \subsection{Expander graph}

1461:

1462: Expander graphs have been used for numerous applications, and explicit

1463: constructions are known for constant-degree expanders \cite{rvw}.  We

1464: consider here an undirected graph in which the maximum degree of any

1465: vertex, $\Delta$, is a constant.  Suppose that the edge expansion of

1466: the graph is

1467: \begin{equation*}

1468: \min_{S \subset V, \; 0 < |S| \leq n/2}

1469: \frac{|F(S, S^{c})|}{|S|} = \alpha,

1470: \end{equation*}

1471:

1472: \noindent

1473: where $F(S, S^{c})$ is the set of edges in the cut $(S, S^{c})$, and

1474: $\alpha > 0$ is a constant.  The transition matrix $P$ satisfies

1475: $P_{ij} = 1/\Delta$ for all $i \neq j$ such that $(i, j) \in E$, from

1476: which we obtain $\Phi(P) \geq \alpha/\Delta$.  When $\alpha$ and

1477: $\Delta$ are constants, this leads to a similar conclusion as in the

1478: case of the complete graph: for any constant $\epsilon \in (0, 1)$ and

1479: $\delta = 1/n$, the computation time is $O(\log^{2} n)$.

1480:

1481: \subsection{Grid}

1482: \label{gridsec}

1483:

1484: We now consider a $d$-dimensional grid graph on $n$ nodes, where

1485: $c = n^{1/d}$ is an integer.  Each node in the grid can be represented

1486: as a $d$-dimensional vector $a = (a_{i})$, where

1487: $a_{i} \in \{1, \dots, c\}$ for $1 \leq i \leq d$.  There is one node

1488: for each distinct vector of this type, and so the total number of

1489: nodes in the graph is $c^{d} = (n^{1/d})^{d} = n$.  For any two nodes

1490: $a$ and $b$, there is an edge $(a, b)$ in the graph if and only if,

1491: for some $i \in \{1, \dots, d\}$, $|a_{i} - b_{i}| = 1$, and

1492: $a_{j} = b_{j}$ for all $j \neq i$.

1493:

1494: In \cite{isogrid}, it is shown that the isoperimetric number of this

1495: grid graph is

1496: \begin{equation*}

1497: \min_{S \subset V, \; 0 < |S| \leq n/2}

1498: \frac{|F(S, S^{c})|}{|S|}

1499: = \Theta \left(\frac{1}{c} \right)

1500: = \Theta \left(\frac{1}{n^{1/d}} \right).

1501: \end{equation*}

1502:

1503: \noindent

1504: By the definition of the edge set, the maximum degree of a node in the

1505: graph is $2d$.  This means that $P_{ij} = 1/(2d)$ for all $i \neq j$

1506: such that $(i, j) \in E$, and it follows that

1507: $\Phi(P) = \Omega \left(\frac{1}{dn^{1/d}} \right)$.  Hence, for any

1508: $\epsilon \in (0, 1)$ and $\delta \in (0, 1)$, the

1509: ($\epsilon$, $\delta$)-computing time of the algorithm for computing

1510: separable functions is

1511: $O(\epsilon^{-2} (1 + \log \delta^{-1})(\log n + \log \delta^{-1})

1512: d n^{1/d})$.

1513:

1514: \subsection{Comparison with Iterative Averaging}

1515:

1516: We briefly contrast the performance of our algorithm for computing

1517: separable functions with that of the iterative averaging algorithms in

1518: \cite{tsitsiklis-thesis} \cite{bgps}.  As noted earlier, the

1519: dependence of the performance of our algorithm is in proportion to

1520: $1/\Phi(P)$, which is a lower bound for the iterative algorithms based

1521: on a stochastic matrix $P$.

1522:

1523: In particular, when our algorithm is used to compute the average of a

1524: set of numbers (by estimating the sum of the numbers and the number of

1525: nodes in the graph) on a $d$-dimensional grid graph, it follows from

1526: the analysis in Section \ref{gridsec} that the amount of time required

1527: to ensure the estimate is within a $(1 \pm \epsilon)$ factor of the

1528: average with probability at least $1 - \delta$ is

1529: $O(\epsilon^{-2} (1 + \log \delta^{-1})(\log n + \log \delta^{-1})

1530: dn^{1/d})$

1531: for any $\epsilon \in (0, 1)$ and $\delta \in (0, 1)$.  So, for a

1532: constant $\epsilon \in (0, 1)$ and $\delta = 1/n$, the computation

1533: time scales as $O(dn^{1/d} \log^{2} n)$ with the size of the graph,

1534: $n$.  The algorithm in \cite{bgps} requires $\Omega(n^{2/d} \log n)$

1535: time for this computation.  Hence, the running time of our algorithm

1536: is (for fixed $d$, and up to logarithmic factors) the {\em square

1537: root} of the runnning time of the iterative algorithm!  This

1538: relationship holds on other graphs for which the spectral gap is

1539: proportional to the square of the conductance.

1540:

1541:

1542: \section{Conclusions and Future Work}

1543: \label{sec:conc}

1544:

1545: In this paper, we presented a novel algorithm for computing separable

1546: functions in a totally distributed manner.  The algorithm is based on

1547: properties of exponential random variables, and the fact that the

1548: minimum of a collection of numbers is an order- and

1549: duplicate-insensitive statistic.

1550:

1551: Operationally, our algorithm makes use of an information spreading

1552: mechanism as a subroutine.  This led us to the analysis of a

1553: randomized gossip mechanism for information spreading.  We obtained an

1554: upper bound on the information spreading time of this algorithm in

1555: terms of the conductance of a matrix that characterizes the algorithm.

1556:

1557: In addition to computing separable functions, our algorithm improves

1558: the computation time for the canonical task of averaging.  For

1559: example, on graphs such as paths, rings, and grids, the performance of

1560: our algorithm is of a smaller order than that of a known iterative

1561: algorithm.

1562:

1563: We believe that our algorithm will lead to the following totally

1564: distributed computations: (1) an approximation algorithm for convex

1565: minimization with linear constraints; and (2) a ``packet marking''

1566: mechanism in the Internet.  These areas, in which summation is a key

1567: subroutine, will be topics of our future research.

1568:

1569: \section{Acknowledgments}

1570:

1571: We thank Ashish Goel for a useful discussion and providing

1572: suggestions, based on previous work \cite{Goel}, when we started this

1573: work.

1574:

1575: \bibliographystyle{abbrv}

1576:

1577: \bibliography{rumorspreading}

1578:

1579: \end{document}

1580: