1: \documentclass[12pt]{article}
2: 
3: \usepackage{amsmath,amssymb, fullpage}
4: \usepackage{graphicx}
5: \usepackage[latin1]{inputenc}
6: \usepackage[T1]{fontenc}
7: \usepackage{lmodern}
8: \usepackage[pdfstartview={FitH}]{hyperref}
9: \usepackage{algorithm}
10: \usepackage{algorithmic}
11: \setlength{\oddsidemargin}{0.25 in}
12: \setlength{\evensidemargin}{-0.25 in} \setlength{\topmargin}{-0.6
13: in} \setlength{\textwidth}{6.5 in} \setlength{\textheight}{8.5 in}
14: \setlength{\headsep}{0.75 in} \setlength{\parindent}{0 in}
15: \setlength{\parskip}{0.1 in}
16: \newcommand{\lecture}[4]{
17:    \pagestyle{myheadings}
18:    \thispagestyle{plain}
19:    \newpage
20:    \setcounter{page}{1}
21:    \noindent
22: }
23: \newtheorem{theorem}{Theorem}
24: \newtheorem{lemma}{Lemma}
25: \newtheorem{proposition}{Proposition}
26: \newtheorem{claim}{Claim}
27: \newtheorem{corollary}[theorem]{Corollary}
28: \newtheorem{defn}{Definition}
29: \newtheorem{construction}{Construction}
30: \newtheorem{exercise}{Exercise}
31: \newtheorem{example}{Example}
32: \newtheorem{open}[theorem]{Open Question}
33: \newtheorem{notation}{Notation}
34: %\newtheorem{algorithm}{Algorithm}
35: \newtheorem{observation}{Observation}
36: %\newtheorem{conjecture}[theorem]{Conjecture}
37: 
38: \def\beq{\begin{eqnarray}}
39: \def\eeq{\end{eqnarray}}
40: \def\beqs{\begin{eqnarray*}}
41: \def\eeqs{\end{eqnarray*}}
42: %% Blackboard bold symbols %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
43: \newcommand{\N}{\mathbb{N}}
44: \newcommand{\Z}{\mathbb{Z}}
45: \newcommand{\Q}{\mathbb{Q}}
46: \newcommand{\R}{\mathbb{R}}
47: \newcommand{\RR}{\mathbb{R}}
48: \newcommand{\C}{\mathbb{C}}
49: \newcommand{\CC}{\mathcal{C}}
50: \newcommand{\T}{\mathbb{T}}
51: \newcommand{\A}{\mathbb{A}}
52: \newcommand{\x}{\mathbf{x}}
53: \newcommand{\y}{\mathbf{y}}
54: \newcommand{\z}{\mathbf{z}}
55: \newcommand{\n}{\mathbf{n}}
56: \newcommand{\I}{\mathbb{I}}
57: \newcommand{\K}{\mathbb{K}}
58: \newcommand{\E}{\mathbb{E}}
59: \newcommand{\p}{\mathbb{P}}
60: \newcommand{\e}{\mathbf{e}}
61: \newcommand{\one}{\mathbf{1}}
62: \newcommand{\LL}{\mathcal L}
63: \newcommand{\MM}{\mathcal M}
64: \newcommand{\ra}{\rightarrow}
65: \newcommand{\la}{\leftarrow}
66: %\def\a{{\mbox{\boldmath $\alpha$}}}
67: %\def\l{{\mbox{\boldmath $\lambda$}}}
68: %\def\m{{\mbox{\boldmath $\mu$}}}
69: %\def\n{{\mbox{\boldmath $\nu$}}}
70: \def\eee{{\mathrm e}}
71: \def\a{{\mathbf{\alpha}}}
72: \def\l{{\mathbf{\lambda}}}
73: \def\m{{\mathbf{\mu}}}
74: %\def\n{{\mathbf{\nu}}}
75: \def\A{{\mathcal{A}}}
76: \def\ie{i.\,e.\,}
77: \def\of{{\bf off}}
78: \def\on{{\bf on}}
79: \def\pa{{\bf passive}}
80: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
81: 
82: %\def\baselinestretch{1.5}         % double space (well...almost!)
83: \title{Geographic Gossip on Geometric Random Graphs via Affine Combinations}
84: \author{Hariharan Narayanan\\
85: Department of Computer Science, University of Chicago\\
86:  {\tt hari@cs.uchicago.edu} }
87: 
88: \begin{document}
89: \maketitle
90: \begin{abstract}
91: In recent times, a considerable amount of work has been devoted to
92: the development and analysis of gossip algorithms in Geometric
93: Random Graphs. In a recently introduced model termed ``Geographic
94: Gossip," each node is aware of its position but possesses no further
95: information. Traditionally, gossip protocols have always used convex
96: linear combinations to achieve averaging. We develop a new protocol
97: for Geographic Gossip, in which counter-intuitively, we use {\it
98: non-convex affine combinations} as updates in addition to convex
99: combinations to accelerate the averaging process. The dependence of
100: the number of transmissions used by our algorithm on the number of
101: sensors $n$ is $n \exp(O(\log \log n)^2) = n^{1 + o(1)}$. For the
102: previous algorithm, this dependence was $\tilde{O}(n^{1.5})$.
103: %This reduces the energy consumption by a factor of
104: %n^{1/2 - o(1)} over the most efficient algorithm previously known.
The exponent $1 + o(1)$ of our algorithm is asymptotically optimal. Our
106: algorithm involves a hierarchical structure of $\log \log n$ depth
107: and is not completely decentralized. However, the extent of control
108: exercised by a sensor on another is restricted to switching the
109: other on or off.
110: \end{abstract}
111: 
112: \section{Introduction}
113: 
114: Geometric Random Graphs have become an accepted model for wireless
115: ad hoc and sensor networks. Due to applications in distributed
116: sensing, a significant amount of effort has been directed towards
117: developing energy efficient algorithms for information exchange on
118: these graphs. The problem of distributed averaging  has been studied
119: intensively because it appears in several applications such as
120: estimation on ad hoc networks, and encapsulates many of the
121: difficulties faced in asynchronous distributed computation. Let
122: $v_1, \dots, v_n$ be $n$ points independently chosen uniformly at
123: random from a unit square in $\R^2$. A Geometric Random Graph $G(n,
124: r)$ is obtained from these points by connecting any two points
125: within Euclidean distance $r$. A Gossip Algorithm is an averaging
126: algorithm that, after a certain number of information exchanges and
127: updates, leaves each node with a value close to the average of all
128: the originally held values.
129: %to be connected, $r(n)$ must scale as $\Theta(\sqrt{\frac{\log n}{n}})$.
130: % Geometric random graphs
131: %have been studied extensively (\cite{}) and have been used to model
132: %distributed wireless networks such as sensor networks.
133: 
134: \subsection{Related Work}
135: There is an extensive body of work surrounding the subject of gossip
136: algorithms in various contexts. Here, we only survey the results
137: relevant in a narrow sense to the question under consideration.
138: 
139: Gupta and Kumar \cite{kumar} gave conditions under which $G(n, r)$
140: is connected with high probability (w.h.p.). It is sufficient that
141: $r$ scales as $\Omega(\sqrt{\frac{\log n}{n}})$ in order that $G(n,
142: r)$ be connected with probability greater than $1 - n^{-\Theta(1)}$.
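
As a quick sanity check on this scaling (a back-of-the-envelope computation, not part of the result of \cite{kumar}; the constant $c > 0$ is introduced here only for illustration), take $r = \sqrt{\frac{c \log n}{n}}$ and consider a node $v$ at distance at least $r$ from the boundary of the square. Each of the other $n-1$ points falls within distance $r$ of $v$ independently with probability $\pi r^2$, so
$$\E[\deg(v)] = (n-1)\pi r^2 = \frac{(n-1)\, c\, \pi \log n}{n} \approx c \pi \log n.$$
Thus, at this scaling, a typical node has $\Theta(\log n)$ neighbors in expectation.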
143: 
144: A distributed Gossip Algorithm for arbitrary graphs was presented by
145: Boyd et al \cite{Boyd}.  In this algorithm, when the clock of a
146: sensor $s$ ticks, $s$ sends its value $x_s$ to a sensor $v$ chosen
147: uniformly at random from its neighbors, and receives the value $x_v$
148: of $v$. Thereafter $s$ and $v$ set their values to $\frac{x_s +
149: x_v}{2}$. The dependence of the number of transmissions required by
150: this algorithm on $n$ is $\tilde{O}(n^2)$. The performance was
151: related to the mixing time of the natural random walk on that graph.
152: In fact they showed that if the connectivity graph is $G$, the
153: number of transmissions made in the course of the algorithm is
154: $\Theta(n T_{mix}(G))$, where $T_{mix}(G)$ is the mixing time of
155: $G$.
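
To see why such pairwise averaging makes progress, the following short computation may help; it is a standard observation rather than a step taken from \cite{Boyd}. Replacing $x_s$ and $x_v$ by their mean leaves the sum $x_s + x_v$ unchanged, while the sum of squares decreases by
$$x_s^2 + x_v^2 - 2\left(\frac{x_s + x_v}{2}\right)^2 = \frac{(x_s - x_v)^2}{2}.$$
For instance, with $x_s = 4$ and $x_v = 0$, both values become $2$, and the sum of squares drops from $16$ to $8$, a decrease of $\frac{(4-0)^2}{2} = 8$.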
156: 
157: In the standard framework for modeling sensor networks, $n$ sensors
158: are placed at random on a unit square $\square$ and have a radius of
159: connectivity $r = \Theta(\sqrt{\frac{\log n}{n}})$. One does not
160: assume that a sensor possesses any information about its own
161: location. In this model, the number of transmissions that the best
162: known algorithm uses is $\tilde{O}(n^2)$ as described
163: above.\footnote{In using $\tilde{O}$, we ignore polylogarithmic
164: factors and depending on context, the dependence on parameters other
165: than $n$.}
166: 
A more powerful model was proposed by Dimakis et al.~\cite{wainwright},
wherein each sensor is aware of its own
location with reference to $\square$, but possesses no further
170: information. It is mentioned in \cite{wainwright} that this is
171: reasonable in typical scenarios. With this model, by exploiting
172: geographic information, they were able to provide an algorithm that
173: requires $\tilde{O}(n^{1.5})$ transmissions. In their algorithm,
174: each node exchanges its value with the node nearest to a position
175: chosen randomly on $\square$, and both nodes replace their values by
176: the average as in the algorithm of Boyd et al \cite{Boyd}. Rejection
177: sampling is used to make the distribution roughly uniform on nodes.
178: The routing takes $\tilde{O}(\sqrt{n})$ hops w.h.p, but since the
179: mixing time on the complete graph is $O(1)$, one obtains an
180: algorithm using $\tilde{O}(n^{1.5})$ transmissions, which is an
181: improvement over \cite{Boyd} by a factor of $\tilde{O}(\sqrt{n})$.
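
The count above can be unpacked as follows; this is a rough sketch that suppresses constants and the dependence on $\epsilon$. Two points of $\square$ are at distance at most $\sqrt{2}$, and each hop covers distance at most $r = \Theta(\sqrt{\frac{\log n}{n}})$, so a route between far-apart nodes needs at least of order $\frac{1}{r} = \Theta(\sqrt{\frac{n}{\log n}})$ hops, which agrees with the $\tilde{O}(\sqrt{n})$ bound up to logarithmic factors. Multiplying the number of exchanges by the cost of each exchange gives
$$\underbrace{\tilde{O}(n)}_{\text{exchanges}} \times \underbrace{\tilde{O}(\sqrt{n})}_{\text{hops per exchange}} = \tilde{O}(n^{1.5}).$$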
182: 
183: A natural approach to obtaining more efficient algorithms would be
184: to engage in long-range information exchanges less frequently than
185: short-range ones. However, it appears that the benefit derived from
186: an improved mixing time with long-range transmissions more than
187: compensates for the additional cost in terms of hops for a
188: long-range routing. Due to this fact, simply altering the
189: probability distribution with which a node picks targets seems to be
190: counterproductive.
191: 
192: \subsection{Our Contribution}
193: An affine combination of two vectors $\mathbf{a}$ and $\mathbf{b}$
194: has the form $\alpha \mathbf{a} + (1-\alpha) \mathbf{b}$. Unlike the
195: case of convex combinations, $\alpha$ need not belong to $[0, 1]$.
196: We introduce counter-intuitive update rules which are {\it affine
197: combinations} rather than {\it convex combinations} (with
198: coefficients possibly as large as $\Omega(\sqrt{n})$) to achieve
199: faster averaging. The total number of transmissions used by the
200: proposed algorithm in order that the $\ell_2$-distance of the output
201: from the average diminish by a multiplicative factor of $\epsilon$
202: w.h.p, is $n\exp(O((\log\log n) \log \log \frac{n}{\epsilon}))$.
When $\log \frac{1}{\epsilon} = n^{\frac{o(1)}{\log \log n}}$, the number of
204: transmissions is $n^{1+o(1)}$.
205:  The
206: exponent $1 + o(1)$ is asymptotically optimal, since every node must
207: make at least one transmission for an averaging algorithm to work.
208: Like previous algorithms, ours makes packet exchanges with random
209: nodes. Due to
210:  the instability introduced
211: into the system by the use of non-convex combinations, for the
212: present analysis to hold, a certain amount of control needs to be
213: exercised and our algorithm is not truly decentralized. However, the
214: extent of control exerted by any sensor on another is restricted to
215: switching the other on or off.
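
To see why a bound of the form $n \exp(O((\log \log n)^2))$ is $n^{1+o(1)}$, one can rewrite the exponential factor as a power of $n$; here $c$ denotes the constant hidden in the $O(\cdot)$ and is introduced only for this illustration:
$$\exp\left(c (\log\log n)^2\right) = n^{\frac{c(\log \log n)^2}{\log n}}, \qquad \frac{(\log\log n)^2}{\log n} \ra 0 \text{ as } n \ra \infty.$$
So the extra factor grows faster than any fixed power of $\log n$ but slower than any fixed positive power of $n$.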
216: 
217: \section{Preliminaries}
218: The standard model for a sensor network is as follows.
219:  We assume
220: that each node or sensor has a clock that is a Poisson process with
221: rate $1$, and that these processes are independent. This model is
222: equivalent to having a single clock that is Poisson of rate $n$, and
223: assigning clock ticks to nodes uniformly at random. We assume that
224: the time units are adjusted so communication time between any two
225: adjacent nodes is insignificant in comparison with the length of an
226: average time slot $n^{-1}$. Our algorithm involves packet forwarding
227: when two non-adjacent nodes communicate. We shall assume that the
228: time taken to forward a packet is also insignificant in comparison
229: with $n^{-1}$, and that a single packet exists in the network in
each time slot w.h.p. We assume some limited computational power,
231: which amounts to memory of logarithmic size, and the ability to do
232: floating point computations.
233: 
234: %\section{Preliminaries}
235: For our purposes, a Geometric Random Graph is defined in the
236: following way.
237:  Let $v_1, \dots, v_n$ be $n$ points independently chosen uniformly at random
238: from a unit square in $\R^2$. A Geometric Random Graph $G(n, r)$ is
239: obtained from these points by connecting any two points within
240: Euclidean distance $r$.
241: %In our context, $r(n) =
242: %\Theta(\sqrt{\frac{\log n}{n}})$.
243: 
244: %We shall use the standard model for timing in a sensor network which
245: %is as follows.
246: % We assume
247: %that each node or sensor has a clock that is a Poisson process with
248: %rate $1$, and that these processes are independent. This model is
249: %equivalent to having a single global clock that is Poisson of rate
250: %$n$, and assigning clock ticks to nodes uniformly at random. We
251: %assume that the time units are adjusted so communication time
252: %between any two adjacent nodes is insignificant in comparison with
253: %the length of an average time slot $n^{-1}$. Our algorithm involves
254: %packet forwarding when two non-adjacent nodes communicate. We shall
255: %assume that the time taken to forward a packet is also insignificant
256: %in comparison with $n^{-1}$, and that a single packet exists in the
257: %network in each time slot.
258: \subsection{Problem Statement}
259: Let node $v_i$ for  $i = 1, \dots, n$ hold a value $x_i(t)$ at the
260: $t^{th}$ global clock tick, the initial values being $x_i(0)$.
261: Without loss of generality, we assume $\overline{\x(0)} = 0$. Given
262: $\epsilon, \delta > 0$,  the task is to design an algorithm
263: %using as
264: %few transmissions $Trans(n, \epsilon, \delta)$ as possible so that
265: %after this many transmissions,
266: such that $\|\x(t)\| < \epsilon \|\x(0)\|$  for all possible choices
267: of $\x(0)$  with probability $> 1 - \delta$. The cost of the
268: algorithm is the expected number of transmissions made until $t$.
269: %We are interested in designing a distributed
270: %algorithm of modifying these values so that for each $i$,  $\lim_{t
271: %\ra \infty} x_i(t) = x_{ave} := \frac{1}{n}\sum_i x_i(t)$, and the
272: %time taken for approximate convergence is small. The
273: %$\epsilon$-averaging time $T(n, \epsilon)$ is defined as follows:
274: 
275: %\begin{defn}
276: %Given $\epsilon, \delta > 0$, the $\epsilon, \delta$-averaging time
277: %is the earliest time $t$ (the number of ``global" clock-ticks) at
278: %which the vector $x(t)$ is $\epsilon$ close to the normalized true
279: %average with probability $> \delta$,
280: %$$T_{ave}(n, \epsilon, \delta) := \sup_{x(0)}
281: %\inf \left(t:\p\left(\frac{\|x(k) - x_{ave}\vec{1}\|}{\|x(0)\|} \geq
282: %\epsilon \right) \leq \delta\right).$$
283: %\end{defn}
284: In the rest of the paper, we shall make the standard assumption that
285: the radius of connectivity $r(n) = \Theta(\sqrt{\frac{\log n}{n}})$
(e.g., \cite{wainwright}). Under this assumption, the probability of
287: the graph $G(n, r)$ being disconnected is $\Omega(n^{-O(1)})$, for
288: an appropriate constant $a$.  As a consequence, it is not possible
289: to drive $\delta$ below $n^{-O(1)}$. For this reason, in the
290: analysis, we shall assume that $\delta = n^{-O(1)}$. On the other
291: hand $\epsilon$ can be made arbitrarily small by running the
292: averaging algorithm for a sufficiently long interval of time. In
293: this paper, we shall assume that $\log \frac{1}{\epsilon} =
294: n^{\frac{o(1)}{\log \log n}}$. This does not allow $\epsilon$ to be
295: exponentially small but permits it to be the reciprocal of a
296: quasipolynomial. A sufficiently large constant $a$ will appear in
the parameters of our algorithm described later. When we use the term
298: {\it high probability}, we shall mean with probability $1 -
299: n^{-\Theta(1)}$.
300: %For a
301: %discussion of the probability of $G(n, r)$ being disconnected, see
302: %\cite{kumar}.
303: %If the connectivity radius $r(n)$ is $= \sqrt{\frac{a \ln n}{\pi
304: %n}}$, the probability that the graph $G(n, r)$ is disconnected is
305: %$\Omega {n^{-a}}$, as can be seen by simply computing the
306: %probability that node $1$ is disconnected from all other nodes.
307: %Therefore under the condition that $r(n) = \Theta{\sqrt{\frac{\log
308: %n}
309: %\subsection{Gossip Algorithms}
310: %Gossi
311: %\begin{definition}
312: %Let $\A(v_1, \dots, v_n)$ be a Gossip algorithm on nodes $v_1,
313: %\dots, v_n$. Let $T[n, \epsilon]$ be the
314: %\section{Results}
315: %As in \cite{wainwright} we shall seek to minimize the total number
316: %of transmissions. If the $t^{th}$ clock tick belongs to node $j$,
317: %$j$ will communicate with some node in the network, not necessarily
318: %within the radius of connectivity. Each such communication may take
319: %$R(t)$ radio transmissions, and our communication cost shall be
320: %$$C(n, \epsilon, \delta) := \sum_{t=1}^{T_{ave}(n, \epsilon, \delta)} R(t).$$
321: 
322: \section{Overview of Algorithm}
323: %The square $\square$ is partitioned into $n_1$  subsquares
324: %$\square_i$, where $n_1$ is the nearest integer to $\sqrt{n}$ that
325: %is the square of an even number. For a square $\square_{i_1\dots
326: %i_r}$, let $\E_\#\square_{i_1\dots i_r}$ denote the expected number
327: %of sensors within $\square_{i_1\dots i_r}$. Then, while
328: %$\E_\#\square_{i_1\dots i_r} > \log (n)^8$,
329: 
330: %the square $\square_{i_1\dots i_r}$ is partitioned into $n_{r+1}$
331: %subsquares $\square_{i_1\dots i_{r+1}}$, where $n_{r+1}$ is the
332: %nearest integer to $\sqrt{\E_\#\square_{i_1\dots i_r}}$ that is the
333: %square of an even number. Let $$\ell := 1 +
334: %\sup\limits_{\square_{i_1\dots i_r}} r,$$ \ie the number of levels
335: %in this recursion. Given a square $\square_{<i>}$, let
336: %$s(\square_{<i>})$ denote the sensor nearest to its center. By our
337: %construction, these centers are well separated, and any sensor has
338: %this property with respect to at most one square. We shall denote
339: %this by $\square(s)$. %Note that the total number of squares in our
340: Let $\square$ be the unit square in which the $n$ sensors are
341: randomly placed. Let the initial values carried by sensors be
342: $x_i(0)$, for $i = 1$ to $n$.  We consider a partition of $\square$
343: into $\sim n^{1/2}$ smaller squares $\square_i$. Let $\square_i$
344: contain $\#(\square_i)$ sensors. Let $time(n)$ represent the
345: expected number of transmissions until $\|\x(t)\| \leq \epsilon
346: \|\x(0)\|$  w.h.p., where $\epsilon$ is some function of $n$ that we
347: shall not investigate at the moment. Suppose that we had a ``nearly
348: perfect" averaging protocol $\A$ on the smaller squares $\square_i$,
349: \ie when $\A$ is run on each square, after $t = time_{\A}(\sqrt n)$
350: transmissions, within $\square_i$ the values are for practical
purposes equal to the average of the original values. That is,
352: $$(\forall i) (\forall s \in \square_i) x_s(t) \backsimeq \frac{\sum\limits_{s \in
353: \square_i} x_s(0)}{\#(\square_i)}.$$
354: \begin{defn}
355: For each square $\square_i$, let $s(\square_i)$ be the sensor
356: closest to the center of $\square_i$.
357: \end{defn}
358: This can be determined by each square, using a constant number of
359: transmissions w.h.p.
360: 
361: The $s(\square_i)$ exchange values among themselves by Greedy
362: Geographic Routing (see \cite{wainwright}).
363: 
364: Consider the following protocol. Suppose that $\A$ has been run on
365: each subsquare of the form $\square_i$ independently, and the values
366: carried by the nodes within $\square_i$ are all equal. When
367: $s(\square_i)$ becomes active,
368:  the following round takes place.
369: \begin{enumerate}
370: \item $s_i :=s(\square_i)$ picks a square $\square_j$ uniformly at
371: random. $s_i$ geographically routes a packet with its value to $s_j
372: := s(\square_j)$.
373: 
374: %\item Node $s$ sends the packet, which is routed to the node nearest
375: %to $t$ by greedy geographical routing (see \cite{wainwright} for
376: %details). Let $v := s(\square_j)$ be the node closest to $t$.
377: \item $s_j$ routes its own value to $s_i$ by greedy
378: geographic routing.
379: \item $x_{s_i} \la x_{s_i} + \frac{2\sqrt{n}}{5}(x_{s_j} - x_{s_i})$.
380: \item  $x_{s_j} \la x_{s_j} + \frac{2\sqrt{n}}{5}(x_{s_i} - x_{s_j})$.
381: \item $\A$ is independently run on $\square_i$ (the process being activated by $s_i$ by switching certain nodes on)
382: and on $\square_j$ (initiated by $s_j$ similarly).
383: \item $\A$ is ended on square $\square_i$ by $s_i$ (by turning certain nodes off), and
384: $\A$ is ended on $\square_j$ by $s_j$ (by switching certain nodes
385: off.)
386: \end{enumerate}
387: 
388: Now, let $z_i(t) := \sum\limits_{s \in \square_i} x_s(t)$. Without
389: loss of generality, we assume that $\sum_i{x_i} = 0$, since this
390: only adds a constant offset and does not affect the rate of
391: convergence. An application of the Chernoff Bound tells us that
392: $(\forall i)\left| \frac{\#(\square_i)}{\sqrt n}-1\right| <
\frac{1}{10}$ w.h.p. If we examine the evolution of $\z$, we see
394: that after a round of the kind described above
395: 
396: \begin{itemize}
397: \item $z_i(t) = (1-\alpha_i)z_i(t-1)
398: + \alpha_jz_j(t-1)$
399: \item $z_j(t) = (1-\alpha_j) z_j(t-1) + \alpha_i z_i(t-1)$
400: \end{itemize}
where, for all $i$, $\alpha_i \in (\frac{1}{3}, \frac{1}{2})$. From
402: Lemma~\ref{l:1}, it follows that
403: 
$\E[\|\z(t)\|^2] < (1-\frac{1}{2\sqrt{n}})^t \|\z(0)\|^2$. Roughly
speaking, after $O(\sqrt{n} \log(\frac{n}{\epsilon}))$ of these
406: steps, we have a distribution $\x(t')$ such that $\|\x(t')\| <
407: \epsilon \|\x(0)\|$.
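
The update for $\z$ can be derived in a few lines; the sketch below assumes, as in the idealized discussion above, that $\A$ equalizes the values within a square exactly and preserves their sum. Before the round, every sensor in $\square_i$ holds $\frac{z_i(t-1)}{\#(\square_i)}$. Only $x_{s_i}$ changes in $\square_i$ before $\A$ is rerun, so
\beqs z_i(t) & = & z_i(t-1) + \frac{2\sqrt{n}}{5}\left(\frac{z_j(t-1)}{\#(\square_j)} - \frac{z_i(t-1)}{\#(\square_i)}\right)\\
& = & (1 - \alpha_i) z_i(t-1) + \alpha_j z_j(t-1), \qquad \alpha_i := \frac{2\sqrt{n}}{5\,\#(\square_i)}. \eeqs
Since $\frac{9}{10} < \frac{\#(\square_i)}{\sqrt n} < \frac{11}{10}$ w.h.p., each $\alpha_i$ lies in $(\frac{4}{11}, \frac{4}{9})$, which is contained in the interval between $\frac{1}{3}$ and $\frac{1}{2}$. Adding the two updates also shows $z_i(t) + z_j(t) = z_i(t-1) + z_j(t-1)$, so the total $\sum_i z_i$ is preserved.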
408: 
409: Each geographical routing mentioned above takes $O(\sqrt n)$
410: transmissions w.h.p (see \cite{wainwright}). Also, each process of
411: initiating or ending $\A$ on a square $\square_i$ takes
412: $O(\sqrt{n})$ transmissions.
413: 
So the total number of transmissions with $n$ nodes, $time(n)$,
satisfies a recurrence of the form $$time(n) \backsimeq
O\left(\sqrt n \log(\frac{n}{\epsilon}) ( time_\A(\sqrt n) + O(\sqrt
n))\right).$$ Ignoring the dependence on $\epsilon$, this allows
us to recursively define the algorithm $\A$ on $\square$, for which
$time_\A(n) = n \exp(O(\log\log n)^2).$
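
A heuristic unrolling of this recurrence indicates where the bound comes from. The sketch assumes that the $O(\sqrt n)$ term is dominated by $time_\A(\sqrt n)$, that each level contributes a factor of at most $C L$ with $L := \log \frac{n}{\epsilon}$ and $C$ a constant (both symbols are introduced here for illustration), and that the recursion stops after $k$ levels, when squares contain polylogarithmically many sensors. Then
\beqs time_\A(n) & \leq & C L\, n^{1/2}\, time_\A(n^{1/2}) \;\leq\; (C L)^2\, n^{1/2 + 1/4}\, time_\A(n^{1/4}) \;\leq\; \cdots\\
& \leq & (C L)^k\, n^{1 - 2^{-k}}\, time_\A(n^{2^{-k}}). \eeqs
Since $n^{2^{-k}}$ is polylogarithmic in $n$ once $k = O(\log \log n)$, and $L$ is polylogarithmic in $n$ when $\epsilon$ is the reciprocal of a polynomial, one gets $(CL)^k = \exp(O((\log \log n)^2))$ and hence $time_\A(n) \leq n \exp(O((\log\log n)^2))$.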
420: 
421: 
422: 
423: %We shall describe our algorithm recursively. In order to do this, we
424: %shall first have to define the problem in a way that facilitates the
425: %recursion.
426: 
427: %Let $k$ be a \texttt{Binomial}(n, p) random variable, where $E[k] =
428: %pn = \Omega((\log n)^4)$
429: %for some constant $c \geq 3$.
430: %Let $k$ sensors be placed uniformly at randomly in a square $S$ of
431: %area $p$. Let sensors within a distance $\Theta(\sqrt{\frac{\log
432: %n}{n}})$ be connected. We shall describe the algorithm $A(s, S, p,
433: %n)$ that sensor $s$ implements.
434: 
435: %\footnote{This allows us to get the probability of "failure"
436: %$\epsilon$ down to $\frac{1}{n^a}$, for a constant $a$. For a
437: %discussion of the related probability of $G(n, r)$ being
438: %disconnected, see \cite{kumar}.}.
439: %Let $C >> 1$ be a large constant.  $A(s, S, p, n, \epsilon, \delta)$
440: %is described recursively as follows:
441: %\begin{enumerate}
442: %\item  If $pn < C(\log n)^8$:
443: 
444: %Consider a partition $\{S_i\}_{1 \leq i \leq u}$ of square $S$
445: %into $u = \lceil k^{1/4} \rceil^2$ smaller squares $S_i$ of area
446: %$\frac{p}{u}$ each. It can be seen from Chernoff bounds that with
447: %high probability $\frac{2\sqrt{pn}}{3} < u <
448: %\frac{4\sqrt{pn}}{3}$.
449: 
450: %Let $s_i$ be the sensor in square $S_i$, that is nearest to the
451: %center $C_i$ of $S_i$.
452: %From Lemma~\ref{centers}, a sensor $s \in
453: %S_i$ can determine whether or not it is the nearest to $C_i$ with
454: %high probability using a $\mathrm{polylog}(n)$ number of radio
455: %transmissions. By Lemma~\ref{center_dist}, with high probability,
456: %each $s_i$ is within a Euclidean distance of
457: %$\Theta(\sqrt{\frac{\log n}{n}})$ of $C_i$.
458: %Maybe the above discussion should not be here.
459: %Suppose $s \in S_i$.
460: %%I don't have to make complex error computations here. The recursively defined expression is short.
461: %With probability $1 - \frac{(\log
462: %n)^{-4}}{T_{ave}(\lceil2\frac{pn}{\ell}\rceil,
463: %\frac{\epsilon^3}{n^3}, \frac{\delta}{\ell^2})}$,%
464: 
465: %Note that with high probability, there is a sensor within a
466: %Euclidean distance $\Theta(\sqrt{\frac{\log n}{n}})$ of each $C_i$.
467: \section{Description of the Algorithm}
468: \subsection{Notation}\label{s:1}
469: The square $\square$ is partitioned into $n_1$  subsquares
470: $\square_i$, where $n_1$ is the nearest integer to $\sqrt{n}$ that
471: is the square of an even number. For a square $\square_{i_1\dots
472: i_r}$, let $\E_\#\square_{i_1\dots i_r}$ denote the expected number
473: of sensors within $\square_{i_1\dots i_r}$. Then, while
474: $\E_\#\square_{i_1\dots i_r} > (\log n)^8$,
475: 
476: the square $\square_{i_1\dots i_r}$ is partitioned into $n_{r+1}$
477: subsquares $\square_{i_1\dots i_{r+1}}$, where $n_{r+1}$ is the
478: nearest integer to $\sqrt{\E_\#\square_{i_1\dots i_r}}$ that is the
479: square of an even number. Let $$\ell := 1 +
480: \sup\limits_{\square_{i_1\dots i_r}} r,$$ \ie the number of levels
481: in this recursion. Given a square $\square_{<i>}$, let
482: $s(\square_{<i>})$ denote the sensor nearest to its center. By our
483: construction, these centers are well separated, and any sensor has
this property with respect to at most one square w.h.p. We shall
485: denote
486: this by $\square(s)$. %Note that the total number of squares in our
487: %construction is $o(n)$.
488: We assign a Level to each node by the
following rule: If $s = s(\square_{i_1\dots i_r})$, $s$ has level
$\ell - r$. These nodes have Levels $1, \dots, \ell$. There is a
490: $\ell - r$. These nodes are have Levels $1, \dots, \ell$. There is a
491: single root node at Level $\ell$, namely $s(\square)$. The nodes at
492: Level $0$ are the nodes not of the form $s(\square_{i_1\dots i_r})$.
493: In the informal discussion earlier, we did not concern ourselves
494: with the error in the averaging carried out on subsquares
495: $\square_i$. However, these errors propagate up the hierarchy
496: rapidly, and hence it is necessary to obtain results with greater
497: accuracy in smaller squares. Thus we define the desired accuracy
498: recursively. Let $\epsilon_r$ be the accuracy for the averaging
499: process in a square $\square_{i_1 \dots i_{r-1}}.$ Lemma~\ref{l:2}
tells us that it is sufficient to take $\epsilon_r$ to be
501: $\frac{\epsilon_{r-1}}{\text{poly}(n)}$ for a polynomial of
502: sufficiently large degree.
503: 
504:  Let $\epsilon_0 =
505: \epsilon$, $\delta_0 = \delta$. We recursively define
506: $\epsilon_{r+1} := \frac{\epsilon_r}{25 n^{\frac{7}{2} + a}}$ and
507: $\delta_{r+1} = \frac{\delta_r}{n_r^{2a}}$.
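
Unrolling these definitions gives closed forms that are convenient later; this is a direct computation from the recursion just stated:
$$\epsilon_r = \frac{\epsilon}{\left(25\, n^{\frac{7}{2} + a}\right)^r}, \qquad \log \frac{1}{\epsilon_r} = \log \frac{1}{\epsilon} + r\left(\log 25 + \left(\tfrac{7}{2} + a\right)\log n\right).$$
In particular, for $r \leq \ell$ the accuracy requirement costs only an additive $O(\ell \log n)$ in $\log \frac{1}{\epsilon_r}$.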
508: 
We define $time(n, \ell-1, \epsilon_{\ell-1}, \delta_{\ell-1})$ to be $\left((\log
510: \frac{n}{\epsilon_{\ell-1}})
511: \log(\delta_{\ell-1}^{-1})\right)^{16}$. Thereafter, we define
512: $time(n, r-1, \epsilon_{r-1}, \delta_{r-1}) := time(n, r,
513: \epsilon_r, \delta_r) n^a \left(\log (\frac{n_r}{\epsilon_r})\log
514: (\delta_r^{-1})\right)^{16}.$
515: 
516: Let  $s \in \square_{i_1\dots i_{\ell-1}}$.
517: 
518: \subsection{The Protocol}
Every node $s$ has two states, a $local.state$ and a $global.state$,
each of which takes the value $on$ or $off$.
Each node $s$ also possesses a private counter $counter(s)$. During
initialization, the $global.state$ of $s(\square)$ is set to $on$
and every other $global.state$ is set to $off$. The $local.state$ of {\it
all} nodes is set to $off$ at this juncture.
525: 
526: Let us suppose that the clock of $s$ ticks. We describe the protocol
527: followed by it below. We consider two cases. If $s$ is at Level $0$,
528: it obeys the following protocol: \{
529: \begin{enumerate}
530: \item If $local.state(s) = on$\\ $Near(s)$;
531: \end{enumerate}
532: \}
533: 
534: $Near(s)$\{
535: \begin{enumerate}
536: \item $s$ picks an adjacent node $v$ contained in $\square_{i_1\dots i_{\ell-1}}$
537: uniformly at random.
538: \item $s$ sets $x_s(t+1) = \frac{x_s(t) + x_v(t)}{2};$\\
539:       $v$ sets $x_v(t+1) = \frac{x_s(t) + x_v(t)}{2};$
540: \end{enumerate}
541: \}
542: 
543: We next describe the protocol if $s$ is at a Level greater than $0$.
544: The subroutine $Near$ is the same as above.
545: Let $\square(s)=: \square_{i_1 \dots i_r}.$\\
546: \{
547: \begin{enumerate}
548:  \item If $global.state(s) = on$ %\{\\
549: 
550:  \begin{enumerate}
551:  \item  If $counter(s) = 0$ $Activate.square(s);$
552:  \item With probability $n^{-a} time(n, r, \epsilon_r,
553:  \delta_r)^{-1}$
554:  \begin{itemize}
555:  \item $Far(s)$;
556:  \item $counter(s) \la 0$;
557:  \end{itemize}
558: 
559: \end{enumerate}
560: 
561: \item If  $local.state(s) = on$ \\$Near(s);$
562: 
\item If $counter(s) \geq time(n, r, \epsilon_r, \delta_r)$
564: $Deactivate.square(s);$\\
565: Else $counter(s) \la counter(s) + 1;$
566: 
567: \end{enumerate}
568: \}
569: 
570: ${Far(s)}$\{
571: \begin{enumerate}
572: \item $s$ picks a square $\square_{i_1'\dots i_r'} \not\ni s$ uniformly at
573: random. Let $s' := s(\square_{i_1'\dots i_r'})$ . Node $s$ routes
574: its value to $s'$ geographically.
575: %\item Node $s$ sends the packet, which is routed to the node nearest
576: %to $t$ by greedy geographical routing (see \cite{wainwright} for
577: %details). Let $v$ be the node closest to $t$.
578: 
579: %\} \\
580: 
\item Node $s'$ computes $x_{s'}(t+1) = x_{s'}(t) + \frac{2}{5}(\E_\#\square_{i_1\dots i_r} x_s(t) -
\E_\#\square_{i_1 \dots i_r}x_{s'}(t))$.
\item $s'$ sends a packet with its value $x_{s'}(t)$ back to $s$ by greedy
geographic routing.
\item Node $s$ computes $x_s(t+1) = x_s(t) + \frac{2}{5}(\E_\#\square_{i_1\dots
i_r}x_{s'}(t) - \E_\#\square_{i_1\dots i_r} x_s(t))$.
\item $counter(s') \la 0$.
588: \end{enumerate}\}
589: 
590: $Activate.square(s)$\{
591: \begin{enumerate}
592: \item If $s \in $ Level $1$, send packets to each node $s'$ in
593: $\square(s)$ setting $local.state(s') \la on$ by flooding.
594: \item If $s \in $ Level $i>1$, send packets to each Level $i-1$ node $s'$ in
595: $\square(s)$ by greedy geographic routing, setting $global.state(s')
596: \la on$.
597: \end{enumerate}
598: \}
599: 
600: $Deactivate.square(s)$\{
601: \begin{enumerate}
602: \item If $s \in $ Level $1$, send packets to each node $s'$ in
$\square(s)$ setting $local.state(s') \la off$ by flooding.
604: \item If $s \in $ Level $i>1$, send packets to each Level $i-1$ node $s'$ in
605: $\square(s)$ by greedy geographic routing, setting $global.state(s')
606: \la off$.
607: \end{enumerate}
608: \}
609: \section{Analyzing the number of Transmissions}
610: Let $H(n, r, \epsilon_r, \delta_r)$ denote the number of
611: transmissions used in our protocol in one round of
612: $\square_{i_1\dots i_r}$, in order to diminish the variance (of the
613: values carried by sensors in $\square_{i_1\dots i_r}$) by a factor
614: $\epsilon_r$, with probability $1-\delta_r$.
615: 
616: %We shall need an observation, which is a consequence of
617: %Lemma~\ref{l:2} applied to a vector $\y$ defined analogously to $\z$
618: %in the discussion above.
619: 
620: %We shall explain this in greater detail in the final version of the
621: %paper.
622: 
623: \begin{observation}\label{l:red}
624:  In one round, \ie the duration
625: between $s$ activating $\square(s) := \square_{i_1\dots i_r}$ and
626: deactivating $\square(s)$, the number of long-range packet exchanges
627: between sensors of the kind $s(\square_{i_1\dots i_r i_{r+1}})$ is
628: $\Theta\left(\tilde{n} \log(\frac{\tilde{n}}{\epsilon_r})\right)$
629: w.h.p, where $$\tilde{n} = \frac{\E_{\#}[\square_{i_1\dots
630: i_r}]}{\E_{\#}[\square_{i_1 \dots i_r i_{r+1}}]}.$$
631: \end{observation}
  Each of these
involves $O(\sqrt{\E_{\#}[\square(s)]}) = O(\tilde{n})$ hops w.h.p (see
634: \cite{wainwright}). Therefore the total number of transmissions here
635: is $O\left(\tilde{n}^2 \log(\frac{\tilde{n}}{\epsilon_r})\right)$
636: w.h.p.
637: 
638: Each of these long-range packet exchanges is followed by a period of
639: averaging within the involved subsquares, and this takes $H(n, r+1,
\epsilon_{r+1}, \delta_{r+1}) = \Omega(\tilde{n})$ transmissions. Thus
641: we have the recurrence \beqs \label{e:1} H(n, r, \epsilon_r,
642: \delta_r) & = & O\left((H(n, r+1, \epsilon_{r+1}, \delta_{r+1}) +
643: \tilde{n})\tilde{n}\log(\frac{\tilde{n}}{\epsilon_r})\right)\\
644:  & = &  O\left(H(n, r+1, \epsilon_{r+1}, \delta_{r+1})\tilde{n}\log(\frac{\tilde{n}}{\epsilon_r})\right).
645:  \eeqs
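
To make the telescoping used below explicit, write $\tilde{n}_r$ for the ratio $\tilde{n}$ at depth $r$ and $C$ for the constant hidden in the $O(\cdot)$ of the recurrence; both symbols are notation introduced only for this sketch. Unrolling the recurrence from depth $0$ down to depth $\ell$ gives
\beqs
H(n, 0, \epsilon_0, \delta_0) & \leq & C^{\ell}\, H(n, \ell, \epsilon_\ell, \delta_\ell) \prod_{r=0}^{\ell-1} \tilde{n}_r \log\left(\frac{\tilde{n}_r}{\epsilon_r}\right).
\eeqs
Each $\tilde{n}_r$ is a ratio of expected sensor counts at consecutive depths, so the product $\prod_{r=0}^{\ell-1}\tilde{n}_r$ telescopes to the ratio of the expected count of the depth-$0$ square to that of a depth-$\ell$ square. Assuming the depth-$0$ square contains all $n$ sensors, this product is at most $n$.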
646: 
647:  %In subsection~\ref{epsdel}, we shall
648: % It can be shown that to obtain the desired
649: % result, it suffices to choose $\epsilon_{r+1} =
650: % \frac{\epsilon_{r}}{25 n^{3/2}}$ for $r \leq \ell - 1$, and
651: % $\delta_{r+1} = \frac{\delta_r}{\tilde{n}^2}$. $\epsilon_0 :=
652: % \epsilon$, and $\delta_0 := \delta$, which are input parameters.
653: As mentioned in subsection~\ref{s:1}, we let
654:  $\epsilon_0 =
655: \epsilon$, $\delta_0 = \delta$ and  recursively define
656: $\epsilon_{r+1} := \frac{\epsilon_r}{25 n^{7/2}}$ and $\delta_{r+1}
657: = \frac{\delta_r}{n_r^2}$.
658:  For these parameters,
659: $\delta_r = \Omega(\frac{1}{\text{poly}(n)})$, since $\delta_0 =
660: \Omega(\frac{1}{\text{poly}(n)})$ and the $\tilde{n}$ telescope.
$\epsilon_r = \epsilon_0\,\Omega(n^{-O(\log\log n)})$ since $\ell \sim
\log \log n$.
663:  Now, the smallest squares that we create have $O(\text{polylog}n)$
664:  sensors each w.h.p. Since the ordinary averaging that we do there
 (described by the procedure ``Near(s)'') has an averaging time that is
666:  quadratic \cite{Boyd, Boyd2},
667: $H(n, \ell, \epsilon_{\ell}, \delta_\ell) =
668: \Omega(\text{polylog}(\frac{n}{\epsilon_\ell}))$. And so using the
669: recurrence for $H$ and telescoping, we see that the total number of
670: transmissions is
 \beqs H(n, 0, \epsilon_0, \delta_0) &=& \left(H(n,
\ell, \epsilon_{\ell}, \delta_{\ell})\right)\prod_r
673: \left\{\frac{\E_{\#}[\square_{i_1\dots i_r}]}{\E_{\#}[\square_{i_1
674: \dots i_r i_{r+1}}]} \log \frac{n}{\epsilon_r}\right\}\\
675: & = & n (\log \frac{n}{\epsilon})^{O(\log \log n)}. \eeqs This is
676: $n^{1+o(1)}$ if $\epsilon = \exp(-n^{\frac{o(1)}{\log \log n}})$,
677: and $\delta = n^{-O(1)}$.
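
The last claim can be checked directly. The following is a sketch in which $c$ denotes the constant hidden in the exponent $O(\log \log n)$, $g(n) = o(1)$ is the non-negative function in the exponent of the assumed bound on $\epsilon$, and logarithms are natural; these names and conventions are introduced here for illustration. With $\epsilon = \exp(-n^{g(n)/\log\log n})$ we have $\log\frac{n}{\epsilon} = \log n + n^{g(n)/\log\log n}$, and so
\beqs
\left(\log \frac{n}{\epsilon}\right)^{c\log\log n} & \leq & \left(2\max\left\{\log n,\, n^{g(n)/\log\log n}\right\}\right)^{c\log\log n}\\
& \leq & (\log n)^{c} \max\left\{ e^{c(\log\log n)^2},\, n^{c\,g(n)}\right\}\\
& = & n^{o(1)},
\eeqs
since $(\log\log n)^2 = o(\log n)$ and $c\,g(n) = o(1)$.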
678: %\section{Lemma}
679: 
680: \section{Notes on Correctness}
681: In the algorithm proposed in this paper, each square $\square(s)$
682: has a certain latency, which is the averaging time restricted to
683: that square. In order for our algorithm to be correct, we require
684: that $\square(s)$ be undisturbed by the long-range exchanges that
685: $s$ is involved in, during this period. This is not a condition that
686: can be imposed without the long-range exchanges of $s$ losing their
i.i.d property, which is crucial in our analysis of convergence. To
retain this property and have an algorithm that is successful w.h.p,
we have set the rates at which long-range exchanges of $s$ occur to
be lower than the inverse of the latency by a factor $n^a$. As a
consequence, w.h.p, in the course of the entire algorithm, there are
no long-range transmissions made by any node $s$ while $\square(s)$
is active. The only issue that we have not dealt with in detail is
showing that our choice of errors $\epsilon_r$ achieves the
desired end. This follows from Lemma~\ref{l:2} interpreted as
696: follows: The nodes $i$  represent {\it subsquares} $\square_{i_1
697: \dots i_r i_{r+1}}$ of $\square_{i_1 \dots i_r}$ and the $y_j(t)$
698: for different $j$ represent  the {\it sum} of the values held by the
699: nodes in a subsquare $\square_{i_1 \dots i_r j}$ after $t$ long
700: distance transmissions between subsquares since the activation of
701: $\square_{i_1 \dots i_r}$. We set $\epsilon := {\epsilon_{r+1}}
702: \|\x(0)\|$. The perturbations $n(t)$ represent the errors generated
703: from imperfect averaging within these subsquares.
704: 
705: \section{Concluding Remarks}
We introduced {\it non-convex affine combinations} in our averaging
protocol in order to accelerate Geographic Gossip in geometric
random graphs. The number of transmissions used in the course of our
protocol is $n^{1+o(1)}$. This exponent is asymptotically optimal.
Our algorithm, unlike the previous one in \cite{wainwright}, is not
completely decentralized. However, as far as we can see, this is not
a necessary feature associated with the use of affine combinations.
713: 
714: % In this scenario, if the sensors in $\square(s)$ are not
715: %shut down after the time required for them to have performed their
716: %task, there is an excessive wastage of power.
717: 
718: % The
719: %reason we required to have a sensor $s$ activate and deactivate the
720: %region $\square(s)$ is that in order for the long-range information
721: %exchanges between $s$ and other nodes of its ``Level" to form an
722: %i.i.d process, one cannot have a clock that controls the immediate
723: %subsquares of $\square(s)$ to form an i.i.d process
724: 
725: \section{Future Directions}
726: It would be interesting to study whether affine combinations can be
727: used to develop a completely decentralized algorithm for Geographic
728: Gossip that is also energy efficient.
729: 
730: 
731: \begin{thebibliography}{50}
732: %\vspace*{0.5mm} \scriptsize
733: 
734: \bibitem{Boyd}
735: S.~Boyd, A.~Ghosh, B.~Prabhakar, and D.~Shah.
\newblock Gossip algorithms: Design, analysis and applications.
737: \newblock In {\em Proceedings of the 24th Conference of the IEEE Communications
738:   Society (INFOCOM 2005)}, 2005.
739: 
740: \bibitem{Boyd2}
741: S.~Boyd, A.~Ghosh, B.~Prabhakar, and D.~Shah.
742: \newblock Mixing Times for Random Walks on Geometric Random Graphs.
\newblock In {\em SIAM ANALCO}, 2005.
744: 
745: \bibitem{car}
S.~Carruthers and V.~King.
\newblock Connectivity of wireless sensor networks with constant density.
\newblock In {\em ADHOC-NOW}, pages 149--157, 2004.
750: 
751: \bibitem{kumar}
752: P.~Gupta and P.~Kumar.%\\
753: \newblock The capacity of wireless networks.%\\
754: \newblock {\em IEEE Transactions on Information Theory}, 46(2):388--404, March
755:   2000.
756: 
757: \bibitem{wainwright}
A.~Dimakis, A.~Sarwate, and M.~Wainwright.
759:  \newblock Geographic gossip: efficient aggregation for sensor
760:  networks.
761:  \newblock In {\em Proceedings of the fifth international conference on information processing in sensor networks (IPSN)}, 2006.
762: 
763: \bibitem{Karp}
764: R.~Karp, C.~Schindelhauer, S.~Shenker, and B.~V\"{o}cking.%\\
765: \newblock Randomized rumor spreading.%\\
766: \newblock In {\em Proc. IEEE Conference of Foundations of Computer Science,
767:   (FOCS)}, 2000.
768: 
769: \bibitem{k1}
D.~Kempe, J.~Kleinberg, and A.~Demers.
\newblock Spatial gossip and resource location protocols.
\newblock In {\em Proc. 33rd ACM Symposium on Theory of Computing}, 2001.
775: 
776: \bibitem{k2}
D.~Kempe and J.~Kleinberg.
\newblock Protocols and impossibility results for gossip-based communication mechanisms.
\newblock In {\em Proc. 43rd IEEE Symposium on Foundations of Computer Science}, 2002.
783: 
784: \bibitem{MoskAoyama}
785: D.~Mosk-Aoyama and D.~Shah.
786: \newblock Information dissemination via gossip: Applications to averaging and
787:   coding.
788: \newblock http://arxiv.org/cs.NI/0504029, April 2005.
789: 
790: \bibitem{MR95}
791: R.~Motwani and P.~Raghavan.%\\
792: \newblock {\em Randomized Algorithms}.%\\
793: \newblock Cambridge University Press, Cambridge, 1995.
794: 
795: \bibitem{Penrose}
796: M.~Penrose.%\\
797: \newblock {\em Random Geometric Graphs}.%\\
798: \newblock Oxford studies in probability. Oxford University Press, Oxford,
799: 2003.%\\
800: 
801: \bibitem{Xiao}
802: L.~Xiao, S.~Boyd, and S.~Lall.%\\
803: \newblock A scheme for asynchronous distributed sensor fusion based on average
804:   consensus.%\\
805: \newblock In {\em 2005 Fourth International Symposium on Information Processing
806:   in Sensor Networks (IPSN)}, 2005.
807: \end{thebibliography}
808: \appendix
809: \section{Appendix}
810:  Let $K_n$ be the complete graph on $n$ vertices $\{1, \dots,
811: n\}.$ $\forall i,$ let $\alpha_i \in (\frac{1}{3}, \frac{1}{2}).$ At
812: time $t \geq 0$, for $i = 1, \dots, n$, let node $i$ hold the value
813: $x_i(t)$. Consider the following update rule. If the $t^{th}$ clock
814: tick belongs to node $i$, then, $i$ chooses a node $j$ uniformly at
815: random, and the following update occurs:
816: 
817: \begin{itemize}\label{update}
818: \item $x_i(t) = (1-\alpha_i) x_i(t-1) + \alpha_j x_j(t-1) .$
819: \item $x_j(t) = (1-\alpha_j) x_j(t-1) + \alpha_i x_i(t-1).$
820: \end{itemize}
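
As a quick sanity check of this rule (the numerical values are chosen here purely for illustration), take $\alpha_i = 0.4$, $\alpha_j = 0.45$, $x_i(t-1) = 1$ and $x_j(t-1) = -1$. Then
\beqs
x_i(t) & = & (0.6)(1) + (0.45)(-1) \ = \ 0.15,\\
x_j(t) & = & (0.55)(-1) + (0.4)(1) \ = \ -0.15.
\eeqs
The sum $x_i + x_j$ is unchanged (it remains $0$), as it is for every update of this form because $x_i(t) + x_j(t) = x_i(t-1) + x_j(t-1)$, while $x_i^2 + x_j^2$ drops from $2$ to $0.045$.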
821: 
822: \begin{lemma}\label{l:1}
823: $\E[\x(t)^T \x(t)] < (1-\frac{1}{2n})^t \x(0)^T \x(0)$.
824: \end{lemma}
825: {\bf Proof:}%Please refer to the Appendix.\\
826: %{\bf Proof of Lemma~\ref{l:1}}\\
Let the update rule for $\x(t)$ be given by $A(t-1)$, \ie \, $\x(t)
= A(t-1)\x(t-1)$. Note that $A(t-1) = I - (\e_i - \e_j)(\alpha_i \e_i^T
- \alpha_j \e_j^T)$, if the $i^{th}$ vector of the standard
basis is denoted by $\e_i$.
831: 
832: \beqs\label{eq1}
833: \E[\x(t)^T \x(t)|\x(t-1)] & = & \E[\x(t-1)^T A(t-1)^T A(t-1) \x(t-1)|\x(t-1)]\\
834:                & = & \x(t-1)^T \E[A(t-1)^T A(t-1)] \x(t-1).
835:                \eeqs
Let $\alpha_i \e_i - \alpha_j \e_j = \alpha_{ij}$ and $\e_i
- \e_j = \e_{ij}$. Then, $\E[A(t-1)^T A(t-1)] = \E[(I -
\e_{ij}\alpha_{ij}^T)^T(I - \e_{ij}\alpha_{ij}^T)]$.
839: 
840: Let $E_{ij}$ denote the $n \times n$ matrix whose $ij^{th}$ entry is
841: $1$ and every other entry is $0$.
842: 
843: Then, by expanding, one finds that \beqs \E[A(t-1)^T A(t-1)] & = & I
844: + \sum_i \frac{(1-2\alpha_i)^2 -1}{n} E_{ii} + \sum_{i \neq j}
845: \frac{(1 - (1-2\alpha_i)(1-2\alpha_j)) E_{ij}}{n(n-1)}\\
846: & = & I (1 - \frac{1}{n-1}) + \frac{\mathbf{1}\mathbf{1}^T}{n(n-1)}
- \frac{(\mathbf{1}-2\alpha)(\mathbf{1}-2\alpha)^T}{n(n-1)}
848: + \sum_i\frac{(1-2\alpha_i)^2E_{ii}}{n-1}. \eeqs An application of
849: the formula for $\E[\x(t)^T \x(t)|\x(t-1)]$, now gives us the
850: following:
851: 
852: \beq \label{expr} \E[x(t)^T x(t) | x(t-1)] & = & \E[x(t-1)^TA(t-1)^TA(t-1)x(t-1)|x(t-1)]\\
853:                               & = & x(t-1)^T\E[A(t-1)^TA(t-1)]x(t-1)
854:                             \eeq
855: We know that $\forall i,  1-2\alpha_i \in (0, \frac{1}{3})$.
856: 
857: %and so $\|1-2\alpha\| \leq \frac{\sqrt{n}}{3}$. An application of
858: %the Cauchy-Schwarz inequality gives us  $|\x(t-1)^T(\mathbf{1} -
859: %2\alpha)| \leq \|x(t-1)\|\|1-2\alpha\|$, implying the bound
860: %$|\x(t-1)^T(\mathbf{1} - 2\alpha)| \leq
861: %\frac{\sqrt{n}}{3}\|x(t-1)\|.$ Also, $x(t-1)^T \one = 0$. Applying
862: %these to the expression for $\E[A(t-1)^T A(t-1)]$ derived earlier,
Let us upper bound $x(t-1)^T\E[A(t-1)^TA(t-1)]x(t-1)$ using the
expression for $\E[A(t-1)^T A(t-1)]$ derived earlier, one term at a time.
The second relation below uses $x(t-1)^T \one = 0$.
$$x(t-1)^T I (1 - \frac{1}{n-1}) x(t-1) = (1 -
\frac{1}{n-1})\|x(t-1)\|^2,$$
$$\frac{x(t-1)^T \mathbf{1} \one^T x(t-1)}{n(n-1)} = 0,$$
868: $$- \frac{x(t-1)^T (\mathbf{1} - 2\alpha)(\one^T - 2\alpha^T)
869: x(t-1)}{n(n-1)} \leq 0 $$ and,
870: $$x(t-1)^T \left(\sum_i \frac{(1-2\alpha_i)^2
871: E_{ii}}{n-1}\right)x(t-1)  \leq \frac{\|x(t-1)\|^2}{9(n-1)}.$$
872: 
873: Adding up the above inequalities, $$\E[x(t)^Tx(t)|x(t-1)] \leq
874: \left(1 - \frac{8}{9(n-1)}\right) x(t-1)^T x(t-1).$$ As a
875: consequence,
876: $$\E[\|x(t)\|^2 \, | \, x(t-1)] < \left(1 - \frac{1}{2n}\right) \|x(t-1)\|^2.$$
877: Successively conditioning on $x(t-2), \dots, x(0)$, we see that
878: $$\E[\|x(t)\|^2] < \left(1 - \frac{1}{2n}\right)^t \|x(0)\|^2.$$
879: This proves the lemma.{\hfill $\Box$}
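
For completeness, the passage from the factor $1 - \frac{8}{9(n-1)}$ to $1 - \frac{1}{2n}$ in the proof above uses only the following elementary comparison, valid for $n \geq 2$:
$$\frac{8}{9(n-1)} > \frac{8}{9n} > \frac{1}{2n}, \qquad \text{so that} \qquad 1 - \frac{8}{9(n-1)} < 1 - \frac{1}{2n}.$$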
880: 
881: 
882: An application of Markov's inequality gives us the following
883: corollary.
884: \begin{corollary}\label{c:1}
885: $$\p\left(\|x(t)\| > \epsilon \|x(0)\|\right) \leq \epsilon^{-2}\left(1 -
886: \frac{1}{2n}\right)^t.$$
887: \end{corollary}
888: {\bf Proof:}%Please refer to the Appendix.\\%{\bf Proof of Corollary~\ref{c:1}}\\
889: \beqs \p\left(\|x(t)\| > \epsilon \|x(0)\|\right) &=& \p\left(\frac{\|x(t)\|^2}{\|x(0)\|^2} > \epsilon^2 \right)\\
890:                                                 &\leq& \epsilon^{-2}\E\left(\frac{\|x(t)\|^2}{\|x(0)\|^2}\right) {\hfill (\text{Markov's inequality})} \\
891:                                                 &\leq& \epsilon^{-2}\left(1 - \frac{1}{2n}\right)^t
892: \eeqs    {\hfill $\Box$}
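
For orientation, Corollary~\ref{c:1} can be turned into an explicit step count. This is a sketch, with $\delta \in (0,1)$ denoting a target failure probability introduced here for illustration. Since $1 - u \leq e^{-u}$ for all real $u$,
\beqs
\epsilon^{-2}\left(1-\frac{1}{2n}\right)^t & \leq & \epsilon^{-2} e^{-t/(2n)} \ \leq \ \delta \qquad \text{whenever} \qquad t \ \geq \ 2n \log\frac{1}{\epsilon^2 \delta}.
\eeqs
In particular, for $\epsilon \leq 1$ and $\delta = n^{-O(1)}$, $t = O\left(n \log\frac{n}{\epsilon}\right)$ updates suffice.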
893: 
894: 
901: 
902: 
903: We now consider a modified update rule, and prove a lemma similar to
904: Lemma~\ref{l:1}.
905: 
906: 
907: 
908: Let $K_n$ be the complete graph on $n$ vertices $\{1, \dots, n\}.$
909: $\forall i,$ let $\alpha_i \in (\frac{1}{3}, \frac{1}{2}).$ At time
910: $t \geq 0$, for $i = 1, \dots, n$, let node $i$ hold the value
911: $x_i(t)$. Let $n(0), n(1), \dots$ be a sequence of real numbers.
912: Consider the following update rule. If the $t^{th}$ clock tick
913: belongs to node $i$, then, $i$ chooses a node $j$ uniformly at
914: random, and the following update occurs:
915: 
916: \begin{itemize}\label{update}
917: \item $y_i(t) = (1-\alpha_i) y_i(t-1) + \alpha_j y_j(t-1) + n(t-1).$
918: \item $y_j(t) = (1-\alpha_j) y_j(t-1) + \alpha_i y_i(t-1) - n(t-1).$
919: \end{itemize}
920: 
921: \begin{lemma}\label{l:2}
922: Suppose that for each $t$, $|n(t)| < \epsilon$, and that $a
923: > 0$. Then,
924: $$\p\left[\|\y(t)\| > n^{\frac{a}{2}}\left((1-\frac{1}{2n})^{t/2}\|\y(0)\| + 8\sqrt{2} n^{3/2}
925: \epsilon \right)\right] \leq \frac{5}{n^a}.$$
926: %\mathrm{poly}(n)\left((1-\frac{1}{2n})^{\frac{t}{2}}
927: % + \epsilon \right)\|\y(0)\|\right] < \frac{1}{\text{poly(n)}}.$$
928: \end{lemma}
929: {\bf Proof:}%Please refer to the Appendix.\\%{\bf Proof of Lemma~\ref{l:2}}\\
 $\y(t) = A(t-1)\y(t-1) + \n(t-1)$,
where $A(t) = I - (\e_i - \e_j)(\alpha_i \e_i^T - \alpha_j \e_j^T)$,
and $\n(t-1) = n(t-1)(\e_i - \e_j).$ Let $\x(0) = \y(0)$, and let
933: the $\x(t)$ satisfy $\x(t+1) = A(t)\x(t)$ as in Lemma~\ref{l:1}. We
934: observe that
935: $$\y(1) = \x(1) + \n(0)$$ and more generally,
936: $$ \y(t+1) =  \x(t+1) + \n(t) + \sum_{i=0}^{t-1} A(t)A(t-1)
937: \dots A(i+1) \n(i) .$$ An application of the triangle inequality now
938: gives us
939: $$ \|\y(t+1)\| \leq  \|\x(t+1)\| + \|\n(t)\| + \sum_{i=0}^{t-1} \|A(t)A(t-1)
940: \dots A(i+1) \n(i)\| .$$ Our approach to proving this Lemma is to
941: upper bound each term in the right hand side.
942: \begin{observation}\label{o:1}
943: \beqs \p\left[\|\x(t)\|
944:  >  (1-\frac{1}{2n})^{t/2} n^{a/2} \|x(0)\|\right] & \leq &
945: \left((1-\frac{1}{2n})^{t/2}
946: n^{a/2}\right)^{-2}\E\left(\frac{\|x(t)\|^2}{\|x(0)\|^2}\right)\\
947: & \leq & \left((1-\frac{1}{2n})^{t/2} n^{a/2}\right)^{-2} (1 -
948: \frac{1}{2n})^t \\
949: & = & \frac{1}{n^a}.\eeqs
950: \end{observation}
951: The above inequalities follow from Lemma~\ref{l:1} and
952: Corollary~\ref{c:1}. We shall now upper bound the other terms as
953: well with high probability. Using Corollary~\ref{c:1} \beqs
954: \p\left[\frac{\|A(t-1) \dots A(i) \n(i-1)\|}{\|\n(i-1)\|} >
955: (1-\frac{1}{2n})^{\frac{t-i}{4}} n^{\frac{a+1}{2}}\right] & \leq &
956: ((1-\frac{1}{2n})^{\frac{t-i}{4 }} n^{\frac{a+1}{2}})^{-2}\left(1 -
957: \frac{1}{2n}\right)^{t-i}\\
958: & = & n^{-(a+1)}(1-\frac{1}{2n})^{\frac{t-i}{2}}. \eeqs
959: 
960: However,
961: $$\sum_{i=1}^{t-1} (1-\frac{1}{2n})^{\frac{t-i}{2}} n^{-(a+1)} <
962: \frac{4}{n^a}$$ and so,
963: $$ \p\left[\exists_i \left\{\frac{\|A(t-1) \dots A(i)
964: \n(i-1)\|}{\|\n(i-1)\|}
965: > (1-\frac{1}{2n})^{\frac{t-i}{4}} n^{\frac{a+1}{2}}\right\}\right] \leq
966: \frac{4}{n^a}.$$
967: 
968: We next observe that $$\sum_{i \leq t}
969: (1-\frac{1}{2n})^{\frac{t-i}{4}} n^{\frac{a+1}{2}} < 8
970: n^\frac{a+3}{2}.$$ As a consequence we have
971: 
972: \begin{observation}
973: $$ \p\left[\sum_i \frac{\|A(t-1) \dots A(i)
974: \n(i-1)\|}{\|\n(i-1)\|}
975: > 8 n^\frac{a+3}{2}\right] \leq \frac{4}{n^a}.$$
976: \end{observation}
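
The two geometric sums used above can be bounded as follows. This is a sketch of the omitted step, writing $u = \frac{1}{2n}$ and $k = t-i$ for brevity; both symbols are introduced only here. By Bernoulli's inequality, $(1-u)^{\beta} \leq 1 - \beta u$ for $\beta \in [0,1]$ and $u \in [0,1]$, so
\beqs
\sum_{k \geq 0} (1-u)^{k/2} & = & \frac{1}{1 - (1-u)^{1/2}} \ \leq \ \frac{2}{u} \ = \ 4n,\\
\sum_{k \geq 0} (1-u)^{k/4} & = & \frac{1}{1 - (1-u)^{1/4}} \ \leq \ \frac{4}{u} \ = \ 8n.
\eeqs
Multiplying the first bound by $n^{-(a+1)}$ and the second by $n^{\frac{a+1}{2}}$ gives the quantities $\frac{4}{n^a}$ and $8 n^{\frac{a+3}{2}}$ appearing above.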
977: 
978: % ((1-\frac{1}{2n})^{\frac{t-i}{4 }} n^{\frac{a+1}{2}})^{-2}\left(1
979: %//-
980: %\frac{1}{2n}\right)^{t-i}\\
981: %& = & n^{-(a+1)}(1-\frac{1}{2n})^{\frac{t-i}{2}}. \eeqs
982: % Therefore
983: %\beqs \p\left[\sum_{i=1}^{t-1} \frac{\|A(t-1) \dots A(i)
984: %\n(i-1)\|}{\|\n(i-1)\|} > \sum_{i=1}^{t-1}
985: %(1-\frac{1}{2n})^{\frac{t-i}{4}} n^{\frac{a+1}{2}}\right] \leq
986: %\p\left[\sum_{i=1}^{t-1} \frac{\|A(t-1) \dots A(i)
987: %\n(i-1)\|}{\|\n(i-1)\|} > \right]\\ & < &  \sum_{i=1}^{t-1}
988: %n^{-(a+1)}(1-\frac{1}{2n})^{\frac{t-i}{2}}\\
989: %& < &  . \eeqs
990: 
991: Once we put the above two observations together and note that
992: $(\forall i) \sqrt{2} \epsilon \geq \|\n(i)\|$, an application of
993: the union bound gives
994: $$ \p\left[\|\y(t)\| > n^{\frac{a}{2}}\left((1-\frac{1}{2n})^{t/2}\|\y(0)\|+
995: 8 \sqrt{2} n^{3/2} \epsilon \right)\right]  \leq \frac{5}{n^a}.$$
996: {\hfill $\Box$}
997: \end{document}
998: