% 0508:cs0508006/arx.tex
\documentclass{cccg05}
\usepackage{graphicx,amssymb,amsmath}
\usepackage{subfigure}
\usepackage{epsfig}

%----------------------- Macros and Definitions --------------------------

% Add all additional macros here, do NOT include any additional files.

\newcommand{\stress}{\mbox{\it stress}}
\newcommand{\st}{\mbox{\it st}}
\newcommand{\Ar}{\mbox{\it Ar}}
\newcommand{\betw}{\mbox{\it betw}}

% The environments theorem (Theorem), invar (Invariant), lemma (Lemma),
% cor (Corollary), obs (Observation), conj (Conjecture), and prop
% (Proposition) are already defined in the cccg05.cls file.
% Add additional environments only if you REALLY need them.

%----------------------- Title -------------------------------------------

\title{A New Approach for Boundary Recognition in Geometric Sensor Networks}

\author{S\'andor P.~Fekete\thanks{Department of Mathematical Optimization,
        Braunschweig University of Technology, {\tt \{s.fekete,a.kroeller\}@tu-bs.de}}
        \and
        Michael Kaufmann\thanks{Department of Computer Science, University of T\"ubingen, {\tt \{mk,lehmannk\}@informatik.uni-tuebingen.de}}
        \and
        Alexander Kr\"oller\footnotemark[1]\ \thanks{Supported by the German Research Foundation (DFG) within the focus program ``Algorithms for Large and Complex Networks'' (SPP 1126), grant Fe407/8-1.}
        \and
        Katharina Lehmann\footnotemark[2]\ \thanks{Supported by the German Research Foundation (DFG) within the focus program ``Algorithms for Large and Complex
Networks'' (SPP 1126), grant Ka812/11-1.}
}

% Add the appropriate index information!

\index{Fekete, S\'andor P.}
\index{Kaufmann, Michael}
\index{Kr\"oller, Alexander}
\index{Lehmann, Katharina}

%------------------------------ Text -------------------------------------

\begin{document}
\maketitle

\begin{abstract}
We describe a new approach for dealing with the following central
problem in the self-organization of a geometric sensor network:
We are given a polygonal region $R$ and a large, dense set of sensor nodes that are scattered
uniformly at random in $R$. There is no central control unit, and nodes can communicate only locally,
by wireless radio, with all other nodes that are within communication radius $r$;
they know neither their coordinates nor their distances to other nodes.
The objective is to develop a simple distributed protocol that allows
nodes to identify themselves as being located near the boundary of $R$
and to form connected pieces of the boundary.
We compare several centrality measures commonly
used in the analysis of social networks and show that
{\em restricted stress centrality} is particularly
suited for geometric networks; we provide mathematical as
well as experimental evidence for the quality of this measure.
\end{abstract}

\section{Introduction}
\label{sec:intro}

%{\bf Sensor Networks.}
In recent years, the study of wireless sensor networks (WSN) has become
a rapidly developing research area that offers fascinating
perspectives for combining technical progress with new applications of
distributed computing. Typical scenarios involve a large swarm of
small and inexpensive processor nodes, each with limited computing and
communication resources, that are distributed in some geometric
region; communication is performed by wireless radio with limited
range. As energy consumption is a limiting factor for the lifetime of
a node, communication has to be minimized. Upon start-up, the swarm
forms a decentralized and self-organizing network that surveys the
region.

\begin{figure}[t]
\begin{center}
  \centering
  %\hspace*{0.03\textwidth}
  \subfigure[60,000 sensor nodes, distributed uniformly at random in a polygonal region.\label{fig:city:b}]{
    \epsfig{file=lq-ocp2-60k-70.eps,width=0.70\columnwidth}
  }
  \subfigure[A zoom into (a) shows the communication graph.\label{fig:city:c}]{
    \epsfig{file=lq-ocp3-60k-70.eps,width=0.35\columnwidth}
  }
  %\hspace*{0.03\textwidth}
  \subfigure[A further zoom into (b) shows the communication ranges.\label{fig:city:d}]{
    \epsfig{file=lq-ocp4-60k-70.eps,width=0.35\columnwidth}
  }
  \vspace*{-6mm}
  \caption{Scenario of a geometric sensor network, obtained by scattering sensor nodes in the street network surrounding Braunschweig University of Technology.}
  \label{fig:city}
\end{center}
  \vspace*{-6mm}
\end{figure}

From an algorithmic point of view, the characteristics of a sensor
network require working under a paradigm that is different from
classical models of computation: The absence of a central control unit,
the limited capabilities of nodes, and the limited communication between nodes
require new algorithmic ideas that combine methods of
distributed computing and network protocols with traditional
centralized network algorithms. In other words: How can we use a
limited amount of strictly local information in order to achieve
distributed knowledge of global network properties?

Achieving this is much simpler if the exact
location of each node is known, and computing node coordinates
has received a considerable amount of attention.
Unfortunately, computing exact coordinates requires the use of
special location hardware like GPS, or alternatively,
scanning devices, imposing physical demands on the size and structure
of sensor nodes. As we demonstrated in our paper~\cite{kfb-kl-05},
current methods for computing coordinates based on anchor points
and distance estimates encounter serious
difficulties in the presence of even small inaccuracies, which are
unavoidable in practice.

As shown in \cite{fkp-nbtrsn-04}, there is a way to sidestep many of the above
difficulties, as some structural location aspects do {\em not}
depend on coordinates.
This is particularly relevant for sensor networks
that are deployed in an environment
with interesting geometric features. (See \cite{fkp-nbtrsn-04}
for a more detailed discussion.) Scenarios such as the one
shown in Figure~\ref{fig:city} clearly pose a number of interesting
geometric questions. Conversely, exploiting the basic fact
that the communication graph of a sensor network
has a number of geometric properties provides
an elegant way to extract structural information.

One key aspect of location awareness is {\em boundary recognition}:
making sensors close to the boundary of the surveyed region
aware of their position and letting them form
connected {\em boundary strips} along each verge.
This is of major importance for keeping track of events entering or
leaving the region, as well as for communication with the
outside. Neglecting the existence of holes in the region may also
cause problems in communication, as routing along shortest paths tends
to put an increased load on nodes along boundaries, exhausting their
energy supply prematurely; thus, a moderately sized hole (caused by
obstacles, by an event, or by a cluster of failed nodes) may
grow larger and larger.

We show that using a combination of geometry, stochastics, and tools
from social networks, a considerable amount of location awareness can indeed be
achieved in a large swarm of sensor nodes without any use of location
hardware. The result is a relatively simple distributed algorithm
for boundary recognition in large geometric sensor networks that shows
excellent performance for test networks with 80,000 nodes.

\section{Centrality Measures for Social Networks}
\label{social}
Another field that studies large and complex graphs is that
of {\em social networks}, where nodes represent individuals
in a large collective, and edges indicate some interaction between
them. (See the recent book \cite{be-namf-05} for an overview and an extensive
list of references.) A natural approach in this field is to identify
asymmetries within a network; one particular way of doing this is based
on so-called centrality indices, i.e., real-valued functions that
assign high values to more ``central'' nodes, while ``boundary'' nodes
get low values.

Over the last five decades, many different centrality
indices have been proposed. They fall into two major classes: One is based
on local properties of the graph, which makes it particularly suited for
typical sensor network scenarios; it will be discussed in some detail.
The other class is based on more global properties, e.g.,
the computation of eigenvalues of the adjacency matrix, which makes it less
useful for our purposes.

%\pagebreak
%\vspace*{-6mm}
Centrality indices of the first class can be subdivided into three
subclasses: The first considers the distances to other vertices,
the second determines the number of vertices at a given
distance, while the third makes use of shortest
paths containing a given vertex.

Considering the maximum distance to another vertex in the graph
(measured in hops) does not reflect local topological structures
in a sensor network; in particular, it fails to indicate closeness
to interior boundaries. The size of the $k$-hop neighborhood
is better suited, and (for the simple choice $k=1$) was indeed the basis
for our approach described in \cite{fkp-nbtrsn-04}, as it is an indicator
for the size of the intersection of the communication range of
a node with $R$.
It is tempting to try to improve the results by increasing $k$,
but this is not without drawbacks with respect to topological properties,
as a boundary node
close to a ``thick'' part of $R$ may get a higher value
than an interior node that is located in a ``thin'' part of the region.
See Figure~\ref{fig:cent:a} for a scenario with 80,000 nodes;
index values are represented on a color scale from dark (low)
to light (high).

\begin{figure}
\begin{center}
  \centering
  \includegraphics[width=0.6\columnwidth]{lq-c-khop-4.eps}
  \caption{Size of the $k$-hop neighborhood for $k=4$.}
  \label{fig:cent:a}
\end{center}
  \vspace*{-6mm}
\end{figure}
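To illustrate the second subclass, the following Python sketch (our illustration, not code from \cite{fkp-nbtrsn-04}) computes the size of the $k$-hop neighborhood of a node by breadth-first search. It assumes that the communication graph is available as an adjacency dictionary and that $v$ itself is not counted, so that $k=1$ yields the degree of $v$; the function name {\tt khop\_size} is hypothetical.

\begin{verbatim}
from collections import deque


def khop_size(adj, v, k):
    """Number of nodes at hop distance 1..k from v.

    adj maps each node to an iterable of its neighbors
    (a hypothetical input format for illustration).
    """
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return len(dist) - 1  # v itself is not counted


# Usage on a path 0-1-2-3-4
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(khop_size(path, 2, 1), khop_size(path, 0, 2))
\end{verbatim}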

210:

211: This leaves the structure of shortest paths. In particular,

212: the {\it stress centrality} $stress(v)$ is defined as the number

213: of shortest paths containing $v$:

214: \begin{equation}

215: \stress(v) := \sum_{s \in V}\sum_{t \not = s \in V} \sigma_{st}(v),

216: \end{equation}

217: where $\sigma_{st}(v)$ denotes the number of shortest paths containing $v$.

218: Only considering vertices within a given distance $\delta$ yields

219: the {\em restricted stress centrality}:

220: \begin{equation}

221: \stress(v, \delta) := \sum_{s \in V_\delta(v)}\sum_{t \not = s \in V_\delta(v)} \sigma_{st}(v).

222: \end{equation}

223:

224: \begin{figure*}

225: \begin{center}

226:   %\centering

227:   %\hspace*{0.03\textwidth}

228:   %\subfigure[$k$-hop neighborhood for $k$=4.\label{fig:cent:a}]{

229:     %\epsfig{file=lq-c-khop-4.eps,width=0.45\textwidth}

230:   %}

231:   \subfigure[Betweenness centrality.\label{fig:cent:b}]{

232:     \epsfig{file=lq-c-between-5.eps,width=0.65\columnwidth}

233:   }

234:   %\\

235:   %\hspace*{0.03\textwidth}

236:   \subfigure[Stress centrality.\label{fig:cent:c}]{

237:     \epsfig{file=lq-c-stress.eps,width=0.65\columnwidth}

238:   }

239:   %\\

240:   %\hspace*{0.03\textwidth}

241:   \subfigure[Restricted stress centrality with threshold filter.\label{fig:cent:d}]{

242:     \epsfig{file=lq-c-stress-thresh.eps,width=0.65\columnwidth}

243:   }

244:   \vspace*{-6mm}

245:   \caption{Performance of different centrality measures, shown for a scenario of 80,000 nodes distributed uniformly at random.}

246:   \label{fig:perform}

247: \end{center}

248:   \vspace*{-6mm}

249: \end{figure*}

250:

251: In the context of a communication network, this measure can be

252: motivated as follows:

253: If each vertex sends a message to every other vertex along all shortest paths,

254: the stress centrality counts how many times vertex $v$ is busy with

255: passing on a message. As there may be quite many shortest paths,

256: it is reasonable to assume that a vertex

257: sends a message to some other vertex and uses any of their shortest paths with the same probability, i.e., $1/\sigma_{st}$, where

258: $\sigma_{st}$ denotes the number of shortest paths between $s$ and $t$.

259: The probability of any vertex $v$ that it has to transport the message is thus

260: given by $\rho_{st}(v):=\frac{\sigma_{st}(v)}{\sigma_{st}}$.

261: The {\it betweenness centrality} $\betw(v)$ is defined as the sum over all $\rho_{st}(v)$:

262: \begin{equation}

263: \betw(v) := \sum_{s \in V}\sum_{t \in V} \rho_{st}(v).

264: \end{equation}

265: See Figure~\ref{fig:cent:b} for the evaluation of betweenness centrality

266: for our example, while Figure~\ref{fig:cent:c} shows the stress centrality.

267: (Again, low values are indicated by dark dots, while high values are represented

268: by light color.)

269: A detailed analysis for restricted stress centrality

270: is given in the following section.

271:

%\medskip {\bf Our Results.}  We show that distributed location
%awareness can be achieved without the help of location hardware. In
%particular:
%
%\begin{itemize}
%\item We describe how to recognize the nodes that are near the
  %boundary of the region. The underlying geometric idea is quite
  %simple, but it requires some effort on both stochastics and
  %communication to make it work.
%\end{itemize}
%%
%The rest of this paper is organized as follows. In Section~\ref{sec:prelim}
%we give some basic notation and state our underlying model assumptions.
%In Section~\ref{sec:tree} we describe how to obtain an auxiliary
%tree structure that is used for computing and distributing
%global network parameters. Section~\ref{sec:prob} gives a brief
%overview of probabilistic aspects that are used in the rest
%of the paper to allow topology recognition. Section~\ref{sec:bound}
%describes how to perform boundary recognition, while Section~\ref{sec:high}
%gives a sketch of how to compute more advanced properties.
%Section~\ref{sec:experiments} describes implementation issues
%and shows some of our experiments. Finally, Section~\ref{sec:future}
%discusses the possibilities for further progress based on our work.
%
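The definitions of stress, restricted stress and betweenness centrality can be checked on small examples with a brute-force computation. The Python sketch below is an illustration only: the function names are hypothetical, every pair $\{s,t\}$ with $v\notin\{s,t\}$ is counted once, and for the restricted variant we assume that shortest paths are still taken in the whole graph. It relies on the standard identity $\sigma_{st}(v)=\sigma_{sv}\,\sigma_{vt}$ if $d(s,v)+d(v,t)=d(s,t)$, and $\sigma_{st}(v)=0$ otherwise.

\begin{verbatim}
from collections import deque
from itertools import combinations


def bfs_counts(adj, s):
    """Hop distances and numbers of shortest paths from s."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:           # first visit of w
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w
                sigma[w] += sigma[u]
    return dist, sigma


def centralities(adj, v, delta=None):
    """(stress, betweenness) of v over unordered pairs
    {s, t} with v not in {s, t}; if delta is given, only
    nodes within delta hops of v are used as s and t."""
    info = {u: bfs_counts(adj, u) for u in adj}
    dist_v, sigma_v = info[v]
    pool = [u for u in dist_v if u != v
            and (delta is None or dist_v[u] <= delta)]
    stress = betw = 0.0
    for s, t in combinations(pool, 2):
        dist_s, sigma_s = info[s]
        if dist_s[v] + dist_v[t] == dist_s[t]:
            through = sigma_s[v] * sigma_v[t]  # sigma_st(v)
            stress += through
            betw += through / sigma_s[t]
    return stress, betw


# Usage: a 4-cycle 0-1-2-3-0
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(centralities(cycle, 0))
\end{verbatim}

The cost is dominated by one breadth-first search per node, which is adequate for checking small examples but is not how a sensor network would evaluate the index.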

\section{Using Restricted Stress Centrality}
\label{stress}
In the context of a sensor network, it takes a number of algorithmic
steps to evaluate a measure and use the results for extracting
global features like boundaries. Some of those details are described
in our paper \cite{fkp-nbtrsn-04} and can be used analogously for
other measures: Using an auxiliary tree structure (which is easy
to obtain), we can aggregate local results globally in order
to determine appropriate threshold values. Once a threshold has been set,
it can be distributed to all nodes in the network; after that, each
node simply checks whether its centrality index is above or below
the threshold, resulting in a classification as ``interior'' or ``boundary''.
A good index must have the following properties:
\begin{itemize}
\item It should require only simple local computations for each node.
\item Setting a good threshold value should be relatively easy.
In other words: The distributions for interior nodes and for boundary nodes
should be well separated.
\end{itemize}

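For $\delta=1$, the first requirement is easy to meet. Two neighbors of $v$ are joined by a shortest path via $v$ exactly if they are not adjacent, so $\stress(v,1)$ can be obtained by counting nonadjacent pairs of neighbors (each unordered pair once, matching the normalization of $\st(v)$ in the discussion of Theorem~\ref{th:sep}). A minimal Python sketch, assuming that every node has received the neighbor lists of its neighbors in one round of local message exchange; the function name and data format are hypothetical:

\begin{verbatim}
from itertools import combinations


def local_stress(nbrs, nbr_lists):
    """Return (stress(v,1), st(v)) for a node v.

    nbrs: set of the neighbors of v.
    nbr_lists: maps each neighbor w of v to the set of
    w's neighbors (assumed to be received from w).
    """
    nonadjacent = sum(1 for s, t in combinations(nbrs, 2)
                      if t not in nbr_lists[s])
    deg = len(nbrs)
    pairs = deg * (deg - 1) // 2
    # st(v) is undefined for fewer than two neighbors; use 0.
    st = nonadjacent / pairs if pairs else 0.0
    return nonadjacent, st


# Usage: v = 0 with neighbors 1, 2, 3; only 1-2 is an edge
print(local_stress({1, 2, 3}, {1: {0, 2}, 2: {0, 1}, 3: {0}}))
\end{verbatim}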

\begin{theorem}
\label{th:sep}
Using the restricted stress centrality $\stress(v,1)$,
nodes are classified correctly with high probability
for sufficiently large node density.
\end{theorem}

Figure~\ref{fig:cent:d} shows the result for restricted stress centrality
at a moderate node density:
all boundary nodes are classified correctly, while the
interior contains a number of false positives, which can be eliminated
by additional filters.

{\bf Discussion of Theorem~\ref{th:sep}.}
Let $v$ be a node in the network, and let $\delta(v)$ be the number
of neighbors of $v$. Two neighbors of $v$ are connected by a shortest path via $v$
if and only if they are not adjacent, and in that case exactly one of their
shortest paths passes through $v$; hence $\stress(v,1)$ is the number
of pairs of nonadjacent neighbors of $v$. The normalized
coefficient $\st(v):=\frac{2\stress(v,1)}{\delta(v)(\delta(v)-1)}$
describes the fraction of pairs of neighbors that are nonadjacent,
i.e., that have a shortest-path connection via $v$, so that, for large neighborhoods,
$\mathbb{E}[\stress(v,1)]\approx\mathbb{E}[\st(v)]\binom{\mathbb{E}[\delta(v)]}{2}$.
Now consider any neighbor $w$ of $v$. Let $C(v):=\{p\in R\mid d(p,v)\leq r\}$
be the portion of $R$ that is within communication range of $v$.
See Figure~\ref{fig:circles}; let $N_w:=C(v)\cap C(w)$, and
$M_w:=C(v)\setminus C(w)$. For a uniform random distribution,
the expected fraction of neighbors of $v$ that are not adjacent
to $w$ corresponds to the ratio of areas
$\frac{\Ar(M_w)}{\Ar(C(v))}$.
Integrating over all possible positions of $w$, we get
an overall expected value
$\mathbb{E}[\st(v)]=\frac{1}{\Ar(C(v))}\int_{w\in C(v)}\frac{\Ar(M_w)}{\Ar(C(v))}\,dw$.

\begin{figure}[h]
  \centering
  \includegraphics[width=.45\columnwidth]{lq-circles.eps}
  \vspace*{-3mm}
  \caption{For any given neighbor $w$ of $v$, the expected fraction of
  neighbors of $v$ that are not neighbors of $w$ is given by
  $\frac{\Ar(M_w)}{\Ar(N_w\cup M_w)}$.}
  \label{fig:circles}
  \vspace*{-3mm}
\end{figure}

As the size of the areas also depends on the distance $s$ of $v$
from the boundary, solving this integral in closed form
for all $s$ would require finding a primitive that contains $s$ as an explicit
parameter; this appears to be hopeless, even using ideas as described
in \cite{geo.prob}. However, for specific values of $s$,
an explicit numerical calculation is possible:
For $s\geq r=1$ and $d(w,v)=x$, let $\alpha(x):=\arccos\left(\frac{x}{2}\right)$.
The area of $M_w$ is the area of the unit disk minus that of the lens $N_w$, i.e.,
\[
\Ar(M_w)=\pi-2\left(\alpha(x)-\tfrac{1}{2}\sin\left(2\alpha(x)\right)\right).
\]
Since $w$ is uniformly distributed in $C(v)$, its distance $x$ from $v$
has density $2x$ on $[0,1]$, so the resulting integral is
\[
\sigma=\int_0^1 2x\left(1-\frac{2\left(\alpha(x)-\frac{1}{2}\sin\left(2\alpha(x)\right)\right)}{\pi}\right)dx ,
\]
which can be solved numerically and in fact evaluates to
$\sigma=\frac{3\sqrt{3}}{4\pi}\approx 0.4134966716$.

%Similarly, the resulting value
%for $s=0$ is xxx.
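The value of $\sigma$ can be reproduced numerically. The Python sketch below (function names hypothetical) evaluates the fraction $\Ar(M_w)/\Ar(C(v))$ for $r=1$ from the area $2\arccos(x/2)-\frac{x}{2}\sqrt{4-x^2}$ of the lens of two unit disks at distance $x$, weights it with the density $2x$ of $d(v,w)$ for $w$ uniform in the unit disk, and integrates with the composite Simpson rule; for comparison, it also prints the closed form $3\sqrt{3}/(4\pi)$.

\begin{verbatim}
import math


def frac_not_adjacent(x):
    """Ar(M_w)/Ar(C(v)) for r = 1 and d(v, w) = x <= 1."""
    a = math.acos(x / 2)
    lens = 2 * (a - 0.5 * math.sin(2 * a))  # Ar(N_w)
    return (math.pi - lens) / math.pi


def simpson(f, lo, hi, n=1000):
    """Composite Simpson rule with an even number n of steps."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


# d(v, w) has density 2x on [0, 1] for w uniform in C(v).
sigma = simpson(lambda x: 2 * x * frac_not_adjacent(x),
                0.0, 1.0)
print(f"{sigma:.10f}")
print(f"{3 * math.sqrt(3) / (4 * math.pi):.10f}")
\end{verbatim}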

For determining threshold values that separate interior and
boundary values of $\st$, we also need the random distribution
of $\st$ for different values of $s$. These distributions
can be determined with additional numerical computations; using
a Monte-Carlo simulation, we obtained distributions
like the ones in Figure~\ref{fig:dist}, which shows the distributions
for 20 expected neighbors (Figure~\ref{fig:dist:a})
and for 200 expected neighbors (Figure~\ref{fig:dist:b}); the left
(red) curve shows the distribution of $\st$ for a node $v$ on the
boundary, while the right (green/blue) curve shows the distribution
for a node completely in the interior of $R$.
The probability of error for a specific threshold is given
by the normalized area under the left curve to the right of the threshold
(false negatives)
or by the normalized area under the right curve to the left of the threshold
(false positives). As the neighborhood size grows, both distributions
concentrate more and more tightly around their respective means, which differ,
so the error becomes arbitrarily small for large neighborhood size.
\QED

For intermediate neighborhood sizes
such as the one in our example, choosing a relatively large threshold
value avoids too many false negatives, at the expense of a limited
fraction of false positives.

\begin{figure}
\begin{center}
  \centering
  \subfigure[Distributions for neighborhood size 20.\label{fig:dist:a}]{
    \epsfig{file=lq-keep_n10000_p0.002_x50000.eps,width=0.8\columnwidth}
  }
  \subfigure[Distributions for neighborhood size 200.\label{fig:dist:b}]{
    \epsfig{file=lq-keep_n100000_p0.002_x50000.eps,width=0.80\columnwidth}
  }
  \vspace*{-3mm}
  \caption{Random distribution of restricted stress centrality for a node on the boundary and in the interior,
for different neighborhood sizes.}
  \label{fig:dist}
\end{center}
  \vspace*{-3mm}
\end{figure}
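A Monte-Carlo simulation of this kind can be sketched as follows. This Python code is our illustration, not the simulation used for Figure~\ref{fig:dist}: it assumes $r=1$ and nodes forming a Poisson process with {\tt lam} expected neighbors for an interior node, and it models a boundary node ($s=0$) as a node on a straight edge of $R$, which therefore sees only a half disk and has {\tt lam}$/2$ expected neighbors. The function names are hypothetical; histograms of the values returned by {\tt sample\_st} correspond to the two kinds of curves described above.

\begin{verbatim}
import math
import random
from itertools import combinations


def poisson(rng, lam):
    """Poisson sample by Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def sample_st(rng, lam, boundary):
    """One sample of st(v) for r = 1, or None if v has
    fewer than two neighbors. A boundary node sits on a
    straight edge of R and only sees the half disk y >= 0."""
    n = poisson(rng, lam / 2 if boundary else lam)
    pts = []
    while len(pts) < n:  # rejection sampling in the disk
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            pts.append((x, abs(y) if boundary else y))
    if n < 2:
        return None
    far = sum(1 for p, q in combinations(pts, 2)
              if math.dist(p, q) > 1)  # nonadjacent pair
    return far / (n * (n - 1) / 2)


def mean_st(lam, boundary, trials=200, seed=1):
    rng = random.Random(seed)
    vals = [sample_st(rng, lam, boundary)
            for _ in range(trials)]
    vals = [s for s in vals if s is not None]
    return sum(vals) / len(vals)


for lam in (20, 200):
    print(lam, round(mean_st(lam, False, 100), 3),
          round(mean_st(lam, True, 100), 3))
\end{verbatim}

Counting all pairs costs $O(n^2)$ per sample, which is adequate for neighborhood sizes such as 20 or 200.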

\section{Algorithm}

In \cite{fkp-nbtrsn-04}, we showed how to estimate
$\mathbb{E}[\delta(v)]$ for a node $v$ of boundary distance $s\geq r$,
i.e., a node in the interior of the network. The algorithm constructs a
tree, collects a node degree histogram and floods the result to all
nodes. Both the total runtime of the algorithm and the total size of
messages are $\mathcal{O}(|V|\log^2|V|)$. Each node stores a constant
threshold value
$0 < \theta < \sigma$ that has been chosen in advance. If
\[ \stress(v,1)\leq \theta\binom{\mathbb{E}[\delta(v)]}{2} \;, \]
the node declares itself to be a boundary node. In experiments, we
found $\theta=1/3$ to be a particularly good choice.

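The resulting decision rule is a single comparison per node. A minimal Python sketch, assuming that $\stress(v,1)$ has been computed locally and that the estimate of $\mathbb{E}[\delta(v)]$ has been received from the flooding step; the function name {\tt is\_boundary} is hypothetical. Since $\st(v)\in[0,1]$, the sketch compares the unnormalized count $\stress(v,1)$ with $\theta\binom{\mathbb{E}[\delta(v)]}{2}$, i.e., on the same scale as $\mathbb{E}[\stress(v,1)]$ in the discussion of Theorem~\ref{th:sep}.

\begin{verbatim}
import math

# sigma = 3*sqrt(3)/(4*pi): expected st(v) of an interior
# node (s >= r), which bounds admissible thresholds.
SIGMA = 3 * math.sqrt(3) / (4 * math.pi)


def is_boundary(stress_v1, expected_degree, theta=1 / 3):
    """True if v should declare itself a boundary node,
    i.e., if stress(v,1) <= theta * binom(E[delta(v)], 2).

    expected_degree may be a non-integer estimate, so the
    binomial coefficient is evaluated as e * (e - 1) / 2.
    """
    if not 0 < theta < SIGMA:
        raise ValueError("theta must lie in (0, sigma)")
    pairs = expected_degree * (expected_degree - 1) / 2
    return stress_v1 <= theta * pairs


# Usage with E[delta(v)] = 20: the bound is 190 / 3
print(is_boundary(40, 20), is_boundary(78, 20))
\end{verbatim}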

\section{Conclusion}

We showed that restricted stress centrality is a useful index
for extracting topological boundary information from a geometric
sensor network, provided that the nodes follow
a suitable random distribution. As this is a rather strong assumption,
it appears desirable to come up with more general methods.
Moreover, an approach based on random distributions
may still fail in some rare cases
(even though the probability of failure is extremely low),
so it is particularly interesting to develop
deterministic methods for boundary recognition.
Such an approach is described in our forthcoming paper
\cite{fkfp-dbrlgsn-05}.

%---------------------------- Bibliography -------------------------------

% Please add the contents of the .bbl file

\small
\bibliographystyle{abbrv}
\bibliography{refs}

%\begin{thebibliography}{99}
%\end{thebibliography}

\end{document}