
1:

2: %\documentclass[10pt,twocolumn]{IEEEtran}

3: \documentclass[11pt,onecolumn,dvips,draftcls]{IEEEtran}

4:

5: \usepackage{psfig,amsfonts,amsmath,color,amssymb,amsxtra}

6: \usepackage[breaklinks=true, colorlinks=true, linkcolor=black,

7: urlcolor=dblue, citecolor=black, pdfpagemode=None, pdfstartview=FitH]{hyperref}

8: \DeclareOldFontCommand{\rm}{\normalfont\rmfamily}{\mathrm}

9: \DeclareOldFontCommand{\sf}{\normalfont\sffamily}{\mathsf}

10: \DeclareOldFontCommand{\tt}{\normalfont\ttfamily}{\mathtt}

11: \DeclareOldFontCommand{\bf}{\normalfont\bfseries}{\mathbf}

12: \DeclareOldFontCommand{\it}{\normalfont\itshape}{\mathit}

13: \makeatletter\DeclareOldFontCommand{\sl}{\normalfont\slshape}{\@nomath\sl}

14: \DeclareOldFontCommand{\sc}{\normalfont\scshape}{\@nomath\sc}\makeatother

15: \definecolor{gray}{cmyk}{.2,0.2,.3,.1}

16: \definecolor{dred}{cmyk}{0,0.9,0.4,0.3}

17: \definecolor{dblue}{rgb}{0,0,0.5}

18: \definecolor{dgreen}{rgb}{0,0.3,0}

19: \definecolor{dgray}{rgb}{0.3,0.3,0}

20: \newtheorem{theorem}{Theorem}

21: \newtheorem{lemma}{Lemma}

22: \newtheorem{corollary}{Corollary}

23: \newtheorem{example}{Example}

24: \newtheorem{definition}{Definition}

25: \newcommand{\rend}{\hfill$\square$}

26: \newcommand{\tend}{\hfill$\blacksquare$}

27: \newtheorem{remark}{Remark}

28: \newcommand{\flow}{\varphi}

29: \setlength{\textwidth}{17cm}

30:

31:

32: \title{Network Information Flow \\ with Correlated Sources

33:   \thanks{J.\ Barros was with the

34:   Institute for Communications Engineering, Munich University of Technology,

35:   Munich, Germany.  He is now with the Department of Computer Science,

36:   University of Porto, Portugal.  URL:

37:   %\href{http://cn.ece.cornell.edu/redirect-joao.html}

38:   \href{http://www.dcc.fc.up.pt/\~barros/}

39:   {{\tt http://www.dcc.fc.up.pt/$\sim$barros/}}.

40:   S.\ D.\ Servetto is with

41:   the School of Electrical and Computer Engineering, Cornell University,

42:   Ithaca, NY.  URL:

43:   \href{http://cn.ece.cornell.edu/}{{\tt http://cn.}}

44:   \href{http://cn.ece.cornell.edu/}{{\tt ece.cornell.edu/}}.

45:   Work supported by a scholarship from

46:   the Fulbright commission, and by the National Science Foundation, under

47:   awards CCR-0238271 (CAREER), CCR-0330059, and ANR-0325556.  Previous

48:   conference publications:~\cite{BarrosS:02b, BarrosS:03a, BarrosS:03d,

49:   BarrosS:05}.}}

50: \author{Jo\~{a}o Barros \hspace{2cm} Sergio D.\ Servetto}

51: \date{October 2, 2005.}

52:

53:

54: \begin{document}

55: \maketitle

56:

57: \begin{picture}(0,0)

58: \put(0,220){\tt\small To appear in the IEEE Transactions on Information

59:   Theory.}

60: \end{picture}

61:

62: \vspace{-13mm}

63: \begin{abstract}

64: \noindent\it

65: Consider the following network communication setup, originating

66: in a sensor networking application we refer to as the ``sensor

67: reachback'' problem.  We have a directed graph $G=(V,E)$, where

68: $V = \{v_0v_1...v_M\}$ and $E\subseteq V\times V$.  If $(v_i,v_j)\in E$,

69: then node $i$ can send messages to node $j$ over a discrete memoryless

70: channel $(\mathcal{X}_{ij},p_{ij}(y|x),\mathcal{Y}_{ij})$, of capacity

71: $C_{ij}$.  The channels are independent.  Each node $v_i$ gets to

72: observe a source of information $U_i$ ($i=0...M$), with joint

73: distribution $p(U_0U_1...U_M)$.  Our goal is to solve an incast

74: problem in $G$: nodes exchange messages with their neighbors, and after

75: a finite number of communication rounds, one of the $M+1$ nodes ($v_0$

76: by convention) must have received enough information to reproduce

77: the entire field of observations $(U_0U_1...U_M)$, with arbitrarily

78: small probability of error.  In this paper, we prove that such perfect

79: reconstruction is possible if and only if

80: \[

81:   H(U_S|U_{S^c}) \;\;<\;\; \sum_{i\in S,j\in S^c} C_{ij},

82: \]

83: for all $S \subseteq \{0...M\}$, $S\neq\emptyset$, $0\in S^c$.  Our

84: main finding is that in this setup a general source/channel separation

85: theorem holds, and that Shannon information behaves as a classical

86: network flow, identical in nature to the flow of water in pipes.  At

87: first glance, it might seem surprising that separation holds in a

88: fairly general network situation like the one we study.  A closer look,

89: however, reveals that the reason for this is that our model allows

90: only for independent point-to-point channels between pairs of nodes,

91: and not multiple-access and/or broadcast channels, for which separation

92: is well known not to hold~\cite[pp.\ 448--449]{CoverT:91}.  This

93: ``information as flow'' view provides an algorithmic interpretation

94: for our results, among which perhaps the most important one is the

95: optimality of implementing codes using a {\em layered} protocol stack.

96: \end{abstract}

97:

98: \vspace{1cm}

99: \pagebreak

100:

101:

102: \section{Introduction}

103:

104: \subsection{The Sensor Reachback Problem}

105:

106: Wireless sensor networks, made up of small, cheap, and mostly unreliable

107: devices equipped with limited sensing, processing and transmission

108: capabilities, have recently sparked a fair amount of interest in

109: communications problems involving multiple correlated sources and

110: large-scale wireless networks~\cite{AkyildizSSC:02}.  It is envisioned

111: that an important class of applications for such networks involves a

112: dense deployment of a large number of sensors over a fixed area, in

113: which a physical process unfolds---the task of these sensors is then

114: to collect measurements, encode them, and relay them to some data

115: collection point where this data is to be analyzed, and possibly acted

116: upon.  This scenario is illustrated in Fig.~\ref{fig:reachback}.

117:

118: \begin{figure}[ht]

119: \centerline{\psfig{width=12cm,height=4cm,file=reachback.eps}}

120: \caption{A large number of sensors is deployed over a target area.

121:   After collecting the data of interest, the sensors must {\it reach back}

122:   and transmit this information to a single receiver (e.g., an overflying

123:   plane) for further processing.}

124: \label{fig:reachback}

125: \end{figure}

126:

127: There are several aspects that make this communications problem

128: interesting:

129: \begin{itemize}

130: \item {\it Correlated Observations:} If we have a large number of nodes

131:   sensing a physical process within a confined area, it is reasonable to

132:   assume that their measurements are correlated. This correlation may be

133:   exploited for efficient encoding/decoding.

134: \item {\it Cooperation among Nodes:} Before transmitting data to the

135:   remote receiver, the sensor nodes may establish a {\it conference}

136:   to exchange information over the wireless medium and increase their

137:   efficiency or flexibility through cooperation.

138: \item {\it Channel Interference:} If multiple sensor nodes use the wireless

139:   medium at the same time (either for conferencing or reachback), their

140:   signals will necessarily interfere with each other.  Consequently,

141:   reliable communication in a reachback network requires a set of rules

142:   that control (or exploit) the interference in the wireless medium.

143: \end{itemize}

144:

145: In order to capture some of these key aspects, while still being able

146: to provide complete results, we make some modeling assumptions, discussed

147: next.

148:

149: \subsubsection{Source Model}

150:

151: We assume that the sources are memoryless, and thus consider only the

152: spatial correlation of the observed samples and not their temporal

153: dependence (since the latter dependencies could be dealt with by simple

154: extensions of our results to the case of ergodic sources).  Furthermore,

155: each sensor node $v_i$ observes only one component $U_i$ and must

156: transmit enough information to enable the sink node $v_0$ to reconstruct

157: the whole vector $U_1U_2\dots U_M$.  This assumption is the most natural

158: one to make for scenarios in which data is required at a remote location

159: for fusion and further processing, but the data capture process is

160: distributed, with sensors able to gather {\em local} measurements only,

161: and deeply embedded in the environment.

162:

163: A conceptually different approach would be to assume that all sensor

164: nodes get to observe independently corrupted noisy versions of one and

165: the same source of information $U$, and it is this source (and not the

166: noisy measurements) that needs to be estimated at a remote location.

167: This approach seems better suited for applications involving non-homogeneous

168: sensors, where each one of the sensors gets to observe different

169: characteristics of the same source (e.g., multispectral imaging), and

170: therefore leads to a conceptually very different type of sensing

171: applications from those of interest in this work.  Such an approach

172: leads to the so called {\it CEO problem} studied by Berger, Zhang and

173: Viswanathan in~\cite{BergerZV:96}.

174:

175: \subsubsection{Independent Channels}

176:

177: Our motivation to consider a network of independent DMCs is twofold.

178:

179: From a pure information-theoretic point of view independent channels

180: are interesting because, as shown in this paper, this assumption gives

181: rise to long Markov chains which play a central role in our ability to

182: prove the converse part of our coding theorem, and thus obtain conclusive

183: results in terms of capacity.  Moreover, a corollary of said coding

184: theorem does provide a conclusive answer for a special case of the

185: multiple access channel with correlated sources, a problem for which

186: no general converse is known.

187:

188: From a more practical point of view, the assumption of independent

189: channels is valid for any network that controls interference by means

190: of a reservation-based medium-access control protocol (e.g., TDMA).

191: This option seems perfectly reasonable for sensor networking scenarios

192: in which sensors collect data over extended periods of time, and must

193: then transmit their accumulated measurements simultaneously.  In this

194: case, a key assumption in the design of standard random access techniques

195: for multiaccess communication breaks down---the fact that individual

196: nodes will transmit with low probability~\cite[Chapter~4]{BertsekasG:92}.

197: As a result, classical random access would result in too many collisions

198: and hence low throughput.  Alternatively, instead of {\em mitigating}

199: interference, a medium access control (MAC) protocol could attempt to

200: {\em exploit} it, in the form of using cooperation among nodes to generate

201: waveforms that add up constructively at the receiver (cf.~\cite{HuS:03c,

202: HuS:03b, HuS:05}).  Providing an information-theoretic analysis of such

203: cooperation mechanisms would be very desirable, but since it entails

204: dealing with correlated sources and a general multiple access channel

205: (a problem that remains open), the simpler setting of correlated sources and

206: an array of independent channels is a reasonable first step towards that goal.  It is also

207: interesting in its own right, since it provides the ultimate performance

208: limits for an important class of sensor networking problems.

209:

210: \subsubsection{Perfect Reconstruction at the Receiver}

211:

212: In our formulation of the sensor reachback problem, the far receiver

213: is interested in reconstructing the entire field of sensor measurements

214: with arbitrarily small probability of error.  This formulation leads

215: us to a natural {\em capacity} problem, in the classical sense of

216: Shannon.  Alternatively, one could relax the condition of perfect

217: reconstruction, and tolerate some distortion in the reconstruction

218: of the field of measurements at the far receiver, thus leading to

219: the so called {\em Multiterminal Source Coding} problem studied by

220: Berger~\cite{Berger:78}.  This condition could be further relaxed,

221: to require a faithful reproduction of the {\em image} of some function

222: $f$ of the sources, leading to a problem studied extensively by

223: Csisz\'ar, K\"orner and Marton~\cite{CsiszarK:80, KoernerM:79}.

224:

225: \subsection{An Information Theoretic View of Architectural Issues}

226:

227: For large-scale, complex systems of the type of interest in this work,

228: the complexity of basic questions of design and performance analysis

229: appears daunting:

230: \begin{itemize}

231: \item How should nodes cooperate to relay messages to the data collector

232:   node $v_0$?  Should they decode received messages, re-encode them, and

233:   forward to other nodes?  Should they map channel outputs to channel

234:   inputs without attempting to decode?  Should they do something else?

235: \item How should redundancy among the sources be exploited?  Should we

236:   compress the information as much as possible?  Should we leave some

237:   of that redundancy to combat noise in the channels?  Is there a

238:   source/channel separation theorem in these networks?

239: \item How do we measure the performance of these networks, and what are appropriate

240:   cost metrics?  How do we design networks that are efficient under an

241:   appropriate cost metric?

242: \end{itemize}

243: In~\cite{KawadiaK:04}, a number of examples are identified in which

244: the existence of a simple architecture has played an enabling role in

245: the proliferation of technology: the von Neumann computer architecture,

246: separation of source and channel coding in communications, separation

247: of plant and controller in control systems, and the OSI layered

248: architecture model.  So what all these questions boil down to is

249: an issue similar to those considered in~\cite{KawadiaK:04}: what are

250: appropriate abstractions of the network, similar to the IP protocol

251: stack for the Internet, based on which we can break the design task

252: into independent reusable components, optimize the design of these

253: components, and obtain an {\em efficient} system as a result?  In

254: this work, we show how information theory is indeed capable of

255: providing very meaningful answers to this problem.

256:

257: Information theory, in one of its applications, deals with the analysis

258: of performance of communication systems.  So, to some it may seem the

259: natural theory to turn to for guidance in the task of searching for a

260: suitable network architecture.  However, to others it may seem unnatural

261: to do so: it is well known that information theory and communication

262: networks have not had fruitful interactions in the past, as explained

263: by Ephremides and Hajek~\cite{EphremidesH:98}.  Thus, in the presence

264: of these mixed indicators, we take the stance that information

265: theory indeed has a great deal to offer in the task at hand.  To justify

266: our position, consider Shannon's model for a communications system, as

267: illustrated in Fig.~\ref{fig:shannon-pt2pt}.

268:

269: \begin{figure}[!ht]

270: \centerline{\psfig{file=shannon-model.eps,height=5cm,width=14cm}}

271: \caption{\small Shannon's model for a point-to-point system.  Top

272:   figure: abstract view, consisting of a source, an

273:   encoder from source symbols to channel symbols, a conditional

274:   probability distribution to model the random dependence of outputs

275:   on inputs, and a decoder to map from received messages back to

276:   source symbols; bottom figure: a capacity-achieving architecture

277:   for this system, in which error control codes are used to create

278:   an illusion of a noiseless bit pipe.}

279: \label{fig:shannon-pt2pt}

280: \end{figure}

281:

282: For this setup, Shannon established that reliable communication of

283: a source over a noisy channel is possible if and only if the entropy

284: rate of the source is less than the capacity of the

285: channel~\cite[Ch.\ 8.13]{CoverT:91}.  This result, known as the

286: source/channel separation theorem, has a double significance.  On

287: one hand, it provides an exact single-letter characterization of

288: conditions under which reliable communication is possible.  On the

289: other hand, and of particular interest to the task at hand for

290: us, it is a statement about the {\em architecture} of an optimal

291: communication system: the encoder/decoder design task can be split

292: into the design and optimization of two independent components.

293: Inspired by Shannon's teachings for point-to-point

294: systems, in this work we ask, and answer in the affirmative,

295: whether it is possible to derive similarly

296: useful architectural guidelines for the class of networks under

297: consideration.

298:

299: \subsection{Related Work}

300: \label{sec:related-work}

301:

302: The problem of communicating distributed correlated sources over a

303: network of point-to-point links is closely related to several classical

304: problems in network information theory.  To set the stage for the main

305: contributions of this paper, we now review related previous work.

306:

307: \subsubsection{Distributed Correlated Sources and Multiple Access}

308:

309: The concept of separate encoding of correlated sources was studied by

310: Slepian and Wolf in their seminal paper~\cite{SlepianW:73b},

311: where they proved that two correlated sources $(U_1U_2)$ drawn i.i.d.\

312: $\sim p(u_1u_2)$ can be compressed at rates $(R_1,R_2)$ if and only if

313: \begin{eqnarray*}

314: R_1 & \geq & H(U_1|U_2) \\

315: R_2 & \geq & H(U_2|U_1) \\

316: R_1+R_2 & \geq & H(U_1U_2).

317: \end{eqnarray*}
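
As a purely illustrative instance (the specific source below is an assumption
made here for concreteness, not an example drawn from the results of this
paper), let $U_1$ be a uniform binary source and let $U_2 = U_1 \oplus Z$,
where $Z$ is a Bernoulli($0.11$) variable independent of $U_1$.  Writing
$h(p) = -p\log_2 p - (1-p)\log_2(1-p)$ for the binary entropy function, we
have $h(0.11)\approx 0.5$ bits, and the conditions above become
\begin{eqnarray*}
R_1 & \geq & H(U_1|U_2) \;=\; h(0.11) \;\approx\; 0.5 \\
R_2 & \geq & H(U_2|U_1) \;=\; h(0.11) \;\approx\; 0.5 \\
R_1+R_2 & \geq & H(U_1U_2) \;=\; 1 + h(0.11) \;\approx\; 1.5.
\end{eqnarray*}
In this instance, about $1.5$ bits per pair of source symbols suffice,
compared with the $2$ bits needed when the correlation between the two
sources is ignored.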

318:

319: Assume now that $(U_1U_2)$ are to be transmitted with arbitrarily

320: small probability of error to a joint receiver over a multiple access

321: channel with transition probability $p(y|x_1x_2)$.  Knowing that the

322: capacity region of the multiple access channel with independent

323: sources is given by the convex hull of the set of points $(R_1,R_2)$

324: satisfying~\cite[Ch.\ 14.3]{CoverT:91}

325: \begin{eqnarray*}

326: R_1&<&I(X_1;Y|X_2) \\

327: R_2&<&I(X_2;Y|X_1) \\

328: R_1+R_2&<&I(X_1X_2;Y),

329: \end{eqnarray*}

330: it is not difficult to prove that Slepian-Wolf source coding of

331: $(U_1U_2)$ followed by separate channel coding yields the following

332: {\em sufficient} conditions for reliable communication

333: \begin{eqnarray*}

334: H(U_1|U_2) & < & I(X_1;Y|X_2) \\

335: H(U_2|U_1) & < & I(X_2;Y|X_1) \\

336: H(U_1U_2) & < & I(X_1X_2;Y).

337: \end{eqnarray*}

338: These conditions, which basically state that the Slepian-Wolf region

339: and the capacity region of the multiple access channel have a non-empty

340: intersection, are sufficient but not necessary for reliable communication,

341: as shown by Cover, El Gamal, and Salehi with a simple counterexample

342: in~\cite{CoverGS:80}.  In that same paper, the authors introduce a class

343: of {\it correlated} joint source/channel codes, which enables them to

344: increase the region of achievable rates to

345: \begin{eqnarray}

346: H(U_1|U_2)&<&I(X_1;Y|X_2U_2) \label{eq:cegs1}\\

347: H(U_2|U_1)&<&I(X_2;Y|X_1U_1) \label{eq:cegs2}\\

348: H(U_1U_2)&<&I(X_1X_2;Y), \label{eq:cegs3}

349: \end{eqnarray}

350: for some

351: $p(u_1u_2x_1x_2y)=p(u_1u_2)\cdot p(x_1|u_1)\cdot p(x_2|u_2)\cdot p(y|x_1x_2)$.

352: Also in~\cite{CoverGS:80}, the authors generalize this set of sufficient

353: conditions to sources $(U_1U_2)$ with a common part $W=f(U_1)=g(U_2)$,

354: but they were not able to prove a converse, i.e., they were not able

355: to show that their region is indeed the capacity region of the multiple

356: access channel with correlated sources.  Later, it was shown with a

357: carefully constructed example by Dueck in~\cite{Dueck:81} that indeed

358: the region defined by eqns.~(\ref{eq:cegs1})--(\ref{eq:cegs3}) is not

359: tight.  Related problems were considered by Slepian and

360: Wolf~\cite{SlepianW:73}, and Ahlswede and Han~\cite{AhlswedeH:83}.

361: To this date however, the general problem still

362: remains open.

363:

364: Assuming independent sources, Willems investigated a cooperative

365: scenario, in which encoders exchange messages over {\em conference}

366: links of limited capacity prior to transmission over the multiple

367: access channel~\cite{Willems:83}.  In this case, the capacity region

368: is given by

369: \begin{eqnarray*}

370: R_1&<&I(X_1;Y|X_2Z)+C_{12} \\

371: R_2&<&I(X_2;Y|X_1Z)+C_{21} \\

372: R_1+R_2&<&\min\{\;I(X_1X_2;Y|Z)+C_{12}+C_{21},\;I(X_1X_2;Y)\;\},

373: \end{eqnarray*}

374: for some auxiliary random variable $Z$ such that

375: $|\mathcal{Z}|\leq\min(|\mathcal{X}_1|\cdot|\mathcal{X}_2|+2,|\mathcal{Y}|+3)$,

376: and for a joint distribution $p(zx_1x_2y)

377: = p(z)\cdot p(x_1|z)\cdot p(x_2|z)\cdot p(y|x_1x_2)$.

378:

379: \subsubsection{Correlated Sources and Networks of DMCs}

380:

381: Very recently, an early paper was brought to our attention, in which

382: Han considers the transmission of correlated sources to a common sink

383: over a network of independent channels~\cite{Han:80}.  Although the

384: problem setup is less general than ours, in that (a) each source block

385: and each transmitted codeword participate only once in the encoding

386: process, and (b) the intermediate nodes are assumed to decode the

387: data before passing it on, Theorem 3.1 of~\cite{Han:80} is very similar

388: to our Theorem~\ref{thm:main}.

389:

390: Our work, done independently of Han's, differs from it and complements

391: it in the following ways:

392: \begin{itemize}

393: \item Our setup is more general.  We allow for arbitrary forms of joint

394:   source-channel coding to take place inside the network while data flows

395:   towards the decoder, and then {\em prove} that a one-step encoding

396:   process, pure routing, and separate source/channel coding are sufficient.

397:   Han assumes decode-and-forward in his problem statement, as well as

398:   a one-step encoding process.

399: \item The proof techniques are different.  Han takes a purely combinatorial

400:   approach to the problem: he thoroughly exploits the polymatroidal

401:   structure of the capacity function for the network of channels, and the

402:   co-polymatroidal structure for the Slepian-Wolf region.  We establish our

403:   achievability result by explicitly constructing a routing algorithm for

404:   the Slepian-Wolf indices, and our converse by standard methods based on

405:   Fano's inequality.

406: \end{itemize}

407: Furthermore our work, being motivated by a concrete sensor networking

408: application, establishes connections and relevance to practical engineering

409: problems (see Section~\ref{sec:protocol-stack}) that are not a concern

410: in~\cite{Han:80}.

411:

412: \subsubsection{Network Coding}

413:

414: Another closely related problem is the well known {\em network coding}

415: problem, introduced by Ahlswede, Cai, Li and Yeung~\cite{AhlswedeCLY:00}.

416: In that work, the authors establish the need for applying coding

417: operations at intermediate nodes to achieve the max-flow/min-cut bound

418: of a general multicast network.  A converse proof for this problem

419: was provided by Borade~\cite{Borade:02}.  Linear codes were proposed

420: by Li, Yeung and Cai in~\cite{LiYC:03}, and Koetter and M\'edard

421: in~\cite{KoetterM:03}.

422:

423: Effros, M\'edard et al.\ have developed a comprehensive study on separate

424: and joint design of linear source, channel and network codes for networks

425: with correlated sources under the assumption that all operations are

426: defined over a common finite field~\cite{EffrosMHRKK:03}.  For this

427: particular case, optimality of separate linear source and channel coding

428: was observed in the one-receiver instance, but the result

429: of~\cite{EffrosMHRKK:03} does not prove that it holds for general networks

430: and channels with arbitrary input and output alphabets.  Error exponents

431: for multicasting of correlated sources over a network of noiseless channels

432: were given by Ho, M\'edard et al.\ in~\cite{HoMEK:04}, and networks with

433: undirected links were considered by Li and Li in~\cite{LiL:04}.

434:

435: Another problem in which network flow techniques have been found useful

436: is that of finding the maximum stable throughput in certain networks.  In

437: this problem, posed by Gupta and Kumar in~\cite{GuptaK:00}, it is sought

438: to determine the maximum rate at which nodes can inject bits into a

439: network, while keeping the system stable.  This problem was reformulated

440: by Peraki and Servetto as a multicommodity flow problem, for which tight

441: bounds were obtained using elementary counting

442: techniques~\cite{PerakiS:03, PerakiS:04}.

443:

444: \subsection{Main Contributions and Organization of the Paper}

445:

446: Our main original contributions can be summarized as follows:

447: \begin{itemize}

448: \item A general coding theorem yielding necessary and sufficient

449:   conditions for reliable communication of $M+1$ correlated sources

450:   to a common sink over a network of independent DMCs.

451: \item An achievability proof which combines classical coding arguments

452:   with network flow methods and a converse proof that establishes

453:   the optimality of separate source and channel coding.

454: \item A detailed discussion on the engineering implications of our

455:   main result, and the concepts of information-theoretically optimal

456:   network architectures and protocol stacks.

457: \end{itemize}

458:

459: The rest of the paper is organized as follows.  In

460: Section~\ref{sec:coding-theorems} we give formal definitions, to then

461: state and prove our main theorem.  We also look at three special cases:

462: a network with three nodes, the non-cooperative case, and an array of

463: orthogonal Gaussian channels.  In Section~\ref{sec:protocol-stack} we

464: address the practical implications of our main result, by describing

465: an information-theoretically optimal protocol stack, elaborating on the

466: tractability of related network architecture and network optimization

467: problems, and discussing the suboptimality of correlated codes for

468: orthogonal channels.  The paper concludes with

469: Section~\ref{sec:conclusions}.

470:

471:

472: \section{A Coding Theorem for Network Information Flow with Correlated

473:   Sources}

474: \label{sec:coding-theorems}

475:

476: \subsection{Formal Definitions and Statement of the Main Theorem}

477:

478: A {\em network} is modeled as the complete graph on $M+1$ nodes.

479: For each $(v_i,v_j)\in E$ ($0\leq i,j\leq M$), there is a discrete

480: memoryless channel $(\mathcal{X}_{ij},p_{ij}(y|x),\mathcal{Y}_{ij})$,

481: with capacity $C_{ij} = \max_{p_{ij}(x)} I(X_{ij};Y_{ij})$.\footnote{Note

482: that $C_{ij}$ could potentially be zero, thus assuming a complete graph

483: does not mean necessarily that any node can send messages to any other

484: node in one hop.}

485: At each node $v_i\in V$, a random variable $U_i$ is observed

486: ($i=0...M$), drawn i.i.d.\ from a known joint distribution

487: $p(U_0U_1...U_M)$.  Node $v_0$

488: is the {\em decoder} -- the goal in this problem is to

489: find conditions under which $U_1...U_M$ can be reproduced reliably at

490: $v_0$.  We now make this statement more precise, by describing how the

491: nodes communicate and by giving the formal definitions of code,

492: probability of error and reliable communication.

493:

494: Time is discrete.  Every $N$ time steps, node $v_i$ collects a block

495: $U_i^N$ of source symbols -- we refer to the collection of all blocks

496: $[U_0^N(k)U_1^N(k)...U_M^N(k)]$ collected at time $kN$ ($k\geq 1$) as

497: a {\em block of snapshots}.  Node $v_i$ then sends a codeword

498: ${X}_{ij}^N$ to node $v_j$.  This codeword depends on a {\em window}

499: of $K$ previous blocks of source sequences $U_i^N$ observed at node

500: $v_i$, and of $T$ previously received blocks of channel outputs,

501: corresponding to noisy versions of the codewords sent by all nodes to

502: node $v_i$ in the previous $T$ communications steps (corresponding to

503: $NT$ time steps).

504:

505: For a block of snapshots observed at time $kN$, at time $(k+W)N$ (that

506: is, after allowing for a finite but otherwise arbitrary amount of time

507: to elapse,\footnote{During the time that a block of snapshots spends

508: within the network, arbitrarily complex coding operations are allowed

509: within the pipeline: nodes can exchange information, redistribute their

510: load, and in general perform any form of joint source-channel coding

511: operations.  The only constraint imposed is that all information

512: eventually be delivered to destination, within a finite time horizon.}

513: in which the information injected by all nodes

514: reaches $v_0$), an attempt is made to decode at $v_0$.  The decoder produces

515: an estimate of the block of snapshots $U_0^N(k)U_1^N(k)...U_M^N(k)$ based on

516: the local observations $U_0^N(k)$, and the previous $W$ blocks of $N$

517: channel outputs generated by codewords sent to $v_0$ by the other nodes.

518:

519: Thus, a {\em code} for this network consists of:

520: \begin{itemize}

521: \item four integers $N$, $K$, $T$ and $W$;

522: \item encoding functions at each node

523:   \[ g_{ij}:\bigotimes_{l=1}^K\mathcal{U}_i^N \times

524:             \bigotimes_{t=1}^T\bigotimes_{m=0}^M \mathcal{Y}_{mi}^N

525:             \longrightarrow \mathcal{X}_{ij}^N,

526:   \]

527:   for $0 \leq i, j \leq M$.

528: \item the decoding function at node $v_0$:

529:   \[ h: \mathcal{U}_0^N \times

530:         \bigotimes_{w=1}^W\bigotimes_{m=1}^M \mathcal{Y}_{m0}^N

531:         \longrightarrow \bigotimes_{m=1}^M \hat{\mathcal{U}}_m^N.

532:   \]

533: \item the block probability of error:

534:   \[ P_e^{(N)} = P(U_1^N...U_M^N\neq\hat{U}_1^N...\hat{U}_M^N). \]

535: \end{itemize}

536:

537: We say that blocks of snapshots $U_1^N...U_M^N$ can be

538: {\em reliably communicated} to $v_0$ if there exists a sequence of

539: codes as above, with $P_e^{(N)}\to 0$ as $N\to\infty$, for some finite

540: values $K$, $T$ and $W$, all independent of $N$.

541:

542: With these definitions, we are now ready to state our main theorem.

543:

544: \begin{theorem}

545: Let $S$ denote a non-empty subset of node indices that does not contain

546: node $0$: $S \subseteq \{0...M\}$, $S\neq\emptyset$, $0\in S^c$.  Then,

547: it is possible to communicate $U_1...U_M$ reliably to $v_0$ if and

548: only if, for all $S$ as above,

549: \begin{equation}

550:  H(U_S|U_{S^c}) \;\;<\;\; \sum_{i\in S,j\in S^c} C_{ij}.

551:   \label{eq:main}

552: \end{equation}

553: \label{thm:main}

554: \end{theorem}
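
To make the statement concrete, consider the smallest non-trivial instance,
$M=2$; this is a sketch offered only to illustrate how
condition~(\ref{eq:main}) is read.  The admissible subsets are $S=\{1\}$,
$S=\{2\}$ and $S=\{1,2\}$, and~(\ref{eq:main}) specializes to
\begin{eqnarray*}
H(U_1|U_0U_2) & < & C_{10}+C_{12} \\
H(U_2|U_0U_1) & < & C_{20}+C_{21} \\
H(U_1U_2|U_0) & < & C_{10}+C_{20}.
\end{eqnarray*}
Each right-hand side is the total capacity of the links leaving $S$.  Links
internal to $S$, and links directed from $S^c$ into $S$ (here $C_{01}$ and
$C_{02}$), do not appear.  For general $M$ there is one such condition for
each of the $2^M-1$ non-empty subsets of $\{1...M\}$.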

555:

556: \subsection{Achievability Proof}

557:

558: Our coding strategy is based on separate source and channel coding.

559: We first use capacity-attaining channel codes to turn the noisy network

560: into a network of noiseless links (of capacity $C_{ij}$).  Then, we

561: use Slepian-Wolf source codes, jointly with a custom-designed routing

562: algorithm, to deliver all this data to destination.  Since the channel

563: coding aspects of the proof are rather straightforward extensions of

564: classical point-to-point arguments, in the following we only focus on

565: the less obvious source coding and routing aspects.
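
In symbols, the bookkeeping behind this strategy can be sketched as follows,
under the assumption (made here for illustration) that node $v_i$,
$i=1...M$, compresses its observations at a rate of $R_i$ bits per source
symbol.  For every non-empty $S\subseteq\{1...M\}$, with
$S^c=\{0...M\}\setminus S$, Slepian-Wolf decoding at $v_0$ with side
information $U_0$ requires
\[
  \sum_{i\in S} R_i \;\;>\;\; H(U_S|U_{S^c}),
\]
whereas routing the resulting bits to $v_0$ over noiseless links of
capacity $C_{ij}$ requires, by the max-flow/min-cut theorem,
\[
  \sum_{i\in S} R_i \;\;\leq\;\; \sum_{i\in S,j\in S^c} C_{ij}.
\]
Chaining the two inequalities shows that any rate allocation meeting both
requirements forces condition~(\ref{eq:main}).  In this reading, the
substance of the achievability proof is the opposite direction:
that~(\ref{eq:main}) guarantees the existence of such an allocation.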

566:

567: \subsubsection{Mechanics of the Coding Strategy}

568:

569: Consider a ``noise-free'' version of the problem formulated above: we

570: still have a complete graph, now with {\em noiseless} links of capacity

571: $C_{ij}$.  Variables $U_i$ are still observed at each node $v_i$, and

572: the goal remains to reproduce all of these at $v_0$.  Each node uses

573: a classical Slepian-Wolf code: there is a source encoder at node $v_i$

574: that maps a sequence $U_i^N$ to an index from the random binning set

575: $\{1,2,\dots,2^{NR_i}\}$, thus compressing the block of observations

576: $U_i^N$ using codes as in~\cite[Thm.\ 14.4.2]{CoverT:91}.  Let

577: $(R_1...R_M)$ denote the rate allocation to each of the nodes.  To

578: achieve perfect reconstruction, these bits must be delivered to node

579: $v_0$.

580:

581: \begin{itemize}

582: \item Set $K=T=1$ -- each block of source symbols and each block of

583:   codewords participates in the encoding process only once.

584: \item To deliver the bin indices produced by the Slepian-Wolf codes

585:   to the destination, the noise-free network is regarded as a flow

586:   network~\cite[Ch.\ 26]{CormenLRS:01}.

587:   Let $\flow(v_i,v_j)$ be a feasible flow in this network, with $M$ sources

588:   $v_1...v_M$, supply $R_i$ at source $v_i$, and a single sink $v_0$.

589:   If no such feasible flow exists, the code construction fails.

590: \item If there is a feasible flow $\flow$ then this $\flow$ uniquely

591:   determines, at each node $v_i$, the number of bits that need to be

592:   sent to each of its neighbors -- thus from $\flow$ we derive the

593:   encoding functions $g_{ij}$ as follows:

594:   \begin{itemize}

595:   \item Consider the directed {\em acyclic} graph $G'$ of $G$ induced by

596:     $\flow$, by taking $V(G') = V(G)$, and

597:     $E(G')=\{(v_i,v_j)\in E:\flow(v_i,v_j)>0\}$.  Define a permutation

598:     $\pi:\{0...M\}\to\{0...M\}$, such that

599:     $[v_{\pi(0)}v_{\pi(1)}...v_{\pi(M)}]$ is a {\em topological sort} of

600:     the nodes in $G'$, as illustrated in Fig.~\ref{fig:topological-sort}.

601:     \begin{figure}[ht]

602:     \centerline{\psfig{file=topological-sort.eps,width=16cm,height=2.5cm}}

603:     \vspace{-5mm}

604:     \caption{\small A topological sort of the nodes of a directed acyclic

605:       graph is a linear ordering $v_1...v_M$ such that if $(v_i,v_j)$ is

606:       an edge, then $i<j$.}

607:     \label{fig:topological-sort}

608:     \end{figure}

609:   \item Consider a block of snapshots

610:     ${\bf U}(k)=[U_0^N(k)U_1^N(k)...U_M^N(k)]$ captured at time $kN$.

611:     At time $(k+l)N$ (for $l=0...M$), node $v_{\pi(l)}$ will have received

612:     all bits with portions of the encodings of ${\bf U}(k)$ generated by

613:     nodes upstream in the topological order -- thus, together with its own

614:     encoding of $U^N_{\pi(l)}(k)$, all the bits for ${\bf U}(k)$ up

615:     to and including node $v_{\pi(l)}$ will be available there, and thus

616:     can be routed to nodes downstream in the topological order.

617:   \item Consider now all edges of the form $(v_{\pi(k)},v')$ for which

618:     $\flow(v_{\pi(k)},v') > 0$:

619:     \begin{enumerate}

620:     \item Collect the $m=\sum_{v'} \flow(v',v_{\pi(k)})$ information bits

621:       sent by the upstream nodes $v'$.

622:     \item Consider now the set of all downstream nodes $v''$, for which

623:       $\flow(v_{\pi(k)},v'') > 0$.  Due to flow conservation for $\flow$,

624:       $\sum_{v''} \flow(v_{\pi(k)},v'')=m+R_{\pi(k)}$, where

625:       $R_{\pi(k)}$ is the rate allocated to node $v_{\pi(k)}$.

626:     \item For each $v''$ as above, define $g_{\pi(k)v''}^{(k)}$ to be a

627:       message such that $|g_{\pi(k)v''}^{(k)}| = \flow(v_{\pi(k)},v'')$.

628:       Partition the $m+R_{\pi(k)}$ available bits according to the values

629:       of $\flow$, and send them downstream, as illustrated in

630:       Fig.~\ref{fig:shuffle}.

631:       \begin{figure}[!ht]

632:       \centerline{\psfig{file=shuffle.eps,height=2.5cm,width=12cm}}

633:       \vspace{-3mm}

634:       \caption{\small To illustrate the operations performed at each node.

635:         In this example, five bits come into node $v_{\pi(k)}$ from

636:         neighbouring nodes, two on the top link and three on the bottom

637:         link.  The information bits from other nodes come in the form of

638:         noisy codewords -- they need to be decoded from the received channel

639:         outputs.  Now, because flow conservation holds for $\flow$, we

640:         know that the aggregate capacity of the three output links will

641:         be at least five bits plus some local bits (the encoding of a

642:         block of local observations $U^N_{\pi(k)}$, denoted by $b_6$ and

643:         $b_7$ here).  So at this point we split those bits in a way such

644:         that the individual capacity constraints of the output links are

645:         not violated, and then they are sent on their way to $v_0$.}

646:       \label{fig:shuffle}

647:       \end{figure}

648:     \end{enumerate}

649:   \end{itemize}

650: \item To decode, at time $(k+M)N$, node $v_0$ does the following:

651:   \begin{itemize}

652:   \item Decode all channel outputs received at time $(k+M-1)N$, to recover

653:     the bits sent by each 1-hop neighbor of the sink.

654:   \item Reassemble the set of bin indices from the segments received

655:     from each neighbor.

656:   \item Perform typical set decoding (as in~\cite[pg.\ 411]{CoverT:91}),

657:     to recover the block of snapshots $[U_1^N(k)...U_M^N(k)]$.

658:   \end{itemize}

659: \end{itemize}
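
As a hypothetical illustration of the partition step (the specific flow
values are assumed here for concreteness and are not taken from
Fig.~\ref{fig:shuffle}), suppose node $v_{\pi(k)}$ receives $m=5$ bits
$b_1...b_5$ from upstream, generates $R_{\pi(k)}=2$ local bits $b_6,b_7$,
and has three downstream neighbors $v''_1,v''_2,v''_3$ with
\[
  \flow(v_{\pi(k)},v''_1)=3, \qquad
  \flow(v_{\pi(k)},v''_2)=2, \qquad
  \flow(v_{\pi(k)},v''_3)=2,
\]
so that flow conservation holds: $3+2+2 = 7 = m+R_{\pi(k)}$.  One valid
partition is then
\[
  g_{\pi(k)v''_1}^{(k)}=(b_1,b_2,b_3), \qquad
  g_{\pi(k)v''_2}^{(k)}=(b_4,b_5), \qquad
  g_{\pi(k)v''_3}^{(k)}=(b_6,b_7);
\]
any other assignment with the same message lengths would serve equally
well, provided the convention is fixed in advance and known to all nodes.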

660: An important observation is that, in this setup, network coding (in the

661: sense of~\cite{AhlswedeCLY:00}) is not needed.  This is because we have

662: a case of $M$ sources and a single sink interested in collecting all

663: messages, a case for which it was shown in~\cite{LehmanL:04} that routing

664: alone suffices.

665:

666: Our next task is to find conditions under which this coding strategy

667: results in $P_e^{(N)}\to 0$ as $N\to\infty$.

668:

669: \subsubsection{Analysis of the Probability of Error}

670:

671: The coding strategy proposed above hinges on two main elements:

672: \begin{itemize}

673: \item Slepian-Wolf codes: in this case, we know that provided the rate

674:   vector $(R_1...R_M)$ is such that, for all subsets $S$ of $\{0...M\}$,

675:   $S\neq\emptyset$, $0\in S^c$,

676:   \begin{equation}

677:   \sum_{i\in S} R_i > H(U_S|U_{S^c}),

678:   \label{eq:achievability1}

679:   \end{equation}

680:   then there exist Slepian-Wolf codes with arbitrarily low probability of

681:   error~\cite[Ch.\ 14.4]{CoverT:91}.

682: \item Network flows: from elementary flow concepts we know that if a

683:   flow $\flow$ is feasible in a network $G$, then for all

684:   $S\subseteq\{0...M\}$, $S\neq\emptyset$, $0\in S^c$,

685:   \begin{eqnarray}

686:   \sum_{i\in S} R_i

687:      & \stackrel{(a)}{=} & \sum_{i\in S,j\in V} \flow(v_i,v_j) \nonumber \\

688:      & \stackrel{(b)}{=} & \sum_{i\in S,j\in S^c} \flow(v_i,v_j) \nonumber \\

689:      & \stackrel{(c)}{\leq} & \sum_{i\in S,j\in S^c} C_{ij},

690:      \label{eq:achievability2}

691:   \end{eqnarray}

692:   where $(a)$ and $(b)$ follow from the flow conservation properties of a

693:   feasible flow (all the flow injected by the sources has to go somewhere

694:   in the network, and in particular all of it has to go across a network

695:   cut with the destination on the other side); and $(c)$ follows from the

696:   fact that in any flow network, the capacity of any cut is an upper bound

697:   to the value of any flow.

698: \end{itemize}

699: Thus, from~(\ref{eq:achievability1}) and~(\ref{eq:achievability2}), we

700: conclude that if, for all partitions $S$ as above, we have that

701: \begin{equation}

702:   H(U_S|U_{S^c}) < \sum_{i\in S,j\in S^c} C_{ij},

703: \end{equation}

704: then $P_e^{(N)}\to 0$ as $N\to\infty$.

705:

706: \subsection{Converse Proof}

707:

708: The converse proof is fairly long and tedious, but by virtue of being

709: based on Fano's inequality and standard information-theoretic arguments,

710: it is relatively straightforward -- therefore, we omit it here and

711: provide the technical details in Appendix~\ref{app:proof-converse-mcoop}.

712: At this point however, we would like to sketch out an informal argument

713: on why this converse should hold.

714:

715: Consider an arbitrary network partition $S$ of $\{0...M\}$,

716: $S\neq\emptyset$, $0\in S^c$.  For each such partition we define a

717: two-terminal system, with a ``supersource'' that has access to the

718: whole vector of observations $U_1...U_M$, and a ``supersink'' that

719: has access only to $U_{S^c}$.  The supersource and supersink are

720: connected by an array of parallel DMCs: if $i\in S$ and $j\in S^c$,

721: then $(\mathcal{X}_{ij},p_{ij}(y|x),\mathcal{Y}_{ij})$ from the

722: network is one of the channels in the array.  This is illustrated

723: in Fig.~\ref{fig:oracle}.

724:

725: \begin{figure}[!ht]

726: \centerline{\psfig{file=oracle.eps,height=3cm,width=15cm}}

727: \caption{\small An artificial two-terminal system: all sources in $S$

728:   are treated as a supersource, connected to a supersink made of all

729:   the sinks in $S^c$ by an array of DMCs (those going across the cut).

730:   Intuitively, any necessary condition for this system should also be

731:   necessary for our system (although this requires a formal statement

732:   and proof).  The interesting statement thus is to show that the

733:   set of all conditions obtained in this form (by considering all

734:   possible cuts) is also sufficient.}

735: \label{fig:oracle}

736: \end{figure}

737:

738: Clearly, $H(U_S|U_{S^c}) < \sum_{i\in S,j\in S^c} C_{ij}$ is an outer

739: bound for this two-terminal system (follows directly from the source/channel

740: separation theorem,~\cite[Sec.\ 8.13]{CoverT:91}).  And intuitively,

741: it is also clear that any outer bound for this two-terminal system

742: provides necessary conditions for reliable communication to be possible

743: in our network.  Thus, by considering all possible partitions $(S,S^c)$

744: as above, we obtain a set of necessary conditions matching those of the

745: achievability result.\footnote{We thank our Reviewer B, for suggesting

746: this simple and very clear interpretation for the converse.}

747:

748: We would also like to highlight that, because of the correlation between

749: sources, a simple max-flow/min-cut bounding argument as suggested

750: in~\cite[Section 14.10]{CoverT:91} is not sufficient to establish the

751: source-channel separation result we seek -- proving said result requires

752: all the steps of a typical converse.

753:

754: A formal proof for this converse is provided in

755: Appendix~\ref{app:proof-converse-mcoop}.

756:

757: \subsection{Special Cases}

758:

759: \subsubsection{A Network with Three Nodes}

760: \label{sec:three-nodes}

761:

762: To provide an illustration of the meaning of Theorem~\ref{thm:main}, and

763: of the optimality of the flow-based solution, we specialize

764: Theorem~\ref{thm:main} to the case of a network with three nodes.  In

765: this case, those conditions become:

766: \begin{eqnarray}

767: H(U_1|U_2U_0) & < & C_{10} + C_{12} \label{eq:3nodes-1} \\

768: H(U_2|U_1U_0) & < & C_{20} + C_{21} \label{eq:3nodes-2} \\

769: H(U_1U_2|U_0) & < & C_{10} + C_{20} \label{eq:3nodes-3}.

770: \end{eqnarray}

771: A network with three nodes as considered here is illustrated in

772: Fig.~\ref{fig:three-nodes}.

773:

774: \begin{figure}[!ht]

775: \centerline{\psfig{file=three-nodes.eps,height=2cm}}

776: \caption{\small A network with three nodes.}

777: \label{fig:three-nodes}

778: \end{figure}

779:

780: Next, we regard the network in Fig.~\ref{fig:three-nodes} as a

781: {\em flow} network~\cite[Ch.\ 26]{CormenLRS:01}: a flow network with

782: two sources ($v_1$ and $v_2$) and a single sink ($v_0$).  Encodings

783: of $U_1$ injected at source $v_1$ at rate $R_1$, and of $U_2$ injected

784: at $v_2$ at rate $R_2$, are the ``objects'' that flow in this network

785: and are to be delivered to the sink $v_0$.  This is illustrated in

786: Fig.~\ref{fig:three-nodes-flownetwork}.

787: \begin{figure}[ht]

788: \centerline{\psfig{file=three-nodes-flownetwork.eps,height=1.8cm}}

789: \caption{\small A flow network with three nodes, supplies $R_1$ and

790:   $R_2$ at nodes $v_1$ and $v_2$, and a sink $v_0$.}

791: \label{fig:three-nodes-flownetwork}

792: \end{figure}

793:

794: In the simple flow network of Fig.~\ref{fig:three-nodes-flownetwork},

795: any feasible flow $\flow$ must satisfy some {\em conservation} equations:

796: \[\begin{array}{rclcl}

797:   R_1 & = & \flow(v_1,v_0)+\flow(v_1,v_2), \\

798:   R_2 & = & \flow(v_2,v_0)+\flow(v_2,v_1), \\

799:   R_1+R_2 & = & \flow(v_1,v_0)+\flow(v_1,v_2)+\flow(v_2,v_0)+\flow(v_2,v_1)

800:           & = & \flow(v_1,v_0)+\flow(v_2,v_0),

801: \end{array}\]

802: where the last equality follows from the fact that flow conservation

803: holds: the total amount of flow injected ($R_1+R_2$) must equal the total

804: amount of flow received by the sink

805: ($\flow(v_1,v_0)+\flow(v_2,v_0)$)~\cite{CormenLRS:01}.  Similarly, any

806: feasible flow must also satisfy all {\em capacity} constraints:

807: \[\begin{array}{rcl}

808:   \flow(v_1,v_0)+\flow(v_1,v_2) & \leq & C_{10}+C_{12}, \\

809:   \flow(v_2,v_0)+\flow(v_2,v_1) & \leq & C_{20}+C_{21}, \\

810:   \flow(v_1,v_0)+\flow(v_2,v_0) & \leq & C_{10}+C_{20}.

811: \end{array}\]

812: Combining these last two sets of constraints, and the conditions from

813: the Slepian-Wolf theorem on feasible $(R_1,R_2)$ pairs, we immediately

814: get

815: \[\begin{array}{rcccl}

816:   H(U_1|U_2U_0) & < & R_1 & \leq & C_{10}+C_{12}, \\

817:   H(U_2|U_1U_0) & < & R_2 & \leq & C_{20}+C_{21}, \\

818:   H(U_1U_2|U_0) & < & R_1+R_2 & \leq & C_{10}+C_{20}.

819: \end{array}\]

820:

821: It is interesting to observe in this argument that the region of

822: achievable rates forms a convex polytope, in which three of its

823: faces come from the Slepian-Wolf conditions, and three come from

824: the capacity constraints.  This polytope is illustrated in

825: Fig.~\ref{fig:polytope}.

826: \begin{figure}[ht]

827: \centerline{\psfig{file=polytope.eps,width=12cm,height=4cm}}

828: \caption{The polytope $\mathcal{R}$ of admissible rates.}

829: \label{fig:polytope}

830: \end{figure}

831: This polytope plays a central role in our analysis: reliable

832: communication is possible {\em if and only if} $\mathcal{R}\neq\emptyset$.

833: Thus, the view of ``information as a flow'' in this class of networks

834: is complete.
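
As a numerical sanity check of this view (a hypothetical instance; all
numbers below are assumptions chosen for illustration), let $U_0$ be
constant, let $U_1$ be a uniform binary source, and let $U_2=U_1\oplus Z$
with $Z$ Bernoulli($0.11$) independent of $U_1$, so that, in bits,
\[
  H(U_1|U_2U_0)=H(U_2|U_1U_0)=h(0.11)\approx 0.5, \qquad
  H(U_1U_2|U_0)=1+h(0.11)\approx 1.5,
\]
where $h(\cdot)$ denotes the binary entropy function.  Take capacities
$C_{10}=0.6$, $C_{12}=0.5$, $C_{20}=1.0$ and $C_{21}=0$.  Then
(\ref{eq:3nodes-1})--(\ref{eq:3nodes-3}) read $0.5<1.1$, $0.5<1.0$ and
$1.5<1.6$, so reliable communication is possible.  A matching point is
$(R_1,R_2)=(1.0,0.55)$, routed as
\[
  \flow(v_1,v_0)=0.6, \qquad \flow(v_1,v_2)=0.4, \qquad
  \flow(v_2,v_0)=0.95:
\]
node $v_2$ relays the $0.4$ units received from $v_1$ together with its
own $0.55$, no capacity is exceeded, and the sink receives
$0.6+0.95=1.55=R_1+R_2$.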

835:

836: \subsubsection{No Cooperation and No Side Information at $v_0$}

837:

838: We consider now the special case of $M$ {\em non-}cooperating nodes and

839: one sink, as illustrated in Fig.~\ref{fig:m-nodes}.

840: Necessary and sufficient conditions for reliable communication

841: under this scenario follow naturally from our main theorem by setting

842: $C_{ij}=0$ for all $j\neq 0$, and $|\mathcal{U}_0 | = 1$.

843:

844: \begin{figure}[!ht]

845: \centerline{\psfig{file=m-nodes.eps,width=12cm,height=2.5cm}}

846: \caption{\small $M$ non-cooperating nodes.}

847: \label{fig:m-nodes}

848: \end{figure}

849:

850: \begin{corollary}

851: \label{cor:indep}

852: The sources $U_1, U_2,\dots, U_M$ can be communicated reliably over an

853: array of independent channels of capacity $C_{i0}$, $i=1\dots M$, if and

854: only if

855: \[

856:   H(U_S|U_{S^c})<\sum_{i\in S}C_{i0},

857: \]

858: for all subsets $S\subseteq\{1,2,\dots,M\}$, $S\neq\emptyset$.

859: \end{corollary}

860:

861: An illustration of this corollary for two sources $U_1$ and $U_2$ is

862: shown in Fig.~\ref{fig:SWMAC1}.

863: \begin{figure}[ht]

864: \centerline{\psfig{width=8cm,height=5cm,file=swmac1.eps}

865:             \psfig{width=8cm,height=5cm,file=swmac2.eps}}

866: \caption[Relationship between the Slepian-Wolf region and the capacity

867:   region for two independent channels.]

868:   {Relationship between the Slepian-Wolf region and the capacity

869:   region for two independent channels. In the left figure, as

870:   $H(U_1|U_2)<C_{10}$ and $H(U_2|U_1)<C_{20}$ the two regions intersect

871:   and therefore reliable communication is possible. The figure on the

872:   right shows the case in which $H(U_2|U_1)>C_{20}$ and there is no

873:   intersection between the two regions.}

874: \label{fig:SWMAC1}

875: \end{figure}

876: When we have two independent channels with capacities $C_{10}$ and

877: $C_{20}$, the capacity region becomes a rectangle with side lengths

878: $C_{10}$ and $C_{20}$~\cite[Chapter~14.3]{CoverT:91}.

879: Also shown is the Slepian-Wolf region of achievable rates for separate

880: encoding of correlated sources.

881: Clearly, $H(U_1U_2)<C_{10}+C_{20}$ is a necessary

882: condition for reliable communication as a consequence of Shannon's joint

883: source and channel coding theorem for point-to-point communication.

884: Assuming that this is the case, consider now the following possibilities:

885: \begin{itemize}

886: \item $H(U_1)<C_{10}$ and $H(U_2)<C_{20}$.  The Slepian-Wolf region and the

887:   capacity region

888:   intersect, so any point $(R_1,R_2)$ in this intersection makes reliable

889:   communication possible.  Alternatively, we can argue that reliable

890:   transmission of $U_1$ and $U_2$ is possible even with independent decoders,

891:   therefore a joint decoder will also achieve an error-free reconstruction

892:   of the source.

893: \item $H(U_1)>C_{10}$ and $H(U_2)>C_{20}$.  Since $H(U_1U_2)<C_{10}+C_{20}$

894:   there is always at least one point of intersection between the Slepian-Wolf

895:   region and the capacity region, so reliable communication is possible.

896: \item $H(U_1)<C_{10}$ and $H(U_2)>C_{20}$ (or vice versa).  If

897:   $H(U_2|U_1)<C_{20}$ (or if $H(U_1|U_2)<C_{10}$) then the two regions will

898:   intersect.  On the other hand, if $H(U_2|U_1)>C_{20}$ (or if

899:   $H(U_1|U_2)>C_{10}$), then there are no intersection

900:   points, but it is not immediately clear whether reliable communication

901:   is possible or not (see Fig. \ref{fig:SWMAC1}), since examples are known

902:   in which the intersection between the capacity region of the multiple

903:   access channel and the Slepian-Wolf region of the correlated sources

904:   is empty and still reliable communication is possible~\cite{CoverGS:80}.

905: \end{itemize}

906: Corollary~\ref{cor:indep} gives a definite answer to this last question:

907: in the special case of correlated sources and independent channels an

908: intersection between the capacity region and the Slepian-Wolf rate region

909: is not only a sufficient, but also a necessary condition for reliable

910: communication to be possible---in this case, separation holds.
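
A small hypothetical example (the source and the capacities are
assumptions made for illustration) shows the corollary at work in the
third case above.  Let $U_1$ be a uniform binary source and
$U_2=U_1\oplus Z$, with $Z$ Bernoulli($0.11$) independent of $U_1$, so that
$H(U_1)=H(U_2)=1$, $H(U_1|U_2)=H(U_2|U_1)=h(0.11)\approx 0.5$ and
$H(U_1U_2)\approx 1.5$ bits, where $h(\cdot)$ is the binary entropy
function.  With $C_{10}=1.2$ and $C_{20}=0.6$ we have $H(U_2)>C_{20}$, yet
\[
  H(U_1|U_2)\approx 0.5<1.2, \qquad
  H(U_2|U_1)\approx 0.5<0.6, \qquad
  H(U_1U_2)\approx 1.5<1.8,
\]
so Corollary~\ref{cor:indep} guarantees reliable communication, e.g., at
rates $(R_1,R_2)=(1.05,0.55)$.  If instead $C_{20}=0.4$, then
$H(U_2|U_1)>C_{20}$ and, by the same corollary, reliable communication is
impossible even though $H(U_1U_2)<C_{10}+C_{20}=1.6$.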

911:

912: \subsubsection{Arrays of Gaussian Channels}

913:

914: We should also mention that Theorem~\ref{thm:main} applies to other

915: channel models that are relevant in practice, for instance Gaussian channels

916: with orthogonal multiple access.  For simplicity, we illustrate

917: this issue  in the context of

918: Corollary~\ref{cor:indep}. The capacity region of the Gaussian

919: multiple access channel with $M$ independent sources is given by

920: \[

921:   \sum_{i\in S} R_i

922:     \leq \frac{1}{2}\log\left(1+\frac{\sum_{i\in S}P_i}{\sigma^2}\right),

923: \]

924: for all $S\subseteq\{1...M\}$, $S\neq\emptyset$, and where $\sigma^2$ and

925: $P_i$ are the noise power and the power of the $i$-th user

926: respectively~\cite[pp.\ 378--379]{CoverT:91}.  If we use orthogonal accessing

927: (e.g.~TDMA), and assign different time slots to each of the transmitters,

928: then the Gaussian multiple access channel is reduced to an array of $M$

929: independent single-user Gaussian channels each with capacity

930: \[

931:   C_{i0} =

932:   \tau_{i0}\cdot\frac{1}{2}\log\bigg(1+\frac{P_{i0}}{\sigma^2\tau_{i0}}\bigg),

933:   \qquad 1\le i\le M,

934: \]

935: where $\tau_{i0}$ is the time fraction allocated to source user $i$ to

936: communicate with the data collector node $v_0$, and $P_{i0}$ is the

937: corresponding power allocation.

938:

939: Applying Theorem~\ref{thm:main}, we obtain the reachback capacity

940: of the Gaussian channel with orthogonal accessing.\footnote{The

941: generalization of Theorem~\ref{thm:main} for channels with real-valued

942: output alphabets can be easily obtained using the techniques

943: in~\cite[Sec.\ 9.2 \& Ch.\ 10]{CoverT:91}.}  Then, reliable

944: communication is possible if and only if

945: \[

946:    H(U_S|U_{S^c}) <

947:      \sum_{i\in S}\frac{\tau_{i0}}{2}

948:                   \log\bigg(1+\frac{P_{i0}}{\sigma^2\tau_{i0}}\bigg),

949: \]

950: for all subsets $S\subseteq\{1,2,\dots,M\}$, $S\neq\emptyset$.
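
For a concrete feel for these conditions, consider the following
hypothetical instance (all values are assumptions; logarithms are taken
to base 2).  Let $M=2$, $\tau_{10}=\tau_{20}=\frac12$ and
$P_{10}/\sigma^2=P_{20}/\sigma^2=7.5$.  Then
\[
  C_{i0} = \frac12\cdot\frac12\log_2\bigg(1+\frac{7.5}{1/2}\bigg)
         = \frac14\log_2 16 = 1 \mbox{ bit per channel use}, \qquad i=1,2.
\]
A pair of binary sources with $H(U_1|U_2)=H(U_2|U_1)=0.5$ and
$H(U_1U_2)=1.5$ bits therefore satisfies all three conditions
($0.5<1$, $0.5<1$ and $1.5<2$), and can be communicated reliably.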

951:

952:

953: \section{Practical/Engineering Implications of Theorem~\ref{thm:main}}

954: \label{sec:protocol-stack}

955:

956: \subsection{An Information Theoretically Optimal Protocol Stack}

957:

958: In networks of point-to-point noisy links with one sink, Shannon

959: information has exactly the same properties as classical network

960: flows, and we believe this fact to be of particular {\em practical}

961: relevance.  This is

962: so because there is a rich {\em algorithmic} theory associated with

963: network flows, which allows us to cast standard information-theoretic problems

964: into the language of flows and optimization.  Perhaps most relevant

965: among the consequences is the optimality of implementing codes using a

966: {\em layered} protocol stack, as illustrated in

967: Fig.~\ref{fig:layers-3nodes}.

968:

969: \begin{figure}[!ht]

970: \centerline{\psfig{file=layers-3nodes.eps,width=15cm,height=12cm}}

971: \caption{\small Abstractions that follow from the achievability proof,

972:   illustrated here for three nodes.  At the physical layer there are

973:   nodes with power constraints, a data field of which these nodes collect

974:   samples in space and time, and a gateway node that will deliver all

975:   this data to destination.  On top of this physical substrate, we

976:   construct a sequence of abstractions: noiseless point-to-point links

977:   of a given capacity (the {\em Link Layer}); a flow network (the

978:   {\em Network Layer}); a set of connections (the {\em Transport Layer});

979:   and a set of distributed signal processing algorithms for sampling,

980:   compression and interpolation of the space/time continuous process

981:   (the {\em Presentation Layer}).  In the end, an approximate

982:   representation of the underlying data field is delivered to

983:   applications.}

984: \label{fig:layers-3nodes}

985: \end{figure}

986:

987: As discussed in the Introduction, the decision to turn a wireless

988: network into a network of point-to-point links is an arbitrary one.

989: But, due to complexity and/or economic considerations, this arbitrary

990: decision is one made very often, and thus we believe it is of great

991: practical interest to understand what are appropriate design criteria

992: for such networks.  And our Theorem~\ref{thm:main} offers valuable

993: insights in this regard -- {\em if} we decide to define a link-layer

994: based on a MAC protocol that deals with interference by suppressing

995: it, {\em then all remaining layers in Fig.~\ref{fig:layers-3nodes}

996: follow from the achievability proof of Theorem~\ref{thm:main}.}  We

997: see therefore that indeed, in this class of networks,

998: Fig.~\ref{fig:layers-3nodes} provides a set of abstractions analogous

999: to those of Fig.~\ref{fig:shannon-pt2pt} for classical two-terminal

1000: systems.

1001:

1002: \subsection{Algorithmic/Computational Issues}

1003: \label{sec:algorithmic-issues}

1004:

1005: As an illustration of the benefits of the ``information as flow''

1006: interpretation for our results, in this subsection we outline some

1007: initial results on an optimal routing problem.  This topic however

1008: will be developed in full depth elsewhere.

1009:

1010: \subsubsection{Optimization Aspects of Protocol Design}

1011:

1012: A natural question that follows from our previous developments is

1013: one of {\em optimization}: given a non-empty feasibility polytope

1014: $\mathcal{R}$, we have the freedom of choosing among multiple

1015: assignments of values to flow variables, and thus it is only natural

1016: to ask if there is an optimal flow.  To this end, we define a cost

1017: function $\kappa$ as follows:

1018: \[

1019:   \kappa(\flow) = \sum_{(v_i,v_j)\in E} c(v_i,v_j)\cdot\flow(v_i,v_j),

1020: \]

1021: where $c(v_i,v_j)$ is a constant that, multiplied by the total number

1022: of bits $\flow(v_i,v_j)$ that a flow $\flow$ assigns to an edge

1023: $(v_i,v_j)$, determines the cost of sending all that information over

1024: the channel $(\mathcal{X}_{ij},p_{ij}(y|x),\mathcal{Y}_{ij})$.  The

1025: resulting optimization problem is shown in Fig.~\ref{fig:lp-optrouting}.

1026:

1027: \begin{figure}[ht]

1028: \begin{center}

1029: \fbox{\begin{minipage}{11.7cm}

1030: min \hspace{5mm} $\sum_{(v_i,v_j)\in E}\;\;c(v_i,v_j)\cdot\flow(v_i,v_j)$ \\

1031: $\;$ subject to: \vspace{-1mm}

1032: \[\begin{array}{lll}

1033:  & \mbox{\tiny\sl Standard flow constraints (capacity / skew symmetry / flow conservation)} \\

1034:  & \flow(v_i,v_j) \leq C_{ij}, & 0\leq i,j\leq M. \\

1035:  & \flow(v_i,v_j) = -\flow(v_j,v_i), & 0\leq i,j\leq M. \\

1036:  & \sum_{v\in V} \flow(v_i,v) = 0, & 1\leq i\leq M. \\

1037:  & \mbox{\tiny\sl Rate admissibility constraints} \\

1038:  & H(U_S|U_{S^c}) < \sum_{i\in S} \flow(s,v_i)

1039:                   \leq \sum_{i\in S,j\in S^c} C_{ij},

1040:    & S\subseteq\{1...M\}, S\neq\emptyset. \\

1041:  & \flow(s,v_i) = R_i, & 1\leq i\leq M.

1042: \end{array}\]

1043: \end{minipage}}\end{center}

1044: \vspace{-1mm}

1045: \caption{\small Linear programming formulation for the assignment of

1046:   values to flow variables (observe the introduction of a ``supersource''

1047:   $s$, which supplies $R_i$ units of flow to $v_i$).  A solution to this

1048:   problem provides optimal routes (those with positive flow assignment)

1049:   and loads on each link.  Note as well that, by choosing $c(v_i,v_j)=0$

1050:   for all $(v_i,v_j)\in E$, this LP is solvable if and only if

1051:   $\mathcal{R}\neq\emptyset$ -- that is, the decision problem for reliable

1052:   communication (i.e., for whether a given load $p(U_0U_1...U_M)$ can be

1053:   carried over a given network $G$) admits a linear programming formulation

1054:   too.}

1055: \label{fig:lp-optrouting}

1056: \end{figure}

1057:

1058: The choice of a linear cost model in this setup can be justified based

1059: on a number of reasons.  First of all, linearity is a very natural

1060: assumption: in simple language, it says that it costs twice as much to

1061: double the amount of information sent on any channel.  For example, we

1062: could take $c(v_i,v_j)$ to be the {\em minimum energy per information

1063: bit} required for reliable communication over the DMC from $v_i$ to

1064: $v_j$~\cite{Verdu:02}, and then $\kappa(\flow)$ would give us the sum of

1065: the energy consumed by all nodes when transporting data as dictated

1066: by a particular flow $\flow$.  Specifically in the context of routing

1067: problems, another important consideration is that the main drawback

1068: often cited for solving optimal routing problems based on network flow

1069: formulations is given by the fact that cost functions such as $\kappa$

1070: only optimize {\em average} levels of link traffic, ignoring other

1071: traffic statistics~\cite[pg.\ 436]{BertsekasG:92}.  But this is not

1072: at all an issue here, since the values of flow variables (i.e.,

1073: Shannon information) are already average quantities themselves.

1074:

1075: \subsubsection{A Routing Example}

1076:

1077: As one example of the usefulness of the LP formulation in

1078: Fig.~\ref{fig:lp-optrouting}, we consider next the problem of designing

1079: efficient mechanisms for data aggregation, as motivated

1080: in~\cite{IntanagonwiwatGEHS:03}.  There has been a fair amount of work

1081: reported in the networking literature, on the design and performance

1082: analysis of {\em tree} structures for aggregation---for example, the

1083: work of Goel and Estrin on the construction of trees that perform well

1084: simultaneously under multiple concave costs~\cite{GoelE:03}.  Based on

1085: our LP formulation, we construct two examples which show the extent to

1086: which trees could give rise to suboptimalities, as opposed to other

1087: topological structures.  And we start by showing an example in which,

1088: although $\mathcal{R}\neq\emptyset$, there are no feasible trees.  This

1089: case is illustrated in Fig.~\ref{fig:trees-stink-1}.

1090:

1091: \begin{figure}[ht]

1092: \centerline{\psfig{file=trees1.eps,width=5cm,height=3cm}

1093:             \psfig{file=trees1a.eps,width=5cm,height=3cm}

1094:             \psfig{file=trees1b.eps,width=5cm,height=3cm}}

1095: \vspace{-2mm}

1096: \caption{To illustrate a solvable problem that cannot be solved using trees.

1097:   Left: a flow network; middle/right: the decomposition of a feasible

1098:   flow into two single flows, showing how much of the flow injected at each

1099:   source is sent over which link ($x/c$ next to an edge means that the

1100:   edge carries $x$ units of flow, and has capacity $c$).}

1101: \label{fig:trees-stink-1}

1102: \end{figure}

1103:

1104: As illustrated in Fig.~\ref{fig:trees-stink-1}, a solution to the

1105: transport problem exists.  However, it is easy to check that if we

1106: constrain data to flow along trees, none of the three possible trees

1107: ($\{(v_1,v_0);(v_2,v_0)\}$, or $\{(v_1,v_2);(v_2,v_0)\}$, or

1108: $\{(v_2,v_1);(v_1,v_0)\}$) is feasible: in all cases, there is one

1109: link for which the capacity constraint is violated.

1110:

1111: Next we consider a case where feasible trees exist, but the lowest

1112: cost of any tree differs from the optimal cost by an arbitrarily large

1113: factor.  This case is illustrated in Fig.~\ref{fig:trees-stink-2}.

1114:

1115: \begin{figure}[ht]

1116: \centerline{\psfig{file=trees2.eps,width=5cm,height=3cm}

1117:             \psfig{file=trees2a.eps,width=5cm,height=3cm}

1118:             \psfig{file=trees2b.eps,width=5cm,height=3cm}}

1119: \vspace{-2mm}

1120: \caption{To illustrate a problem in which trees are very expensive.

1121:   Left: a flow network with costs; right: an optimal solution to the

1122:   linear program in Fig.~\ref{fig:lp-optrouting}.  Such a case could

1123:   arise, e.g., in a situation where there is heavy interference in

1124:   the direct path from $v_1$ to $v_0$.}

1125: \label{fig:trees-stink-2}

1126: \end{figure}

1127:

1128: In this case, there exists only one feasible tree:

1129: $\{(v_1,v_0);(v_2,v_0)\}$, with cost $\ell(1+\epsilon)+1$.  However,

1130: because of the ``expensive'' link $(v_1,v_0)$, along which the tree is

1131: forced to send all its data, this cost is significantly inflated:

1132: by splitting the encoding of $U_1$

1133: as illustrated in Fig.~\ref{fig:trees-stink-2}, the cost incurred

1134: would instead be $\epsilon\ell+3$.  Hence, we see that in

1135: this case, the cost of the best feasible tree is

1136: $\frac{\ell(1+\epsilon)+1}{\epsilon\ell+3}$ times larger than that

1137: of an optimal solution allowing splits.  And this

1138: ``overpayment factor'' could be significant: when $\ell$ is large,

1139: this is $\approx 1+\frac 1 \epsilon$, which grows without bound for

1140: small $\epsilon$.
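
To see where this approximation comes from, divide numerator and
denominator by $\ell$:
\[
  \frac{\ell(1+\epsilon)+1}{\epsilon\ell+3}
  = \frac{(1+\epsilon)+1/\ell}{\epsilon+3/\ell}
  \;\longrightarrow\; \frac{1+\epsilon}{\epsilon}
  = 1+\frac{1}{\epsilon}
  \qquad (\ell\to\infty).
\]
As a purely illustrative evaluation (the values are chosen here only as
an example), $\epsilon=0.01$ and $\ell=100$ give $102/4=25.5$, while
$\ell=10^4$ gives $10101/103\approx 98.1$, approaching the limit
$1+1/\epsilon=101$.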

1141:

1142: Note as well that any time that a network is operated close to capacity,

1143: it will be necessary to split flows.  And that is a situation likely to

1144: be encountered often in power-constrained networks, since minimum energy

1145: designs will necessarily result in links being allocated the least amount

1146: of power needed to carry a given traffic load.  Thus, we see that these

1147: examples above are {\em not} pathological cases of limited practical

1148: interest, but instead, they are good representatives of situations likely

1149: to be encountered often in practice.

1150:

1151: \subsection{Suboptimality of Correlated Codes for Orthogonal Channels}

1152:

1153: The key ingredient of the achievability proof presented by Cover,

1154: El Gamal and Salehi for the multiple access channel with correlated

1155: sources is the generation of random codes, whose codewords $X_i^N$ are

1156: statistically dependent on the source sequences $U_i^N$~\cite{CoverGS:80}.

1157: This property, which is achieved by drawing the codewords according to

1158: $\prod_{j=1}^{N}p(x_{ij}|u_{ij})$ with $u_{ij}$ and $x_{ij}$ denoting

1159: the $j$-th element of $U_i^N$ and $X_i^N$, respectively, implies that

1160: $U_i^N$ and $X_i^N$ are jointly typical with high probability.  Since

1161: the source sequences $U_1^N$ and $U_2^N$ are correlated, the codewords

1162: $X_1^N(U_1^N)$ and $X_2^N(U_2^N)$ are also correlated, and so we speak

1163: of {\it correlated codes}.  This class of random codes, which is treated

1164: in more general terms in~\cite{AhlswedeH:83}, can be viewed as joint

1165: source and channel codes that preserve the given correlation structure

1166: of the source sequences, based upon which the decoder can lower the

1167: probability of error.

1168:

1169: The class of correlated codes is of interest to us because of two

1170: main reasons:

1171: \begin{itemize}

1172: \item From a practical point of view, correlated codes have a very

1173:   strong appeal: sensor nodes with limited processing capabilities may

1174:   be forced to use very simple codes that do not eliminate correlations

1175:   between measurements prior to transmission~\cite{BarrosTL:03} (e.g.,

1176:   a simple scalar quantizer and simple BPSK modulation).

1177: \item From a theoretical point of view, since these codes yield the

1178:   largest known admissibility region for the problem of communicating

1179:   distributed sources over multiple-access channels, it would be interesting

1180:   to know how these codes fare in our context, where we know separate

1181:   source and channel coding to achieve optimality.

1182: \end{itemize}

1183: Thus, specializing the achievability proof of~\cite{CoverGS:80} to the

1184: case of $M$ independent channels, we get the following result.

1185:

1186: \begin{corollary}[From Theorem 1 of~\cite{CoverGS:80}]

1187: \label{cor:Machievable}

1188: A set of correlated sources $[U_1U_2\dots U_M]$ can be communicated

1189: reliably over independent channels

1190: $(\mathcal{X}_1,p(y_1|x_1),\mathcal{Y}_1)\dots

1191: (\mathcal{X}_M,p(y_M|x_M),\mathcal{Y}_M)$ to a sink $v_0$, if

1192: \[

1193:    H(U_S|U_{S^c})<\sum_{i\in S} I(X_i;Y_i|U_{S^c}),

1194: \]

1195: for all subsets $S\subseteq\{1,2,\dots,M\}$, $S\neq\emptyset$.

1196: \end{corollary}

1197: \begin{proof}

1198: This result can be obtained from the $M$-source version of the main theorem

1199: in~\cite{CoverGS:80}, by specializing it to a multiple access channel with

1200: conditional probability distribution

1201: \[ p(y|x_1x_2\dots x_M)

1202:      = p(y_1y_2\dots y_M|x_1x_2\dots x_M) = \prod_{i=1}^Mp(y_i|x_i).

1203: \]

1204: \end{proof}

1205:

1206: Part of the reason why we feel this is an interesting result is that the

1207: main theorem in~\cite{CoverGS:80} does {\em not} immediately specialize

1208: to Corollary~\ref{cor:indep}: whereas the achievability results do

1209: coincide,~\cite{CoverGS:80} does not provide a converse.  To illustrate

1210: this point better, we focus now on the case of $M=2$:

1211: \begin{itemize}

1212: \item In general, we have that

1213:   $I(X_1X_2;Y_1Y_2) \leq I(X_1;Y_1)+I(X_2;Y_2)$, for any

1214:   $p(u_1u_2x_1x_2)p(y_1|x_1)p(y_2|x_2)$; but for this upper bound on

1215:   the sum-rate to be achieved, we must take

1216:   $p(u_1u_2x_1x_2) = p(u_1u_2)p(x_1)p(x_2)$ -- that is, the codewords

1217:   must be drawn independently of the source.  And for this special case,

1218:   our Theorem~\ref{thm:main} does provide a converse.

1219: \item As argued earlier, due to practical considerations it may not be

1220:   feasible to remove correlations in the source before choosing channel

1221:   codewords, in which case we face a situation where correlated codes

1222:   are used, despite their obvious suboptimality.  In this case, it is

1223:   of interest to determine the rate losses resulting from the use of

1224:   correlated codes, defined as $\Delta_1 = I(X_1;Y_1)-I(X_1;Y_1|U_2)$,

1225:   $\Delta_2 = I(X_2;Y_2)-I(X_2;Y_2|U_1)$, and

1226:   $\Delta_0 = I(X_1;Y_1)+I(X_2;Y_2)-I(X_1X_2;Y_1Y_2)$.  Straightforward

1227:   manipulations show that $\Delta_1 = I(Y_1;U_2)$, $\Delta_2 = I(Y_2;U_1)$,

1228:   and $\Delta_0 = I(Y_1;Y_2)$.

1229: \item Since $\Delta_i\geq 0$, $i\in\{0,1,2\}$ (mutual information is

1230:   always nonnegative), we conclude that the region of achievable rates

1231:   given by Corollary~\ref{cor:Machievable} is contained in the region

1232:   defined by Corollary~\ref{cor:indep}.  Furthermore, we find that the

1233:   rate loss terms have a simple, intuitive interpretation: $\Delta_0$

1234:   is the loss in sum rate due to the dependencies between the outputs

1235:   of different channels, and $\Delta_1$ (or $\Delta_2$) represents the

1236:   rate loss due to the dependencies between the outputs of channel $1$

1237:   (or $2$) and the source transmitted over channel $2$ (or $1$).  All

1238:   these terms become zero if, instead of using correlated codes, we fix

1239:   $p(x_1)p(x_2)$ and remove the correlation between the source blocks

1240:   before transmission over the channels.

1241: \end{itemize}

1242: At first glance, this observation may seem somewhat surprising, since

1243: the problem addressed by Corollary~\ref{cor:indep} is a special case

1244: of the multiple access channel with correlated sources considered

1245: in~\cite{CoverGS:80}, where it is shown that in the general case

1246: correlated codes outperform the concatenation of Slepian-Wolf codes

1247: (independent codewords) and optimal channel codes.  The crucial

1248: difference between the two problems is the presence (or absence)

1249: of interference in the channel.  Albeit somewhat informally, we can

1250: state that correlated codes are advantageous when the transmitted

1251: codewords are combined in the channel through interference, which

1252: is obviously not the case in our problem.  Practical code constructions

1253: built around this observation have been reported in~\cite{BarrosTL:03}.

1254:
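The ``straightforward manipulations'' behind the rate loss expressions
for $M=2$ can be sketched as follows.  This is only a sketch, and it
assumes that each channel output $Y_i$ depends on $(U_1,U_2,X_1,X_2)$
only through its own input $X_i$, and that the outputs are conditionally
independent given the inputs, as in the product channel used in the proof
of Corollary~\ref{cor:Machievable}.  Under these assumptions,
$H(Y_1|X_1U_2)=H(Y_1|X_1)$ and
$H(Y_1Y_2|X_1X_2)=H(Y_1|X_1)+H(Y_2|X_2)$, so that
\begin{eqnarray*}
\Delta_1
  &=& \big[H(Y_1)-H(Y_1|X_1)\big]-\big[H(Y_1|U_2)-H(Y_1|X_1U_2)\big] \\
  &=& H(Y_1)-H(Y_1|U_2) \;\; = \;\; I(Y_1;U_2), \\
\Delta_0
  &=& \big[H(Y_1)-H(Y_1|X_1)\big]+\big[H(Y_2)-H(Y_2|X_2)\big]
      -\big[H(Y_1Y_2)-H(Y_1|X_1)-H(Y_2|X_2)\big] \\
  &=& H(Y_1)+H(Y_2)-H(Y_1Y_2) \;\; = \;\; I(Y_1;Y_2).
\end{eqnarray*}
The expression for $\Delta_2$ follows from that of $\Delta_1$ by exchanging
the roles of the indices $1$ and $2$.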

1255:

1256: \section{Conclusions}

1257: \label{sec:conclusions}

1258:

1259: \subsection{Summary}

1260:

1261: In this paper we have considered the problem of encoding a set of

1262: distributed correlated sources for delivery to a single data collector

1263: node over a network of DMCs.  For this setup we were able to obtain

1264: single-letter information theoretic conditions that provide an exact

1265: characterization of the admissibility problem.  Two important conclusions

1266: follow from the achievability proof:

1267: \begin{itemize}

1268: \item Separate source/channel coding is optimal in any network with one

1269:   sink in which interference is dealt with at the MAC layer by creating

1270:   independent links among nodes.

1271: \item In such networks, the properties of Shannon information are

1272:   exactly identical to those of water in pipes -- information is a

1273:   flow.

1274: \end{itemize}

1275:

1276: \subsection{Discussion}

1277:

1278: A few interesting observations follow from our results:

1279:

1280: \begin{itemize}

1281: \item It is a well known fact that turning a multiple access channel

1282:   into an array of orthogonal channels by using a suitable MAC protocol

1283:   is a suboptimal strategy in general, in the sense that the set of

1284:   rates that are achievable with orthogonal access is strictly contained

1285:   in the Ahlswede-Liao capacity region~\cite[Ch.\ 14.3]{CoverT:91}.

1286:   However, despite its inherent suboptimality, there are strong economic

1287:   incentives for the deployment of networks based on such technologies,

1288:   related to the low complexity and cost of existing solutions, as well

1289:   as experience in the fabrication and operation of such systems.  As

1290:   a result, most existing standard implementations we are aware of

1291:   (e.g., the IEEE 802.11 and 802.15.* families, or Bluetooth), are

1292:   based on variants of protocols like TDMA/FDMA/CDMA or Aloha, that

1293:   treat interference among users as noise or collisions, and deal with

1294:   it by creating orthogonal links.  We feel therefore that some of the

1295:   interest in our results stems from the fact that they provide a thorough

1296:   analysis for what we deem to be, with high likelihood, the vast majority

1297:   of wireless communication networks to be deployed for the foreseeable

1298:   future.

1299: \item A basic question follows from the results in this paper: when

1300:   exactly does Shannon information act like a classical flow in a network

1301:   setup?  In this paper, we showed that far more often than common wisdom

1302:   would suggest:

1303:   for {\em any} network made up of independent links and one sink,

1304:   Shannon information is a flow.  The assumption of independence among

1305:   channels is crucial, since well known counterexamples hold without

1306:   it~\cite{CoverGS:80}.  But, as argued before, far from being just some

1307:   technical assumption needed for the theory to hold, independent channels

1308:   arise naturally in practical applications.  In establishing the flow

1309:   properties of information, we showed how some well understood network

1310:   flow tools can be applied to address network design problems that

1311:   have traditionally been difficult to deal with using standard tools

1312:   in network information theory, and we illustrated this with a simple

1313:   example involving optimal routing.  In particular we showed that, at

1314:   least from an information theoretic point of view, there is little

1315:   justification for the common practice of designing {\em trees} for

1316:   collecting data picked up by a sensor network, thus opening up

1317:   interesting problems of protocol design.

1318: \item In retrospect, perhaps the results we prove in this paper should

1319:   not have been surprising.  In the context of two-terminal networks, we

1320:   do know the following:

1321:   \begin{itemize}

1322:   \item Feedback does not increase the capacity.  Therefore, the capacity

1323:     of individual links is unaffected by the ability of our codes to

1324:     establish a conference mechanism among nodes.

1325:   \item Compression rates are not reduced by explicit cooperation, as it

1326:     follows from the Slepian-Wolf theorem: the minimum rate required to

1327:     communicate $U_1$ to a decoder that has access to side-information

1328:     $U_0$ is $H(U_1|U_0)$, and knowledge of $U_0$ does not reduce the

1329:     rates needed for coding $U_1$.  Therefore, the amount of information

1330:     that needs to flow through our network is not reduced either by the

1331:     ability of nodes to establish conferences.

1332:   \end{itemize}

1333:   Of course the statements above only hold for individual links, and a

1334:   proof was needed to carry that intuition to the general network setup

1335:   considered in this work.  But those observations we think are the

1336:   key to understanding why our results hold.

1337: \end{itemize}

1338:

1339: \subsection{Future Work}

1340:

1341: After having established coding theorems for the problem of network

1342: information flow with correlated sources, a natural question arises:

1343: what if, in a given scenario, $\mathcal{R}=\emptyset$?  In that case,

1344: the best we can hope for is to reconstruct an {\em approximation} to

1345: the original source message --- and the answer is given by rate-distortion

1346: theory~\cite{Berger:71}.  The rate-distortion formulation of our

1347: problem in the case of non-cooperating encoders is equivalent to the

1348: well known (and still open) {\em Multiterminal Source Coding}

1349: problem~\cite{Berger:78}.  Our current efforts are focused on completing

1350: work on the rate/distortion problem, and on fully developing the ideas

1351: outlined in Section~\ref{sec:algorithmic-issues} (e.g., to deal with

1352: problems of the type considered in~\cite{Chiang:05}).

1353:

1354: \section*{Acknowledgements}

1355:

1356: The authors most gratefully acknowledge discussions with Neri Merhav,

1357: whose insightful comments on an earlier version of this manuscript led

1358: to substantial improvements, as well as the valuable feedback from all

1359: reviewers (and particularly from reviewer B).  They also wish to thank

1360: Toby Berger and Te Sun Han for helpful discussions, and Joachim Hagenauer

1361: for financial support without which they would not have been able to

1362: work together.  The second author is also grateful to Mung Chiang, Eric

1363: Friedman, \'Eva Tardos and Sergio Verd\'u, for useful discussions and

1364: feedback on this work.

1365:

1366:

1367: \appendix

1368:

1369: \subsection{Converse Proof for Theorem~\ref{thm:main}}

1370: \label{app:proof-converse-mcoop}

1371:

1372: \subsubsection{Preliminaries}

1373:

1374: Assume there exists a sequence of codes such that the decoder at $v_0$

1375: is capable of producing a perfect reconstruction of blocks of $N$ snapshots

1376: ${\bf U} = [U_0^NU_1^N...U_M^N]$, with $P_e^{(N)}\to 0$ as $N\to\infty$.

1377: Consider now decoding $L$ blocks of $N$ snapshots (indexed by $l=0...L-1$):

1378: \begin{itemize}

1379: \item The $1$-st block of snapshots ($l=0$) is computed based on

1380:   messages $Y_{i0}^N$ received by $v_0$ from all nodes $v_i$ at

1381:   times $kN$ ($k=0\,...\,W\!-\!1$).

1382: \item The $2$-nd block of snapshots ($l=1$) is computed based on

1383:   messages $Y_{i0}^N$ received by $v_0$ from all nodes $v_i$ at

1384:   times $kN$ ($k=1\,...\,W$).

1385: \item[] $\vdots$

1386: \item The $L$-th block of snapshots ($l=L-1$) is computed based on

1387:   messages $Y_{i0}^N$ received by $v_0$ from all nodes $v_i$ at

1388:   times $kN$ ($k=L\!-\!1\,...\,W\!+\!(L\!-\!2)$).

1389: \end{itemize}

1390: Thus, we regard the network as a {\em pipeline}, in which ``packets''

1391: (i.e., blocks of $N$ source symbols injected by each source) take

1392: $NW$ units of time to flow, and each source gets to inject $L$ packets

1393: total.  We are interested in the behavior of this pipeline in the

1394: regime of large $L$.

1395:
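As a small illustration of this indexing (the values $W=3$ and $L=4$ are
assumptions chosen only for concreteness), the block with index $l$ is
decoded from the channel outputs received in the following time slots $k$:
\[
\begin{array}{c|cccc}
  l & 0 & 1 & 2 & 3 \\ \hline
  k & 0,1,2 & 1,2,3 & 2,3,4 & 3,4,5
\end{array}
\]
In total, the slots $k=0,\dots,5$ are used, that is, $W+(L-1)=6$ blocks of
$N$ channel uses are needed to deliver $L=4$ blocks of source symbols.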

1396: For any fixed $L$, the probability of {\em at least one} of the $L$ blocks

1397: being decoded in error is $P_e^{(LN)} = 1-(1-P_e^{(N)})^L$.  Thus, from the

1398: existence of a code with low {\em block} probability of error we

1399: can infer the existence of codes for which the probability of error

1400: for the entire pipeline is low as well, by considering a large enough

1401: block length $N$.

1402:
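A simple bound makes this step explicit; it is offered only as a sketch,
and the numbers used are illustrative assumptions.  By Bernoulli's
inequality, $(1-p)^L\geq 1-Lp$ for $0\leq p\leq 1$, and therefore
\[
  P_e^{(LN)} \;\; = \;\; 1-(1-P_e^{(N)})^L \;\; \leq \;\; L\,P_e^{(N)}.
\]
For example, with $P_e^{(N)}=10^{-6}$ and $L=100$, we get
$P_e^{(LN)}\approx 9.9995\times 10^{-5}$, just under the bound
$L\,P_e^{(N)}=10^{-4}$.  Hence, for any fixed $L$, $P_e^{(LN)}\to 0$
whenever $P_e^{(N)}\to 0$.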

1403: We begin with Fano's inequality. If there

1404: is a suitable code as defined in the problem statement, then we must

1405: have

1406: \begin{equation}

1407:   H(U_1^{LN}U_2^{LN}\dots U_M^{LN}

1408:     | \hat{U}_1^{LN}\hat{U}_2^{LN}\dots \hat{U}_M^{LN})

1409:   \;\; \leq \;\;

1410:   P_e^{(LN)} \log\left(|{\mathcal U}_1^{LN}\!\times{\mathcal U}_2^{LN}

1411:                        \!\times\dots\times{\mathcal U}_M^{LN}|\right)

1412:                        + h(P_e^{(LN)}),

1413:   \label{eq:fano2}

1414: \end{equation}

1415: where $h(\cdot)$ denotes the binary entropy function, and

1416: $\hat U_{i}^{LN}=(\hat U_{i}^N(1),\hat U_{i}^N(2),\dots,\hat U_{i}^N(L))$

1417: denotes $L$ blocks of $N$ snapshots reconstructed at $v_0$.

1418: For convenience, we define also

1419: \[ \delta(P_e^{(LN)}) \;\; = \;\;

1420:    \left(P_e^{(LN)}\log\left(|{\mathcal U}_1^{LN}\times{\mathcal U}_2^{LN}

1421:    \times\dots\times{\mathcal U}_M^{LN}|\right)+h(P_e^{(LN)})\right)/LN.

1422: \]
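Assuming that ${\mathcal U}_i^{LN}$ denotes the $LN$-fold Cartesian product
of a finite alphabet ${\mathcal U}_i$ (an assumption on notation made here
for illustration), so that $|{\mathcal U}_i^{LN}|=|{\mathcal U}_i|^{LN}$,
this quantity can be written more explicitly as
\[
  \delta(P_e^{(LN)}) \;\; = \;\;
  P_e^{(LN)}\sum_{i=1}^{M}\log|{\mathcal U}_i| \; + \; \frac{h(P_e^{(LN)})}{LN},
\]
from which it is apparent that $\delta(P_e^{(LN)})\to 0$ whenever
$P_e^{(LN)}\to 0$.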

1423: It follows from eqn.~(\ref{eq:fano2}) that

1424: \begin{eqnarray*}

1425: \lefteqn{H(U_1^{LN}U_2^{LN}\dots U_M^{LN}|U_0^{LN}Y_{10}^{BN}Y_{20}^{BN}\dots Y_{M0}^{BN})} \\

1426: %  & = & H(U_1^{LN}U_2^{LN}\dots U_M^{LN}|U_0^{LN}Y_{10}^{BN}Y_{20}^{BN}\dots

1427:  %         Y_{M0}^{BN}h^{(0)}(Y_{10}^{WN}Y_{20}^{WN}\dots Y_{M0}^{WN})\dots

1428: %h^{(L-1)}(Y_{10}^{WN}Y_{20}^{WN}\dots Y_{M0}^{WN})) \\

1429:  &\stackrel{(a)}{=} & H(U_1^{LN}U_2^{LN}\dots U_M^{LN}|U_0^{LN}Y_{10}^{BN}Y_{20}^{BN}\dots

1430:           Y_{M0}^{BN}\hat{U}_1^{LN}\hat{U}_2^{LN}\dots \hat{U}_M^{LN})\\

1431:   & \leq & H(U_1^{LN}U_2^{LN}\dots U_M^{LN}|\hat{U}_1^{LN}\hat{U}_2^{LN}\dots \hat{U}_M^{LN}) \\

1432:   & \leq & LN \delta(P_e^{(LN)}),

1433: \end{eqnarray*}

1434: where $Y_{ij}^{BN}=(Y_{ij}^N(1),Y_{ij}^N(2),\dots,Y_{ij}^N(B))$ denotes

1435: $B=W+(L-1)$ blocks of $N$ channel outputs observed by node $v_j$ while

1436: communicating with node $v_i$, and (a) follows from the fact that the

1437: estimates $\hat U_i^{LN}$, $i=1\dots M$, are functions of $U_0^{LN}$ and

1438: of the received channel outputs $Y_{i0}^{BN}$, $i=1\dots M$.  From the

1439: chain rule for entropy, from the fact that conditioning does not increase

1440: entropy, and for any $S\subseteq {\cal M}=\{0...M\}$, $S\neq\emptyset$,

1441: $0\in S^c$, it follows that

1442: \begin{equation}

1443: \label{eq:ineq}

1444:  H(U^{LN}_S|U_{S^c}^{LN}Y_{S\to S^c}^{BN}Y_{S^c\to S^c}^{BN})

1445:  \;\; \leq \;\;

1446:  H(U^{LN}_S|U_{S^c}^{LN}Y_{S\to 0}^{BN}Y_{S^c\backslash\{0\}\to 0}^{BN})

1447:  \;\; \leq \;\;

1448:  LN \delta(P_e^{(LN)}).

1449: \end{equation}

1450: Let the set of $B$ codewords sent by

1451: the nodes in a subset $A$ to the nodes in a subset $D$ be

1452: \[X_{A\to D}^{BN}=\{X_{ij}^{BN}:i\in A \textrm{ and } j\in D\},\]

1453: and, likewise, the corresponding channel outputs be denoted as

1454: \[Y_{A\to D}^{BN}=\{Y_{ij}^{BN}:i\in A \textrm{ and } j\in D\}.\]

1455:

1456: We will make use of the following lemmas.

1457:

1458: \begin{lemma}\label{lemma:mi}

1459: Let $X_{S\to S^c}$ be a set of channel inputs and $Y_{S\to S^c}$ be

1460: a set of channel outputs of an array of independent channels

1461: $\{{\cal X}_{ij},p_{ij}(y|x),{\cal Y}_{ij}\}$, $\forall i\in S$

1462: and $\forall j\in S^c$.  Then,

1463: \begin{equation}\label{eq:lemma}

1464: I(X_{S\to S^c};Y_{S\to S^c})\leq \sum_{{i\in S},{j\in S^c}} I(X_{ij};Y_{ij}).

1465: \end{equation}

1466: \end{lemma}

1467: \begin{proof}

1468: Without loss of generality, assume that $S=\{1,\dots, x_0\}$ and

1469: $S^c=\{x_0+1,\dots, M\}$.  From the definition of mutual information, it

1470: follows that

1471: \begin{eqnarray*}

1472: I(X_{S\to S^c};Y_{S\to S^c})&=&H(Y_{S\to S^c})-H(Y_{S\to S^c}|X_{S\to S^c}).

1473: \end{eqnarray*}

1474: Expanding the first term on the right-hand side, we get

1475: \begin{eqnarray*}

1476: H(Y_{S\to S^c})&=&H(Y_{1\to S^c}Y_{2\to S^c}\dots Y_{x_0\to S^c})\\

1477: &\leq& \sum_{i\in S} H(Y_{i\to S^c})\\

1478: &=& \sum_{i\in S} H(Y_{i\to x_0+1}Y_{i\to x_0+2}\dots Y_{i\to M})\\

1479: &\leq& \sum_{i\in S,j\in S^c} H(Y_{ij}).

1480: \end{eqnarray*}

1481: Similarly, the second term reduces to

1482: \begin{eqnarray*}

1483: \lefteqn{H(Y_{S\to S^c}|X_{S\to S^c})}\\

1484: &=&H(Y_{1\to S^c}Y_{2\to S^c}\dots Y_{x_0\to S^c}| X_{1\to S^c}X_{2\to S^c}\dots X_{x_0\to S^c})\\

1485: &=& H(Y_{1\to S^c}| X_{1\to S^c}X_{2\to S^c}\dots X_{x_0\to S^c})+\sum_{i=2}^{x_0} H(Y_{i\to S^c}| X_{1\to S^c}X_{2\to S^c}\dots X_{x_0\to S^c}Y_{1\to S^c}\dots Y_{i-1\to S^c})\\

1486: &=&  H(Y_{1\to S^c}| X_{1\to S^c})+\sum_{i=2}^{x_0} H(Y_{i\to S^c}| X_{i\to S^c})\\

1487: &=& \sum_{i\in S}  H(Y_{i\to S^c}| X_{i\to S^c})\\

1488: &=& \sum_{i\in S}  H(Y_{i\to x_0+1}Y_{i\to x_0+2}\dots Y_{i\to M}| X_{i\to x_0+1}X_{i\to x_0+2}\dots X_{i\to M})\\

1489: &=& \sum_{i\in S} \bigg(H(Y_{i\to x_0+1}| X_{i\to x_0+1}X_{i\to x_0+2}\dots X_{i\to M})\\&& +\!\!\sum_{j=x_0+2}^M  H(Y_{i\to j}| X_{i\to x_0+1}X_{i\to x_0+2}\dots X_{i\to M}Y_{i\to x_0+1}\dots Y_{i\to j-1})\bigg)\\

1490: &=& \sum_{i\in S} \bigg(H(Y_{i\to x_0+1}| X_{i\to x_0+1})+ \sum_{j=x_0+2}^M  H(Y_{i\to j}| X_{i\to j})\bigg)\\

1491: &=& \sum_{i\in S,j\in S^c} H(Y_{ij}| X_{ij}).

1492: \end{eqnarray*}

1493: Combining the two expressions, we get

1494: \[

1495:   I(X_{S\to S^c};Y_{S\to S^c})

1496:     \;\; \leq \;\; \sum_{i\in S,j\in S^c} \big[H(Y_{ij})-H(Y_{ij}|X_{ij})\big]

1497:     \;\; = \;\; \sum_{i\in S,j\in S^c} I(X_{ij};Y_{ij}),

1498: \]

1499: thus proving the lemma.

1500: \end{proof}

1501:
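To see that the inequality in Lemma~\ref{lemma:mi} can be strict, consider
the following toy example (our own illustration, under the assumption of
$S=\{1,2\}$, $S^c=\{3\}$, and two noiseless binary channels, i.e.,
$Y_{i3}=X_{i3}$, both fed with the same uniformly distributed bit
$X_{13}=X_{23}$).  Then
\[
  I(X_{13}X_{23};Y_{13}Y_{23}) \;\; = \;\; H(Y_{13}Y_{23})
  \;\; = \;\; 1 \textrm{ bit}
  \;\; < \;\; 2 \textrm{ bits}
  \;\; = \;\; I(X_{13};Y_{13})+I(X_{23};Y_{23}).
\]
Equality holds in~(\ref{eq:lemma}) when the channel outputs $Y_{ij}$ are
mutually independent, which is the case, e.g., when the channel inputs are
chosen independently.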

1502: \begin{lemma}

1503: \label{lemma:chain1}

1504: $U_S^{LN}\rightarrow(U_{S^c}^{LN}Y_{S\to S^c}^{BN})\rightarrow

1505: Y_{S^c\to S^c}^{BN}$ forms a Markov chain.

1506: \end{lemma}

1507: \begin{proof}

1508: We begin by expanding $p(u_S^{LN}u_{S^c}^{LN}y_{S\to

1509: S^c}^{BN}y_{S^c\to S^c}^{BN})$ according to

1510: \begin{eqnarray*}

1511: p(u_S^{LN}u_{S^c}^{LN}y_{S\to S^c}^{BN}y_{S^c\to S^c}^{BN})

1512: &=&p(u_{S}^{LN}) \cdot p(u_{S^c}^{LN}y_{S\to S^c}^{BN}|u_S^{LN})

1513: \cdot p(y_{S^c\to S^c}^{BN}|u_S^{LN}u_{S^c}^{LN}y_{S\to S^c}^{BN}).

1514: \end{eqnarray*}

1515: To prove that $U_S^{LN}$ can be removed from the last factor in

1516: the previous expression, we will use an induction argument on the

1517: length of the pipeline, $L$, and window sizes, $K$ and $T$.

1518:

1519: Fix $(S,S^c)$ and $i,j\in S^c$. Let $L=K=T=1$. The encoding functions

1520: produce $g_{ij}(U_i^N)=X_{i\to j}^N$, which result in the channel

1521: outputs $Y_{i\to j}^N$ after transmission over the DMC between

1522: nodes $i$ and $j$. In shorthand, we write

1523: \[ g_{ij}(U_i^N)

1524:    \;\; = \;\; X_{i\to j}^N

1525:    \;\; \stackrel{\textrm{\tiny DMC}}{\longrightarrow} \;\; Y_{i\to j}^N.

1526: \]

1527: Thus, the first block of channel inputs

1528: $X_{S^c\to S^c}^{1\dots N}$

1529: generated in the node set $S^c$ depends only on source symbols

1530:  $U_{S^c}^{1\dots N}$ available in $S^c$. Moreover,

1531: since the channels are DMCs, the

1532: channel outputs depend only on the channel inputs.

1533: Thus, we conclude that $U_{S}^{1\dots N}$ and $Y_{S^c\to S^c}^{1\dots N}$

1534: are independent given  $U_{S^c}^{1\dots N}$.

1535:

1536: Since we consider a pipeline of length $L=1$, there are no more blocks

1537: to inject, but not all data may have arrived at the destination, so we have

1538: to allow for a few ($W$, to be precise) extra transmissions.  By ``flushing

1539: the pipeline'', we have

1540: \[ g_{ij}(Y_{S\to i}^{1\dots N}Y_{S^c\to i}^{1\dots N})

1541:    \;\; = \;\; X_{i\to j}^{N+1\dots 2N}

1542:    \;\; \stackrel{\textrm{\tiny DMC}}{\longrightarrow} \;\;

1543:    Y_{i\to j}^{N+1\dots 2N}.

1544: \]

1545: It follows that $Y_{S^c\to S^c}^{N+1\dots 2N}$ is independent of

1546: $U_S^{1\dots N}$ given $Y_{S\to S^c}^{1\dots N}$ and $U_{S^c}^{1\dots N}$.

1547: Similarly, we have

1548: \[ g_{ij}(Y_{S\to i}^{(W-2)N+1\dots(W-1)N}

1549:           Y_{S^c\to i}^{(W-2)N+1\dots (W-1)N})

1550:    \;\; = \;\; X_{i\to j}^{(W-1)N+1\dots WN}

1551:    \;\; \stackrel{\textrm{\tiny DMC}}{\longrightarrow} \;\;

1552:    Y_{i\to j}^{(W-1)N+1\dots WN},

1553: \]

1554: from which we conclude that

1555: $Y_{S^c\to S^c}^{(W-1)N+1\dots WN}$ is independent of $U_S^{1\dots N}$

1556: given $Y_{S\to S^c}^{(W-2)N+1\dots (W-1)N}$ and $U_{S^c}^{1\dots N}$.

1557: Thus, for $K=T=L=1$, and $W$ arbitrary,\footnote{Since $W$ is the delay

1558: used to allow data to flow to the destination, it would not be reasonable

1559: to perform induction on $W$ for a given fixed network. Instead we take

1560: $W$ as a parameter, which must be greater than or equal to the diameter of

1561: the network.} the Markov chain in the lemma holds (with $B=L+W-1$).

1562:

1563: To proceed with the inductive proof, we still take $K=T=1$, $(S,S^c)$

1564: fixed, $i,j\in S^c$, but $L$ is now arbitrary.  By inductive hypothesis,

1565: we have the following Markov chain

1566: \[ U_S^{(L-1)N}

1567:    \;\; \rightarrow \;\; (U_{S^c}^{(L-1)N}Y_{S\to S^c}^{(B-1)N})

1568:    \;\; \rightarrow \;\; Y_{S^c\to S^c}^{(B-1)N}.\]

1569: Encoding and transmission of the last block of each source yields

1570: \[ g_{ij}(U_i^{(L-1)N+1\dots LN}Y_{S\to i}^{(L-1)N+1\dots LN}

1571:    Y_{S^c\to i}^{(L-1)N+1\dots LN})

1572:    \;\; = \;\; X_{i\to j}^{LN+1\dots (L+1)N}

1573:    \;\; \stackrel{\textrm{\tiny DMC}}{\longrightarrow} \;\;

1574:    Y_{i\to j}^{LN+1\dots (L+1)N},

1575: \]

1576: such that for the last block, we have that

1577: \[ U_S^{LN} \;\; \rightarrow \;\; (U_{S^c}^{LN}Y_{S\to S^c}^{(L+1)N})

1578:    \;\; \rightarrow \;\; Y_{S^c\to S^c}^{(L+1)N}.\]

1579: This is not yet the sought Markov chain, as we still need to flush the

1580: pipe.  But similarly to how it was done for the base case of this inductive

1581: argument, we have that

1582: \[\begin{array}{ccccc}

1583:   g_{ij}(Y_{S\to i}^{LN+1\dots (L+1)N}Y_{S^c\to i}^{LN+1\dots (L+1)N})

1584:   & = & X_{i\to j}^{(L+1)N+1\dots (L+2)N}

1585:   & \stackrel{\textrm{\tiny DMC}}{\longrightarrow} &

1586:    Y_{i\to j}^{(L+1)N+1\dots (L+2)N}, \\

1587:   & & \vdots & &\\

1588:   g_{ij}(Y_{S\to i}^{(B-2)N+1\dots (B-1)N}Y_{S^c\to i}^{(B-2)N+1\dots (B-1)N})

1589:   & = & X_{i\to j}^{(B-1)N+1\dots BN}

1590:   & \stackrel{\textrm{\tiny DMC}}{\longrightarrow} &

1591:    Y_{i\to j}^{(B-1)N+1\dots BN},

1592: \end{array}\]

1593: and therefore we now have that

1594: $Y_{S^c\to S^c}^{BN}$ is independent of $U_S^{LN}$

1595: given $Y_{S\to S^c}^{BN}$ and $U_{S^c}^{LN}$.

1596:

1597: The proof of the lemma is completed by performing the exact same

1598: induction steps on $K$ and $T$ as done on $L$.  For brevity, those

1599: same steps are omitted from this proof.

1600: \end{proof}

1601:

1602: \subsubsection{Main Proof}

1603:

1604: We now take an arbitrary subset $S \subseteq {\cal M}=\{0\dots M\}$,

1605: $S\neq\emptyset$, $0\in S^c$, and start by bounding $H(U_S^{LN})$ according

1606: to

1607: \begin{eqnarray*}

1608: H(U_S^{LN})

1609:   &=& I\big(U_S^{LN};U_{S^c}^{LN}Y_{S\to S^c}^{BN}Y_{S^c\to S^c}^{BN}\big)\;+\;

1610:       H\big(U_S^{LN}|U_{S^c}^{LN}Y_{S\to S^c}^{BN}Y_{S^c\to S^c}^{BN}\big) \\

1611:   &\stackrel{(a)}{\leq}&

1612:     I\big(U_S^{LN};U_{S^c}^{LN}Y_{S\to S^c}^{BN}Y_{S^c\to S^c}^{BN}\big)

1613:     \;+\; LN\delta(P_e^{(LN)}) \\

1614:   &=& I\big(U_S^{LN};U_{S^c}^{LN}\big)

1615:     \;+\;I(U_S^{LN};Y_{S\to S^c}^{BN}|U_{S^c}^{LN})

1616:     \;+\;I(U_S^{LN};Y_{S^c\to S^c}^{BN}|U_{S^c}^{LN}Y_{S\to S^c}^{BN})

1617:     \:+\;LN\delta(P_e^{(LN)}),

1618: \end{eqnarray*}

1619: where (a) follows from~(\ref{eq:ineq}).  From Lemma~\ref{lemma:chain1}, we

1620: have that $I(U_S^{LN};Y_{S^c\to S^c}^{BN}|U_{S^c}^{LN}Y_{S\to S^c}^{BN}) = 0$,

1621: and so we get

1622: \begin{equation}

1623: H(U_S^{LN}) \;\; \leq \;\;

1624:   I(U_S^{LN};U_{S^c}^{LN}) \; + \; I(U_S^{LN};Y_{S\to S^c}^{BN}|U_{S^c}^{LN})

1625:   \; + \; LN\delta(P_e^{(LN)}).

1626: \label{eq:almostend}

1627: \end{equation}

1628:

1629: Developing the second term on the right-hand side yields:

1630: \begin{eqnarray*}

1631: \lefteqn{I(U_S^{LN};Y_{S\to S^c}^{BN}|U_{S^c}^{LN})} \\

1632: & = & \sum_{k=1}^{BN}I(U_S^{LN};Y_{S\to S^c}(k)|U_{S^c}^{LN}Y_{S\to S^c}^{k-1})\\

1633: & \leq & \sum_{k=1}^{BN}I(U_S^{LN};Y_{S\to

1634: S^c}(k)|U_{S^c}^{LN}Y_{S\to

1635: S^c}^{k-1}) +\sum_{k=1}^{BN}I(X_{S\to

1636: S^c}(k);Y_{S\to

1637: S^c}(k)|U_{S^c}^{LN}Y_{S\to

1638: S^c}^{k-1}U_S^{LN})

1639: \\

1640: &=&\sum_{k=1}^{BN}I(X_{S\to S^c}(k)U_S^{LN};Y_{S\to

1641: S^c}(k)|U_{S^c}^{LN}Y_{S\to

1642: S^c}^{k-1})

1643: \\

1644: &=&\sum_{k=1}^{BN}I(X_{S\to S^c}(k);Y_{S\to S^c}(k)|U_{S^c}^{LN}Y_{S\to S^c}^{k-1}) +I(U_S^{LN};Y_{S\to S^c}(k)|U_{S^c}^{LN}Y_{S\to S^c}^{k-1}X_{S\to S^c}(k))\\

1645: \end{eqnarray*}\begin{eqnarray*}

1646: &\stackrel{(a)}{=}&\sum_{k=1}^{BN}I(X_{S\to S^c}(k);Y_{S\to S^c}(k)|U_{S^c}^{LN}Y_{S\to S^c}^{k-1})\\

1647: &=&\sum_{k=1}^{BN}H(Y_{S\to

1648: S^c}(k)|U_{S^c}^{LN}Y_{S\to

1649: S^c}^{k-1}) -

1650: H(Y_{S\to S^c}(k)|U_{S^c}^{LN}Y_{S\to S^c}^{k-1}X_{S\to S^c}(k))\\

1651: &\stackrel{(b)}{=}&\sum_{k=1}^{BN}H(Y_{S\to S^c}(k)|U_{S^c}^{LN}Y_{S\to S^c}^{k-1})-H(Y_{S\to S^c}(k)|X_{S\to S^c}(k))\\

1652: &\stackrel{(c)}{\leq}&\sum_{k=1}^{BN}H(Y_{S\to

1653: S^c}(k))-H(Y_{S\to

1654: S^c}(k)|X_{S\to S^c}(k))\\

1655: &=&\sum_{k=1}^{BN}I(X_{S\to S^c}(k);Y_{S\to S^c}(k))\\

1656: &\stackrel{(d)}{\leq}&\sum_{k=1}^{BN}\sum_{i\in S, j\in S^c} I(X_{ij}(k);Y_{ij}(k))\\

1657: &=&\sum_{i\in S, j\in S^c} \sum_{k=1}^{BN} I(X_{ij}(k);Y_{ij}(k))\\

1658: &\leq&\sum_{i\in S, j\in S^c}  BNC_{ij}

1659: \end{eqnarray*}

1660: where we use the following arguments:

1661: \begin{itemize}

1662: \item[(a)] given the channel inputs $X_{S\to S^c}(k)$, the

1663:   channel outputs $Y_{S\to S^c}(k)$ are independent of all

1664:   other random variables;

1665: \item[(b)] same as (a);

1666: \item[(c)] conditioning does not increase the entropy;

1667: \item[(d)] direct application of Lemma~\ref{lemma:mi}.

1668: \end{itemize}

1669: Substituting in (\ref{eq:almostend}) yields

1670: \begin{eqnarray*}

1671:  H(U_S^{LN}) &\leq& I(U_S^{LN};U_{S^c}^{LN}) + \sum_{i\in S, j\in S^c} BNC_{ij}

1672:                     + LN\delta(P_e^{(LN)}).

1673: \end{eqnarray*}

1674: Using the fact that the sources are drawn i.i.d., this last expression

1675: can be rewritten as

1676: \[

1677:   LNH(U_S) \;\;\leq\;\; LNI(U_S;U_{S^c})

1678:                + \sum_{i\in S, j\in S^c} BNC_{ij}

1679:                 + LN\delta(P_e^{(LN)}),

1680: \]

1681: or equivalently,

1682: \[

1683: H(U_S|U_{S^c}) \;\;\leq\;\;  \frac{B}{L}\sum_{i\in S, j\in S^c}

1684: C_{ij} + \delta(P_e^{(LN)})

1685: \;\;=\;\; \frac{(W+L-1)}{L} \sum_{i\in S, j\in S^c} C_{ij}+\delta(P_e^{(LN)}).

1686: \]

1687: Finally, we observe that this inequality holds for all finite values

1688: of $L$.  Thus, it must also be the case that

1689: \begin{eqnarray*}

1690: H(U_S|U_{S^c})

1691:   & < & \inf_{L=1,2,...}

1692:            \frac{(W+L-1)}{L} \sum_{i\in S, j\in S^c} C_{ij}+\delta(P_e^{(LN)}) \\

1693:   & = &    \sum_{i\in S, j\in S^c} C_{ij}+\delta(P_e^{(LN)}).

1694: \end{eqnarray*}

1695: But since $\delta(P_e^{(LN)})$ goes to zero as $P_e^{(N)}\rightarrow 0$,

1696: we get

1697: \[ H(U_S|U_{S^c}) \;\;<\;\; \sum_{i\in S, j\in S^c} C_{ij}, \]

1698: thus concluding the proof.  \tend

1699:
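To illustrate how the pipeline overhead factor used in the last step
behaves (the value $W=5$ is an assumption chosen only for concreteness),
note that
\[
  \frac{W+L-1}{L} \;\; = \;\; 1+\frac{W-1}{L},
\]
which is decreasing in $L$.  With $W=5$, this factor equals $1.4$ for
$L=10$ and $1.004$ for $L=1000$, and it approaches its infimum $1$ as
$L\to\infty$: the $W-1$ extra blocks needed to flush the pipeline are
amortized over an increasing number of injected blocks.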

1700:

1701: %\bibliographystyle{unsrt}

1702: %\bibliography{library}

1703: \begin{thebibliography}{10}

1704:

1705: \bibitem{BarrosS:02b}

1706: J.~Barros and S.~D. Servetto.

1707: \newblock {On the Capacity of the Reachback Channel in Wireless Sensor

1708:   Networks}.

1709: \newblock In {\em Proc. IEEE Int. Workshop Multimedia Sig. Proc.}, US Virgin

1710:   Islands, 2002.

1711: \newblock Invited paper to the special session on {\em Signal Processing for

1712:   Wireless Networks}.

1713:

1714: \bibitem{BarrosS:03a}

1715: J.~Barros and S.~D. Servetto.

1716: \newblock {Reachback Capacity with Non-Interfering Nodes}.

1717: \newblock In {\em Proc. IEEE Int. Symp. Inform. Theory (ISIT)}, Yokohama,

1718:   Japan, 2003.

1719:

1720: \bibitem{BarrosS:03d}

1721: J.~Barros and S.~D. Servetto.

1722: \newblock {Coding Theorems for the Sensor Reachback Problem with Partially

1723:   Cooperating Nodes}.

1724: \newblock In {\em Discrete Mathematics and Theoretical Computer Science

1725:   (DIMACS) series on Network Information Theory}, Piscataway, NJ, 2003.

1726:

1727: \bibitem{BarrosS:05}

1728: J.~Barros and S.~D. Servetto.

1729: \newblock {A Coding Theorem for Network Information Flow with Correlated

1730:   Sources}.

1731: \newblock In {\em Proc. IEEE Int. Symp. Inform. Theory (ISIT)}, Adelaide,

1732:   Australia, 2005.

1733:

1734: \bibitem{CoverT:91}

T.~M. Cover and J.~A. Thomas.

1736: \newblock {\em {Elements of Information Theory}}.

1737: \newblock John Wiley and Sons, Inc., 1991.

1738:

1739: \bibitem{AkyildizSSC:02}

1740: I.~F. Akyildiz, W.~Su, Y.~Sankarasubramaniam, and E.~Cayirci.

1741: \newblock {A Survey on Sensor Networks}.

1742: \newblock {\em IEEE Communications Mag.}, 40(8):102--114, 2002.

1743:

1744: \bibitem{BergerZV:96}

1745: T.~Berger, Z.~Zhang, and H.~Viswanathan.

1746: \newblock {The CEO Problem}.

1747: \newblock {\em IEEE Trans. Inform. Theory}, 42(3):887--902, 1996.

1748:

1749: \bibitem{BertsekasG:92}

1750: D.~Bertsekas and R.~Gallager.

1751: \newblock {\em {Data Networks (2nd ed)}}.

1752: \newblock Prentice Hall, 1992.

1753:

1754: \bibitem{HuS:03c}

1755: A.~Hu and S.~D. Servetto.

1756: \newblock {dFSK: {\em Distributed} Frequency Shift Keying Modulation in Dense

1757:   Sensor Networks}.

1758: \newblock In {\em Proc. IEEE Int. Conf. Commun. (ICC)}, Paris, France, 2004.

1759:

1760: \bibitem{HuS:03b}

1761: A.~Hu and S.~D. Servetto.

1762: \newblock {Algorithmic Aspects of the Time Synchronization Problem in

1763:   Large-Scale Sensor Networks}.

1764: \newblock {\em ACM/Kluwer Mobile Networks and Applications}, 10:491--503, 2005.

1765: \newblock Special issue with selected (and revised) papers from ACM WSNA 2003.

1766:

1767: \bibitem{HuS:05}

1768: A.~Hu and S.~D. Servetto.

1769: \newblock {On the Scalability of Cooperative Time Synchronization in

1770:   Pulse-Connected Networks}.

\newblock {\em IEEE Trans. Inform. Theory}, to appear. Available from

1772:   \href{http://cn.ece.cornell.edu/}{{\tt http://cn.ece.cornell.edu/}}.

1773:

1774: \bibitem{Berger:78}

1775: T.~Berger.

1776: \newblock {\em The Information Theory Approach to Communications (G. Longo,

1777:   ed.)}, chapter Multiterminal Source Coding.

1778: \newblock Springer-Verlag, 1978.

1779:

1780: \bibitem{CsiszarK:80}

I.~Csisz\'ar and J.~K{\"o}rner.

1782: \newblock {Towards a General Theory of Source Networks}.

1783: \newblock {\em IEEE Trans. Inform. Theory}, 26(2):155--166, 1980.

1784:

1785: \bibitem{KoernerM:79}

J.~K{\"o}rner and K.~Marton.

1787: \newblock {How to Encode the Modulo-Two Sum of Binary Sources}.

1788: \newblock {\em IEEE Trans. Inform. Theory}, 25(2):219--221, 1979.

1789:

1790: \bibitem{KawadiaK:04}

1791: V.~Kawadia and P.~R. Kumar.

1792: \newblock {A Cautionary Perspective on Cross-Layer Design}.

\newblock {\em IEEE Wireless Comm. Mag.}, 2004. Available from

1794:   \href{http://decision.csl.uiuc.edu/~prkumar/ps_files/cross-layer-design.pdf}%

1795: {{\tt http://decision.csl.uiuc.edu/\~{}prkumar/}}.

1796:

1797: \bibitem{EphremidesH:98}

1798: A.~Ephremides and B.~Hajek.

1799: \newblock {Information Theory and Communication Networks: An Unconsummated

1800:   Union}.

1801: \newblock {\em IEEE Trans. Inform. Theory}, 44(6):2416--2434, 1998.

1802:

1803: \bibitem{SlepianW:73b}

1804: D.~Slepian and J.~K. Wolf.

1805: \newblock {Noiseless Coding of Correlated Information Sources}.

\newblock {\em IEEE Trans. Inform. Theory}, 19(4):471--480, 1973.

1807:

1808: \bibitem{CoverGS:80}

1809: T.~M. Cover, A.~A. {El Gamal}, and M.~Salehi.

1810: \newblock {Multiple Access Channels with Arbitrarily Correlated Sources}.

\newblock {\em IEEE Trans. Inform. Theory}, 26(6):648--657, 1980.

1812:

1813: \bibitem{Dueck:81}

1814: G.~Dueck.

1815: \newblock {A Note on the Multiple Access Channel with Correlated Sources}.

\newblock {\em IEEE Trans. Inform. Theory}, 27(2):232--235, 1981.

1817:

1818: \bibitem{SlepianW:73}

1819: D.~Slepian and J.~K. Wolf.

1820: \newblock {A Coding Theorem for Multiple Access Channels with Correlated

1821:   Sources}.

1822: \newblock {\em Bell Syst. Tech. J.}, 52(7):1037--1076, 1973.

1823:

1824: \bibitem{AhlswedeH:83}

1825: R.~Ahlswede and T.~S. Han.

1826: \newblock {On Source Coding with Side Information via a Multiple-Access

1827:   Channel, and Related Problems in Multi-User Information Theory}.

1828: \newblock {\em IEEE Trans. Inform. Theory}, 29(3):396--411, 1983.

1829:

1830: \bibitem{Willems:83}

1831: F.~M.~J. Willems.

1832: \newblock {The Discrete Memoryless Multiple Access Channel with Partially

1833:   Cooperating Encoders}.

1834: \newblock {\em IEEE Trans. Inform. Theory}, 29(3):441--445, 1983.

1835:

1836: \bibitem{Han:80}

1837: T.~S. Han.

1838: \newblock {Slepian-Wolf-Cover Theorem for a Network of Channels}.

1839: \newblock {\em Inform. Contr.}, 47(1):67--83, 1980.

1840:

1841: \bibitem{AhlswedeCLY:00}

1842: R.~Ahlswede, N.~Cai, S.-Y.~R. Li, and R.~W. Yeung.

1843: \newblock {Network Information Flow}.

1844: \newblock {\em IEEE Trans. Inform. Theory}, 46(4):1204--1216, 2000.

1845:

1846: \bibitem{Borade:02}

1847: S.~Borade.

1848: \newblock {Network Information Flow: Limits and Achievability}.

1849: \newblock In {\em Proc. IEEE Int. Symp. Inform. Theory (ISIT)}, Lausanne,

1850:   Switzerland, 2002.

1851:

1852: \bibitem{LiYC:03}

1853: S.-Y.~R. Li, R.~W. Yeung, and N.~Cai.

1854: \newblock {Linear Network Coding}.

1855: \newblock {\em IEEE Trans. Inform. Theory}, 49(2):371--381, 2003.

1856:

1857: \bibitem{KoetterM:03}

1858: R.~Koetter and M.~M\'edard.

1859: \newblock {An Algebraic Approach to Network Coding}.

1860: \newblock {\em IEEE/ACM Trans. Networking}, 11(5):782--795, 2003.

1861:

1862: \bibitem{EffrosMHRKK:03}

1863: M.~Effros, M.~M\'edard, T.~Ho, S.~Ray, D.~Karger, and R.~Koetter.

1864: \newblock {Linear Network Codes: A Unified Framework for Source, Channel, and

1865:   Network Coding}.

1866: \newblock In {\em Discrete Mathematics and Theoretical Computer Science

1867:   (DIMACS) series on Network Information Theory}, Piscataway, NJ, 2003.

1868:

1869: \bibitem{HoMEK:04}

1870: T.~Ho, M.~M\'edard, M.~Effros, and R.~Koetter.

1871: \newblock {Network Coding for Correlated Sources}.

1872: \newblock In {\em Proc. 38th Annual Conf. Inform. Sciences Syst. (CISS)},

1873:   Princeton, NJ, March 2004.

1874:

1875: \bibitem{LiL:04}

1876: Z.~Li and B.~Li.

1877: \newblock {Network Coding in Undirected Networks}.

1878: \newblock In {\em Proc. 38th Annual Conf. Inform. Sciences Syst. (CISS)},

1879:   Princeton, NJ, March 2004.

1880:

1881: \bibitem{GuptaK:00}

1882: P.~Gupta and P.~R. Kumar.

1883: \newblock {The Capacity of Wireless Networks}.

1884: \newblock {\em IEEE Trans. Inform. Theory}, 46(2):388--404, 2000.

1885:

1886: \bibitem{PerakiS:03}

1887: C.~Peraki and S.~D. Servetto.

1888: \newblock {On the Maximum Stable Throughput Problem in Random Networks with

1889:   Directional Antennas}.

1890: \newblock In {\em Proc. ACM MobiHoc}, Annapolis, MD, 2003.

1891:

1892: \bibitem{PerakiS:04}

1893: C.~Peraki and S.~D. Servetto.

1894: \newblock {Capacity, Stability and Flows in Large-Scale Random Networks}.

1895: \newblock In {\em Proc. IEEE Inform. Theory Workshop (ITW)}, San Antonio, TX,

1896:   2004.

1897:

1898: \bibitem{CormenLRS:01}

1899: T.~H. Cormen, C.~E. Leiserson, R.~L. Rivest, and C.~Stein.

1900: \newblock {\em {Introduction to Algorithms (2nd ed)}}.

1901: \newblock MIT Press, 2001.

1902:

1903: \bibitem{LehmanL:04}

1904: A.~R. Lehman and E.~Lehman.

1905: \newblock {Complexity Classification of Network Information Flow Problems}.

1906: \newblock In {\em Proc. ACM/SIAM Symp. Discr. Alg. (SODA)}, 2004.

1907:

1908: \bibitem{Verdu:02}

S.~Verd\'u.

1910: \newblock {Spectral Efficiency in the Wideband Regime}.

1911: \newblock {\em IEEE Trans. Inform. Theory}, 48(6):1319--1343, 2002.

1912:

1913: \bibitem{IntanagonwiwatGEHS:03}

1914: C.~Intanagonwiwat, R.~Govindan, D.~Estrin, J.~Heidemann, and F.~Silva.

1915: \newblock {Directed Diffusion for Wireless Sensor Networking}.

1916: \newblock {\em IEEE/ACM Trans. Networking}, 11(1):2--16, 2003.

1917:

1918: \bibitem{GoelE:03}

1919: A.~Goel and D.~Estrin.

1920: \newblock {Simultaneous Optimization for Concave Costs: Single Sink Aggregation

1921:   or Single Source Buy-at-Bulk}.

1922: \newblock In {\em Proc. ACM/SIAM Symp. Discr. Alg. (SODA)}, Baltimore, MD,

1923:   2003.

1924:

1925: \bibitem{BarrosTL:03}

1926: J.~Barros, M.~T{\"u}chler, and S.~P. Lee.

1927: \newblock {Scalable Source/Channel Decoding for Large-Scale Sensor Networks}.

\newblock In {\em Proc. IEEE Int. Conf. Commun. (ICC)}, Paris, France, 2004.

1929:

1930: \bibitem{Berger:71}

1931: T.~Berger.

1932: \newblock {\em {Rate Distortion Theory: A Mathematical Basis for Data

1933:   Compression}}.

1934: \newblock Prentice-Hall, Inc., 1971.

1935:

1936: \bibitem{Chiang:05}

1937: M.~Chiang.

1938: \newblock {Balancing Transport and Physical Layers in Wireless Multihop

1939:   Networks: Jointly Optimal Congestion Control and Power Control}.

\newblock {\em IEEE J. Select. Areas Commun.}, 23(1):104--116, 2005.

1941:

1942: \end{thebibliography}

1943:

1944:

1945: \end{document}

1946:

1947: