1: 
2: %\documentclass[10pt,twocolumn]{IEEEtran}
3: \documentclass[11pt,onecolumn,dvips,draftcls]{IEEEtran}
4: 
5: \usepackage{psfig,amsfonts,amsmath,color,amssymb,amsxtra}
6: \usepackage[breaklinks=true, colorlinks=true, linkcolor=black,
7: urlcolor=dblue, citecolor=black, pdfpagemode=None, pdfstartview=FitH]{hyperref}
8: \DeclareOldFontCommand{\rm}{\normalfont\rmfamily}{\mathrm}
9: \DeclareOldFontCommand{\sf}{\normalfont\sffamily}{\mathsf}
10: \DeclareOldFontCommand{\tt}{\normalfont\ttfamily}{\mathtt}
11: \DeclareOldFontCommand{\bf}{\normalfont\bfseries}{\mathbf}
12: \DeclareOldFontCommand{\it}{\normalfont\itshape}{\mathit}
13: \makeatletter\DeclareOldFontCommand{\sl}{\normalfont\slshape}{\@nomath\sl}
14: \DeclareOldFontCommand{\sc}{\normalfont\scshape}{\@nomath\sc}\makeatother
15: \definecolor{gray}{cmyk}{.2,0.2,.3,.1}
16: \definecolor{dred}{cmyk}{0,0.9,0.4,0.3}
17: \definecolor{dblue}{rgb}{0,0,0.5}
18: \definecolor{dgreen}{rgb}{0,0.3,0}
19: \definecolor{dgray}{rgb}{0.3,0.3,0}
20: \newtheorem{theorem}{Theorem}
21: \newtheorem{lemma}{Lemma}
22: \newtheorem{corollary}{Corollary}
23: \newtheorem{example}{Example}
24: \newtheorem{definition}{Definition}
25: \newcommand{\rend}{\hfill$\square$}
26: \newcommand{\tend}{\hfill$\blacksquare$}
27: \newtheorem{remark}{Remark}
28: \newcommand{\flow}{\varphi}
29: \setlength{\textwidth}{17cm}
30: 
31: 
32: \title{Network Information Flow \\ with Correlated Sources
33:   \thanks{J.\ Barros was with the
34:   Institute for Communications Engineering, Munich University of Technology,
35:   Munich, Germany.  He is now with the Department of Computer Science,
36:   University of Porto, Portugal.  URL:
37:   %\href{http://cn.ece.cornell.edu/redirect-joao.html}
38:   \href{http://www.dcc.fc.up.pt/\~barros/}
39:   {{\tt http://www.dcc.fc.up.pt/$\sim$barros/}}.
40:   S.\ D.\ Servetto is with
41:   the School of Electrical and Computer Engineering, Cornell University,
42:   Ithaca, NY.  URL:
43:   \href{http://cn.ece.cornell.edu/}{{\tt http://cn.}}
44:   \href{http://cn.ece.cornell.edu/}{{\tt ece.cornell.edu/}}.
45:   Work supported by a scholarship from
46:   the Fulbright commission, and by the National Science Foundation, under
47:   awards CCR-0238271 (CAREER), CCR-0330059, and ANR-0325556.  Previous
48:   conference publications:~\cite{BarrosS:02b, BarrosS:03a, BarrosS:03d,
49:   BarrosS:05}.}}
50: \author{Jo\~{a}o Barros \hspace{2cm} Sergio D.\ Servetto}
51: \date{October 2, 2005.}
52: 
53: 
54: \begin{document}
55: \maketitle
56: 
57: \begin{picture}(0,0)
58: \put(0,220){\tt\small To appear in the IEEE Transactions on Information
59:   Theory.}
60: \end{picture}
61: 
62: \vspace{-13mm}
63: \begin{abstract}
64: \noindent\it
65: Consider the following network communication setup, originating
66: in a sensor networking application we refer to as the ``sensor
67: reachback'' problem.  We have a directed graph $G=(V,E)$, where
68: $V = \{v_0,v_1,\dots,v_M\}$ and $E\subseteq V\times V$.  If $(v_i,v_j)\in E$,
69: then node $v_i$ can send messages to node $v_j$ over a discrete memoryless
70: channel $(\mathcal{X}_{ij},p_{ij}(y|x),\mathcal{Y}_{ij})$, of capacity
71: $C_{ij}$.  The channels are independent.  Each node $v_i$ gets to
72: observe a source of information $U_i$ ($i=0...M$), with joint
73: distribution $p(U_0U_1...U_M)$.  Our goal is to solve an incast
74: problem in $G$: nodes exchange messages with their neighbors, and after
75: a finite number of communication rounds, one of the $M+1$ nodes ($v_0$
76: by convention) must have received enough information to reproduce
77: the entire field of observations $(U_0U_1...U_M)$, with arbitrarily
78: small probability of error.  In this paper, we prove that such perfect
79: reconstruction is possible if and only if
80: \[
81:   H(U_S|U_{S^c}) \;\;<\;\; \sum_{i\in S,j\in S^c} C_{ij},
82: \]
83: for all $S \subseteq \{0...M\}$, $S\neq\emptyset$, $0\in S^c$.  Our
84: main finding is that in this setup a general source/channel separation
85: theorem holds, and that Shannon information behaves as a classical
86: network flow, identical in nature to the flow of water in pipes.  At
87: first glance, it might seem surprising that separation holds in a
88: fairly general network situation like the one we study.  A closer look,
89: however, reveals the reason: our model allows
90: only for independent point-to-point channels between pairs of nodes,
91: and not multiple-access and/or broadcast channels, for which separation
92: is well known not to hold~\cite[pp.\ 448--449]{CoverT:91}.  This
93: ``information as flow'' view provides an algorithmic interpretation
94: for our results, among which perhaps the most important one is the
95: optimality of implementing codes using a {\em layered} protocol stack.
96: \end{abstract}
97: 
98: \vspace{1cm}
99: \pagebreak
100: 
101: 
102: \section{Introduction}
103: 
104: \subsection{The Sensor Reachback Problem}
105: 
106: Wireless sensor networks made up of small, cheap, and mostly unreliable
107: devices equipped with limited sensing, processing and transmission
108: capabilities, have recently sparked a fair amount of interest in
109: communications problems involving multiple correlated sources and
110: large-scale wireless networks~\cite{AkyildizSSC:02}.  It is envisioned
111: that an important class of applications for such networks involves a
112: dense deployment of a large number of sensors over a fixed area, in
113: which a physical process unfolds---the task of these sensors is then
114: to collect measurements, encode them, and relay them to some data
115: collection point where this data is to be analyzed, and possibly acted
116: upon.  This scenario is illustrated in Fig.~\ref{fig:reachback}.
117: 
118: \begin{figure}[ht]
119: \centerline{\psfig{width=12cm,height=4cm,file=reachback.eps}}
120: \caption{A large number of sensors is deployed over a target area.
121:   After collecting the data of interest, the sensors must {\it reach back}
122:   and transmit this information to a single receiver (e.g., an overflying
123:   plane) for further processing.}
124: \label{fig:reachback}
125: \end{figure}
126: 
127: There are several aspects that make this communications problem
128: interesting:
129: \begin{itemize}
130: \item {\it Correlated Observations:} If we have a large number of nodes
131:   sensing a physical process within a confined area, it is reasonable to
132:   assume that their measurements are correlated. This correlation may be
133:   exploited for efficient encoding/decoding.
134: \item {\it Cooperation among Nodes:} Before transmitting data to the
135:   remote receiver, the sensor nodes may establish a {\it conference}
136:   to exchange information over the wireless medium and increase their
137:   efficiency or flexibility through cooperation.
138: \item {\it Channel Interference:} If multiple sensor nodes use the wireless
139:   medium at the same time (either for conferencing or reachback), their
140:   signals will necessarily interfere with each other.  Consequently,
141:   reliable communication in a reachback network requires a set of rules
142:   that control (or exploit) the interference in the wireless medium.
143: \end{itemize}
144: 
145: In order to capture some of these key aspects, while still being able
146: to provide complete results, we make some modeling assumptions, discussed
147: next.
148: 
149: \subsubsection{Source Model}
150: 
151: We assume that the sources are memoryless, and thus consider only the
152: spatial correlation of the observed samples and not their temporal
153: dependence (since the latter dependencies could be dealt with by simple
154: extensions of our results to the case of ergodic sources).  Furthermore,
155: each sensor node $v_i$ observes only one component $U_i$ and must
156: transmit enough information to enable the sink node $v_0$ to reconstruct
157: the whole vector $U_1U_2\dots U_M$.  This assumption is the most natural
158: one to make for scenarios in which data is required at a remote location
159: for fusion and further processing, but the data capture process is
160: distributed, with sensors able to gather {\em local} measurements only,
161: and deeply embedded in the environment.
162: 
163: A conceptually different approach would be to assume that all sensor
164: nodes get to observe independently corrupted noisy versions of one and
165: the same source of information $U$, and it is this source (and not the
166: noisy measurements) that needs to be estimated at a remote location.
167: This approach seems better suited for applications involving non-homogeneous
168: sensors, where each one of the sensors gets to observe different
169: characteristics of the same source (e.g., multispectral imaging), and
170: therefore leads to a conceptually very different type of sensing
171: applications from those of interest in this work.  Such an approach
172: leads to the so-called {\it CEO problem} studied by Berger, Zhang and
173: Viswanathan in~\cite{BergerZV:96}.
174: 
175: \subsubsection{Independent Channels}
176: 
177: Our motivation to consider a network of independent DMCs is twofold.
178: 
179: From a purely information-theoretic point of view, independent channels
180: are interesting because, as shown in this paper, this assumption gives
181: rise to long Markov chains which play a central role in our ability to
182: prove the converse part of our coding theorem, and thus obtain conclusive
183: results in terms of capacity.  Moreover, a corollary of said coding
184: theorem does provide a conclusive answer for a special case of the
185: multiple access channel with correlated sources, a problem for which
186: no general converse is known.
187: 
188: From a more practical point of view, the assumption of independent
189: channels is valid for any network that controls interference by means
190: of a reservation-based medium-access control protocol (e.g., TDMA).
191: This option seems perfectly reasonable for sensor networking scenarios
192: in which sensors collect data over extended periods of time, and must
193: then transmit their accumulated measurements simultaneously.  In this
194: case, a key assumption in the design of standard random access techniques
195: for multiaccess communication breaks down---the fact that individual
196: nodes will transmit with low probability~\cite[Chapter~4]{BertsekasG:92}.
197: As a result, classical random access would result in too many collisions
198: and hence low throughput.  Alternatively, instead of {\em mitigating}
199: interference, a medium access control (MAC) protocol could attempt to
200: {\em exploit} it, in the form of using cooperation among nodes to generate
201: waveforms that add up constructively at the receiver (cf.~\cite{HuS:03c,
202: HuS:03b, HuS:05}).  An information-theoretic analysis of such
203: cooperation mechanisms would be very desirable, but it entails
204: dealing with correlated sources and a general multiple access channel.
205: Dealing with correlated sources and an array of independent channels
206: constitutes a reasonable first step towards that goal, and is also
207: interesting in its own right, since it provides the ultimate performance
208: limits for an important class of sensor networking problems.
209: 
210: \subsubsection{Perfect Reconstruction at the Receiver}
211: 
212: In our formulation of the sensor reachback problem, the far receiver
213: is interested in reconstructing the entire field of sensor measurements
214: with arbitrarily small probability of error.  This formulation leads
215: us to a natural {\em capacity} problem, in the classical sense of
216: Shannon.  Alternatively, one could relax the condition of perfect
217: reconstruction, and tolerate some distortion in the reconstruction
218: of the field of measurements at the far receiver, thus leading to
219: the so-called {\em Multiterminal Source Coding} problem studied by
220: Berger~\cite{Berger:78}.  This condition could be further relaxed,
221: to require a faithful reproduction of the {\em image} of some function
222: $f$ of the sources, leading to a problem studied extensively by
223: Csisz\'ar, K\"orner and Marton~\cite{CsiszarK:80, KoernerM:79}.
224: 
225: \subsection{An Information Theoretic View of Architectural Issues}
226: 
227: For large-scale, complex systems of the type of interest in this work,
228: the complexity of basic questions of design and performance analysis
229: appears daunting:
230: \begin{itemize}
231: \item How should nodes cooperate to relay messages to the data collector
232:   node $v_0$?  Should they decode received messages, re-encode them, and
233:   forward to other nodes?  Should they map channel outputs to channel
234:   inputs without attempting to decode?  Should they do something else?
235: \item How should redundancy among the sources be exploited?  Should we
236:   compress the information as much as possible?  Should we leave some
237:   of that redundancy to combat noise in the channels?  Is there a
238:   source/channel separation theorem in these networks?
239: \item How do we measure performance of these networks, what are appropriate
240:   cost metrics?  How do we design networks that are efficient under an
241:   appropriate cost metric?
242: \end{itemize}
243: In~\cite{KawadiaK:04}, a number of examples are identified in which
244: the existence of a simple architecture has played an enabling role in
245: the proliferation of technology: the von Neumann computer architecture,
246: separation of source and channel coding in communications, separation
247: of plant and controller in control systems, and the OSI layered
248: architecture model.  So what all these questions boil down to is
249: an issue similar to those considered in~\cite{KawadiaK:04}: what are
250: appropriate abstractions of the network, similar to the IP protocol
251: stack for the Internet, based on which we can break the design task
252: into independent reusable components, optimize the design of these
253: components, and obtain an {\em efficient} system as a result?  In
254: this work, we show how information theory is indeed capable of
255: providing very meaningful answers to this problem.
256: 
257: Information theory, in one of its applications, deals with the analysis
258: of performance of communication systems.  So, to some it may seem the
259: natural theory to turn to for guidance in the task of searching for a
260: suitable network architecture.  However, to others it may seem unnatural
261: to do so: it is well known that information theory and communication
262: networks have not had fruitful interactions in the past, as explained
263: by Ephremides and Hajek~\cite{EphremidesH:98}.  Thus, in the presence
264: of these mixed indicators, we take the stand that indeed information
265: theory has a great deal to offer in the task at hand.  And to justify
266: our position, consider Shannon's model for a communications system, as
267: illustrated in Fig.~\ref{fig:shannon-pt2pt}.
268:  
269: \begin{figure}[!ht]
270: \centerline{\psfig{file=shannon-model.eps,height=5cm,width=14cm}}
271: \caption{\small Shannon's model for a point-to-point system.  Top
272:   figure: abstract view, consisting of a source, an
273:   encoder from source symbols to channel symbols, a conditional
274:   probability distribution to model the random dependence of outputs
275:   on inputs, and a decoder to map from received messages back to
276:   source symbols; bottom figure: a capacity-achieving architecture
277:   for this system, in which error control codes are used to create
278:   an illusion of a noiseless bit pipe.}
279: \label{fig:shannon-pt2pt}
280: \end{figure}
281:  
282: For this setup, Shannon established that reliable communication of
283: a source over a noisy channel is possible if and only if the entropy
284: rate of the source is less than the capacity of the
285: channel~\cite[Ch.\ 8.13]{CoverT:91}.  This result, known as the
286: source/channel separation theorem, has a double significance.  On
287: one hand, it provides an exact single-letter characterization of
288: conditions under which reliable communication is possible.  On the
289: other hand, and of particular interest for the task at hand,
290: it is a statement about the {\em architecture} of an optimal
291: communication system: the encoder/decoder design task can be split
292: into the design and optimization of two independent components.
293: Inspired by Shannon's teachings for point-to-point
294: systems, we ask in this work, and answer in the affirmative,
295: the question of whether it is possible to derive similarly
296: useful architectural guidelines for the class of networks under
297: consideration.
298: 
299: \subsection{Related Work}
300: \label{sec:related-work}
301: 
302: The problem of communicating distributed correlated sources over a
303: network of point-to-point links is closely related to several classical
304: problems in network information theory.  To set the stage for the main
305: contributions of this paper, we now review related previous work.
306: 
307: \subsubsection{Distributed Correlated Sources and Multiple Access}
308: 
309: The concept of separate encoding of correlated sources was studied by
310: Slepian and Wolf in their seminal paper~\cite{SlepianW:73b},
311: where they proved that two correlated sources $(U_1U_2)$ drawn i.i.d.\
312: $\sim p(u_1u_2)$ can be compressed at rates $(R_1,R_2)$ if and only if
313: \begin{eqnarray*}
314: R_1 & \geq & H(U_1|U_2) \\
315: R_2 & \geq & H(U_2|U_1) \\
316: R_1+R_2 & \geq & H(U_1U_2).
317: \end{eqnarray*}
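
As a hedged illustration of this region (the particular source below is our
own choice of example, not one taken from~\cite{SlepianW:73b}), let $U_1$ be
uniform on $\{0,1\}$ and let $U_2 = U_1\oplus Z$, where
$Z\sim\mathrm{Bernoulli}(p)$ is independent of $U_1$.  Writing
$h(p) = -p\log_2 p-(1-p)\log_2(1-p)$ for the binary entropy function, we have
$H(U_1|U_2)=H(U_2|U_1)=h(p)$ and $H(U_1U_2)=1+h(p)$.  For $p=0.11$, where
$h(p)\approx 0.5$, the three conditions above become
\begin{eqnarray*}
R_1 & \geq & h(0.11) \;\approx\; 0.5 \\
R_2 & \geq & h(0.11) \;\approx\; 0.5 \\
R_1+R_2 & \geq & 1+h(0.11) \;\approx\; 1.5.
\end{eqnarray*}
Compressing each source while ignoring the correlation would instead require
$R_1+R_2\geq H(U_1)+H(U_2)=2$ bits per pair of source symbols, so in this
example separate encoders that exploit the correlation save about half a bit
per pair.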
318: 
319: Assume now that $(U_1U_2)$ are to be transmitted with arbitrarily
320: small probability of error to a joint receiver over a multiple access
321: channel with transition probability $p(y|x_1x_2)$.  Knowing that the
322: capacity region of the multiple access channel with independent
323: sources is given by the convex hull of the set of points $(R_1,R_2)$
324: satisfying~\cite[Ch.\ 14.3]{CoverT:91}
325: \begin{eqnarray*}
326: R_1&<&I(X_1;Y|X_2) \\
327: R_2&<&I(X_2;Y|X_1) \\
328: R_1+R_2&<&I(X_1X_2;Y),
329: \end{eqnarray*}
330: it is not difficult to prove that Slepian-Wolf source coding of
331: $(U_1U_2)$ followed by separate channel coding yields the following
332: {\em sufficient} conditions for reliable communication
333: \begin{eqnarray*}
334: H(U_1|U_2) & < & I(X_1;Y|X_2) \\
335: H(U_2|U_1) & < & I(X_2;Y|X_1) \\
336: H(U_1U_2) & < & I(X_1X_2;Y).
337: \end{eqnarray*}
338: These conditions, which basically state that the Slepian-Wolf region
339: and the capacity region of the multiple access channel have a non-empty
340: intersection, are sufficient but not necessary for reliable communication,
341: as shown by Cover, El Gamal, and Salehi with a simple counterexample
342: in~\cite{CoverGS:80}.  In that same paper, the authors introduce a class
343: of {\it correlated} joint source/channel codes, which enables them to
344: increase the region of achievable rates to
345: \begin{eqnarray}
346: H(U_1|U_2)&<&I(X_1;Y|X_2U_2) \label{eq:cegs1}\\
347: H(U_2|U_1)&<&I(X_2;Y|X_1U_1) \label{eq:cegs2}\\
348: H(U_1U_2)&<&I(X_1X_2;Y), \label{eq:cegs3}
349: \end{eqnarray}
350: for some
351: $p(u_1u_2x_1x_2y)=p(u_1u_2)\cdot p(x_1|u_1)\cdot p(x_2|u_2)\cdot p(y|x_1x_2)$.
352: Also in~\cite{CoverGS:80}, the authors generalize this set of sufficient
353: conditions to sources $(U_1U_2)$ with a common part $W=f(U_1)=g(U_2)$,
354: but they were not able to prove a converse, i.e., they were not able
355: to show that their region is indeed the capacity region of the multiple
356: access channel with correlated sources.  Later, it was shown with a
357: carefully constructed example by Dueck in~\cite{Dueck:81} that indeed
358: the region defined by eqns.~(\ref{eq:cegs1})-(\ref{eq:cegs3}) is not
359: tight.  Related problems were considered by Slepian and
360: Wolf~\cite{SlepianW:73}, and Ahlswede and Han~\cite{AhlswedeH:83}.  
361: To this date however, the general problem still
362: remains open.
363: 
364: Assuming independent sources, Willems investigated a cooperative
365: scenario, in which encoders exchange messages over {\em conference}
366: links of limited capacity prior to transmission over the multiple
367: access channel~\cite{Willems:83}.  In this case, the capacity region
368: is given by
369: \begin{eqnarray*}
370: R_1&<&I(X_1;Y|X_2Z)+C_{12} \\
371: R_2&<&I(X_2;Y|X_1Z)+C_{21} \\
372: R_1+R_2&<&\min\{\;I(X_1X_2;Y|Z)+C_{12}+C_{21},\;I(X_1X_2;Y)\;\},
373: \end{eqnarray*}
374: for some auxiliary random variable $Z$ such that
375: $|\mathcal{Z}|\leq\min(|\mathcal{X}_1|\cdot|\mathcal{X}_2|+2,|\mathcal{Y}|+3)$,
376: and for a joint distribution $p(zx_1x_2y)
377: = p(z)\cdot p(x_1|z)\cdot p(x_2|z)\cdot p(y|x_1x_2)$.
378: 
379: \subsubsection{Correlated Sources and Networks of DMCs}
380: 
381: Very recently, an early paper was brought to our attention, in which 
382: Han considers the transmission of correlated sources to a common sink
383: over a network of independent channels~\cite{Han:80}.  Although the
384: problem setup is less general than ours, in that (a) each source block
385: and each transmitted codeword participates only once in the encoding
386: process, and (b) the intermediate nodes are assumed to decode the
387: data before passing it on, Theorem 3.1 of~\cite{Han:80} is very similar
388: to our Theorem~\ref{thm:main}.
389: 
390: Our work, done independently of Han's, differs from it and complements
391: it in the following ways:
392: \begin{itemize}
393: \item Our setup is more general.  We allow for arbitrary forms of joint
394:   source-channel coding to take place inside the network while data flows
395:   towards the decoder, and then {\em prove} that a one-step encoding
396:   process, pure routing, and separate source/channel coding are sufficient.
397:   Han assumes decode-and-forward in his problem statement, as well as
398:   a one-step encoding process.
399: \item The proof techniques are different.  Han takes a purely combinatorial
400:   approach to the problem: he thoroughly exploits the polymatroidal
401:   structure of the capacity function for the network of channels, and the
402:   co-polymatroidal structure for the Slepian-Wolf region.  We establish our
403:   achievability result by explicitly constructing a routing algorithm for
404:   the Slepian-Wolf indices, and our converse by standard methods based on
405:   Fano's inequality.
406: \end{itemize}
407: Furthermore, our work, being motivated by a concrete sensor networking
408: application, establishes connections and relevance to practical engineering
409: problems (see Section~\ref{sec:protocol-stack}) that are not a concern
410: in~\cite{Han:80}.
411: 
412: \subsubsection{Network Coding}
413: 
414: Another closely related problem is the well known {\em network coding}
415: problem, introduced by Ahlswede, Cai, Li and Yeung~\cite{AhlswedeCLY:00}.
416: In that work, the authors establish the need for applying coding
417: operations at intermediate nodes to achieve the max-flow/min-cut bound
418: of a general multicast network.  A converse proof for this problem
419: was provided by Borade~\cite{Borade:02}.  Linear codes were proposed
420: by Li, Yeung and Cai in~\cite{LiYC:03}, and Koetter and M\'edard
421: in~\cite{KoetterM:03}.
422: 
423: Effros, M\'edard et al.\ have developed a comprehensive study on separate
424: and joint design of linear source, channel and network codes for networks
425: with correlated sources under the assumption that all operations are
426: defined over a common finite field~\cite{EffrosMHRKK:03}.  For this
427: particular case, optimality of separate linear source and channel coding
428: was observed in the one-receiver instance, but the result
429: of~\cite{EffrosMHRKK:03} does not prove that it holds for general networks
430: and channels with arbitrary input and output alphabets.  Error exponents
431: for multicasting of correlated sources over a network of noiseless channels
432: were given by Ho, M\'edard et al.\ in~\cite{HoMEK:04}, and networks with
433: undirected links were considered by Li and Li in~\cite{LiL:04}.
434: 
435: Another problem in which network flow techniques have been found useful
436: is that of finding the maximum stable throughput in certain networks.  In
437: this problem, posed by Gupta and Kumar in~\cite{GuptaK:00}, it is sought
438: to determine the maximum rate at which nodes can inject bits into a
439: network, while keeping the system stable.  This problem was reformulated
440: by Peraki and Servetto as a multicommodity flow problem, for which tight
441: bounds were obtained using elementary counting
442: techniques~\cite{PerakiS:03, PerakiS:04}.
443: 
444: \subsection{Main Contributions and Organization of the Paper}
445: 
446: Our main original contributions can be summarized as follows:
447: \begin{itemize}
448: \item A general coding theorem yielding necessary and sufficient
449:   conditions for reliable communication of $M+1$ correlated sources
450:   to a common sink over a network of independent DMCs.
451: \item An achievability proof which combines classical coding arguments
452:   with network flow methods and a converse proof that establishes
453:   the optimality of separate source and channel coding. 
454: \item A detailed discussion on the engineering implications of our
455:   main result, and the concepts of information-theoretically optimal
456:   network architectures and protocol stacks.
457: \end{itemize}
458: 
459: The rest of the paper is organized as follows.  In
460: Section~\ref{sec:coding-theorems} we give formal definitions, to then
461: state and prove our main theorem.  We also look at three special cases:
462: a network with three nodes, the non-cooperative case, and an array of
463: orthogonal Gaussian channels.  In Section~\ref{sec:protocol-stack} we
464: address the practical implications of our main result, by describing
465: an information-theoretically optimal protocol stack, elaborating on the
466: tractability of related network architecture and network optimization
467: problems, and discussing the suboptimality of correlated codes for
468: orthogonal channels.  The paper concludes with
469: Section~\ref{sec:conclusions}.
470: 
471: 
472: \section{A Coding Theorem for Network Information Flow with Correlated
473:   Sources}
474: \label{sec:coding-theorems}
475: 
476: \subsection{Formal Definitions and Statement of the Main Theorem}
477: 
478: A {\em network} is modeled as the complete graph on $M+1$ nodes.
479: For each $(v_i,v_j)\in E$ ($0\leq i,j\leq M$), there is a discrete
480: memoryless channel $(\mathcal{X}_{ij},p_{ij}(y|x),\mathcal{Y}_{ij})$,
481: with capacity $C_{ij} = \max_{p_{ij}(x)} I(X_{ij};Y_{ij})$.\footnote{Note
482: that $C_{ij}$ could potentially be zero, thus assuming a complete graph
483: does not mean necessarily that any node can send messages to any other
484: node in one hop.}
485: At each node $v_i\in V$, a random variable $U_i$ is observed
486: ($i=0...M$), drawn i.i.d.\ from a known joint distribution
487: $p(U_0U_1...U_M)$.  Node $v_0$
488: is the {\em decoder} -- the goal in this problem is to
489: find conditions under which $U_1...U_M$ can be reproduced reliably at
490: $v_0$.  We now make this statement more precise, by describing how the
491: nodes communicate and by giving the formal definitions of code,
492: probability of error and reliable communication.
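
As a simple illustration of the role of $C_{ij}$ (the binary symmetric
channel is our choice of example here, not an assumption of the model),
suppose the channel from $v_i$ to $v_j$ is a binary symmetric channel with
crossover probability $\epsilon_{ij}$.  Then
\[
  C_{ij} \;=\; 1 - h(\epsilon_{ij}),
  \qquad
  h(\epsilon) \;=\; -\epsilon\log_2\epsilon-(1-\epsilon)\log_2(1-\epsilon),
\]
so that $\epsilon_{ij}=1/2$ gives $C_{ij}=0$ (no usable link from $v_i$ to
$v_j$, even though the edge is present in the complete graph), while
$\epsilon_{ij}=0$ gives a noiseless link carrying one bit per channel use.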
493: 
494: Time is discrete.  Every $N$ time steps, node $v_i$ collects a block
495: $U_i^N$ of source symbols -- we refer to the collection of all blocks
496: $[U_0^N(k)U_1^N(k)...U_M^N(k)]$ collected at time $kN$ ($k\geq 1$) as
497: a {\em block of snapshots}.  Node $v_i$ then sends a codeword
498: ${X}_{ij}^N$ to node $v_j$.  This codeword depends on a {\em window}
499: of $K$ previous blocks of source sequences $U_i^N$ observed at node
500: $v_i$, and of $T$ previously received blocks of channel outputs,
501: corresponding to noisy versions of the codewords sent by all nodes to
502: node $v_i$ in the previous $T$ communications steps (corresponding to
503: $NT$ time steps).
504: 
505: For a block of snapshots observed at time $kN$, at time $(k+W)N$ (that
506: is, after allowing for a finite but otherwise arbitrary amount of time
507: to elapse,\footnote{During the time that a block of snapshots spends
508: within the network, arbitrarily complex coding operations are allowed
509: within the pipeline: nodes can exchange information, redistribute their
510: load, and in general perform any form of joint source-channel coding
511: operations.  The only constraint imposed is that all information
512: eventually be delivered to destination, within a finite time horizon.}
513: in which the information injected by all nodes 
514: reaches $v_0$), an attempt is made to decode at $v_0$.  The decoder produces
515: an estimate of the block of snapshots $U_0^N(k)U_1^N(k)...U_M^N(k)$ based on
516: the local observations $U_0^N(k)$, and the previous $W$ blocks of $N$
517: channel outputs generated by codewords sent to $v_0$ by the other nodes.
518: 
519: Thus, a {\em code} for this network consists of:
520: \begin{itemize}
521: \item four integers $N$, $K$, $T$ and $W$;
522: \item encoding functions at each node
523:   \[ g_{ij}:\bigotimes_{l=1}^K\mathcal{U}_i^N \times
524:             \bigotimes_{t=1}^T\bigotimes_{m=0}^M \mathcal{Y}_{mi}^N
525:             \longrightarrow \mathcal{X}_{ij}^N,
526:   \]
527:   for $0 \leq i, j \leq M$.
528: \item the decoding function at node $v_0$:
529:   \[ h: \mathcal{U}_0^N \times
530:         \bigotimes_{w=1}^W\bigotimes_{m=1}^M \mathcal{Y}_{m0}^N
531:         \longrightarrow \bigotimes_{m=1}^M \hat{\mathcal{U}}_m^N.
532:   \]
533: \item the block probability of error:
534:   \[ P_e^{(N)} = P(U_1^N...U_M^N\neq\hat{U}_1^N...\hat{U}_M^N). \]
535: \end{itemize}
536: 
537: We say that blocks of snapshots $U_1^N...U_M^N$ can be
538: {\em reliably communicated} to $v_0$ if there exists a sequence of
539: codes as above, with $P_e^{(N)}\to 0$ as $N\to\infty$, for some finite
540: values $K$, $T$ and $W$, all independent of $N$.
541: 
542: With these definitions, we are now ready to state our main theorem.
543: 
544: \begin{theorem}
545: Let $S$ denote a non-empty subset of node indices that does not contain
546: node $0$: $S \subseteq \{0...M\}$, $S\neq\emptyset$, $0\in S^c$.  Then,
547: it is possible to communicate $U_1...U_M$ reliably to $v_0$ if and
548: only if, for all $S$ as above,
549: \begin{equation}
550:  H(U_S|U_{S^c}) \;\;<\;\; \sum_{i\in S,j\in S^c} C_{ij}.
551:   \label{eq:main}
552: \end{equation}
553: \label{thm:main}
554: \end{theorem}
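
To make the statement concrete, consider as an illustration the smallest
non-trivial instance, $M=2$ (two sensor nodes $v_1$, $v_2$ and the decoder
$v_0$).  The admissible sets are $S=\{1\}$, $S=\{2\}$ and $S=\{1,2\}$, with
complements $S^c=\{0,2\}$, $S^c=\{0,1\}$ and $S^c=\{0\}$ respectively, and
writing out condition~(\ref{eq:main}) for each of them gives
\begin{eqnarray*}
H(U_1|U_0U_2) & < & C_{10}+C_{12} \\
H(U_2|U_0U_1) & < & C_{20}+C_{21} \\
H(U_1U_2|U_0) & < & C_{10}+C_{20}.
\end{eqnarray*}
Each right-hand side is the total capacity of the links leaving $S$: for a
single node, both its direct link to $v_0$ and its link to the other sensor
count, whereas for $S=\{1,2\}$ the links $C_{12}$ and $C_{21}$ stay inside
$S$ and do not contribute.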
555: 
556: \subsection{Achievability Proof}
557: 
558: Our coding strategy is based on separate source and channel coding.
559: We first use capacity-attaining channel codes to turn the noisy network
560: into a network of noiseless links (of capacity $C_{ij}$).  Then, we
561: use Slepian-Wolf source codes, jointly with a custom-designed routing
562: algorithm, to deliver all this data to the destination.  Since the channel
563: coding aspects of the proof are rather straightforward extensions of
564: classical point-to-point arguments, in the following we only focus on
565: the less obvious source coding and routing aspects.
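
The following informal sketch indicates how the two ingredients fit together;
it assumes that the standard multi-source Slepian-Wolf conditions and the
max-flow/min-cut theorem apply in their usual form, and it is not a
substitute for the proof.  Write $R_i$ for the rate, in bits per source
symbol, of the Slepian-Wolf code used at node $v_i$, and let $S$ be any
non-empty subset of $\{1...M\}$, with $S^c$ its complement in $\{0...M\}$.
A rate allocation strictly inside the Slepian-Wolf region, with $U_0$
available at the decoder, satisfies
\[
  \sum_{i\in S} R_i \;\;>\;\; H(U_S|U_{S^c}),
\]
while any routing of these bits over noiseless links of capacity $C_{ij}$
cannot push more out of $S$ than the links leaving $S$ can carry:
\[
  \sum_{i\in S} R_i \;\;\leq\;\; \sum_{i\in S,j\in S^c} C_{ij}.
\]
Chaining the two inequalities yields exactly condition~(\ref{eq:main}).  The
substance of the achievability argument is the opposite direction: showing
that whenever~(\ref{eq:main}) holds for all $S$, rates $(R_1...R_M)$ and a
routing of the corresponding bits satisfying both requirements do exist.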
566: 
567: \subsubsection{Mechanics of the Coding Strategy}
568: 
569: Consider a ``noise-free'' version of the problem formulated above: we
570: still have a complete graph, now with {\em noiseless} links of capacity
571: $C_{ij}$.  Variables $U_i$ are still observed at each node $v_i$, and
572: the goal remains to reproduce all of these at $v_0$.  Each node uses
573: a classical Slepian-Wolf code: there is a source encoder at node $v_i$
574: that maps a sequence $U_i^N$ to an index from the random binning set
575: $\{1,2,\dots,2^{NR_i}\}$, thus compressing the block of observations
576: $U_i^N$ using codes as in~\cite[Thm.\ 14.4.2]{CoverT:91}.  Let
577: $(R_1...R_M)$ denote the rate allocation to each of the nodes.  To
578: achieve perfect reconstruction, these bits must be delivered to node
579: $v_0$.
580: 
581: \begin{itemize}
582: \item Set $K=T=1$ -- each block of source symbols and each block of
583:   codewords participates in the encoding process only once.
584: \item To deliver the bin indices produced by the Slepian-Wolf codes
  to the destination, the noise-free network is regarded as a flow
586:   network~\cite[Ch.\ 26]{CormenLRS:01}.
587:   Let $\flow(v_i,v_j)$ be a feasible flow in this network, with $M$ sources
588:   $v_1...v_M$, supply $R_i$ at source $v_i$, and a single sink $v_0$.
589:   If no such feasible flow exists, the code construction fails.
590: \item If there is a feasible flow $\flow$ then this $\flow$ uniquely
591:   determines, at each node $v_i$, the number of bits that need to be
592:   sent to each of its neighbors -- thus from $\flow$ we derive the
593:   encoding functions $g_{ij}$ as follows:
594:   \begin{itemize}
595:   \item Consider the directed {\em acyclic} graph $G'$ of $G$ induced by
596:     $\flow$, by taking $V(G') = V(G)$, and
597:     $E(G')=\{(v_i,v_j)\in E:\flow(v_i,v_j)>0\}$.  Define a permutation
598:     $\pi:\{0...M\}\to\{0...M\}$, such that
599:     $[v_{\pi(0)}v_{\pi(1)}...v_{\pi(M)}]$ is a {\em topological sort} of
    the nodes in $G'$, as illustrated in Fig.~\ref{fig:topological-sort}.
601:     \begin{figure}[ht]
602:     \centerline{\psfig{file=topological-sort.eps,width=16cm,height=2.5cm}}
603:     \vspace{-5mm}
604:     \caption{\small A topological sort of the nodes of a directed acyclic
605:       graph is a linear ordering $v_1...v_M$ such that if $(v_i,v_j)$ is
606:       an edge, then $i<j$.}
607:     \label{fig:topological-sort}
608:     \end{figure}
609:   \item Consider a block of snapshots
610:     ${\bf U}(k)=[U_0^N(k)U_1^N(k)...U_M^N(k)]$ captured at time $kN$.
611:     At time $(k+l)N$ (for $l=0...M$), node $v_{\pi(l)}$ will have received
612:     all bits with portions of the encodings of ${\bf U}(k)$ generated by
613:     nodes upstream in the topological order -- thus, together with its own
614:     encoding of $U^N_{\pi(l)}(k)$, all the bits for ${\bf U}(k)$ up
615:     to and including node $v_{\pi(l)}$ will be available there, and thus
616:     can be routed to nodes downstream in the topological order.
617:   \item Consider now all edges of the form $(v_{\pi(k)},v')$ for which
618:     $\flow(v_{\pi(k)},v') > 0$:
619:     \begin{enumerate}
620:     \item Collect the $m=\sum_{v'} \flow(v',v_{\pi(k)})$ information bits
621:       sent by the upstream nodes $v'$.
622:     \item Consider now the set of all downstream nodes $v''$, for which
623:       $\flow(v_{\pi(k)},v'') > 0$.  Due to flow conservation for $\flow$,
624:       $\sum_{v''} \flow(v_{\pi(k)},v'')=m+R_{\pi(k)}$, where
625:       $R_{\pi(k)}$ is the rate allocated to node $v_{\pi(k)}$.
626:     \item For each $v''$ as above, define $g_{\pi(k)v''}^{(k)}$ to be a
      message such that $|g_{\pi(k)v''}^{(k)}| = \flow(v_{\pi(k)},v'')$.
628:       Partition the $m+R_{\pi(k)}$ available bits according to the values
629:       of $\flow$, and send them downstream, as illustrated in
630:       Fig.~\ref{fig:shuffle}.
631:       \begin{figure}[!ht]
632:       \centerline{\psfig{file=shuffle.eps,height=2.5cm,width=12cm}}
633:       \vspace{-3mm}
634:       \caption{\small To illustrate the operations performed at each node.
635:         In this example, five bits come into node $v_{\pi(k)}$ from 
636:         neighbouring nodes, two on the top link and three on the bottom
637:         link.  The information bits from other nodes come in the form of
638:         noisy codewords -- they need to be decoded from the received channel
639:         outputs.  Now, because flow conservation holds for $\flow$, we
640:         know that the aggregate capacity of the three output links will
641:         be at least five bits plus some local bits (the encoding of a
642:         block of local observations $U^N_{\pi(k)}$, denoted by $b_6$ and
643:         $b_7$ here).  So at this point we split those bits in a way such
644:         that the individual capacity constraints of the output links are
645:         not violated, and then they are sent on their way to $v_0$.}
646:       \label{fig:shuffle}
647:       \end{figure}
648:     \end{enumerate}
649:   \end{itemize}
650: \item To decode, at time $(k+M)N$, node $v_0$ does the following:
651:   \begin{itemize}
652:   \item Decode all channel outputs received at time $(k+M-1)N$, to recover
653:     the bits sent by each 1-hop neighbor of the sink.
654:   \item Reassemble the set of bin indices from the segments received
655:     from each neighbor.
656:   \item Perform typical set decoding (as in~\cite[pg.\ 411]{CoverT:91}),
    to recover the block of snapshots $[U_1^N(k)...U_M^N(k)]$.
658:   \end{itemize}
659: \end{itemize}
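
To make the splitting step concrete, consider the following hypothetical
instance; the numbers are chosen purely for illustration and, beyond the
bit counts mentioned in the caption of Fig.~\ref{fig:shuffle}, are not
taken from the figure.  Suppose node $v_{\pi(k)}$ receives $m=2+3=5$ bits
$b_1...b_5$ from its two upstream neighbors and produces $R_{\pi(k)}=2$
local bits $b_6,b_7$, and assume that $\flow$ assigns $3$, $2$ and $2$
units of flow to its three outgoing links, towards downstream nodes that
we call $v''_1$, $v''_2$ and $v''_3$.  Flow conservation is satisfied,
since
\[
  \underbrace{3+2+2}_{\sum_{v''}\flow(v_{\pi(k)},v'')}
  \;=\; \underbrace{5}_{m} \;+\; \underbrace{2}_{R_{\pi(k)}},
\]
and one admissible partition of the available bits is
\[
  (b_1,b_2,b_3)\to v''_1, \qquad
  (b_4,b_5)\to v''_2, \qquad
  (b_6,b_7)\to v''_3.
\]
Any other partition into groups of sizes $3$, $2$ and $2$ would serve
equally well; what matters is only that the partition is fixed in advance
and known to $v_0$, so that the bin indices can be reassembled.
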
660: An important observation is that, in this setup, network coding (in the
661: sense of~\cite{AhlswedeCLY:00}) is not needed.  This is because we have
662: a case of $M$ sources and a single sink interested in collecting all
663: messages, a case for which it was shown in~\cite{LehmanL:04} that routing
664: alone suffices.
665: 
666: Our next task is to find conditions under which this coding strategy
667: results in $P_e^{(N)}\to 0$ as $N\to\infty$.
668: 
669: \subsubsection{Analysis of the Probability of Error}
670: 
671: The coding strategy proposed above hinges on two main elements:
672: \begin{itemize}
673: \item Slepian-Wolf codes: in this case, we know that provided the rate
674:   vector $(R_1...R_M)$ is such that, for all partitions $S$ of $\{0...M\}$,
675:   $S\neq\emptyset$, $0\in S^c$,
676:   \begin{equation}
677:   \sum_{i\in S} R_i > H(U_S|U_{S^c}),
678:   \label{eq:achievability1}
679:   \end{equation}
680:   then there exist Slepian-Wolf codes with arbitrarily low probability of
681:   error~\cite[Ch.\ 14.4]{CoverT:91}.
682: \item Network flows: from elementary flow concepts we know that if a
683:   flow $\flow$ is feasible in a network $G$, then for all
684:   $S\subseteq\{0...M\}$, $S\neq\emptyset$, $0\in S^c$,
685:   \begin{eqnarray}
686:   \sum_{i\in S} R_i
687:      & \stackrel{(a)}{=} & \sum_{i\in S,j\in V} \flow(v_i,v_j) \nonumber \\
688:      & \stackrel{(b)}{=} & \sum_{i\in S,j\in S^c} \flow(v_i,v_j) \nonumber \\
689:      & \stackrel{(c)}{\leq} & \sum_{i\in S,j\in S^c} C_{ij},
690:      \label{eq:achievability2}
691:   \end{eqnarray}
692:   where $(a)$ and $(b)$ follow from the flow conservation properties of a
693:   feasible flow (all the flow injected by the sources has to go somewhere
694:   in the network, and in particular all of it has to go across a network
695:   cut with the destination on the other side); and $(c)$ follows from the
696:   fact that in any flow network, the capacity of any cut is an upper bound
697:   to the value of any flow.
698: \end{itemize}
699: Thus, from~(\ref{eq:achievability1}) and~(\ref{eq:achievability2}), we
700: conclude that if, for all partitions $S$ as above, we have that
701: \begin{equation}
702:   H(U_S|U_{S^c}) < \sum_{i\in S,j\in S^c} C_{ij},
703: \end{equation}
704: then $P_e^{(N)}\to 0$ as $N\to\infty$.
705: 
706: \subsection{Converse Proof}
707: 
708: The converse proof is fairly long and tedious, but by virtue of being
709: based on Fano's inequality and standard information-theoretic arguments,
710: it is relatively straightforward -- therefore, we omit it here and
711: provide the technical details in Appendix~\ref{app:proof-converse-mcoop}.
712: At this point however, we would like to sketch out an informal argument
713: on why this converse should hold.
714: 
715: Consider an arbitrary network partition $S$ of $\{0...M\}$,
716: $S\neq\emptyset$, $0\in S^c$.  For each such partition we define a
717: two-terminal system, with a ``supersource'' that has access to the
718: whole vector of observations $U_1...U_M$, and a ``supersink'' that
719: has access only to $U_{S^c}$.  The supersource and supersink are
720: connected by an array of parallel DMCs: if $i\in S$ and $j\in S^c$,
721: then $(\mathcal{X}_{ij},p_{ij}(y|x),\mathcal{Y}_{ij})$ from the
722: network is one of the channels in the array.  This is illustrated
723: in Fig.~\ref{fig:oracle}.
724: 
725: \begin{figure}[!ht]
726: \centerline{\psfig{file=oracle.eps,height=3cm,width=15cm}}
727: \caption{\small An artificial two-terminal system: all sources in $S$
728:   are treated as a supersource, connected to a supersink made of all
729:   the sinks in $S^c$ by an array of DMCs (those going across the cut).
730:   Intuitively, any necessary condition for this system should also be
731:   necessary for our system (although this requires a formal statement
732:   and proof).  The interesting statement thus is to show that the
733:   set of all conditions obtained in this form (by considering all
734:   possible cuts) is also sufficient.}
735: \label{fig:oracle}
736: \end{figure}
737: 
738: Clearly, $H(U_S|U_{S^c}) < \sum_{i\in S,j\in S^c} C_{ij}$ is an outer
739: bound for this two-terminal system (follows directly from the source/channel
740: separation theorem,~\cite[Sec.\ 8.13]{CoverT:91}).  And intuitively,
741: it is also clear that any outer bound for this two-terminal system
742: provides necessary conditions for reliable communication to be possible
743: in our network.  Thus, by considering all possible partitions $(S,S^c)$
744: as above, we obtain a set of necessary conditions matching those of the
745: achievability result.\footnote{We thank our Reviewer B, for suggesting
746: this simple and very clear interpretation for the converse.}
747: 
748: We would also like to highlight that, because of the correlation between
sources, a simple max-flow/min-cut bounding argument (as suggested
in~\cite[Sec.\ 14.10]{CoverT:91}) is not sufficient to establish the
751: source-channel separation result we seek -- proving said result requires
752: all the steps of a typical converse.
753: 
754: A formal proof for this converse is provided in
755: Appendix~\ref{app:proof-converse-mcoop}.
756: 
757: \subsection{Special Cases}
758: 
759: \subsubsection{A Network with Three Nodes}
760: \label{sec:three-nodes}
761: 
762: To provide an illustration of the meaning of Theorem~\ref{thm:main}, and
763: of the optimality of the flow-based solution, we specialize
764: Theorem~\ref{thm:main} to the case of a network with three nodes.  In
765: this case, those conditions become:
766: \begin{eqnarray}
767: H(U_1|U_2U_0) & < & C_{10} + C_{12} \label{eq:3nodes-1} \\
768: H(U_2|U_1U_0) & < & C_{20} + C_{21} \label{eq:3nodes-2} \\
769: H(U_1U_2|U_0) & < & C_{10} + C_{20} \label{eq:3nodes-3}.
770: \end{eqnarray}
771: A network with three nodes as considered here is illustrated in
772: Fig.~\ref{fig:three-nodes}.
773: 
774: \begin{figure}[!ht]
775: \centerline{\psfig{file=three-nodes.eps,height=2cm}}
776: \caption{\small A network with three nodes.}
777: \label{fig:three-nodes}
778: \end{figure}
779: 
780: Next, we regard the network in Fig.~\ref{fig:three-nodes} as a
781: {\em flow} network~\cite[Ch.\ 26]{CormenLRS:01}: a flow network with
782: two sources ($v_1$ and $v_2$) and a single sink ($v_0$).  Encodings
783: of $U_1$ injected at source $v_1$ at rate $R_1$, and of $U_2$ injected
784: at $v_2$ at rate $R_2$, are the ``objects'' that flow in this network
785: and are to be delivered to the sink $v_0$.  This is illustrated in
786: Fig.~\ref{fig:three-nodes-flownetwork}.
787: \begin{figure}[ht]
788: \centerline{\psfig{file=three-nodes-flownetwork.eps,height=1.8cm}}
\caption{\small A flow network with three nodes, supplies $R_1$ and
  $R_2$ at nodes $v_1$ and $v_2$, and a sink $v_0$.}
791: \label{fig:three-nodes-flownetwork}
792: \end{figure}
793: 
794: In the simple flow network of Fig.~\ref{fig:three-nodes-flownetwork},
795: any feasible flow $\flow$ must satisfy some {\em conservation} equations:
796: \[\begin{array}{rclcl}
797:   R_1 & = & \flow(v_1,v_0)+\flow(v_1,v_2), \\
798:   R_2 & = & \flow(v_2,v_0)+\flow(v_2,v_1), \\
799:   R_1+R_2 & = & \flow(v_1,v_0)+\flow(v_1,v_2)+\flow(v_2,v_0)+\flow(v_2,v_1)
800:           & = & \flow(v_1,v_0)+\flow(v_2,v_0),
801: \end{array}\]
802: where the last equality follows from the fact that flow conservation
803: holds: the total amount of flow injected ($R_1+R_2$) must equal the total
804: amount of flow received by the sink
805: ($\flow(v_1,v_0)+\flow(v_2,v_0)$)~\cite{CormenLRS:01}.  Similarly, any
806: feasible flow must also satisfy all {\em capacity} constraints:
807: \[\begin{array}{rcl}
808:   \flow(v_1,v_0)+\flow(v_1,v_2) & \leq & C_{10}+C_{12}, \\
809:   \flow(v_2,v_0)+\flow(v_2,v_1) & \leq & C_{20}+C_{21}, \\
810:   \flow(v_1,v_0)+\flow(v_2,v_0) & \leq & C_{10}+C_{20}.
811: \end{array}\]
812: Combining these last two sets of constraints, and the conditions from
813: the Slepian-Wolf theorem on feasible $(R_1,R_2)$ pairs, we immediately
814: get
815: \[\begin{array}{rcccl}
816:   H(U_1|U_2U_0) & < & R_1 & \leq & C_{10}+C_{12}, \\
817:   H(U_2|U_1U_0) & < & R_2 & \leq & C_{20}+C_{21}, \\
818:   H(U_1U_2|U_0) & < & R_1+R_2 & \leq & C_{10}+C_{20}.
819: \end{array}\]
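
As a sanity check of these conditions, consider a hypothetical numerical
instance (all values below are assumptions made for illustration, in
bits).  Let
\[
  H(U_1|U_2U_0) = H(U_2|U_1U_0) = 0.5, \qquad H(U_1U_2|U_0) = 1.5,
\]
and let $C_{10}=0.4$, $C_{12}=0.5$, $C_{20}=1.3$, $C_{21}=0$.
Conditions~(\ref{eq:3nodes-1})--(\ref{eq:3nodes-3}) hold, since
$0.5<0.9$, $0.5<1.3$ and $1.5<1.7$.  The rate pair $(R_1,R_2)=(0.6,1.0)$
satisfies the Slepian-Wolf conditions, and the flow
\[
  \flow(v_1,v_0)=0.4, \qquad
  \flow(v_1,v_2)=0.2=-\flow(v_2,v_1), \qquad
  \flow(v_2,v_0)=1.2,
\]
satisfies the conservation equations ($0.4+0.2=R_1$, $1.2-0.2=R_2$,
$0.4+1.2=R_1+R_2$) without exceeding the capacity of any link.  Note that
in this instance $v_1$ could not deliver its data over the direct link
alone, because $H(U_1|U_2U_0)=0.5>C_{10}=0.4$: relaying part of the
encoding of $U_1$ through $v_2$ is essential.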
820: 
It is interesting to observe in this argument that the region of
achievable rates forms a convex polytope, three of whose
faces come from the Slepian-Wolf conditions, and three from
the capacity constraints.  This polytope is illustrated in
825: Fig.~\ref{fig:polytope}.
826: \begin{figure}[ht]
827: \centerline{\psfig{file=polytope.eps,width=12cm,height=4cm}}
828: \caption{The polytope $\mathcal{R}$ of admissible rates.}
829: \label{fig:polytope}
830: \end{figure}
831: This polytope plays a central role in our analysis: reliable
832: communication is possible {\em if and only if} $\mathcal{R}\neq\emptyset$.
833: Thus, the view of ``information as a flow'' in this class of networks
834: is complete.
835: 
836: \subsubsection{No Cooperation and No Side Information at $v_0$}
837: 
838: We consider now the special case of $M$ {\em non-}cooperating nodes and
839: one sink, as illustrated in Fig.~\ref{fig:m-nodes}.
840: Necessary and sufficient conditions for reliable communication
841: under this scenario follow naturally from our main theorem by setting
842: $C_{ij}=0$ for all $j\neq 0$, and $|\mathcal{U}_0 | = 1$.
843: 
844: \begin{figure}[!ht]
845: \centerline{\psfig{file=m-nodes.eps,width=12cm,height=2.5cm}}
846: \caption{\small M non-cooperating nodes.}
847: \label{fig:m-nodes}
848: \end{figure}
849: 
850: \begin{corollary}
851: \label{cor:indep}
852: The sources $U_1, U_2,\dots, U_M$ can be communicated reliably over an
853: array of independent channels of capacity $C_{i0}$, $i=1\dots M$, if and
854: only if
855: \[
856:   H(U_S|U_{S^c})<\sum_{i\in S}C_{i0},
857: \]
858: for all subsets $S\subseteq\{1,2,\dots,M\}$, $S\neq\emptyset$.
859: \end{corollary}
860: 
861: An illustration of this corollary for two sources $U_1$ and $U_2$ is
862: shown in Fig.~\ref{fig:SWMAC1}.
863: \begin{figure}[ht]
864: \centerline{\psfig{width=8cm,height=5cm,file=swmac1.eps}
865:             \psfig{width=8cm,height=5cm,file=swmac2.eps}}
866: \caption[Relationship between the Slepian-Wolf region and the capacity
867:   region for two independent channels.]
868:   {Relationship between the Slepian-Wolf region and the capacity
869:   region for two independent channels. In the left figure, as
870:   $H(U_1|U_2)<C_{10}$ and $H(U_2|U_1)<C_{20}$ the two regions intersect
871:   and therefore reliable communication is possible. The figure on the
872:   right shows the case in which $H(U_2|U_1)>C_{20}$ and there is no
873:   intersection between the two regions.}
874: \label{fig:SWMAC1}
875: \end{figure}
876: When we have two independent channels with capacities $C_{10}$ and
877: $C_{20}$, the capacity region becomes a rectangle with side lengths
878: $C_{10}$ and $C_{20}$~\cite[Chapter~14.3]{CoverT:91}.
879: Also shown is the Slepian-Wolf region of achievable rates for separate
880: encoding of correlated sources.
881: Clearly, $H(U_1U_2)<C_{10}+C_{20}$ is a necessary
882: condition for reliable communication as a consequence of Shannon's joint
883: source and channel coding theorem for point-to-point communication.
884: Assuming that this is the case, consider now the following possibilities:
885: \begin{itemize}
886: \item $H(U_1)<C_{10}$ and $H(U_2)<C_{20}$.  The Slepian-Wolf region and the
887:   capacity region
888:   intersect, so any point $(R_1,R_2)$ in this intersection makes reliable
889:   communication possible.  Alternatively, we can argue that reliable
890:   transmission of $U_1$ and $U_2$ is possible even with independent decoders,
891:   therefore a joint decoder will also achieve an error-free reconstruction
892:   of the source.
893: \item $H(U_1)>C_{10}$ and $H(U_2)>C_{20}$.  Since $H(U_1U_2)<C_{10}+C_{20}$
894:   there is always at least one point of intersection between the Slepian-Wolf
895:   region and the capacity region, so reliable communication is possible.
896: \item $H(U_1)<C_{10}$ and $H(U_2)>C_{20}$ (or vice versa).  If
897:   $H(U_2|U_1)<C_{20}$ (or if $H(U_1|U_2)<C_{10}$) then the two regions will
898:   intersect.  On the other hand, if $H(U_2|U_1)>C_{20}$ (or if
899:   $H(U_1|U_2)>C_{10}$), then there are no intersection
900:   points, but it is not immediately clear whether reliable communication
  is possible or not (see Fig.~\ref{fig:SWMAC1}), since examples are known
902:   in which the intersection between the capacity region of the multiple
903:   access channel and the Slepian-Wolf region of the correlated sources
904:   is empty and still reliable communication is possible~\cite{CoverGS:80}.
905: \end{itemize}
906: Corollary~\ref{cor:indep} gives a definite answer to this last question:
in the special case of correlated sources and independent channels, a
non-empty intersection between the capacity region and the Slepian-Wolf
rate region is not only a sufficient but also a necessary condition for
reliable communication to be possible; in this case, separation holds.
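
The claim made in the second case above can be verified in one line.
If $H(U_1)>C_{10}$, $H(U_2)>C_{20}$ and $H(U_1U_2)<C_{10}+C_{20}$, then
by the chain rule for entropy
\[
  H(U_1|U_2) \;=\; H(U_1U_2)-H(U_2)
             \;<\; (C_{10}+C_{20})-C_{20} \;=\; C_{10},
\]
and, symmetrically,
\[
  H(U_2|U_1) \;=\; H(U_1U_2)-H(U_1)
             \;<\; (C_{10}+C_{20})-C_{10} \;=\; C_{20}.
\]
Hence all three conditions of Corollary~\ref{cor:indep} for $M=2$ are
satisfied, and the two regions must intersect.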
911: 
912: \subsubsection{Arrays of Gaussian Channels}
913: 
914: We should also mention that Theorem~\ref{thm:main} applies to other
915: channel models that are relevant in practice, for instance Gaussian channels
with orthogonal multiple access.  For simplicity, we illustrate this
point in the context of Corollary~\ref{cor:indep}.  The capacity region
of the Gaussian multiple access channel with $M$ independent sources is
given by
920: \[
921:   \sum_{i\in S} R_i
922:     \leq \frac{1}{2}\log\left(1+\frac{\sum_{i\in S}P_i}{\sigma^2}\right),
923: \]
924: for all $S\subseteq\{1...M\}$, $S\neq\emptyset$, and where $\sigma^2$ and
925: $P_i$ are the noise power and the power of the $i$-th user
926: respectively~\cite[pp.\ 378-379]{CoverT:91}.  If we use orthogonal accessing
927: (e.g.~TDMA), and assign different time slots to each of the transmitters,
928: then the Gaussian multiple access channel is reduced to an array of $M$
929: independent single-user Gaussian channels each with capacity
930: \[
931:   C_{i0} =
932:   \tau_{i0}\cdot\frac{1}{2}\log\bigg(1+\frac{P_{i0}}{\sigma^2\tau_{i0}}\bigg),
933:   \qquad 1\le i\le M,
934: \]
where $\tau_{i0}$ is the time fraction allocated to user $i$ to
936: communicate with the data collector node $v_0$, and $P_{i0}$ is the
937: corresponding power allocation.
938: 
939: Applying Theorem~\ref{thm:main}, we obtain the reachback capacity
940: of the Gaussian channel with orthogonal accessing.\footnote{The
941: generalization of Theorem~\ref{thm:main} for channels with real-valued
942: output alphabets can be easily obtained using the techniques
943: in~\cite[Sec.\ 9.2 \& Ch.\ 10]{CoverT:91}.}  Then, reliable
944: communication is possible if and only if
945: \[
   H(U_S|U_{S^c}) <
947:      \sum_{i\in S}\frac{\tau_{i0}}{2}
948:                   \log\bigg(1+\frac{P_{i0}}{\sigma^2\tau_{i0}}\bigg),
949: \]
950: for all subsets $S\subseteq\{1,2,\dots,M\}$, $S\neq\emptyset$.
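
For a quick numerical illustration, assume (hypothetically) $M=2$, equal
time fractions $\tau_{10}=\tau_{20}=\frac{1}{2}$, logarithms to base $2$,
and $P_{i0}/\sigma^2=\frac{3}{2}$ for both users.  Then
\[
  C_{i0} \;=\; \frac{1}{2}\cdot\frac{1}{2}
           \log_2\bigg(1+\frac{3/2}{1/2}\bigg)
         \;=\; \frac{1}{4}\log_2 4 \;=\; \frac{1}{2}
  \quad\mbox{bits per channel use},
\]
and, by Corollary~\ref{cor:indep}, reliable communication in this
instance requires $H(U_1|U_2)<\frac{1}{2}$, $H(U_2|U_1)<\frac{1}{2}$ and
$H(U_1U_2)<1$.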
951: 
952: 
953: \section{Practical/Engineering Implications of Theorem~\ref{thm:main}}
954: \label{sec:protocol-stack}
955: 
956: \subsection{An Information Theoretically Optimal Protocol Stack}
957: 
We believe that the fact that, in networks of point-to-point noisy
links with one sink, Shannon information has exactly the same properties
as classical network flows is of particular {\em practical} relevance.
This is so because there is a rich {\em algorithmic} theory associated
with network flows, which allows us to cast standard information theoretic
problems into the language of flows and optimization.  Perhaps the most
relevant consequence is the optimality of implementing codes using a
966: {\em layered} protocol stack, as illustrated in
967: Fig.~\ref{fig:layers-3nodes}.
968:  
969: \begin{figure}[!ht]
970: \centerline{\psfig{file=layers-3nodes.eps,width=15cm,height=12cm}}
971: \caption{\small Abstractions that follow from the achievability proof,
972:   illustrated here for three nodes.  At the physical layer there are
973:   nodes with power constraints, a data field of which these nodes collect
974:   samples in space and time, and a gateway node that will deliver all
975:   this data to destination.  On top of this physical substrate, we
976:   construct a sequence of abstractions: noiseless point-to-point links
977:   of a given capacity (the {\em Link Layer}); a flow network (the
978:   {\em Network Layer}); a set of connections (the {\em Transport Layer});
979:   and a set of distributed signal processing algorithms for sampling,
980:   compression and interpolation of the space/time continuous process
981:   (the {\em Presentation Layer}).  In the end, an approximate
982:   representation of the underlying data field is delivered to
983:   applications.}
984: \label{fig:layers-3nodes}
985: \end{figure}
986:  
987: As discussed in the Introduction, the decision to turn a wireless
988: network into a network of point-to-point links is an arbitrary one.
989: But, due to complexity and/or economic considerations, this arbitrary
990: decision is one made very often, and thus we believe it is of great
991: practical interest to understand what are appropriate design criteria
992: for such networks.  And our Theorem~\ref{thm:main} offers valuable
993: insights in this regard -- {\em if} we decide to define a link-layer
994: based on a MAC protocol that deals with interference by suppressing
995: it, {\em then all remaining layers in Fig.~\ref{fig:layers-3nodes}
996: follow from the achievability proof of Theorem~\ref{thm:main}.}  We
997: see therefore that indeed, in this class of networks,
998: Fig.~\ref{fig:layers-3nodes} provides a set of abstractions analogous
999: to those of Fig.~\ref{fig:shannon-pt2pt} for classical two-terminal
1000: systems.
1001: 
1002: \subsection{Algorithmic/Computational Issues}
1003: \label{sec:algorithmic-issues}
1004: 
1005: As an illustration of the benefits of the ``information as flow''
1006: interpretation for our results, in this subsection we outline some
1007: initial results on an optimal routing problem.  This topic however
1008: will be developed in full depth elsewhere.
1009: 
1010: \subsubsection{Optimization Aspects of Protocol Design}
1011: 
1012: A natural question that follows from our previous developments is
1013: one of {\em optimization}: given a non-empty feasibility polytope
1014: $\mathcal{R}$, we have the freedom of choosing among multiple
1015: assignments of values to flow variables, and thus it is only natural
1016: to ask if there is an optimal flow.  To this end, we define a cost
1017: function $\kappa$ as follows:
1018: \[
1019:   \kappa(\flow) = \sum_{(v_i,v_j)\in E} c(v_i,v_j)\cdot\flow(v_i,v_j),
1020: \]
1021: where $c(v_i,v_j)$ is a constant that, multiplied by the total number
1022: of bits $\flow(v_i,v_j)$ that a flow $\flow$ assigns to an edge
1023: $(v_i,v_j)$, determines the cost of sending all that information over
1024: the channel $(\mathcal{X}_{ij},p_{ij}(y|x),\mathcal{Y}_{ij})$.  The
1025: resulting optimization problem is shown in Fig.~\ref{fig:lp-optrouting}.
1026: 
1027: \begin{figure}[ht]
1028: \begin{center}
1029: \fbox{\begin{minipage}{11.7cm}
1030: min \hspace{5mm} $\sum_{(v_i,v_j)\in E}\;\;c(v_i,v_j)\cdot\flow(v_i,v_j)$ \\
1031: $\;$ subject to: \vspace{-1mm}
1032: \[\begin{array}{lll}
1033:  & \mbox{\tiny\sl Standard flow constraints (capacity / skew symmetry / flow conservation)} \\
1034:  & \flow(v_i,v_j) \leq C_{ij}, & 0\leq i,j\leq M. \\
1035:  & \flow(v_i,v_j) = -\flow(v_j,v_i), & 0\leq i,j\leq M. \\
1036:  & \sum_{v\in V} \flow(v_i,v) = 0, & 1\leq i\leq M. \\
1037:  & \mbox{\tiny\sl Rate admissibility constraints} \\
1038:  & H(U_S|U_{S^c}) < \sum_{i\in S} \flow(s,v_i)
1039:                   \leq \sum_{i\in S,j\in S^c} C_{ij},
1040:    & S\subseteq\{1...M\}, S\neq\emptyset. \\
1041:  & \flow(s,v_i) = R_i, & 1\leq i\leq M.
1042: \end{array}\]
1043: \end{minipage}}\end{center}
1044: \vspace{-1mm}
1045: \caption{\small Linear programming formulation for the assignment of
1046:   values to flow variables (observe the introduction of a ``supersource''
1047:   $s$, which supplies $R_i$ units of flow to $v_i$).  A solution to this
1048:   problem provides optimal routes (those with positive flow assignment)
1049:   and loads on each link.  Note as well that, by choosing $c(v_i,v_j)=0$
1050:   for all $(v_i,v_j)\in E$, this LP is solvable if and only if
1051:   $\mathcal{R}\neq\emptyset$ -- that is, the decision problem for reliable
1052:   communication (i.e., for whether a given load $p(U_0U_1...U_M)$ can be
1053:   carried over a given network $G$) admits a linear programming formulation
1054:   too.}
1055: \label{fig:lp-optrouting}
1056: \end{figure}
1057: 
1058: The choice of a linear cost model in this setup can be justified based
1059: on a number of reasons.  First of all, linearity is a very natural
1060: assumption: in simple language, it says that it costs twice as much to
1061: double the amount of information sent on any channel.  For example, we
1062: could take $c(v_i,v_j)$ to be the {\em minimum energy per information
1063: bit} required for reliable communication over the DMC from $v_i$ to
1064: $v_j$~\cite{Verdu:02}, and then $\kappa(\flow)$ would give us the sum of
1065: the energy consumed by all nodes when transporting data as dictated
1066: by a particular flow $\flow$.  Specifically in the context of routing
1067: problems, another important consideration is that the main drawback
1068: often cited for solving optimal routing problems based on network flow
1069: formulations is given by the fact that cost functions such as $\kappa$
1070: only optimize {\em average} levels of link traffic, ignoring other
1071: traffic statistics~\cite[pg.\ 436]{BertsekasG:92}.  But this is not
1072: at all an issue here, since the values of flow variables (i.e.,
1073: Shannon information) are already average quantities themselves.
1074: 
1075: \subsubsection{A Routing Example}
1076: 
1077: As one example of the usefulness of the LP formulation in
1078: Fig.~\ref{fig:lp-optrouting}, we consider next the problem of designing
1079: efficient mechanisms for data aggregation, as motivated
1080: in~\cite{IntanagonwiwatGEHS:03}.  There has been a fair amount of work
1081: reported in the networking literature, on the design and performance
1082: analysis of {\em tree} structures for aggregation---for example, the
1083: work of Goel and Estrin on the construction of trees that perform well
1084: simultaneously under multiple concave costs~\cite{GoelE:03}.  Based on
1085: our LP formulation, we construct two examples which show the extent to
1086: which trees could give rise to suboptimalities, as opposed to other
1087: topological structures.  And we start by showing an example in which,
1088: although $\mathcal{R}\neq\emptyset$, there are no feasible trees.  This
1089: case is illustrated in Fig.~\ref{fig:trees-stink-1}.
1090: 
1091: \begin{figure}[ht]
1092: \centerline{\psfig{file=trees1.eps,width=5cm,height=3cm}
1093:             \psfig{file=trees1a.eps,width=5cm,height=3cm}
1094:             \psfig{file=trees1b.eps,width=5cm,height=3cm}}
1095: \vspace{-2mm}
1096: \caption{To illustrate a solvable problem that cannot be solved using trees.
1097:   Left: a flow network; middle/right: the decomposition of a feasible
1098:   flow into two single flows, showing how much of the flow injected at each
1099:   source is sent over which link ($x/c$ next to an edge means that the
1100:   edge carries $x$ units of flow, and has capacity $c$).}
1101: \label{fig:trees-stink-1}
1102: \end{figure}
1103: 
1104: As illustrated in Fig.~\ref{fig:trees-stink-1}, a solution to the
1105: transport problem exists.  However, it is easy to check that if we
1106: constrain data to flow along trees, none of the three possible trees
1107: ($\{(v_1,v_0);(v_2,v_0)\}$, or $\{(v_1,v_2);(v_2,v_0)\}$, or
$\{(v_2,v_1);(v_1,v_0)\}$) is feasible: in each case, there is one
1109: link for which the capacity constraint is violated.
1110: 
1111: Next we consider a case where feasible trees exist, but the lowest
1112: cost of any tree differs from the optimal cost by an arbitrarily large
1113: factor.  This case is illustrated in Fig.~\ref{fig:trees-stink-2}.
1114: 
1115: \begin{figure}[ht]
1116: \centerline{\psfig{file=trees2.eps,width=5cm,height=3cm}
1117:             \psfig{file=trees2a.eps,width=5cm,height=3cm}
1118:             \psfig{file=trees2b.eps,width=5cm,height=3cm}}
1119: \vspace{-2mm}
1120: \caption{To illustrate a problem in which trees are very expensive.
1121:   Left: a flow network with costs; right: an optimal solution to the
1122:   linear program in Fig.~\ref{fig:lp-optrouting}.  Such a case could
1123:   arise, e.g., in a situation where there is heavy interference in
1124:   the direct path from $v_1$ to $v_0$.}
1125: \label{fig:trees-stink-2}
1126: \end{figure}
1127: 
In this case, there exists only one feasible tree:
$\{(v_1,v_0);(v_2,v_0)\}$, with cost $\ell(1+\epsilon)+1$.  This cost
is high because the tree is forced to route data over the
``expensive'' link $(v_1,v_0)$.  By splitting the encoding of $U_1$
as illustrated in Fig.~\ref{fig:trees-stink-2}, the cost incurred
would instead be $\epsilon\ell+3$.  Hence, we see that in
this case, the cost of the best feasible tree is
$\frac{\ell(1+\epsilon)+1}{\epsilon\ell+3}$ times that
of an optimal solution allowing splits.  And this
``overpayment factor'' could be significant: when $\ell$ is large,
it is $\approx 1+\frac 1 \epsilon$, which grows without bound as
$\epsilon\to 0$.
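
The limiting value of the overpayment factor follows by dividing
numerator and denominator by $\ell$:
\[
  \frac{\ell(1+\epsilon)+1}{\epsilon\ell+3}
  \;=\; \frac{1+\epsilon+1/\ell}{\epsilon+3/\ell}
  \;\longrightarrow\; \frac{1+\epsilon}{\epsilon}
  \;=\; 1+\frac{1}{\epsilon}
  \qquad (\ell\to\infty).
\]
For instance, with the hypothetical values $\epsilon=0.01$ and
$\ell=10^6$, the tree costs $1\,010\,001$ and the split solution costs
$10\,003$, a ratio of about $100.97$, close to the limit
$1+1/\epsilon=101$.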
1141: 
1142: Note as well that any time that a network is operated close to capacity,
1143: it will be necessary to split flows.  And that is a situation likely to
1144: be encountered often in power-constrained networks, since minimum energy
1145: designs will necessarily result in links being allocated the least amount
1146: of power needed to carry a given traffic load.  Thus, we see that these
1147: examples above are {\em not} pathological cases of limited practical
1148: interest, but instead, they are good representatives of situations likely
1149: to be encountered often in practice.
1150: 
1151: \subsection{Suboptimality of Correlated Codes for Orthogonal Channels}
1152: 
1153: The key ingredient of the achievability proof presented by Cover,
1154: El Gamal and Salehi for the multiple access channel with correlated
1155: sources is the generation of random codes, whose codewords $X_i^N$ are
1156: statistically dependent on the source sequences $U_i^N$~\cite{CoverGS:80}.
1157: This property, which is achieved by drawing the codewords according to
1158: $\prod_{j=1}^{N}p(x_{ij}|u_{ij})$ with $u_{ij}$ and $x_{ij}$ denoting
1159: the $j$-th element of $U_i^N$ and $X_i^N$, respectively, implies that
1160: $U_i^N$ and $X_i^N$ are jointly typical with high probability.  Since
1161: the source sequences $U_1^N$ and $U_2^N$ are correlated, the codewords
1162: $X_1^N(U_1^N)$ and $X_2^N(U_2^N)$ are also correlated, and so we speak
1163: of {\it correlated codes}.  This class of random codes, which is treated
1164: in more general terms in~\cite{AhlswedeH:83}, can be viewed as joint
1165: source and channel codes that preserve the given correlation structure
1166: of the source sequences, based upon which the decoder can lower the
1167: probability of error.
1168: 
The class of correlated codes is of interest to us for two
main reasons:
1171: \begin{itemize}
1172: \item From a practical point of view, correlated codes have a very
1173:   strong appeal: sensor nodes with limited processing capabilities may
1174:   be forced to use very simple codes that do not eliminate correlations
1175:   between measurements prior to transmission~\cite{BarrosTL:03} (e.g.,
1176:   a simple scalar quantizer and simple BPSK modulation).
1177: \item From a theoretical point of view, since these codes yield the
1178:   largest known admissibility region for the problem of communicating
1179:   distributed sources over multiple-access channels, it would be interesting
  to know how these codes fare in our context, where separate
  source and channel coding is known to be optimal.
1182: \end{itemize}
1183: Thus, specializing the achievability proof of~\cite{CoverGS:80} to the
1184: case of $M$ independent channels, we get the following result.
1185: 
1186: \begin{corollary}[From Theorem 1 of~\cite{CoverGS:80}]
1187: \label{cor:Machievable}
A set of correlated sources $[U_1U_2\dots U_M]$ can be communicated
1189: reliably over independent channels
1190: $(\mathcal{X}_1,p(y_1|x_1),\mathcal{Y}_1)\dots
1191: (\mathcal{X}_M,p(y_M|x_M),\mathcal{Y}_M)$ to a sink $v_0$, if
1192: \[
   H(U_S|U_{S^c})<\sum_{i\in S} I(X_i;Y_i|U_{S^c}),
1194: \]
1195: for all subsets $S\subseteq\{1,2,\dots,M\}$, $S\neq\emptyset$.
1196: \end{corollary}
1197: \begin{proof}
This result can be obtained from the $M$-source version of the main theorem
in~\cite{CoverGS:80}, by specializing it to a multiple access channel with
conditional probability distribution
\[ p(y|x_1x_2\dots x_M)
     = p(y_1y_2\dots y_M|x_1x_2\dots x_M) = \prod_{i=1}^Mp(y_i|x_i).
1203: \]
1204: \end{proof}
1205: 
1206: Part of the reason why we feel this is an interesting result is that the
1207: main theorem in~\cite{CoverGS:80} does {\em not} immediately specialize
1208: to Corollary~\ref{cor:indep}: whereas the achievability results do
1209: coincide,~\cite{CoverGS:80} does not provide a converse.  To illustrate
1210: this point better, we focus now on the case of $M=2$:
1211: \begin{itemize}
1212: \item In general, we have that
1213:   $I(X_1X_2;Y_1Y_2) \leq I(X_1;Y_1)+I(X_2;Y_2)$, for any 
1214:   $p(u_1u_2x_1x_2)p(y_1|x_1)p(y_2|x_2)$; but for this upper bound on
1215:   the sum-rate to be achieved, we must take
1216:   $p(u_1u_2x_1x_2) = p(u_1u_2)p(x_1)p(x_2)$ -- that is, the codewords
1217:   must be drawn independently of the source.  And for this special case,
1218:   our Theorem~\ref{thm:main} does provide a converse.
1219: \item As argued earlier, due to practical considerations it may not be
1220:   feasible to remove correlations in the source before choosing channel
1221:   codewords, in which case we face a situation where correlated codes
1222:   are used, despite their obvious suboptimality.  In this case, it is
1223:   of interest to determine the rate losses resulting from the use of
1224:   correlated codes, defined as $\Delta_1 = I(X_1;Y_1)-I(X_1;Y_1|U_2)$,
1225:   $\Delta_2 = I(X_2;Y_2)-I(X_2;Y_2|U_1)$, and
1226:   $\Delta_0 = I(X_1;Y_1)+I(X_2;Y_2)-I(X_1X_2;Y_1Y_2)$.  Straightforward
1227:   manipulations show that $\Delta_1 = I(Y_1;U_2)$, $\Delta_2 = I(Y_2;U_1)$,
1228:   and $\Delta_0 = I(Y_1;Y_2)$.
1229: \item Since $\Delta_i\geq 0$, $i\in\{0,1,2\}$ (mutual information is
1230:   always nonnegative), we conclude that the region of achievable rates
1231:   given by Corollary~\ref{cor:Machievable} is contained in the region
1232:   defined by Corollary~\ref{cor:indep}.  Furthermore, we find that the
1233:   rate loss terms have a simple, intuitive interpretation: $\Delta_0$
1234:   is the loss in sum rate due to the dependencies between the outputs
  of different channels, and $\Delta_1$ (or $\Delta_2$) represents the
1236:   rate loss due to the dependencies between the outputs of channel $1$
1237:   (or $2$) and the source transmitted over channel $2$ (or $1$).  All
1238:   these terms become zero if, instead of using correlated codes, we fix
1239:   $p(x_1)p(x_2)$ and remove the correlation between the source blocks
1240:   before transmission over the channels.
1241: \end{itemize}
1242: At first glance, this observation may seem somewhat surprising, since
1243: the problem addressed by Corollary~\ref{cor:indep} is a special case
1244: of the multiple access channel with correlated sources considered
1245: in~\cite{CoverGS:80}, where it is shown that in the general case
1246: correlated codes outperform the concatenation of Slepian-Wolf codes
1247: (independent codewords) and optimal channel codes.  The crucial
1248: difference between the two problems is the presence (or absence)
1249: of interference in the channel.  Albeit somewhat informally, we can
1250: state that correlated codes are advantageous when the transmitted
1251: codewords are combined in the channel through interference, which
1252: is obviously not the case in our problem.  Practical code constructions
1253: built around this observation have been reported in~\cite{BarrosTL:03}.
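
The rate-loss identities $\Delta_1 = I(Y_1;U_2)$ and
$\Delta_0 = I(Y_1;Y_2)$ quoted above can be verified as follows.  This
is only a sketch, under the assumption (implicit in the setup of
independent channels) that $Y_i$ depends on all other variables only
through $X_i$, so that $H(Y_1|X_1U_2)=H(Y_1|X_1)$ and
$H(Y_1Y_2|X_1X_2)=H(Y_1|X_1)+H(Y_2|X_2)$:
\begin{eqnarray*}
\Delta_1
  &=& \big(H(Y_1)-H(Y_1|X_1)\big)-\big(H(Y_1|U_2)-H(Y_1|X_1U_2)\big) \\
  &=& H(Y_1)-H(Y_1|U_2) \;\; = \;\; I(Y_1;U_2), \\
\Delta_0
  &=& \big(H(Y_1)-H(Y_1|X_1)\big)+\big(H(Y_2)-H(Y_2|X_2)\big) \\
  & & -\;\big(H(Y_1Y_2)-H(Y_1Y_2|X_1X_2)\big) \\
  &=& H(Y_1)+H(Y_2)-H(Y_1Y_2) \;\; = \;\; I(Y_1;Y_2).
\end{eqnarray*}
The identity for $\Delta_2$ follows from that for $\Delta_1$ by
exchanging the indices $1$ and $2$.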
1254: 
1255: 
1256: \section{Conclusions}
1257: \label{sec:conclusions}
1258: 
1259: \subsection{Summary}
1260: 
1261: In this paper we have considered the problem of encoding a set of
1262: distributed correlated sources for delivery to a single data collector
1263: node over a network of DMCs.  For this setup we were able to obtain
1264: single-letter information theoretic conditions that provide an exact
1265: characterization of the admissibility problem.  Two important conclusions
1266: follow from the achievability proof:
1267: \begin{itemize}
1268: \item Separate source/channel coding is optimal in any network with one
1269:   sink in which interference is dealt with at the MAC layer by creating
1270:   independent links among nodes.
1271: \item In such networks, the properties of Shannon information are
1272:   exactly identical to those of water in pipes -- information is a
1273:   flow.
1274: \end{itemize}
1275: 
1276: \subsection{Discussion}
1277: 
1278: A few interesting observations follow from our results:
1279: 
1280: \begin{itemize}
1281: \item It is a well known fact that turning a multiple access channel
1282:   into an array of orthogonal channels by using a suitable MAC protocol
1283:   is a suboptimal strategy in general, in the sense that the set of
1284:   rates that are achievable with orthogonal access is strictly contained
1285:   in the Ahlswede-Liao capacity region~\cite[Ch.\ 14.3]{CoverT:91}.
1286:   However, despite its inherent suboptimality, there are strong economic
1287:   incentives for the deployment of networks based on such technologies,
1288:   related to the low complexity and cost of existing solutions, as well
1289:   as experience in the fabrication and operation of such systems.  As
1290:   a result, most existing standard implementations we are aware of
1291:   (e.g., the IEEE 802.11 and 802.15.* families, or Bluetooth), are
1292:   based on variants of protocols like TDMA/FDMA/CDMA or Aloha, that
1293:   treat interference among users as noise or collisions, and deal with
1294:   it by creating orthogonal links.  We feel therefore that some of the
1295:   interest in our results stems from the fact that they provide a thorough
1296:   analysis for what we deem to be, with high likelihood, the vast majority
1297:   of wireless communication networks to be deployed for the foreseeable
1298:   future.
1299: \item A basic question follows from the results in this paper: when
1300:   exactly does Shannon information act like a classical flow in a network
1301:   setup?  In this paper, we showed that far more often than common wisdom
1302:   would suggest: 
1303:   for {\em any} network made up of independent links and one sink,
1304:   Shannon information is a flow.  The assumption of independence among
1305:   channels is crucial, since well known counterexamples hold without
1306:   it~\cite{CoverGS:80}.  But, as argued before, far from being just some
1307:   technical assumption needed for the theory to hold, independent channels
1308:   arise naturally in practical applications.  In establishing the flow
1309:   properties of information, we showed how some well understood network
1310:   flow tools can be applied to address network design problems that
1311:   have traditionally been difficult to deal with using standard tools
1312:   in network information theory, and we illustrated this with a simple
1313:   example involving optimal routing.  In particular we showed that, at
1314:   least from an information theoretic point of view, there is little
1315:   justification for the common practice of designing {\em trees} for
1316:   collecting data picked up by a sensor network, thus opening up
1317:   interesting problems of protocol design.
1318: \item In retrospect, perhaps the results we prove in this paper should
1319:   not have been surprising.  In the context of two-terminal networks, we
1320:   do know the following:
1321:   \begin{itemize}
1322:   \item Feedback does not increase the capacity.  Therefore, the capacity
1323:     of individual links is unaffected by the ability of our codes to
1324:     establish a conference mechanism among nodes.
  \item Compression rates are not reduced by explicit cooperation, as
    follows from the Slepian-Wolf theorem: the minimum rate required to
    communicate $U_1$ to a decoder that has access to side-information
    $U_0$ is $H(U_1|U_0)$, and knowledge of $U_0$ at the encoder does not
    reduce the rate needed for coding $U_1$.  Therefore, the amount of
    information that needs to flow through our network is not reduced
    either by the ability of nodes to establish conferences.
1332:   \end{itemize}
1333:   Of course the statements above only hold for individual links, and a
1334:   proof was needed to carry that intuition to the general network setup
1335:   considered in this work.  But those observations we think are the
1336:   key to understanding why our results hold.
1337: \end{itemize}
1338: 
1339: \subsection{Future Work}
1340: 
Having established coding theorems for the problem of network
information flow with correlated sources, a natural question arises:
1343: what if, in a given scenario, $\mathcal{R}=\emptyset$?  In that case,
1344: the best we can hope for is to reconstruct an {\em approximation} to
1345: the original source message --- and the answer is given by rate-distortion
1346: theory~\cite{Berger:71}.  The rate-distortion formulation of our
1347: problem in the case of non-cooperating encoders is equivalent to the
1348: well known (and still open) {\em Multiterminal Source Coding}
1349: problem~\cite{Berger:78}.  Our current efforts are focused on completing
1350: work on the rate/distortion problem, and on fully developing the ideas
1351: outlined in Section~\ref{sec:algorithmic-issues} (e.g., to deal with
1352: problems of the type considered in~\cite{Chiang:05}).
1353: 
1354: \section*{Acknowledgements}
1355: 
1356: The authors most gratefully acknowledge discussions with Neri Merhav,
1357: whose insightful comments on an earlier version of this manuscript led
1358: to substantial improvements, as well as the valuable feedback from all
1359: reviewers (and particularly from reviewer B).  They also wish to thank
1360: Toby Berger and Te Sun Han for helpful discussions, and Joachim Hagenauer
for financial support without which they would not have been able to
1362: work together.  The second author is also grateful to Mung Chiang, Eric
1363: Friedman, \'Eva Tardos and Sergio Verd\'u, for useful discussions and
1364: feedback on this work.
1365: 
1366: 
1367: \appendix
1368: 
1369: \subsection{Converse Proof for Theorem~\ref{thm:main}}
1370: \label{app:proof-converse-mcoop}
1371: 
1372: \subsubsection{Preliminaries}
1373: 
1374: Assume there exists a sequence of codes such that the decoder at $v_0$
1375: is capable of producing a perfect reconstruction of blocks of $N$ snapshots
${\bf U} = [U_0^NU_1^N\dots U_M^N]$, with $P_e^{(N)}\to 0$ as $N\to\infty$.
Consider now decoding $L$ blocks of $N$ snapshots (indexed by $l=0,\dots,L-1$):
1378: \begin{itemize}
\item The first block of snapshots ($l=0$) is computed based on
1380:   messages $Y_{i0}^N$ received by $v_0$ from all nodes $v_i$ at
1381:   times $kN$ ($k=0\,...\,W\!-\!1$).
\item The second block of snapshots ($l=1$) is computed based on
1383:   messages $Y_{i0}^N$ received by $v_0$ from all nodes $v_i$ at
1384:   times $kN$ ($k=1\,...\,W$).
1385: \item[] $\vdots$
1386: \item The $L$-th block of snapshots ($l=L-1$) is computed based on
1387:   messages $Y_{i0}^N$ received by $v_0$ from all nodes $v_i$ at
1388:   times $kN$ ($k=L\!-\!1\,...\,W\!+\!(L\!-\!2)$).
1389: \end{itemize}
1390: Thus, we regard the network as a {\em pipeline}, in which ``packets''
1391: (i.e., blocks of $N$ source symbols injected by each source) take
1392: $NW$ units of time to flow, and each source gets to inject $L$ packets
1393: total.  We are interested in the behavior of this pipeline in the
1394: regime of large $L$.
1395: 
1396: For any fixed $L$, the probability of {\em at least one} of the $L$ blocks
1397: being decoded in error is $P_e^{(LN)} = 1-(1-P_e^{(N)})^L$.  Thus, from the
1398: existence of a code with low {\em block} probability of error we
1399: can infer the existence of codes for which the probability of error
1400: for the entire pipeline is low as well, by considering a large enough
1401: block length $N$.
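
To make this last step explicit (the following elementary bound is added
here as an illustration and is not part of the original argument),
Bernoulli's inequality $(1-p)^L\geq 1-Lp$, valid for $0\leq p\leq 1$ and
integer $L\geq 1$, gives
\[ P_e^{(LN)} \;\; = \;\; 1-(1-P_e^{(N)})^L \;\; \leq \;\; L\,P_e^{(N)}, \]
so that, for any fixed $L$, $P_e^{(LN)}\to 0$ whenever $P_e^{(N)}\to 0$.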
1402: 
1403: We begin with Fano's inequality. If there
1404: is a suitable code as defined in the problem statement, then we must
1405: have
1406: \begin{equation}
1407:   H(U_1^{LN}U_2^{LN}\dots U_M^{LN}
1408:     | \hat{U}_1^{LN}\hat{U}_2^{LN}\dots \hat{U}_M^{LN})
1409:   \;\; \leq \;\;
1410:   P_e^{(LN)} \log\left(|{\mathcal U}_1^{LN}\!\times{\mathcal U}_2^{LN}
1411:                        \!\times\dots\times{\mathcal U}_M^{LN}|\right)
1412:                        + h(P_e^{(LN)}),
1413:   \label{eq:fano2}
1414: \end{equation}
1415: where $h(\cdot)$ denotes the binary entropy function, and
1416: $\hat U_{i}^{LN}=(\hat U_{i}^N(1),\hat U_{i}^N(2),\dots,\hat U_{i}^N(L))$
1417: denotes $L$ blocks of $N$ snapshots reconstructed at $v_0$.
1418: For convenience, we define also
1419: \[ \delta(P_e^{(LN)}) \;\; = \;\;
1420:    \left(P_e^{(LN)}\log\left(|{\mathcal U}_1^{LN}\times{\mathcal U}_2^{LN}
1421:    \times\dots\times{\mathcal U}_M^{LN}|\right)+h(P_e^{(LN)})\right)/LN.
1422: \]
1423: It follows from eqn.~(\ref{eq:fano2}) that
1424: \begin{eqnarray*}
1425: \lefteqn{H(U_1^{LN}U_2^{LN}\dots U_M^{LN}|U_0^{LN}Y_{10}^{BN}Y_{20}^{BN}\dots Y_{M0}^{BN})} \\
1426: %  & = & H(U_1^{LN}U_2^{LN}\dots U_M^{LN}|U_0^{LN}Y_{10}^{BN}Y_{20}^{BN}\dots
1427:  %         Y_{M0}^{BN}h^{(0)}(Y_{10}^{WN}Y_{20}^{WN}\dots Y_{M0}^{WN})\dots
1428: %h^{(L-1)}(Y_{10}^{WN}Y_{20}^{WN}\dots Y_{M0}^{WN})) \\
1429:  &\stackrel{(a)}{=} & H(U_1^{LN}U_2^{LN}\dots U_M^{LN}|U_0^{LN}Y_{10}^{BN}Y_{20}^{BN}\dots
1430:           Y_{M0}^{BN}\hat{U}_1^{LN}\hat{U}_2^{LN}\dots \hat{U}_M^{LN})\\
1431:   & \leq & H(U_1^{LN}U_2^{LN}\dots U_M^{LN}|\hat{U}_1^{LN}\hat{U}_2^{LN}\dots \hat{U}_M^{LN}) \\
1432:   & \leq & LN \delta(P_e^{(LN)}),
1433: \end{eqnarray*}
1434: where $Y_{ij}^{BN}=(Y_{ij}^N(1),Y_{ij}^N(2),\dots,Y_{ij}^N(B))$ denotes
1435: $B=W+(L-1)$ blocks of $N$ channel outputs observed by node $v_j$ while
1436: communicating with node $v_i$, and (a) follows from the fact that the
1437: estimates $\hat U_i^{LN}$, $i=1\dots M$, are functions of $U_0^{LN}$ and
1438: of the received channel outputs $Y_{i0}^{BN}$, $i=1\dots M$.  From the
1439: chain rule for entropy, from the fact that conditioning does not increase
entropy, and for any $S\subseteq {\cal M}=\{0,\dots,M\}$, $S\neq\emptyset$,
1441: $0\in S^c$, it follows that
1442: \begin{equation}
1443: \label{eq:ineq}
1444:  H(U^{LN}_S|U_{S^c}^{LN}Y_{S\to S^c}^{BN}Y_{S^c\to S^c}^{BN})
1445:  \;\; \leq \;\;
1446:  H(U^{LN}_S|U_{S^c}^{LN}Y_{S\to 0}^{BN}Y_{S^c\backslash\{0\}\to 0}^{BN})
1447:  \;\; \leq \;\;
1448:  LN \delta(P_e^{(LN)}).
1449: \end{equation}
1450: Let the set of $B$ codewords sent by
1451: the nodes in a subset $A$ to the nodes in a subset $D$ be
1452: \[X_{A\to D}^{BN}=\{X_{ij}^{BN}:i\in A \textrm{ and } j\in D\},\]
1453: and, likewise, the corresponding channel outputs be denoted as
1454: \[Y_{A\to D}^{BN}=\{Y_{ij}^{BN}:i\in A \textrm{ and } j\in D\}.\]
1455: 
1456: We will make use of the following lemmas.
1457: 
1458: \begin{lemma}\label{lemma:mi}
1459: Let $X_{S\to S^c}$ be a set of channel inputs and $Y_{S\to S^c}$ be
1460: a set of channel outputs of an array of independent channels
1461: $\{{\cal X}_{ij},p_{ij}(y|x),{\cal Y}_{ij}\}$, $\forall i\in S$
1462: and $\forall j\in S^c$.  Then,
1463: \begin{equation}\label{eq:lemma}
1464: I(X_{S\to S^c};Y_{S\to S^c})\leq \sum_{{i\in S},{j\in S^c}} I(X_{ij};Y_{ij}).
1465: \end{equation}
1466: \end{lemma}
1467: \begin{proof}
1468: Without loss of generality, assume that $S=\{1,\dots, x_0\}$ and
1469: $S^c=\{x_0+1,\dots, M\}$.  From the definition of mutual information, it
1470: follows that
1471: \begin{eqnarray*}
1472: I(X_{S\to S^c};Y_{S\to S^c})&=&H(Y_{S\to S^c})-H(Y_{S\to S^c}|X_{S\to S^c}).
1473: \end{eqnarray*}
Expanding the first term on the right-hand side, we get
1475: \begin{eqnarray*}
1476: H(Y_{S\to S^c})&=&H(Y_{1\to S^c}Y_{2\to S^c}\dots Y_{x_0\to S^c})\\
1477: &\leq& \sum_{i\in S} H(Y_{i\to S^c})\\
1478: &=& \sum_{i\in S} H(Y_{i\to x_0+1}Y_{i\to x_0+2}\dots Y_{i\to M})\\
&\leq& \sum_{i\in S,j\in S^c} H(Y_{ij}).
1480: \end{eqnarray*}
1481: Similarly, the second term reduces to
1482: \begin{eqnarray*}
1483: \lefteqn{H(Y_{S\to S^c}|X_{S\to S^c})}\\
1484: &=&H(Y_{1\to S^c}Y_{2\to S^c}\dots Y_{x_0\to S^c}| X_{1\to S^c}X_{2\to S^c}\dots X_{x_0\to S^c})\\
1485: &=& H(Y_{1\to S^c}| X_{1\to S^c}X_{2\to S^c}\dots X_{x_0\to S^c})+\sum_{i=2}^{x_0} H(Y_{i\to S^c}| X_{1\to S^c}X_{2\to S^c}\dots X_{x_0\to S^c}Y_{1\to S^c}\dots Y_{i-1\to S^c})\\
1486: &=&  H(Y_{1\to S^c}| X_{1\to S^c})+\sum_{i=2}^{x_0} H(Y_{i\to S^c}| X_{i\to S^c})\\
1487: &=& \sum_{i\in S}  H(Y_{i\to S^c}| X_{i\to S^c})\\
1488: &=& \sum_{i\in S}  H(Y_{i\to x_0+1}Y_{i\to x_0+2}\dots Y_{i\to M}| X_{i\to x_0+1}X_{i\to x_0+2}\dots X_{i\to M})\\
&=& \sum_{i\in S} \bigg(H(Y_{i\to x_0+1}| X_{i\to x_0+1}X_{i\to x_0+2}\dots X_{i\to M})\\&& +\!\!\sum_{j=x_0+2}^M  H(Y_{i\to j}| X_{i\to x_0+1}X_{i\to x_0+2}\dots X_{i\to M}Y_{i\to x_0+1}\dots Y_{i\to j-1})\bigg)\\
1490: &=& \sum_{i\in S} \bigg(H(Y_{i\to x_0+1}| X_{i\to x_0+1})+ \sum_{j=x_0+2}^M  H(Y_{i\to j}| X_{i\to j})\bigg)\\
1491: &=& \sum_{i\in S,j\in S^c} H(Y_{ij}| X_{ij}).
1492: \end{eqnarray*}
1493: Combining the two expressions, we get
1494: \[
1495:   I(X_{S\to S^c};Y_{S\to S^c})
    \;\; \leq \;\; \sum_{i\in S,j\in S^c} \big(H(Y_{ij})-H(Y_{ij}|X_{ij})\big)
1497:     \;\; = \;\; \sum_{i\in S,j\in S^c} I(X_{ij};Y_{ij}),
1498: \]
1499: thus proving the lemma.
1500: \end{proof}
1501: 
1502: \begin{lemma}
1503: \label{lemma:chain1}
1504: $U_S^{LN}\rightarrow(U_{S^c}^{LN}Y_{S\to S^c}^{BN})\rightarrow
1505: Y_{S^c\to S^c}^{BN}$ forms a Markov chain.
1506: \end{lemma}
1507: \begin{proof}
1508: We begin by expanding $p(u_S^{LN}u_{S^c}^{LN}y_{S\to
1509: S^c}^{BN}y_{S^c\to S^c}^{BN})$ according to
1510: \begin{eqnarray*}
1511: p(u_S^{LN}u_{S^c}^{LN}y_{S\to S^c}^{BN}y_{S^c\to S^c}^{BN})
1512: &=&p(u_{S}^{LN}) \cdot p(u_{S^c}^{LN}y_{S\to S^c}^{BN}|u_S^{LN})
1513: \cdot p(y_{S^c\to S^c}^{BN}|u_S^{LN}u_{S^c}^{LN}y_{S\to S^c}^{BN}).
1514: \end{eqnarray*}
1515: To prove that $U_S^{LN}$ can be removed from the last factor in
1516: the previous expression, we will use an induction argument on the
1517: length of the pipeline, $L$, and window sizes, $K$ and $T$.
1518: 
1519: Fix $(S,S^c)$ and $i,j\in S^c$. Let $L=K=T=1$. The encoding functions
1520: produce $g_{ij}(U_i^N)=X_{i\to j}^N$, which result in the channel
1521: outputs $Y_{i\to j}^N$ after transmission over the DMC between
1522: nodes $i$ and $j$. In shorthand, we write
1523: \[ g_{ij}(U_i^N)
1524:    \;\; = \;\; X_{i\to j}^N
1525:    \;\; \stackrel{\textrm{\tiny DMC}}{\longrightarrow} \;\; Y_{i\to j}^N.
1526: \]
Thus, the first block of channel inputs
$X_{S^c\to S^c}^{1\dots N}$
generated in the node set $S^c$ depends only on source symbols
 $U_{S^c}^{1\dots N}$ available in $S^c$. Moreover,
1531: since the channels are DMCs, the
1532: channel outputs depend only on the channel inputs.
1533: Thus, we conclude that $U_{S}^{1\dots N}$ and $Y_{S^c\to S^c}^{1\dots N}$
1534: are independent given  $U_{S^c}^{1\dots N}$.
1535: 
1536: Since we consider a pipeline of length $L=1$, there are no more blocks
to inject, but not all data may have arrived at the destination, so we have
1538: to allow for a few ($W$, to be precise) extra transmissions.  By ``flushing
1539: the pipeline'', we have
1540: \[ g_{ij}(Y_{S\to i}^{1\dots N}Y_{S^c\to i}^{1\dots N})
   \;\; = \;\; X_{i\to j}^{N+1\dots 2N}
1542:    \;\; \stackrel{\textrm{\tiny DMC}}{\longrightarrow} \;\;
1543:    Y_{i\to j}^{N+1\dots 2N}.
1544: \]
1545: It follows that $Y_{S^c\to S^c}^{N+1\dots 2N}$ is independent of
1546: $U_S^{1\dots N}$ given $Y_{S\to S^c}^{1\dots N}$ and $U_{S^c}^{1\dots N}$.
1547: Similarly, we have
1548: \[ g_{ij}(Y_{S\to i}^{(W-2)N+1\dots(W-1)N}
1549:           Y_{S^c\to i}^{(W-2)N+1\dots (W-1)N})
1550:    \;\; = \;\; X_{i\to j}^{(W-1)N+1\dots WN}
1551:    \;\; \stackrel{\textrm{\tiny DMC}}{\longrightarrow} \;\;
1552:    Y_{i\to j}^{(W-1)N+1\dots WN},
1553: \]
1554: from which we conclude that
1555: $Y_{S^c\to S^c}^{(W-1)N+1\dots WN}$ is independent of $U_S^{1\dots N}$
1556: given $Y_{S\to S^c}^{(W-2)N+1\dots (W-1)N}$ and $U_{S^c}^{1\dots N}$.
1557: Thus, for $K=T=L=1$, and $W$ arbitrary,\footnote{Since $W$ is the delay
1558: used to allow data to flow to the destination, it would not be reasonable
1559: to perform induction on $W$ for a given fixed network. Instead we take
$W$ as a parameter, which must be greater than or equal to the diameter of
1561: the network.} the Markov chain in the lemma holds (with $B=L+W-1$).
1562: 
1563: To proceed with the inductive proof, we still take $K=T=1$, $(S,S^c)$
1564: fixed, $i,j\in S^c$, but $L$ is now arbitrary.  By inductive hypothesis,
1565: we have the following Markov chain
1566: \[ U_S^{(L-1)N}
1567:    \;\; \rightarrow \;\; (U_{S^c}^{(L-1)N}Y_{S\to S^c}^{(B-1)N})
1568:    \;\; \rightarrow \;\; Y_{S^c\to S^c}^{(B-1)N}.\]
1569: Encoding and transmission of the last block of each source yields
\[ g_{ij}(U_i^{(L-1)N+1\dots LN}Y_{S\to i}^{(L-1)N+1\dots LN}
1571:    Y_{S^c\to i}^{(L-1)N+1\dots LN})
1572:    \;\; = \;\; X_{i\to j}^{LN+1\dots (L+1)N}
1573:    \;\; \stackrel{\textrm{\tiny DMC}}{\longrightarrow} \;\;
1574:    Y_{i\to j}^{LN+1\dots (L+1)N},
1575: \]
1576: such that for the last block, we have that
1577: \[ U_S^{LN} \;\; \rightarrow \;\; (U_{S^c}^{LN}Y_{S\to S^c}^{(L+1)N})
1578:    \;\; \rightarrow \;\; Y_{S^c\to S^c}^{(L+1)N}.\]
1579: This is not yet the sought Markov chain, as we still need to flush the
1580: pipe.  But similarly to how it was done for the base case of this inductive
1581: argument, we have that
1582: \[\begin{array}{ccccc}
1583:   g_{ij}(Y_{S\to i}^{LN+1\dots (L+1)N}Y_{S^c\to i}^{LN+1\dots (L+1)N})
1584:   & = & X_{i\to j}^{(L+1)N+1\dots (L+2)N}
1585:   & \stackrel{\textrm{\tiny DMC}}{\longrightarrow} &
1586:    Y_{i\to j}^{(L+1)N+1\dots (L+2)N}, \\
1587:   & & \vdots & &\\
1588:   g_{ij}(Y_{S\to i}^{(B-2)N+1\dots (B-1)N}Y_{S^c\to i}^{(B-2)N+1\dots (B-1)N})
1589:   & = & X_{i\to j}^{(B-1)N+1\dots BN}
1590:   & \stackrel{\textrm{\tiny DMC}}{\longrightarrow} &
1591:    Y_{i\to j}^{(B-1)N+1\dots BN},
1592: \end{array}\]
and therefore we finally have that
$Y_{S^c\to S^c}^{BN}$ is independent of $U_S^{LN}$
given $Y_{S\to S^c}^{BN}$ and $U_{S^c}^{LN}$.
1596: 
1597: The proof of the lemma is completed by performing the exact same
1598: induction steps on $K$ and $T$ as done on $L$.  For brevity, those
1599: same steps are omitted from this proof.
1600: \end{proof}
1601: 
1602: \subsubsection{Main Proof}
1603: 
We now take an arbitrary non-empty subset $S \subseteq {\cal M}=\{0,\dots,M\}$
with $0\in S^c$, and start by bounding $H(U_S^{LN})$ according
to
1607: \begin{eqnarray*}
1608: H(U_S^{LN})
1609:   &=& I\big(U_S^{LN};U_{S^c}^{LN}Y_{S\to S^c}^{BN}Y_{S^c\to S^c}^{BN}\big)\;+\;
1610:       H\big(U_S^{LN}|U_{S^c}^{LN}Y_{S\to S^c}^{BN}Y_{S^c\to S^c}^{BN}\big) \\
1611:   &\stackrel{(a)}{\leq}&
1612:     I\big(U_S^{LN};U_{S^c}^{LN}Y_{S\to S^c}^{BN}Y_{S^c\to S^c}^{BN}\big)
1613:     \;+\; LN\delta(P_e^{(LN)}) \\
1614:   &=& I\big(U_S^{LN};U_{S^c}^{LN}\big)
1615:     \;+\;I(U_S^{LN};Y_{S\to S^c}^{BN}|U_{S^c}^{LN})
1616:     \;+\;I(U_S^{LN};Y_{S^c\to S^c}^{BN}|U_{S^c}^{LN}Y_{S\to S^c}^{BN})
1617:     \:+\;LN\delta(P_e^{(LN)}),
1618: \end{eqnarray*}
1619: where (a) follows from~(\ref{eq:ineq}).  From Lemma~\ref{lemma:chain1}, we
1620: have that $I(U_S^{LN};Y_{S^c\to S^c}^{BN}|U_{S^c}^{LN}Y_{S\to S^c}^{BN}) = 0$,
1621: and so we get
1622: \begin{equation}
1623: H(U_S^{LN}) \;\; \leq \;\;
1624:   I(U_S^{LN};U_{S^c}^{LN}) \; + \; I(U_S^{LN};Y_{S\to S^c}^{BN}|U_{S^c}^{LN})
1625:   \; + \; LN\delta(P_e^{(LN)}).
1626: \label{eq:almostend}
1627: \end{equation}
1628: 
Developing the second term on the right-hand side yields:
1630: \begin{eqnarray*}
1631: \lefteqn{I(U_S^{LN};Y_{S\to S^c}^{BN}|U_{S^c}^{LN})} \\
1632: & = & \sum_{k=1}^{BN}I(U_S^{LN};Y_{S\to S^c}(k)|U_{S^c}^{LN}Y_{S\to S^c}^{k-1})\\
1633: & \leq & \sum_{k=1}^{BN}I(U_S^{LN};Y_{S\to
1634: S^c}(k)|U_{S^c}^{LN}Y_{S\to
1635: S^c}^{k-1}) +\sum_{k=1}^{BN}I(X_{S\to
1636: S^c}(k);Y_{S\to
1637: S^c}(k)|U_{S^c}^{LN}Y_{S\to
1638: S^c}^{k-1}U_S^{LN})
1639: \\
1640: &=&\sum_{k=1}^{BN}I(X_{S\to S^c}(k)U_S^{LN};Y_{S\to
1641: S^c}(k)|U_{S^c}^{LN}Y_{S\to
1642: S^c}^{k-1})
1643: \\
1644: &=&\sum_{k=1}^{BN}I(X_{S\to S^c}(k);Y_{S\to S^c}(k)|U_{S^c}^{LN}Y_{S\to S^c}^{k-1}) +I(U_S^{LN};Y_{S\to S^c}(k)|U_{S^c}^{LN}Y_{S\to S^c}^{k-1}X_{S\to S^c}(k))\\
1645: \end{eqnarray*}\begin{eqnarray*}
1646: &\stackrel{(a)}{=}&\sum_{k=1}^{BN}I(X_{S\to S^c}(k);Y_{S\to S^c}(k)|U_{S^c}^{LN}Y_{S\to S^c}^{k-1})\\
1647: &=&\sum_{k=1}^{BN}H(Y_{S\to
1648: S^c}(k)|U_{S^c}^{LN}Y_{S\to
1649: S^c}^{k-1}) -
1650: H(Y_{S\to S^c}(k)|U_{S^c}^{LN}Y_{S\to S^c}^{k-1}X_{S\to S^c}(k))\\
1651: &\stackrel{(b)}{=}&\sum_{k=1}^{BN}H(Y_{S\to S^c}(k)|U_{S^c}^{LN}Y_{S\to S^c}^{k-1})-H(Y_{S\to S^c}(k)|X_{S\to S^c}(k))\\
1652: &\stackrel{(c)}{\leq}&\sum_{k=1}^{BN}H(Y_{S\to
1653: S^c}(k))-H(Y_{S\to
1654: S^c}(k)|X_{S\to S^c}(k))\\
1655: &=&\sum_{k=1}^{BN}I(X_{S\to S^c}(k);Y_{S\to S^c}(k))\\
1656: &\stackrel{(d)}{\leq}&\sum_{k=1}^{BN}\sum_{i\in S, j\in S^c} I(X_{ij}(k);Y_{ij}(k))\\
1657: &=&\sum_{i\in S, j\in S^c} \sum_{k=1}^{BN} I(X_{ij}(k);Y_{ij}(k))\\
&\leq&\sum_{i\in S, j\in S^c}  BNC_{ij},
1659: \end{eqnarray*}
1660: where we use the following arguments:
1661: \begin{itemize}
1662: \item[(a)] given the channel inputs  $X_{S\to S^c}(i)$ the
1663:   channel outputs $Y_{S\to S^c}(i)$ are independent of all
1664:   other random variables;
1665: \item[(b)] same as (a);
1666: \item[(c)] conditioning does not increase the entropy;
1667: \item[(d)] direct application of lemma \ref{lemma:mi}.
1668: \end{itemize}
1669: Substituting in (\ref{eq:almostend}) yields
1670: \begin{eqnarray*}
1671:  H(U_S^{LN}) &\leq& I(U_S^{LN};U_{S^c}^{LN}) + \sum_{i\in S, j\in S^c} BNC_{ij}
1672:                     + LN\delta(P_e^{(LN)}).
1673: \end{eqnarray*}
1674: Using the fact that the sources are drawn i.i.d., this last expression
1675: can be rewritten as
1676: \[
1677:   LNH(U_S) \;\;\leq\;\; LNI(U_S;U_{S^c})
1678:                + \sum_{i\in S, j\in S^c} BNC_{ij}
1679:                 + LN\delta(P_e^{(LN)}),
1680: \]
1681: or equivalently,
1682: \[
H(U_S|U_{S^c}) \;\;\leq\;\;  \frac{B}{L}\sum_{i\in S, j\in S^c}
C_{ij} + \delta(P_e^{(LN)})
\;\;=\;\; \frac{(W+L-1)}{L} \sum_{i\in S, j\in S^c} C_{ij}+\delta(P_e^{(LN)}).
1686: \]
1687: Finally, we observe that this inequality holds for all finite values
1688: of $L$.  Thus, it must also be the case that
1689: \begin{eqnarray*}
1690: H(U_S|U_{S^c})
  & < & \inf_{L=1,2,\dots}
1692:            \frac{(W+L-1)}{L} \sum_{i\in S, j\in S^c} C_{ij}+\delta(P_e^{(LN)}) \\
1693:   & = &    \sum_{i\in S, j\in S^c} C_{ij}+\delta(P_e^{(LN)}).
1694: \end{eqnarray*}
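The infimum in the last step can be evaluated explicitly (a short
justification added here for completeness, assuming $W\geq 1$): the factor
\[ \frac{W+L-1}{L} \;\; = \;\; 1+\frac{W-1}{L} \]
is nonincreasing in $L$ and tends to $1$ as $L\to\infty$, so that
$\inf_{L=1,2,\dots}\frac{W+L-1}{L}=1$.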
1695: But since $\delta(P_e^{(LN)})$ goes to zero as $P_e^{(N)}\rightarrow 0$,
1696: we get
1697: \[ H(U_S|U_{S^c}) \;\;<\;\; \sum_{i\in S, j\in S^c} C_{ij}, \]
1698: thus concluding the proof.  \tend
1699: 
1700: 
1701: %\bibliographystyle{unsrt}
1702: %\bibliography{library}
1703: \begin{thebibliography}{10}
1704: 
1705: \bibitem{BarrosS:02b}
1706: J.~Barros and S.~D. Servetto.
1707: \newblock {On the Capacity of the Reachback Channel in Wireless Sensor
1708:   Networks}.
1709: \newblock In {\em Proc. IEEE Int. Workshop Multimedia Sig. Proc.}, US Virgin
1710:   Islands, 2002.
1711: \newblock Invited paper to the special session on {\em Signal Processing for
1712:   Wireless Networks}.
1713: 
1714: \bibitem{BarrosS:03a}
1715: J.~Barros and S.~D. Servetto.
1716: \newblock {Reachback Capacity with Non-Interfering Nodes}.
1717: \newblock In {\em Proc. IEEE Int. Symp. Inform. Theory (ISIT)}, Yokohama,
1718:   Japan, 2003.
1719: 
1720: \bibitem{BarrosS:03d}
1721: J.~Barros and S.~D. Servetto.
1722: \newblock {Coding Theorems for the Sensor Reachback Problem with Partially
1723:   Cooperating Nodes}.
1724: \newblock In {\em Discrete Mathematics and Theoretical Computer Science
1725:   (DIMACS) series on Network Information Theory}, Piscataway, NJ, 2003.
1726: 
1727: \bibitem{BarrosS:05}
1728: J.~Barros and S.~D. Servetto.
1729: \newblock {A Coding Theorem for Network Information Flow with Correlated
1730:   Sources}.
1731: \newblock In {\em Proc. IEEE Int. Symp. Inform. Theory (ISIT)}, Adelaide,
1732:   Australia, 2005.
1733: 
1734: \bibitem{CoverT:91}
1735: T.~M. Cover and J.~Thomas.
1736: \newblock {\em {Elements of Information Theory}}.
1737: \newblock John Wiley and Sons, Inc., 1991.
1738: 
1739: \bibitem{AkyildizSSC:02}
1740: I.~F. Akyildiz, W.~Su, Y.~Sankarasubramaniam, and E.~Cayirci.
1741: \newblock {A Survey on Sensor Networks}.
1742: \newblock {\em IEEE Communications Mag.}, 40(8):102--114, 2002.
1743: 
1744: \bibitem{BergerZV:96}
1745: T.~Berger, Z.~Zhang, and H.~Viswanathan.
1746: \newblock {The CEO Problem}.
1747: \newblock {\em IEEE Trans. Inform. Theory}, 42(3):887--902, 1996.
1748: 
1749: \bibitem{BertsekasG:92}
1750: D.~Bertsekas and R.~Gallager.
1751: \newblock {\em {Data Networks (2nd ed)}}.
1752: \newblock Prentice Hall, 1992.
1753: 
1754: \bibitem{HuS:03c}
1755: A.~Hu and S.~D. Servetto.
1756: \newblock {dFSK: {\em Distributed} Frequency Shift Keying Modulation in Dense
1757:   Sensor Networks}.
1758: \newblock In {\em Proc. IEEE Int. Conf. Commun. (ICC)}, Paris, France, 2004.
1759: 
1760: \bibitem{HuS:03b}
1761: A.~Hu and S.~D. Servetto.
1762: \newblock {Algorithmic Aspects of the Time Synchronization Problem in
1763:   Large-Scale Sensor Networks}.
1764: \newblock {\em ACM/Kluwer Mobile Networks and Applications}, 10:491--503, 2005.
1765: \newblock Special issue with selected (and revised) papers from ACM WSNA 2003.
1766: 
1767: \bibitem{HuS:05}
1768: A.~Hu and S.~D. Servetto.
1769: \newblock {On the Scalability of Cooperative Time Synchronization in
1770:   Pulse-Connected Networks}.
1771: \newblock {\em IEEE Trans. Inform. Theory}, to appear. Available from
1772:   \href{http://cn.ece.cornell.edu/}{{\tt http://cn.ece.cornell.edu/}}.
1773: 
1774: \bibitem{Berger:78}
1775: T.~Berger.
1776: \newblock {\em The Information Theory Approach to Communications (G. Longo,
1777:   ed.)}, chapter Multiterminal Source Coding.
1778: \newblock Springer-Verlag, 1978.
1779: 
1780: \bibitem{CsiszarK:80}
1781: I.~Csisz\'ar and J.~K{\"o}rner.
1782: \newblock {Towards a General Theory of Source Networks}.
1783: \newblock {\em IEEE Trans. Inform. Theory}, 26(2):155--166, 1980.
1784: 
1785: \bibitem{KoernerM:79}
1786: J.~K{\"o}rner and K.~Marton.
1787: \newblock {How to Encode the Modulo-Two Sum of Binary Sources}.
1788: \newblock {\em IEEE Trans. Inform. Theory}, 25(2):219--221, 1979.
1789: 
1790: \bibitem{KawadiaK:04}
1791: V.~Kawadia and P.~R. Kumar.
1792: \newblock {A Cautionary Perspective on Cross-Layer Design}.
1793: \newblock {\em IEEE Wireless Comm. Mag.}, 2004. Available from
1794:   \href{http://decision.csl.uiuc.edu/~prkumar/ps_files/cross-layer-design.pdf}%
1795: {{\tt http://decision.csl.uiuc.edu/\~{}prkumar/}}.
1796: 
1797: \bibitem{EphremidesH:98}
1798: A.~Ephremides and B.~Hajek.
1799: \newblock {Information Theory and Communication Networks: An Unconsummated
1800:   Union}.
1801: \newblock {\em IEEE Trans. Inform. Theory}, 44(6):2416--2434, 1998.
1802: 
1803: \bibitem{SlepianW:73b}
1804: D.~Slepian and J.~K. Wolf.
1805: \newblock {Noiseless Coding of Correlated Information Sources}.
1806: \newblock {\em IEEE Trans. Inform. Theory}, 19(4):471--480, 1973.
1807: 
1808: \bibitem{CoverGS:80}
1809: T.~M. Cover, A.~A. {El Gamal}, and M.~Salehi.
1810: \newblock {Multiple Access Channels with Arbitrarily Correlated Sources}.
1811: \newblock {\em IEEE Trans. Inform. Theory}, 26(6):648--657, 1980.
1812: 
1813: \bibitem{Dueck:81}
1814: G.~Dueck.
1815: \newblock {A Note on the Multiple Access Channel with Correlated Sources}.
1816: \newblock {\em IEEE Trans. Inform. Theory}, 27(2):232--235, 1981.
1817: 
1818: \bibitem{SlepianW:73}
1819: D.~Slepian and J.~K. Wolf.
1820: \newblock {A Coding Theorem for Multiple Access Channels with Correlated
1821:   Sources}.
1822: \newblock {\em Bell Syst. Tech. J.}, 52(7):1037--1076, 1973.
1823: 
1824: \bibitem{AhlswedeH:83}
1825: R.~Ahlswede and T.~S. Han.
1826: \newblock {On Source Coding with Side Information via a Multiple-Access
1827:   Channel, and Related Problems in Multi-User Information Theory}.
1828: \newblock {\em IEEE Trans. Inform. Theory}, 29(3):396--411, 1983.
1829: 
1830: \bibitem{Willems:83}
1831: F.~M.~J. Willems.
1832: \newblock {The Discrete Memoryless Multiple Access Channel with Partially
1833:   Cooperating Encoders}.
1834: \newblock {\em IEEE Trans. Inform. Theory}, 29(3):441--445, 1983.
1835: 
1836: \bibitem{Han:80}
1837: T.~S. Han.
1838: \newblock {Slepian-Wolf-Cover Theorem for a Network of Channels}.
1839: \newblock {\em Inform. Contr.}, 47(1):67--83, 1980.
1840: 
1841: \bibitem{AhlswedeCLY:00}
1842: R.~Ahlswede, N.~Cai, S.-Y.~R. Li, and R.~W. Yeung.
1843: \newblock {Network Information Flow}.
1844: \newblock {\em IEEE Trans. Inform. Theory}, 46(4):1204--1216, 2000.
1845: 
1846: \bibitem{Borade:02}
1847: S.~Borade.
1848: \newblock {Network Information Flow: Limits and Achievability}.
1849: \newblock In {\em Proc. IEEE Int. Symp. Inform. Theory (ISIT)}, Lausanne,
1850:   Switzerland, 2002.
1851: 
1852: \bibitem{LiYC:03}
1853: S.-Y.~R. Li, R.~W. Yeung, and N.~Cai.
1854: \newblock {Linear Network Coding}.
1855: \newblock {\em IEEE Trans. Inform. Theory}, 49(2):371--381, 2003.
1856: 
1857: \bibitem{KoetterM:03}
1858: R.~Koetter and M.~M\'edard.
1859: \newblock {An Algebraic Approach to Network Coding}.
1860: \newblock {\em IEEE/ACM Trans. Networking}, 11(5):782--795, 2003.
1861: 
1862: \bibitem{EffrosMHRKK:03}
1863: M.~Effros, M.~M\'edard, T.~Ho, S.~Ray, D.~Karger, and R.~Koetter.
1864: \newblock {Linear Network Codes: A Unified Framework for Source, Channel, and
1865:   Network Coding}.
1866: \newblock In {\em Discrete Mathematics and Theoretical Computer Science
1867:   (DIMACS) series on Network Information Theory}, Piscataway, NJ, 2003.
1868: 
1869: \bibitem{HoMEK:04}
1870: T.~Ho, M.~M\'edard, M.~Effros, and R.~Koetter.
1871: \newblock {Network Coding for Correlated Sources}.
1872: \newblock In {\em Proc. 38th Annual Conf. Inform. Sciences Syst. (CISS)},
1873:   Princeton, NJ, March 2004.
1874: 
1875: \bibitem{LiL:04}
1876: Z.~Li and B.~Li.
1877: \newblock {Network Coding in Undirected Networks}.
1878: \newblock In {\em Proc. 38th Annual Conf. Inform. Sciences Syst. (CISS)},
1879:   Princeton, NJ, March 2004.
1880: 
1881: \bibitem{GuptaK:00}
1882: P.~Gupta and P.~R. Kumar.
1883: \newblock {The Capacity of Wireless Networks}.
1884: \newblock {\em IEEE Trans. Inform. Theory}, 46(2):388--404, 2000.
1885: 
1886: \bibitem{PerakiS:03}
1887: C.~Peraki and S.~D. Servetto.
1888: \newblock {On the Maximum Stable Throughput Problem in Random Networks with
1889:   Directional Antennas}.
1890: \newblock In {\em Proc. ACM MobiHoc}, Annapolis, MD, 2003.
1891: 
1892: \bibitem{PerakiS:04}
1893: C.~Peraki and S.~D. Servetto.
1894: \newblock {Capacity, Stability and Flows in Large-Scale Random Networks}.
1895: \newblock In {\em Proc. IEEE Inform. Theory Workshop (ITW)}, San Antonio, TX,
1896:   2004.
1897: 
1898: \bibitem{CormenLRS:01}
1899: T.~H. Cormen, C.~E. Leiserson, R.~L. Rivest, and C.~Stein.
1900: \newblock {\em {Introduction to Algorithms (2nd ed)}}.
1901: \newblock MIT Press, 2001.
1902: 
1903: \bibitem{LehmanL:04}
1904: A.~R. Lehman and E.~Lehman.
1905: \newblock {Complexity Classification of Network Information Flow Problems}.
1906: \newblock In {\em Proc. ACM/SIAM Symp. Discr. Alg. (SODA)}, 2004.
1907: 
1908: \bibitem{Verdu:02}
1909: S.~Verd\'u.
1910: \newblock {Spectral Efficiency in the Wideband Regime}.
1911: \newblock {\em IEEE Trans. Inform. Theory}, 48(6):1319--1343, 2002.
1912: 
1913: \bibitem{IntanagonwiwatGEHS:03}
1914: C.~Intanagonwiwat, R.~Govindan, D.~Estrin, J.~Heidemann, and F.~Silva.
1915: \newblock {Directed Diffusion for Wireless Sensor Networking}.
1916: \newblock {\em IEEE/ACM Trans. Networking}, 11(1):2--16, 2003.
1917: 
1918: \bibitem{GoelE:03}
1919: A.~Goel and D.~Estrin.
1920: \newblock {Simultaneous Optimization for Concave Costs: Single Sink Aggregation
1921:   or Single Source Buy-at-Bulk}.
1922: \newblock In {\em Proc. ACM/SIAM Symp. Discr. Alg. (SODA)}, Baltimore, MD,
1923:   2003.
1924: 
1925: \bibitem{BarrosTL:03}
1926: J.~Barros, M.~T{\"u}chler, and S.~P. Lee.
1927: \newblock {Scalable Source/Channel Decoding for Large-Scale Sensor Networks}.
1928: \newblock In {\em Proc. IEEE Int. Conf. Commun. (ICC)}, Paris, France, 2004.
1929: 
1930: \bibitem{Berger:71}
1931: T.~Berger.
1932: \newblock {\em {Rate Distortion Theory: A Mathematical Basis for Data
1933:   Compression}}.
1934: \newblock Prentice-Hall, Inc., 1971.
1935: 
1936: \bibitem{Chiang:05}
1937: M.~Chiang.
1938: \newblock {Balancing Transport and Physical Layers in Wireless Multihop
1939:   Networks: Jointly Optimal Congestion Control and Power Control}.
1940: \newblock {\em IEEE J. Select. Areas Commun.}, 23(1):104--116, 2005.
1941: 
1942: \end{thebibliography}
1943: 
1944: 
1945: \end{document}
1946: 
1947: