1:
2: \documentclass[11pt,onecolumn,dvips,draftcls]{IEEEtran}
3: % \documentclass[10pt,twocolumn,dvips,final]{IEEEtran}
4: \usepackage{psfig, graphics, amsfonts, amsmath, color, amssymb, amsxtra, times}
5: \definecolor{gray}{cmyk}{.2,0.2,.3,.1}
6: \definecolor{dred}{cmyk}{0,0.9,0.4,0.3}
7: \definecolor{dblue}{rgb}{0,0,0.5}
8: \definecolor{dgreen}{rgb}{0,0.3,0}
9: \definecolor{dgray}{rgb}{0.3,0.3,0}
10: \usepackage[breaklinks=true, colorlinks=true, linkcolor=black, urlcolor=dblue,
11: citecolor=black, pdfpagemode=None, pdfstartview=FitH]{hyperref}
12:
13: \DeclareOldFontCommand{\rm}{\normalfont\rmfamily}{\mathrm}
14: \DeclareOldFontCommand{\sf}{\normalfont\sffamily}{\mathsf}
15: \DeclareOldFontCommand{\tt}{\normalfont\ttfamily}{\mathtt}
16: \DeclareOldFontCommand{\bf}{\normalfont\bfseries}{\mathbf}
17: \DeclareOldFontCommand{\it}{\normalfont\itshape}{\mathit}
18: \makeatletter\DeclareOldFontCommand{\sl}{\normalfont\slshape}{\@nomath\sl}
19: \DeclareOldFontCommand{\sc}{\normalfont\scshape}{\@nomath\sc}\makeatother
20:
21: \newtheorem{theorem}{Theorem}
22: \newtheorem{proposition}{Proposition}
23: \newtheorem{lemma}{Lemma}
24: \setcounter{tocdepth}{2}
25: \setlength{\topmargin}{-15mm}
26: \setlength{\textwidth}{17cm}
27: \setlength{\textheight}{23cm}
28: \setlength{\oddsidemargin}{-2mm}
29:
30: \newcommand{\rend}{\hfill$\square$}
31: \newcommand{\tend}{\hfill$\blacksquare$}
32: \newcommand{\epsfig}{\psfig}
33: \newcommand{\fig}[1]{{Fig.~\ref{#1}}}
34: \newcommand{\eq}[1]{\eqref{#1}}
35: \newcommand{\expect}[1]{\ensuremath{\operatorname{E}\left[#1\right]}}
36: \newcommand{\secref}[1]{Section~\ref{#1}}
37: \newcommand{\ie}{i.e.~}
38: \newcommand{\eg}{e.g.~}
39: \newcommand{\theorref}[1]{{\itshape Theorem~\ref{#1}}}
40:
41:
42: \title{Multiterminal Source Coding With Two Encoders--I: A Computable
43: Outer Bound
44: \thanks{The author is with the School of Electrical and Computer
45: Engineering, Cornell University, Ithaca, NY. URL:
46: \href{http://cn.ece.cornell.edu/}{{\tt http://cn.ece.cornell.edu/}}.
47: Work supported by the National Science Foundation, under awards
48: CCR-0238271 (CAREER), CCR-0330059, and ANR-0325556.}}
49: \author{Sergio D.\ Servetto}
50: \date{November 12, 2006.}
51:
52:
53: \begin{document}
54: \maketitle
55: \thispagestyle{empty}
56:
57: \begin{picture}(0,0)
58: \put(-5,210){\tt\small Submitted to the IEEE Transactions on Information
59: Theory, April 2006; Revised,}
60: \put(-5,200){\tt\small November 2006.}
61: \end{picture}
62: \vspace{-4mm}
63:
64: \begin{abstract}
65: \noindent\it
66: In this first part, a computable outer bound is proved for the
67: multiterminal source coding problem, for a setup with two encoders,
68: discrete memoryless sources, and bounded distortion measures.
69: \end{abstract}
70:
71: \vspace{1cm}
72: \noindent{\bf Index terms:} multiterminal source coding, distributed
73: source coding, network source coding, rate-distortion theory, rate-distortion
74: with side information, network information theory.
75:
76: \vspace{9.3cm}
77: \pagebreak
78: \setcounter{page}{1}
79:
80:
81: \section{Introduction}
82:
83: \subsection{The Problem of Multiterminal Source Coding}
84:
85: Consider two dependent sources $X$ and $Y$, with joint distribution
86: $p(xy)$. These sources are to be encoded by two separate encoders,
87: each of which observes only one of them, and are to be decoded by a
88: single joint decoder. $X$ is encoded at rate $R_1$ and with average
89: distortion $D_1$, and $Y$ is encoded at rate $R_2$ and with average
90: distortion $D_2$. This setup is illustrated in Fig.~\ref{fig:setup}.
91:
92: \begin{figure}[!ht]
93: \centerline{\psfig{file=setup.eps,height=3cm,width=12cm}}
94: \caption{System setup for multiterminal source coding.}
95: \label{fig:setup}
96: \end{figure}
97:
98: In the classical {\em multiterminal source coding} problem, as
99: formulated in~\cite{Berger:78, Tung:PhD}, the goal
100: is to determine the region of all achievable rate-distortion tuples
101: $(R_1,R_2,D_1,D_2)$. Although relatively simple to describe (a
102: formal description is given later), the multiterminal source coding
103: problem was one of the long-standing open problems in information
104: theory -- see, e.g.,~\cite[pg.\ 443]{CoverT:91}. Furthermore,
105: besides its historical interest, this problem also comes up naturally
106: in the context of a sensor networking problem of interest to
107: us~\cite{BarrosS:06}.
108:
109: Multiterminal source coding has a rich history, in which the
110: fundamental contributions, in chronological order, are the
111: works of: a) Dobrushin-Tsybakov~\cite{DobrushinT:62}, with the
112: first rate-distortion problem with a Markov chain constraint; b)
113: Slepian-Wolf~\cite{SlepianW:73b}, with the formulation and solution
114: to the first distributed source coding problem, and
115: Cover~\cite{Cover:75b}, with a simpler proof of the Slepian-Wolf
116: result, a proof method widely in use today; c)
117: Ahlswede-K\"orner~\cite{AhlswedeK:75} and Wyner~\cite{Wyner:75},
118: with the first use of an auxiliary random variable to describe
119: the rate region of a source coding problem, and with it the need
120: to introduce proof methods to bound their cardinality; d)
121: Wyner-Ziv~\cite{WynerZ:76}, with the first characterization of a
122: multiterminal rate-distortion function; e) Berger-Tung~\cite{Berger:78,
123: Tung:PhD}, with the first formulation and partial results on the
124: multiterminal source coding problem as formulated in Fig.~\ref{fig:setup};
125: and f) Berger-Yeung~\cite{BergerY:89, Yeung:PhD}, with a complete
126: solution to a more general form of the Wyner-Ziv problem. For
127: details on these, and on {\em many} more important contributions,
128: as well as for historical information on the problem, the reader
129: is referred to~\cite{BergerS:07}.
130:
131: The setup of Fig.~\ref{fig:setup} represents what we feel was the
132: simplest yet unsolved instance of a multiterminal source coding problem.
133: The problem of Fig.~\ref{fig:setup}, and the CEO problem~\cite{BergerZV:96}
134: are, to the best of our knowledge, the last two known special cases of
135: the general entropy characterization problem of Csisz\'ar and
136: K\"orner~\cite{CsiszarK:80} that remained unsolved. This hierarchy
137: of problems is illustrated in Fig.~\ref{fig:hierarchy}.
138:
139: \begin{figure}[ht]
140: \centerline{\resizebox{10cm}{6cm}{\input{hierarchy.pstex_t}}}
141: \caption{A hierarchy of problems in multiterminal source coding
142: with two encoders and one decoder: an arrow from problem X to
143: problem Y indicates that X is a special case of Y, in the sense
144: that a solution to Y automatically provides a solution to X.
145: Abbreviations -- SC: two-terminal lossless source coding;
146: RD: two-terminal rate-distortion~\cite{Shannon:59}; SW: distributed
147: coding of dependent sources~\cite{SlepianW:73b}; AK/W: source
148: coding with side information~\cite{AhlswedeK:75, Wyner:75};
149: WZ: rate-distortion with side information~\cite{WynerZ:76};
150: BY: the Berger-Yeung extension of WZ theory~\cite{BergerY:89};
151: DT: rate-distortion with a remote source~\cite{DobrushinT:62};
152: BHOTW: a rate-distortion formulation of the Ahlswede-K\"orner-Wyner
153: problem~\cite{BergerHOTW:79}; CEO: the CEO problem~\cite{BergerZV:96};
154: MTRD: the problem setup of Fig.~\ref{fig:setup}; EC: the entropy
155: characterization problem~\cite{CsiszarK:80}. Asterisks are used
156: to indicate problems whose solution was previously known.}
157: \label{fig:hierarchy}
158: \end{figure}
159:
160: It should be pointed out though that the setup of Fig.~\ref{fig:setup}
161: is by no means the most general formulation of a multiterminal source
162: coding problem we could have given. There are many other ways in which
163: we could have chosen to formulate these problems: we could have chosen
164: a network with $M$ encoders and a single decoder which attempts to
165: reconstruct $L$ different functions of the sources, we could have
166: considered continuous-alphabet and/or general ergodic sources, we
167: could have considered feedback and interactive communication, we could
168: have studied how this problem relates to the network coding problem,
169: and we could have considered network
170: topologies with multiple decoders as well. All these alternative
171: possible formulations are discussed in detail in~\cite{BergerS:07}.
172:
173: \subsection{Difficulties in Proving a Converse}
174: \label{sec:difficulties}
175:
176: Among the limited number of references mentioned above, we included
177: the Berger-Tung bounds~\cite{Berger:78, Tung:PhD}. These bounds do
178: provide the best known descriptions of the region of achievable rates
179: for the problem setup of Fig.~\ref{fig:setup},\footnote{We note that
180: recently, a new outer bound has been proposed for a version of
181: multiterminal source coding
182: that contains the formulation of~\cite{Berger:78, Tung:PhD} considered
183: here as a special case~\cite{Wagner:PhD, WagnerA:05}. The new
184: bound has many desirable properties: it unifies known bounds custom
185: developed for seemingly different problems, and it provides a conclusive
186: answer for a previously unsolved instance. However, when specialized
187: to our two-encoder setup, it is unclear if the new bound provides
188: an improvement over the Berger-Tung outer bound. So, due to the
189: simplicity of the latter, we have chosen here to focus on that one
190: instead of on the more modern form.}
191: and so we elaborate on those now.
192:
193: \medskip\begin{proposition}[Berger-Tung Bounds]
194: \label{prp:bt-bounds}
195: Fix $(D_1,D_2)$. Let $X$ and $Y$ be two sources out of which
196: pairs of sequences $\big(X^n,Y^n\big)$ are drawn i.i.d.~$\sim p(xy)$;
197: and let $U$ and $V$ be auxiliary variables defined over alphabets
198: $\mathcal{U}$ and $\mathcal{V}$, such that there exist functions
199: $\gamma_1:\mathcal{U}\times\mathcal{V}\to\hat{\mathcal{X}}$
200: and $\gamma_2:\mathcal{U}\times\mathcal{V}\to\hat{\mathcal{Y}}$,
201: for which $\expect{d_1\big(X,\gamma_1(UV)\big)}\leq D_1$ and
202: $\expect{d_2\big(Y,\gamma_2(UV)\big)}\leq D_2$. Consider rates
203: $(R_1,R_2)$, such that $R_1 \geq I(XY\wedge U|V)$,
204: $R_2 \geq I(XY\wedge V|U)$, and $R_1+R_2 \geq I(XY\wedge UV)$,
205: for some joint distribution $p(xyuv)$. Now:
206: \begin{itemize}
207: \item for any $p(xyuv)$ that satisfies a Markov chain of the form
208: $U-X-Y-V$, all rates $(R_1,R_2)$ obtained for any such
209: $p$ are achievable;
210: \item if there exists a $p(xyuv)$ that satisfies two Markov chains of
211: the form $U-X-Y$ and $X-Y-V$, then if we consider the union of the
212: set of rates defined for each such $p(xyuv)$, we must have that any
213: achievable rates are included in that union;
214: \end{itemize}
215: that is, the first condition defines an {\em inner} bound, and the second
216: an {\em outer} bound to the rate region. \rend
217: \end{proposition}\medskip
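
To make the rate expressions in Proposition~\ref{prp:bt-bounds} more
concrete, the following sketch evaluates them under the long Markov
chain $U-X-Y-V$ of the inner bound, for which $p(uv|xy)=p(u|x)p(v|y)$.
It uses only standard identities, and it is not meant to apply to the
outer bound, where this factorization is not available:
\begin{align*}
I(XY\wedge UV) &= H(UV)-H(UV|XY) \\
 &= H(U)+H(V)-I(U\wedge V)-H(U|X)-H(V|Y) \\
 &= I(X\wedge U)+I(Y\wedge V)-I(U\wedge V), \\
I(XY\wedge U|V) &= H(U|V)-H(U|XYV) \;=\; H(U|V)-H(U|X)
 \;=\; I(X\wedge U)-I(U\wedge V),
\end{align*}
and symmetrically $I(XY\wedge V|U)=I(Y\wedge V)-I(U\wedge V)$. Read
this way, each encoder pays for its own description, and the
dependence between the two descriptions is credited back.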
218:
219: The regions defined by these bounds, when regarded as images of maps
220: that transform probability distributions into rate pairs, have a
221: property that is a source of many difficulties: the mutual information
222: expressions that define the inner and the outer bounds are identical,
223: it is only the {\em domains} of the two maps that differ; as such,
224: comparing the resulting regions is difficult. This difference between
225: the inner and outer bounds has been the state of affairs in multiterminal
226: source coding since 1978.
227:
228: A close examination of these distributions suggested to us that the
229: gap might not be due to a suboptimal coding strategy used in the inner
230: bound, but instead that perhaps the outer bound allows for the inclusion
231: of dependencies that cannot be physically realized by any distributed
232: code. Consider these distributions:
233: \begin{itemize}
234: \item For the inner bound, $p(xyuv) = p(xy)p(u|x)p(v|y)$.
235: \item For the outer bound, $p(xyuv)
236: = p(xy)p(u|x)p(v|y\underline{xu})
237: = p(xy)p(v|y)p(u|x\underline{yv})$.
238: \end{itemize}
239: If we choose to interpret $U$ and $V$ as instantaneous descriptions
240: of encodings of $X$ and $Y$,
241: then we see that the outer bound says that the encoding
242: $V$ is allowed to contain information about $X$ {\em beyond} that
243: which can be extracted from $Y$, and likewise for $U$ and
244: $Y$.\footnote{Note: this interpretation comes from the inner bound,
245: and is only justified for {\em blocks}. $U^n$ does represent an encoding
246: of $X^n$, but it would be incorrect to say that the variable $U$ is
247: an encoding of $X$ (and likewise for $V$ and $Y$). These insights
248: can only be carried so far, but at this point we are only trying to
249: build some intuition, and thus it is permissible to take such liberties.}
250: Motivated by this observation, in the first part of this work we
251: set ourselves the goal of finding a new outer bound.
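
The two factorizations above can be recovered from the chain rule, as
in the following sketch; it is our reading of the Markov conditions of
Proposition~\ref{prp:bt-bounds}, and assumes nothing beyond them. For
any joint distribution,
\[ p(xyuv) \;=\; p(xy)\,p(u|xy)\,p(v|xyu). \]
The chain $U-X-Y$ gives $p(u|xy)=p(u|x)$ in both bounds. For the inner
bound, the long chain $U-X-Y-V$ further gives $p(v|xyu)=p(v|y)$. For
the outer bound, the chain $X-Y-V$ only constrains the marginal
\[ p(v|xy) \;=\; \sum_{u} p(u|x)\,p(v|xyu) \;=\; p(v|y), \]
so the individual terms $p(v|xyu)$ may still depend on $x$ and $u$.
This residual dependence is what the underlined arguments indicate.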
252:
253: \subsection{An Interpretation of Distributed Rate-Distortion Codes
254: as Constrained Source Covers}
255:
256: In Part I of this paper we present a finitely parameterized outer
257: bound for the region of achievable rates of the multiterminal source
258: coding problem of Fig.~\ref{fig:setup}, based on what we
259: believe is an original proof technique. Some highlights of that
260: proof method, formally developed in later sections, are provided here.
261:
262: \subsubsection{Rate-Distortion Codes $\equiv$ Source Covers}
263: \label{sec:intro-distributed-covers}
264:
265: Our proof tightens existing converses by means of identifying a
266: constraint that {\em all} codes are subject to, but that is not
267: captured by any existing outer bound. To explain what the constraint
268: is, the easiest way to get started is by drawing an analogy to
269: classical, two-terminal rate-distortion codes.
270:
271: In the standard, two-terminal rate-distortion problem, a generic
272: code consists of the following elements:
273: \begin{itemize}
274: \item A block length $n$.
275: \item A cover $\big\{ \mathbf{S}_i \;:\; i=1...2^{nR} \big\}$ of the source
276: $\mathcal{X}^n$.
277: \item A reconstruction sequence $\hat{\mathbf{x}}^n(i)$, associated to each
278: cover element $\mathbf{S}_i$.
279: \end{itemize}
280: Given this description, an encoder $f:\mathcal{X}^n\to\{1...2^{nR}\}$
281: makes $f\big(\mathbf{x}^n\big)=i$ for some source sequence $\mathbf{x}^n$
282: and some index $i$, if
283: $\mathbf{x}^n\in \mathbf{S}_i$, with ties broken arbitrarily; a decoder
284: $g:\{1...2^{nR}\}\to
285: \hat{\mathcal{X}}^n$ simply maps $g(i)=\hat{\mathbf{x}}^n(i)$. And we say
286: that the encoder/decoder pair $(f,g)$ satisfies a distortion constraint $D$
287: if, roughly, $P\Big(d\big(\mathbf{x}^n,g(f(\mathbf{x}^n))\big)\leq D\Big)
288: \approx 1$, for all $n$ large enough. Such a representation is illustrated
289: in Fig.~\ref{fig:covers-classical}.
290:
291: \begin{figure}[ht]
292: \centerline{\resizebox{15cm}{4cm}{\input{covers-classical.pstex_t}}}
293: \vspace{-2mm}
294: \caption{Cover-based representation of a classical rate-distortion code.}
295: \label{fig:covers-classical}
296: \end{figure}
297:
298: In an analogous manner, we specify an arbitrary {\em distributed}
299: rate-distortion code as follows:
300: \begin{itemize}
301: \item A block length $n$.
302: \item {\em Two} covers:
303: \begin{itemize}
304: \item A cover $\big\{ \mathbf{S}_{1,i} \;:\; i=1...2^{nR_1}\big\}$ of the
305: source $\mathcal{X}^n$.
306: \item A cover $\big\{ \mathbf{S}_{2,j} \;:\; j=1...2^{nR_2}\big\}$ of the
307: source $\mathcal{Y}^n$.
308: \end{itemize}
309: Indirectly, these two covers specify a cover $\mathbf{S}_{ij}\;\triangleq\;
310: \big\{ \mathbf{S}_{1,i}\times\mathbf{S}_{2,j} : i=1...2^{nR_1},
311: j=1...2^{nR_2}\big\}$ of the product alphabet $\mathcal{X}^n\times
312: \mathcal{Y}^n$.
313: \item For each cover element $\mathbf{S}_{ij}$, we specify
314: {\em two} reconstruction sequences
315: $\big(\hat{\mathbf{x}}^n(ij),\hat{\mathbf{y}}^n(ij)\big)$.
316: \end{itemize}
317: Given this description, an encoder $f_1:\mathcal{X}^n\to\{1...2^{nR_1}\}$
318: for node 1 makes $f_1\big(\mathbf{x}^n\big)=i$ for some source sequence
319: $\mathbf{x}^n$ and some index $i$, if $\mathbf{x}^n\in\mathbf{S}_{1,i}$,
320: with ties broken arbitrarily (and similarly for an encoder $f_2$ at node 2);
321: a decoder $g:\{1...2^{nR_1}\}\times\{1...2^{nR_2}\}\to
322: \hat{\mathcal{X}}^n\times\hat{\mathcal{Y}}^n$ simply maps
323: $g(i,j)=\big(\hat{\mathbf{x}}^n(ij),\hat{\mathbf{y}}^n(ij)\big)$.
324: And we say that the distributed code $(f_1,f_2,g)$ satisfies two distortion
325: constraints $D_1$ and $D_2$ if, roughly,
326: $P\Big(d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)\leq D_1
327: \mbox{ and }
328: d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)\leq D_2\Big)\approx 1$,
329: for all $n$ large enough, and for
330: $\big(\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)=
331: g\big(f_1(\mathbf{x}^n),f_2(\mathbf{y}^n)\big)$. Such a representation is
332: illustrated in Fig.~\ref{fig:covers-distributed}.
333:
334: \begin{figure}[ht]
335: \centerline{\psfig{file=covers-distributed.eps,height=6cm,width=16cm}}
336: \caption{Cover-based representation of a {\em distributed} rate-distortion
337: code.}
338: \label{fig:covers-distributed}
339: \end{figure}
340:
341: \subsubsection{Constraints on the Structure of Source Covers}
342:
343: Our main insight is that, whereas in the classical problem
344: any arbitrary cover defines a valid rate-distortion code, in multiterminal
345: source coding this is no longer the case: {\em covers of the product source
346: $\mathcal{X}^n\times\mathcal{Y}^n$ only of the form $\mathbf{S}_{ij}
347: = \mathbf{S}_{1,i}\times\mathbf{S}_{2,j}$ can be realized by distributed
348: codes}. The significance of this requirement is illustrated with an
349: example in Fig.~\ref{fig:binary-example}.
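
The closure property behind this requirement can be stated in one line
(a sketch, in the notation introduced above). If
$(\mathbf{x}_1^n,\mathbf{y}_1^n)$ and $(\mathbf{x}_2^n,\mathbf{y}_2^n)$
both belong to $\mathbf{S}_{1,i}\times\mathbf{S}_{2,j}$, then
$\mathbf{x}_1^n,\mathbf{x}_2^n\in\mathbf{S}_{1,i}$ and
$\mathbf{y}_1^n,\mathbf{y}_2^n\in\mathbf{S}_{2,j}$, and therefore
\[ (\mathbf{x}_1^n,\mathbf{y}_2^n)\in\mathbf{S}_{1,i}\times\mathbf{S}_{2,j}
 \quad\mbox{and}\quad
 (\mathbf{x}_2^n,\mathbf{y}_1^n)\in\mathbf{S}_{1,i}\times\mathbf{S}_{2,j}. \]
As a toy instance, take $n=1$ and $\mathcal{X}=\mathcal{Y}=\{0,1\}$:
the set $\{(0,0),(1,1)\}$ is a legitimate cover element for a
centralized encoder, but it is not of product form, and the smallest
product set that contains it is $\{0,1\}\times\{0,1\}$, that is, all
four pairs.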
350:
351: \begin{figure}[!h]
352: \centerline{\psfig{file=dtyp-bin.eps,height=6cm}\hspace{1cm}
353: \psfig{file=dtyp-ex.eps,height=6cm}}
354: \caption{An example, to illustrate the significance of the
355: requirement that cover elements $\mathbf{S}_{ij}$ take a product form.
356: Let $\mathcal{X}=\mathcal{Y}=\{0,1\}$, and $p(xy)=p(x)p(y|x)$
357: specified by a $p(x)$ such that $P(X=0)=P(X=1)=\frac 1 2$,
358: and $p(y|x)$ a binary symmetric channel with crossover probability
359: $p_c$. Left: for each typical $\mathbf{x}^n$, there is a ``ring'' of
360: $\mathbf{y}^n$'s jointly typical with it, centered at $\mathbf{x}^n$
361: and of radius $\approx np_c$. Right: consider pairs
362: $\big(\mathbf{x}_1^n\mathbf{y}_1^n\big)$ and $\big(\mathbf{x}_2^n
363: \mathbf{y}_2^n\big)$ in $\mathbf{S}_{ij}$; dashed circles denote
364: distortion balls centered at $\hat{\mathbf{x}}^n(ij)$ and
365: $\hat{\mathbf{y}}^n(ij)$ (with the centers omitted, for clarity),
366: and dark shaded regions denote the intersection of two rings.
367: Suppose now that all four pairs $(\mathbf{x}_1^n\mathbf{y}_1^n)$,
368: $(\mathbf{x}_1^n\mathbf{y}_2^n)$, $(\mathbf{x}_2^n\mathbf{y}_1^n)$,
369: and $(\mathbf{x}_2^n\mathbf{y}_2^n)$ are in $T_\epsilon^n\big(XY\big)$.
370: Because $\mathbf{S}_{ij}= \mathbf{S}_{1,i}\times\mathbf{S}_{2,j}$,
371: {\em all four pairs must be in $\mathbf{S}_{ij}$ as well:} the decoder
372: does not have enough information to discriminate among these pairs.
373: No such constraint exists with a centralized encoder.}
374: \label{fig:binary-example}
375: \end{figure}
376:
377: From the informal argument of Fig.~\ref{fig:binary-example},
378: we see how the fact that distributed codes produce covers only of the
379: form $\mathbf{S}_{ij}=\mathbf{S}_{1,i}\times\mathbf{S}_{2,j}$ results
380: in constraints on the sets used to cover the typical set
381: $T_\epsilon^n\big(XY\big)$: there are certain groups of typical
382: sequences that cannot be broken, in the sense that either all of them
383: appear together in a cover element $\mathbf{S}_{ij}$, or none of them
384: appear. We believe this is significant for two main reasons:
385: \begin{itemize}
386: \item If we compare to a classical rate-distortion code, this constraint
387: is clearly not there. Provided the distortion constraints are met, a
388: classical code would be able to split the typical set into distortion
389: balls, without any further constraints.
390: \item More fundamentally though, we view this constraint as a form of
391: ``independence,'' reminiscent to us of the extra independence assumption
392: required by the long Markov chain used in the definition of the
393: Berger-Tung inner bound, which is not there in the definition of the
394: outer bound, as highlighted in Section~\ref{sec:difficulties} earlier.
395: \end{itemize}
396: This latter observation is perhaps the strongest piece of evidence that
397: suggested to us that the Berger-Tung inner bound might be tight.
398:
399: \subsection{Main Contributions and Organization of the Paper}
400:
401: The main contribution presented in Part I of this paper is the
402: development of an outer bound to the region of achievable rates
403: for multiterminal source coding. This outer bound has two salient
404: properties that distinguish it from existing bounds in the literature:
405: \begin{itemize}
406: \item it is based on explicitly modeling a constraint on the
407: structure of codes that, as we understand things, had not been
408: captured by any previously developed bound;
409: \item and also unlike existing bounds, it is finitely parameterized.
410: \end{itemize}
411: We believe that this outer bound coincides with the set of achievable
412: rates defined by the Berger-Tung inner bound. This issue is thoroughly
413: explored in Part II of this paper, in the context of our study of
414: algorithmic issues involved in the effective computation of this bound.
415:
416: The rest of this paper is organized as follows. In
417: Section~\ref{sec:preliminaries} we define our notation, and state
418: our main result. In Section~\ref{sec:aux-lemmas} we state and prove
419: some auxiliary lemmas that greatly simplify the proof of the main
420: theorem, a proof that is fully developed in Section~\ref{sec:main-proof}.
421: The paper concludes with an extensive discussion on our main result
422: and its implications, in Section~\ref{sec:discussion}.
423:
424:
425: \section{Preliminaries}
426: \label{sec:preliminaries}
427:
428: \subsection{Definitions and Notation}
429:
430: First, a word about notation. Random variables are denoted with
431: capital letters, e.g., $X$. Realizations of these variables are
432: denoted with lower case letters: e.g., $X=x$ means that the random
433: variable $X$ takes on the value $x$. Script letters are typically
434: used to denote alphabets, e.g., the random variable $X$ takes values
435: on an alphabet $\mathcal{X}$. The alphabets of all random variables
436: considered in this work are always assumed finite. Sets in general
437: are denoted by capital boldface symbols, e.g., $\mathbf{S}$.
438: The size of a set is denoted by $\big|\mathbf{S}\big|$. A
439: probability mass function on $\mathcal{X}$ is denoted by $p_X(x)$,
440: or simply $p(x)$ when the variable that it applies to is clear from
441: the context. A sequence of elements from an alphabet $\mathcal{X}$
442: is denoted by a boldface symbol $\mathbf{x}^n$,
443: and its $i$-th element by $\mathbf{x}_i$; this sequence is an element
444: of the extension alphabet $\mathcal{X}^n$. The expression
445: $\mathbf{x}_i^{j,n}$ denotes a subsequence of $\mathbf{x}^n$ consisting
446: of the elements $[\mathbf{x}_i,\mathbf{x}_{i+1},...,\mathbf{x}_j]$,
447: whenever $i\leq j$, otherwise it denotes an empty sequence; also,
448: sometimes the length $n$ of the sequence will be clear from the
449: context, and then we simply write $\mathbf{x}_i^j$ instead of
450: $\mathbf{x}_i^{j,n}$, whenever this does not cause confusion. The
451: expression $\mathbf{x}^{-i,n}$ denotes the sequence
452: $[\mathbf{x}_1,...,\mathbf{x}_{i-1},\mathbf{x}_{i+1},...,\mathbf{x}_n]$,
453: and again, we write this as $\mathbf{x}^{-i}$ whenever $n$ is
454: clear from the context. The same conventions are followed for
455: sequences of random variables.
456:
457: Given a boolean predicate $b(\mathbf{x})$ depending on a variable
458: $\mathbf{x}$, we write $1_{\{b(\mathbf{x})\}}$ to denote
459: the indicator function for the predicate: this is a function that
460: takes the value 1 whenever $b(\mathbf{x})$ is true, and 0 whenever
461: it is false. Given a sequence $\mathbf{x}^n\in\mathcal{X}^n$,
462: and an element $x\in\mathcal{X}$, we denote by $N(x;\mathbf{x}^n)$
463: the type of $\mathbf{x}^n$, defined as
464: $N(x;\mathbf{x}^n)=\sum_{i=1}^n 1_{\{\mathbf{x}_i=x\}}$. Then,
465: for any random variable $X$, any real number $\epsilon>0$, and
466: any integer $n>0$, we denote by $T_\epsilon^n(X)$ the strongly typical
467: set of $X$ with parameters $n$ and $\epsilon$, defined as
468: \[ T_\epsilon^n(X) \;\;=\;\; \Big\{ \mathbf{x}^n\in\mathcal{X}^n \;\Big|\;
469: \forall x\in\mathcal{X}:
470: \big|\mbox{$\frac 1 n$}N(x;\mathbf{x}^n)-p_X(x)\big|
471: < \mbox{$\frac\epsilon{|\mathcal{X}|}$} \Big\}.
472: \]
473: In some situations, we need to compare typical sets defined for
474: the same set of variables, but induced by different distributions on
475: these variables. To resolve this ambiguity, we denote by
476: $T_\epsilon^n\big(X\big)[p_X]$ the typical set corresponding to a
477: distribution $p_X$. The same convention is followed when there is
478: similar ambiguity in the evaluation of entropies (denoted
479: $H\big(X\big)[p_X]$), and mutual information expressions (denoted
480: $I\big(X\wedge Y\big)[p_{XY}]$).
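
As a small numerical illustration of the definition of
$T_\epsilon^n(X)$ (the numbers below are chosen only for this
example), let $\mathcal{X}=\{0,1\}$ with $p_X(0)=p_X(1)=\frac 1 2$,
$n=10$ and $\epsilon=0.4$. The threshold is
$\epsilon/|\mathcal{X}|=0.2$, and since
$N(1;\mathbf{x}^n)=n-N(0;\mathbf{x}^n)$, the two conditions in the
definition coincide:
\[ \big|\tfrac 1{10}N(0;\mathbf{x}^n)-\tfrac 1 2\big| < 0.2
 \;\;\iff\;\; N(0;\mathbf{x}^n)\in\{4,5,6\}. \]
Hence
\[ \big|T_\epsilon^n(X)\big|
 \;=\; \binom{10}{4}+\binom{10}{5}+\binom{10}{6}
 \;=\; 210+252+210 \;=\; 672,
 \qquad \tfrac 1{10}\log_2 672 \approx 0.94, \]
to be compared with $H(X)=1$ bit. Sequences with
$N(0;\mathbf{x}^n)=3$ or $7$ are excluded because the inequality in
the definition is strict.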
481:
482: Vector extensions $N(xy;\mathbf{x}^n\mathbf{y}^n)$, $T_\epsilon^n(XY)$,
483: etc., are defined by considering the same definitions as above, over a
484: suitable product alphabet $\mathcal{X}\times\mathcal{Y}$. Similarly,
485: given two random variables $X$ and $Y$, a joint probability mass
486: function $p_{XY}(xy)$,
487: and a sequence $\mathbf{y}^n$, we denote by $T_\epsilon^n(X|\mathbf{y}^n)$
488: the conditional typical set of $X$ given $\mathbf{y}^n$, defined as
489: \[ T_\epsilon^n\big(X\big|\mathbf{y}^n\big)
490: \;\;=\;\; \Big\{ \mathbf{x}^n\in\mathcal{X}^n \;\Big|\;
491: \forall x\in\mathcal{X},y\in\mathcal{Y}:
492: \big|\mbox{$\frac 1 n$}N(xy;\mathbf{x}^n\mathbf{y}^n)-p_{XY}(xy)\big|
493: < \mbox{$\frac\epsilon{|\mathcal{X}||\mathcal{Y}|}$} \Big\}.
494: \]
495: We will also consider situations where we need to refer to the set of
496: all typical sequences which are jointly typical with at least one of a
497: group. In that case, for a set $\mathbf{S}\subseteq\mathcal{Y}^n$, we
498: write
499: \[ T_\epsilon^n\big(X\big|\mathbf{S}\big)
500: \;\;=\;\; \bigcup_{\mathbf{y}^n\in\mathbf{S}}
501: T_\epsilon^n\big(X\big|\mathbf{y}^n\big).
502: \]
503:
504: Given any $\epsilon>0$, we often need to refer
505: to quantities which are deterministic functions of $\epsilon$, having
506: the property that as $\epsilon\to 0$, these quantities also vanish.
507: Such small quantities are denoted by $\epsilon_1$, $\epsilon_2$,
508: $\dot\epsilon$, $\ddot\epsilon$, $\epsilon'$, $\epsilon''$, etc.;
509: and the value of $\epsilon$ on which they depend is either mentioned
510: explicitly or should be clear from the context.
511:
512: Consider two random variables
513: $X$ and $Y$ with joint distribution $p(xy)$. $T_\epsilon^n\big(X\big)$
514: is the usual typical set. Sometimes we also need to consider
515: the set $S_{\epsilon,Y}^n(X)\triangleq\Big\{\mathbf{x}^n\,\Big|\,
516: T_\epsilon^n\big(Y\big|\mathbf{x}^n\big)\neq\emptyset\Big\}$. Clearly,
517: $S_{\epsilon,Y}^n(X)\subseteq T_\epsilon^n\big(X\big)$. But we also
518: know from~\cite[Ch.\ 5]{Yeung:01}, that
519: $\Big|\frac 1 n\log\big|S_{\epsilon,Y}^n(X)\big|-H(X)\Big|<\dot\epsilon$.
520: That is, although there may exist strongly typical sequences $\mathbf{x}^n$
521: for which there are no sequences $\mathbf{y}^n$ jointly typical with them,
522: these $\mathbf{x}^n$'s form a set of vanishing measure.
523:
524: Some standard operations on sets are intersection
525: ($\mathbf{A}\cap\mathbf{B}$), union ($\mathbf{A}\cup\mathbf{B}$),
526: complementation ($\mathbf{A}^c$) and difference
527: ($\mathbf{A}\backslash\mathbf{B}$). The set of all subsets of
528: $\mathbf{S}$ is denoted by $2^{\mathbf{S}}$. The convex closure of $\mathbf{S}$
529: is denoted by
530: $\overline{\mathbf{S}}=\bigcap\big\{\mathbf{S}'\;\big|\;\mathbf{S}\subseteq
531: \mathbf{S}'\,\wedge\,\mathbf{S}'\mbox{ is closed and convex}\big\}$.
532: Given a set $\mathbf{S}$, a cover of size $N$ of $\mathbf{S}$ is a
533: collection of sets $\mathcal{S}=\big\{\mathbf{S}_i:i=1...N\big\}$,
534: such that $\mathbf{S}\subseteq\bigcup_{i=1}^N\mathbf{S}_i$. If a
535: cover further satisfies that $\mathbf{S}_i\cap \mathbf{S}_j=\emptyset$
536: ($1\leq i\neq j\leq N$), and that $\mathbf{S}=\bigcup_{i=1}^N
537: \mathbf{S}_i$, then we say that $\mathcal{S}$ is a {\em partition}
538: of $\mathbf{S}$.
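
To fix ideas with a toy case (chosen only for illustration), let
$\mathbf{S}=\{1,2,3,4\}$. The collection
$\big\{\{1,2,3\},\{3,4\}\big\}$ is a cover of size 2 of $\mathbf{S}$,
but not a partition, since its two elements share the point 3. The
collection $\big\{\{1,2,3\},\{3,4,5\}\big\}$ is also a cover, since
the definition only requires that $\mathbf{S}$ be contained in the
union. The collection $\big\{\{1,2\},\{3,4\}\big\}$ is a partition.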
539:
540: Consider two sets, $\mathbf{A}$ and $\mathbf{B}$, for which
541: $P\big(\mathbf{B}\big|\mathbf{A}\big)=1$: clearly,
542: $P\big(\mathbf{A}\cap\mathbf{B}\big)=P\big(\mathbf{A}\big)$,
543: and hence $\mathbf{A}\subseteq\mathbf{B}$, except perhaps for
544: a set of measure zero. If instead we have a slightly weaker
545: condition, namely that $P\big(\mathbf{B}\big|\mathbf{A}\big)>1-\epsilon$,
546: then we say that $\mathbf{A}$ is {\em weakly included} in $\mathbf{B}$,
547: and we denote this by $\mathbf{A}\subseteq_\epsilon\mathbf{B}$.
548:
549: \subsection{Distributed Rate-Distortion Codes}
550:
551: Consider two sources $X$ and $Y$, out of which random pairs of
552: sequences $\big(X^n,Y^n\big)$ are drawn i.i.d.~$\sim p(xy)$ from two
553: finite alphabets, denoted $\mathcal{X}$ and
554: $\mathcal{Y}$, and reproduced with elements of two other alphabets
555: $\hat{\mathcal{X}}$ and $\hat{\mathcal{Y}}$. The two sources
556: $X$ and $Y$ are processed by two separate encoders. The
557: {\em encoders} are two functions:
558: \[ f_1:\; \mathcal{X}^n \;\;\to\;\; \big\{1,2,\dots,2^{nR_1}\big\}
559: \mbox{\hspace{1cm}and\hspace{1cm}}
560: f_2:\; \mathcal{Y}^n \;\;\to\;\; \big\{1,2,\dots,2^{nR_2}\big\}.
561: \]
562: These encoding functions map a block of $n$ source symbols to discrete
563: indices. The {\em decoder} is a function
564: \[ g:\;\big\{1,2,\dots,2^{nR_1}\big\}\times\big\{1,2,\dots,2^{nR_2}\big\}
565: \;\;\to\;\; \hat{\mathcal{X}}^n \times \hat{\mathcal{Y}}^n, \]
566: which maps a pair of indices into two blocks of reconstructed
567: source sequences.
568:
569: Two distortion
570: measures $d_1:\mathcal{X}\times\hat{\mathcal{X}}\to[0,\infty)$ and
571: $d_2:\mathcal{Y}\times\hat{\mathcal{Y}}\to[0,\infty)$ are used to
572: define reconstruction quality. Since $\infty$ is not in their
573: range and the alphabets are finite, these distortion measures
574: are necessarily bounded, and we denote their largest values by
575: $\max\limits_{x\in\mathcal{X},\hat x\in\hat{\mathcal{X}}}
576: d_1(x,\hat x)\triangleq d_{1,\mbox{\tiny MAX}}$,
577: $\max\limits_{y\in\mathcal{Y},\hat y\in\hat{\mathcal{Y}}}
578: d_2(y,\hat y) \triangleq d_{2,\mbox{\tiny MAX}}$, and
579: $\max\big(d_{1,\mbox{\tiny MAX}},d_{2,\mbox{\tiny MAX}}\big)
580: \triangleq d_{\mbox{\tiny MAX}}<\infty$.
581: $d_1^n\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)
582: \triangleq\frac 1 n\sum_{i=1}^n d_1\big(x_i,\hat x_i\big)$
583: and $d_2^n\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)
584: \triangleq\frac 1 n\sum_{i=1}^n d_2\big(y_i,\hat y_i\big)$
585: denote the corresponding extensions to blocks. Oftentimes, the
586: symbols $d_1$ and $d_2$ are used for both the single-letter
587: and the block extensions; which is the intended meaning should
588: be clear from the context. For any distortion measure
589: $d:\mathcal{X}^n\times\hat{\mathcal{X}}^n\to[0,\infty)$, an element
590: $\hat{\mathbf{x}}^n\in\hat{\mathcal{X}}^n$ and a number $D\geq 0$,
591: a ``ball'' of radius $D$ centered at $\hat{\mathbf{x}}^n$ is the
592: set $B\big(\hat{\mathbf{x}}^n,D\big)=\big\{\mathbf{x}^n\in\mathcal{X}^n
\,\big|\,d\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)<D\big\}$
594: (and similarly for a ball $B\big(\hat{\mathbf{y}}^n,D\big)$).
595: For any $D$, $D^+$ is shorthand for $D+\dot\epsilon$, for an
596: $\epsilon$ that is always clear from the context.
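\par\medskip\noindent
{\it Example.} As an illustration only (the binary alphabets and the
Hamming measure used here are our assumptions, not part of the general
setup), take $\mathcal{X}=\hat{\mathcal{X}}=\{0,1\}$, with
$d_1(x,\hat x)=0$ if $x=\hat x$ and $d_1(x,\hat x)=1$ otherwise, so that
$d_{1,\mbox{\tiny MAX}}=1$. The block extension is then the fraction of
positions in which the two sequences differ,
\[ d_1^n\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)
\;\;=\;\; \mbox{$\frac 1 n$}\,
\big|\big\{i\,:\,x_i\neq\hat x_i\big\}\big|, \]
and, because the inequality defining a ball is strict, for $n=4$ and
$D=\frac 1 2$ the ball $B\big(\hat{\mathbf{x}}^4,\frac 1 2\big)$ contains
exactly the sequences that differ from $\hat{\mathbf{x}}^4$ in at most one
position:
\[ \Big|B\big(\hat{\mathbf{x}}^4,\mbox{$\frac 1 2$}\big)\Big|
\;\;=\;\; \binom{4}{0}+\binom{4}{1} \;\;=\;\; 5. \]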
597:
598: Fix now encoders and decoder $(f_1,f_2,g)$ operating on blocks of length
599: $n$, and a real number $\epsilon>0$. If we have that
600: \begin{equation}
601: P\Big( \Big\{ \big(\mathbf{x}^n\mathbf{y}^n\big) \;\Big|\;
602: \big(\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)
603: = g\big(f_1(\mathbf{x}^n),f_2(\mathbf{y}^n)\big) \,\wedge\,
604: d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)
605: < D_1^+ \,\wedge\,
606: d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)
607: < D_2^+ \Big\}
608: \Big) \;\;\geq\;\; 1-\dot{\epsilon},
609: \label{eq:distortion-constraint}
610: \end{equation}
611: then we say that $(f_1,f_2,g)$ satisfies the $(\epsilon,D_1,D_2)$-distortion
612: constraint.\footnote{This form of a distortion constraint is referred
613: to as an {\it $\epsilon$-fidelity criterion} in~\cite[pg.\ 123]{CsiszarK:81}.
614: An alternative form to this ``local'' condition is given by requiring a
615: ``global'' average constraint of the form
616: $\expect{d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)}<D_1^+$ and
617: $\expect{d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)}<D_2^+$. For
618: the purpose of our developments, the local form lends itself more
619: readily to analysis, and hence is the one we adopt.}
620:
621: \subsection{Achievable Rates}
622:
623: A $\big(2^{nR_1},2^{nR_2},n,\epsilon,D_1,D_2\big)$ distributed
624: rate-distortion code is defined by a block length $n$, a
625: parameter $\epsilon>0$, two encoding functions $f_1$ and $f_2$
626: with ranges of size $2^{nR_1}$ and $2^{nR_2}$, and a decoding
627: function $g$, such that $(f_1,f_2,g)$ satisfies the
$\big(\epsilon,D_1,D_2\big)$-distortion constraint.
629:
630: We say that the rate-distortion tuple $(R_1,R_2,D_1,D_2)$ is
631: $\epsilon$-{\em achievable} if a
632: $\big(2^{nR_1},2^{nR_2},n,\epsilon,D_1,D_2\big)$ distributed
633: code exists; for fixed parameters $\big(\epsilon,D_1,D_2\big)$,
634: we denote the set of all $\epsilon$-achievable pairs $(R_1,R_2)$
635: by $\mathcal{R}_\epsilon(D_1,D_2)$. Then, the {\em rate region}
$\mathcal{R}^*(D_1,D_2)$ of the two sources is defined by
637: \[ \mathcal{R}^*(D_1,D_2)
638: \;\;\triangleq\;\; \bigcap_{\epsilon>0}\,\mathcal{R}_\epsilon(D_1,D_2).
639: \]
640:
We now describe a second set of rates. Define
642: $\mathbb{P}_{\mbox{\tiny LB}}$ to be the set of all probability
643: distributions $p(xy\hat x\hat y)$ over
644: $\mathcal{X}\times\mathcal{Y}\times\hat{\mathcal{X}}\times\hat{\mathcal{Y}}$,
645: such that:
646: \begin{itemize}
647: \item $p(xy\hat x\hat y)=p(\hat x\hat y)p(x|\hat x\hat y)p(y|\hat x\hat y)$
648: (that is, $X-\hat X\hat Y-Y$ forms a Markov chain);
649: \item $p_{XY}=\sum_{\hat x\hat y}
650: p(\hat x\hat y)p(x|\hat x\hat y)p(y|\hat x\hat y)$ ($p_{XY}$ is the source);
651: \item and $\expect{d_1\big(X,\hat X\big)}\leq D_1$ and
652: $\expect{d_2\big(Y,\hat Y\big)}\leq D_2$.
653: \end{itemize}
654: Then, for each $p\in\mathbb{P}_{\mbox{\tiny LB}}$, define
655: \[ \mathcal{R}\big(D_1,D_2,p\big)\;\;\triangleq\;\;
656: \left\{ (R_1,R_2)\;\left|\;\begin{array}{rcl}
657: R_1 & \geq & I\big(X\wedge\hat X\hat Y\big|Y\big)[p] \\
658: R_2 & \geq & I\big(Y\wedge\hat X\hat Y\big|X\big)[p] \\
659: R_1+R_2 & \geq & I\big(XY\wedge\hat X\hat Y\big)[p]
660: \end{array}\right.\right\},
661: \]
662: and define also $\mathcal{R}^o(D_1,D_2)\triangleq
663: \bigcup_{p\in\mathbb{P}_{\mbox{\tiny LB}}}\mathcal{R}\big(D_1,D_2,p\big)$.
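\par\medskip\noindent
{\it Remark.} The following identity is a sketch we add for orientation;
it uses only the Markov condition in the definition of
$\mathbb{P}_{\mbox{\tiny LB}}$, and the shorthand $U\triangleq\hat X\hat Y$
is ours. For $p\in\mathbb{P}_{\mbox{\tiny LB}}$, the chain $X-U-Y$ gives
$H(X|UY)=H(X|U)$, $H(Y|UX)=H(Y|U)$ and $H(XY|U)=H(X|U)+H(Y|U)$, and
therefore, with all quantities evaluated under $p$,
\begin{eqnarray*}
I\big(XY\wedge U\big)
& = & H(XY)-H(X|U)-H(Y|U) \\
& = & \big[H(X|Y)-H(X|U)\big]+\big[H(Y|X)-H(Y|U)\big]
+I\big(X\wedge Y\big) \\
& = & I\big(X\wedge U\big|Y\big)+I\big(Y\wedge U\big|X\big)
+I\big(X\wedge Y\big),
\end{eqnarray*}
where the second step uses $H(XY)=H(X|Y)+H(Y|X)+I(X\wedge Y)$. Hence, in
each $\mathcal{R}\big(D_1,D_2,p\big)$, the sum-rate bound exceeds the sum
of the two individual bounds by exactly $I\big(X\wedge Y\big)$, so it is
never implied by them when $X$ and $Y$ are dependent.
\par\medskip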
664: Now we are ready to state our outer bound.
665:
666: \subsection{Statement of an Outer Bound}
667:
668: \medskip
669: \begin{center}\textcolor{gray}{\fbox{\begin{minipage}{16cm}
670: \vspace{-4mm}\textcolor{black}{\begin{theorem}
671: \label{thm:main}
672: \[ \mathcal{R}^*\big(D_1,D_2\big)\;\;\subseteq\;\;
673: \overline{\mathcal{R}^o(D_1,D_2)}.
674: \]
675: \rend
676: \end{theorem}}\end{minipage}}}\end{center}\medskip
677:
678: The proof of this theorem is given in Section~\ref{sec:main-proof}.
679: Before that, and next in Section~\ref{sec:aux-lemmas}, we develop a
number of observations and auxiliary results to be used in the main
681: proof.
682:
683:
684: \section{Some Useful Observations and Auxiliary Results}
685: \label{sec:aux-lemmas}
686:
687: \subsection{Distributed Rate-Distortion Codes as Constrained Source Covers}
688:
689: \subsubsection{Distributed Source Covers}
690:
691: An equivalent representation for a generic
692: $(2^{nR_1},2^{nR_2},n,\epsilon,D_1,D_2)$ code is given as follows:
693: \begin{itemize}
694: \item Two covers:
695: $\mathcal{S}_1 = \big\{ \mathbf{S}_{1,i} : i=1...2^{nR_1} \big\}$
696: of $\mathcal{X}^n$,
697: and $\mathcal{S}_2 = \big\{ \mathbf{S}_{2,j} : j=1...2^{nR_2} \big\}$
698: of $\mathcal{Y}^n$.
699: Any code with encoders $f_1$ and $f_2$ can be represented in terms
700: of two such covers, by considering $f_1^{-1}(i)= \mathbf{S}_{1,i}$ and
701: $f_2^{-1}(j)=\mathbf{S}_{2,j}$.\footnote{Note that, strictly speaking,
702: this definition is correct only when $\mathcal{S}$ is a partition.
703: Occasionally we might abuse the notation and still refer to the code
704: specified by a cover, with the understanding that in such cases ties
705: (of the form of a source sequence being part of two different cover
706: elements) are broken arbitrarily. This should not cause any confusion.} \\
707: (Note: these two covers define a cover $\mathcal{S}=\big(\mathcal{S}_1,
708: \mathcal{S}_2\big)$ of $\mathcal{X}^n\times \mathcal{Y}^n$, with elements
709: $\mathbf{S}_{ij} \;=\; \mathbf{S}_{1,i}\times \mathbf{S}_{2,j}$,
710: for $(i,j)\in\{1...2^{nR_1}\}\times\{1...2^{nR_2}\}$.)
711: \item A pair of reconstruction sequences $\big(\hat{\mathbf{x}}^n(ij),
712: \hat{\mathbf{y}}^n(ij)\big)=g(i,j)$ associated to each cover element
713: $\mathbf{S}_{ij}$ of the product source, for all
714: $(i,j)\in\{1...2^{nR_1}\}\times\{1...2^{nR_2}\}$.
715: \end{itemize}
716:
717: In general, whenever we refer to a distributed rate-distortion code,
718: we use interchangeably the earlier representation in terms of two
719: encoders and one decoder, and this representation in terms of covers.
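\par\medskip\noindent
{\it Example.} To make the correspondence concrete, consider a toy
instance; the alphabets, block length and encoders below are hypothetical
choices made only for illustration. Take
$\mathcal{X}=\mathcal{Y}=\{0,1\}$, $n=2$ and $R_1=R_2=\frac 1 2$, so that
each encoder has $2^{nR_1}=2^{nR_2}=2$ indices, and let
$f_1(x_1x_2)=1+x_1$ and $f_2(y_1y_2)=1+y_2$. The induced covers are
\[ \mathbf{S}_{1,1}=\{00,01\},\;\;\;
\mathbf{S}_{1,2}=\{10,11\},\;\;\;
\mathbf{S}_{2,1}=\{00,10\},\;\;\;
\mathbf{S}_{2,2}=\{01,11\}, \]
and, for instance,
\[ \mathbf{S}_{12}\;=\;\mathbf{S}_{1,1}\times\mathbf{S}_{2,2}
\;=\;\big\{(00,01),\,(00,11),\,(01,01),\,(01,11)\big\}. \]
The four product elements $\mathbf{S}_{ij}$ contain four pairs each and
together partition the $16$ pairs of $\mathcal{X}^2\times\mathcal{Y}^2$;
the decoder then assigns one reconstruction pair $g(i,j)$ to each of them.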
720:
721: \subsubsection{Distributed Typical Sets}
722:
723: As highlighted in the Introduction, it turns out that covers
724: $\mathbf{S}_{ij}$ of the product source $\mathcal{X}^n\times\mathcal{Y}^n$
725: are constrained beyond the requirements imposed by the fidelity
726: criteria. That ``extra'' structure is described by
727: Proposition~\ref{prp:distributed-typicality}.
728:
729: \medskip
730: \begin{center}\textcolor{gray}{\fbox{\begin{minipage}{16cm}
731: \vspace{-4mm}\textcolor{black}{\begin{proposition}
732: \label{prp:distributed-typicality}
733: For any cover $\mathcal S$ of $\mathcal{X}^n\times\mathcal{Y}^n$
734: defined by some $(2^{nR_1},2^{nR_2},n,\epsilon,D_1,D_2)$ distributed
735: rate-distortion code, and for any
736: $(i,j)\in\{1...2^{nR_1}\}\times\{1...2^{nR_2}\}$,
737: $\mathbf{x}^n\in\mathbf{S}_{1,i}$ and $\mathbf{y}^n\in\mathbf{S}_{2,j}$,
it must be the case that
739: either $(\mathbf{x}^n\mathbf{y}^n)\in\mathbf{S}_{ij}\cap
740: T_\epsilon^n\big(XY\big)$ or $(\mathbf{x}^n\mathbf{y}^n)\not\in
741: T_\epsilon^n\big(XY\big)$. \rend
742: \end{proposition}}\end{minipage}}}\end{center}\medskip
743:
744: {\it Proof.} This is rather straightforward. Take any
745: $\mathbf{x}^n\in\mathbf{S}_{1,i}$ and $\mathbf{y}^n\in\mathbf{S}_{2,j}$.
746: Then:
747: \begin{itemize}
748: \item by construction,
749: $\big(\mathbf{x}^n\mathbf{y}^n\big)\in\mathbf{S}_{ij}$;
750: \item either $\big(\mathbf{x}^n\mathbf{y}^n\big)\in
751: T_\epsilon^n\big(XY\big)$ or
752: $\big(\mathbf{x}^n\mathbf{y}^n\big)\not\in
753: T_\epsilon^n\big(XY\big)$ -- a tautology;
754: \item if $\big(\mathbf{x}^n\mathbf{y}^n\big)\in
755: T_\epsilon^n\big(XY\big)$, then $\big(\mathbf{x}^n\mathbf{y}^n\big)\in
756: \mathbf{S}_{ij}\cap T_\epsilon^n\big(XY\big)$, and therefore
757: the proposition is proved;
758: \item and if instead, $\big(\mathbf{x}^n\mathbf{y}^n\big)\not\in
759: T_\epsilon^n\big(XY\big)$, then the proposition is proved too.
760: \tend
761: \end{itemize}
762: \medskip
763:
764: Proposition~\ref{prp:distributed-typicality} formally states the
765: property of covers arising from distributed codes discussed informally
766: in the Introduction (cf.~Sec.~\ref{sec:intro-distributed-covers}): all
767: combinations of an $\mathbf{x}^n$ sequence in $\mathbf{S}_{1,i}$ and
768: a $\mathbf{y}^n$ sequence in $\mathbf{S}_{2,j}$, if they are jointly
769: typical, must appear in $\mathbf{S}_{ij}\cap T_\epsilon^n\big(XY\big)$
770: -- the decoder does not have enough information to discriminate among
771: such pairs.
772:
773: We now introduce a new definition.
774: Consider any subset $\mathbf{S}\subseteq T_\epsilon^n\big(XY\big)$
775: for which, for any $(\mathbf{x}^n,\mathbf{y}_1^n)\in\mathbf{S}$ and
776: $(\mathbf{x}_1^n,\mathbf{y}^n)\in\mathbf{S}$, we have that either
777: $(\mathbf{x}^n\mathbf{y}^n)\in\mathbf{S}$ or
778: $(\mathbf{x}^n\mathbf{y}^n)\not\in T_\epsilon^n\big(XY\big)$
779: -- that is, the property of Prop.~\ref{prp:distributed-typicality}
780: holds for $\mathbf{S}$. In this case, we say that $\mathbf{S}$ is
a {\em distributed} typical set.
782:
Clearly there are ``interesting'' distributed typical sets, so the
concept is not vacuous:
785: \begin{itemize}
786: \item all sets of the form $\mathbf{S} = \{ (\mathbf{x}^n\mathbf{y}^n) \}$,
787: with $(\mathbf{x}^n\mathbf{y}^n)\in T_\epsilon^n\big(XY\big)$,
788: are distributed typical sets;
789: \item for any $\mathbf{S}_1\subseteq\mathcal{X}^n$ and any
790: $\mathbf{S}_2\subseteq\mathcal{Y}^n$,
791: $\mathbf{S}\triangleq\big[\mathbf{S}_1\!\times\!\mathbf{S}_2\big]\cap
792: T_\epsilon^n\big(XY\big)$ is a distributed typical set.
793: \end{itemize}
794: The last example provides a natural way of systematically constructing
795: distributed typical sets.
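\par\medskip\noindent
A short verification of the product construction, spelled out here as a
sketch: let $\mathbf{S}=\big[\mathbf{S}_1\!\times\!\mathbf{S}_2\big]\cap
T_\epsilon^n\big(XY\big)$, so that $\mathbf{S}\subseteq
T_\epsilon^n\big(XY\big)$ by construction, and take any
$(\mathbf{x}^n,\mathbf{y}_1^n)\in\mathbf{S}$ and
$(\mathbf{x}_1^n,\mathbf{y}^n)\in\mathbf{S}$. Then
\[ \mathbf{x}^n\in\mathbf{S}_1 \;\textrm{ and }\;
\mathbf{y}^n\in\mathbf{S}_2
\;\;\;\Rightarrow\;\;\;
(\mathbf{x}^n\mathbf{y}^n)\in\mathbf{S}_1\!\times\!\mathbf{S}_2, \]
so if $(\mathbf{x}^n\mathbf{y}^n)\in T_\epsilon^n\big(XY\big)$ then
$(\mathbf{x}^n\mathbf{y}^n)\in\mathbf{S}$, and otherwise
$(\mathbf{x}^n\mathbf{y}^n)\not\in T_\epsilon^n\big(XY\big)$; in both
cases the defining property holds. For a singleton
$\mathbf{S}=\{(\mathbf{x}^n\mathbf{y}^n)\}$ the two pairs in the
definition coincide, and the property holds trivially.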
796:
797: \subsubsection{Source Covers Made of Distributed Typical Sets}
798:
799: We show next that in multiterminal source coding, the source must
800: be covered with distributed typical sets in which each of the two
801: components of the set gets specified by a different encoder.
802:
803: Consider a length $n$ $\big(f_1,f_2,g\big)$ code, satisfying the
804: $(\epsilon,D_1,D_2)$-distortion constraint of
805: eqn.~\eqref{eq:distortion-constraint}:
806: \begin{eqnarray*}
807: \lefteqn{P\Big( \Big\{ \big(\mathbf{x}^n\mathbf{y}^n\big) \;\Big|\;
808: \big(\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)
809: = g\big(f_1(\mathbf{x}^n),f_2(\mathbf{y}^n)\big) \,\wedge\,
810: d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)
811: < D_1^+ \,\wedge\,
812: d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)
813: < D_2^+ \Big\}
814: \Big)} \\
815: & \stackrel{(a)}{=} &
816: P\Big( \Big\{ \big(\mathbf{x}^n\mathbf{y}^n\big) \;\Big|\;
817: \big(\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)
818: = g\big(f_1(\mathbf{x}^n),f_2(\mathbf{y}^n)\big) \,\wedge\,
819: d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)
820: < D_1^+ \,\wedge\,
821: d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)
822: < D_2^+ \Big\}
823: \cap\bigcup_{(i,j)}\mathbf{S}_{ij}
824: \Big) \\
825: & = & P\Big(
826: \bigcup_{(i,j)}
827: \Big\{ \big(\mathbf{x}^n\mathbf{y}^n\big) \;\Big|\;
828: \big(\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)
829: = g\big(f_1(\mathbf{x}^n),f_2(\mathbf{y}^n)\big) \,\wedge\,
830: d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)
831: < D_1^+ \,\wedge\,
832: d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)
833: < D_2^+ \Big\}
834: \cap\mathbf{S}_{ij}
835: \Big) \\
836: & \stackrel{(b)}{=} & P\Big(
837: \bigcup_{(i,j)}
838: \Big\{ \big(\mathbf{x}^n\mathbf{y}^n\big) \;\Big|\;
839: d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n(ij)\big)<D_1^+
840: \,\wedge\,\mathbf{x}^n\in\mathbf{S}_{1,i}
841: \,\wedge\, d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n(ij)\big)<D_2^+
\,\wedge\,\mathbf{y}^n\in\mathbf{S}_{2,j} \Big\}
843: \Big) \\
844: & = & P\Big( \bigcup_{(i,j)}
845: \big[\mathbf{S}_{1,i}\!\times\!\mathbf{S}_{2,j}\big]
846: \cap
847: \big[B\big(\hat{\mathbf{x}}^n(ij),D_1^+\big)
848: \!\times\!B\big(\hat{\mathbf{y}}^n(ij),D_2^+\big)\big]
849: \,\Big) \\
850: & \stackrel{(c)}{\geq} & 1-\dot{\epsilon},
851: \end{eqnarray*}
852: where (a) follows from
853: $\big\{ \big(\mathbf{x}^n\mathbf{y}^n\big)\,\Big|\,
854: \big(\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)
855: = g\big(f_1(\mathbf{x}^n),f_2(\mathbf{y}^n)\big) \,\wedge\,
856: d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big) < D_1^+ \,\wedge\,
857: d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big) < D_2^+ \big\}
858: \;\subseteq\;\mathcal{X}^n\times\mathcal{Y}^n
859: \;\subseteq\;\bigcup_{(i,j)} \mathbf{S}_{ij}$;
860: (b) follows from $\mathbf{S}_{ij}=\mathbf{S}_{1,i}\times\mathbf{S}_{2,j}$;
861: and (c) follows from the fact
862: that the code under consideration satisfies the distortion constraint
863: of eqn.~\eqref{eq:distortion-constraint}. We also know, from basic
864: properties of typical sets, that
865: \[ P\Big( T_\epsilon^n\big(XY\big) \Big) \;\;\geq\;\; 1-\epsilon,
866: \]
867: and so, if we define $\tilde{\mathbf{S}}_{ij}\triangleq
868: \big[\mathbf{S}_{1,i}\times\mathbf{S}_{2,j}\big]\cap
869: T_\epsilon^n\big(XY\big)$, we see that
870: \begin{eqnarray}
871: \lefteqn{P\Big( \bigcup_{(i,j)}
872: \big[\mathbf{S}_{1,i}\!\times\!\mathbf{S}_{2,j}\big]
873: \cap
874: \big[B\big(\hat{\mathbf{x}}^n(ij),D_1^+\big)
875: \!\times\!B\big(\hat{\mathbf{y}}^n(ij),D_2^+\big)\big]
876: \cap
877: T_\epsilon^n\big(XY\big) \,\Big)} \nonumber
878: \hspace{4cm} \\
879: & = & P\left( \bigcup_{(i,j)}
880: \tilde{\mathbf{S}}_{ij} \cap
881: \big[B\big(\hat{\mathbf{x}}^n(ij),D_1^+\big)
882: \!\times\!B\big(\hat{\mathbf{y}}^n(ij),D_2^+\big)\big]
883: \right) \nonumber \\
884: & \geq & 1-\ddot\epsilon;
885: \label{eq:distortion-constraint-2}
886: \end{eqnarray}
887: that is, since $\tilde{\mathbf{S}}_{ij}$ is a distributed typical set,
888: the source must be covered with the fraction of such sets contained in
889: pairs of balls centered at the reconstruction sequences; furthermore,
890: we note that each component of the distributed typical set must be
891: specified completely by each encoder.
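\par\medskip\noindent
One way to obtain the last inequality
in~\eqref{eq:distortion-constraint-2} is a union bound; the explicit value
of $\ddot\epsilon$ given here is our assumption, valid if no other
definition of $\ddot\epsilon$ is intended. Writing
$\mathbf{A}\triangleq\bigcup_{(i,j)}
\big[\mathbf{S}_{1,i}\!\times\!\mathbf{S}_{2,j}\big]\cap
\big[B\big(\hat{\mathbf{x}}^n(ij),D_1^+\big)
\!\times\!B\big(\hat{\mathbf{y}}^n(ij),D_2^+\big)\big]$, we have
$P(\mathbf{A})\geq 1-\dot\epsilon$ from the distortion constraint
of eqn.~\eqref{eq:distortion-constraint}, and hence
\[ P\Big(\mathbf{A}\cap T_\epsilon^n\big(XY\big)\Big)
\;\;\geq\;\; 1-P\big(\mathbf{A}^c\big)
-P\Big(\big[T_\epsilon^n\big(XY\big)\big]^c\Big)
\;\;\geq\;\; 1-\dot\epsilon-\epsilon, \]
so that~\eqref{eq:distortion-constraint-2} holds with
$\ddot\epsilon=\dot\epsilon+\epsilon$.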
892:
893: % These constraints on the structure of source covers are significant,
894: % and they were not captured by any previous outer bounds. The main task
895: % ahead of us then is to make use of this newly discovered structure to
896: % prove a better outer bound.
897:
898: \subsection{The ``Reverse'' Markov Lemma}
899: \label{sec:reverse-markov-lemma}
900:
901: \subsubsection{The Standard Form}
902:
903: Lemma~\ref{lemma:markov} is the Markov lemma as stated
904: in~\cite[pg.\ 202]{Berger:78}, in our own notation.
905:
906: \medskip\begin{lemma}[Markov]
907: \label{lemma:markov}
908: Consider a Markov chain of the form $X-Z-Y$. Then, for all $\epsilon>0$,
909: \[ \lim_{n\to\infty}
910: P\Big( \big(X^n,\mathbf{y}^n\big)\in T_\epsilon^n\big(XY\big)
911: \;\Big|\;
912: \big(Z^n,\mathbf{y}^n\big)\in T_\epsilon^n\big(ZY\big)
913: \Big) \;\;=\;\; 1,
914: \]
915: for any sequence $\mathbf{y}^n\in\mathcal{Y}^n$.
916: \rend
917: \end{lemma}\medskip
918:
919: The lemma says that for {\em every} $\mathbf{y}^n\in\mathcal{Y}^n$,
920: {\em if} the random vector
921: $\big(Z^n,\mathbf{y}^n\big)\in T_\epsilon^n\big(ZY\big)$, {\em then}
922: the random vector $\big(X^n,\mathbf{y}^n\big)\in T_\epsilon^n\big(XY\big)$,
with high probability. Such an implication does not hold for arbitrary
distributions: if we have two pairs
924: of sequences $\big(\mathbf{x}^n\mathbf{z}^n\big)\in T_\epsilon^n\big(XZ\big)$
925: and $\big(\mathbf{z}^n\mathbf{y}^n\big)\in T_\epsilon^n\big(ZY\big)$, it
926: is not always the case that
927: $\big(\mathbf{x}^n\mathbf{z}^n\mathbf{y}^n\big)\in T_\epsilon^n\big(XZY\big)$,
928: and therefore that
929: $\big(\mathbf{x}^n\mathbf{y}^n\big)\in T_\epsilon^n\big(XY\big)$; that
930: is, joint typicality is {\em not} a transitive relation. However,
931: if $X-Z-Y$ forms a Markov chain, and then only in a high probability
932: sense, said transitivity property holds.
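\par\medskip\noindent
{\it Example.} A small instance of the failure of transitivity; the
source chosen here is our own and serves only as an illustration. Let $X$
be uniform on $\{0,1\}$, let $Y=X$, and let $Z$ be uniform on $\{0,1\}$
and independent of $X$, so that $X-Z-Y$ is {\em not} a Markov chain. For
$n=4$, take
\[ \mathbf{x}^4=0101, \;\;\;\; \mathbf{z}^4=0011, \;\;\;\;
\mathbf{y}^4=1010. \]
The pairs $(x_i,z_i)$ are $00,10,01,11$ and the pairs $(z_i,y_i)$ are
$01,00,11,10$: each value occurs exactly once, matching
$p_{XZ}=p_{ZY}=\frac 1 4$ with zero deviation, so both pairs of sequences
are jointly typical for every $\epsilon>0$. The pairs $(x_i,y_i)$,
however, take only the values $01$ and $10$, which have probability zero
under $p_{XY}$, while $00$ and $11$ (each of probability $\frac 1 2$)
never occur; hence $\big(\mathbf{x}^4\mathbf{y}^4\big)\not\in
T_\epsilon^4\big(XY\big)$ for any $\epsilon\leq\frac 1 2$.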
933:
934: \subsubsection{A Converse Statement}
935:
936: We are interested in a converse form of the Markov lemma. Suppose
937: we are given an arbitrary distribution $p(xyz)$, whose typical
938: sets satisfy the constraints imposed by the Markov lemma: can we say
939: that $p$ itself must be a Markov chain? It turns out the answer is
940: {\em almost yes} -- if some arbitrary distribution $p$ induces typical
941: sets like those of a Markov chain, then there must exist a Markov
942: chain $p'$ within $L_1$ distance $2\epsilon$ of $p$. This statement
943: is made precise in the following lemma.
944:
945: \medskip\begin{center}\textcolor{gray}{\fbox{\begin{minipage}{16cm}
946: \vspace{-4mm}\textcolor{black}{\begin{lemma}[Reverse Markov]
947: \label{lemma:reverse-markov}
948: Fix $n$, $\epsilon>0$. Consider any distribution
949: $p(xyz)$ for which, for some $\mathbf{z}^n$,
950: \[
951: T_\epsilon^n\big(X\big|\mathbf{z}^n\big)[p]
952: \times T_\epsilon^n\big(Y\big|\mathbf{z}^n\big)[p]
953: \;\;=\;\; T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p].
954: \]
955: Define a Markov chain $p'(xyz)=p(z)p(x|z)p(y|z)$, with the components
956: $p(z)$, $p(x|z)$ and $p(y|z)$ taken from the given $p(xyz)$. Then,
957: $\big|\big|p-p'\big|\big|_1\,<\,2\epsilon$.
958: \rend
959: \end{lemma}}\end{minipage}}}\end{center}\medskip
960:
961: {\it Proof.}
962: Consider any $\mathbf{z}^n$ for which
963: $T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p]\neq\emptyset$.
964: Since $p'$ is a Markov chain, from the direct form of the Markov
965: lemma we know that
966: \[
967: T_\epsilon^n\big(X\big|\mathbf{z}^n\big)[p']
968: \times T_\epsilon^n\big(Y\big|\mathbf{z}^n\big)[p']
969: \;\;\subseteq_{\epsilon'}\;\;
970: T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p'];
971: \]
972: and clearly,
973: $\emptyset\neq
974: T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p]
975: =
976: T_\epsilon^n\big(X\big|\mathbf{z}^n\big)[p]
977: \times T_\epsilon^n\big(Y\big|\mathbf{z}^n\big)[p]
978: =
979: T_\epsilon^n\big(X\big|\mathbf{z}^n\big)[p']
980: \times T_\epsilon^n\big(Y\big|\mathbf{z}^n\big)[p']$,
981: since we choose $p'$ to coincide with $p$ on the corresponding marginals,
982: and from our choice of $\mathbf{z}^n$. So, this last inclusion can be
983: written as
984: \[
985: T_\epsilon^n\big(X\big|\mathbf{z}^n\big)[p]
986: \times T_\epsilon^n\big(Y\big|\mathbf{z}^n\big)[p]
987: \;\;\subseteq_{\epsilon'}\;\;
988: T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p'],
989: \]
990: and therefore we see that
991: \[
992: \emptyset \;\;\neq\;\;
993: T_\epsilon^n\big(X\big|\mathbf{z}^n\big)[p]
994: \times T_\epsilon^n\big(Y\big|\mathbf{z}^n\big)[p]
995: \;\;\subseteq_{\epsilon'}\;\;
996: T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p]
997: \cap T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p'];
998: \]
999: thus, there must exist at least one triplet of sequences
1000: $\big(\mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)$ that
1001: is jointly typical under both $p$ and $p'$. So for these particular
1002: sequences, it follows from the definition of strong typicality that
1003: both
1004: \[ \forall xyz: \big|\mbox{$\frac 1 n$}N\big(xyz;
1005: \mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)-
1006: p(xyz)\big|\,<\,\mbox{$\frac\epsilon{|\mathcal{X}|
1007: |\mathcal{Y}||\mathcal{Z}|}$}
1008: \;\textrm{ and }\;
1009: \forall xyz: \big|\mbox{$\frac 1 n$}N\big(xyz;
1010: \mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)-
1011: p'(xyz)\big|\,<\,\mbox{$\frac\epsilon{|\mathcal{X}|
1012: |\mathcal{Y}||\mathcal{Z}|}$},
1013: \]
1014: and therefore the $L_1$ norm of $p-p'$ can be written as
1015: \begin{eqnarray*}
\big|\big|p-p'\big|\big|_1
1017: & = & \sum_{xyz}\big|p(xyz)-p'(xyz)\big| \\
1018: & = & \sum_{xyz}\big|p(xyz)-\mbox{$\frac 1 n$}
1019: N\big(xyz;\mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)+\mbox{$\frac 1 n$}
1020: N\big(xyz;\mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)-p'(xyz)\big| \\
1021: & \leq & \sum_{xyz}\big|\mbox{$\frac 1 n$}N\big(xyz;
1022: \mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)
1023: -p(xyz)\big|
1024: +\sum_{xyz}\big|\mbox{$\frac 1 n$}
1025: N\big(xyz;\mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)-p'(xyz)\big| \\
1026: & < & 2\epsilon,
1027: \end{eqnarray*}
1028: thus proving the lemma.
1029: \tend\bigskip
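\par\medskip\noindent
For completeness, the final inequality in the proof above can be spelled
out as follows (a sketch of the counting step only). Each of the two sums
has $|\mathcal{X}||\mathcal{Y}||\mathcal{Z}|$ terms, and by strong
typicality each term is smaller than
$\frac\epsilon{|\mathcal{X}||\mathcal{Y}||\mathcal{Z}|}$, so
\[ \sum_{xyz}\big|\mbox{$\frac 1 n$}N\big(xyz;
\mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)-p(xyz)\big|
\;\;<\;\; |\mathcal{X}||\mathcal{Y}||\mathcal{Z}|\cdot
\mbox{$\frac\epsilon{|\mathcal{X}||\mathcal{Y}||\mathcal{Z}|}$}
\;\;=\;\;\epsilon, \]
and the same bound holds with $p'$ in place of $p$; adding the two bounds
gives $2\epsilon$.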
1030:
1031: Our interest in this question stems from the fact that, from the
1032: requirement to cover a product source with distributed typical sets,
1033: we do get constraints on the shape of various typical sets. So we
1034: need to characterize what distributions can give rise to those sets,
1035: and this lemma plays an important role in that.
1036:
1037:
1038: \subsection{Upper Bounds on the Size of Distributed Typical Cover Elements}
1039:
1040: \medskip
1041: \begin{center}\textcolor{gray}{\fbox{\begin{minipage}{16cm}
1042: \vspace{-4mm}\textcolor{black}{\begin{lemma}
1043: \label{lemma:bound-size}
1044: Consider any $\big(2^{nR_1},2^{nR_2},n,\epsilon,D_1,D_2\big)$ distributed
1045: rate-distortion code, represented by a cover $\mathcal{S}$. Then, there
1046: exists a distribution $\pi\in\mathbb{P}_{\mbox{\tiny LB}}$ such that, for
1047: all $(i,j)\in\{1...2^{nR_1}\}\times\{1...2^{nR_2}\}$ and all $\epsilon>0$,
1048: \[ \big|\mathbf{S}_{ij}\,\cap\,T_\epsilon^n\big(XY\big)\big|
1049: \;\;\leq\;\;
1050: 2^{n(H(XY|\hat X\hat Y)[\pi]+\ddot\epsilon)},
1051: \]
1052: provided $n$ is large enough. Furthermore, for all
1053: $\mathbf{y}^n\in\mathcal{Y}^n$,
1054: \[ \big|\mathbf{S}_{1,i}\cap T_\epsilon^n\big(X\big|\mathbf{y}^n\big)\big|
1055: \;\;\leq\;\;
1056: 2^{n(H(X|\hat X\hat YY)[\pi]+\ddot\epsilon')},
1057: \]
1058: and similarly for all $\mathbf{x}^n\in\mathcal{X}^n$,
1059: \[ \big|\mathbf{S}_{2,j}\cap T_\epsilon^n\big(Y\big|\mathbf{x}^n\big)\big|
1060: \;\;\leq\;\;
1061: 2^{n(H(Y|\hat X\hat YX)[\pi]+\ddot\epsilon'')},
1062: \]
1063: also provided $n$ is large enough.
1064: \rend
1065: \end{lemma}}\end{minipage}}}\end{center}\medskip
1066:
1067: {\it Proof.} From the two-terminal rate-distortion
1068: theorem~\cite[Thm.\ 2.2.3]{CsiszarK:81}, we know there exists a
1069: distribution $p(xy\hat x\hat y)=p(xy)p(\hat x\hat y|xy)$, with
1070: $p(xy)$ the given source, $\expect{d_1\big(X,\hat X\big)}\leq D_1$
1071: and $\expect{d_2\big(Y,\hat Y\big)}\leq D_2$, and
1072: sequences $\hat{\mathbf{x}}^n(ij)$ and $\hat{\mathbf{y}}^n(ij)$
1073: such that, for all
1074: $(i,j)\in\{1...2^{nR_1}\}\times\{1...2^{nR_2}\}$ and all $\epsilon>0$,
1075: \begin{equation}
1076: \tilde{\mathbf{S}}_{ij}
1077: \;\;\subseteq\;\;
1078: T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big),
1079: \label{eq:const-std-rd}
1080: \end{equation}
1081: provided $n$ is large enough. But since for distributed codes we
1082: have $\tilde{\mathbf{S}}_{ij}=
1083: \big[\mathbf{S}_{1,i}\times\mathbf{S}_{2,j}\big]\cap T_\epsilon^n\big(XY\big)$,
1084: it follows from standard properties of typical sets that
1085: \[ \mathbf{S}_{1,i}\cap T_\epsilon^n\big(X\big|\mathbf{S}_{2,j}\big)
1086: \;\;\subseteq\;\;
1087: T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)
1088: \mbox{\hspace{1cm}and\hspace{1cm}}
1089: \mathbf{S}_{2,j}\cap T_\epsilon^n\big(Y\big|\mathbf{S}_{1,i}\big)
1090: \;\;\subseteq\;\;
1091: T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big).
1092: \]
1093: Consider now a new cover $\mathcal{S}'$, having the property that
1094: \[ \mathbf{S}'_{1,i}\cap T_\epsilon^n\big(X\big|\mathbf{S}'_{2,j}\big)
1095: \;\;=\;\;
1096: T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)
1097: \mbox{\hspace{1cm}and\hspace{1cm}}
\mathbf{S}'_{2,j}\cap T_\epsilon^n\big(Y\big|\mathbf{S}'_{1,i}\big)
1099: \;\;=\;\;
1100: T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big).
1101: \]
1102: A simple expression for the cover element $\mathbf{S}'_{1,i}$ is obtained
1103: as follows. Fix an index $i\in\{1...2^{nR_1}\}$:
1104: \[\begin{array}{lrcl}
1105: & \forall k: \mathbf{S}'_{1,i}\cap
1106: T_\epsilon^n\big(X\big|\mathbf{S}'_{2,k}\big)
1107: & =
1108: & T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ik)\hat{\mathbf{y}}^n(ik)\big)
1109: \\
1110: \Rightarrow\hspace{6mm}
1111: & \bigcup_{k=1}^{2^{nR_2}}
1112: \mathbf{S}'_{1,i}\cap T_\epsilon^n\big(X\big|\mathbf{S}'_{2,k}\big)
1113: & =
1114: & \bigcup_{k=1}^{2^{nR_2}}
1115: T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ik)\hat{\mathbf{y}}^n(ik)\big)
1116: \\
1117: \Rightarrow
1118: & \mathbf{S}'_{1,i}\cap \bigcup_{k=1}^{2^{nR_2}}
1119: T_\epsilon^n\big(X\big|\mathbf{S}'_{2,k}\big)
1120: & =
1121: & \bigcup_{k=1}^{2^{nR_2}}
1122: T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ik)\hat{\mathbf{y}}^n(ik)\big)
1123: \\
1124: \Rightarrow
1125: & \mathbf{S}'_{1,i}\cap S_{\epsilon,Y}^n\big(X\big)
1126: & =
1127: & \bigcup_{k=1}^{2^{nR_2}}
1128: T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ik)\hat{\mathbf{y}}^n(ik)\big),
1129: \end{array}\]
1130: and since $P\big(S_{\epsilon,Y}^n\big(X\big)\big)>1-\dot\epsilon$,
1131: $\mathbf{S}'_{1,i}$ is determined up to a set of vanishing measure;
1132: similarly, fixing $j\in\{1...2^{nR_2}\}$, we get
1133: $\mathbf{S}'_{2,j}\cap S_{\epsilon,X}^n\big(Y\big) = \bigcup_{l=1}^{2^{nR_1}}
1134: T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n(lj)\hat{\mathbf{y}}^n(lj)\big)$.
1135:
1136: The new cover $\mathcal{S}'$ has some useful properties:
1137: \begin{itemize}
1138: \item for all $(i,j)$, $\mathbf{S}_{1,i}\cap S_{\epsilon,Y}^n\big(X\big)
1139: \subseteq\mathbf{S}'_{1,i}\cap S_{\epsilon,Y}^n\big(X\big)$ and
1140: $\mathbf{S}_{2,j}\cap S_{\epsilon,X}^n\big(Y\big)\subseteq
1141: \mathbf{S}'_{2,j}\cap S_{\epsilon,X}^n\big(Y\big)$, and therefore
1142: $\tilde{\mathbf{S}}_{ij}\subseteq\tilde{\mathbf{S}}'_{ij}$ as
1143: well, by construction;
1144: \item for all $\big(\mathbf{x}^n\mathbf{y}^n\big)\in\tilde{\mathbf{S}}'_{ij}$,
1145: $d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n(ij)\big)<D_1^+$ and
1146: $d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n(ij)\big)<D_2^+$, from the
1147: joint typicality conditions defining $\mathbf{S}'_{1,i}$ and
1148: $\mathbf{S}'_{2,j}$;
1149: \item and $P\Big(\bigcup_{ij}\tilde{\mathbf{S}}'_{ij}\Big) \geq
1150: P\Big(\bigcup_{ij}\tilde{\mathbf{S}}_{ij}\Big) > 1-\dot\epsilon$;
1151: \end{itemize}
1152: so, $\mathcal{S}'$ ``dominates'' $\mathcal{S}$ (in that every element
1153: in $\mathcal{S}$ is contained in one element of $\mathcal{S}'$), and
1154: $\mathcal{S}'$ satisfies the same distortion constraints that $\mathcal{S}$
1155: does. Therefore, an upper bound on the size of the elements in the new
1156: cover $\mathcal{S}'$ is also an upper bound on the size of the elements
1157: in the given cover $\mathcal{S}$.
1158:
Next we observe that the new cover element $\tilde{\mathbf{S}}'_{ij}$ can be
1160: ``sandwiched'' in between two other terms:
1161: \begin{eqnarray*}
1162: \Big[T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)
1163: \times
1164: T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)
1165: \Big]\cap T_\epsilon^n\big(XY\big)
1166: & \stackrel{(a)}{\subseteq} &
1167: \big[\mathbf{S}'_{1,i}\times\mathbf{S}'_{2,j}\big]
1168: \cap T_\epsilon^n\big(XY\big) \\
1169: & \stackrel{(b)}{\subseteq} &
1170: T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big),
1171: \end{eqnarray*}
1172: where (a) follows from our choice of $\mathbf{S}'_{1,i}$ and
1173: $\mathbf{S}'_{2,j}$, and from elementary algebra of sets; and (b)
1174: follows from eqn.~\eqref{eq:const-std-rd}, and from the product form
1175: of distributed covers. So, since the other inclusion always holds,
1176: \[
1177: \Big[
1178: T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)
1179: \times
1180: T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)
1181: \Big]\cap T_\epsilon^n\big(XY\big)
1182: \;\;=\;\;
1183: T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)
1184: \]
1185: is a necessary condition on any suitable distribution $p(xy\hat x\hat y)$
1186: whose typical sets can be used to construct the cover $\mathcal{S}'$; or
1187: equivalently, since this must hold for every $(i,j)$,
1188: \[ \Big[
1189: T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)
1190: \times
1191: T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)
1192: \Big]\cap T_\epsilon^n\big(XY\big)
1193: \;\;=\;\;
1194: T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big),
1195: \]
1196: for any sequences $\hat{\mathbf{x}}^n$ and $\hat{\mathbf{y}}^n$ such
1197: that $T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)
1198: \neq\emptyset$. Finally we note that this last condition is equivalent
1199: to
1200: \begin{equation}
1201: T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)
1202: \times
1203: T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)
1204: \;\;=\;\;
1205: T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big).
1206: \label{eq:const-typsets-1}
1207: \end{equation}
1208: This is because this last equality already forces any $\mathbf{x}^n
1209: \in T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)$
1210: and $\mathbf{y}^n\in
1211: T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)$ to
1212: be jointly typical. Therefore, from the reverse Markov lemma, we
1213: conclude there exists a distribution $\pi(xy\hat x\hat y)$, which
1214: satisfies a Markov chain of the form $X-\hat X\hat Y-Y$, such that
1215: $\big|\big|p-\pi\big|\big|_1<2\epsilon$.
1216:
1217: \centerline{---------------------}
1218:
1219: Next we observe that if $\big|\big|p-\pi\big|\big|_1<2\epsilon$,
1220: then conditionals and marginals of $p$ and of $\pi$ are also close.
1221: Consider, for example,
1222: $p_{\hat X\hat Y}(\hat x\hat y)=\sum_{xy}p_{XY\hat X\hat Y}(xy\hat x\hat y)$
1223: and $\pi_{\hat X\hat Y}(\hat x\hat y)
1224: =\sum_{xy}\pi_{XY\hat X\hat Y}(xy\hat x\hat y)$:
1225: \begin{eqnarray*}
1226: \big|\big|p_{\hat X\hat Y}(\cdot)-\pi_{\hat X\hat Y}(\cdot)\big|\big|_1
1227: & = & \sum_{\hat x\hat y}
1228: \big|p_{\hat X\hat Y}(\hat x\hat y)
1229: -\pi_{\hat X\hat Y}(\hat x\hat y)\big| \\
1230: & = & \sum_{\hat x\hat y}
1231: \Big|\Big(\sum_{x'y'}p_{XY\hat X\hat Y}(x'y'\hat x\hat y)\Big)
1232: -\Big(\sum_{x''y''}\pi_{XY\hat X\hat Y}(x''y''\hat x\hat y)
1233: \Big)\Big| \\
1234: & = & \sum_{\hat x\hat y}
1235: \Big|\sum_{xy}p_{XY\hat X\hat Y}(xy\hat x\hat y)
1236: -\pi_{XY\hat X\hat Y}(xy\hat x\hat y)\Big| \\
1237: & \leq & \sum_{xy\hat x\hat y}
1238: \big|p_{XY\hat X\hat Y}(xy\hat x\hat y)
1239: -\pi_{XY\hat X\hat Y}(xy\hat x\hat y)\big| \\
1240: & < & 2\epsilon.
1241: \end{eqnarray*}
1242: For the conditional $p_{XY|\hat X\hat Y}(xy|\hat x\hat y)$:
1243: \begin{eqnarray*}
1244: \lefteqn{\big|\big|p_{XY|\hat X\hat Y}(\cdot|\hat x\hat y)
1245: -\pi_{XY|\hat X\hat Y}(\cdot|\hat x\hat y)\big|\big|_1
1246: \;\; = \;\; \sum_{xy} \big|p_{XY|\hat X\hat Y}(xy|\hat x\hat y)
-\pi_{XY|\hat X\hat Y}(xy|\hat x\hat y)\big|} \\
1248: & = & \sum_{xy} \Big|\frac{p_{XY\hat X\hat Y}(xy\hat x\hat y)}
1249: {p_{\hat X\hat Y}(\hat x\hat y)}
1250: -\frac{\pi_{XY\hat X\hat Y}(xy\hat x\hat y)}
1251: {\pi_{\hat X\hat Y}(\hat x\hat y)}\Big| \\
1252: & = & \mbox{$\frac{1}{p_{\hat X\hat Y}(\hat x\hat y)
1253: \pi_{\hat X\hat Y}(\hat x\hat y)}$}
1254: \sum_{xy} \big|p_{XY\hat X\hat Y}(xy\hat x\hat y)
1255: \pi_{\hat X\hat Y}(\hat x\hat y)
1256: -\pi_{XY\hat X\hat Y}(xy\hat x\hat y)
1257: p_{\hat X\hat Y}(\hat x\hat y)\big| \\
1258: & \stackrel{(a)}{<} & \mbox{$\frac{1}{p_{\hat X\hat Y}(\hat x\hat y)
1259: \pi_{\hat X\hat Y}(\hat x\hat y)}$}
1260: \sum_{xy} \big|p_{XY\hat X\hat Y}(xy\hat x\hat y)
1261: p_{\hat X\hat Y}(\hat x\hat y)
1262: +p_{XY\hat X\hat Y}(xy\hat x\hat y)2\epsilon
1263: -\pi_{XY\hat X\hat Y}(xy\hat x\hat y)
1264: p_{\hat X\hat Y}(\hat x\hat y)\big| \\
1265: & \leq & \mbox{$\frac{1}{p_{\hat X\hat Y}(\hat x\hat y)
1266: \pi_{\hat X\hat Y}(\hat x\hat y)}$}
1267: \sum_{xy}\Big(2\epsilon p_{XY\hat X\hat Y}(xy\hat x\hat y)
1268: +p_{\hat X\hat Y}(\hat x\hat y)
1269: \big|p_{XY\hat X\hat Y}(xy\hat x\hat y)
1270: -\pi_{XY\hat X\hat Y}(xy\hat x\hat y)\big|\Big) \\
1271: & = & \mbox{$\frac{1}{p_{\hat X\hat Y}(\hat x\hat y)
1272: \pi_{\hat X\hat Y}(\hat x\hat y)}$}
1273: \left(2\epsilon p_{\hat X\hat Y}(\hat x\hat y)
1274: +p_{\hat X\hat Y}(\hat x\hat y)\sum_{xy}
1275: \big|p_{XY\hat X\hat Y}(xy\hat x\hat y)
1276: -\pi_{XY\hat X\hat Y}(xy\hat x\hat y)\big|\right) \\
1277: & \leq & \frac{4\epsilon}{\pi_{\hat X\hat Y}(\hat x\hat y)} \\
1278: & \triangleq & \epsilon_1,
1279: \end{eqnarray*}
where (a) follows from the $L_1$ bound on the marginals
$p_{\hat X\hat Y}$ and $\pi_{\hat X\hat Y}$ above, and where the
derivation is valid provided both
$p_{\hat X\hat Y}(\hat x\hat y)\neq 0$ and
$\pi_{\hat X\hat Y}(\hat x\hat y)\neq 0$. We also note that under the
1284: assumption that
1285: $\big|\big|p_{XY\hat X\hat Y}-\pi_{XY\hat X\hat Y}\big|\big|_1<2\epsilon$,
1286: there exists a value $\hat\epsilon$ such that, for all
1287: $0<\epsilon<\hat\epsilon$, it is not possible to have a pair
1288: $(\hat x_0\hat y_0)$ such that $p_{\hat X\hat Y}(\hat x_0\hat y_0)>0$
1289: but $\pi_{\hat X\hat Y}(\hat x_0\hat y_0)=0$, or vice versa. This is
1290: because $\pi_{\hat X\hat Y}(\hat x_0\hat y_0)=0$ means that for all $xy$,
1291: $\pi_{XY\hat X\hat Y}(xy\hat x_0\hat y_0)=0$. But if
1292: $p_{\hat X\hat Y}(\hat x_0\hat y_0)>0$, this means there exists at
1293: least one $x_0y_0$ such that $p_{XY\hat X\hat Y}(x_0y_0\hat x_0\hat y_0)>0$,
1294: and as a result,
1295: $\big|\big|p_{XY\hat X\hat Y}-\pi_{XY\hat X\hat Y}\big|\big|_1\geq
1296: p_{XY\hat X\hat Y}(x_0y_0\hat x_0\hat y_0)$; thus, setting
$\hat\epsilon\triangleq\frac{1}{2}p_{XY\hat X\hat Y}(x_0y_0\hat x_0\hat y_0)$,
1298: we get the sought contradiction. Thus, for all $\epsilon$ small enough,
1299: the bound on the conditionals holds as well, and so we have
1300: from~\cite[Thm.\ 16.3.2]{CoverT:91} that
1301: \begin{equation}
1302: \Big|H\big(XY\big|\hat X=\hat x,\hat Y=\hat y\big)[p]
1303: -H\big(XY\big|\hat X=\hat x,\hat Y=\hat y\big)[\pi]\Big|
1304: \;\;<\;\;
1305: -\epsilon_1\log\Big(\mbox{$\frac{\mbox{\normalsize $\epsilon_1$}}{|\mathcal{X}||\mathcal{Y}|
1306: |\hat{\mathcal{X}}||\hat{\mathcal{Y}}|}$}\Big)
1307: \;\;\triangleq\;\; \epsilon_2,
1308: \label{eq:l1-bound-cond-entropy}
1309: \end{equation}
1310: and so,
1311: \begin{eqnarray*}
1312: \lefteqn{\Big|H\big(XY\big|\hat X\hat Y\big)[p]
1313: -H\big(XY\big|\hat X\hat Y\big)[\pi]\Big|} \\
1314: & \leq & \sum_{\hat x\hat y}
1315: \Big|p_{\hat X\hat Y}(\hat x\hat y)
1316: H\big(XY\big|\hat X=\hat x,\hat Y=\hat y\big)[p]
1317: -\pi_{\hat X\hat Y}(\hat x\hat y)
1318: H\big(XY\big|\hat X=\hat x,\hat Y=\hat y\big)[\pi]\Big| \\
1319: & \stackrel{(a)}{\leq} &
1320: \big|\hat{\mathcal{X}}\big|\cdot\big|\hat{\mathcal{Y}}\big|\cdot
1321: \Big|p_{\hat X\hat Y}(\hat x^*\hat y^*)
1322: H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[p]
1323: -\pi_{\hat X\hat Y}(\hat x^*\hat y^*)
1324: H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[\pi]\Big| \\
1325: & \stackrel{(b)}{\leq} &
1326: \big|\hat{\mathcal{X}}\big|\cdot\big|\hat{\mathcal{Y}}\big|\cdot
1327: \Big|\pi_{\hat X\hat Y}(\hat x^*\hat y^*)
1328: H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[p]
1329: +2\epsilon H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[p]
1330: \\&&\mbox{\hspace{1.7cm}}
1331: -\pi_{\hat X\hat Y}(\hat x^*\hat y^*)
1332: H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[\pi]\Big| \\
1333: & = &
1334: \big|\hat{\mathcal{X}}\big|\cdot\big|\hat{\mathcal{Y}}\big|\cdot
1335: \Big|2\epsilon H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[p]
1336: \\&&\mbox{\hspace{1.7cm}}
1337: +\pi_{\hat X\hat Y}(\hat x^*\hat y^*)
1338: \Big(H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[p]
1339: -H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[\pi]\Big)\Big|
1340: \\
1341: & \stackrel{(c)}{\leq} &
1342: \big|\hat{\mathcal{X}}\big|\cdot\big|\hat{\mathcal{Y}}\big|\cdot
1343: \Big(2\epsilon H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[p]
+\pi_{\hat X\hat Y}(\hat x^*\hat y^*)\epsilon_2\Big) \\
1345: & \triangleq & \epsilon_3,
1346: \end{eqnarray*}
1347: where (a) follows from choosing $\hat x^*\hat y^*$ as the pair
1348: $\hat x\hat y\in\hat{\mathcal{X}}\times\hat{\mathcal{Y}}$ that makes
1349: the difference $\big|p_{\hat X\hat Y}(\hat x\hat y)
1350: H\big(XY\big|\hat X=\hat x,\hat Y=\hat y\big)[p]
1351: -\pi_{\hat X\hat Y}(\hat x\hat y)
1352: H\big(XY\big|\hat X=\hat x,\hat Y=\hat y\big)[\pi]\big|$ largest;
1353: (b) follows from
1354: $\big|\big|p_{\hat X\hat Y}-\pi_{\hat X\hat Y}\big|\big|_1<2\epsilon$;
1355: and (c) follows from eqn.~\eqref{eq:l1-bound-cond-entropy} above, and
1356: from the triangle inequality.
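
As a quick sanity check on the constants introduced above, the following
sketch (a remark added for illustration, assuming that $\pi$ is held fixed
with $\pi_{\hat X\hat Y}(\hat x\hat y)>0$ for each pair considered, and writing
$K\triangleq|\mathcal{X}||\mathcal{Y}||\hat{\mathcal{X}}||\hat{\mathcal{Y}}|$
as a shorthand not used elsewhere) indicates why $\epsilon_1$, $\epsilon_2$
and $\epsilon_3$ all vanish as $\epsilon\to 0$:
\begin{eqnarray*}
\epsilon_1 & = & \frac{4\epsilon}{\pi_{\hat X\hat Y}(\hat x\hat y)}
\;\;\longrightarrow\;\; 0, \\
\epsilon_2 & = & \epsilon_1\log K-\epsilon_1\log\epsilon_1
\;\;\longrightarrow\;\; 0, \\
\epsilon_3 & \leq &
\big|\hat{\mathcal{X}}\big|\cdot\big|\hat{\mathcal{Y}}\big|\cdot
\Big(2\epsilon\log\big(|\mathcal{X}||\mathcal{Y}|\big)+\epsilon_2\Big)
\;\;\longrightarrow\;\; 0,
\end{eqnarray*}
where the second line uses $t\log t\to 0$ as $t\to 0^+$, and the third
uses $H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[p]
\leq\log\big(|\mathcal{X}||\mathcal{Y}|\big)$ together with the fact
that probabilities are at most $1$.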
1357:
1358: We conclude this part of the proof by noting that completely analogous
1359: arguments can be made to show that
1360: \[ \Big|H\big(X\big|\hat X\hat YY\big)[p]
1361: -H\big(X\big|\hat X\hat YY\big)[\pi]\Big|
1362: \;\;\leq\;\;\epsilon_4
1363: \mbox{\hspace{1cm}and\hspace{1cm}}
1364: \Big|H\big(Y\big|\hat X\hat YX\big)[p]
1365: -H\big(Y\big|\hat X\hat YX\big)[\pi]\Big|
1366: \;\;\leq\;\;\epsilon_5.
1367: \]
1368:
1369: \centerline{---------------------}
1370:
1371: We are now ready to prove our desired bounds.
1372:
1373: Since for all $(i,j)$, $\tilde{\mathbf{S}}_{ij}
1374: \subseteq \tilde{\mathbf{S}}'_{ij} =
1375: T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)$,
1376: \[ \big|\tilde{\mathbf{S}}_{ij}\big|
1377: \;\;\leq\;\;
1378: 2^{n(H(XY|\hat X\hat Y)[p]+\epsilon)}
1379: \;\;\leq\;\;
1380: 2^{n(H(XY|\hat X\hat Y)[\pi]+\epsilon+\epsilon_3)};
1381: \]
1382: therefore, choosing $\ddot\epsilon\triangleq\epsilon+\epsilon_3$,
1383: the first bound specified by the lemma follows.
1384:
1385: For the other two bounds, fix now $\mathbf{y}^n\in\mathcal{Y}^n$.
1386: Since $\mathcal{S}$ is a cover, there must exist at least one value
$j_0\in\{1,\ldots,2^{nR_2}\}$, such that $\mathbf{y}^n\in\mathbf{S}_{2,j_0}$.
So consider any $i\in\{1,\ldots,2^{nR_1}\}$, and assume $\mathbf{S}_{1,i}
1389: \cap T_\epsilon^n\big(X\big|\mathbf{y}^n\big)\neq\emptyset$; based on
1390: this assumption, pick any $\mathbf{x}^n\in\mathbf{S}_{1,i}\cap
1391: T_\epsilon^n\big(X\big|\mathbf{y}^n\big)$. This means that
1392: $\big(\mathbf{x}^n\mathbf{y}^n\big)\in
1393: \big[\mathbf{S}_{1,i}\times\mathbf{S}_{2,j_0}\big]\cap
1394: T_\epsilon^n\big(XY\big)$, and therefore that
1395: $\big(\mathbf{x}^n\mathbf{y}^n\big)\in
1396: \big[\mathbf{S}'_{1,i}\times\mathbf{S}'_{2,j_0}\big]\cap
1397: T_\epsilon^n\big(XY\big)$, and hence from eqn.~\eqref{eq:const-std-rd}
1398: we have that $\big(\mathbf{x}^n\mathbf{y}^n\hat{\mathbf{x}}^n(ij_0)
1399: \hat{\mathbf{y}}^n(ij_0)\big)\in T_\epsilon^n\big(XY\hat X\hat Y\big)$,
1400: and therefore we conclude that
1401: \[ \mathbf{S}_{1,i}\cap T_\epsilon^n\big(X\big|\mathbf{y}^n\big)
1402: \;\;\subseteq\;\;
1403: T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ij_0)\hat{\mathbf{y}}^n(ij_0)
\mathbf{y}^n\big).
1405: \]
1406: We also note that if $\mathbf{S}_{1,i}\cap
1407: T_\epsilon^n\big(X\big|\mathbf{y}^n\big)=\emptyset$, then the last inclusion
1408: holds trivially. Thus,
1409: \[ \big|\mathbf{S}_{1,i}\cap T_\epsilon^n\big(X\big|\mathbf{y}^n\big)\big|
1410: \;\;\leq\;\;
1411: 2^{n(H(X|\hat X\hat YY)[p]+\epsilon)}
1412: \;\;\leq\;\;
2^{n(H(X|\hat X\hat YY)[\pi]+\epsilon+\epsilon_4)}.
1414: \]
1415: Therefore, choosing $\ddot\epsilon'\triangleq\epsilon+\epsilon_4$, the
1416: second bound specified by the lemma holds. And the third (and last)
1417: bound follows from an argument identical to this last one. So the lemma
1418: is proved.
1419: \tend\bigskip
1420:
1421:
1422: \section{Proof of Theorem~\ref{thm:main}}
1423: \label{sec:main-proof}
1424:
1425: Consider any $\big(2^{nR_1},2^{nR_2},n,\epsilon,D_1,D_2\big)$ distributed
1426: rate-distortion code, represented by a cover $\mathcal{S}$. Then,
1427: \begin{eqnarray*}
1428: \lefteqn{n(R_1+R_2) \;\; \geq \;\; H\big(f_1(X^n)f_2(Y^n)\big)} \\
1429: & = & H\big(f_1(X^n)f_2(Y^n)\big)
1430: - H\big(f_1(X^n)f_2(Y^n)\big|X^nY^n\big) \\
1431: & = & I\big(X^nY^n\wedge f_1(X^n)f_2(Y^n)\big) \\
1432: & = & H\big(X^nY^n\big)
1433: - H\big(X^nY^n\big|f_1(X^n)f_2(Y^n)\big) \\
1434: & = & nH\big(XY\big)
1435: - \sum_{1\leq i\leq 2^{nR_1},1\leq j\leq 2^{nR_2}}
1436: P\big(f_1(X^n)=i,f_2(Y^n)=j\big)
1437: H\big(X^nY^n\big|f_1(X^n)=i,f_2(Y^n)=j\big) \\
1438: & \geq & nH\big(XY\big) -
1439: \Big[ \max_{1\leq i\leq 2^{nR_1},1\leq j\leq 2^{nR_2}}
1440: H\big(X^nY^n\big|f_1(X^n)=i,f_2(Y^n)=j\big)
1441: \Big] \\&&\mbox{\hspace{2.06cm}}
1442: \Big[ \sum_{1\leq i\leq 2^{nR_1},1\leq j\leq 2^{nR_2}}
1443: P\big(f_1(X^n)=i,f_2(Y^n)=j\big)
1444: \Big] \\
1445: & = & nH\big(XY\big)
1446: - \max_{1\leq i\leq 2^{nR_1},1\leq j\leq 2^{nR_2}}
1447: H\big(X^nY^n\big|f_1(X^n)=i,f_2(Y^n)=j\big) \\
1448: & \stackrel{(a)}{\geq} & nH\big(XY\big)
1449: - \Big[\max_{1\leq i\leq 2^{nR_1},1\leq j\leq 2^{nR_2}}
1450: \log\big|\tilde{\mathbf{S}}_{ij}\big|\Big]-n\epsilon_1 \\
1451: & \stackrel{(b)}{\geq} &
1452: nH\big(XY\big) - nH\big(XY\big|\hat X\hat Y\big)[\pi]
1453: - n\ddot\epsilon - n\epsilon_1 \\
1454: & = & nI\big(XY\wedge \hat X\hat Y\big)[\pi] - n\ddot\epsilon - n\epsilon_1,
1455: \end{eqnarray*}
1456: where (a) follows from splitting outcomes of $X^nY^n$ into typical and
1457: non-typical ones, and from bounding the entropy of the typical ones with
1458: a uniform distribution; and (b) follows from Lemma~\ref{lemma:bound-size},
1459: for some $\pi\in\mathbb{P}_{\mbox{\tiny LB}}$.
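
Step (a) can be unpacked as follows. This is only a sketch, under two
assumptions of ours that are not restated from the text above: that
$\tilde{\mathbf{S}}_{ij}$ is non-empty and collects the typical pairs in
cell $(i,j)$, and that, conditioned on the event
$A_{ij}\triangleq\{f_1(X^n)=i,f_2(Y^n)=j\}$ (a shorthand introduced only
here), the pair $(X^nY^n)$ falls outside $\tilde{\mathbf{S}}_{ij}$ with
probability at most some $\delta$. Let $E$ denote the indicator of the
event $(X^nY^n)\in\tilde{\mathbf{S}}_{ij}$. Then, with logarithms in
base 2,
\begin{eqnarray*}
H\big(X^nY^n\big|A_{ij}\big)
& \leq & H\big(E\big|A_{ij}\big)+H\big(X^nY^n\big|E,A_{ij}\big) \\
& \leq & 1+P\big(E=1\big|A_{ij}\big)\log\big|\tilde{\mathbf{S}}_{ij}\big|
+P\big(E=0\big|A_{ij}\big)\,n\log\big(|\mathcal{X}||\mathcal{Y}|\big) \\
& \leq & \log\big|\tilde{\mathbf{S}}_{ij}\big|
+1+n\delta\log\big(|\mathcal{X}||\mathcal{Y}|\big),
\end{eqnarray*}
so that, under these assumptions, (a) holds for any $\epsilon_1$ with
$n\epsilon_1\geq 1+n\delta\log\big(|\mathcal{X}||\mathcal{Y}|\big)$.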
1460:
1461: For the individual rates, we have the following chain of inequalities:
1462: \begin{eqnarray*}
1463: nR_1 & \geq & H\big(f_1(X^n)\big) \\
1464: & \geq & H\big(f_1(X^n)\big|Y^n\big) \\
1465: & = & H\big(f_1(X^n)\big|Y^n\big)-H\big(f_1(X^n)\big|X^nY^n\big) \\
1466: & = & I\big(X^n\wedge f_1(X^n)\big|Y^n\big) \\
1467: & = & H\big(X^n\big|Y^n\big)-H\big(X^n\big|f_1(X^n)Y^n\big) \\
1468: & = & nH\big(X\big|Y\big)-H\big(X^n\big|f_1(X^n)Y^n\big) \\
1469: & = & nH\big(X\big|Y\big)
1470: -\sum_{\mathbf{y}^n\in\mathcal{Y}^n}\sum_{i=1}^{2^{nR_1}}
1471: P\big(f_1(X^n)=i,Y^n=\mathbf{y}^n\big)
1472: H\big(X^n\big|f_1(X^n)=i,Y^n=\mathbf{y}^n\big) \\
1473: & \geq & nH\big(X\big|Y\big)
- \Big[ \max_{1\leq i\leq 2^{nR_1},\mathbf{y}^n\in\mathcal{Y}^n}
1475: H\big(X^n\big|f_1(X^n)=i,Y^n=\mathbf{y}^n\big) \Big]
1476: \\&&\mbox{\hspace{2.18cm}}
1477: \Big[ \sum_{\mathbf{y}^n\in\mathcal{Y}^n}\sum_{i=1}^{2^{nR_1}}
1478: P\big(f_1(X^n)=i,Y^n=\mathbf{y}^n\big) \Big] \\
1479: & = & nH\big(X\big|Y\big)
- \max_{1\leq i\leq 2^{nR_1},\mathbf{y}^n\in\mathcal{Y}^n}
1481: H\big(X^n\big|f_1(X^n)=i,Y^n=\mathbf{y}^n\big) \\
1482: & \stackrel{(a)}{\geq} & nH\big(X\big|Y\big)
- \Big[\max_{1\leq i\leq 2^{nR_1},\mathbf{y}^n\in\mathcal{Y}^n}
1484: \log_2\big|\mathbf{S}_{1,i}\cap
1485: T_\epsilon^n\big(X\big|\mathbf{y}^n\big)\big|\Big]-n\epsilon_1 \\
1486: & \stackrel{(b)}{\geq} & nH\big(X\big|Y\big)
1487: - nH\big(X\big|\hat X\hat YY\big)[\pi]
1488: - n\ddot\epsilon' - n\epsilon_1 \\
1489: & = & nI\big(X\wedge \hat X\hat Y\big|Y\big)[\pi]
1490: - n\ddot\epsilon' - n\epsilon_1,
1491: \end{eqnarray*}
1492: where (a) follows from splitting the outcomes of $X^n$ into those
1493: that are jointly typical with the given sequence $\mathbf{y}^n$ and
1494: those that are not, and from bounding the entropy of the typical
1495: ones with a uniform distribution; and (b) follows from
1496: Lemma~\ref{lemma:bound-size}. An identical argument shows that
1497: $nR_2\geq nI\big(Y\wedge\hat X\hat Y\big|X\big)[\pi]-n\ddot\epsilon''
1498: -n\epsilon_1$. And since these conditions must hold for all
1499: $\epsilon>0$, the theorem follows.
1500: \tend
1501:
1502:
1503: \section{Discussion}
1504: \label{sec:discussion}
1505:
1506: We conclude the first part of this paper with some discussion on
1507: the results proved so far.
1508:
1509: \subsection{Finite Parameterization of $\mathcal{R}^o(D_1,D_2)$}
1510:
1511: The class of distributions used to define the Berger-Tung inner bound
1512: is given by:
1513: \[ \mathbb{P}_{\mbox{\tiny BT}}
1514: \;\;\triangleq\;\;
1515: \left\{p_{XYUV}\left|\begin{array}{rl}
1516: \bullet & p(xy)=\sum_{uv}p_{XYUV}(xyuv) \\
1517: \bullet & U-X-Y-V\textrm{ is a Markov chain} \\
1518: \bullet & \expect{d_1\big(X,\gamma_1(U,V)\big)}\leq D_1
1519: \textrm{ and }
1520: \expect{d_2\big(Y,\gamma_2(U,V)\big)}\leq D_2
1521: \end{array}\right\}\right.,
1522: \]
1523: for fixed distortions $(D_1,D_2)$, source $p(xy)$, and some functions
1524: $\gamma_1:\mathcal{U}\times\mathcal{V}\to\hat{\mathcal{X}}$ and
1525: $\gamma_2:\mathcal{U}\times\mathcal{V}\to\hat{\mathcal{Y}}$.
1526: To make a direct comparison
1527: with $\mathbb{P}_{\mbox{\tiny BT}}$ easier, we rewrite
1528: $\mathbb{P}_{\mbox{\tiny LB}}$ in terms of two
1529: variables $U$ and $V$ as follows:
1530: \begin{itemize}
1531: \item Set $\mathcal{U}\triangleq\hat{\mathcal{X}}$ and
$\mathcal{V}\triangleq\hat{\mathcal{Y}}$.
1533: \item For any $p_{XY\hat X\hat Y}\in\mathbb{P}_{\mbox{\tiny LB}}$,
1534: set $p_{XYUV}(xyuv)\triangleq p_{XY\hat X\hat Y}(xy\hat x\hat y)$.
1535: \end{itemize}
1536: Then, it is clear that $\mathbb{P}'_{\mbox{\tiny LB}}$, defined by
1537: \[ \mathbb{P}'_{\mbox{\tiny LB}}
1538: \;\;\triangleq\;\;
1539: \left\{p_{XYUV}\left|\begin{array}{rl}
1540: \bullet & p(xy)=\sum_{uv}p_{XYUV}(xyuv) \\
1541: \bullet & X-UV-Y\textrm{ is a Markov chain} \\
1542: \bullet & \expect{d_1\big(X,\gamma_1(U,V)\big)}\leq D_1
1543: \textrm{ and }
1544: \expect{d_2\big(Y,\gamma_2(U,V)\big)}\leq D_2
1545: \end{array}\right\}\right.,
1546: \]
1547: again for fixed distortions $(D_1,D_2)$, source $p(xy)$, and some
1548: functions $\gamma_1:\mathcal{U}\times\mathcal{V}\to\hat{\mathcal{X}}$
1549: and $\gamma_2:\mathcal{U}\times\mathcal{V}\to\hat{\mathcal{Y}}$, is
1550: just a relabeling of $\mathbb{P}_{\mbox{\tiny LB}}$.
1551:
1552: In terms of these sets, we can state the following bounds on
1553: $\mathcal{R}^*(D_1,D_2)$:
1554: \begin{equation}
1555: \overline{\bigcup_{p\in\mathbb{P}_{\mbox{\tiny BT}}}\mathcal{R}(D_1,D_2,p)}
1556: \;\;\subseteq\;\;
1557: \mathcal{R}^*(D_1,D_2)
1558: \;\;\subseteq\;\;
1559: \overline{\bigcup_{p\in\mathbb{P}'_{\mbox{\tiny LB}}}\mathcal{R}(D_1,D_2,p)}.
1560: \label{eq:region-bounds}
1561: \end{equation}
1562: $\mathcal{R}^*(D_1,D_2)$ is not a characterization of the region of
1563: achievable rates that we would normally consider satisfactory, in that
1564: it is not ``computable,'' in the sense of~\cite[pg.\ 259]{CsiszarK:81}.
1565: Yet with eqn.~\eqref{eq:region-bounds}, we have managed to ``sandwich''
1566: the uncomputable $\mathcal{R}^*(D_1,D_2)$ region in between two
1567: other regions, both of which are computable:
1568: \begin{itemize}
1569: \item in $\mathbb{P}'_{\mbox{\tiny LB}}$, $U$ and $V$ are taken
1570: over finite alphabets ($\mathcal{U}=\hat{\mathcal{X}}$ and
1571: $\mathcal{V}=\hat{\mathcal{Y}}$);
\item and in $\mathbb{P}_{\mbox{\tiny BT}}$, although we have
not been able to find anywhere in the literature a proof that
the alphabets of $U$ and $V$ can be taken to be finite, presumably a
direct application of the method of Ahlswede and K\"orner should
produce the desired cardinality bounds~\cite{AhlswedeK:75, Salehi:78}.
1577: \end{itemize}
1578: This is of interest because, as far as we can tell, none of the outer
1579: bounds we have found in the literature are computable.
1580:
1581: \subsection{Relationship to the Berger-Tung Outer Bound}
1582:
1583: One simple sufficient condition (which unfortunately does not hold)
1584: for proving the inclusions in eqn.~\eqref{eq:region-bounds} to be
1585: in fact equalities would have been to show that
1586: $\mathbb{P}'_{\mbox{\tiny LB}}\subseteq\mathbb{P}_{\mbox{\tiny BT}}$.
However, a direct comparison between these two sets is still revealing.
Consider any distribution $p$ that satisfies the constraints of both
sets (i.e., $p\in\mathbb{P}'_{\mbox{\tiny LB}}\cap
\mathbb{P}_{\mbox{\tiny BT}}$), and elements $xyuv$ for which
$p(xyuv)\neq 0$. Then, this $p$ admits two different factorizations:
1592: \[\begin{array}{crcl}
1593: & p(uv)p(x|uv)p(y|uv) & = & p(xy)p(u|x)p(v|y) \\
1594: \Leftrightarrow & p(uv)\frac{p(uv|x)p(x)}{p(uv)}\frac{p(uv|y)p(y)}{p(uv)}
1595: & = & p(xy)p(u|x)p(v|y) \\
1596: \Leftrightarrow & p(uv|x)p(x)p(uv|y)p(y) & = & p(xy)p(u|x)p(v|y)p(uv) \\
1597: \Leftrightarrow & p(u|x)p(v|x)p(x)p(u|y)p(v|y)p(y)
1598: & = & p(xy)p(u|x)p(v|y)p(uv) \\
1599: \Leftrightarrow & p(v|x)p(x)p(u|y)p(y) & = & p(xy)p(uv) \\
1600: \Leftrightarrow & p(xv)p(yu) & = & p(xy)p(uv).
1601: \end{array}\]
Clearly, any distribution in this intersection must make all
variables pairwise independent: marginalize out any two of them, and
the joint distribution of the other two factors into the product of
their marginals.
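
To make the marginalization explicit, here is a short sketch, assuming
(as a simplification on our part) that the identity
$p(xv)p(yu)=p(xy)p(uv)$ holds for every $(x,y,u,v)$, rather than only
where $p(xyuv)\neq 0$. Summing both sides over the indicated pairs of
variables gives, for instance,
\[\begin{array}{lrcl}
\textrm{sum over } uv: & p(x)p(y) & = & p(xy) \\
\textrm{sum over } xy: & p(v)p(u) & = & p(uv) \\
\textrm{sum over } xv: & p(yu) & = & p(y)p(u) \\
\textrm{sum over } yu: & p(xv) & = & p(x)p(v),
\end{array}\]
so that, under this assumption, $X$ and $Y$, $U$ and $V$, $Y$ and $U$,
and $X$ and $V$ each form an independent pair.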
1605:
1606: We find this observation interesting because it provides clear
1607: evidence that our lower bound is very different in nature from the
1608: Berger-Tung outer bound~\cite{Berger:78, Tung:PhD}. In that bound,
1609: the set of distributions in the outer bound (all Markov chains of
1610: the form $U-X-Y$ and $X-Y-V$) strictly contains
$\mathbb{P}_{\mbox{\tiny BT}}$; that is, there is a subset of
the distributions in the outer bound that generates all rates we
know to be achievable. In our bound, since
$\mathbb{P}'_{\mbox{\tiny LB}}\cap\mathbb{P}_{\mbox{\tiny BT}}$
is a degenerate set, {\em none} of the distributions
$p\in\mathbb{P}_{\mbox{\tiny LB}}$ can be used to define a code
1617: construction based on known methods,\footnote{Except of course for
1618: trivial cases, such as when the two sources $X$ and $Y$ are independent,
1619: and the distortion is maximum.}
1620: such as the ``quantize-then-bin'' strategy used in the proof of
1621: the Berger-Tung inner bound.
1622:
1623: \subsection{Computation of the Outer Bound}
1624:
The finite parameterization of our outer bound is, we believe, an important
contribution in itself, given that the Berger-Tung
1627: outer bound is not computable.\footnote{And neither is the more modern
1628: outer bound of Wagner and Anantharam~\cite{Wagner:PhD, WagnerA:05},
1629: also mentioned in the introduction.} This is of interest in part
1630: because, at least in principle, this finite parameterization renders
1631: the problem amenable to analysis using computational methods. Finding
1632: an efficient algorithm for computing solutions to the optimization
1633: problem defined by Theorem~\ref{thm:main}, similar in spirit to the
1634: Blahut-Arimoto algorithm for the numerical evaluation of channel
1635: capacity and rate-distortion functions~\cite{Arimoto:72, Blahut:72},
1636: certainly is an interesting challenge in its own right.
1637:
1638: More fundamentally though, we believe the computability of our
bound holds the key to completing a proof of the optimality of the
1640: Berger-Tung inner bound for the problem setup of Fig.~\ref{fig:setup}:
1641: \begin{itemize}
1642: \item Computational methods are of interest not only because they
1643: lead to answers that are ``useful in practice;'' discovering
1644: efficient algorithms invariably requires the uncovering of structure
1645: in the problem. A good example in our field: the characterization
1646: by Chiang and Boyd of the Lagrange duals of channel capacity and
1647: rate-distortion as convex geometric programs~\cite{ChiangB:04}.
1648: \item Last but not least, an efficient algorithm to compute the
1649: sandwich terms in eqn.~\eqref{eq:region-bounds} provides a fallback
1650: strategy. If all else fails, at least by means of numerical methods
1651: we can check whether, in concrete instances of the problem, the
1652: lower and upper bounds coincide or not.
1653: \end{itemize}
1654: The achievability of the set of rates defined by Theorem~\ref{thm:main},
1655: and the effective computation of the bounds of eqn.~\eqref{eq:region-bounds},
1656: are the main topics considered in Part II.
1657:
1658:
1659: \bigskip\noindent{\em Acknowledgements}--In the final version.
1660:
1661:
1662: %\pagebreak
1663: %\bibliographystyle{plain}
1664: %\bibliography{library}
1665: \begin{thebibliography}{10}
1666:
1667: \bibitem{AhlswedeK:75}
1668: R.~Ahlswede and J.~K{\"o}rner.
1669: \newblock {Source Coding with Side Information and a Converse for Degraded
1670: Broadcast Channels}.
1671: \newblock {\em IEEE Trans. Inform. Theory}, IT-21(6):629--637, 1975.
1672:
1673: \bibitem{Arimoto:72}
1674: S.~Arimoto.
1675: \newblock {An Algorithm for Computing the Capacity of Arbitrary Discrete
1676: Memoryless Channels}.
1677: \newblock {\em IEEE Trans. Inform. Theory}, IT-18(1):14--20, 1972.
1678:
1679: \bibitem{BarrosS:06}
1680: J.~Barros and S.~D. Servetto.
1681: \newblock {Network Information Flow with Correlated Sources}.
1682: \newblock {\em IEEE Trans. Inform. Theory}, 52(1):155--170, 2006.
1683:
1684: \bibitem{Berger:78}
1685: T.~Berger.
1686: \newblock {\em The Information Theory Approach to Communications (G. Longo,
1687: ed.)}, chapter Multiterminal Source Coding.
1688: \newblock Springer-Verlag, 1978.
1689:
1690: \bibitem{BergerHOTW:79}
1691: T.~Berger, K.~B. Housewright, J.~K. Omura, S.~Tung, and J.~Wolfowitz.
1692: \newblock {An Upper Bound on the Rate Distortion Function for Source Coding
1693: with Partial Side Information at the Decoder}.
1694: \newblock {\em IEEE Trans. Inform. Theory}, 25(6):664--666, 1979.
1695:
1696: \bibitem{BergerS:07}
1697: T.~Berger and S.~D. Servetto.
1698: \newblock {Multiterminal Source Coding -- 30 Years Later}.
1699: \newblock In preparation, for Foundations and Trends in Communications and
1700: Information Theory.
1701:
1702: \bibitem{BergerY:89}
1703: T.~Berger and R.~W. Yeung.
1704: \newblock {Multiterminal Source Encoding with One Distortion Criterion}.
1705: \newblock {\em IEEE Trans. Inform. Theory}, 35(2):228--236, 1989.
1706:
1707: \bibitem{BergerZV:96}
1708: T.~Berger, Z.~Zhang, and H.~Viswanathan.
1709: \newblock {The CEO Problem}.
1710: \newblock {\em IEEE Trans. Inform. Theory}, 42(3):887--902, 1996.
1711:
1712: \bibitem{Blahut:72}
1713: R.~E. Blahut.
1714: \newblock {Computation of Channel Capacity and Rate-Distortion Functions}.
1715: \newblock {\em IEEE Trans. Inform. Theory}, IT-18(4):460--473, 1972.
1716:
1717: \bibitem{ChiangB:04}
1718: M.~Chiang and S.~Boyd.
1719: \newblock {Geometric Programming Duals of Channel Capacity and Rate
1720: Distortion}.
1721: \newblock {\em IEEE Trans. Inform. Theory}, 50(2):245--258, 2004.
1722:
1723: \bibitem{Cover:75b}
1724: T.~M. Cover.
1725: \newblock {A Proof of the Data Compression Theorem of Slepian and Wolf for
1726: Ergodic Sources}.
1727: \newblock {\em IEEE Trans. Inform. Theory}, IT-21(2):226--228, 1975.
1728:
1729: \bibitem{CoverT:91}
T.~M. Cover and J.~A. Thomas.
1731: \newblock {\em {Elements of Information Theory}}.
1732: \newblock John Wiley and Sons, Inc., 1991.
1733:
1734: \bibitem{CsiszarK:80}
I.~Csisz\'ar and J.~K{\"o}rner.
1736: \newblock {Towards a General Theory of Source Networks}.
1737: \newblock {\em IEEE Trans. Inform. Theory}, 26(2):155--166, 1980.
1738:
1739: \bibitem{CsiszarK:81}
1740: I.~Csisz\'ar and J.~K{\"o}rner.
1741: \newblock {\em {Information Theory: Coding Theorems for Discrete Memoryless
1742: Systems}}.
1743: \newblock Acad\'emiai Kiad\'o, Budapest, 1981.
1744:
1745: \bibitem{DobrushinT:62}
1746: R.~L. Dobrushin and B.~S. Tsybakov.
1747: \newblock {Information Transmission with Additional Noise}.
1748: \newblock {\em IEEE Trans. Inform. Theory}, 8(5):293--304, 1962.
1749:
1750: \bibitem{Salehi:78}
1751: M.~Salehi.
1752: \newblock {Cardinality Bounds on Auxiliary Variables in Multiple-User Theory
1753: via the Method of Ahlswede and K{\"o}rner}.
1754: \newblock Technical Report~33, Statistics Department, Stanford University,
1755: August 1978.
1756:
1757: \bibitem{Shannon:59}
1758: C.~E. Shannon.
1759: \newblock {Coding Theorems for a Discrete Source with a Fidelity Criterion}.
1760: \newblock {\em IRE Nat. Conv. Rec.}, 4:142--163, 1959.
1761:
1762: \bibitem{SlepianW:73b}
1763: D.~Slepian and J.~K. Wolf.
1764: \newblock {Noiseless Coding of Correlated Information Sources}.
1765: \newblock {\em IEEE Trans. Inform. Theory}, IT-19(4):471--480, 1973.
1766:
1767: \bibitem{Tung:PhD}
1768: S.~Y. Tung.
1769: \newblock {\em {Multiterminal Source Coding}}.
1770: \newblock PhD thesis, Cornell University, 1978.
1771:
1772: \bibitem{Wagner:PhD}
1773: A.~B. Wagner.
\newblock {\em {Methods of Offline Distributed Detection: Interacting Particle
1775: Models and Information-Theoretic Limits}}.
1776: \newblock PhD thesis, University of California, Berkeley, 2005.
1777:
1778: \bibitem{WagnerA:05}
1779: A.~B. Wagner and V.~Anantharam.
1780: \newblock {An Improved Outer Bound for the Multiterminal Source Coding
1781: Problem}.
1782: \newblock In {\em Proc. IEEE Int. Symp. Inform. Theory (ISIT)}, Adelaide,
1783: Australia, 2005.
1784: \newblock Extended version submitted to the IEEE Transactions on Information
1785: Theory. Available from \href{http://arxiv.org/abs/cs.IT/0511103/} {{\tt
1786: http://arxiv.org/abs/cs.IT/0511103/}}.
1787:
1788: \bibitem{Wyner:75}
1789: A.~D. Wyner.
1790: \newblock {On Source Coding with Side Information at the Decoder}.
1791: \newblock {\em IEEE Trans. Inform. Theory}, IT-21(3):294--300, 1975.
1792:
1793: \bibitem{WynerZ:76}
1794: A.~D. Wyner and J.~Ziv.
1795: \newblock {The Rate-Distortion Function for Source Coding with Side Information
1796: at the Decoder}.
1797: \newblock {\em IEEE Trans. Inform. Theory}, IT-22(1):1--10, 1976.
1798:
1799: \bibitem{Yeung:PhD}
1800: R.~W. Yeung.
1801: \newblock {\em {Some Results on Multiterminal Source Coding}}.
1802: \newblock PhD thesis, Cornell University, 1988.
1803:
1804: \bibitem{Yeung:01}
1805: R.~W. Yeung.
1806: \newblock {\em {A First Course in Information Theory}}.
1807: \newblock Kluwer Academic Publishers, 2001.
1808:
1809: \end{thebibliography}
1810:
1811:
1812:
1813: \end{document}
1814:
1815: