cs0604005/pp.tex
1: 
2: \documentclass[11pt,onecolumn,dvips,draftcls]{IEEEtran}
3: % \documentclass[10pt,twocolumn,dvips,final]{IEEEtran}
4: \usepackage{psfig, graphics, amsfonts, amsmath, color, amssymb, amsxtra, times}
5: \definecolor{gray}{cmyk}{.2,0.2,.3,.1}
6: \definecolor{dred}{cmyk}{0,0.9,0.4,0.3}
7: \definecolor{dblue}{rgb}{0,0,0.5}
8: \definecolor{dgreen}{rgb}{0,0.3,0}
9: \definecolor{dgray}{rgb}{0.3,0.3,0}
10: \usepackage[breaklinks=true, colorlinks=true, linkcolor=black, urlcolor=dblue,
11:   citecolor=black, pdfpagemode=None, pdfstartview=FitH]{hyperref}
12: 
13: \DeclareOldFontCommand{\rm}{\normalfont\rmfamily}{\mathrm}
14: \DeclareOldFontCommand{\sf}{\normalfont\sffamily}{\mathsf}
15: \DeclareOldFontCommand{\tt}{\normalfont\ttfamily}{\mathtt}
16: \DeclareOldFontCommand{\bf}{\normalfont\bfseries}{\mathbf}
17: \DeclareOldFontCommand{\it}{\normalfont\itshape}{\mathit}
18: \makeatletter\DeclareOldFontCommand{\sl}{\normalfont\slshape}{\@nomath\sl}
19: \DeclareOldFontCommand{\sc}{\normalfont\scshape}{\@nomath\sc}\makeatother
20: 
21: \newtheorem{theorem}{Theorem}
22: \newtheorem{proposition}{Proposition}
23: \newtheorem{lemma}{Lemma}
24: \setcounter{tocdepth}{2}
25: \setlength{\topmargin}{-15mm}
26: \setlength{\textwidth}{17cm}
27: \setlength{\textheight}{23cm}
28: \setlength{\oddsidemargin}{-2mm}
29: 
30: \newcommand{\rend}{\hfill$\square$}
31: \newcommand{\tend}{\hfill$\blacksquare$}
32: \newcommand{\epsfig}{\psfig}
33: \newcommand{\fig}[1]{{Fig.~\ref{#1}}}
34: \newcommand{\eq}[1]{\eqref{#1}}
35: \newcommand{\expect}[1]{\ensuremath{\operatorname{E}\left[#1\right]}}
36: \newcommand{\secref}[1]{Section~\ref{#1}}
37: \newcommand{\ie}{i.e.~}
38: \newcommand{\eg}{e.g.~}
39: \newcommand{\theorref}[1]{{\itshape Theorem~\ref{#1}}}
40: 
41: 
42: \title{Multiterminal Source Coding With Two Encoders--I: A Computable
43:   Outer Bound
44:   \thanks{The author is with the School of Electrical and Computer
45:   Engineering, Cornell University, Ithaca, NY.  URL:
46:   \href{http://cn.ece.cornell.edu/}{{\tt http://cn.ece.cornell.edu/}}.
47:   Work supported by the National Science Foundation, under awards
48:   CCR-0238271 (CAREER), CCR-0330059, and ANR-0325556.}}
49: \author{Sergio D.\ Servetto}
50: \date{November 12, 2006.}
51: 
52: 
53: \begin{document}
54: \maketitle
55: \thispagestyle{empty}
56: 
57: \begin{picture}(0,0)
58: \put(-5,210){\tt\small Submitted to the IEEE Transactions on Information
59:   Theory, April 2006;  Revised,}
60: \put(-5,200){\tt\small November 2006.}
61: \end{picture}
62: \vspace{-4mm}
63: 
64: \begin{abstract}
65: \noindent\it
66: In this first part, a computable outer bound is proved for the
67: multiterminal source coding problem, for a setup with two encoders,
68: discrete memoryless sources, and bounded distortion measures.
69: \end{abstract}
70: 
71: \vspace{1cm}
72: \noindent{\bf Index terms:} multiterminal source coding, distributed
73: source coding, network source coding, rate-distortion theory, rate-distortion
74: with side information, network information theory.
75: 
76: \vspace{9.3cm}
77: \pagebreak
78: \setcounter{page}{1}
79: 
80: 
81: \section{Introduction}
82: 
83: \subsection{The Problem of Multiterminal Source Coding}
84: 
85: Consider two dependent sources $X$ and $Y$, with joint distribution
86: $p(xy)$.  These sources are to be encoded by two separate encoders,
87: each of which observes only one of them, and are to be decoded by a
88: single joint decoder.  $X$ is encoded at rate $R_1$ and with average
89: distortion $D_1$, and $Y$ is encoded at rate $R_2$ and with average
90: distortion $D_2$.  This setup is illustrated in Fig.~\ref{fig:setup}.
91: 
92: \begin{figure}[!ht]
93: \centerline{\psfig{file=setup.eps,height=3cm,width=12cm}}
94: \caption{System setup for multiterminal source coding.}
95: \label{fig:setup}
96: \end{figure}
97: 
98: In the classical {\em multiterminal source coding} problem, as
99: formulated in~\cite{Berger:78, Tung:PhD}, the goal
100: is to determine the region of all achievable rate-distortion tuples
101: $(R_1,R_2,D_1,D_2)$.  Although relatively simple to describe (a 
102: formal description is given later), the multiterminal source coding
103: problem was one of the long-standing open problems in information
104: theory -- see, e.g.,~\cite[pg.\ 443]{CoverT:91}.  Furthermore,
105: besides its historical interest, this problem also comes up naturally
106: in the context of a sensor networking problem of interest to
107: us~\cite{BarrosS:06}.
108: 
109: Multiterminal source coding has a rich history.  Among the
110: fundamental contributions, in chronological order, are the
111: works of: a) Dobrushin-Tsybakov~\cite{DobrushinT:62}, with the
112: first rate-distortion problem with a Markov chain constraint; b)
113: Slepian-Wolf~\cite{SlepianW:73b}, with the formulation and solution
114: to the first distributed source coding problem, and
115: Cover~\cite{Cover:75b}, with a simpler proof of the Slepian-Wolf
116: result, a proof method widely in use today; c)
117: Ahlswede-K\"orner~\cite{AhlswedeK:75} and Wyner~\cite{Wyner:75},
118: with the first use of an auxiliary random variable to describe
119: the rate region of a source coding problem, and with it the need
120: to introduce proof methods to bound their cardinality; d)
121: Wyner-Ziv~\cite{WynerZ:76}, with the first characterization of a
122: multiterminal rate-distortion function; e) Berger-Tung~\cite{Berger:78,
123: Tung:PhD}, with the first formulation and partial results on the
124: multiterminal source coding problem as formulated in Fig.~\ref{fig:setup};
125: and f) Berger-Yeung~\cite{BergerY:89, Yeung:PhD}, with a complete
126: solution to a more general form of the Wyner-Ziv problem.  For
127: details on these, and on {\em many} more important contributions,
128: as well as for historical information on the problem, the reader
129: is referred to~\cite{BergerS:07}.  
130: 
131: The setup of Fig.~\ref{fig:setup} represents what we feel was the
132: simplest yet unsolved instance of a multiterminal source coding problem.
133: The problem of Fig.~\ref{fig:setup}, and the CEO problem~\cite{BergerZV:96}
134: are, to the best of our knowledge, the last two known special cases of
135: the general entropy characterization problem of Csisz\'ar and
136: K\"orner~\cite{CsiszarK:80} that remained unsolved.  This hierarchy
137: of problems is illustrated in Fig.~\ref{fig:hierarchy}.
138: 
139: \begin{figure}[ht]
140: \centerline{\resizebox{10cm}{6cm}{\input{hierarchy.pstex_t}}}
141: \caption{A hierarchy of problems in multiterminal source coding
142:   with two encoders and one decoder: an arrow from problem X to
143:   problem Y indicates that X is a special case of Y, in the sense
144:   that a solution to Y automatically provides a solution to X.
145:   Abbreviations -- SC: two-terminal lossless source coding;
146:   RD: two-terminal rate-distortion~\cite{Shannon:59}; SW: distributed
147:   coding of dependent sources~\cite{SlepianW:73b}; AK/W: source
148:   coding with side information~\cite{AhlswedeK:75, Wyner:75};
149:   WZ: rate-distortion with side information~\cite{WynerZ:76};
150:   BY: the Berger-Yeung extension of WZ theory~\cite{BergerY:89};
151:   DT: rate-distortion with a remote source~\cite{DobrushinT:62};
152:   BHOTW: a rate-distortion formulation of the Ahlswede-K\"orner-Wyner
153:   problem~\cite{BergerHOTW:79}; CEO: the CEO problem~\cite{BergerZV:96};
154:   MTRD: the problem setup of Fig.~\ref{fig:setup}; EC: the entropy
155:   characterization problem~\cite{CsiszarK:80}.  Asterisks are used
156:   to indicate problems whose solution was previously known.}
157: \label{fig:hierarchy}
158: \end{figure}
159: 
160: It should be pointed out, though, that the setup of Fig.~\ref{fig:setup}
161: is by no means the most general formulation of a multiterminal source
162: coding problem we could have given.  There are many other ways in which
163: we could have chosen to formulate these problems: we could have chosen
164: a network with $M$ encoders and a single decoder which attempts to
165: reconstruct $L$ different functions of the sources, we could have
166: considered continuous-alphabet and/or general ergodic sources, we
167: could have considered feedback and interactive communication, we could
168: have studied how this problem relates to the network coding problem,
169: and we could have considered network
170: topologies with multiple decoders as well.  All these alternative
171: possible formulations are discussed in detail in~\cite{BergerS:07}.
172: 
173: \subsection{Difficulties in Proving a Converse}
174: \label{sec:difficulties}
175: 
176: Among the limited number of references mentioned above, we included
177: the Berger-Tung bounds~\cite{Berger:78, Tung:PhD}.  These bounds do
178: provide the best known descriptions of the region of achievable rates
179: for the problem setup of Fig.~\ref{fig:setup},\footnote{We note that
180: recently, a new outer bound has been proposed for a version of
181: multiterminal source coding
182: that contains the formulation of~\cite{Berger:78, Tung:PhD} considered
183: here as a special case~\cite{Wagner:PhD, WagnerA:05}.  The new
184: bound has many desirable properties: it unifies known bounds custom
185: developed for seemingly different problems, and it provides a conclusive
186: answer for a previously unsolved instance.  However, when specialized
187: to our two-encoder setup, it is unclear if the new bound provides
188: an improvement over the Berger-Tung outer bound.  So, due to the
189: simplicity of the latter, we have chosen here to focus on that one
190: instead of on the more modern form.}
191: and so we elaborate on those now.
192: 
193: \medskip\begin{proposition}[Berger-Tung Bounds]
194: \label{prp:bt-bounds}
195: Fix $(D_1,D_2)$.  Let $X$ and $Y$ be two sources out of which
196: pairs of sequences $\big(X^n,Y^n\big)$ are drawn i.i.d.~$\sim p(xy)$;
197: and let $U$ and $V$ be auxiliary variables defined over alphabets
198: $\mathcal{U}$ and $\mathcal{V}$, such that there exist functions
199: $\gamma_1:\mathcal{U}\times\mathcal{V}\to\hat{\mathcal{X}}$
200: and $\gamma_2:\mathcal{U}\times\mathcal{V}\to\hat{\mathcal{Y}}$,
201: for which $\expect{d_1\big(X,\gamma_1(UV)\big)}\leq D_1$ and
202: $\expect{d_2\big(Y,\gamma_2(UV)\big)}\leq D_2$.  Consider rates
203: $(R_1,R_2)$, such that $R_1 \geq I(XY\wedge U|V)$,
204: $R_2 \geq I(XY\wedge V|U)$, and $R_1+R_2 \geq I(XY\wedge UV)$,
205: for some joint distribution $p(xyuv)$.  Now:
206: \begin{itemize}
207: \item for any $p(xyuv)$ that satisfies a Markov chain of the form
208:   $U-X-Y-V$, all rates $(R_1,R_2)$ obtained for any such
209:   $p$ are achievable;
210: \item conversely, consider all $p(xyuv)$ that satisfy the two Markov
211:   chains $U-X-Y$ and $X-Y-V$, and take the union of the sets of rates
212:   defined for each such $p(xyuv)$; every achievable rate pair $(R_1,R_2)$
213:   is included in that union;
214: \end{itemize}
215: that is, the first condition defines an {\em inner} bound, and the second
216: an {\em outer} bound to the rate region. \rend
217: \end{proposition}\medskip
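
As a quick sanity check of the inner bound (a sketch under assumptions
that are ours, not part of the proposition: $\hat{\mathcal{X}}=\mathcal{X}$,
$\hat{\mathcal{Y}}=\mathcal{Y}$, Hamming distortion measures, and
$D_1=D_2=0$), take $U=X$ and $V=Y$, with $\gamma_1(uv)=u$ and
$\gamma_2(uv)=v$.  The chain $U-X-Y-V$ holds trivially, both expected
distortions are zero, and the three rate conditions become
\begin{align*}
R_1 &\geq I(XY\wedge X|Y) = H(X|Y), \\
R_2 &\geq I(XY\wedge Y|X) = H(Y|X), \\
R_1+R_2 &\geq I(XY\wedge XY) = H(XY),
\end{align*}
which coincide with the rate conditions of the Slepian-Wolf
region~\cite{SlepianW:73b}.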
218: 
219: The regions defined by these bounds, when regarded as images of maps
220: that transform probability distributions into rate pairs, have a
221: property that is a source of many difficulties: the mutual information
222: expressions that define the inner and the outer bounds are identical,
223: it is only the {\em domains} of the two maps that differ; as such,
224: comparing the resulting regions is difficult.  This difference between
225: the inner and outer bounds has been the state of affairs in multiterminal
226: source coding since 1978.
227: 
228: A close examination of these distributions suggested to us that the
229: gap might not be due to a suboptimal coding strategy used in the inner
230: bound, but instead that perhaps the outer bound allows for the inclusion
231: of dependencies that cannot be physically realized by any distributed
232: code.  Consider these distributions:
233: \begin{itemize}
234: \item For the inner bound, $p(xyuv)$ = $p(xy)p(u|x)p(v|y)$.
235: \item For the outer bound, $p(xyuv)$
236:   = $p(xy)p(u|x)p(v|y\underline{xu})$
237:   = $p(xy)p(v|y)p(u|x\underline{yv})$.
238: \end{itemize}
239: If we choose to interpret $U$ and $V$ as instantaneous descriptions
240: of encodings of $X$ and $Y$,
241: then we see that the outer bound says that the encoding
242: $V$ is allowed to contain information about $X$ {\em beyond} that
243: which can be extracted from $Y$, and likewise for $U$ and
244: $Y$.\footnote{Note: this interpretation comes from the inner bound,
245: and is only justified for {\em blocks}.  $U^n$ does represent an encoding
246: of $X^n$, but it would be incorrect to say that the variable $U$ is
247: an encoding of $X$ (and likewise for $V$ and $Y$).  These insights
248: can only be carried so far, but at this point we are only trying to
249: build some intuition, and thus it is permissible to take such liberties.}
250: Motivated by this observation, in the first part of this work we
251: set ourselves the goal of finding a new outer bound.
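
To make the two factorizations listed above explicit, the following short
derivation (our own unpacking of the chain rule, not taken
from~\cite{Berger:78, Tung:PhD}) may help.  For any $p(xyuv)$,
\[ p(xyuv) \;=\; p(xy)\,p(u|xy)\,p(v|xyu). \]
Under the long chain $U-X-Y-V$ we have $p(u|xy)=p(u|x)$ and
$p(v|xyu)=p(v|y)$, which gives the inner bound form.  Under the two
short chains, $U-X-Y$ still gives $p(u|xy)=p(u|x)$, but $X-Y-V$ only
gives $p(v|xy)=p(v|y)$, that is, it constrains only the average
\[ \sum_{u} p(u|x)\,p(v|xyu) \;=\; p(v|y), \]
and not each term $p(v|xyu)$ separately.  Hence the last factor cannot
be simplified, and $V$ may depend on $(X,U)$ even after conditioning
on $Y$.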
252: 
253: \subsection{An Interpretation of Distributed Rate-Distortion Codes
254:   as Constrained Source Covers}
255: 
256: In Part I of this paper we present a finitely parameterized outer
257: bound for the region of achievable rates of the multiterminal source
258: coding problem of Fig.~\ref{fig:setup}, based on what we
259: believe is an original proof technique.  Some highlights of that
260: proof method, formally developed in later sections, are provided here.
261: 
262: \subsubsection{Rate-Distortion Codes $\equiv$ Source Covers}
263: \label{sec:intro-distributed-covers}
264: 
265: Our proof tightens existing converses by means of identifying a
266: constraint that {\em all} codes are subject to, but that is not
267: captured by any existing outer bound.  To explain what the constraint
268: is, the easiest way to get started is by drawing an analogy to
269: classical, two-terminal rate-distortion codes.
270: 
271: In the standard, two-terminal rate-distortion problem, a generic
272: code consists of the following elements:
273: \begin{itemize}
274: \item A block length $n$.
275: \item A cover $\big\{ \mathbf{S}_i \;:\; i=1...2^{nR} \big\}$ of the source
276:   $\mathcal{X}^n$.
277: \item A reconstruction sequence $\hat{\mathbf{x}}^n(i)$, associated to each
278:   cover element $\mathbf{S}_i$.
279: \end{itemize}
280: Given this description, an encoder $f:\mathcal{X}^n\to\{1...2^{nR}\}$
281: makes $f\big(\mathbf{x}^n\big)=i$ for some source sequence $\mathbf{x}^n$
282: and some index $i$, if
283: $\mathbf{x}^n\in \mathbf{S}_i$, with ties broken arbitrarily; a decoder
284: $g:\{1...2^{nR}\}\to
285: \hat{\mathcal{X}}^n$ simply maps $g(i)=\hat{\mathbf{x}}^n(i)$.  And we say
286: that the encoder/decoder pair $(f,g)$ satisfies a distortion constraint $D$
287: if, roughly, $P\Big(d\big(\mathbf{x}^n,g(f(\mathbf{x}^n))\big)\leq D\Big)
288: \approx 1$, for all $n$ large enough.  Such a representation is illustrated
289: in Fig.~\ref{fig:covers-classical}.
290: 
291: \begin{figure}[ht]
292: \centerline{\resizebox{15cm}{4cm}{\input{covers-classical.pstex_t}}}
293: \vspace{-2mm}
294: \caption{Cover-based representation of a classical rate-distortion code.}
295: \label{fig:covers-classical}
296: \end{figure}
297: 
298: In an analogous manner, we specify an arbitrary {\em distributed}
299: rate-distortion code as follows:
300: \begin{itemize}
301: \item A block length $n$.
302: \item {\em Two} covers:
303:   \begin{itemize}
304:   \item A cover $\big\{ \mathbf{S}_{1,i} \;:\; i=1...2^{nR_1}\big\}$ of the
305:     source $\mathcal{X}^n$.
306:   \item A cover $\big\{ \mathbf{S}_{2,j} \;:\; j=1...2^{nR_2}\big\}$ of the
307:     source $\mathcal{Y}^n$.
308:   \end{itemize}
309:   Indirectly, these two covers specify a cover $\mathbf{S}_{ij}\;\triangleq\;
310:     \big\{ \mathbf{S}_{1,i}\times\mathbf{S}_{2,j} : i=1...2^{nR_1},
311:     j=1...2^{nR_2}\big\}$ of the product alphabet $\mathcal{X}^n\times
312:     \mathcal{Y}^n$.
313: \item For each cover element $\mathbf{S}_{ij}$, we specify
314:   {\em two} reconstruction sequences
315:   $\big(\hat{\mathbf{x}}^n(ij),\hat{\mathbf{y}}^n(ij)\big)$.
316: \end{itemize}
317: Given this description, an encoder $f_1:\mathcal{X}^n\to\{1...2^{nR_1}\}$
318: for node 1 makes $f_1\big(\mathbf{x}^n\big)=i$ for some source sequence
319: $\mathbf{x}^n$ and some index $i$, if $\mathbf{x}^n\in\mathbf{S}_{1,i}$,
320: with ties broken arbitrarily (and similarly for an encoder $f_2$ at node 2);
321: a decoder $g:\{1...2^{nR_1}\}\times\{1...2^{nR_2}\}\to
322: \hat{\mathcal{X}}^n\times\hat{\mathcal{Y}}^n$ simply maps
323: $g(i,j)=\big(\hat{\mathbf{x}}^n(ij),\hat{\mathbf{y}}^n(ij)\big)$.
324: And we say that the distributed code $(f_1,f_2,g)$ satisfies two distortion
325: constraints $D_1$ and $D_2$ if, roughly,
326: $P\Big(d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)\leq D_1
327: \mbox{ and }
328: d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)\leq D_2\Big)\approx 1$,
329: for all $n$ large enough, and for
330: $\big(\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)=
331: g\big(f_1(\mathbf{x}^n),f_2(\mathbf{y}^n)\big)$.  Such a representation is
332: illustrated in Fig.~\ref{fig:covers-distributed}.
333: 
334: \begin{figure}[ht]
335: \centerline{\psfig{file=covers-distributed.eps,height=6cm,width=16cm}}
336: \caption{Cover-based representation of a {\em distributed} rate-distortion
337:   code.}
338: \label{fig:covers-distributed}
339: \end{figure}
340: 
341: \subsubsection{Constraints on the Structure of Source Covers}
342: 
343: Our main insight is that, whereas in the classical problem
344: any arbitrary cover defines a valid rate-distortion code, in multiterminal
345: source coding this is no longer the case: {\em covers of the product source
346: $\mathcal{X}^n\times\mathcal{Y}^n$ only of the form $\mathbf{S}_{ij}
347: = \mathbf{S}_{1,i}\times\mathbf{S}_{2,j}$ can be realized by distributed
348: codes}.  The significance of this requirement is illustrated with an
349: example in Fig.~\ref{fig:binary-example}.
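
Stated in symbols (a restatement of the product requirement in our own
words), if $(\mathbf{x}_1^n,\mathbf{y}_1^n)$ and
$(\mathbf{x}_2^n,\mathbf{y}_2^n)$ both belong to
$\mathbf{S}_{1,i}\times\mathbf{S}_{2,j}$, then
$\mathbf{x}_1^n,\mathbf{x}_2^n\in\mathbf{S}_{1,i}$ and
$\mathbf{y}_1^n,\mathbf{y}_2^n\in\mathbf{S}_{2,j}$, so that
\[ \big\{\mathbf{x}_1^n,\mathbf{x}_2^n\big\}\times
   \big\{\mathbf{y}_1^n,\mathbf{y}_2^n\big\}
   \;\subseteq\; \mathbf{S}_{1,i}\times\mathbf{S}_{2,j}. \]
For instance, with $n=1$ and $\mathcal{X}=\mathcal{Y}=\{0,1\}$, the
set $\{(0,0),(1,1)\}$ is not of product form: any product set containing
it must also contain $(0,1)$ and $(1,0)$.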
350: 
351: \begin{figure}[!h]
352: \centerline{\psfig{file=dtyp-bin.eps,height=6cm}\hspace{1cm}
353:             \psfig{file=dtyp-ex.eps,height=6cm}}
354: \caption{An example, to illustrate the significance of the
355:   requirement that cover elements $\mathbf{S}_{ij}$ take a product form.
356:   Let $\mathcal{X}=\mathcal{Y}=\{0,1\}$, and $p(xy)=p(x)p(y|x)$
357:   specified by a $p(x)$ such that $P(X=0)=P(X=1)=\frac 1 2$,
358:   and $p(y|x)$ a binary symmetric channel with crossover probability
359:   $p_c$.  Left: for each typical $\mathbf{x}^n$, there is a ``ring'' of
360:   $\mathbf{y}^n$'s jointly typical with it, centered at $\mathbf{x}^n$
361:   and of radius $\approx np_c$.  Right: consider pairs
362:   $\big(\mathbf{x}_1^n\mathbf{y}_1^n\big)$ and $\big(\mathbf{x}_2^n
363:   \mathbf{y}_2^n\big)$ in $\mathbf{S}_{ij}$; dashed circles denote
364:   distortion balls centered at $\hat{\mathbf{x}}^n(ij)$ and
365:   $\hat{\mathbf{y}}^n(ij)$ (with the centers omitted, for clarity),
366:   and dark shaded regions denote the intersection of two rings.
367:   Suppose now that all four pairs $(\mathbf{x}_1^n\mathbf{y}_1^n)$,
368:   $(\mathbf{x}_1^n\mathbf{y}_2^n)$, $(\mathbf{x}_2^n\mathbf{y}_1^n)$,
369:   and $(\mathbf{x}_2^n\mathbf{y}_2^n)$ are in $T_\epsilon^n\big(XY\big)$.
370:   Because $\mathbf{S}_{ij}= \mathbf{S}_{1,i}\times\mathbf{S}_{2,j}$,
371:   {\em all four pairs must be in $\mathbf{S}_{ij}$ as well:} the decoder
372:   does not have enough information to discriminate among these pairs.
373:   No such constraint exists with a centralized encoder.}
374: \label{fig:binary-example}
375: \end{figure}
376: 
377: From the informal argument of Fig.~\ref{fig:binary-example},
378: we see how the fact that distributed codes produce covers only of the
379: form $\mathbf{S}_{ij}=\mathbf{S}_{1,i}\times\mathbf{S}_{2,j}$ results
380: in constraints on the sets used to cover the typical set
381: $T_\epsilon^n\big(XY\big)$: there are certain groups of typical
382: sequences that cannot be broken, in the sense that either all of them
383: appear together in a cover element $\mathbf{S}_{ij}$, or none of them
384: appear.  We believe this is significant for two main reasons:
385: \begin{itemize}
386: \item If we compare to a classical rate-distortion code, this constraint
387:   is clearly not there.  Provided the distortion constraints are met, a
388:   classical code would be able to split the typical set into distortion
389:   balls, without any further constraints.
390: \item More fundamentally though, we view this constraint as a form of
391:   ``independence,'' reminiscent to us of the extra independence assumption
392:   required by the long Markov chain used in the definition of the
393:   Berger-Tung inner bound, which is not there in the definition of the
394:   outer bound, as highlighted in Section~\ref{sec:difficulties} earlier.
395: \end{itemize}
396: This latter observation is perhaps the strongest piece of evidence that
397: suggested to us that the Berger-Tung inner bound might be tight.
398: 
399: \subsection{Main Contributions and Organization of the Paper}
400: 
401: The main contribution presented in Part I of this paper is the
402: development of an outer bound to the region of achievable rates
403: for multiterminal source coding.  This outer bound has two salient
404: properties that distinguish it from existing bounds in the literature:
405: \begin{itemize}
406: \item it is based on explicitly modeling a constraint on the
407:   structure of codes that, as we understand things, had not been
408:   captured by any previously developed bound;
409: \item and also unlike existing bounds, it is finitely parameterized.
410: \end{itemize}
411: We believe that this outer bound coincides with the set of achievable
412: rates defined by the Berger-Tung inner bound.  This issue is thoroughly
413: explored in Part II of this paper, in the context of our study of
414: algorithmic issues involved in the effective computation of this bound.
415: 
416: The rest of this paper is organized as follows.  In
417: Section~\ref{sec:preliminaries} we define our notation, and state
418: our main result.  In Section~\ref{sec:aux-lemmas} we state and prove
419: some auxiliary lemmas that greatly simplify the proof of the main
420: theorem, a proof that is fully developed in Section~\ref{sec:main-proof}.
421: The paper concludes with an extensive discussion on our main result
422: and its implications, in Section~\ref{sec:discussion}.
423: 
424: 
425: \section{Preliminaries}
426: \label{sec:preliminaries}
427: 
428: \subsection{Definitions and Notation}
429: 
430: First, a word about notation.  Random variables are denoted with
431: capital letters, e.g., $X$.  Realizations of these variables are
432: denoted with lower case letters: e.g., $X=x$ means that the random
433: variable $X$ takes on the value $x$.  Script letters are typically
434: used to denote alphabets, e.g., the random variable $X$ takes values
435: on an alphabet $\mathcal{X}$.  The alphabets of all random variables
436: considered in this work are always assumed finite.  Sets in general
437: are denoted by capital boldface symbols, e.g., $\mathbf{S}$.
438: The size of a set is denoted by $\big|\mathbf{S}\big|$.  A
439: probability mass function on $\mathcal{X}$ is denoted by $p_X(x)$,
440: or simply $p(x)$ when the variable that it applies to is clear from
441: the context.  A sequence of elements from an alphabet $\mathcal{X}$
442: is denoted by a boldface symbol $\mathbf{x}^n$,
443: and its $i$-th element by $\mathbf{x}_i$; this sequence is an element
444: of the extension alphabet $\mathcal{X}^n$.  The expression
445: $\mathbf{x}_i^{j,n}$ denotes a subsequence of $\mathbf{x}^n$ consisting
446: of the elements $[\mathbf{x}_i,\mathbf{x}_{i+1},...,\mathbf{x}_j]$,
447: whenever $i\leq j$, otherwise it denotes an empty sequence; also,
448: sometimes the length $n$ of the sequence will be clear from the
449: context, and then we simply write $\mathbf{x}_i^j$ instead of
450: $\mathbf{x}_i^{j,n}$, whenever this does not cause confusion.  The
451: expression $\mathbf{x}^{-i,n}$ denotes the sequence
452: $[\mathbf{x}_1,...,\mathbf{x}_{i-1},\mathbf{x}_{i+1},...,\mathbf{x}_n]$,
453: and again, we write this as $\mathbf{x}^{-i}$ whenever $n$ is
454: clear from the context.  The same conventions are followed for
455: sequences of random variables.
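
As a small illustration of these conventions (with a hypothetical
sequence chosen only for this example), let $n=5$ and
$\mathbf{x}^5=[a,b,c,d,e]$.  Then
\[ \mathbf{x}_3=c, \qquad
   \mathbf{x}_2^4=[b,c,d], \qquad
   \mathbf{x}_4^2=[\;] \mbox{ (empty)}, \qquad
   \mathbf{x}^{-3}=[a,b,d,e]. \]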
456: 
457: Given a boolean predicate $b(\mathbf{x})$ depending on a variable
458: $\mathbf{x}$, we write $1_{\{b(\mathbf{x})\}}$ to denote
459: the indicator function for the predicate: this is a function that
460: takes the value 1 whenever $b(\mathbf{x})$ is true, and 0 whenever
461: it is false.  Given a sequence $\mathbf{x}^n\in\mathcal{X}^n$,
462: and an element $x\in\mathcal{X}$, we denote by $N(x;\mathbf{x}^n)$
463: the (unnormalized) type of $\mathbf{x}^n$, \ie the number of occurrences of $x$ in it, defined as
464: $N(x;\mathbf{x}^n)=\sum_{i=1}^n 1_{\{\mathbf{x}_i=x\}}$.  Then,
465: for any random variable $X$, any real number $\epsilon>0$, and
466: any integer $n>0$, we denote by $T_\epsilon^n(X)$ the strongly typical
467: set of $X$ with parameters $n$ and $\epsilon$, defined as
468: \[ T_\epsilon^n(X) \;\;=\;\; \Big\{ \mathbf{x}^n\in\mathcal{X}^n \;\Big|\;
469:    \forall x\in\mathcal{X}:
470:    \big|\mbox{$\frac 1 n$}N(x;\mathbf{x}^n)-p_X(x)\big|
471:    < \mbox{$\frac\epsilon{|\mathcal{X}|}$} \Big\}.
472: \]
473: In some situations, we need to compare typical sets defined for
474: the same set of variables, but induced by different distributions on
475: these variables.  To resolve this ambiguity, we denote by
476: $T_\epsilon^n\big(X\big)[p_X]$ the typical set corresponding to a
477: distribution $p_X$.  The same convention is followed when there is
478: similar ambiguity in the evaluation of entropies (denoted
479: $H\big(X\big)[p_X]$), and mutual information expressions (denoted
480: $I\big(X\wedge Y\big)[p_{XY}]$).
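
As a numerical illustration of the definition of $T_\epsilon^n(X)$ (the
parameter values are ours, chosen only to keep the arithmetic short),
let $\mathcal{X}=\{0,1\}$, $p_X(0)=p_X(1)=\frac 1 2$, $n=10$, and
$\epsilon=0.4$, so that the threshold is $\epsilon/|\mathcal{X}|=0.2$.
A sequence with $k$ zeros and $10-k$ ones is typical if and only if
$\big|\frac{k}{10}-\frac 1 2\big|<0.2$ (the condition on the ones is
identical), i.e., $k\in\{4,5,6\}$.  Hence
\[ \big|T_{0.4}^{10}(X)\big|
   \;=\; \binom{10}{4}+\binom{10}{5}+\binom{10}{6}
   \;=\; 210+252+210 \;=\; 672, \]
out of $2^{10}=1024$ sequences, and
$\frac 1 {10}\log_2 672\approx 0.94$, compared with $H(X)=1$ bit.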
481: 
482: Vector extensions $N(xy;\mathbf{x}^n\mathbf{y}^n)$, $T_\epsilon^n(XY)$,
483: etc., are defined by considering the same definitions as above, over a
484: suitable product alphabet $\mathcal{X}\times\mathcal{Y}$.  Similarly,
485: given two random variables $X$ and $Y$, a joint probability mass
486: function $p_{XY}(xy)$,
487: and a sequence $\mathbf{y}^n$, we denote by $T_\epsilon^n(X|\mathbf{y}^n)$
488: the conditional typical set of $X$ given $\mathbf{y}^n$, defined as
489: \[ T_\epsilon^n\big(X\big|\mathbf{y}^n\big)
490:    \;\;=\;\; \Big\{ \mathbf{x}^n\in\mathcal{X}^n \;\Big|\;
491:    \forall x\in\mathcal{X},y\in\mathcal{Y}:
492:    \big|\mbox{$\frac 1 n$}N(xy;\mathbf{x}^n\mathbf{y}^n)-p_{XY}(xy)\big|
493:    < \mbox{$\frac\epsilon{|\mathcal{X}||\mathcal{Y}|}$} \Big\}.
494: \]
495: We will also consider situations where we need to refer to the set of
496: all typical sequences which are jointly typical with at least one of a
497: group.  In that case, for a set $\mathbf{S}\subseteq\mathcal{Y}^n$, we
498: write
499: \[ T_\epsilon^n\big(X\big|\mathbf{S}\big)
500:    \;\;=\;\; \bigcup_{\mathbf{y}^n\in\mathbf{S}}
501:              T_\epsilon^n\big(X\big|\mathbf{y}^n\big).
502: \]
503: 
504: Given any $\epsilon>0$, we often need to refer
505: to quantities which are deterministic functions of $\epsilon$, having
506: the property that as $\epsilon\to 0$, these quantities also vanish.
507: Such small quantities are denoted by $\epsilon_1$, $\epsilon_2$,
508: $\dot\epsilon$, $\ddot\epsilon$, $\epsilon'$, $\epsilon''$, etc.;
509: and the value of $\epsilon$ on which they depend is either mentioned
510: explicitly or should be clear from the context.
511: 
512: Consider two random variables
513: $X$ and $Y$ with joint distribution $p(xy)$.  $T_\epsilon^n\big(X\big)$
514: is the usual typical set.  Sometimes we also need to consider
515: the set $S_{\epsilon,Y}^n(X)\triangleq\Big\{\mathbf{x}^n\,\Big|\,
516: T_\epsilon^n\big(Y\big|\mathbf{x}^n\big)\neq\emptyset\Big\}$.  Clearly,
517: $S_{\epsilon,Y}^n(X)\subseteq T_\epsilon^n\big(X\big)$.  But we also
518: know from~\cite[Ch.\ 5]{Yeung:01}, that
519: $\Big|\frac 1 n\log\big|S_{\epsilon,Y}^n(X)\big|-H(X)\Big|<\dot\epsilon$.
520: That is, although there may exist strongly typical sequences $\mathbf{x}^n$
521: for which there are no sequences $\mathbf{y}^n$ jointly typical with them,
522: these $\mathbf{x}^n$'s form a set of vanishing measure.
523: 
524: Some standard operations on sets are intersection
525: ($\mathbf{A}\cap\mathbf{B}$), union ($\mathbf{A}\cup\mathbf{B}$),
526: complementation ($\mathbf{A}^c$) and difference
527: ($\mathbf{A}\backslash\mathbf{B}$).   The set of all subsets of
528: $\mathbf{S}$ is denoted by $2^{\mathbf{S}}$.  The convex closure of $\mathbf{S}$
529: is denoted by
530: $\overline{\mathbf{S}}=\bigcap\big\{\mathbf{S}'\;\big|\;\mathbf{S}\subseteq
531: \mathbf{S}'\,\wedge\,\mathbf{S}'\mbox{ is closed and convex}\big\}$.
532: Given a set $\mathbf{S}$, a cover of size $N$ of $\mathbf{S}$ is a
533: collection of sets $\mathcal{S}=\big\{\mathbf{S}_i:i=1...N\big\}$,
534: such that $\mathbf{S}\subseteq\bigcup_{i=1}^N\mathbf{S}_i$.  If a
535: cover further satisfies that $\mathbf{S}_i\cap \mathbf{S}_j=\emptyset$
536: ($1\leq i\neq j\leq N$), and that $\mathbf{S}=\bigcup_{i=1}^N
537: \mathbf{S}_i$, then we say that $\mathcal{S}$ is a {\em partition}
538: of $\mathbf{S}$.
539: 
540: Consider two sets, $\mathbf{A}$ and $\mathbf{B}$, for which
541: $P\big(\mathbf{B}\big|\mathbf{A}\big)=1$: clearly,
542: $P\big(\mathbf{A}\cap\mathbf{B}\big)=P\big(\mathbf{A}\big)$,
543: and hence $\mathbf{A}\subseteq\mathbf{B}$, except perhaps for
544: a set of measure zero.  If instead we have a slightly weaker
545: condition, namely that $P\big(\mathbf{B}\big|\mathbf{A}\big)>1-\epsilon$,
546: then we say that $\mathbf{A}$ is {\em weakly included} in $\mathbf{B}$,
547: and we denote this by $\mathbf{A}\subseteq_\epsilon\mathbf{B}$.
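
For example (an illustration of ours), take $\mathbf{A}=\mathcal{X}^n$
and $\mathbf{B}=T_\epsilon^n(X)$, with probabilities computed under the
i.i.d.\ distribution $p_X$.  By the law of large numbers,
$P\big(\mathbf{B}\big|\mathbf{A}\big)=P\big(T_\epsilon^n(X)\big)>1-\epsilon$
for all $n$ large enough, so
$\mathcal{X}^n\subseteq_\epsilon T_\epsilon^n(X)$, even though
$T_\epsilon^n(X)$ is in general a strict subset of $\mathcal{X}^n$.
Weak inclusion is therefore a statement about probability mass, not
about set containment.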
548: 
549: \subsection{Distributed Rate-Distortion Codes}
550: 
551: Consider two sources $X$ and $Y$, out of which random pairs of
552: sequences $\big(X^n,Y^n\big)$ are drawn i.i.d.~$\sim p(xy)$ from two
553: finite alphabets, denoted $\mathcal{X}$ and
554: $\mathcal{Y}$, and reproduced with elements of two other alphabets
555: $\hat{\mathcal{X}}$ and $\hat{\mathcal{Y}}$.  The two sources
556: $X$ and $Y$ are processed by two separate encoders.  The
557: {\em encoders} are two functions:
558: \[ f_1:\; \mathcal{X}^n \;\;\to\;\; \big\{1,2,\dots,2^{nR_1}\big\}
559:    \mbox{\hspace{1cm}and\hspace{1cm}}
560:    f_2:\; \mathcal{Y}^n \;\;\to\;\; \big\{1,2,\dots,2^{nR_2}\big\}.
561: \]
562: These encoding functions map a block of $n$ source symbols to discrete
563: indices.  The {\em decoder} is a function
564: \[ g:\;\big\{1,2,\dots,2^{nR_1}\big\}\times\big\{1,2,\dots,2^{nR_2}\big\}
565:        \;\;\to\;\; \hat{\mathcal{X}}^n \times \hat{\mathcal{Y}}^n, \]
566: which maps a pair of indices into two blocks of reconstructed
567: source sequences.
568: 
569: Two distortion
570: measures $d_1:\mathcal{X}\times\hat{\mathcal{X}}\to[0,\infty)$ and
571: $d_2:\mathcal{Y}\times\hat{\mathcal{Y}}\to[0,\infty)$ are used to
572: define reconstruction quality.  Since $\infty$ is not in their
573: range and the alphabets are finite, these distortion measures
574: are necessarily bounded, so we denote these largest values by
575: $\max\limits_{x\in\mathcal{X},\hat x\in\hat{\mathcal{X}}}
576: d_1(x,\hat x)\triangleq d_{1,\mbox{\tiny MAX}}$,
577: $\max\limits_{y\in\mathcal{Y},\hat y\in\hat{\mathcal{Y}}}
578: d_2(y,\hat y) \triangleq d_{2,\mbox{\tiny MAX}}$, and
579: $\max\big(d_{1,\mbox{\tiny MAX}},d_{2,\mbox{\tiny MAX}}\big)
580: \triangleq d_{\mbox{\tiny MAX}}<\infty$.
581: $d_1^n\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)
582: \triangleq\frac 1 n\sum_{i=1}^n d_1\big(\mathbf{x}_i,\hat{\mathbf{x}}_i\big)$
583: and $d_2^n\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)
584: \triangleq\frac 1 n\sum_{i=1}^n d_2\big(\mathbf{y}_i,\hat{\mathbf{y}}_i\big)$
585: denote the corresponding extensions to blocks.  Oftentimes, the
586: symbols $d_1$ and $d_2$ are used for both the single-letter
587: and the block extensions; which is the intended meaning should
588: be clear from the context.  For any distortion measure
589: $d:\mathcal{X}^n\times\hat{\mathcal{X}}^n\to[0,\infty)$, an element
590: $\hat{\mathbf{x}}^n\in\hat{\mathcal{X}}^n$ and a number $D\geq 0$,
591: a ``ball'' of radius $D$ centered at $\hat{\mathbf{x}}^n$ is the
592: set $B\big(\hat{\mathbf{x}}^n,D\big)=\big\{\mathbf{x}^n\in\mathcal{X}^n
\,\big|\,d\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)<D\big\}$
594: (and similarly for a ball $B\big(\hat{\mathbf{y}}^n,D\big)$).
595: For any $D$, $D^+$ is shorthand for $D+\dot\epsilon$, for an
596: $\epsilon$ that is always clear from the context.
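
For concreteness, consider the following illustration, which is not part
of the formal development and rests on assumptions made only here: take
$\hat{\mathcal{X}}=\mathcal{X}$ with $|\mathcal{X}|\geq 2$, and let $d_1$
be the Hamming distortion, $d_1(x,\hat x)=0$ if $x=\hat x$ and
$d_1(x,\hat x)=1$ otherwise.  Then $d_{1,\mbox{\tiny MAX}}=1$, the block
extension is the fraction of positions in which the two sequences differ,
\[ d_1^n\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)
   \;\;=\;\; \mbox{$\frac 1 n$}\,
   \big|\big\{i\,:\,x_i\neq\hat x_i\big\}\big|, \]
and the ball $B\big(\hat{\mathbf{x}}^n,D\big)$ consists of all sequences
$\mathbf{x}^n$ that differ from $\hat{\mathbf{x}}^n$ in fewer than $nD$
positions.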
597: 
598: Fix now encoders and decoder $(f_1,f_2,g)$ operating on blocks of length
599: $n$, and a real number $\epsilon>0$.  If we have that
600: \begin{equation}
601:    P\Big( \Big\{ \big(\mathbf{x}^n\mathbf{y}^n\big) \;\Big|\;
602:           \big(\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)
603:           = g\big(f_1(\mathbf{x}^n),f_2(\mathbf{y}^n)\big) \,\wedge\,
604:           d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)
605:           < D_1^+ \,\wedge\,
606:           d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)
607:           < D_2^+ \Big\}
608:     \Big) \;\;\geq\;\; 1-\dot{\epsilon},
609:   \label{eq:distortion-constraint}
610: \end{equation}
611: then we say that $(f_1,f_2,g)$ satisfies the $(\epsilon,D_1,D_2)$-distortion
612: constraint.\footnote{This form of a distortion constraint is referred
613: to as an {\it $\epsilon$-fidelity criterion} in~\cite[pg.\ 123]{CsiszarK:81}.
614: An alternative form to this ``local'' condition is given by requiring a
615: ``global'' average constraint of the form
616: $\expect{d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)}<D_1^+$ and
617: $\expect{d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)}<D_2^+$.  For
618: the purpose of our developments, the local form lends itself more
619: readily to analysis, and hence is the one we adopt.}
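
The local and global forms are related through the boundedness of the
distortion measures.  As a sketch (a remark added for illustration), let
$G$ denote the event inside the probability in
eqn.~\eqref{eq:distortion-constraint}, so that $P(G)\geq 1-\dot\epsilon$.
Since $d_1<D_1^+$ on $G$ and $d_1\leq d_{\mbox{\tiny MAX}}$ everywhere,
\[ \expect{d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)}
   \;\;\leq\;\; D_1^+\,P(G) + d_{\mbox{\tiny MAX}}\,P\big(G^c\big)
   \;\;\leq\;\; D_1^+ + \dot\epsilon\,d_{\mbox{\tiny MAX}}, \]
and similarly for $d_2$; hence a code satisfying the local constraint
also satisfies a global average constraint at a slightly larger
distortion level.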
620: 
621: \subsection{Achievable Rates}
622: 
623: A $\big(2^{nR_1},2^{nR_2},n,\epsilon,D_1,D_2\big)$ distributed
624: rate-distortion code is defined by a block length $n$, a
625: parameter $\epsilon>0$, two encoding functions $f_1$ and $f_2$
626: with ranges of size $2^{nR_1}$ and $2^{nR_2}$, and a decoding
627: function $g$, such that $(f_1,f_2,g)$ satisfies the
$\big(\epsilon,D_1,D_2\big)$-distortion constraint.
629: 
630: We say that the rate-distortion tuple $(R_1,R_2,D_1,D_2)$ is
631: $\epsilon$-{\em achievable} if a
632: $\big(2^{nR_1},2^{nR_2},n,\epsilon,D_1,D_2\big)$ distributed
633: code exists; for fixed parameters $\big(\epsilon,D_1,D_2\big)$,
634: we denote the set of all $\epsilon$-achievable pairs $(R_1,R_2)$
635: by $\mathcal{R}_\epsilon(D_1,D_2)$.  Then, the {\em rate region}
636: ${\cal R}^*(D_1,D_2)$ of the two sources is defined by
637: \[ \mathcal{R}^*(D_1,D_2)
638:      \;\;\triangleq\;\; \bigcap_{\epsilon>0}\,\mathcal{R}_\epsilon(D_1,D_2).
639: \]
640: 
We now describe a different set of rates.  Define
642: $\mathbb{P}_{\mbox{\tiny LB}}$ to be the set of all probability
643: distributions $p(xy\hat x\hat y)$ over
644: $\mathcal{X}\times\mathcal{Y}\times\hat{\mathcal{X}}\times\hat{\mathcal{Y}}$,
645: such that:
646: \begin{itemize}
647: \item $p(xy\hat x\hat y)=p(\hat x\hat y)p(x|\hat x\hat y)p(y|\hat x\hat y)$
648:   (that is, $X-\hat X\hat Y-Y$ forms a Markov chain);
649: \item $p_{XY}=\sum_{\hat x\hat y}
650:   p(\hat x\hat y)p(x|\hat x\hat y)p(y|\hat x\hat y)$ ($p_{XY}$ is the source);
651: \item and $\expect{d_1\big(X,\hat X\big)}\leq D_1$ and
652:   $\expect{d_2\big(Y,\hat Y\big)}\leq D_2$.
653: \end{itemize}
654: Then, for each $p\in\mathbb{P}_{\mbox{\tiny LB}}$, define
655: \[ \mathcal{R}\big(D_1,D_2,p\big)\;\;\triangleq\;\;
656:    \left\{ (R_1,R_2)\;\left|\;\begin{array}{rcl}
657:            R_1 & \geq & I\big(X\wedge\hat X\hat Y\big|Y\big)[p] \\
658:            R_2 & \geq & I\big(Y\wedge\hat X\hat Y\big|X\big)[p] \\
659:            R_1+R_2 & \geq & I\big(XY\wedge\hat X\hat Y\big)[p]
660:            \end{array}\right.\right\},
661: \]
662: and define also $\mathcal{R}^o(D_1,D_2)\triangleq
663: \bigcup_{p\in\mathbb{P}_{\mbox{\tiny LB}}}\mathcal{R}\big(D_1,D_2,p\big)$.
664: Now we are ready to state our outer bound.
665: 
666: \subsection{Statement of an Outer Bound}
667: 
668: \medskip
669: \begin{center}\textcolor{gray}{\fbox{\begin{minipage}{16cm}
670: \vspace{-4mm}\textcolor{black}{\begin{theorem}
671: \label{thm:main}
672: \[ \mathcal{R}^*\big(D_1,D_2\big)\;\;\subseteq\;\;
673:    \overline{\mathcal{R}^o(D_1,D_2)}.
674: \]
675: \rend
676: \end{theorem}}\end{minipage}}}\end{center}\medskip
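
As a sanity check on the form of this bound, consider the following
illustration, which rests on assumptions not made elsewhere in this
section: suppose $\hat{\mathcal{X}}=\mathcal{X}$,
$\hat{\mathcal{Y}}=\mathcal{Y}$, both distortion measures are Hamming,
and $D_1=D_2=0$.  The distribution $p$ obtained by setting $\hat X=X$ and
$\hat Y=Y$ belongs to $\mathbb{P}_{\mbox{\tiny LB}}$: the chain
$X-\hat X\hat Y-Y$ holds trivially because $X$ and $Y$ are functions of
$\big(\hat X,\hat Y\big)$, the marginal on $(X,Y)$ is the source, and both
expected distortions vanish.  For this $p$,
\[ I\big(X\wedge\hat X\hat Y\big|Y\big)[p] = H\big(X\big|Y\big),
   \hspace{8mm}
   I\big(Y\wedge\hat X\hat Y\big|X\big)[p] = H\big(Y\big|X\big),
   \hspace{8mm}
   I\big(XY\wedge\hat X\hat Y\big)[p] = H\big(XY\big),
\]
so $\mathcal{R}\big(0,0,p\big)$ is the set of pairs with
$R_1\geq H(X|Y)$, $R_2\geq H(Y|X)$ and $R_1+R_2\geq H(XY)$, which has the
same form as the Slepian-Wolf region for lossless distributed coding.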
677: 
678: The proof of this theorem is given in Section~\ref{sec:main-proof}.
679: Before that, and next in Section~\ref{sec:aux-lemmas}, we develop a
number of observations and auxiliary results to be used in the main
681: proof.
682: 
683: 
684: \section{Some Useful Observations and Auxiliary Results}
685: \label{sec:aux-lemmas}
686: 
687: \subsection{Distributed Rate-Distortion Codes as Constrained Source Covers}
688: 
689: \subsubsection{Distributed Source Covers}
690: 
691: An equivalent representation for a generic
692: $(2^{nR_1},2^{nR_2},n,\epsilon,D_1,D_2)$ code is given as follows:
693: \begin{itemize}
694: \item Two covers:
695:   $\mathcal{S}_1 = \big\{ \mathbf{S}_{1,i} : i=1...2^{nR_1} \big\}$
696:     of $\mathcal{X}^n$,
697:   and $\mathcal{S}_2 = \big\{ \mathbf{S}_{2,j} : j=1...2^{nR_2} \big\}$
698:     of $\mathcal{Y}^n$.
699:   Any code with encoders $f_1$ and $f_2$ can be represented in terms
700:   of two such covers, by considering $f_1^{-1}(i)= \mathbf{S}_{1,i}$ and
701:   $f_2^{-1}(j)=\mathbf{S}_{2,j}$.\footnote{Note that, strictly speaking,
702:   this definition is correct only when $\mathcal{S}$ is a partition.
703:   Occasionally we might abuse the notation and still refer to the code
704:   specified by a cover, with the understanding that in such cases ties
705:   (of the form of a source sequence being part of two different cover
706:   elements) are broken arbitrarily.  This should not cause any confusion.} \\
707:   (Note: these two covers define a cover $\mathcal{S}=\big(\mathcal{S}_1,
708:   \mathcal{S}_2\big)$ of $\mathcal{X}^n\times \mathcal{Y}^n$, with elements
709:   $\mathbf{S}_{ij} \;=\; \mathbf{S}_{1,i}\times \mathbf{S}_{2,j}$,
710:   for $(i,j)\in\{1...2^{nR_1}\}\times\{1...2^{nR_2}\}$.)
711: \item A pair of reconstruction sequences $\big(\hat{\mathbf{x}}^n(ij),
712:   \hat{\mathbf{y}}^n(ij)\big)=g(i,j)$ associated to each cover element
713:   $\mathbf{S}_{ij}$ of the product source, for all
714:   $(i,j)\in\{1...2^{nR_1}\}\times\{1...2^{nR_2}\}$.
715: \end{itemize}
716: 
717: In general, whenever we refer to a distributed rate-distortion code,
718: we use interchangeably the earlier representation in terms of two
719: encoders and one decoder, and this representation in terms of covers.
720: 
721: \subsubsection{Distributed Typical Sets}
722: 
723: As highlighted in the Introduction, it turns out that covers
724: $\mathbf{S}_{ij}$ of the product source $\mathcal{X}^n\times\mathcal{Y}^n$
725: are constrained beyond the requirements imposed by the fidelity
726: criteria.  That ``extra'' structure is described by
727: Proposition~\ref{prp:distributed-typicality}.
728: 
729: \medskip
730: \begin{center}\textcolor{gray}{\fbox{\begin{minipage}{16cm}
731: \vspace{-4mm}\textcolor{black}{\begin{proposition}
732: \label{prp:distributed-typicality}
733: For any cover $\mathcal S$ of $\mathcal{X}^n\times\mathcal{Y}^n$
734: defined by some $(2^{nR_1},2^{nR_2},n,\epsilon,D_1,D_2)$ distributed
735: rate-distortion code, and for any
736: $(i,j)\in\{1...2^{nR_1}\}\times\{1...2^{nR_2}\}$,
737: $\mathbf{x}^n\in\mathbf{S}_{1,i}$ and $\mathbf{y}^n\in\mathbf{S}_{2,j}$,
it must be the case that
739: either $(\mathbf{x}^n\mathbf{y}^n)\in\mathbf{S}_{ij}\cap
740: T_\epsilon^n\big(XY\big)$ or $(\mathbf{x}^n\mathbf{y}^n)\not\in
741: T_\epsilon^n\big(XY\big)$.  \rend
742: \end{proposition}}\end{minipage}}}\end{center}\medskip
743: 
744: {\it Proof.} This is rather straightforward.  Take any
745: $\mathbf{x}^n\in\mathbf{S}_{1,i}$ and $\mathbf{y}^n\in\mathbf{S}_{2,j}$.
746: Then:
747: \begin{itemize}
748: \item by construction,
749:   $\big(\mathbf{x}^n\mathbf{y}^n\big)\in\mathbf{S}_{ij}$;
750: \item either $\big(\mathbf{x}^n\mathbf{y}^n\big)\in
751:   T_\epsilon^n\big(XY\big)$ or 
752:   $\big(\mathbf{x}^n\mathbf{y}^n\big)\not\in
753:   T_\epsilon^n\big(XY\big)$ -- a tautology;
754: \item if $\big(\mathbf{x}^n\mathbf{y}^n\big)\in
755:   T_\epsilon^n\big(XY\big)$, then $\big(\mathbf{x}^n\mathbf{y}^n\big)\in
756:   \mathbf{S}_{ij}\cap T_\epsilon^n\big(XY\big)$, and therefore
757:   the proposition is proved;
758: \item and if instead, $\big(\mathbf{x}^n\mathbf{y}^n\big)\not\in
759:   T_\epsilon^n\big(XY\big)$, then the proposition is proved too.
760:   \tend
761: \end{itemize}
762: \medskip
763: 
764: Proposition~\ref{prp:distributed-typicality} formally states the
765: property of covers arising from distributed codes discussed informally
766: in the Introduction (cf.~Sec.~\ref{sec:intro-distributed-covers}): all
767: combinations of an $\mathbf{x}^n$ sequence in $\mathbf{S}_{1,i}$ and
768: a $\mathbf{y}^n$ sequence in $\mathbf{S}_{2,j}$, if they are jointly
769: typical, must appear in $\mathbf{S}_{ij}\cap T_\epsilon^n\big(XY\big)$
770: -- the decoder does not have enough information to discriminate among
771: such pairs.
772: 
773: We now introduce a new definition.
774: Consider any subset $\mathbf{S}\subseteq T_\epsilon^n\big(XY\big)$
775: for which, for any $(\mathbf{x}^n,\mathbf{y}_1^n)\in\mathbf{S}$ and
776: $(\mathbf{x}_1^n,\mathbf{y}^n)\in\mathbf{S}$, we have that either
777: $(\mathbf{x}^n\mathbf{y}^n)\in\mathbf{S}$ or
778: $(\mathbf{x}^n\mathbf{y}^n)\not\in T_\epsilon^n\big(XY\big)$
779: -- that is, the property of Prop.~\ref{prp:distributed-typicality}
holds for $\mathbf{S}$.  In this case, we say that $\mathbf{S}$
is a {\em distributed} typical set.
782: 
The concept is clearly not vacuous, since there are ``interesting''
distributed typical sets:
785: \begin{itemize}
786: \item all sets of the form $\mathbf{S} = \{ (\mathbf{x}^n\mathbf{y}^n) \}$,
787:   with $(\mathbf{x}^n\mathbf{y}^n)\in T_\epsilon^n\big(XY\big)$,
788:   are distributed typical sets;
789: \item for any $\mathbf{S}_1\subseteq\mathcal{X}^n$ and any
790:   $\mathbf{S}_2\subseteq\mathcal{Y}^n$,
791:   $\mathbf{S}\triangleq\big[\mathbf{S}_1\!\times\!\mathbf{S}_2\big]\cap
792:   T_\epsilon^n\big(XY\big)$ is a distributed typical set.
793: \end{itemize}
794: The last example provides a natural way of systematically constructing
795: distributed typical sets.
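
To see why the second example qualifies, the verification can be spelled
out as follows (a short check added here for convenience).  Let
$\mathbf{S}=\big[\mathbf{S}_1\!\times\!\mathbf{S}_2\big]\cap
T_\epsilon^n\big(XY\big)$, and suppose that
$(\mathbf{x}^n,\mathbf{y}_1^n)\in\mathbf{S}$ and
$(\mathbf{x}_1^n,\mathbf{y}^n)\in\mathbf{S}$.  Then
$\mathbf{x}^n\in\mathbf{S}_1$ and $\mathbf{y}^n\in\mathbf{S}_2$, so
$(\mathbf{x}^n\mathbf{y}^n)\in\mathbf{S}_1\!\times\!\mathbf{S}_2$, and
therefore
\[ (\mathbf{x}^n\mathbf{y}^n)\in T_\epsilon^n\big(XY\big)
   \;\;\Rightarrow\;\;
   (\mathbf{x}^n\mathbf{y}^n)\in
   \big[\mathbf{S}_1\!\times\!\mathbf{S}_2\big]\cap
   T_\epsilon^n\big(XY\big) = \mathbf{S}.
\]
Hence either $(\mathbf{x}^n\mathbf{y}^n)\in\mathbf{S}$ or
$(\mathbf{x}^n\mathbf{y}^n)\not\in T_\epsilon^n\big(XY\big)$, which is
exactly the defining property.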
796: 
797: \subsubsection{Source Covers Made of Distributed Typical Sets}
798: 
799: We show next that in multiterminal source coding, the source must
800: be covered with distributed typical sets in which each of the two
801: components of the set gets specified by a different encoder.
802: 
803: Consider a length $n$ $\big(f_1,f_2,g\big)$ code, satisfying the
804: $(\epsilon,D_1,D_2)$-distortion constraint of
805: eqn.~\eqref{eq:distortion-constraint}:
806: \begin{eqnarray*}
807: \lefteqn{P\Big( \Big\{ \big(\mathbf{x}^n\mathbf{y}^n\big) \;\Big|\;
808:           \big(\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)
809:           = g\big(f_1(\mathbf{x}^n),f_2(\mathbf{y}^n)\big) \,\wedge\,
810:           d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)
811:           < D_1^+ \,\wedge\,
812:           d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)
813:           < D_2^+ \Big\}
814:     \Big)} \\
815:   & \stackrel{(a)}{=} &
816:           P\Big( \Big\{ \big(\mathbf{x}^n\mathbf{y}^n\big) \;\Big|\;
817:           \big(\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)
818:           = g\big(f_1(\mathbf{x}^n),f_2(\mathbf{y}^n)\big) \,\wedge\,
819:           d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)
820:           < D_1^+ \,\wedge\,
821:           d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)
822:           < D_2^+ \Big\}
823:           \cap\bigcup_{(i,j)}\mathbf{S}_{ij}
824:     \Big) \\
825:   & = & P\Big(
826:           \bigcup_{(i,j)}
827:           \Big\{ \big(\mathbf{x}^n\mathbf{y}^n\big) \;\Big|\;
828:           \big(\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)
829:           = g\big(f_1(\mathbf{x}^n),f_2(\mathbf{y}^n)\big) \,\wedge\,
830:           d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)
831:           < D_1^+ \,\wedge\,
832:           d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)
833:           < D_2^+ \Big\}
834:           \cap\mathbf{S}_{ij}
835:     \Big) \\
836:   & \stackrel{(b)}{=} & P\Big(
837:           \bigcup_{(i,j)}
838:           \Big\{ \big(\mathbf{x}^n\mathbf{y}^n\big) \;\Big|\;
839:           d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n(ij)\big)<D_1^+
840:           \,\wedge\,\mathbf{x}^n\in\mathbf{S}_{1,i}
841:           \,\wedge\, d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n(ij)\big)<D_2^+
          \,\wedge\,\mathbf{y}^n\in\mathbf{S}_{2,j} \Big\}
843:     \Big) \\
844:   & = & P\Big( \bigcup_{(i,j)}
845:           \big[\mathbf{S}_{1,i}\!\times\!\mathbf{S}_{2,j}\big]
846:           \cap
847:           \big[B\big(\hat{\mathbf{x}}^n(ij),D_1^+\big)
848:                \!\times\!B\big(\hat{\mathbf{y}}^n(ij),D_2^+\big)\big]
849:          \,\Big) \\
850:   & \stackrel{(c)}{\geq} & 1-\dot{\epsilon},
851: \end{eqnarray*}
852: where (a) follows from 
853: $\big\{ \big(\mathbf{x}^n\mathbf{y}^n\big)\,\Big|\,
854: \big(\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)
855: = g\big(f_1(\mathbf{x}^n),f_2(\mathbf{y}^n)\big) \,\wedge\,
856: d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big) < D_1^+ \,\wedge\,
857: d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big) < D_2^+ \big\}
858: \;\subseteq\;\mathcal{X}^n\times\mathcal{Y}^n
859: \;\subseteq\;\bigcup_{(i,j)} \mathbf{S}_{ij}$;
860: (b) follows from $\mathbf{S}_{ij}=\mathbf{S}_{1,i}\times\mathbf{S}_{2,j}$;
861: and (c) follows from the fact
862: that the code under consideration satisfies the distortion constraint
863: of eqn.~\eqref{eq:distortion-constraint}.  We also know, from basic
864: properties of typical sets, that
865: \[ P\Big( T_\epsilon^n\big(XY\big) \Big) \;\;\geq\;\; 1-\epsilon,
866: \]
867: and so, if we define $\tilde{\mathbf{S}}_{ij}\triangleq
868: \big[\mathbf{S}_{1,i}\times\mathbf{S}_{2,j}\big]\cap
869: T_\epsilon^n\big(XY\big)$, we see that
870: \begin{eqnarray}
871: \lefteqn{P\Big( \bigcup_{(i,j)}
872:           \big[\mathbf{S}_{1,i}\!\times\!\mathbf{S}_{2,j}\big]
873:           \cap
874:           \big[B\big(\hat{\mathbf{x}}^n(ij),D_1^+\big)
875:                \!\times\!B\big(\hat{\mathbf{y}}^n(ij),D_2^+\big)\big]
876:           \cap
877:           T_\epsilon^n\big(XY\big) \,\Big)} \nonumber
878: \hspace{4cm} \\
879:   & = & P\left( \bigcup_{(i,j)}
880:                 \tilde{\mathbf{S}}_{ij} \cap
881:                 \big[B\big(\hat{\mathbf{x}}^n(ij),D_1^+\big)
882:                   \!\times\!B\big(\hat{\mathbf{y}}^n(ij),D_2^+\big)\big]
883:           \right) \nonumber \\
884:   & \geq & 1-\ddot\epsilon;
885:   \label{eq:distortion-constraint-2}
886: \end{eqnarray}
887: that is, since $\tilde{\mathbf{S}}_{ij}$ is a distributed typical set,
888: the source must be covered with the fraction of such sets contained in
889: pairs of balls centered at the reconstruction sequences; furthermore,
890: we note that each component of the distributed typical set must be
891: specified completely by each encoder.
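
The inequality in eqn.~\eqref{eq:distortion-constraint-2} is a union
bound.  As a sketch, write $A$ for the union
$\bigcup_{(i,j)}\big[\mathbf{S}_{1,i}\!\times\!\mathbf{S}_{2,j}\big]\cap
\big[B\big(\hat{\mathbf{x}}^n(ij),D_1^+\big)\!\times\!
B\big(\hat{\mathbf{y}}^n(ij),D_2^+\big)\big]$, so that
$P(A)\geq 1-\dot\epsilon$ and
$P\big(T_\epsilon^n\big(XY\big)\big)\geq 1-\epsilon$.  Then
\[ P\Big(A\cap T_\epsilon^n\big(XY\big)\Big)
   \;\;\geq\;\; 1 - P\big(A^c\big)
                  - P\Big(\big[T_\epsilon^n\big(XY\big)\big]^c\Big)
   \;\;\geq\;\; 1-\dot\epsilon-\epsilon.
\]
Thus one admissible value, an assumption made here since this section
does not fix it explicitly, is $\ddot\epsilon=\dot\epsilon+\epsilon$.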
892: 
893: % These constraints on the structure of source covers are significant,
894: % and they were not captured by any previous outer bounds.  The main task
895: % ahead of us then is to make use of this newly discovered structure to
896: % prove a better outer bound.
897: 
898: \subsection{The ``Reverse'' Markov Lemma}
899: \label{sec:reverse-markov-lemma}
900: 
901: \subsubsection{The Standard Form}
902: 
903: Lemma~\ref{lemma:markov} is the Markov lemma as stated
904: in~\cite[pg.\ 202]{Berger:78}, in our own notation.
905: 
906: \medskip\begin{lemma}[Markov]
907: \label{lemma:markov}
908: Consider a Markov chain of the form $X-Z-Y$.  Then, for all $\epsilon>0$,
909: \[ \lim_{n\to\infty}
910:    P\Big( \big(X^n,\mathbf{y}^n\big)\in T_\epsilon^n\big(XY\big)
911:           \;\Big|\;
912:           \big(Z^n,\mathbf{y}^n\big)\in T_\epsilon^n\big(ZY\big)
913:     \Big) \;\;=\;\; 1,
914: \]
915: for any sequence $\mathbf{y}^n\in\mathcal{Y}^n$.
916: \rend
917: \end{lemma}\medskip
918: 
919: The lemma says that for {\em every} $\mathbf{y}^n\in\mathcal{Y}^n$,
920: {\em if} the random vector
921: $\big(Z^n,\mathbf{y}^n\big)\in T_\epsilon^n\big(ZY\big)$, {\em then}
922: the random vector $\big(X^n,\mathbf{y}^n\big)\in T_\epsilon^n\big(XY\big)$,
with high probability.  For arbitrary fixed sequences no such implication holds: if we have two pairs
924: of sequences $\big(\mathbf{x}^n\mathbf{z}^n\big)\in T_\epsilon^n\big(XZ\big)$
925: and $\big(\mathbf{z}^n\mathbf{y}^n\big)\in T_\epsilon^n\big(ZY\big)$, it
926: is not always the case that
927: $\big(\mathbf{x}^n\mathbf{z}^n\mathbf{y}^n\big)\in T_\epsilon^n\big(XZY\big)$,
928: and therefore that 
929: $\big(\mathbf{x}^n\mathbf{y}^n\big)\in T_\epsilon^n\big(XY\big)$; that
is, joint typicality is {\em not} a transitive relation.  When
$X-Z-Y$ forms a Markov chain, however, this transitivity property
does hold, although only in a high-probability sense.
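
A simple illustration of the failure of transitivity may help; the
example is added here, and all of its specifics are assumptions made only
for this purpose.  Let $X$ be a uniform bit, $Y=X$, and $Z$ a constant, so
that $X-Z-Y$ is not a Markov chain.  For even $n$, take $\mathbf{x}^n$
with exactly $n/2$ ones, let $\mathbf{y}^n$ be its bitwise complement, and
let $\mathbf{z}^n$ be the constant sequence.  The empirical distributions
of $\big(\mathbf{x}^n\mathbf{z}^n\big)$ and
$\big(\mathbf{z}^n\mathbf{y}^n\big)$ match $p_{XZ}$ and $p_{ZY}$ exactly,
so both pairs are jointly typical.  However,
\[ \mbox{$\frac 1 n$}N\big(01;\mathbf{x}^n\mathbf{y}^n\big)
   \;=\; \mbox{$\frac 1 2$}
   \hspace{8mm}\mbox{whereas}\hspace{8mm}
   p_{XY}(01) \;=\; 0,
\]
and therefore $\big(\mathbf{x}^n\mathbf{y}^n\big)\not\in
T_\epsilon^n\big(XY\big)$ for all sufficiently small $\epsilon$.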
933: 
934: \subsubsection{A Converse Statement}
935: 
936: We are interested in a converse form of the Markov lemma.  Suppose
937: we are given an arbitrary distribution $p(xyz)$, whose typical
938: sets satisfy the constraints imposed by the Markov lemma: can we say
939: that $p$ itself must be a Markov chain?  It turns out the answer is
940: {\em almost yes} -- if some arbitrary distribution $p$ induces typical
941: sets like those of a Markov chain, then there must exist a Markov
942: chain $p'$ within $L_1$ distance $2\epsilon$ of $p$.  This statement
943: is made precise in the following lemma.
944: 
945: \medskip\begin{center}\textcolor{gray}{\fbox{\begin{minipage}{16cm}
946: \vspace{-4mm}\textcolor{black}{\begin{lemma}[Reverse Markov]
947: \label{lemma:reverse-markov}
948: Fix $n$, $\epsilon>0$.  Consider any distribution
949: $p(xyz)$ for which, for some $\mathbf{z}^n$,
950: \[
951:   T_\epsilon^n\big(X\big|\mathbf{z}^n\big)[p]
952:   \times T_\epsilon^n\big(Y\big|\mathbf{z}^n\big)[p]
953:   \;\;=\;\; T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p].
954: \]
955: Define a Markov chain $p'(xyz)=p(z)p(x|z)p(y|z)$, with the components
956: $p(z)$, $p(x|z)$ and $p(y|z)$ taken from the given $p(xyz)$.  Then,
957: $\big|\big|p-p'\big|\big|_1\,<\,2\epsilon$.
958: \rend
959: \end{lemma}}\end{minipage}}}\end{center}\medskip
960: 
961: {\it Proof.}
962: Consider any $\mathbf{z}^n$ for which
963: $T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p]\neq\emptyset$.
964: Since $p'$ is a Markov chain, from the direct form of the Markov
965: lemma we know that
966: \[
967:    T_\epsilon^n\big(X\big|\mathbf{z}^n\big)[p']
968:    \times T_\epsilon^n\big(Y\big|\mathbf{z}^n\big)[p']
969:    \;\;\subseteq_{\epsilon'}\;\;
970:    T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p'];
971: \]
972: and clearly,
973: $\emptyset\neq
974: T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p]
975: =
976: T_\epsilon^n\big(X\big|\mathbf{z}^n\big)[p]
977:  \times T_\epsilon^n\big(Y\big|\mathbf{z}^n\big)[p]
978: =
979: T_\epsilon^n\big(X\big|\mathbf{z}^n\big)[p']
980:  \times T_\epsilon^n\big(Y\big|\mathbf{z}^n\big)[p']$,
981: since we choose $p'$ to coincide with $p$ on the corresponding marginals,
982: and from our choice of $\mathbf{z}^n$.  So, this last inclusion can be
983: written as
984: \[
985:    T_\epsilon^n\big(X\big|\mathbf{z}^n\big)[p]
986:         \times T_\epsilon^n\big(Y\big|\mathbf{z}^n\big)[p]
987:    \;\;\subseteq_{\epsilon'}\;\;
988:    T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p'],
989: \]
990: and therefore we see that
991: \[
992:    \emptyset \;\;\neq\;\;
993:    T_\epsilon^n\big(X\big|\mathbf{z}^n\big)[p]
994:         \times T_\epsilon^n\big(Y\big|\mathbf{z}^n\big)[p]
995:    \;\;\subseteq_{\epsilon'}\;\;
996:    T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p]
997:     \cap T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p'];
998: \]
999: thus, there must exist at least one triplet of sequences
1000: $\big(\mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)$ that
1001: is jointly typical under both $p$ and $p'$.  So for these particular
1002: sequences, it follows from the definition of strong typicality that
1003: both
1004: \[ \forall xyz: \big|\mbox{$\frac 1 n$}N\big(xyz;
1005:    \mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)-
1006:    p(xyz)\big|\,<\,\mbox{$\frac\epsilon{|\mathcal{X}|
1007:    |\mathcal{Y}||\mathcal{Z}|}$}
1008:    \;\textrm{ and }\;
1009:    \forall xyz: \big|\mbox{$\frac 1 n$}N\big(xyz;
1010:    \mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)-
1011:    p'(xyz)\big|\,<\,\mbox{$\frac\epsilon{|\mathcal{X}|
1012:    |\mathcal{Y}||\mathcal{Z}|}$},
1013: \]
and therefore the $L_1$ norm of $p-p'$ can be bounded as
1015: \begin{eqnarray*}
1016: \big|\big|p'-p\big|\big|_1
1017:   & = & \sum_{xyz}\big|p(xyz)-p'(xyz)\big| \\
1018:   & = & \sum_{xyz}\big|p(xyz)-\mbox{$\frac 1 n$}
1019:         N\big(xyz;\mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)+\mbox{$\frac 1 n$}
1020:         N\big(xyz;\mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)-p'(xyz)\big| \\
1021:   & \leq & \sum_{xyz}\big|\mbox{$\frac 1 n$}N\big(xyz;
1022:         \mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)
1023:         -p(xyz)\big|
1024:         +\sum_{xyz}\big|\mbox{$\frac 1 n$}
1025:         N\big(xyz;\mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)-p'(xyz)\big| \\
1026:   & < & 2\epsilon,
1027: \end{eqnarray*}
1028: thus proving the lemma.
1029: \tend\bigskip
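
The final inequality in the proof can be spelled out as follows (a step
made explicit here for convenience).  Each of the two sums has
$|\mathcal{X}||\mathcal{Y}||\mathcal{Z}|$ terms, and by strong typicality
each term is strictly smaller than
$\frac\epsilon{|\mathcal{X}||\mathcal{Y}||\mathcal{Z}|}$, so that
\[ \sum_{xyz}\big|\mbox{$\frac 1 n$}N\big(xyz;
   \mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)-p(xyz)\big|
   \;\;<\;\; |\mathcal{X}||\mathcal{Y}||\mathcal{Z}|\cdot
   \mbox{$\frac\epsilon{|\mathcal{X}||\mathcal{Y}||\mathcal{Z}|}$}
   \;\;=\;\; \epsilon,
\]
and likewise with $p'$ in place of $p$; adding the two bounds gives
$2\epsilon$.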
1030: 
1031: Our interest in this question stems from the fact that, from the
1032: requirement to cover a product source with distributed typical sets,
1033: we do get constraints on the shape of various typical sets.  So we
1034: need to characterize what distributions can give rise to those sets,
1035: and this lemma plays an important role in that.
1036: 
1037: 
1038: \subsection{Upper Bounds on the Size of Distributed Typical Cover Elements}
1039: 
1040: \medskip
1041: \begin{center}\textcolor{gray}{\fbox{\begin{minipage}{16cm}
1042: \vspace{-4mm}\textcolor{black}{\begin{lemma}
1043: \label{lemma:bound-size}
1044: Consider any $\big(2^{nR_1},2^{nR_2},n,\epsilon,D_1,D_2\big)$ distributed
1045: rate-distortion code, represented by a cover $\mathcal{S}$.  Then, there
1046: exists a distribution $\pi\in\mathbb{P}_{\mbox{\tiny LB}}$ such that, for
1047: all $(i,j)\in\{1...2^{nR_1}\}\times\{1...2^{nR_2}\}$ and all $\epsilon>0$,
1048: \[ \big|\mathbf{S}_{ij}\,\cap\,T_\epsilon^n\big(XY\big)\big|
1049:    \;\;\leq\;\;
1050:    2^{n(H(XY|\hat X\hat Y)[\pi]+\ddot\epsilon)},
1051: \]
1052: provided $n$ is large enough.  Furthermore, for all
1053: $\mathbf{y}^n\in\mathcal{Y}^n$,
1054: \[ \big|\mathbf{S}_{1,i}\cap T_\epsilon^n\big(X\big|\mathbf{y}^n\big)\big|
1055:    \;\;\leq\;\;
1056:    2^{n(H(X|\hat X\hat YY)[\pi]+\ddot\epsilon')},
1057: \]
1058: and similarly for all $\mathbf{x}^n\in\mathcal{X}^n$,
1059: \[ \big|\mathbf{S}_{2,j}\cap T_\epsilon^n\big(Y\big|\mathbf{x}^n\big)\big|
1060:    \;\;\leq\;\;
1061:    2^{n(H(Y|\hat X\hat YX)[\pi]+\ddot\epsilon'')},
1062: \]
1063: also provided $n$ is large enough.
1064: \rend
1065: \end{lemma}}\end{minipage}}}\end{center}\medskip
1066: 
1067: {\it Proof.}  From the two-terminal rate-distortion
1068: theorem~\cite[Thm.\ 2.2.3]{CsiszarK:81}, we know there exists a
1069: distribution $p(xy\hat x\hat y)=p(xy)p(\hat x\hat y|xy)$, with
1070: $p(xy)$ the given source, $\expect{d_1\big(X,\hat X\big)}\leq D_1$
1071: and $\expect{d_2\big(Y,\hat Y\big)}\leq D_2$, and
1072: sequences $\hat{\mathbf{x}}^n(ij)$ and $\hat{\mathbf{y}}^n(ij)$
1073: such that, for all
1074: $(i,j)\in\{1...2^{nR_1}\}\times\{1...2^{nR_2}\}$ and all $\epsilon>0$,
1075: \begin{equation}
1076:  \tilde{\mathbf{S}}_{ij}
1077:  \;\;\subseteq\;\;
1078:  T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big),
1079:  \label{eq:const-std-rd}
1080: \end{equation}
1081: provided $n$ is large enough.  But since for distributed codes we
1082: have $\tilde{\mathbf{S}}_{ij}=
1083: \big[\mathbf{S}_{1,i}\times\mathbf{S}_{2,j}\big]\cap T_\epsilon^n\big(XY\big)$,
1084: it follows from standard properties of typical sets that
1085: \[ \mathbf{S}_{1,i}\cap T_\epsilon^n\big(X\big|\mathbf{S}_{2,j}\big)
1086:    \;\;\subseteq\;\;
1087:    T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)
1088:    \mbox{\hspace{1cm}and\hspace{1cm}}
1089:    \mathbf{S}_{2,j}\cap T_\epsilon^n\big(Y\big|\mathbf{S}_{1,i}\big)
1090:    \;\;\subseteq\;\;
1091:    T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big).
1092: \]
1093: Consider now a new cover $\mathcal{S}'$, having the property that
1094: \[ \mathbf{S}'_{1,i}\cap T_\epsilon^n\big(X\big|\mathbf{S}'_{2,j}\big)
1095:    \;\;=\;\;
1096:    T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)
1097:    \mbox{\hspace{1cm}and\hspace{1cm}}
   \mathbf{S}'_{2,j}\cap T_\epsilon^n\big(Y\big|\mathbf{S}'_{1,i}\big)
1099:    \;\;=\;\;
1100:    T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big).
1101: \]
1102: A simple expression for the cover element $\mathbf{S}'_{1,i}$ is obtained
1103: as follows.  Fix an index $i\in\{1...2^{nR_1}\}$:
1104: \[\begin{array}{lrcl}
1105:   & \forall k: \mathbf{S}'_{1,i}\cap
1106:                T_\epsilon^n\big(X\big|\mathbf{S}'_{2,k}\big)
1107:     & =
1108:     & T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ik)\hat{\mathbf{y}}^n(ik)\big)
1109:     \\
1110:   \Rightarrow\hspace{6mm}
1111:     & \bigcup_{k=1}^{2^{nR_2}}
1112:       \mathbf{S}'_{1,i}\cap T_\epsilon^n\big(X\big|\mathbf{S}'_{2,k}\big)
1113:     & =
1114:     & \bigcup_{k=1}^{2^{nR_2}}
1115:       T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ik)\hat{\mathbf{y}}^n(ik)\big)
1116:     \\
1117:   \Rightarrow
1118:     & \mathbf{S}'_{1,i}\cap \bigcup_{k=1}^{2^{nR_2}}
1119:           T_\epsilon^n\big(X\big|\mathbf{S}'_{2,k}\big)
1120:     & =
1121:     & \bigcup_{k=1}^{2^{nR_2}}
1122:       T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ik)\hat{\mathbf{y}}^n(ik)\big)
1123:     \\
1124:   \Rightarrow
1125:     & \mathbf{S}'_{1,i}\cap S_{\epsilon,Y}^n\big(X\big)
1126:     & =
1127:     & \bigcup_{k=1}^{2^{nR_2}}
1128:       T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ik)\hat{\mathbf{y}}^n(ik)\big),
1129: \end{array}\]
1130: and since $P\big(S_{\epsilon,Y}^n\big(X\big)\big)>1-\dot\epsilon$,
1131: $\mathbf{S}'_{1,i}$ is determined up to a set of vanishing measure;
1132: similarly, fixing $j\in\{1...2^{nR_2}\}$, we get
1133: $\mathbf{S}'_{2,j}\cap S_{\epsilon,X}^n\big(Y\big) = \bigcup_{l=1}^{2^{nR_1}}
1134: T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n(lj)\hat{\mathbf{y}}^n(lj)\big)$.
1135: 
1136: The new cover $\mathcal{S}'$ has some useful properties:
1137: \begin{itemize}
1138: \item for all $(i,j)$, $\mathbf{S}_{1,i}\cap S_{\epsilon,Y}^n\big(X\big)
1139:   \subseteq\mathbf{S}'_{1,i}\cap S_{\epsilon,Y}^n\big(X\big)$ and
1140:   $\mathbf{S}_{2,j}\cap S_{\epsilon,X}^n\big(Y\big)\subseteq
1141:   \mathbf{S}'_{2,j}\cap S_{\epsilon,X}^n\big(Y\big)$, and therefore
1142:   $\tilde{\mathbf{S}}_{ij}\subseteq\tilde{\mathbf{S}}'_{ij}$ as
1143:   well, by construction;
1144: \item for all $\big(\mathbf{x}^n\mathbf{y}^n\big)\in\tilde{\mathbf{S}}'_{ij}$,
1145:   $d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n(ij)\big)<D_1^+$ and
1146:   $d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n(ij)\big)<D_2^+$, from the
1147:   joint typicality conditions defining $\mathbf{S}'_{1,i}$ and
1148:   $\mathbf{S}'_{2,j}$;
1149: \item and $P\Big(\bigcup_{ij}\tilde{\mathbf{S}}'_{ij}\Big) \geq
1150:   P\Big(\bigcup_{ij}\tilde{\mathbf{S}}_{ij}\Big) > 1-\dot\epsilon$;
1151: \end{itemize}
1152: so, $\mathcal{S}'$ ``dominates'' $\mathcal{S}$ (in that every element
1153: in $\mathcal{S}$ is contained in one element of $\mathcal{S}'$), and
1154: $\mathcal{S}'$ satisfies the same distortion constraints that $\mathcal{S}$
1155: does.  Therefore, an upper bound on the size of the elements in the new
1156: cover $\mathcal{S}'$ is also an upper bound on the size of the elements
1157: in the given cover $\mathcal{S}$.
1158: 
Next we observe that the new cover element $\tilde{\mathbf{S}}'_{ij}$ can be
1160: ``sandwiched'' in between two other terms:
1161: \begin{eqnarray*}
1162: \Big[T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)
1163:      \times
1164:      T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)
1165:      \Big]\cap T_\epsilon^n\big(XY\big)
1166:   & \stackrel{(a)}{\subseteq} &
1167:     \big[\mathbf{S}'_{1,i}\times\mathbf{S}'_{2,j}\big]
1168:     \cap T_\epsilon^n\big(XY\big) \\
1169:   & \stackrel{(b)}{\subseteq} &
1170:     T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big),
1171: \end{eqnarray*}
1172: where (a) follows from our choice of $\mathbf{S}'_{1,i}$ and
1173: $\mathbf{S}'_{2,j}$, and from elementary algebra of sets; and (b)
1174: follows from eqn.~\eqref{eq:const-std-rd}, and from the product form
1175: of distributed covers.  So, since the other inclusion always holds,
1176: \[
1177:    \Big[
1178:    T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)
1179:    \times
1180:    T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)
1181:    \Big]\cap T_\epsilon^n\big(XY\big)
1182:    \;\;=\;\;
1183:    T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)
1184: \]
1185: is a necessary condition on any suitable distribution $p(xy\hat x\hat y)$
1186: whose typical sets can be used to construct the cover $\mathcal{S}'$; or
1187: equivalently, since this must hold for every $(i,j)$,
1188: \[ \Big[
1189:    T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)
1190:    \times
1191:    T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)
1192:    \Big]\cap T_\epsilon^n\big(XY\big)
1193:    \;\;=\;\;
1194:    T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big),
1195: \]
1196: for any sequences $\hat{\mathbf{x}}^n$ and $\hat{\mathbf{y}}^n$ such
1197: that $T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)
1198: \neq\emptyset$.  Finally we note that this last condition is equivalent
1199: to
1200: \begin{equation}
1201:    T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)
1202:    \times
1203:    T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)
1204:    \;\;=\;\;
1205:    T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big).
1206:   \label{eq:const-typsets-1}
1207: \end{equation}
1208: This is because this last equality already forces any $\mathbf{x}^n
1209: \in T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)$
1210: and $\mathbf{y}^n\in
1211: T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)$ to
1212: be jointly typical.  Therefore, from the reverse Markov lemma, we
1213: conclude there exists a distribution $\pi(xy\hat x\hat y)$, which
1214: satisfies a Markov chain of the form $X-\hat X\hat Y-Y$, such that
1215: $\big|\big|p-\pi\big|\big|_1<2\epsilon$.
1216: 
1217: \centerline{---------------------}
1218: 
1219: Next we observe that if $\big|\big|p-\pi\big|\big|_1<2\epsilon$,
1220: then conditionals and marginals of $p$ and of $\pi$ are also close.
1221: Consider, for example,
1222: $p_{\hat X\hat Y}(\hat x\hat y)=\sum_{xy}p_{XY\hat X\hat Y}(xy\hat x\hat y)$
1223: and $\pi_{\hat X\hat Y}(\hat x\hat y)
1224: =\sum_{xy}\pi_{XY\hat X\hat Y}(xy\hat x\hat y)$:
1225:   \begin{eqnarray*}
1226:   \big|\big|p_{\hat X\hat Y}(\cdot)-\pi_{\hat X\hat Y}(\cdot)\big|\big|_1
1227:      & = & \sum_{\hat x\hat y}
1228:            \big|p_{\hat X\hat Y}(\hat x\hat y)
1229:                 -\pi_{\hat X\hat Y}(\hat x\hat y)\big| \\
1230:      & = & \sum_{\hat x\hat y}
1231:            \Big|\Big(\sum_{x'y'}p_{XY\hat X\hat Y}(x'y'\hat x\hat y)\Big)
1232:                 -\Big(\sum_{x''y''}\pi_{XY\hat X\hat Y}(x''y''\hat x\hat y)
1233:                  \Big)\Big| \\
1234:      & = & \sum_{\hat x\hat y}
           \Big|\sum_{xy}\Big(p_{XY\hat X\hat Y}(xy\hat x\hat y)
                -\pi_{XY\hat X\hat Y}(xy\hat x\hat y)\Big)\Big| \\
1237:      & \leq & \sum_{xy\hat x\hat y}
1238:            \big|p_{XY\hat X\hat Y}(xy\hat x\hat y)
1239:                 -\pi_{XY\hat X\hat Y}(xy\hat x\hat y)\big| \\
1240:      & < & 2\epsilon.
1241:   \end{eqnarray*}
1242: For the conditional $p_{XY|\hat X\hat Y}(xy|\hat x\hat y)$:
1243:   \begin{eqnarray*}
1244:   \lefteqn{\big|\big|p_{XY|\hat X\hat Y}(\cdot|\hat x\hat y)
1245:                      -\pi_{XY|\hat X\hat Y}(\cdot|\hat x\hat y)\big|\big|_1
1246:      \;\; = \;\; \sum_{xy} \big|p_{XY|\hat X\hat Y}(xy|\hat x\hat y)
                          -\pi_{XY|\hat X\hat Y}(xy|\hat x\hat y)\big|} \\
1248:      & = & \sum_{xy} \Big|\frac{p_{XY\hat X\hat Y}(xy\hat x\hat y)}
1249:                                {p_{\hat X\hat Y}(\hat x\hat y)}
1250:                           -\frac{\pi_{XY\hat X\hat Y}(xy\hat x\hat y)}
1251:                                 {\pi_{\hat X\hat Y}(\hat x\hat y)}\Big| \\
1252:      & = & \mbox{$\frac{1}{p_{\hat X\hat Y}(\hat x\hat y)
1253:                            \pi_{\hat X\hat Y}(\hat x\hat y)}$}
1254:            \sum_{xy} \big|p_{XY\hat X\hat Y}(xy\hat x\hat y)
1255:                           \pi_{\hat X\hat Y}(\hat x\hat y)
1256:                          -\pi_{XY\hat X\hat Y}(xy\hat x\hat y)
1257:                           p_{\hat X\hat Y}(\hat x\hat y)\big| \\
1258:      & \stackrel{(a)}{<} & \mbox{$\frac{1}{p_{\hat X\hat Y}(\hat x\hat y)
1259:                                            \pi_{\hat X\hat Y}(\hat x\hat y)}$}
1260:            \sum_{xy} \big|p_{XY\hat X\hat Y}(xy\hat x\hat y)
1261:                            p_{\hat X\hat Y}(\hat x\hat y)
1262:                           +p_{XY\hat X\hat Y}(xy\hat x\hat y)2\epsilon
1263:                           -\pi_{XY\hat X\hat Y}(xy\hat x\hat y)
1264:                            p_{\hat X\hat Y}(\hat x\hat y)\big| \\
1265:      & \leq & \mbox{$\frac{1}{p_{\hat X\hat Y}(\hat x\hat y)
1266:                               \pi_{\hat X\hat Y}(\hat x\hat y)}$}
1267:            \sum_{xy}\Big(2\epsilon p_{XY\hat X\hat Y}(xy\hat x\hat y)
1268:                     +p_{\hat X\hat Y}(\hat x\hat y)
1269:                      \big|p_{XY\hat X\hat Y}(xy\hat x\hat y)
1270:                           -\pi_{XY\hat X\hat Y}(xy\hat x\hat y)\big|\Big) \\
1271:      & = & \mbox{$\frac{1}{p_{\hat X\hat Y}(\hat x\hat y)
1272:                               \pi_{\hat X\hat Y}(\hat x\hat y)}$}
1273:            \left(2\epsilon p_{\hat X\hat Y}(\hat x\hat y)
1274:                     +p_{\hat X\hat Y}(\hat x\hat y)\sum_{xy}
1275:                      \big|p_{XY\hat X\hat Y}(xy\hat x\hat y)
1276:                           -\pi_{XY\hat X\hat Y}(xy\hat x\hat y)\big|\right) \\
1277:      & \leq & \frac{4\epsilon}{\pi_{\hat X\hat Y}(\hat x\hat y)} \\
1278:      & \triangleq & \epsilon_1,
1279:   \end{eqnarray*}
1280: where (a) follows from the $L_1$ bound on the marginals
1281: $p_{\hat X\hat Y}$ and $\pi_{\hat X\hat Y}$ above; and provided both
1282: $p_{\hat X\hat Y}(\hat x\hat y)\neq 0$ and
1283: $\pi_{\hat X\hat Y}(\hat x\hat y)\neq 0$.  We also note that under the
1284: assumption that
1285: $\big|\big|p_{XY\hat X\hat Y}-\pi_{XY\hat X\hat Y}\big|\big|_1<2\epsilon$,
1286: there exists a value $\hat\epsilon$ such that, for all
1287: $0<\epsilon<\hat\epsilon$, it is not possible to have a pair
1288: $(\hat x_0\hat y_0)$ such that $p_{\hat X\hat Y}(\hat x_0\hat y_0)>0$
1289: but $\pi_{\hat X\hat Y}(\hat x_0\hat y_0)=0$, or vice versa.  This is
1290: because $\pi_{\hat X\hat Y}(\hat x_0\hat y_0)=0$ means that for all $xy$,
1291: $\pi_{XY\hat X\hat Y}(xy\hat x_0\hat y_0)=0$.  But if
1292: $p_{\hat X\hat Y}(\hat x_0\hat y_0)>0$, this means there exists at
1293: least one $x_0y_0$ such that $p_{XY\hat X\hat Y}(x_0y_0\hat x_0\hat y_0)>0$,
1294: and as a result,
1295: $\big|\big|p_{XY\hat X\hat Y}-\pi_{XY\hat X\hat Y}\big|\big|_1\geq
1296: p_{XY\hat X\hat Y}(x_0y_0\hat x_0\hat y_0)$; thus, setting
1297: $\hat\epsilon\triangleq p_{XY\hat X\hat Y}(x_0y_0\hat x_0\hat y_0)$,
1298: we get the sought contradiction.  Thus, for all $\epsilon$ small enough,
1299: the bound on the conditionals holds as well, and so we have
1300: from~\cite[Thm.\ 16.3.2]{CoverT:91} that
1301: \begin{equation}
1302:    \Big|H\big(XY\big|\hat X=\hat x,\hat Y=\hat y\big)[p]
1303:         -H\big(XY\big|\hat X=\hat x,\hat Y=\hat y\big)[\pi]\Big|
1304:    \;\;<\;\;
1305:    -\epsilon_1\log\Big(\mbox{$\frac{\mbox{\normalsize $\epsilon_1$}}{|\mathcal{X}||\mathcal{Y}|
1306:    |\hat{\mathcal{X}}||\hat{\mathcal{Y}}|}$}\Big)
1307:    \;\;\triangleq\;\; \epsilon_2,
1308:   \label{eq:l1-bound-cond-entropy}
1309: \end{equation}
1310: and so,
1311: \begin{eqnarray*}
1312: \lefteqn{\Big|H\big(XY\big|\hat X\hat Y\big)[p]
1313:               -H\big(XY\big|\hat X\hat Y\big)[\pi]\Big|} \\
1314:   & \leq & \sum_{\hat x\hat y}
1315:            \Big|p_{\hat X\hat Y}(\hat x\hat y)
1316:                 H\big(XY\big|\hat X=\hat x,\hat Y=\hat y\big)[p]
1317:                 -\pi_{\hat X\hat Y}(\hat x\hat y)
1318:                 H\big(XY\big|\hat X=\hat x,\hat Y=\hat y\big)[\pi]\Big| \\
1319:   & \stackrel{(a)}{\leq} &
1320:     \big|\hat{\mathcal{X}}\big|\cdot\big|\hat{\mathcal{Y}}\big|\cdot
1321:     \Big|p_{\hat X\hat Y}(\hat x^*\hat y^*)
1322:           H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[p]
1323:          -\pi_{\hat X\hat Y}(\hat x^*\hat y^*)
1324:           H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[\pi]\Big| \\
1325:   & \stackrel{(b)}{\leq} &
1326:     \big|\hat{\mathcal{X}}\big|\cdot\big|\hat{\mathcal{Y}}\big|\cdot
1327:     \Big|\pi_{\hat X\hat Y}(\hat x^*\hat y^*)
1328:          H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[p]
1329:          +2\epsilon H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[p]
1330:          \\&&\mbox{\hspace{1.7cm}}
1331:          -\pi_{\hat X\hat Y}(\hat x^*\hat y^*)
1332:           H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[\pi]\Big| \\
1333:   & = &
1334:     \big|\hat{\mathcal{X}}\big|\cdot\big|\hat{\mathcal{Y}}\big|\cdot
1335:     \Big|2\epsilon H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[p]
1336:     \\&&\mbox{\hspace{1.7cm}}
1337:          +\pi_{\hat X\hat Y}(\hat x^*\hat y^*)
1338:           \Big(H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[p]
1339:                -H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[\pi]\Big)\Big|
1340:          \\
1341:   & \stackrel{(c)}{\leq} &
1342:     \big|\hat{\mathcal{X}}\big|\cdot\big|\hat{\mathcal{Y}}\big|\cdot
1343:     \Big(2\epsilon H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[p]
         +\pi_{\hat X\hat Y}(\hat x^*\hat y^*)\epsilon_2\Big) \\
1345:   & \triangleq & \epsilon_3,
1346: \end{eqnarray*}
1347: where (a) follows from choosing $\hat x^*\hat y^*$ as the pair
1348: $\hat x\hat y\in\hat{\mathcal{X}}\times\hat{\mathcal{Y}}$ that makes
1349: the difference $\big|p_{\hat X\hat Y}(\hat x\hat y)
1350: H\big(XY\big|\hat X=\hat x,\hat Y=\hat y\big)[p]
1351: -\pi_{\hat X\hat Y}(\hat x\hat y)
1352: H\big(XY\big|\hat X=\hat x,\hat Y=\hat y\big)[\pi]\big|$ largest;
1353: (b) follows from
1354: $\big|\big|p_{\hat X\hat Y}-\pi_{\hat X\hat Y}\big|\big|_1<2\epsilon$;
1355: and (c) follows from eqn.~\eqref{eq:l1-bound-cond-entropy} above, and
1356: from the triangle inequality.
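
As a hedged sanity check on the constants above (a sketch, not part of the
argument itself), one may verify that $\epsilon_1$, $\epsilon_2$ and
$\epsilon_3$ all vanish with $\epsilon$.  We assume here that $p$ is held
fixed, that $p_{\hat X\hat Y}(\hat x\hat y)>2\epsilon$ for the pair
$\hat x\hat y$ under consideration, and that $\epsilon_1\leq\frac{1}{2}$
(the regime in which the entropy bound of~\cite[Thm.\ 16.3.2]{CoverT:91}
applies); the symbol $N$ is shorthand introduced only for this remark.
From the $L_1$ bound on the marginals,
$\pi_{\hat X\hat Y}(\hat x\hat y)>p_{\hat X\hat Y}(\hat x\hat y)-2\epsilon$,
and so
\[ \epsilon_1 \;\;=\;\; \frac{4\epsilon}{\pi_{\hat X\hat Y}(\hat x\hat y)}
   \;\;<\;\; \frac{4\epsilon}{p_{\hat X\hat Y}(\hat x\hat y)-2\epsilon},
   \mbox{\hspace{1cm}}
   \epsilon_2 \;\;=\;\; -\epsilon_1\log\Big(\frac{\epsilon_1}{N}\Big),
   \mbox{\hspace{1cm}}
   N\triangleq|\mathcal{X}||\mathcal{Y}||\hat{\mathcal{X}}||\hat{\mathcal{Y}}|.
\]
Since $H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[p]\leq
\log\big(|\mathcal{X}||\mathcal{Y}|\big)$, and the probability multiplying
$\epsilon_2$ in the definition of $\epsilon_3$ is at most 1, we get
(with $\epsilon_2$ evaluated at the pair $\hat x^*\hat y^*$)
\[ \epsilon_3 \;\;\leq\;\;
   \big|\hat{\mathcal{X}}\big|\cdot\big|\hat{\mathcal{Y}}\big|\cdot
   \Big(2\epsilon\log\big(|\mathcal{X}||\mathcal{Y}|\big)+\epsilon_2\Big),
\]
and the right-hand side tends to 0 as $\epsilon\to 0$, because
$-t\log(t/N)\to 0$ as $t\to 0$.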
1357: 
1358: We conclude this part of the proof by noting that completely analogous
1359: arguments can be made to show that
1360: \[ \Big|H\big(X\big|\hat X\hat YY\big)[p]
1361:         -H\big(X\big|\hat X\hat YY\big)[\pi]\Big|
1362:    \;\;\leq\;\;\epsilon_4
1363:    \mbox{\hspace{1cm}and\hspace{1cm}}
1364:    \Big|H\big(Y\big|\hat X\hat YX\big)[p]
1365:         -H\big(Y\big|\hat X\hat YX\big)[\pi]\Big|
1366:    \;\;\leq\;\;\epsilon_5.
1367: \]
1368: 
1369: \centerline{---------------------}
1370: 
1371: We are now ready to prove our desired bounds.
1372: 
1373: Since for all $(i,j)$, $\tilde{\mathbf{S}}_{ij}
1374: \subseteq \tilde{\mathbf{S}}'_{ij} =
1375: T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)$,
1376: \[ \big|\tilde{\mathbf{S}}_{ij}\big|
1377:    \;\;\leq\;\;
1378:    2^{n(H(XY|\hat X\hat Y)[p]+\epsilon)}
1379:    \;\;\leq\;\;
1380:    2^{n(H(XY|\hat X\hat Y)[\pi]+\epsilon+\epsilon_3)};
1381: \]
1382: therefore, choosing $\ddot\epsilon\triangleq\epsilon+\epsilon_3$,
1383: the first bound specified by the lemma follows.
1384: 
1385: For the other two bounds, fix now $\mathbf{y}^n\in\mathcal{Y}^n$.
1386: Since $\mathcal{S}$ is a cover, there must exist at least one value
$j_0\in\{1,\ldots,2^{nR_2}\}$, such that $\mathbf{y}^n\in\mathbf{S}_{2,j_0}$.
So consider any $i\in\{1,\ldots,2^{nR_1}\}$, and assume $\mathbf{S}_{1,i}
1389: \cap T_\epsilon^n\big(X\big|\mathbf{y}^n\big)\neq\emptyset$; based on
1390: this assumption, pick any $\mathbf{x}^n\in\mathbf{S}_{1,i}\cap
1391: T_\epsilon^n\big(X\big|\mathbf{y}^n\big)$.  This means that
1392: $\big(\mathbf{x}^n\mathbf{y}^n\big)\in
1393: \big[\mathbf{S}_{1,i}\times\mathbf{S}_{2,j_0}\big]\cap
1394: T_\epsilon^n\big(XY\big)$, and therefore that
1395: $\big(\mathbf{x}^n\mathbf{y}^n\big)\in
1396: \big[\mathbf{S}'_{1,i}\times\mathbf{S}'_{2,j_0}\big]\cap
1397: T_\epsilon^n\big(XY\big)$, and hence from eqn.~\eqref{eq:const-std-rd}
1398: we have that $\big(\mathbf{x}^n\mathbf{y}^n\hat{\mathbf{x}}^n(ij_0)
1399: \hat{\mathbf{y}}^n(ij_0)\big)\in T_\epsilon^n\big(XY\hat X\hat Y\big)$,
1400: and therefore we conclude that
1401: \[ \mathbf{S}_{1,i}\cap T_\epsilon^n\big(X\big|\mathbf{y}^n\big)
1402:    \;\;\subseteq\;\;
1403:    T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ij_0)\hat{\mathbf{y}}^n(ij_0)
                          \mathbf{y}^n\big).
1405: \]
1406: We also note that if $\mathbf{S}_{1,i}\cap
1407: T_\epsilon^n\big(X\big|\mathbf{y}^n\big)=\emptyset$, then the last inclusion
1408: holds trivially.  Thus,
1409: \[ \big|\mathbf{S}_{1,i}\cap T_\epsilon^n\big(X\big|\mathbf{y}^n\big)\big|
1410:    \;\;\leq\;\;
1411:    2^{n(H(X|\hat X\hat YY)[p]+\epsilon)}
1412:    \;\;\leq\;\;
   2^{n(H(X|\hat X\hat YY)[\pi]+\epsilon+\epsilon_4)}.
1414: \]
1415: Therefore, choosing $\ddot\epsilon'\triangleq\epsilon+\epsilon_4$, the
1416: second bound specified by the lemma holds.  And the third (and last)
1417: bound follows from an argument identical to this last one.  So the lemma
1418: is proved.
1419: \tend\bigskip
1420: 
1421: 
1422: \section{Proof of Theorem~\ref{thm:main}}
1423: \label{sec:main-proof}
1424: 
1425: Consider any $\big(2^{nR_1},2^{nR_2},n,\epsilon,D_1,D_2\big)$ distributed
1426: rate-distortion code, represented by a cover $\mathcal{S}$.  Then,
1427: \begin{eqnarray*}
1428: \lefteqn{n(R_1+R_2) \;\; \geq \;\; H\big(f_1(X^n)f_2(Y^n)\big)} \\
1429:   & = & H\big(f_1(X^n)f_2(Y^n)\big)
1430:         - H\big(f_1(X^n)f_2(Y^n)\big|X^nY^n\big) \\
1431:   & = & I\big(X^nY^n\wedge f_1(X^n)f_2(Y^n)\big) \\
1432:   & = & H\big(X^nY^n\big)
1433:         - H\big(X^nY^n\big|f_1(X^n)f_2(Y^n)\big) \\
1434:   & = & nH\big(XY\big)
1435:         - \sum_{1\leq i\leq 2^{nR_1},1\leq j\leq 2^{nR_2}}
1436:           P\big(f_1(X^n)=i,f_2(Y^n)=j\big)
1437:           H\big(X^nY^n\big|f_1(X^n)=i,f_2(Y^n)=j\big) \\
1438:   & \geq & nH\big(XY\big) -
1439:         \Big[ \max_{1\leq i\leq 2^{nR_1},1\leq j\leq 2^{nR_2}}
1440:                H\big(X^nY^n\big|f_1(X^n)=i,f_2(Y^n)=j\big)
1441:         \Big] \\&&\mbox{\hspace{2.06cm}}
1442:         \Big[ \sum_{1\leq i\leq 2^{nR_1},1\leq j\leq 2^{nR_2}}
1443:                P\big(f_1(X^n)=i,f_2(Y^n)=j\big)
1444:         \Big] \\
1445:   & = & nH\big(XY\big)
1446:         - \max_{1\leq i\leq 2^{nR_1},1\leq j\leq 2^{nR_2}}
1447:           H\big(X^nY^n\big|f_1(X^n)=i,f_2(Y^n)=j\big) \\
1448:   & \stackrel{(a)}{\geq} & nH\big(XY\big)
1449:         - \Big[\max_{1\leq i\leq 2^{nR_1},1\leq j\leq 2^{nR_2}}
1450:           \log\big|\tilde{\mathbf{S}}_{ij}\big|\Big]-n\epsilon_1 \\
1451:   & \stackrel{(b)}{\geq} &
1452:            nH\big(XY\big) - nH\big(XY\big|\hat X\hat Y\big)[\pi]
1453:                           - n\ddot\epsilon - n\epsilon_1 \\
1454:   & = & nI\big(XY\wedge \hat X\hat Y\big)[\pi] - n\ddot\epsilon - n\epsilon_1,
1455: \end{eqnarray*}
1456: where (a) follows from splitting outcomes of $X^nY^n$ into typical and
1457: non-typical ones, and from bounding the entropy of the typical ones with
1458: a uniform distribution; and (b) follows from Lemma~\ref{lemma:bound-size},
1459: for some $\pi\in\mathbb{P}_{\mbox{\tiny LB}}$.
1460: 
1461: For the individual rates, we have the following chain of inequalities:
1462: \begin{eqnarray*}
1463: nR_1 & \geq & H\big(f_1(X^n)\big) \\
1464:   & \geq & H\big(f_1(X^n)\big|Y^n\big) \\
1465:   & = & H\big(f_1(X^n)\big|Y^n\big)-H\big(f_1(X^n)\big|X^nY^n\big) \\
1466:   & = & I\big(X^n\wedge f_1(X^n)\big|Y^n\big) \\
1467:   & = & H\big(X^n\big|Y^n\big)-H\big(X^n\big|f_1(X^n)Y^n\big) \\
1468:   & = & nH\big(X\big|Y\big)-H\big(X^n\big|f_1(X^n)Y^n\big) \\
1469:   & = & nH\big(X\big|Y\big)
1470:         -\sum_{\mathbf{y}^n\in\mathcal{Y}^n}\sum_{i=1}^{2^{nR_1}}
1471:          P\big(f_1(X^n)=i,Y^n=\mathbf{y}^n\big)
1472:          H\big(X^n\big|f_1(X^n)=i,Y^n=\mathbf{y}^n\big) \\
1473:   & \geq & nH\big(X\big|Y\big)
        - \Big[ \max_{1\leq i\leq 2^{nR_1},\mathbf{y}^n\in\mathcal{Y}^n}
1475:                 H\big(X^n\big|f_1(X^n)=i,Y^n=\mathbf{y}^n\big) \Big]
1476:           \\&&\mbox{\hspace{2.18cm}}
1477:           \Big[ \sum_{\mathbf{y}^n\in\mathcal{Y}^n}\sum_{i=1}^{2^{nR_1}}
1478:                 P\big(f_1(X^n)=i,Y^n=\mathbf{y}^n\big) \Big] \\
1479:   & = & nH\big(X\big|Y\big)
        - \max_{1\leq i\leq 2^{nR_1},\mathbf{y}^n\in\mathcal{Y}^n}
1481:           H\big(X^n\big|f_1(X^n)=i,Y^n=\mathbf{y}^n\big) \\
1482:   & \stackrel{(a)}{\geq} & nH\big(X\big|Y\big)
        - \Big[\max_{1\leq i\leq 2^{nR_1},\mathbf{y}^n\in\mathcal{Y}^n}
1484:           \log_2\big|\mathbf{S}_{1,i}\cap
1485:            T_\epsilon^n\big(X\big|\mathbf{y}^n\big)\big|\Big]-n\epsilon_1 \\
1486:   & \stackrel{(b)}{\geq} & nH\big(X\big|Y\big)
1487:            - nH\big(X\big|\hat X\hat YY\big)[\pi]
1488:            - n\ddot\epsilon' - n\epsilon_1 \\
1489:   & = & nI\big(X\wedge \hat X\hat Y\big|Y\big)[\pi]
1490:            - n\ddot\epsilon' - n\epsilon_1,
1491: \end{eqnarray*}
1492: where (a) follows from splitting the outcomes of $X^n$ into those
1493: that are jointly typical with the given sequence $\mathbf{y}^n$ and
1494: those that are not, and from bounding the entropy of the typical
1495: ones with a uniform distribution; and (b) follows from
1496: Lemma~\ref{lemma:bound-size}.  An identical argument shows that
1497: $nR_2\geq nI\big(Y\wedge\hat X\hat Y\big|X\big)[\pi]-n\ddot\epsilon''
1498: -n\epsilon_1$.  And since these conditions must hold for all
1499: $\epsilon>0$, the theorem follows.
1500: \tend
1501: 
1502: 
1503: \section{Discussion}
1504: \label{sec:discussion}
1505: 
We conclude the first part of this paper with a discussion of
the results proved so far.
1508: 
1509: \subsection{Finite Parameterization of $\mathcal{R}^o(D_1,D_2)$}
1510: 
1511: The class of distributions used to define the Berger-Tung inner bound
1512: is given by:
1513: \[ \mathbb{P}_{\mbox{\tiny BT}}
1514:    \;\;\triangleq\;\;
1515:    \left\{p_{XYUV}\left|\begin{array}{rl}
1516:                         \bullet & p(xy)=\sum_{uv}p_{XYUV}(xyuv) \\
1517:                         \bullet & U-X-Y-V\textrm{ is a Markov chain} \\
1518:                         \bullet & \expect{d_1\big(X,\gamma_1(U,V)\big)}\leq D_1
1519:                                   \textrm{ and }
1520:                                   \expect{d_2\big(Y,\gamma_2(U,V)\big)}\leq D_2
1521:                         \end{array}\right\}\right.,
1522: \]
1523: for fixed distortions $(D_1,D_2)$, source $p(xy)$, and some functions
1524: $\gamma_1:\mathcal{U}\times\mathcal{V}\to\hat{\mathcal{X}}$ and
1525: $\gamma_2:\mathcal{U}\times\mathcal{V}\to\hat{\mathcal{Y}}$.
1526: To make a direct comparison
1527: with $\mathbb{P}_{\mbox{\tiny BT}}$ easier, we rewrite
1528: $\mathbb{P}_{\mbox{\tiny LB}}$ in terms of two
1529: variables $U$ and $V$ as follows:
1530: \begin{itemize}
1531: \item Set $\mathcal{U}\triangleq\hat{\mathcal{X}}$ and
  $\mathcal{V}\triangleq\hat{\mathcal{Y}}$.
1533: \item For any $p_{XY\hat X\hat Y}\in\mathbb{P}_{\mbox{\tiny LB}}$,
1534:   set $p_{XYUV}(xyuv)\triangleq p_{XY\hat X\hat Y}(xy\hat x\hat y)$.
1535: \end{itemize}
1536: Then, it is clear that $\mathbb{P}'_{\mbox{\tiny LB}}$, defined by
1537: \[ \mathbb{P}'_{\mbox{\tiny LB}}
1538:    \;\;\triangleq\;\;
1539:    \left\{p_{XYUV}\left|\begin{array}{rl}
1540:                         \bullet & p(xy)=\sum_{uv}p_{XYUV}(xyuv) \\
1541:                         \bullet & X-UV-Y\textrm{ is a Markov chain} \\
1542:                         \bullet & \expect{d_1\big(X,\gamma_1(U,V)\big)}\leq D_1
1543:                                   \textrm{ and }
1544:                                   \expect{d_2\big(Y,\gamma_2(U,V)\big)}\leq D_2
1545:                         \end{array}\right\}\right.,
1546: \]
1547: again for fixed distortions $(D_1,D_2)$, source $p(xy)$, and some
1548: functions $\gamma_1:\mathcal{U}\times\mathcal{V}\to\hat{\mathcal{X}}$
1549: and $\gamma_2:\mathcal{U}\times\mathcal{V}\to\hat{\mathcal{Y}}$, is
1550: just a relabeling of $\mathbb{P}_{\mbox{\tiny LB}}$.
1551: 
1552: In terms of these sets, we can state the following bounds on
1553: $\mathcal{R}^*(D_1,D_2)$:
1554: \begin{equation}
1555:    \overline{\bigcup_{p\in\mathbb{P}_{\mbox{\tiny BT}}}\mathcal{R}(D_1,D_2,p)}
1556:    \;\;\subseteq\;\;
1557:    \mathcal{R}^*(D_1,D_2)
1558:    \;\;\subseteq\;\;
1559:    \overline{\bigcup_{p\in\mathbb{P}'_{\mbox{\tiny LB}}}\mathcal{R}(D_1,D_2,p)}.
1560:   \label{eq:region-bounds}
1561: \end{equation}
1562: $\mathcal{R}^*(D_1,D_2)$ is not a characterization of the region of
1563: achievable rates that we would normally consider satisfactory, in that
1564: it is not ``computable,'' in the sense of~\cite[pg.\ 259]{CsiszarK:81}.
1565: Yet with eqn.~\eqref{eq:region-bounds}, we have managed to ``sandwich''
1566: the uncomputable $\mathcal{R}^*(D_1,D_2)$ region in between two
1567: other regions, both of which are computable:
1568: \begin{itemize}
1569: \item in $\mathbb{P}'_{\mbox{\tiny LB}}$, $U$ and $V$ are taken
1570:   over finite alphabets ($\mathcal{U}=\hat{\mathcal{X}}$ and
1571:   $\mathcal{V}=\hat{\mathcal{Y}}$);
\item and in $\mathbb{P}_{\mbox{\tiny BT}}$, although we have
  not been able to find anywhere in the literature a proof that
  the alphabets of $U$ and $V$ can be taken to be finite, presumably a
  direct application of the method of Ahlswede and K\"orner should
  produce the desired cardinality bounds~\cite{AhlswedeK:75, Salehi:78}.
1577: \end{itemize}
1578: This is of interest because, as far as we can tell, none of the outer
1579: bounds we have found in the literature are computable.
1580: 
1581: \subsection{Relationship to the Berger-Tung Outer Bound}
1582: 
1583: One simple sufficient condition (which unfortunately does not hold)
1584: for proving the inclusions in eqn.~\eqref{eq:region-bounds} to be
1585: in fact equalities would have been to show that
1586: $\mathbb{P}'_{\mbox{\tiny LB}}\subseteq\mathbb{P}_{\mbox{\tiny BT}}$.
However, a direct comparison between these two sets is still revealing.
1588: Consider any distribution $p$ that satisfies the constraints of both
1589: sets (i.e., $p\in\mathbb{P}_{\mbox{\tiny LB}}\cap
1590: \mathbb{P}_{\mbox{\tiny BT}}$), and elements $xyuv$ for which
1591: $p(xyuv)\neq 0$.  Then, this $p$ admits two different factorizations:
1592: \[\begin{array}{crcl}
1593:   & p(uv)p(x|uv)p(y|uv) & = & p(xy)p(u|x)p(v|y) \\
1594: \Leftrightarrow & p(uv)\frac{p(uv|x)p(x)}{p(uv)}\frac{p(uv|y)p(y)}{p(uv)} 
1595:                         & = & p(xy)p(u|x)p(v|y) \\
1596: \Leftrightarrow & p(uv|x)p(x)p(uv|y)p(y) & = & p(xy)p(u|x)p(v|y)p(uv) \\
1597: \Leftrightarrow & p(u|x)p(v|x)p(x)p(u|y)p(v|y)p(y)
1598:                         & = & p(xy)p(u|x)p(v|y)p(uv) \\
1599: \Leftrightarrow & p(v|x)p(x)p(u|y)p(y) & = & p(xy)p(uv) \\
1600: \Leftrightarrow & p(xv)p(yu) & = & p(xy)p(uv).
1601: \end{array}\]
Clearly, any distribution in this intersection must make all
variables pairwise independent: marginalize out any two of them, and the
joint distribution of the other two can be expressed as the product of
their marginals.
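
To make the marginalization step concrete, the following is a sketch of
two instances, under the added assumption (not checked above, where only
elements with $p(xyuv)\neq 0$ were considered) that the last identity
holds for all $x$, $y$, $u$ and $v$.  Summing both sides over $u$ and $v$,
\[ \sum_{uv}p(xv)p(yu) \;\;=\;\; p(x)p(y)
   \mbox{\hspace{1cm}and\hspace{1cm}}
   \sum_{uv}p(xy)p(uv) \;\;=\;\; p(xy),
\]
so that $p(xy)=p(x)p(y)$; and summing instead over $x$ and $y$,
\[ \sum_{xy}p(xv)p(yu) \;\;=\;\; p(v)p(u)
   \mbox{\hspace{1cm}and\hspace{1cm}}
   \sum_{xy}p(xy)p(uv) \;\;=\;\; p(uv),
\]
so that $p(uv)=p(u)p(v)$.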
1605: 
1606: We find this observation interesting because it provides clear
1607: evidence that our lower bound is very different in nature from the
1608: Berger-Tung outer bound~\cite{Berger:78, Tung:PhD}.  In that bound,
1609: the set of distributions in the outer bound (all Markov chains of
1610: the form $U-X-Y$ and $X-Y-V$) strictly contains
1611: $\mathbb{P}_{\mbox{\tiny BT}}$; that means, there is a subset of
1612: the distributions in the outer bound that generates all rates we
1613: know to be achievable.  In our bound, since
1614: $\mathbb{P}_{\mbox{\tiny LB}}\cap\mathbb{P}_{\mbox{\tiny BT}}$
is a degenerate set, {\em none} of the distributions
$p\in\mathbb{P}_{\mbox{\tiny LB}}$ can be used to define a code
1617: construction based on known methods,\footnote{Except of course for
1618: trivial cases, such as when the two sources $X$ and $Y$ are independent,
1619: and the distortion is maximum.}
1620: such as the ``quantize-then-bin'' strategy used in the proof of
1621: the Berger-Tung inner bound.
1622: 
1623: \subsection{Computation of the Outer Bound}
1624: 
The finite parameterization of our outer bound is, we believe, an
important contribution in itself, given that the Berger-Tung
1627: outer bound is not computable.\footnote{And neither is the more modern
1628: outer bound of Wagner and Anantharam~\cite{Wagner:PhD, WagnerA:05},
1629: also mentioned in the introduction.}  This is of interest in part
1630: because, at least in principle, this finite parameterization renders
1631: the problem amenable to analysis using computational methods.  Finding
1632: an efficient algorithm for computing solutions to the optimization
1633: problem defined by Theorem~\ref{thm:main}, similar in spirit to the
1634: Blahut-Arimoto algorithm for the numerical evaluation of channel
1635: capacity and rate-distortion functions~\cite{Arimoto:72, Blahut:72},
1636: certainly is an interesting challenge in its own right.
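
For reference, an iteration of this kind can be sketched in the classical
single-source setting (this is the standard rate-distortion recursion
of~\cite{Blahut:72}, not an algorithm for the two-encoder problem of
Theorem~\ref{thm:main}).  The notation is introduced only for this
illustration: a source $p(x)$, a distortion measure $d(x,\hat x)$, a
slope parameter $\beta>0$, and iterates $q^{(t)}$.  Starting from any
$q^{(0)}(\hat x)>0$, one alternates
\[ q^{(t+1)}(\hat x|x) \;\;=\;\;
   \frac{q^{(t)}(\hat x)\,e^{-\beta d(x,\hat x)}}
        {\sum_{\hat x'}q^{(t)}(\hat x')\,e^{-\beta d(x,\hat x')}}
   \mbox{\hspace{1cm}and\hspace{1cm}}
   q^{(t+1)}(\hat x) \;\;=\;\; \sum_x p(x)\,q^{(t+1)}(\hat x|x),
\]
and each value of $\beta$ traces out one point of the rate-distortion
curve.  Whether an alternating scheme of this type extends to the
constraint set $\mathbb{P}_{\mbox{\tiny LB}}$ is left open here.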
1637: 
1638: More fundamentally though, we believe the computability of our
bound holds the key to completing a proof of the optimality of the
1640: Berger-Tung inner bound for the problem setup of Fig.~\ref{fig:setup}:
1641: \begin{itemize}
1642: \item Computational methods are of interest not only because they
1643:   lead to answers that are ``useful in practice;'' discovering
1644:   efficient algorithms invariably requires the uncovering of structure
1645:   in the problem.  A good example in our field: the characterization
1646:   by Chiang and Boyd of the Lagrange duals of channel capacity and
1647:   rate-distortion as convex geometric programs~\cite{ChiangB:04}.
1648: \item Last but not least, an efficient algorithm to compute the
1649:   sandwich terms in eqn.~\eqref{eq:region-bounds} provides a fallback
1650:   strategy.  If all else fails, at least by means of numerical methods
1651:   we can check whether, in concrete instances of the problem, the
1652:   lower and upper bounds coincide or not.
1653: \end{itemize}
1654: The achievability of the set of rates defined by Theorem~\ref{thm:main},
1655: and the effective computation of the bounds of eqn.~\eqref{eq:region-bounds},
1656: are the main topics considered in Part II.
1657: 
1658: 
1659: \bigskip\noindent{\em Acknowledgements}--In the final version.
1660: 
1661: 
1662: %\pagebreak
1663: %\bibliographystyle{plain}
1664: %\bibliography{library}
1665: \begin{thebibliography}{10}
1666: 
1667: \bibitem{AhlswedeK:75}
1668: R.~Ahlswede and J.~K{\"o}rner.
1669: \newblock {Source Coding with Side Information and a Converse for Degraded
1670:   Broadcast Channels}.
1671: \newblock {\em IEEE Trans. Inform. Theory}, IT-21(6):629--637, 1975.
1672: 
1673: \bibitem{Arimoto:72}
1674: S.~Arimoto.
1675: \newblock {An Algorithm for Computing the Capacity of Arbitrary Discrete
1676:   Memoryless Channels}.
1677: \newblock {\em IEEE Trans. Inform. Theory}, IT-18(1):14--20, 1972.
1678: 
1679: \bibitem{BarrosS:06}
1680: J.~Barros and S.~D. Servetto.
1681: \newblock {Network Information Flow with Correlated Sources}.
1682: \newblock {\em IEEE Trans. Inform. Theory}, 52(1):155--170, 2006.
1683: 
1684: \bibitem{Berger:78}
1685: T.~Berger.
1686: \newblock {\em The Information Theory Approach to Communications (G. Longo,
1687:   ed.)}, chapter Multiterminal Source Coding.
1688: \newblock Springer-Verlag, 1978.
1689: 
1690: \bibitem{BergerHOTW:79}
1691: T.~Berger, K.~B. Housewright, J.~K. Omura, S.~Tung, and J.~Wolfowitz.
1692: \newblock {An Upper Bound on the Rate Distortion Function for Source Coding
1693:   with Partial Side Information at the Decoder}.
1694: \newblock {\em IEEE Trans. Inform. Theory}, 25(6):664--666, 1979.
1695: 
1696: \bibitem{BergerS:07}
1697: T.~Berger and S.~D. Servetto.
1698: \newblock {Multiterminal Source Coding -- 30 Years Later}.
1699: \newblock In preparation, for Foundations and Trends in Communications and
1700:   Information Theory.
1701: 
1702: \bibitem{BergerY:89}
1703: T.~Berger and R.~W. Yeung.
1704: \newblock {Multiterminal Source Encoding with One Distortion Criterion}.
1705: \newblock {\em IEEE Trans. Inform. Theory}, 35(2):228--236, 1989.
1706: 
1707: \bibitem{BergerZV:96}
1708: T.~Berger, Z.~Zhang, and H.~Viswanathan.
1709: \newblock {The CEO Problem}.
1710: \newblock {\em IEEE Trans. Inform. Theory}, 42(3):887--902, 1996.
1711: 
1712: \bibitem{Blahut:72}
1713: R.~E. Blahut.
1714: \newblock {Computation of Channel Capacity and Rate-Distortion Functions}.
1715: \newblock {\em IEEE Trans. Inform. Theory}, IT-18(4):460--473, 1972.
1716: 
1717: \bibitem{ChiangB:04}
1718: M.~Chiang and S.~Boyd.
1719: \newblock {Geometric Programming Duals of Channel Capacity and Rate
1720:   Distortion}.
1721: \newblock {\em IEEE Trans. Inform. Theory}, 50(2):245--258, 2004.
1722: 
1723: \bibitem{Cover:75b}
1724: T.~M. Cover.
1725: \newblock {A Proof of the Data Compression Theorem of Slepian and Wolf for
1726:   Ergodic Sources}.
1727: \newblock {\em IEEE Trans. Inform. Theory}, IT-21(2):226--228, 1975.
1728: 
1729: \bibitem{CoverT:91}
1730: T.~M. Cover and J.~Thomas.
1731: \newblock {\em {Elements of Information Theory}}.
1732: \newblock John Wiley and Sons, Inc., 1991.
1733: 
1734: \bibitem{CsiszarK:80}
I.~Csisz\'ar and J.~K{\"o}rner.
1736: \newblock {Towards a General Theory of Source Networks}.
1737: \newblock {\em IEEE Trans. Inform. Theory}, 26(2):155--166, 1980.
1738: 
1739: \bibitem{CsiszarK:81}
1740: I.~Csisz\'ar and J.~K{\"o}rner.
1741: \newblock {\em {Information Theory: Coding Theorems for Discrete Memoryless
1742:   Systems}}.
1743: \newblock Acad\'emiai Kiad\'o, Budapest, 1981.
1744: 
1745: \bibitem{DobrushinT:62}
1746: R.~L. Dobrushin and B.~S. Tsybakov.
1747: \newblock {Information Transmission with Additional Noise}.
1748: \newblock {\em IEEE Trans. Inform. Theory}, 8(5):293--304, 1962.
1749: 
1750: \bibitem{Salehi:78}
1751: M.~Salehi.
1752: \newblock {Cardinality Bounds on Auxiliary Variables in Multiple-User Theory
1753:   via the Method of Ahlswede and K{\"o}rner}.
1754: \newblock Technical Report~33, Statistics Department, Stanford University,
1755:   August 1978.
1756: 
1757: \bibitem{Shannon:59}
1758: C.~E. Shannon.
1759: \newblock {Coding Theorems for a Discrete Source with a Fidelity Criterion}.
1760: \newblock {\em IRE Nat. Conv. Rec.}, 4:142--163, 1959.
1761: 
1762: \bibitem{SlepianW:73b}
1763: D.~Slepian and J.~K. Wolf.
1764: \newblock {Noiseless Coding of Correlated Information Sources}.
1765: \newblock {\em IEEE Trans. Inform. Theory}, IT-19(4):471--480, 1973.
1766: 
1767: \bibitem{Tung:PhD}
1768: S.~Y. Tung.
1769: \newblock {\em {Multiterminal Source Coding}}.
1770: \newblock PhD thesis, Cornell University, 1978.
1771: 
1772: \bibitem{Wagner:PhD}
1773: A.~B. Wagner.
\newblock {\em {Methods of Offline Distributed Detection: Interacting Particle
1775:   Models and Information-Theoretic Limits}}.
1776: \newblock PhD thesis, University of California, Berkeley, 2005.
1777: 
1778: \bibitem{WagnerA:05}
1779: A.~B. Wagner and V.~Anantharam.
1780: \newblock {An Improved Outer Bound for the Multiterminal Source Coding
1781:   Problem}.
1782: \newblock In {\em Proc. IEEE Int. Symp. Inform. Theory (ISIT)}, Adelaide,
1783:   Australia, 2005.
1784: \newblock Extended version submitted to the IEEE Transactions on Information
1785:   Theory. Available from \href{http://arxiv.org/abs/cs.IT/0511103/} {{\tt
1786:   http://arxiv.org/abs/cs.IT/0511103/}}.
1787: 
1788: \bibitem{Wyner:75}
1789: A.~D. Wyner.
1790: \newblock {On Source Coding with Side Information at the Decoder}.
1791: \newblock {\em IEEE Trans. Inform. Theory}, IT-21(3):294--300, 1975.
1792: 
1793: \bibitem{WynerZ:76}
1794: A.~D. Wyner and J.~Ziv.
1795: \newblock {The Rate-Distortion Function for Source Coding with Side Information
1796:   at the Decoder}.
1797: \newblock {\em IEEE Trans. Inform. Theory}, IT-22(1):1--10, 1976.
1798: 
1799: \bibitem{Yeung:PhD}
1800: R.~W. Yeung.
1801: \newblock {\em {Some Results on Multiterminal Source Coding}}.
1802: \newblock PhD thesis, Cornell University, 1988.
1803: 
1804: \bibitem{Yeung:01}
1805: R.~W. Yeung.
1806: \newblock {\em {A First Course in Information Theory}}.
1807: \newblock Kluwer Academic Publishers, 2001.
1808: 
1809: \end{thebibliography}
1810: 
1811: 
1812: 
1813: \end{document}
1814: 
1815: