0604:cs0604005/pp.tex

1:

2: \documentclass[11pt,onecolumn,dvips,draftcls]{IEEEtran}

3: % \documentclass[10pt,twocolumn,dvips,final]{IEEEtran}

4: \usepackage{psfig, graphics, amsfonts, amsmath, color, amssymb, amsxtra, times}

5: \definecolor{gray}{cmyk}{.2,0.2,.3,.1}

6: \definecolor{dred}{cmyk}{0,0.9,0.4,0.3}

7: \definecolor{dblue}{rgb}{0,0,0.5}

8: \definecolor{dgreen}{rgb}{0,0.3,0}

9: \definecolor{dgray}{rgb}{0.3,0.3,0}

10: \usepackage[breaklinks=true, colorlinks=true, linkcolor=black, urlcolor=dblue,

11:   citecolor=black, pdfpagemode=None, pdfstartview=FitH]{hyperref}

12:

13: \DeclareOldFontCommand{\rm}{\normalfont\rmfamily}{\mathrm}

14: \DeclareOldFontCommand{\sf}{\normalfont\sffamily}{\mathsf}

15: \DeclareOldFontCommand{\tt}{\normalfont\ttfamily}{\mathtt}

16: \DeclareOldFontCommand{\bf}{\normalfont\bfseries}{\mathbf}

17: \DeclareOldFontCommand{\it}{\normalfont\itshape}{\mathit}

18: \DeclareOldFontCommand{\sl}{\normalfont\slshape}{\@nomath\sl}

19: \DeclareOldFontCommand{\sc}{\normalfont\scshape}{\@nomath\sc}

20:

21: \newtheorem{theorem}{Theorem}

22: \newtheorem{proposition}{Proposition}

23: \newtheorem{lemma}{Lemma}

24: \setcounter{tocdepth}{2}

25: \setlength{\topmargin}{-15mm}

26: \setlength{\textwidth}{17cm}

27: \setlength{\textheight}{23cm}

28: \setlength{\oddsidemargin}{-2mm}

29:

30: \newcommand{\rend}{\hfill$\square$}

31: \newcommand{\tend}{\hfill$\blacksquare$}

32: \newcommand{\epsfig}{\psfig}

33: \newcommand{\fig}[1]{{Fig.~\ref{#1}}}

34: \newcommand{\eq}[1]{\eqref{#1}}

35: \newcommand{\expect}[1]{\ensuremath{\operatorname{E}\left[#1\right]}}

36: \newcommand{\secref}[1]{Section~\ref{#1}}

37: \newcommand{\ie}{i.e.~}

38: \newcommand{\eg}{e.g.~}

39: \newcommand{\theorref}[1]{{\itshape Theorem~\ref{#1}}}

40:

41:

42: \title{Multiterminal Source Coding With Two Encoders--I: A Computable

43:   Outer Bound

44:   \thanks{The author is with the School of Electrical and Computer

45:   Engineering, Cornell University, Ithaca, NY.  URL:

46:   \href{http://cn.ece.cornell.edu/}{{\tt http://cn.ece.cornell.edu/}}.

47:   Work supported by the National Science Foundation, under awards

48:   CCR-0238271 (CAREER), CCR-0330059, and ANR-0325556.}}

49: \author{Sergio D.\ Servetto}

50: \date{November 12, 2006.}

51:

52:

53: \begin{document}

54: \maketitle

55: \thispagestyle{empty}

56:

57: \begin{picture}(0,0)

58: \put(-5,210){\tt\small Submitted to the IEEE Transactions on Information

59:   Theory, April 2006;  Revised,}

60: \put(-5,200){\tt\small November 2006.}

61: \end{picture}

62: \vspace{-4mm}

63:

64: \begin{abstract}

65: \noindent\it

66: In this first part, a computable outer bound is proved for the

67: multiterminal source coding problem, for a setup with two encoders,

68: discrete memoryless sources, and bounded distortion measures.

69: \end{abstract}

70:

71: \vspace{1cm}

72: \noindent{\bf Index terms:} multiterminal source coding, distributed

73: source coding, network source coding, rate-distortion theory, rate-distortion

74: with side information, network information theory.

75:

76: \vspace{9.3cm}

77: \pagebreak

78: \setcounter{page}{1}

79:

80:

81: \section{Introduction}

82:

83: \subsection{The Problem of Multiterminal Source Coding}

84:

85: Consider two dependent sources $X$ and $Y$, with joint distribution

86: $p(xy)$.  These sources are to be encoded by two separate encoders,

87: each of which observes only one of them, and are to be decoded by a

88: single joint decoder.  $X$ is encoded at rate $R_1$ and with average

89: distortion $D_1$, and $Y$ is encoded at rate $R_2$ and with average

90: distortion $D_2$.  This setup is illustrated in Fig.~\ref{fig:setup}.

91:

92: \begin{figure}[!ht]

93: \centerline{\psfig{file=setup.eps,height=3cm,width=12cm}}

94: \caption{System setup for multiterminal source coding.}

95: \label{fig:setup}

96: \end{figure}

97:

98: In the classical {\em multiterminal source coding} problem, as

99: formulated in~\cite{Berger:78, Tung:PhD}, the goal

100: is to determine the region of all achievable rate-distortion tuples

101: $(R_1,R_2,D_1,D_2)$.  Although relatively simple to describe (a

102: formal description is given later), the multiterminal source coding

103: problem was one of the long-standing open problems in information

104: theory -- see, e.g.,~\cite[pg.\ 443]{CoverT:91}.  Furthermore,

105: besides its historical interest, this problem also comes up naturally

106: in the context of a sensor networking problem of interest to

107: us~\cite{BarrosS:06}.

108:

109: Multiterminal source coding has rich history, among which

110: fundamental contributions, in chronological order, are the

111: works of: a) Dobrushin-Tsybakov~\cite{DobrushinT:62}, with the

112: first rate-distortion problem with a Markov chain constraint; b)

113: Slepian-Wolf~\cite{SlepianW:73b}, with the formulation and solution

114: to the first distributed source coding problem, and

115: Cover~\cite{Cover:75b}, with a simpler proof of the Slepian-Wolf

116: result, a proof method widely in use today; c)

117: Ahlswede-K\"orner~\cite{AhlswedeK:75} and Wyner~\cite{Wyner:75},

118: with the first use of an auxiliary random variable to describe

119: the rate region of a source coding problem, and with it the need

120: to introduce proof methods to bound their cardinality; d)

121: Wyner-Ziv~\cite{WynerZ:76}, with the first characterization of a

122: multiterminal rate-distortion function; e) Berger-Tung~\cite{Berger:78,

123: Tung:PhD}, with the first formulation and partial results on the

124: multiterminal source coding problem as formulated in Fig.~\ref{fig:setup};

125: and f) Berger-Yeung~\cite{BergerY:89, Yeung:PhD}, with a complete

126: solution to a more general form of the Wyner-Ziv problem.  For

127: details on these, and on {\em many} more important contributions,

128: as well as for historical information on the problem, the reader

129: is referred to~\cite{BergerS:07}.

130:

131: The setup of Fig.~\ref{fig:setup} represents what we feel was the

132: simplest yet unsolved instance of a multiterminal source coding problem.

133: The problem of Fig.~\ref{fig:setup}, and the CEO problem~\cite{BergerZV:96}

134: are, to the best of our knowledge, the last two known special cases of

135: the general entropy characterization problem of Csisz\'ar and

136: K\"orner~\cite{CsiszarK:80} that remained unsolved.  This hierarchy

137: of problems is illustrated in Fig.~\ref{fig:hierarchy}.

138:

139: \begin{figure}[ht]

140: \centerline{\resizebox{10cm}{6cm}{\input{hierarchy.pstex_t}}}

141: \caption{A hierarchy of problems in multiterminal source coding

142:   with two encoders and one decoder: an arrow from problem X to

143:   problem Y indicates that X is a special case of Y, in the sense

144:   that a solution to Y automatically provides a solution to X.

145:   Abbreviations -- SC: two-terminal lossless source coding;

146:   RD: two-terminal rate-distortion~\cite{Shannon:59}; SW: distributed

147:   coding of dependent sources~\cite{SlepianW:73b}; AK/W: source

148:   coding with side information~\cite{AhlswedeK:75, Wyner:75};

149:   WZ: rate-distortion with side information~\cite{WynerZ:76};

150:   BY: the Berger-Yeung extension of WZ theory~\cite{BergerY:89};

151:   DT: rate-distortion with a remote source~\cite{DobrushinT:62};

152:   BHOTW: a rate-distortion formulation of the Ahlswede-K\"orner-Wyner

153:   problem~\cite{BergerHOTW:79}; CEO: the CEO problem~\cite{BergerZV:96};

154:   MTRD: the problem setup of Fig.~\ref{fig:setup}; EC: the entropy

155:   characterization problem~\cite{CsiszarK:80}.  Asterisks are used

156:   to indicate problems whose solution was previously known.}

157: \label{fig:hierarchy}

158: \end{figure}

159:

160: It should be pointed out though that the setup of Fig.~\ref{fig:setup}

161: is by no means the most general formulation of a multiterminal source

162: coding problem we could have given; there are many other ways in which

163: we could have chosen to formulate these problems: we could have chosen

164: a network with $M$ encoders and a single decoder which attempts to

165: reconstruct $L$ different functions of the sources, we could have

166: considered continuous-alphabet and/or general ergodic sources, we

167: could have considered feedback and interactive communication, we could

168: have studied how this problem relates to the network coding problem,

169: and we could have considered network

170: topologies with multiple decoders as well.  All these alternative

171: possible formulations are discussed in detail in~\cite{BergerS:07}.

172:

173: \subsection{Difficulties in Proving a Converse}

174: \label{sec:difficulties}

175:

176: Among the limited number of references mentioned above, we included

177: the Berger-Tung bounds~\cite{Berger:78, Tung:PhD}.  These bounds do

178: provide the best known descriptions of the region of achievable rates

179: for the problem setup of Fig.~\ref{fig:setup},\footnote{We note that

180: recently, a new outer bound has been proposed for a version of

181: multiterminal source coding

182: that contains the formulation of~\cite{Berger:78, Tung:PhD} considered

183: here as a special case~\cite{Wagner:PhD, WagnerA:05}.  The new

184: bound has many desirable properties: it unifies known bounds custom

185: developed for seemingly different problems, and it provides a conclusive

186: answer for a previously unsolved instance.  However, when specialized

187: to our two-encoder setup, it is unclear if the new bound provides

188: an improvement over the Berger-Tung outer bound.  So, due to the

189: simplicity of the latter, we have chosen here to focus on that one

190: instead of on the more modern form.}

191: and so we elaborate on those now.

192:

193: \medskip\begin{proposition}[Berger-Tung Bounds]

194: \label{prp:bt-bounds}

195: Fix $(D_1,D_2)$.  Let $X$ and $Y$ be two sources out of which

196: pairs of sequences $\big(X^n,Y^n\big)$ are drawn i.i.d.~$\sim p(xy)$;

197: and let $U$ and $V$ be auxiliary variables defined over alphabets

198: $\mathcal{U}$ and $\mathcal{V}$, such that there exist functions

199: $\gamma_1:\mathcal{U}\times\mathcal{V}\to\hat{\mathcal{X}}$

200: and $\gamma_2:\mathcal{U}\times\mathcal{V}\to\hat{\mathcal{Y}}$,

201: for which $\expect{d_1\big(X,\gamma_1(UV)\big)}\leq D_1$ and

202: $\expect{d_2\big(Y,\gamma_2(UV)\big)}\leq D_2$.  Consider rates

203: $(R_1,R_2)$, such that $R_1 \geq I(XY\wedge U|V)$,

204: $R_2 \geq I(XY\wedge V|U)$, and $R_1+R_2 \geq I(XY\wedge UV)$,

205: for some joint distribution $p(xyuv)$.  Now:

206: \begin{itemize}

207: \item for any $p(xyuv)$ that satisfies a Markov chain of the form

208:   $U-X-Y-V$, all rates $(R_1,R_2)$ obtained for any such

209:   $p$ are achievable;

210: \item for the $p(xyuv)$ that satisfy the two Markov chains

211:   $U-X-Y$ and $X-Y-V$, every achievable rate pair $(R_1,R_2)$ lies in

212:   the union, over all such $p(xyuv)$, of the sets of rates defined

213:   above;

214: \end{itemize}

215: that is, the first condition defines an {\em inner} bound, and the second

216: an {\em outer} bound to the rate region. \rend

217: \end{proposition}\medskip

218:

219: The regions defined by these bounds, when regarded as images of maps

220: that transform probability distributions into rate pairs, have a

221: property that is a source of many difficulties: the mutual information

222: expressions that define the inner and the outer bounds are identical,

223: it is only the {\em domains} of the two maps that differ; as such,

224: comparing the resulting regions is difficult.  This difference between

225: the inner and outer bounds has been the state of affairs in multiterminal

226: source coding since 1978.

227:

228: A close examination of these distributions suggested to us that the

229: gap might not be due to a suboptimal coding strategy used in the inner

230: bound, but instead that perhaps the outer bound allows for the inclusion

231: of dependencies that cannot be physically realized by any distributed

232: code.  Consider these distributions:

233: \begin{itemize}

234: \item For the inner bound, $p(xyuv)$ = $p(xy)p(u|x)p(v|y)$.

235: \item For the outer bound, $p(xyuv)$

236:   = $p(xy)p(u|x)p(v|y\underline{xu})$

237:   = $p(xy)p(v|y)p(u|x\underline{yv})$.

238: \end{itemize}

239: If we choose to interpret $U$ and $V$ as instantaneous descriptions

240: of encodings of $X$ and $Y$,

241: then we see that the outer bound says that the encoding

242: $V$ is allowed to contain information about $X$ {\em beyond} that

243: which can be extracted from $Y$, and likewise for $U$ and

244: $Y$.\footnote{Note: this interpretation comes from the inner bound,

245: and is only justified for {\em blocks}.  $U^n$ does represent an encoding

246: of $X^n$, but it would be incorrect to say that the variable $U$ is

247: an encoding of $X$ (and likewise for $V$ and $Y$).  These insights

248: can only be carried so far, but at this point we are only trying to

249: build some intuition, and thus it is permissible to take such liberties.}

250: Motivated by this observation, in the first part of this work we

251: set ourselves the goal of finding a new outer bound.
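
As a quick check of the two factorizations listed above, we note that both follow from the chain rule (this short derivation is added only as a sketch).  For any joint distribution,
\[ p(xyuv) \;=\; p(xy)\,p(u|xy)\,p(v|xyu). \]
Under the long chain $U-X-Y-V$ we have $p(u|xy)=p(u|x)$ and $p(v|xyu)=p(v|y)$, which gives the inner-bound form $p(xy)p(u|x)p(v|y)$.  Under the two short chains, $U-X-Y$ still gives $p(u|xy)=p(u|x)$, but $X-Y-V$ only requires $p(v|xy)=p(v|y)$ and places no restriction on how $V$ depends on $U$; the factor $p(v|xyu)$ therefore cannot be reduced further, and we are left with
\[ p(xyuv) \;=\; p(xy)\,p(u|x)\,p(v|y\underline{xu}). \]
The second outer-bound form follows in the same way, by expanding in the order $p(xy)\,p(v|xy)\,p(u|xyv)$.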

252:

253: \subsection{An Interpretation of Distributed Rate-Distortion Codes

254:   as Constrained Source Covers}

255:

256: In Part I of this paper we present a finitely parameterized outer

257: bound for the region of achievable rates of the multiterminal source

258: coding problem of Fig.~\ref{fig:setup}, based on what we

259: believe is an original proof technique.  Some highlights of that

260: proof method, formally developed in later sections, are provided here.

261:

262: \subsubsection{Rate-Distortion Codes $\equiv$ Source Covers}

263: \label{sec:intro-distributed-covers}

264:

265: Our proof tightens existing converses by means of identifying a

266: constraint that {\em all} codes are subject to, but that is not

267: captured by any existing outer bound.  To explain what the constraint

268: is, the easiest way to get started is by drawing an analogy to

269: classical, two-terminal rate-distortion codes.

270:

271: In the standard, two-terminal rate-distortion problem, a generic

272: code consists of the following elements:

273: \begin{itemize}

274: \item A block length $n$.

275: \item A cover $\big\{ \mathbf{S}_i \;:\; i=1...2^{nR} \big\}$ of the source

276:   $\mathcal{X}^n$.

277: \item A reconstruction sequence $\hat{\mathbf{x}}^n(i)$, associated to each

278:   cover element $\mathbf{S}_i$.

279: \end{itemize}

280: Given this description, an encoder $f:\mathcal{X}^n\to\{1...2^{nR}\}$

281: makes $f\big(\mathbf{x}^n\big)=i$ for some source sequence $\mathbf{x}^n$

282: and some index $i$, if

283: $\mathbf{x}^n\in \mathbf{S}_i$, with ties broken arbitrarily; a decoder

284: $g:\{1...2^{nR}\}\to

285: \hat{\mathcal{X}}^n$ simply maps $g(i)=\hat{\mathbf{x}}^n(i)$.  And we say

286: that the encoder/decoder pair $(f,g)$ satisfies a distortion constraint $D$

287: if, roughly, $P\Big(d\big(\mathbf{x}^n,g(f(\mathbf{x}^n))\big)\leq D\Big)

288: \approx 1$, for all $n$ large enough.  Such a representation is illustrated

289: in Fig.~\ref{fig:covers-classical}.

290:

291: \begin{figure}[ht]

292: \centerline{\resizebox{15cm}{4cm}{\input{covers-classical.pstex_t}}}

293: \vspace{-2mm}

294: \caption{Cover-based representation of a classical rate-distortion code.}

295: \label{fig:covers-classical}

296: \end{figure}

297:

298: In an analogous manner, we specify an arbitrary {\em distributed}

299: rate-distortion code as follows:

300: \begin{itemize}

301: \item A block length $n$.

302: \item {\em Two} covers:

303:   \begin{itemize}

304:   \item A cover $\big\{ \mathbf{S}_{1,i} \;:\; i=1...2^{nR_1}\big\}$ of the

305:     source $\mathcal{X}^n$.

306:   \item A cover $\big\{ \mathbf{S}_{2,j} \;:\; j=1...2^{nR_2}\big\}$ of the

307:     source $\mathcal{Y}^n$.

308:   \end{itemize}

309:   Indirectly, these two covers specify a cover $\mathbf{S}_{ij}\;\triangleq\;

310:     \big\{ \mathbf{S}_{1,i}\times\mathbf{S}_{2,j} : i=1...2^{nR_1},

311:     j=1...2^{nR_2}\big\}$ of the product alphabet $\mathcal{X}^n\times

312:     \mathcal{Y}^n$.

313: \item For each cover element $\mathbf{S}_{ij}$, we specify

314:   {\em two} reconstruction sequences

315:   $\big(\hat{\mathbf{x}}^n(ij),\hat{\mathbf{y}}^n(ij)\big)$.

316: \end{itemize}

317: Given this description, an encoder $f_1:\mathcal{X}^n\to\{1...2^{nR_1}\}$

318: for node 1 makes $f_1\big(\mathbf{x}^n\big)=i$ for some source sequence

319: $\mathbf{x}^n$ and some index $i$, if $\mathbf{x}^n\in\mathbf{S}_{1,i}$,

320: with ties broken arbitrarily (and similarly for an encoder $f_2$ at node 2);

321: a decoder $g:\{1...2^{nR_1}\}\times\{1...2^{nR_2}\}\to

322: \hat{\mathcal{X}}^n\times\hat{\mathcal{Y}}^n$ simply maps

323: $g(i,j)=\big(\hat{\mathbf{x}}^n(ij),\hat{\mathbf{y}}^n(ij)\big)$.

324: And we say that the distributed code $(f_1,f_2,g)$ satisfies two distortion

325: constraints $D_1$ and $D_2$ if, roughly,

326: $P\Big(d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)\leq D_1

327: \mbox{ and }

328: d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)\leq D_2\Big)\approx 1$,

329: for all $n$ large enough, and for

330: $\big(\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)=

331: g\big(f_1(\mathbf{x}^n),f_2(\mathbf{y}^n)\big)$.  Such a representation is

332: illustrated in Fig.~\ref{fig:covers-distributed}.

333:

334: \begin{figure}[ht]

335: \centerline{\psfig{file=covers-distributed.eps,height=6cm,width=16cm}}

336: \caption{Cover-based representation of a {\em distributed} rate-distortion

337:   code.}

338: \label{fig:covers-distributed}

339: \end{figure}

340:

341: \subsubsection{Constraints on the Structure of Source Covers}

342:

343: Our main insight is that, whereas in the classical problem

344: any arbitrary cover defines a valid rate-distortion code, in multiterminal

345: source coding this is no longer the case: {\em covers of the product source

346: $\mathcal{X}^n\times\mathcal{Y}^n$ only of the form $\mathbf{S}_{ij}

347: = \mathbf{S}_{1,i}\times\mathbf{S}_{2,j}$ can be realized by distributed

348: codes}.  The significance of this requirement is illustrated with an

349: example in Fig.~\ref{fig:binary-example}.

350:

351: \begin{figure}[!h]

352: \centerline{\psfig{file=dtyp-bin.eps,height=6cm}\hspace{1cm}

353:             \psfig{file=dtyp-ex.eps,height=6cm}}

354: \caption{An example, to illustrate the significance of the

355:   requirement that cover elements $\mathbf{S}_{ij}$ take a product form.

356:   Let $\mathcal{X}=\mathcal{Y}=\{0,1\}$, and $p(xy)=p(x)p(y|x)$

357:   specified by a $p(x)$ such that $P(X=0)=P(X=1)=\frac 1 2$,

358:   and $p(y|x)$ a binary symmetric channel with crossover probability

359:   $p_c$.  Left: for each typical $\mathbf{x}^n$, there is a ``ring'' of

360:   $\mathbf{y}^n$'s jointly typical with it, centered at $\mathbf{x}^n$

361:   and of radius $\approx np_c$.  Right: consider pairs

362:   $\big(\mathbf{x}_1^n\mathbf{y}_1^n\big)$ and $\big(\mathbf{x}_2^n

363:   \mathbf{y}_2^n\big)$ in $\mathbf{S}_{ij}$; dashed circles denote

364:   distortion balls centered at $\hat{\mathbf{x}}^n(ij)$ and

365:   $\hat{\mathbf{y}}^n(ij)$ (with the centers omitted, for clarity),

366:   and dark shaded regions denote the intersection of two rings.

367:   Suppose now that all four pairs $(\mathbf{x}_1^n\mathbf{y}_1^n)$,

368:   $(\mathbf{x}_1^n\mathbf{y}_2^n)$, $(\mathbf{x}_2^n\mathbf{y}_1^n)$,

369:   and $(\mathbf{x}_2^n\mathbf{y}_2^n)$ are in $T_\epsilon^n\big(XY\big)$.

370:   Because $\mathbf{S}_{ij}= \mathbf{S}_{1,i}\times\mathbf{S}_{2,j}$,

371:   {\em all four pairs must be in $\mathbf{S}_{ij}$ as well:} the decoder

372:   does not have enough information to discriminate among these pairs.

373:   No such constraint exists with a centralized encoder.}

374: \label{fig:binary-example}

375: \end{figure}
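
To make the product requirement concrete, consider a toy instance (the specific sets are chosen here only for illustration).  Suppose $\mathbf{S}_{1,i}=\big\{\mathbf{x}_1^n,\mathbf{x}_2^n\big\}$ and $\mathbf{S}_{2,j}=\big\{\mathbf{y}_1^n,\mathbf{y}_2^n\big\}$.  Then
\[ \mathbf{S}_{ij} \;=\; \mathbf{S}_{1,i}\times\mathbf{S}_{2,j} \;=\;
   \big\{ (\mathbf{x}_1^n\mathbf{y}_1^n),\, (\mathbf{x}_1^n\mathbf{y}_2^n),\,
          (\mathbf{x}_2^n\mathbf{y}_1^n),\, (\mathbf{x}_2^n\mathbf{y}_2^n) \big\}.
\]
By contrast, a centralized encoder observing both sources could use the cover element $\big\{(\mathbf{x}_1^n\mathbf{y}_1^n),(\mathbf{x}_2^n\mathbf{y}_2^n)\big\}$, which is not of product form: any product set containing these two pairs must contain the two ``cross'' pairs $(\mathbf{x}_1^n\mathbf{y}_2^n)$ and $(\mathbf{x}_2^n\mathbf{y}_1^n)$ as well.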

376:

377: From the informal argument of Fig.~\ref{fig:binary-example},

378: we see how the fact that distributed codes produce covers only of the

379: form $\mathbf{S}_{ij}=\mathbf{S}_{1,i}\times\mathbf{S}_{2,j}$ results

380: in constraints on the sets used to cover the typical set

381: $T_\epsilon^n\big(XY\big)$: there are certain groups of typical

382: sequences that cannot be broken, in the sense that either all of them

383: appear together in a cover element $\mathbf{S}_{ij}$, or none of them

384: appear.  We believe this is significant for two main reasons:

385: \begin{itemize}

386: \item If we compare to a classical rate-distortion code, this constraint

387:   is clearly not there.  Provided the distortion constraints are met, a

388:   classical code would be able to split the typical set into distortion

389:   balls, without any further constraints.

390: \item More fundamentally though, we view this constraint as a form of

391:   ``independence,'' reminiscent to us of the extra independence assumption

392:   required by the long Markov chain used in the definition of the

393:   Berger-Tung inner bound, which is not there in the definition of the

394:   outer bound, as highlighted in Section~\ref{sec:difficulties} earlier.

395: \end{itemize}

396: This latter observation is perhaps the strongest piece of evidence that

397: suggested to us that the Berger-Tung inner bound might be tight.

398:

399: \subsection{Main Contributions and Organization of the Paper}

400:

401: The main contribution presented in Part I of this paper is the

402: development of an outer bound to the region of achievable rates

403: for multiterminal source coding.  This outer bound has two salient

404: properties that distinguish it from existing bounds in the literature:

405: \begin{itemize}

406: \item it is based on explicitly modeling a constraint on the

407:   structure of codes that, as we understand things, had not been

408:   captured by any previously developed bound;

409: \item and also unlike existing bounds, it is finitely parameterized.

410: \end{itemize}

411: We believe that this outer bound coincides with the set of achievable

412: rates defined by the Berger-Tung inner bound.  This issue is thoroughly

413: explored in Part II of this paper, in the context of our study of

414: algorithmic issues involved in the effective computation of this bound.

415:

416: The rest of this paper is organized as follows.  In

417: Section~\ref{sec:preliminaries} we define our notation, and state

418: our main result.  In Section~\ref{sec:aux-lemmas} we state and prove

419: some auxiliary lemmas that greatly simplify the proof of the main

420: theorem, a proof that is fully developed in Section~\ref{sec:main-proof}.

421: The paper concludes with an extensive discussion on our main result

422: and its implications, in Section~\ref{sec:discussion}.

423:

424:

425: \section{Preliminaries}

426: \label{sec:preliminaries}

427:

428: \subsection{Definitions and Notation}

429:

430: First, a word about notation.  Random variables are denoted with

431: capital letters, e.g., $X$.  Realizations of these variables are

432: denoted with lower case letters: e.g., $X=x$ means that the random

433: variable $X$ takes on the value $x$.  Script letters are typically

434: used to denote alphabets, e.g., the random variable $X$ takes values

435: on an alphabet $\mathcal{X}$.  The alphabets of all random variables

436: considered in this work are always assumed finite.  Sets in general

437: are denoted by capital boldface symbols, e.g., $\mathbf{S}$.

438: The size of a set is denoted by $\big|\mathbf{S}\big|$.  A

439: probability mass function on $\mathcal{X}$ is denoted by $p_X(x)$,

440: or simply $p(x)$ when the variable that it applies to is clear from

441: the context.  Sequences of elements from an alphabet $\mathcal{X}$

442: are denoted by boldface symbols $\mathbf{x}^n$,

443: and the $i$-th element of such a sequence by $\mathbf{x}_i$; the sequence is an element

444: of the extension alphabet $\mathcal{X}^n$.  The expression

445: $\mathbf{x}_i^{j,n}$ denotes a subsequence of $\mathbf{x}^n$ consisting

446: of the elements $[\mathbf{x}_i,\mathbf{x}_{i+1},...,\mathbf{x}_j]$,

447: whenever $i\leq j$, otherwise it denotes an empty sequence; also,

448: sometimes the length $n$ of the sequence will be clear from the

449: context, and then we simply write $\mathbf{x}_i^j$ instead of

450: $\mathbf{x}_i^{j,n}$, whenever this does not cause confusion.  The

451: expression $\mathbf{x}^{-i,n}$ denotes the sequence

452: $[\mathbf{x}_1,...,\mathbf{x}_{i-1},\mathbf{x}_{i+1},...,\mathbf{x}_n]$,

453: and again, we write this as $\mathbf{x}^{-i}$ whenever $n$ is

454: clear from the context.  The same conventions are followed for

455: sequences of random variables.
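
For instance, taking $n=5$ and the (purely illustrative) sequence $\mathbf{x}^5=[a,b,c,d,e]$, these conventions give
\[ \mathbf{x}_2^{4,5}=\mathbf{x}_2^4=[b,c,d], \qquad
   \mathbf{x}_4^{2}=[\,] \mbox{ (empty)}, \qquad
   \mathbf{x}^{-3,5}=\mathbf{x}^{-3}=[a,b,d,e].
\]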

456:

457: Given a boolean predicate $b(\mathbf{x})$ depending on a variable

458: $\mathbf{x}$, we write $1_{\{b(\mathbf{x})\}}$ to denote

459: the indicator function for the predicate: this is a function that

460: takes the value 1 whenever $b(\mathbf{x})$ is true, and 0 whenever

461: it is false.  Given a sequence $\mathbf{x}^n\in\mathcal{X}^n$,

462: and an element $x\in\mathcal{X}$, we denote by $N(x;\mathbf{x}^n)$

463: the type of $\mathbf{x}^n$, defined as

464: $N(x;\mathbf{x}^n)=\sum_{i=1}^n 1_{\{\mathbf{x}_i=x\}}$.  Then,

465: for any random variable $X$, any real number $\epsilon>0$, and

466: any integer $n>0$, we denote by $T_\epsilon^n(X)$ the strongly typical

467: set of $X$ with parameters $n$ and $\epsilon$, defined as

468: \[ T_\epsilon^n(X) \;\;=\;\; \Big\{ \mathbf{x}^n\in\mathcal{X}^n \;\Big|\;

469:    \forall x\in\mathcal{X}:

470:    \big|\mbox{$\frac 1 n$}N(x;\mathbf{x}^n)-p_X(x)\big|

471:    < \mbox{$\frac\epsilon{|\mathcal{X}|}$} \Big\}.

472: \]

473: In some situations, we need to compare typical sets defined for

474: the same set of variables, but induced by different distributions on

475: these variables.  To resolve this ambiguity, we denote by

476: $T_\epsilon^n\big(X\big)[p_X]$ the typical set corresponding to a

477: distribution $p_X$.  The same convention is followed when there is

478: similar ambiguity in the evaluation of entropies (denoted

479: $H\big(X\big)[p_X]$), and mutual information expressions (denoted

480: $I\big(X\wedge Y\big)[p_{XY}]$).
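
As a small worked example of the definition of $T_\epsilon^n(X)$ above (the numbers are chosen here only for concreteness), let $\mathcal{X}=\{0,1\}$, $p_X(0)=p_X(1)=\frac 1 2$, $n=8$ and $\epsilon=\frac 1 2$, so that the threshold is $\frac\epsilon{|\mathcal{X}|}=\frac 1 4$.  For $\mathbf{x}^8=00101101$ we have $N(0;\mathbf{x}^8)=N(1;\mathbf{x}^8)=4$, hence
\[ \big|\mbox{$\frac 1 8$}N(x;\mathbf{x}^8)-p_X(x)\big| \;=\; 0 \;<\; \mbox{$\frac 1 4$}
   \quad\mbox{for } x\in\{0,1\},
\]
and $\mathbf{x}^8\in T_\epsilon^8(X)$.  For $\mathbf{x}^8=00000001$ we have $N(0;\mathbf{x}^8)=7$, and $\big|\frac 7 8-\frac 1 2\big|=\frac 3 8>\frac 1 4$, so this sequence is not strongly typical.  In this example a sequence is typical exactly when $N(0;\mathbf{x}^8)\in\{3,4,5\}$, which accounts for $\binom 8 3+\binom 8 4+\binom 8 5=182$ of the $2^8=256$ binary sequences of length 8.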

481:

482: Vector extensions $N(xy;\mathbf{x}^n\mathbf{y}^n)$, $T_\epsilon^n(XY)$,

483: etc., are defined by considering the same definitions as above, over a

484: suitable product alphabet $\mathcal{X}\times\mathcal{Y}$.  Similarly,

485: given two random variables $X$ and $Y$, a joint probability mass

486: function $p_{XY}(xy)$,

487: and a sequence $\mathbf{y}^n$, we denote by $T_\epsilon^n(X|\mathbf{y}^n)$

488: the conditional typical set of $X$ given $\mathbf{y}^n$, defined as

489: \[ T_\epsilon^n\big(X\big|\mathbf{y}^n\big)

490:    \;\;=\;\; \Big\{ \mathbf{x}^n\in\mathcal{X}^n \;\Big|\;

491:    \forall x\in\mathcal{X},y\in\mathcal{Y}:

492:    \big|\mbox{$\frac 1 n$}N(xy;\mathbf{x}^n\mathbf{y}^n)-p_{XY}(xy)\big|

493:    < \mbox{$\frac\epsilon{|\mathcal{X}||\mathcal{Y}|}$} \Big\}.

494: \]

495: We will also consider situations where we need to refer to the set of

496: all typical sequences which are jointly typical with at least one of a

497: group.  In that case, for a set $\mathbf{S}\subseteq\mathcal{Y}^n$, we

498: write

499: \[ T_\epsilon^n\big(X\big|\mathbf{S}\big)

500:    \;\;=\;\; \bigcup_{\mathbf{y}^n\in\mathbf{S}}

501:              T_\epsilon^n\big(X\big|\mathbf{y}^n\big).

502: \]

503:

504: Given any $\epsilon>0$, we often need to refer

505: to quantities which are deterministic functions of $\epsilon$, having

506: the property that as $\epsilon\to 0$, these quantities also vanish.

507: Such small quantities are denoted by $\epsilon_1$, $\epsilon_2$,

508: $\dot\epsilon$, $\ddot\epsilon$, $\epsilon'$, $\epsilon''$, etc.;

509: and the value of $\epsilon$ on which they depend is either mentioned

510: explicitly or should be clear from the context.

511:

512: Consider two random variables

513: $X$ and $Y$ with joint distribution $p(xy)$.  $T_\epsilon^n\big(X\big)$

514: is the usual typical set.  Sometimes we also need to consider

515: the set $S_{\epsilon,Y}^n(X)\triangleq\Big\{\mathbf{x}^n\,\Big|\,

516: T_\epsilon^n\big(Y\big|\mathbf{x}^n\big)\neq\emptyset\Big\}$.  Clearly,

517: $S_{\epsilon,Y}^n(X)\subseteq T_\epsilon^n\big(X\big)$.  But we also

518: know from~\cite[Ch.\ 5]{Yeung:01}, that

519: $\Big|\frac 1 n\log\big|S_{\epsilon,Y}^n(X)\big|-H(X)\Big|<\dot\epsilon$.

520: That is, although there may exist strongly typical sequences $\mathbf{x}^n$

521: for which there are no sequences $\mathbf{y}^n$ jointly typical with them,

522: these $\mathbf{x}^n$'s form a set of vanishing measure.

523:

524: Some standard operations on sets are intersection

525: ($\mathbf{A}\cap\mathbf{B}$), union ($\mathbf{A}\cup\mathbf{B}$),

526: complementation ($\mathbf{A}^c$) and difference

527: ($\mathbf{A}\backslash\mathbf{B}$).   The set of all subsets of

528: $\mathbf{S}$ is denoted by $2^{\mathbf{S}}$.  The convex closure of $\mathbf{S}$

529: is denoted by

530: $\overline{\mathbf{S}}=\bigcap\big\{\mathbf{S}'\;\big|\;\mathbf{S}\subseteq

531: \mathbf{S}'\,\wedge\,\mathbf{S}'\mbox{ is closed and convex}\big\}$.

532: Given a set $\mathbf{S}$, a cover of size $N$ of $\mathbf{S}$ is a

533: collection of sets $\mathcal{S}=\big\{\mathbf{S}_i:i=1...N\big\}$,

534: such that $\mathbf{S}\subseteq\bigcup_{i=1}^N\mathbf{S}_i$.  If a

535: cover further satisfies that $\mathbf{S}_i\cap \mathbf{S}_j=\emptyset$

536: ($1\leq i\neq j\leq N$), and that $\mathbf{S}=\bigcup_{i=1}^N

537: \mathbf{S}_i$, then we say that $\mathcal{S}$ is a {\em partition}

538: of $\mathbf{S}$.
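
To illustrate the distinction (with a set chosen here only as an example), let $\mathbf{S}=\{1,2,3,4\}$.  The collection $\big\{\{1,2\},\{2,3,4\}\big\}$ is a cover of size 2 of $\mathbf{S}$, but not a partition, since its two elements intersect in $\{2\}$.  The collection $\big\{\{1,2,5\},\{3,4\}\big\}$ is also a cover, because only $\mathbf{S}\subseteq\bigcup_i\mathbf{S}_i$ is required, but it is not a partition either, since its union is strictly larger than $\mathbf{S}$.  Finally, $\big\{\{1,2\},\{3,4\}\big\}$ is both a cover and a partition of $\mathbf{S}$.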

539:

540: Consider two sets, $\mathbf{A}$ and $\mathbf{B}$, for which

541: $P\big(\mathbf{B}\big|\mathbf{A}\big)=1$: clearly,

542: $P\big(\mathbf{A}\cap\mathbf{B}\big)=P\big(\mathbf{A}\big)$,

543: and hence $\mathbf{A}\subseteq\mathbf{B}$, except perhaps for

544: a set of measure zero.  If instead we have a slightly weaker

545: condition, namely that $P\big(\mathbf{B}\big|\mathbf{A}\big)>1-\epsilon$,

546: then we say that $\mathbf{A}$ is {\em weakly included} in $\mathbf{B}$,

547: and we denote this by $\mathbf{A}\subseteq_\epsilon\mathbf{B}$.
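
As a numerical illustration (with values assumed here only for concreteness), suppose $P\big(\mathbf{A}\big)=0.5$ and $P\big(\mathbf{A}\cap\mathbf{B}\big)=0.49$.  Then
\[ P\big(\mathbf{B}\big|\mathbf{A}\big) \;=\; \frac{P\big(\mathbf{A}\cap\mathbf{B}\big)}{P\big(\mathbf{A}\big)}
   \;=\; \frac{0.49}{0.5} \;=\; 0.98 \;>\; 1-\epsilon
   \quad\mbox{for } \epsilon=0.05,
\]
so $\mathbf{A}\subseteq_\epsilon\mathbf{B}$ holds for this $\epsilon$, even though $\mathbf{A}\backslash\mathbf{B}$ has probability $0.01>0$ and thus $\mathbf{A}$ is not included in $\mathbf{B}$ in the ordinary sense.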

548:

549: \subsection{Distributed Rate-Distortion Codes}

550:

551: Consider two sources $X$ and $Y$, out of which random pairs of

552: sequences $\big(X^n,Y^n\big)$ are drawn i.i.d.~$\sim p(xy)$ from two

553: finite alphabets, denoted $\mathcal{X}$ and

554: $\mathcal{Y}$, and reproduced with elements of two other alphabets

555: $\hat{\mathcal{X}}$ and $\hat{\mathcal{Y}}$.  The two sources

556: $X$ and $Y$ are processed by two separate encoders.  The

557: {\em encoders} are two functions:

558: \[ f_1:\; \mathcal{X}^n \;\;\to\;\; \big\{1,2,\dots,2^{nR_1}\big\}

559:    \mbox{\hspace{1cm}and\hspace{1cm}}

560:    f_2:\; \mathcal{Y}^n \;\;\to\;\; \big\{1,2,\dots,2^{nR_2}\big\}.

561: \]

562: These encoding functions map a block of $n$ source symbols to discrete

563: indices.  The {\em decoder} is a function

564: \[ g:\;\big\{1,2,\dots,2^{nR_1}\big\}\times\big\{1,2,\dots,2^{nR_2}\big\}

565:        \;\;\to\;\; \hat{\mathcal{X}}^n \times \hat{\mathcal{Y}}^n, \]

566: which maps a pair of indices into two blocks of reconstructed

567: source sequences.

568:

569: Two distortion

570: measures $d_1:\mathcal{X}\times\hat{\mathcal{X}}\to[0,\infty)$ and

571: $d_2:\mathcal{Y}\times\hat{\mathcal{Y}}\to[0,\infty)$ are used to

572: define reconstruction quality.  Since $\infty$ is not in their

573: range and the alphabets are finite, these distortion measures

574: are necessarily bounded, and we denote their largest values by

575: $\max\limits_{x\in\mathcal{X},\hat x\in\hat{\mathcal{X}}}

576: d_1(x,\hat x)\triangleq d_{1,\mbox{\tiny MAX}}$,

577: $\max\limits_{y\in\mathcal{Y},\hat y\in\hat{\mathcal{Y}}}

578: d_2(y,\hat y) \triangleq d_{2,\mbox{\tiny MAX}}$, and

579: $\max\big(d_{1,\mbox{\tiny MAX}},d_{2,\mbox{\tiny MAX}}\big)

580: \triangleq d_{\mbox{\tiny MAX}}<\infty$.

581: $d_1^n\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)

582: \triangleq\frac 1 n\sum_{i=1}^n d_1\big(x_i,\hat x_i\big)$

583: and $d_2^n\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)

584: \triangleq\frac 1 n\sum_{i=1}^n d_2\big(y_i,\hat y_i\big)$

585: denote the corresponding extensions to blocks.  Oftentimes, the

586: symbols $d_1$ and $d_2$ are used for both the single-letter

587: and the block extensions; which is the intended meaning should

588: be clear from the context.  For any distortion measure

589: $d:\mathcal{X}^n\times\hat{\mathcal{X}}^n\to[0,\infty)$, an element

590: $\hat{\mathbf{x}}^n\in\hat{\mathcal{X}}^n$ and a number $D\geq 0$,

591: a ``ball'' of radius $D$ centered at $\hat{\mathbf{x}}^n$ is the

592: set $B\big(\hat{\mathbf{x}}^n,D\big)=\big\{\mathbf{x}^n\in\mathcal{X}^n

593: \,\big|\,d\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)<D\big\}$

594: (and similarly for a ball $B\big(\hat{\mathbf{y}}^n,D\big)$).

595: For any $D$, $D^+$ is shorthand for $D+\dot\epsilon$, for an

596: $\epsilon$ that is always clear from the context.
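
A small worked example may help fix ideas.  It assumes binary alphabets
$\mathcal{X}=\hat{\mathcal{X}}=\{0,1\}$ and the Hamming distortion
$d_1(x,\hat x)=1$ if $x\neq\hat x$ and $0$ otherwise; both choices are
made here only for illustration.  For $n=4$,
$\mathbf{x}^n=(0,1,1,0)$ and $\hat{\mathbf{x}}^n=(0,1,0,0)$, the two
sequences differ only in the third position, so
\[ d_1^n\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)
   \;=\; \mbox{$\frac 1 4$}\big(0+0+1+0\big) \;=\; \mbox{$\frac 1 4$}.
\]
Hence $\mathbf{x}^n\in B\big(\hat{\mathbf{x}}^n,D\big)$ if and only if
$D>\frac 1 4$; in particular, $\mathbf{x}^n$ is not in the ball of radius
exactly $\frac 1 4$, because the inequality defining the ball is strict.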

597:

598: Fix now encoders and decoder $(f_1,f_2,g)$ operating on blocks of length

599: $n$, and a real number $\epsilon>0$.  If we have that

600: \begin{equation}

601:    P\Big( \Big\{ \big(\mathbf{x}^n\mathbf{y}^n\big) \;\Big|\;

602:           \big(\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)

603:           = g\big(f_1(\mathbf{x}^n),f_2(\mathbf{y}^n)\big) \,\wedge\,

604:           d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)

605:           < D_1^+ \,\wedge\,

606:           d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)

607:           < D_2^+ \Big\}

608:     \Big) \;\;\geq\;\; 1-\dot{\epsilon},

609:   \label{eq:distortion-constraint}

610: \end{equation}

611: then we say that $(f_1,f_2,g)$ satisfies the $(\epsilon,D_1,D_2)$-distortion

612: constraint.\footnote{This form of a distortion constraint is referred

613: to as an {\it $\epsilon$-fidelity criterion} in~\cite[pg.\ 123]{CsiszarK:81}.

614: An alternative form to this ``local'' condition is given by requiring a

615: ``global'' average constraint of the form

616: $\expect{d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)}<D_1^+$ and

617: $\expect{d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)}<D_2^+$.  For

618: the purpose of our developments, the local form lends itself more

619: readily to analysis, and hence is the one we adopt.}
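
The two forms of the constraint are closely related.  As a sketch (this
short derivation is not part of the original argument), let $\mathbf{G}$
denote the event inside the probability in
eqn.~\eqref{eq:distortion-constraint}; the symbol $\mathbf{G}$ is
introduced only for this remark.  On $\mathbf{G}$ we have
$d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)<D_1^+$, and off
$\mathbf{G}$ we still have
$d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)\leq d_{\mbox{\tiny MAX}}$.
Therefore, if $P\big(\mathbf{G}\big)\geq 1-\dot\epsilon$,
\[ \expect{d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)}
   \;\;\leq\;\; D_1^+\,P\big(\mathbf{G}\big)
         + d_{\mbox{\tiny MAX}}\big(1-P\big(\mathbf{G}\big)\big)
   \;\;\leq\;\; D_1^+ + \dot\epsilon\,d_{\mbox{\tiny MAX}},
\]
and similarly for $d_2$.  So a code meeting the local constraint also
meets an average constraint with a slightly larger slack; this is one
place where the boundedness of the distortion measures is convenient.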

620:

621: \subsection{Achievable Rates}

622:

623: A $\big(2^{nR_1},2^{nR_2},n,\epsilon,D_1,D_2\big)$ distributed

624: rate-distortion code is defined by a block length $n$, a

625: parameter $\epsilon>0$, two encoding functions $f_1$ and $f_2$

626: with ranges of size $2^{nR_1}$ and $2^{nR_2}$, and a decoding

627: function $g$, such that $(f_1,f_2,g)$ satisfies the

628: $\big(\epsilon,D_1,D_2\big)$-distortion constraints.

629:

630: We say that the rate-distortion tuple $(R_1,R_2,D_1,D_2)$ is

631: $\epsilon$-{\em achievable} if a

632: $\big(2^{nR_1},2^{nR_2},n,\epsilon,D_1,D_2\big)$ distributed

633: code exists; for fixed parameters $\big(\epsilon,D_1,D_2\big)$,

634: we denote the set of all $\epsilon$-achievable pairs $(R_1,R_2)$

635: by $\mathcal{R}_\epsilon(D_1,D_2)$.  Then, the {\em rate region}

636: ${\cal R}^*(D_1,D_2)$ of the two sources is defined by

637: \[ \mathcal{R}^*(D_1,D_2)

638:      \;\;\triangleq\;\; \bigcap_{\epsilon>0}\,\mathcal{R}_\epsilon(D_1,D_2).

639: \]

640:

641: Now we are going to describe a different set of rates.  Define

642: $\mathbb{P}_{\mbox{\tiny LB}}$ to be the set of all probability

643: distributions $p(xy\hat x\hat y)$ over

644: $\mathcal{X}\times\mathcal{Y}\times\hat{\mathcal{X}}\times\hat{\mathcal{Y}}$,

645: such that:

646: \begin{itemize}

647: \item $p(xy\hat x\hat y)=p(\hat x\hat y)p(x|\hat x\hat y)p(y|\hat x\hat y)$

648:   (that is, $X-\hat X\hat Y-Y$ forms a Markov chain);

649: \item $p_{XY}=\sum_{\hat x\hat y}

650:   p(\hat x\hat y)p(x|\hat x\hat y)p(y|\hat x\hat y)$ ($p_{XY}$ is the source);

651: \item and $\expect{d_1\big(X,\hat X\big)}\leq D_1$ and

652:   $\expect{d_2\big(Y,\hat Y\big)}\leq D_2$.

653: \end{itemize}

654: Then, for each $p\in\mathbb{P}_{\mbox{\tiny LB}}$, define

655: \[ \mathcal{R}\big(D_1,D_2,p\big)\;\;\triangleq\;\;

656:    \left\{ (R_1,R_2)\;\left|\;\begin{array}{rcl}

657:            R_1 & \geq & I\big(X\wedge\hat X\hat Y\big|Y\big)[p] \\

658:            R_2 & \geq & I\big(Y\wedge\hat X\hat Y\big|X\big)[p] \\

659:            R_1+R_2 & \geq & I\big(XY\wedge\hat X\hat Y\big)[p]

660:            \end{array}\right.\right\},

661: \]

662: and define also $\mathcal{R}^o(D_1,D_2)\triangleq

663: \bigcup_{p\in\mathbb{P}_{\mbox{\tiny LB}}}\mathcal{R}\big(D_1,D_2,p\big)$.
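
As a consistency check on the three constraints (a short derivation added
here, using only the Markov condition $X-\hat X\hat Y-Y$ that defines
$\mathbb{P}_{\mbox{\tiny LB}}$), note that for any
$p\in\mathbb{P}_{\mbox{\tiny LB}}$ we have
$H\big(X\big|\hat X\hat YY\big)=H\big(X\big|\hat X\hat Y\big)$,
$H\big(Y\big|\hat X\hat YX\big)=H\big(Y\big|\hat X\hat Y\big)$ and
$H\big(XY\big|\hat X\hat Y\big)=H\big(X\big|\hat X\hat Y\big)
+H\big(Y\big|\hat X\hat Y\big)$.  Hence
\begin{eqnarray*}
I\big(XY\wedge\hat X\hat Y\big)
  & = & H\big(XY\big)-H\big(X\big|\hat X\hat Y\big)
        -H\big(Y\big|\hat X\hat Y\big) \\
  & = & \Big[H\big(X\big|Y\big)-H\big(X\big|\hat X\hat Y\big)\Big]
        +\Big[H\big(Y\big|X\big)-H\big(Y\big|\hat X\hat Y\big)\Big]
        +I\big(X\wedge Y\big) \\
  & = & I\big(X\wedge\hat X\hat Y\big|Y\big)
        +I\big(Y\wedge\hat X\hat Y\big|X\big)
        +I\big(X\wedge Y\big),
\end{eqnarray*}
where the second line uses
$H\big(XY\big)=H\big(X\big|Y\big)+H\big(Y\big|X\big)+I\big(X\wedge Y\big)$.
So, for each fixed $p$, the sum-rate constraint exceeds the sum of the two
individual constraints by exactly $I\big(X\wedge Y\big)\geq 0$, and it is
not redundant whenever the sources are dependent.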

664: Now we are ready to state our outer bound.

665:

666: \subsection{Statement of an Outer Bound}

667:

668: \medskip

669: \begin{center}\textcolor{gray}{\fbox{\begin{minipage}{16cm}

670: \vspace{-4mm}\textcolor{black}{\begin{theorem}

671: \label{thm:main}

672: \[ \mathcal{R}^*\big(D_1,D_2\big)\;\;\subseteq\;\;

673:    \overline{\mathcal{R}^o(D_1,D_2)}.

674: \]

675: \rend

676: \end{theorem}}\end{minipage}}}\end{center}\medskip

677:

678: The proof of this theorem is given in Section~\ref{sec:main-proof}.

679: Before that, in Section~\ref{sec:aux-lemmas}, we develop a

680: number of observations and auxiliary results to be used in the main

681: proof.

682:

683:

684: \section{Some Useful Observations and Auxiliary Results}

685: \label{sec:aux-lemmas}

686:

687: \subsection{Distributed Rate-Distortion Codes as Constrained Source Covers}

688:

689: \subsubsection{Distributed Source Covers}

690:

691: An equivalent representation for a generic

692: $(2^{nR_1},2^{nR_2},n,\epsilon,D_1,D_2)$ code is given as follows:

693: \begin{itemize}

694: \item Two covers:

695:   $\mathcal{S}_1 = \big\{ \mathbf{S}_{1,i} : i=1...2^{nR_1} \big\}$

696:     of $\mathcal{X}^n$,

697:   and $\mathcal{S}_2 = \big\{ \mathbf{S}_{2,j} : j=1...2^{nR_2} \big\}$

698:     of $\mathcal{Y}^n$.

699:   Any code with encoders $f_1$ and $f_2$ can be represented in terms

700:   of two such covers, by considering $f_1^{-1}(i)= \mathbf{S}_{1,i}$ and

701:   $f_2^{-1}(j)=\mathbf{S}_{2,j}$.\footnote{Note that, strictly speaking,

702:   this definition is correct only when $\mathcal{S}$ is a partition.

703:   Occasionally we might abuse the notation and still refer to the code

704:   specified by a cover, with the understanding that in such cases ties

705:   (of the form of a source sequence being part of two different cover

706:   elements) are broken arbitrarily.  This should not cause any confusion.} \\

707:   (Note: these two covers define a cover $\mathcal{S}=\big(\mathcal{S}_1,

708:   \mathcal{S}_2\big)$ of $\mathcal{X}^n\times \mathcal{Y}^n$, with elements

709:   $\mathbf{S}_{ij} \;=\; \mathbf{S}_{1,i}\times \mathbf{S}_{2,j}$,

710:   for $(i,j)\in\{1...2^{nR_1}\}\times\{1...2^{nR_2}\}$.)

711: \item A pair of reconstruction sequences $\big(\hat{\mathbf{x}}^n(ij),

712:   \hat{\mathbf{y}}^n(ij)\big)=g(i,j)$ associated to each cover element

713:   $\mathbf{S}_{ij}$ of the product source, for all

714:   $(i,j)\in\{1...2^{nR_1}\}\times\{1...2^{nR_2}\}$.

715: \end{itemize}

716:

717: In general, whenever we refer to a distributed rate-distortion code,

718: we use interchangeably the earlier representation in terms of two

719: encoders and one decoder, and this representation in terms of covers.
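
To make the correspondence concrete, consider a toy instance (constructed
here for illustration only): binary alphabets, $n=2$ and
$R_1=R_2=\frac 1 2$, so that each encoder has $2^{nR_1}=2^{nR_2}=2$
indices.  Suppose both encoders output the parity of their block, that is,
$f_1(x_1x_2)=1+(x_1\oplus x_2)$ and $f_2(y_1y_2)=1+(y_1\oplus y_2)$.  Then
\[ \mathbf{S}_{1,1}=\mathbf{S}_{2,1}=\{00,11\}
   \mbox{\hspace{1cm}and\hspace{1cm}}
   \mathbf{S}_{1,2}=\mathbf{S}_{2,2}=\{01,10\},
\]
and, for instance,
$\mathbf{S}_{11}=\mathbf{S}_{1,1}\times\mathbf{S}_{2,1}
=\big\{(00,00),(00,11),(11,00),(11,11)\big\}$.
In this example the covers happen to be partitions, so no ties need to be
broken.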

720:

721: \subsubsection{Distributed Typical Sets}

722:

723: As highlighted in the Introduction, it turns out that covers

724: $\mathbf{S}_{ij}$ of the product source $\mathcal{X}^n\times\mathcal{Y}^n$

725: are constrained beyond the requirements imposed by the fidelity

726: criteria.  That ``extra'' structure is described by

727: Proposition~\ref{prp:distributed-typicality}.

728:

729: \medskip

730: \begin{center}\textcolor{gray}{\fbox{\begin{minipage}{16cm}

731: \vspace{-4mm}\textcolor{black}{\begin{proposition}

732: \label{prp:distributed-typicality}

733: For any cover $\mathcal S$ of $\mathcal{X}^n\times\mathcal{Y}^n$

734: defined by some $(2^{nR_1},2^{nR_2},n,\epsilon,D_1,D_2)$ distributed

735: rate-distortion code, and for any

736: $(i,j)\in\{1...2^{nR_1}\}\times\{1...2^{nR_2}\}$,

737: $\mathbf{x}^n\in\mathbf{S}_{1,i}$ and $\mathbf{y}^n\in\mathbf{S}_{2,j}$,

738: it must be the case that

739: either $(\mathbf{x}^n\mathbf{y}^n)\in\mathbf{S}_{ij}\cap

740: T_\epsilon^n\big(XY\big)$ or $(\mathbf{x}^n\mathbf{y}^n)\not\in

741: T_\epsilon^n\big(XY\big)$.  \rend

742: \end{proposition}}\end{minipage}}}\end{center}\medskip

743:

744: {\it Proof.} This is rather straightforward.  Take any

745: $\mathbf{x}^n\in\mathbf{S}_{1,i}$ and $\mathbf{y}^n\in\mathbf{S}_{2,j}$.

746: Then:

747: \begin{itemize}

748: \item by construction,

749:   $\big(\mathbf{x}^n\mathbf{y}^n\big)\in\mathbf{S}_{ij}$;

750: \item either $\big(\mathbf{x}^n\mathbf{y}^n\big)\in

751:   T_\epsilon^n\big(XY\big)$ or

752:   $\big(\mathbf{x}^n\mathbf{y}^n\big)\not\in

753:   T_\epsilon^n\big(XY\big)$ -- a tautology;

754: \item if $\big(\mathbf{x}^n\mathbf{y}^n\big)\in

755:   T_\epsilon^n\big(XY\big)$, then $\big(\mathbf{x}^n\mathbf{y}^n\big)\in

756:   \mathbf{S}_{ij}\cap T_\epsilon^n\big(XY\big)$, and therefore

757:   the proposition is proved;

758: \item and if instead, $\big(\mathbf{x}^n\mathbf{y}^n\big)\not\in

759:   T_\epsilon^n\big(XY\big)$, then the proposition is proved too.

760:   \tend

761: \end{itemize}

762: \medskip

763:

764: Proposition~\ref{prp:distributed-typicality} formally states the

765: property of covers arising from distributed codes discussed informally

766: in the Introduction (cf.~Sec.~\ref{sec:intro-distributed-covers}): all

767: combinations of an $\mathbf{x}^n$ sequence in $\mathbf{S}_{1,i}$ and

768: a $\mathbf{y}^n$ sequence in $\mathbf{S}_{2,j}$, if they are jointly

769: typical, must appear in $\mathbf{S}_{ij}\cap T_\epsilon^n\big(XY\big)$

770: -- the decoder does not have enough information to discriminate among

771: such pairs.

772:

773: We now introduce a new definition.

774: Consider any subset $\mathbf{S}\subseteq T_\epsilon^n\big(XY\big)$

775: for which, for any $(\mathbf{x}^n,\mathbf{y}_1^n)\in\mathbf{S}$ and

776: $(\mathbf{x}_1^n,\mathbf{y}^n)\in\mathbf{S}$, we have that either

777: $(\mathbf{x}^n\mathbf{y}^n)\in\mathbf{S}$ or

778: $(\mathbf{x}^n\mathbf{y}^n)\not\in T_\epsilon^n\big(XY\big)$

779: -- that is, the property of Prop.~\ref{prp:distributed-typicality}

780: holds for $\mathbf{S}$.  In this case, we say that $\mathbf{S}$ is

781: a {\em distributed} typical set.

782:

783: Clearly there are ``interesting'' distributed typical sets, so the

784: concept is not vacuous:

785: \begin{itemize}

786: \item all sets of the form $\mathbf{S} = \{ (\mathbf{x}^n\mathbf{y}^n) \}$,

787:   with $(\mathbf{x}^n\mathbf{y}^n)\in T_\epsilon^n\big(XY\big)$,

788:   are distributed typical sets;

789: \item for any $\mathbf{S}_1\subseteq\mathcal{X}^n$ and any

790:   $\mathbf{S}_2\subseteq\mathcal{Y}^n$,

791:   $\mathbf{S}\triangleq\big[\mathbf{S}_1\!\times\!\mathbf{S}_2\big]\cap

792:   T_\epsilon^n\big(XY\big)$ is a distributed typical set.

793: \end{itemize}

794: The last example provides a natural way of systematically constructing

795: distributed typical sets.
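
To see why the second example satisfies the definition (a short
verification added here, along the same lines as the proof of
Proposition~\ref{prp:distributed-typicality}), take any
$(\mathbf{x}^n,\mathbf{y}_1^n)\in\mathbf{S}$ and
$(\mathbf{x}_1^n,\mathbf{y}^n)\in\mathbf{S}$.  Then
$\mathbf{x}^n\in\mathbf{S}_1$ and $\mathbf{y}^n\in\mathbf{S}_2$, so
\[ \big(\mathbf{x}^n\mathbf{y}^n\big)\;\in\;
   \mathbf{S}_1\!\times\!\mathbf{S}_2.
\]
Hence either
$\big(\mathbf{x}^n\mathbf{y}^n\big)\in T_\epsilon^n\big(XY\big)$, in which
case $\big(\mathbf{x}^n\mathbf{y}^n\big)\in
\big[\mathbf{S}_1\!\times\!\mathbf{S}_2\big]\cap T_\epsilon^n\big(XY\big)
=\mathbf{S}$, or
$\big(\mathbf{x}^n\mathbf{y}^n\big)\not\in T_\epsilon^n\big(XY\big)$;
these are exactly the two alternatives allowed by the definition.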

796:

797: \subsubsection{Source Covers Made of Distributed Typical Sets}

798:

799: We show next that in multiterminal source coding, the source must

800: be covered with distributed typical sets in which each of the two

801: components of the set gets specified by a different encoder.

802:

803: Consider a length-$n$ $\big(f_1,f_2,g\big)$ code, satisfying the

804: $(\epsilon,D_1,D_2)$-distortion constraint of

805: eqn.~\eqref{eq:distortion-constraint}:

806: \begin{eqnarray*}

807: \lefteqn{P\Big( \Big\{ \big(\mathbf{x}^n\mathbf{y}^n\big) \;\Big|\;

808:           \big(\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)

809:           = g\big(f_1(\mathbf{x}^n),f_2(\mathbf{y}^n)\big) \,\wedge\,

810:           d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)

811:           < D_1^+ \,\wedge\,

812:           d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)

813:           < D_2^+ \Big\}

814:     \Big)} \\

815:   & \stackrel{(a)}{=} &

816:           P\Big( \Big\{ \big(\mathbf{x}^n\mathbf{y}^n\big) \;\Big|\;

817:           \big(\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)

818:           = g\big(f_1(\mathbf{x}^n),f_2(\mathbf{y}^n)\big) \,\wedge\,

819:           d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)

820:           < D_1^+ \,\wedge\,

821:           d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)

822:           < D_2^+ \Big\}

823:           \cap\bigcup_{(i,j)}\mathbf{S}_{ij}

824:     \Big) \\

825:   & = & P\Big(

826:           \bigcup_{(i,j)}

827:           \Big\{ \big(\mathbf{x}^n\mathbf{y}^n\big) \;\Big|\;

828:           \big(\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)

829:           = g\big(f_1(\mathbf{x}^n),f_2(\mathbf{y}^n)\big) \,\wedge\,

830:           d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big)

831:           < D_1^+ \,\wedge\,

832:           d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big)

833:           < D_2^+ \Big\}

834:           \cap\mathbf{S}_{ij}

835:     \Big) \\

836:   & \stackrel{(b)}{=} & P\Big(

837:           \bigcup_{(i,j)}

838:           \Big\{ \big(\mathbf{x}^n\mathbf{y}^n\big) \;\Big|\;

839:           d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n(ij)\big)<D_1^+

840:           \,\wedge\,\mathbf{x}^n\in\mathbf{S}_{1,i}

841:           \,\wedge\, d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n(ij)\big)<D_2^+

842:           \,\wedge\,\mathbf{y}^n\in\mathbf{S}_{2,j} \Big\}

843:     \Big) \\

844:   & = & P\Big( \bigcup_{(i,j)}

845:           \big[\mathbf{S}_{1,i}\!\times\!\mathbf{S}_{2,j}\big]

846:           \cap

847:           \big[B\big(\hat{\mathbf{x}}^n(ij),D_1^+\big)

848:                \!\times\!B\big(\hat{\mathbf{y}}^n(ij),D_2^+\big)\big]

849:          \,\Big) \\

850:   & \stackrel{(c)}{\geq} & 1-\dot{\epsilon},

851: \end{eqnarray*}

852: where (a) follows from

853: $\big\{ \big(\mathbf{x}^n\mathbf{y}^n\big)\,\Big|\,

854: \big(\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)

855: = g\big(f_1(\mathbf{x}^n),f_2(\mathbf{y}^n)\big) \,\wedge\,

856: d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n\big) < D_1^+ \,\wedge\,

857: d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n\big) < D_2^+ \big\}

858: \;\subseteq\;\mathcal{X}^n\times\mathcal{Y}^n

859: \;\subseteq\;\bigcup_{(i,j)} \mathbf{S}_{ij}$;

860: (b) follows from $\mathbf{S}_{ij}=\mathbf{S}_{1,i}\times\mathbf{S}_{2,j}$;

861: and (c) follows from the fact

862: that the code under consideration satisfies the distortion constraint

863: of eqn.~\eqref{eq:distortion-constraint}.  We also know, from basic

864: properties of typical sets, that

865: \[ P\Big( T_\epsilon^n\big(XY\big) \Big) \;\;\geq\;\; 1-\epsilon,

866: \]

867: and so, if we define $\tilde{\mathbf{S}}_{ij}\triangleq

868: \big[\mathbf{S}_{1,i}\times\mathbf{S}_{2,j}\big]\cap

869: T_\epsilon^n\big(XY\big)$, we see that

870: \begin{eqnarray}

871: \lefteqn{P\Big( \bigcup_{(i,j)}

872:           \big[\mathbf{S}_{1,i}\!\times\!\mathbf{S}_{2,j}\big]

873:           \cap

874:           \big[B\big(\hat{\mathbf{x}}^n(ij),D_1^+\big)

875:                \!\times\!B\big(\hat{\mathbf{y}}^n(ij),D_2^+\big)\big]

876:           \cap

877:           T_\epsilon^n\big(XY\big) \,\Big)} \nonumber

878: \hspace{4cm} \\

879:   & = & P\left( \bigcup_{(i,j)}

880:                 \tilde{\mathbf{S}}_{ij} \cap

881:                 \big[B\big(\hat{\mathbf{x}}^n(ij),D_1^+\big)

882:                   \!\times\!B\big(\hat{\mathbf{y}}^n(ij),D_2^+\big)\big]

883:           \right) \nonumber \\

884:   & \geq & 1-\ddot\epsilon;

885:   \label{eq:distortion-constraint-2}

886: \end{eqnarray}

887: that is, since $\tilde{\mathbf{S}}_{ij}$ is a distributed typical set,

888: the source must be covered with the fraction of such sets contained in

889: pairs of balls centered at the reconstruction sequences; furthermore,

890: we note that each component of the distributed typical set must be

891: specified completely by each encoder.
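
The last inequality in eqn.~\eqref{eq:distortion-constraint-2} can be
justified by a union bound.  As a sketch, write $\mathbf{C}$ for the union
of sets whose probability is bounded in step (c) above (the symbol
$\mathbf{C}$ and the complement notation $\mathbf{C}^c$ are introduced
only for this remark), so that $P\big(\mathbf{C}\big)\geq 1-\dot\epsilon$
and $P\big(T_\epsilon^n\big(XY\big)\big)\geq 1-\epsilon$.  Then
\[ P\big(\mathbf{C}\cap T_\epsilon^n\big(XY\big)\big)
   \;\;\geq\;\; 1 - P\big(\mathbf{C}^c\big)
                  - P\big(T_\epsilon^n\big(XY\big)^c\big)
   \;\;\geq\;\; 1-\dot\epsilon-\epsilon,
\]
so one admissible choice is $\ddot\epsilon=\dot\epsilon+\epsilon$.  This
particular value is an assumption of this remark; any larger
$\ddot\epsilon$ works equally well.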

892:

893: % These constraints on the structure of source covers are significant,

894: % and they were not captured by any previous outer bounds.  The main task

895: % ahead of us then is to make use of this newly discovered structure to

896: % prove a better outer bound.

897:

898: \subsection{The ``Reverse'' Markov Lemma}

899: \label{sec:reverse-markov-lemma}

900:

901: \subsubsection{The Standard Form}

902:

903: Lemma~\ref{lemma:markov} is the Markov lemma as stated

904: in~\cite[pg.\ 202]{Berger:78}, in our own notation.

905:

906: \medskip\begin{lemma}[Markov]

907: \label{lemma:markov}

908: Consider a Markov chain of the form $X-Z-Y$.  Then, for all $\epsilon>0$,

909: \[ \lim_{n\to\infty}

910:    P\Big( \big(X^n,\mathbf{y}^n\big)\in T_\epsilon^n\big(XY\big)

911:           \;\Big|\;

912:           \big(Z^n,\mathbf{y}^n\big)\in T_\epsilon^n\big(ZY\big)

913:     \Big) \;\;=\;\; 1,

914: \]

915: for any sequence $\mathbf{y}^n\in\mathcal{Y}^n$.

916: \rend

917: \end{lemma}\medskip

918:

919: The lemma says that for {\em every} $\mathbf{y}^n\in\mathcal{Y}^n$,

920: {\em if} the random vector

921: $\big(Z^n,\mathbf{y}^n\big)\in T_\epsilon^n\big(ZY\big)$, {\em then}

922: the random vector $\big(X^n,\mathbf{y}^n\big)\in T_\epsilon^n\big(XY\big)$,

923: with high probability.  This is not true in general: if we have two pairs

924: of sequences $\big(\mathbf{x}^n\mathbf{z}^n\big)\in T_\epsilon^n\big(XZ\big)$

925: and $\big(\mathbf{z}^n\mathbf{y}^n\big)\in T_\epsilon^n\big(ZY\big)$, it

926: is not always the case that

927: $\big(\mathbf{x}^n\mathbf{z}^n\mathbf{y}^n\big)\in T_\epsilon^n\big(XZY\big)$,

928: and therefore that

929: $\big(\mathbf{x}^n\mathbf{y}^n\big)\in T_\epsilon^n\big(XY\big)$; that

930: is, joint typicality is {\em not} a transitive relation.  However,

931: if $X-Z-Y$ forms a Markov chain, this transitivity property does hold,

932: although only in a high-probability sense.

933:

934: \subsubsection{A Converse Statement}

935:

936: We are interested in a converse form of the Markov lemma.  Suppose

937: we are given an arbitrary distribution $p(xyz)$, whose typical

938: sets satisfy the constraints imposed by the Markov lemma: can we say

939: that $p$ itself must be a Markov chain?  It turns out the answer is

940: {\em almost yes} -- if some arbitrary distribution $p$ induces typical

941: sets like those of a Markov chain, then there must exist a Markov

942: chain $p'$ within $L_1$ distance $2\epsilon$ of $p$.  This statement

943: is made precise in the following lemma.

944:

945: \medskip\begin{center}\textcolor{gray}{\fbox{\begin{minipage}{16cm}

946: \vspace{-4mm}\textcolor{black}{\begin{lemma}[Reverse Markov]

947: \label{lemma:reverse-markov}

948: Fix $n$, $\epsilon>0$.  Consider any distribution

949: $p(xyz)$ for which, for some $\mathbf{z}^n$,

950: \[

951:   T_\epsilon^n\big(X\big|\mathbf{z}^n\big)[p]

952:   \times T_\epsilon^n\big(Y\big|\mathbf{z}^n\big)[p]

953:   \;\;=\;\; T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p].

954: \]

955: Define a Markov chain $p'(xyz)=p(z)p(x|z)p(y|z)$, with the components

956: $p(z)$, $p(x|z)$ and $p(y|z)$ taken from the given $p(xyz)$.  Then,

957: $\big|\big|p-p'\big|\big|_1\,<\,2\epsilon$.

958: \rend

959: \end{lemma}}\end{minipage}}}\end{center}\medskip

960:

961: {\it Proof.}

962: Consider any $\mathbf{z}^n$ for which

963: $T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p]\neq\emptyset$.

964: Since $p'$ is a Markov chain, from the direct form of the Markov

965: lemma we know that

966: \[

967:    T_\epsilon^n\big(X\big|\mathbf{z}^n\big)[p']

968:    \times T_\epsilon^n\big(Y\big|\mathbf{z}^n\big)[p']

969:    \;\;\subseteq_{\epsilon'}\;\;

970:    T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p'];

971: \]

972: and clearly,

973: $\emptyset\neq

974: T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p]

975: =

976: T_\epsilon^n\big(X\big|\mathbf{z}^n\big)[p]

977:  \times T_\epsilon^n\big(Y\big|\mathbf{z}^n\big)[p]

978: =

979: T_\epsilon^n\big(X\big|\mathbf{z}^n\big)[p']

980:  \times T_\epsilon^n\big(Y\big|\mathbf{z}^n\big)[p']$,

981: since we choose $p'$ to coincide with $p$ on the corresponding marginals,

982: and from our choice of $\mathbf{z}^n$.  So, this last inclusion can be

983: written as

984: \[

985:    T_\epsilon^n\big(X\big|\mathbf{z}^n\big)[p]

986:         \times T_\epsilon^n\big(Y\big|\mathbf{z}^n\big)[p]

987:    \;\;\subseteq_{\epsilon'}\;\;

988:    T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p'],

989: \]

990: and therefore we see that

991: \[

992:    \emptyset \;\;\neq\;\;

993:    T_\epsilon^n\big(X\big|\mathbf{z}^n\big)[p]

994:         \times T_\epsilon^n\big(Y\big|\mathbf{z}^n\big)[p]

995:    \;\;\subseteq_{\epsilon'}\;\;

996:    T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p]

997:     \cap T_\epsilon^n\big(XY\big|\mathbf{z}^n\big)[p'];

998: \]

999: thus, there must exist at least one triplet of sequences

1000: $\big(\mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)$ that

1001: is jointly typical under both $p$ and $p'$.  So for these particular

1002: sequences, it follows from the definition of strong typicality that

1003: both

1004: \[ \forall xyz: \big|\mbox{$\frac 1 n$}N\big(xyz;

1005:    \mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)-

1006:    p(xyz)\big|\,<\,\mbox{$\frac\epsilon{|\mathcal{X}|

1007:    |\mathcal{Y}||\mathcal{Z}|}$}

1008:    \;\textrm{ and }\;

1009:    \forall xyz: \big|\mbox{$\frac 1 n$}N\big(xyz;

1010:    \mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)-

1011:    p'(xyz)\big|\,<\,\mbox{$\frac\epsilon{|\mathcal{X}|

1012:    |\mathcal{Y}||\mathcal{Z}|}$},

1013: \]

1014: and therefore the $L_1$ norm of $p-p'$ can be written as

1015: \begin{eqnarray*}

1016: \big|\big|p'-p\big|\big|_1

1017:   & = & \sum_{xyz}\big|p(xyz)-p'(xyz)\big| \\

1018:   & = & \sum_{xyz}\big|p(xyz)-\mbox{$\frac 1 n$}

1019:         N\big(xyz;\mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)+\mbox{$\frac 1 n$}

1020:         N\big(xyz;\mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)-p'(xyz)\big| \\

1021:   & \leq & \sum_{xyz}\big|\mbox{$\frac 1 n$}N\big(xyz;

1022:         \mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)

1023:         -p(xyz)\big|

1024:         +\sum_{xyz}\big|\mbox{$\frac 1 n$}

1025:         N\big(xyz;\mathbf{x}^n\mathbf{y}^n\mathbf{z}^n\big)-p'(xyz)\big| \\

1026:   & < & 2\epsilon,

1027: \end{eqnarray*}

1028: thus proving the lemma.

1029: \tend\bigskip
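
To get a feel for the quantity bounded in
Lemma~\ref{lemma:reverse-markov}, consider a toy case (constructed here
for illustration, not taken from the source): let $Z$ be constant, let $X$
and $Y$ be binary, and for some $0\leq\delta\leq\frac 1 4$ let
\[ p(00)=p(11)=\mbox{$\frac 1 4$}+\delta
   \mbox{\hspace{1cm}and\hspace{1cm}}
   p(01)=p(10)=\mbox{$\frac 1 4$}-\delta.
\]
Both marginals are uniform, so the Markov chain built from $p$ is
$p'(xy)=p(x)p(y)=\frac 1 4$ for all $(x,y)$, and
\[ \big|\big|p-p'\big|\big|_1
   \;=\; \sum_{xy}\big|p(xy)-\mbox{$\frac 1 4$}\big|
   \;=\; 4\delta.
\]
In this example, the conclusion $\big|\big|p-p'\big|\big|_1<2\epsilon$ of
the lemma amounts to $\delta<\epsilon/2$; that is, the residual dependence
between $X$ and $Y$ given $Z$ must be small.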

1030:

1031: Our interest in this question stems from the fact that, from the

1032: requirement to cover a product source with distributed typical sets,

1033: we do get constraints on the shape of various typical sets.  So we

1034: need to characterize what distributions can give rise to those sets,

1035: and this lemma plays an important role in that.

1036:

1037:

1038: \subsection{Upper Bounds on the Size of Distributed Typical Cover Elements}

1039:

1040: \medskip

1041: \begin{center}\textcolor{gray}{\fbox{\begin{minipage}{16cm}

1042: \vspace{-4mm}\textcolor{black}{\begin{lemma}

1043: \label{lemma:bound-size}

1044: Consider any $\big(2^{nR_1},2^{nR_2},n,\epsilon,D_1,D_2\big)$ distributed

1045: rate-distortion code, represented by a cover $\mathcal{S}$.  Then, there

1046: exists a distribution $\pi\in\mathbb{P}_{\mbox{\tiny LB}}$ such that, for

1047: all $(i,j)\in\{1...2^{nR_1}\}\times\{1...2^{nR_2}\}$ and all $\epsilon>0$,

1048: \[ \big|\mathbf{S}_{ij}\,\cap\,T_\epsilon^n\big(XY\big)\big|

1049:    \;\;\leq\;\;

1050:    2^{n(H(XY|\hat X\hat Y)[\pi]+\ddot\epsilon)},

1051: \]

1052: provided $n$ is large enough.  Furthermore, for all

1053: $\mathbf{y}^n\in\mathcal{Y}^n$,

1054: \[ \big|\mathbf{S}_{1,i}\cap T_\epsilon^n\big(X\big|\mathbf{y}^n\big)\big|

1055:    \;\;\leq\;\;

1056:    2^{n(H(X|\hat X\hat YY)[\pi]+\ddot\epsilon')},

1057: \]

1058: and similarly for all $\mathbf{x}^n\in\mathcal{X}^n$,

1059: \[ \big|\mathbf{S}_{2,j}\cap T_\epsilon^n\big(Y\big|\mathbf{x}^n\big)\big|

1060:    \;\;\leq\;\;

1061:    2^{n(H(Y|\hat X\hat YX)[\pi]+\ddot\epsilon'')},

1062: \]

1063: also provided $n$ is large enough.

1064: \rend

1065: \end{lemma}}\end{minipage}}}\end{center}\medskip

1066:

1067: {\it Proof.}  From the two-terminal rate-distortion

1068: theorem~\cite[Thm.\ 2.2.3]{CsiszarK:81}, we know there exists a

1069: distribution $p(xy\hat x\hat y)=p(xy)p(\hat x\hat y|xy)$, with

1070: $p(xy)$ the given source, $\expect{d_1\big(X,\hat X\big)}\leq D_1$

1071: and $\expect{d_2\big(Y,\hat Y\big)}\leq D_2$, and

1072: sequences $\hat{\mathbf{x}}^n(ij)$ and $\hat{\mathbf{y}}^n(ij)$

1073: such that, for all

1074: $(i,j)\in\{1...2^{nR_1}\}\times\{1...2^{nR_2}\}$ and all $\epsilon>0$,

1075: \begin{equation}

1076:  \tilde{\mathbf{S}}_{ij}

1077:  \;\;\subseteq\;\;

1078:  T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big),

1079:  \label{eq:const-std-rd}

1080: \end{equation}

1081: provided $n$ is large enough.  But since for distributed codes we

1082: have $\tilde{\mathbf{S}}_{ij}=

1083: \big[\mathbf{S}_{1,i}\times\mathbf{S}_{2,j}\big]\cap T_\epsilon^n\big(XY\big)$,

1084: it follows from standard properties of typical sets that

1085: \[ \mathbf{S}_{1,i}\cap T_\epsilon^n\big(X\big|\mathbf{S}_{2,j}\big)

1086:    \;\;\subseteq\;\;

1087:    T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)

1088:    \mbox{\hspace{1cm}and\hspace{1cm}}

1089:    \mathbf{S}_{2,j}\cap T_\epsilon^n\big(Y\big|\mathbf{S}_{1,i}\big)

1090:    \;\;\subseteq\;\;

1091:    T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big).

1092: \]

1093: Consider now a new cover $\mathcal{S}'$, having the property that

1094: \[ \mathbf{S}'_{1,i}\cap T_\epsilon^n\big(X\big|\mathbf{S}'_{2,j}\big)

1095:    \;\;=\;\;

1096:    T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)

1097:    \mbox{\hspace{1cm}and\hspace{1cm}}

1098:    \mathbf{S}'_{2,j}\cap T_\epsilon^n\big(Y\big|\mathbf{S}'_{1,i}\big)

1099:    \;\;=\;\;

1100:    T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big).

1101: \]

1102: A simple expression for the cover element $\mathbf{S}'_{1,i}$ is obtained

1103: as follows.  Fix an index $i\in\{1...2^{nR_1}\}$:

1104: \[\begin{array}{lrcl}

1105:   & \forall k: \mathbf{S}'_{1,i}\cap

1106:                T_\epsilon^n\big(X\big|\mathbf{S}'_{2,k}\big)

1107:     & =

1108:     & T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ik)\hat{\mathbf{y}}^n(ik)\big)

1109:     \\

1110:   \Rightarrow\hspace{6mm}

1111:     & \bigcup_{k=1}^{2^{nR_2}}

1112:       \mathbf{S}'_{1,i}\cap T_\epsilon^n\big(X\big|\mathbf{S}'_{2,k}\big)

1113:     & =

1114:     & \bigcup_{k=1}^{2^{nR_2}}

1115:       T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ik)\hat{\mathbf{y}}^n(ik)\big)

1116:     \\

1117:   \Rightarrow

1118:     & \mathbf{S}'_{1,i}\cap \bigcup_{k=1}^{2^{nR_2}}

1119:           T_\epsilon^n\big(X\big|\mathbf{S}'_{2,k}\big)

1120:     & =

1121:     & \bigcup_{k=1}^{2^{nR_2}}

1122:       T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ik)\hat{\mathbf{y}}^n(ik)\big)

1123:     \\

1124:   \Rightarrow

1125:     & \mathbf{S}'_{1,i}\cap S_{\epsilon,Y}^n\big(X\big)

1126:     & =

1127:     & \bigcup_{k=1}^{2^{nR_2}}

1128:       T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ik)\hat{\mathbf{y}}^n(ik)\big),

1129: \end{array}\]

1130: and since $P\big(S_{\epsilon,Y}^n\big(X\big)\big)>1-\dot\epsilon$,

1131: $\mathbf{S}'_{1,i}$ is determined up to a set of vanishing measure;

1132: similarly, fixing $j\in\{1...2^{nR_2}\}$, we get

1133: $\mathbf{S}'_{2,j}\cap S_{\epsilon,X}^n\big(Y\big) = \bigcup_{l=1}^{2^{nR_1}}

1134: T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n(lj)\hat{\mathbf{y}}^n(lj)\big)$.

1135:

1136: The new cover $\mathcal{S}'$ has some useful properties:

1137: \begin{itemize}

1138: \item for all $(i,j)$, $\mathbf{S}_{1,i}\cap S_{\epsilon,Y}^n\big(X\big)

1139:   \subseteq\mathbf{S}'_{1,i}\cap S_{\epsilon,Y}^n\big(X\big)$ and

1140:   $\mathbf{S}_{2,j}\cap S_{\epsilon,X}^n\big(Y\big)\subseteq

1141:   \mathbf{S}'_{2,j}\cap S_{\epsilon,X}^n\big(Y\big)$, and therefore

1142:   $\tilde{\mathbf{S}}_{ij}\subseteq\tilde{\mathbf{S}}'_{ij}$ as

1143:   well, by construction;

1144: \item for all $\big(\mathbf{x}^n\mathbf{y}^n\big)\in\tilde{\mathbf{S}}'_{ij}$,

1145:   $d_1\big(\mathbf{x}^n,\hat{\mathbf{x}}^n(ij)\big)<D_1^+$ and

1146:   $d_2\big(\mathbf{y}^n,\hat{\mathbf{y}}^n(ij)\big)<D_2^+$, from the

1147:   joint typicality conditions defining $\mathbf{S}'_{1,i}$ and

1148:   $\mathbf{S}'_{2,j}$;

1149: \item and $P\Big(\bigcup_{ij}\tilde{\mathbf{S}}'_{ij}\Big) \geq

1150:   P\Big(\bigcup_{ij}\tilde{\mathbf{S}}_{ij}\Big) > 1-\dot\epsilon$;

1151: \end{itemize}

1152: so, $\mathcal{S}'$ ``dominates'' $\mathcal{S}$ (in that every element

1153: in $\mathcal{S}$ is contained in one element of $\mathcal{S}'$), and

1154: $\mathcal{S}'$ satisfies the same distortion constraints that $\mathcal{S}$

1155: does.  Therefore, an upper bound on the size of the elements in the new

1156: cover $\mathcal{S}'$ is also an upper bound on the size of the elements

1157: in the given cover $\mathcal{S}$.

1158:

1159: Next we observe that the new cover element $\tilde{\mathbf{S}}'_{ij}$ can be

1160: ``sandwiched'' in between two other terms:

1161: \begin{eqnarray*}

1162: \Big[T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)

1163:      \times

1164:      T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)

1165:      \Big]\cap T_\epsilon^n\big(XY\big)

1166:   & \stackrel{(a)}{\subseteq} &

1167:     \big[\mathbf{S}'_{1,i}\times\mathbf{S}'_{2,j}\big]

1168:     \cap T_\epsilon^n\big(XY\big) \\

1169:   & \stackrel{(b)}{\subseteq} &

1170:     T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big),

1171: \end{eqnarray*}

1172: where (a) follows from our choice of $\mathbf{S}'_{1,i}$ and

1173: $\mathbf{S}'_{2,j}$, and from elementary algebra of sets; and (b)

1174: follows from eqn.~\eqref{eq:const-std-rd}, and from the product form

1175: of distributed covers.  So, since the other inclusion always holds,

1176: \[

1177:    \Big[

1178:    T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)

1179:    \times

1180:    T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)

1181:    \Big]\cap T_\epsilon^n\big(XY\big)

1182:    \;\;=\;\;

1183:    T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)

1184: \]

1185: is a necessary condition on any suitable distribution $p(xy\hat x\hat y)$

1186: whose typical sets can be used to construct the cover $\mathcal{S}'$; or

1187: equivalently, since this must hold for every $(i,j)$,

1188: \[ \Big[

1189:    T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)

1190:    \times

1191:    T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)

1192:    \Big]\cap T_\epsilon^n\big(XY\big)

1193:    \;\;=\;\;

1194:    T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big),

1195: \]

1196: for any sequences $\hat{\mathbf{x}}^n$ and $\hat{\mathbf{y}}^n$ such

1197: that $T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)

1198: \neq\emptyset$.  Finally we note that this last condition is equivalent

1199: to

1200: \begin{equation}

1201:    T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)

1202:    \times

1203:    T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)

1204:    \;\;=\;\;

1205:    T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big).

1206:   \label{eq:const-typsets-1}

1207: \end{equation}

1208: This is because this last equality already forces any $\mathbf{x}^n

1209: \in T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)$

1210: and $\mathbf{y}^n\in

1211: T_\epsilon^n\big(Y\big|\hat{\mathbf{x}}^n\hat{\mathbf{y}}^n\big)$ to

1212: be jointly typical.  Therefore, from the reverse Markov lemma, we

1213: conclude there exists a distribution $\pi(xy\hat x\hat y)$, which

1214: satisfies a Markov chain of the form $X-\hat X\hat Y-Y$, such that

1215: $\big|\big|p-\pi\big|\big|_1<2\epsilon$.

1216:

1217: \centerline{---------------------}

1218:

1219: Next we observe that if $\big|\big|p-\pi\big|\big|_1<2\epsilon$,

1220: then conditionals and marginals of $p$ and of $\pi$ are also close.

1221: Consider, for example,

1222: $p_{\hat X\hat Y}(\hat x\hat y)=\sum_{xy}p_{XY\hat X\hat Y}(xy\hat x\hat y)$

1223: and $\pi_{\hat X\hat Y}(\hat x\hat y)

1224: =\sum_{xy}\pi_{XY\hat X\hat Y}(xy\hat x\hat y)$:

1225:   \begin{eqnarray*}

1226:   \big|\big|p_{\hat X\hat Y}(\cdot)-\pi_{\hat X\hat Y}(\cdot)\big|\big|_1

1227:      & = & \sum_{\hat x\hat y}

1228:            \big|p_{\hat X\hat Y}(\hat x\hat y)

1229:                 -\pi_{\hat X\hat Y}(\hat x\hat y)\big| \\

1230:      & = & \sum_{\hat x\hat y}

1231:            \Big|\Big(\sum_{x'y'}p_{XY\hat X\hat Y}(x'y'\hat x\hat y)\Big)

1232:                 -\Big(\sum_{x''y''}\pi_{XY\hat X\hat Y}(x''y''\hat x\hat y)

1233:                  \Big)\Big| \\

1234:      & = & \sum_{\hat x\hat y}

1235:            \Big|\sum_{xy}p_{XY\hat X\hat Y}(xy\hat x\hat y)

1236:                 -\pi_{XY\hat X\hat Y}(xy\hat x\hat y)\Big| \\

1237:      & \leq & \sum_{xy\hat x\hat y}

1238:            \big|p_{XY\hat X\hat Y}(xy\hat x\hat y)

1239:                 -\pi_{XY\hat X\hat Y}(xy\hat x\hat y)\big| \\

1240:      & < & 2\epsilon.

1241:   \end{eqnarray*}

1242: For the conditional $p_{XY|\hat X\hat Y}(xy|\hat x\hat y)$:

1243:   \begin{eqnarray*}

1244:   \lefteqn{\big|\big|p_{XY|\hat X\hat Y}(\cdot|\hat x\hat y)

1245:                      -\pi_{XY|\hat X\hat Y}(\cdot|\hat x\hat y)\big|\big|_1

1246:      \;\; = \;\; \sum_{xy} \big|p_{XY|\hat X\hat Y}(xy|\hat x\hat y)

1247:                           -\pi_{XY|\hat X\hat Y}(xy|\hat x\hat y)\big|} \\

1248:      & = & \sum_{xy} \Big|\frac{p_{XY\hat X\hat Y}(xy\hat x\hat y)}

1249:                                {p_{\hat X\hat Y}(\hat x\hat y)}

1250:                           -\frac{\pi_{XY\hat X\hat Y}(xy\hat x\hat y)}

1251:                                 {\pi_{\hat X\hat Y}(\hat x\hat y)}\Big| \\

1252:      & = & \mbox{$\frac{1}{p_{\hat X\hat Y}(\hat x\hat y)

1253:                            \pi_{\hat X\hat Y}(\hat x\hat y)}$}

1254:            \sum_{xy} \big|p_{XY\hat X\hat Y}(xy\hat x\hat y)

1255:                           \pi_{\hat X\hat Y}(\hat x\hat y)

1256:                          -\pi_{XY\hat X\hat Y}(xy\hat x\hat y)

1257:                           p_{\hat X\hat Y}(\hat x\hat y)\big| \\

1258:      & \stackrel{(a)}{<} & \mbox{$\frac{1}{p_{\hat X\hat Y}(\hat x\hat y)

1259:                                            \pi_{\hat X\hat Y}(\hat x\hat y)}$}

1260:            \sum_{xy} \big|p_{XY\hat X\hat Y}(xy\hat x\hat y)

1261:                            p_{\hat X\hat Y}(\hat x\hat y)

1262:                           +p_{XY\hat X\hat Y}(xy\hat x\hat y)2\epsilon

1263:                           -\pi_{XY\hat X\hat Y}(xy\hat x\hat y)

1264:                            p_{\hat X\hat Y}(\hat x\hat y)\big| \\

1265:      & \leq & \mbox{$\frac{1}{p_{\hat X\hat Y}(\hat x\hat y)

1266:                               \pi_{\hat X\hat Y}(\hat x\hat y)}$}

1267:            \sum_{xy}\Big(2\epsilon p_{XY\hat X\hat Y}(xy\hat x\hat y)

1268:                     +p_{\hat X\hat Y}(\hat x\hat y)

1269:                      \big|p_{XY\hat X\hat Y}(xy\hat x\hat y)

1270:                           -\pi_{XY\hat X\hat Y}(xy\hat x\hat y)\big|\Big) \\

1271:      & = & \mbox{$\frac{1}{p_{\hat X\hat Y}(\hat x\hat y)

1272:                               \pi_{\hat X\hat Y}(\hat x\hat y)}$}

1273:            \left(2\epsilon p_{\hat X\hat Y}(\hat x\hat y)

1274:                     +p_{\hat X\hat Y}(\hat x\hat y)\sum_{xy}

1275:                      \big|p_{XY\hat X\hat Y}(xy\hat x\hat y)

1276:                           -\pi_{XY\hat X\hat Y}(xy\hat x\hat y)\big|\right) \\

1277:      & \leq & \frac{4\epsilon}{\pi_{\hat X\hat Y}(\hat x\hat y)} \\

1278:      & \triangleq & \epsilon_1,

1279:   \end{eqnarray*}

1280: where (a) follows from the $L_1$ bound on the marginals

1281: $p_{\hat X\hat Y}$ and $\pi_{\hat X\hat Y}$ above; and provided both

1282: $p_{\hat X\hat Y}(\hat x\hat y)\neq 0$ and

1283: $\pi_{\hat X\hat Y}(\hat x\hat y)\neq 0$.  We also note that under the

1284: assumption that

1285: $\big|\big|p_{XY\hat X\hat Y}-\pi_{XY\hat X\hat Y}\big|\big|_1<2\epsilon$,

1286: there exists a value $\hat\epsilon$ such that, for all

1287: $0<\epsilon<\hat\epsilon$, it is not possible to have a pair

1288: $(\hat x_0\hat y_0)$ such that $p_{\hat X\hat Y}(\hat x_0\hat y_0)>0$

1289: but $\pi_{\hat X\hat Y}(\hat x_0\hat y_0)=0$, or vice versa.  This is

1290: because $\pi_{\hat X\hat Y}(\hat x_0\hat y_0)=0$ means that for all $xy$,

1291: $\pi_{XY\hat X\hat Y}(xy\hat x_0\hat y_0)=0$.  But if

1292: $p_{\hat X\hat Y}(\hat x_0\hat y_0)>0$, this means there exists at

1293: least one $x_0y_0$ such that $p_{XY\hat X\hat Y}(x_0y_0\hat x_0\hat y_0)>0$,

1294: and as a result,

1295: $\big|\big|p_{XY\hat X\hat Y}-\pi_{XY\hat X\hat Y}\big|\big|_1\geq

1296: p_{XY\hat X\hat Y}(x_0y_0\hat x_0\hat y_0)$; thus, setting

1297: $\hat\epsilon\triangleq p_{XY\hat X\hat Y}(x_0y_0\hat x_0\hat y_0)$,

1298: we get the sought contradiction.  Thus, for all $\epsilon$ small enough,

1299: the bound on the conditionals holds as well, and so we have

1300: from~\cite[Thm.\ 16.3.2]{CoverT:91} that

1301: \begin{equation}

1302:    \Big|H\big(XY\big|\hat X=\hat x,\hat Y=\hat y\big)[p]

1303:         -H\big(XY\big|\hat X=\hat x,\hat Y=\hat y\big)[\pi]\Big|

1304:    \;\;<\;\;

1305:    -\epsilon_1\log\Big(\mbox{$\frac{\mbox{\normalsize $\epsilon_1$}}{|\mathcal{X}||\mathcal{Y}|

1306:    |\hat{\mathcal{X}}||\hat{\mathcal{Y}}|}$}\Big)

1307:    \;\;\triangleq\;\; \epsilon_2,

1308:   \label{eq:l1-bound-cond-entropy}

1309: \end{equation}

1310: and so,

1311: \begin{eqnarray*}

1312: \lefteqn{\Big|H\big(XY\big|\hat X\hat Y\big)[p]

1313:               -H\big(XY\big|\hat X\hat Y\big)[\pi]\Big|} \\

1314:   & \leq & \sum_{\hat x\hat y}

1315:            \Big|p_{\hat X\hat Y}(\hat x\hat y)

1316:                 H\big(XY\big|\hat X=\hat x,\hat Y=\hat y\big)[p]

1317:                 -\pi_{\hat X\hat Y}(\hat x\hat y)

1318:                 H\big(XY\big|\hat X=\hat x,\hat Y=\hat y\big)[\pi]\Big| \\

1319:   & \stackrel{(a)}{\leq} &

1320:     \big|\hat{\mathcal{X}}\big|\cdot\big|\hat{\mathcal{Y}}\big|\cdot

1321:     \Big|p_{\hat X\hat Y}(\hat x^*\hat y^*)

1322:           H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[p]

1323:          -\pi_{\hat X\hat Y}(\hat x^*\hat y^*)

1324:           H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[\pi]\Big| \\

1325:   & \stackrel{(b)}{\leq} &

1326:     \big|\hat{\mathcal{X}}\big|\cdot\big|\hat{\mathcal{Y}}\big|\cdot

1327:     \Big|\pi_{\hat X\hat Y}(\hat x^*\hat y^*)

1328:          H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[p]

1329:          +2\epsilon H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[p]

1330:          \\&&\mbox{\hspace{1.7cm}}

1331:          -\pi_{\hat X\hat Y}(\hat x^*\hat y^*)

1332:           H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[\pi]\Big| \\

1333:   & = &

1334:     \big|\hat{\mathcal{X}}\big|\cdot\big|\hat{\mathcal{Y}}\big|\cdot

1335:     \Big|2\epsilon H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[p]

1336:     \\&&\mbox{\hspace{1.7cm}}

1337:          +\pi_{\hat X\hat Y}(\hat x^*\hat y^*)

1338:           \Big(H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[p]

1339:                -H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[\pi]\Big)\Big|

1340:          \\

1341:   & \stackrel{(c)}{\leq} &

1342:     \big|\hat{\mathcal{X}}\big|\cdot\big|\hat{\mathcal{Y}}\big|\cdot

1343:     \Big(2\epsilon H\big(XY\big|\hat X=\hat x^*,\hat Y=\hat y^*\big)[p]

1344:          +\pi_{\hat X\hat Y}(\hat x^*\hat y^*)\epsilon_2\Big) \\

1345:   & \triangleq & \epsilon_3,

1346: \end{eqnarray*}

1347: where (a) follows from choosing $\hat x^*\hat y^*$ as the pair

1348: $\hat x\hat y\in\hat{\mathcal{X}}\times\hat{\mathcal{Y}}$ that makes

1349: the difference $\big|p_{\hat X\hat Y}(\hat x\hat y)

1350: H\big(XY\big|\hat X=\hat x,\hat Y=\hat y\big)[p]

1351: -\pi_{\hat X\hat Y}(\hat x\hat y)

1352: H\big(XY\big|\hat X=\hat x,\hat Y=\hat y\big)[\pi]\big|$ largest;

1353: (b) follows from

1354: $\big|\big|p_{\hat X\hat Y}-\pi_{\hat X\hat Y}\big|\big|_1<2\epsilon$;

1355: and (c) follows from eqn.~\eqref{eq:l1-bound-cond-entropy} above, and

1356: from the triangle inequality.
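As a brief remark on why these quantities are negligible, the following
sketch tracks their dependence on $\epsilon$.  It assumes, beyond what is
stated above, that $\pi_{\hat X\hat Y}(\hat x\hat y)\geq\pi_{\min}>0$ on
the support of $\pi_{\hat X\hat Y}$, for some constant $\pi_{\min}$ (a
symbol introduced here only for illustration) that does not depend on
$\epsilon$.  Under that assumption,
\[ \epsilon_1 \;\;\leq\;\; \frac{4\epsilon}{\pi_{\min}}
   \;\;\longrightarrow\;\; 0
   \mbox{\hspace{1cm}and\hspace{1cm}}
   \epsilon_2 \;\;=\;\; \epsilon_1
   \log\Big(\mbox{$\frac{|\mathcal{X}||\mathcal{Y}|
   |\hat{\mathcal{X}}||\hat{\mathcal{Y}}|}{\mbox{\normalsize $\epsilon_1$}}$}\Big)
   \;\;\longrightarrow\;\; 0
\]
as $\epsilon\to 0$, since $t\log(1/t)\to 0$ as $t\to 0$.  Similarly,
because a conditional entropy of $XY$ is at most
$\log\big(|\mathcal{X}||\mathcal{Y}|\big)$ and probabilities are at most 1,
\[ \epsilon_3 \;\;\leq\;\;
   \big|\hat{\mathcal{X}}\big|\cdot\big|\hat{\mathcal{Y}}\big|\cdot
   \Big(2\epsilon\log\big(|\mathcal{X}||\mathcal{Y}|\big)+\epsilon_2\Big)
   \;\;\longrightarrow\;\; 0.
\]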

1357:

1358: We conclude this part of the proof by noting that completely analogous

1359: arguments can be made to show that

1360: \[ \Big|H\big(X\big|\hat X\hat YY\big)[p]

1361:         -H\big(X\big|\hat X\hat YY\big)[\pi]\Big|

1362:    \;\;\leq\;\;\epsilon_4

1363:    \mbox{\hspace{1cm}and\hspace{1cm}}

1364:    \Big|H\big(Y\big|\hat X\hat YX\big)[p]

1365:         -H\big(Y\big|\hat X\hat YX\big)[\pi]\Big|

1366:    \;\;\leq\;\;\epsilon_5.

1367: \]

1368:

1369: \centerline{---------------------}

1370:

1371: We are now ready to prove our desired bounds.

1372:

1373: Since for all $(i,j)$, $\tilde{\mathbf{S}}_{ij}

1374: \subseteq \tilde{\mathbf{S}}'_{ij} =

1375: T_\epsilon^n\big(XY\big|\hat{\mathbf{x}}^n(ij)\hat{\mathbf{y}}^n(ij)\big)$,

1376: \[ \big|\tilde{\mathbf{S}}_{ij}\big|

1377:    \;\;\leq\;\;

1378:    2^{n(H(XY|\hat X\hat Y)[p]+\epsilon)}

1379:    \;\;\leq\;\;

1380:    2^{n(H(XY|\hat X\hat Y)[\pi]+\epsilon+\epsilon_3)};

1381: \]

1382: therefore, choosing $\ddot\epsilon\triangleq\epsilon+\epsilon_3$,

1383: the first bound specified by the lemma follows.

1384:

1385: For the other two bounds, fix now $\mathbf{y}^n\in\mathcal{Y}^n$.

1386: Since $\mathcal{S}$ is a cover, there must exist at least one value

1387: $j_0\in\{1,\ldots,2^{nR_2}\}$, such that $\mathbf{y}^n\in\mathbf{S}_{2,j_0}$.

1388: So consider any $i\in\{1,\ldots,2^{nR_1}\}$, and assume $\mathbf{S}_{1,i}

1389: \cap T_\epsilon^n\big(X\big|\mathbf{y}^n\big)\neq\emptyset$; based on

1390: this assumption, pick any $\mathbf{x}^n\in\mathbf{S}_{1,i}\cap

1391: T_\epsilon^n\big(X\big|\mathbf{y}^n\big)$.  This means that

1392: $\big(\mathbf{x}^n\mathbf{y}^n\big)\in

1393: \big[\mathbf{S}_{1,i}\times\mathbf{S}_{2,j_0}\big]\cap

1394: T_\epsilon^n\big(XY\big)$, and therefore that

1395: $\big(\mathbf{x}^n\mathbf{y}^n\big)\in

1396: \big[\mathbf{S}'_{1,i}\times\mathbf{S}'_{2,j_0}\big]\cap

1397: T_\epsilon^n\big(XY\big)$, and hence from eqn.~\eqref{eq:const-std-rd}

1398: we have that $\big(\mathbf{x}^n\mathbf{y}^n\hat{\mathbf{x}}^n(ij_0)

1399: \hat{\mathbf{y}}^n(ij_0)\big)\in T_\epsilon^n\big(XY\hat X\hat Y\big)$,

1400: and therefore we conclude that

1401: \[ \mathbf{S}_{1,i}\cap T_\epsilon^n\big(X\big|\mathbf{y}^n\big)

1402:    \;\;\subseteq\;\;

1403:    T_\epsilon^n\big(X\big|\hat{\mathbf{x}}^n(ij_0)\hat{\mathbf{y}}^n(ij_0)

1404:                           \mathbf{y}^n).

1405: \]

1406: We also note that if $\mathbf{S}_{1,i}\cap

1407: T_\epsilon^n\big(X\big|\mathbf{y}^n\big)=\emptyset$, then the last inclusion

1408: holds trivially.  Thus,

1409: \[ \big|\mathbf{S}_{1,i}\cap T_\epsilon^n\big(X\big|\mathbf{y}^n\big)\big|

1410:    \;\;\leq\;\;

1411:    2^{n(H(X|\hat X\hat YY)[p]+\epsilon)}

1412:    \;\;\leq\;\;

1413:    2^{n(H(X|\hat X\hat YY)[\pi]+\epsilon+\epsilon_4)}.

1414: \]

1415: Therefore, choosing $\ddot\epsilon'\triangleq\epsilon+\epsilon_4$, the

1416: second bound specified by the lemma holds.  And the third (and last)

1417: bound follows from an argument identical to this last one.  So the lemma

1418: is proved.

1419: \tend\bigskip

1420:

1421:

1422: \section{Proof of Theorem~\ref{thm:main}}

1423: \label{sec:main-proof}

1424:

1425: Consider any $\big(2^{nR_1},2^{nR_2},n,\epsilon,D_1,D_2\big)$ distributed

1426: rate-distortion code, represented by a cover $\mathcal{S}$.  Then,

1427: \begin{eqnarray*}

1428: \lefteqn{n(R_1+R_2) \;\; \geq \;\; H\big(f_1(X^n)f_2(Y^n)\big)} \\

1429:   & = & H\big(f_1(X^n)f_2(Y^n)\big)

1430:         - H\big(f_1(X^n)f_2(Y^n)\big|X^nY^n\big) \\

1431:   & = & I\big(X^nY^n\wedge f_1(X^n)f_2(Y^n)\big) \\

1432:   & = & H\big(X^nY^n\big)

1433:         - H\big(X^nY^n\big|f_1(X^n)f_2(Y^n)\big) \\

1434:   & = & nH\big(XY\big)

1435:         - \sum_{1\leq i\leq 2^{nR_1},1\leq j\leq 2^{nR_2}}

1436:           P\big(f_1(X^n)=i,f_2(Y^n)=j\big)

1437:           H\big(X^nY^n\big|f_1(X^n)=i,f_2(Y^n)=j\big) \\

1438:   & \geq & nH\big(XY\big) -

1439:         \Big[ \max_{1\leq i\leq 2^{nR_1},1\leq j\leq 2^{nR_2}}

1440:                H\big(X^nY^n\big|f_1(X^n)=i,f_2(Y^n)=j\big)

1441:         \Big] \\&&\mbox{\hspace{2.06cm}}

1442:         \Big[ \sum_{1\leq i\leq 2^{nR_1},1\leq j\leq 2^{nR_2}}

1443:                P\big(f_1(X^n)=i,f_2(Y^n)=j\big)

1444:         \Big] \\

1445:   & = & nH\big(XY\big)

1446:         - \max_{1\leq i\leq 2^{nR_1},1\leq j\leq 2^{nR_2}}

1447:           H\big(X^nY^n\big|f_1(X^n)=i,f_2(Y^n)=j\big) \\

1448:   & \stackrel{(a)}{\geq} & nH\big(XY\big)

1449:         - \Big[\max_{1\leq i\leq 2^{nR_1},1\leq j\leq 2^{nR_2}}

1450:           \log\big|\tilde{\mathbf{S}}_{ij}\big|\Big]-n\epsilon_1 \\

1451:   & \stackrel{(b)}{\geq} &

1452:            nH\big(XY\big) - nH\big(XY\big|\hat X\hat Y\big)[\pi]

1453:                           - n\ddot\epsilon - n\epsilon_1 \\

1454:   & = & nI\big(XY\wedge \hat X\hat Y\big)[\pi] - n\ddot\epsilon - n\epsilon_1,

1455: \end{eqnarray*}

1456: where (a) follows from splitting outcomes of $X^nY^n$ into typical and

1457: non-typical ones, and from bounding the entropy of the typical ones with

1458: a uniform distribution; and (b) follows from Lemma~\ref{lemma:bound-size},

1459: for some $\pi\in\mathbb{P}_{\mbox{\tiny LB}}$.
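The splitting argument behind step (a) can be sketched as follows.  This
is only an illustration, under assumptions not stated above: $A$ denotes
the indicator of the event $(X^n,Y^n)\in T_\epsilon^n\big(XY\big)$,
$\delta_{ij}$ denotes $P\big(A=0\,\big|\,f_1(X^n)=i,f_2(Y^n)=j\big)$ (both
symbols are introduced here for illustration), $\tilde{\mathbf{S}}_{ij}$ is
taken to be the set of typical pairs in cell $(i,j)$, and logarithms are
in base 2.  Then,
\begin{eqnarray*}
\lefteqn{H\big(X^nY^n\big|f_1(X^n)=i,f_2(Y^n)=j\big)} \\
  & \leq & H\big(A\big|f_1(X^n)=i,f_2(Y^n)=j\big)
           + H\big(X^nY^n\big|A,f_1(X^n)=i,f_2(Y^n)=j\big) \\
  & \leq & 1 + (1-\delta_{ij})\log\big|\tilde{\mathbf{S}}_{ij}\big|
           + \delta_{ij}\,n\log\big(|\mathcal{X}||\mathcal{Y}|\big) \\
  & \leq & \log\big|\tilde{\mathbf{S}}_{ij}\big|
           + 1 + \delta_{ij}\,n\log\big(|\mathcal{X}||\mathcal{Y}|\big).
\end{eqnarray*}
Under this reading, the term $n\epsilon_1$ in step (a) would correspond to
$1+\delta_{ij}\,n\log\big(|\mathcal{X}||\mathcal{Y}|\big)$, which is small
relative to $n$ whenever $\delta_{ij}$ is small.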

1460:

1461: For the individual rates, we have the following chain of inequalities:

1462: \begin{eqnarray*}

1463: nR_1 & \geq & H\big(f_1(X^n)\big) \\

1464:   & \geq & H\big(f_1(X^n)\big|Y^n\big) \\

1465:   & = & H\big(f_1(X^n)\big|Y^n\big)-H\big(f_1(X^n)\big|X^nY^n\big) \\

1466:   & = & I\big(X^n\wedge f_1(X^n)\big|Y^n\big) \\

1467:   & = & H\big(X^n\big|Y^n\big)-H\big(X^n\big|f_1(X^n)Y^n\big) \\

1468:   & = & nH\big(X\big|Y\big)-H\big(X^n\big|f_1(X^n)Y^n\big) \\

1469:   & = & nH\big(X\big|Y\big)

1470:         -\sum_{\mathbf{y}^n\in\mathcal{Y}^n}\sum_{i=1}^{2^{nR_1}}

1471:          P\big(f_1(X^n)=i,Y^n=\mathbf{y}^n\big)

1472:          H\big(X^n\big|f_1(X^n)=i,Y^n=\mathbf{y}^n\big) \\

1473:   & \geq & nH\big(X\big|Y\big)

1474:         - \Big[ \max_{i=1...2^{nR_1},\mathbf{y}^n\in\mathcal{Y}^n}

1475:                 H\big(X^n\big|f_1(X^n)=i,Y^n=\mathbf{y}^n\big) \Big]

1476:           \\&&\mbox{\hspace{2.18cm}}

1477:           \Big[ \sum_{\mathbf{y}^n\in\mathcal{Y}^n}\sum_{i=1}^{2^{nR_1}}

1478:                 P\big(f_1(X^n)=i,Y^n=\mathbf{y}^n\big) \Big] \\

1479:   & = & nH\big(X\big|Y\big)

1480:         - \max_{i=1...2^{nR_1},\mathbf{y}^n\in\mathcal{Y}^n}

1481:           H\big(X^n\big|f_1(X^n)=i,Y^n=\mathbf{y}^n\big) \\

1482:   & \stackrel{(a)}{\geq} & nH\big(X\big|Y\big)

1483:         - \Big[\max_{i=1...2^{nR_1},\mathbf{y}^n\in\mathcal{Y}^n}

1484:           \log_2\big|\mathbf{S}_{1,i}\cap

1485:            T_\epsilon^n\big(X\big|\mathbf{y}^n\big)\big|\Big]-n\epsilon_1 \\

1486:   & \stackrel{(b)}{\geq} & nH\big(X\big|Y\big)

1487:            - nH\big(X\big|\hat X\hat YY\big)[\pi]

1488:            - n\ddot\epsilon' - n\epsilon_1 \\

1489:   & = & nI\big(X\wedge \hat X\hat Y\big|Y\big)[\pi]

1490:            - n\ddot\epsilon' - n\epsilon_1,

1491: \end{eqnarray*}

1492: where (a) follows from splitting the outcomes of $X^n$ into those

1493: that are jointly typical with the given sequence $\mathbf{y}^n$ and

1494: those that are not, and from bounding the entropy of the typical

1495: ones with a uniform distribution; and (b) follows from

1496: Lemma~\ref{lemma:bound-size}.  An identical argument shows that

1497: $nR_2\geq nI\big(Y\wedge\hat X\hat Y\big|X\big)[\pi]-n\ddot\epsilon''

1498: -n\epsilon_1$.  And since these conditions must hold for all

1499: $\epsilon>0$, the theorem follows.

1500: \tend

1501:

1502:

1503: \section{Discussion}

1504: \label{sec:discussion}

1505:

1506: We conclude the first part of this paper with a discussion of

1507: the results proved so far.

1508:

1509: \subsection{Finite Parameterization of $\mathcal{R}^o(D_1,D_2)$}

1510:

1511: The class of distributions used to define the Berger-Tung inner bound

1512: is given by:

1513: \[ \mathbb{P}_{\mbox{\tiny BT}}

1514:    \;\;\triangleq\;\;

1515:    \left\{p_{XYUV}\left|\begin{array}{rl}

1516:                         \bullet & p(xy)=\sum_{uv}p_{XYUV}(xyuv) \\

1517:                         \bullet & U-X-Y-V\textrm{ is a Markov chain} \\

1518:                         \bullet & \expect{d_1\big(X,\gamma_1(U,V)\big)}\leq D_1

1519:                                   \textrm{ and }

1520:                                   \expect{d_2\big(Y,\gamma_2(U,V)\big)}\leq D_2

1521:                         \end{array}\right\}\right.,

1522: \]

1523: for fixed distortions $(D_1,D_2)$, source $p(xy)$, and some functions

1524: $\gamma_1:\mathcal{U}\times\mathcal{V}\to\hat{\mathcal{X}}$ and

1525: $\gamma_2:\mathcal{U}\times\mathcal{V}\to\hat{\mathcal{Y}}$.

1526: To make a direct comparison

1527: with $\mathbb{P}_{\mbox{\tiny BT}}$ easier, we rewrite

1528: $\mathbb{P}_{\mbox{\tiny LB}}$ in terms of two

1529: variables $U$ and $V$ as follows:

1530: \begin{itemize}

1531: \item Set $\mathcal{U}\triangleq\hat{\mathcal{X}}$ and

1532:   $\mathcal{V}\triangleq\hat{\mathcal{Y}}$.

1533: \item For any $p_{XY\hat X\hat Y}\in\mathbb{P}_{\mbox{\tiny LB}}$,

1534:   set $p_{XYUV}(xyuv)\triangleq p_{XY\hat X\hat Y}(xy\hat x\hat y)$.

1535: \end{itemize}

1536: Then, it is clear that $\mathbb{P}'_{\mbox{\tiny LB}}$, defined by

1537: \[ \mathbb{P}'_{\mbox{\tiny LB}}

1538:    \;\;\triangleq\;\;

1539:    \left\{p_{XYUV}\left|\begin{array}{rl}

1540:                         \bullet & p(xy)=\sum_{uv}p_{XYUV}(xyuv) \\

1541:                         \bullet & X-UV-Y\textrm{ is a Markov chain} \\

1542:                         \bullet & \expect{d_1\big(X,\gamma_1(U,V)\big)}\leq D_1

1543:                                   \textrm{ and }

1544:                                   \expect{d_2\big(Y,\gamma_2(U,V)\big)}\leq D_2

1545:                         \end{array}\right\}\right.,

1546: \]

1547: again for fixed distortions $(D_1,D_2)$, source $p(xy)$, and some

1548: functions $\gamma_1:\mathcal{U}\times\mathcal{V}\to\hat{\mathcal{X}}$

1549: and $\gamma_2:\mathcal{U}\times\mathcal{V}\to\hat{\mathcal{Y}}$, is

1550: just a relabeling of $\mathbb{P}_{\mbox{\tiny LB}}$.

1551:

1552: In terms of these sets, we can state the following bounds on

1553: $\mathcal{R}^*(D_1,D_2)$:

1554: \begin{equation}

1555:    \overline{\bigcup_{p\in\mathbb{P}_{\mbox{\tiny BT}}}\mathcal{R}(D_1,D_2,p)}

1556:    \;\;\subseteq\;\;

1557:    \mathcal{R}^*(D_1,D_2)

1558:    \;\;\subseteq\;\;

1559:    \overline{\bigcup_{p\in\mathbb{P}'_{\mbox{\tiny LB}}}\mathcal{R}(D_1,D_2,p)}.

1560:   \label{eq:region-bounds}

1561: \end{equation}

1562: $\mathcal{R}^*(D_1,D_2)$ is not a characterization of the region of

1563: achievable rates that we would normally consider satisfactory, in that

1564: it is not ``computable,'' in the sense of~\cite[pg.\ 259]{CsiszarK:81}.

1565: Yet with eqn.~\eqref{eq:region-bounds}, we have managed to ``sandwich''

1566: the uncomputable $\mathcal{R}^*(D_1,D_2)$ region in between two

1567: other regions, both of which are computable:

1568: \begin{itemize}

1569: \item in $\mathbb{P}'_{\mbox{\tiny LB}}$, $U$ and $V$ are taken

1570:   over finite alphabets ($\mathcal{U}=\hat{\mathcal{X}}$ and

1571:   $\mathcal{V}=\hat{\mathcal{Y}}$);

1572: \item and in $\mathbb{P}_{\mbox{\tiny BT}}$, although we have

1573:   not been able to find anywhere in the literature a proof that

1574:   the alphabets of $U$ and $V$ can be taken to be finite, presumably a

1575:   direct application of the method of Ahlswede and K\"orner should

1576:   produce the desired bounds~\cite{AhlswedeK:75, Salehi:78}.

1577: \end{itemize}

1578: This is of interest because, as far as we can tell, none of the outer

1579: bounds we have found in the literature are computable.

1580:

1581: \subsection{Relationship to the Berger-Tung Outer Bound}

1582:

1583: One simple sufficient condition (which unfortunately does not hold)

1584: for proving the inclusions in eqn.~\eqref{eq:region-bounds} to be

1585: in fact equalities would have been to show that

1586: $\mathbb{P}'_{\mbox{\tiny LB}}\subseteq\mathbb{P}_{\mbox{\tiny BT}}$.

1587: However, a direct comparison between these two sets is still revealing.

1588: Consider any distribution $p$ that satisfies the constraints of both

1589: sets (i.e., $p\in\mathbb{P}_{\mbox{\tiny LB}}\cap

1590: \mathbb{P}_{\mbox{\tiny BT}}$), and elements $xyuv$ for which

1591: $p(xyuv)\neq 0$.  Then, this $p$ admits two different factorizations:

1592: \[\begin{array}{crcl}

1593:   & p(uv)p(x|uv)p(y|uv) & = & p(xy)p(u|x)p(v|y) \\

1594: \Leftrightarrow & p(uv)\frac{p(uv|x)p(x)}{p(uv)}\frac{p(uv|y)p(y)}{p(uv)}

1595:                         & = & p(xy)p(u|x)p(v|y) \\

1596: \Leftrightarrow & p(uv|x)p(x)p(uv|y)p(y) & = & p(xy)p(u|x)p(v|y)p(uv) \\

1597: \Leftrightarrow & p(u|x)p(v|x)p(x)p(u|y)p(v|y)p(y)

1598:                         & = & p(xy)p(u|x)p(v|y)p(uv) \\

1599: \Leftrightarrow & p(v|x)p(x)p(u|y)p(y) & = & p(xy)p(uv) \\

1600: \Leftrightarrow & p(xv)p(yu) & = & p(xy)p(uv).

1601: \end{array}\]

1602: Clearly, any distribution in this intersection must make all

1603: variables pairwise independent: sum out any two of them, and the joint

1604: distribution of the other two factors into the product of their marginals.
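For instance, if one assumes (as a simplification not made above) that
the last identity holds for every $xyuv$, as it does when $p$ has full
support, then summing both sides over $(u,v)$, and separately over
$(x,y)$, gives
\[ p(x)p(y) \;\;=\;\; p(xy)
   \mbox{\hspace{1cm}and\hspace{1cm}}
   p(v)p(u) \;\;=\;\; p(uv),
\]
respectively, since $\sum_{uv}p(xv)p(yu)=p(x)p(y)$ and
$\sum_{xy}p(xv)p(yu)=p(v)p(u)$.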

1605:

1606: We find this observation interesting because it provides clear

1607: evidence that our lower bound is very different in nature from the

1608: Berger-Tung outer bound~\cite{Berger:78, Tung:PhD}.  In that bound,

1609: the set of distributions in the outer bound (all Markov chains of

1610: the form $U-X-Y$ and $X-Y-V$) strictly contains

1611: $\mathbb{P}_{\mbox{\tiny BT}}$; that means, there is a subset of

1612: the distributions in the outer bound that generates all rates we

1613: know to be achievable.  In our bound, since

1614: $\mathbb{P}_{\mbox{\tiny LB}}\cap\mathbb{P}_{\mbox{\tiny BT}}$

1615: is a degenerate set, {\em none} of the distributions

1616: $p\in\mathbb{P}_{\mbox{\tiny LB}}$ can be used to define a code

1617: construction based on known methods,\footnote{Except of course for

1618: trivial cases, such as when the two sources $X$ and $Y$ are independent,

1619: and the distortion is maximum.}

1620: such as the ``quantize-then-bin'' strategy used in the proof of

1621: the Berger-Tung inner bound.

1622:

1623: \subsection{Computation of the Outer Bound}

1624:

1625: The finite parameterization of our outer bound is, we believe, an

1626: important contribution in itself, given that the Berger-Tung

1627: outer bound is not computable.\footnote{And neither is the more modern

1628: outer bound of Wagner and Anantharam~\cite{Wagner:PhD, WagnerA:05},

1629: also mentioned in the introduction.}  This is of interest in part

1630: because, at least in principle, this finite parameterization renders

1631: the problem amenable to analysis using computational methods.  Finding

1632: an efficient algorithm for computing solutions to the optimization

1633: problem defined by Theorem~\ref{thm:main}, similar in spirit to the

1634: Blahut-Arimoto algorithm for the numerical evaluation of channel

1635: capacity and rate-distortion functions~\cite{Arimoto:72, Blahut:72},

1636: certainly is an interesting challenge in its own right.

1637:

1638: More fundamentally though, we believe the computability of our

1639: bound holds the key to completing a proof of the optimality of the

1640: Berger-Tung inner bound for the problem setup of Fig.~\ref{fig:setup}:

1641: \begin{itemize}

1642: \item Computational methods are of interest not only because they

1643:   lead to answers that are ``useful in practice;'' discovering

1644:   efficient algorithms invariably requires the uncovering of structure

1645:   in the problem.  A good example in our field: the characterization

1646:   by Chiang and Boyd of the Lagrange duals of channel capacity and

1647:   rate-distortion as convex geometric programs~\cite{ChiangB:04}.

1648: \item Last but not least, an efficient algorithm to compute the

1649:   sandwich terms in eqn.~\eqref{eq:region-bounds} provides a fallback

1650:   strategy.  If all else fails, at least by means of numerical methods

1651:   we can check whether, in concrete instances of the problem, the

1652:   lower and upper bounds coincide or not.

1653: \end{itemize}

1654: The achievability of the set of rates defined by Theorem~\ref{thm:main},

1655: and the effective computation of the bounds of eqn.~\eqref{eq:region-bounds},

1656: are the main topics considered in Part II.

1657:

1658:

1659: \bigskip\noindent{\em Acknowledgements}--In the final version.

1660:

1661:

1662: %\pagebreak

1663: %\bibliographystyle{plain}

1664: %\bibliography{library}

1665: \begin{thebibliography}{10}

1666:

1667: \bibitem{AhlswedeK:75}

1668: R.~Ahlswede and J.~K{\"o}rner.

1669: \newblock {Source Coding with Side Information and a Converse for Degraded

1670:   Broadcast Channels}.

1671: \newblock {\em IEEE Trans. Inform. Theory}, IT-21(6):629--637, 1975.

1672:

1673: \bibitem{Arimoto:72}

1674: S.~Arimoto.

1675: \newblock {An Algorithm for Computing the Capacity of Arbitrary Discrete

1676:   Memoryless Channels}.

1677: \newblock {\em IEEE Trans. Inform. Theory}, IT-18(1):14--20, 1972.

1678:

1679: \bibitem{BarrosS:06}

1680: J.~Barros and S.~D. Servetto.

1681: \newblock {Network Information Flow with Correlated Sources}.

1682: \newblock {\em IEEE Trans. Inform. Theory}, 52(1):155--170, 2006.

1683:

1684: \bibitem{Berger:78}

1685: T.~Berger.

1686: \newblock {\em The Information Theory Approach to Communications (G. Longo,

1687:   ed.)}, chapter Multiterminal Source Coding.

1688: \newblock Springer-Verlag, 1978.

1689:

1690: \bibitem{BergerHOTW:79}

1691: T.~Berger, K.~B. Housewright, J.~K. Omura, S.~Tung, and J.~Wolfowitz.

1692: \newblock {An Upper Bound on the Rate Distortion Function for Source Coding

1693:   with Partial Side Information at the Decoder}.

1694: \newblock {\em IEEE Trans. Inform. Theory}, 25(6):664--666, 1979.

1695:

1696: \bibitem{BergerS:07}

1697: T.~Berger and S.~D. Servetto.

1698: \newblock {Multiterminal Source Coding -- 30 Years Later}.

1699: \newblock In preparation, for Foundations and Trends in Communications and

1700:   Information Theory.

1701:

1702: \bibitem{BergerY:89}

1703: T.~Berger and R.~W. Yeung.

1704: \newblock {Multiterminal Source Encoding with One Distortion Criterion}.

1705: \newblock {\em IEEE Trans. Inform. Theory}, 35(2):228--236, 1989.

1706:

1707: \bibitem{BergerZV:96}

1708: T.~Berger, Z.~Zhang, and H.~Viswanathan.

1709: \newblock {The CEO Problem}.

1710: \newblock {\em IEEE Trans. Inform. Theory}, 42(3):887--902, 1996.

1711:

1712: \bibitem{Blahut:72}

1713: R.~E. Blahut.

1714: \newblock {Computation of Channel Capacity and Rate-Distortion Functions}.

1715: \newblock {\em IEEE Trans. Inform. Theory}, IT-18(4):460--473, 1972.

1716:

1717: \bibitem{ChiangB:04}

1718: M.~Chiang and S.~Boyd.

1719: \newblock {Geometric Programming Duals of Channel Capacity and Rate

1720:   Distortion}.

1721: \newblock {\em IEEE Trans. Inform. Theory}, 50(2):245--258, 2004.

1722:

1723: \bibitem{Cover:75b}

1724: T.~M. Cover.

1725: \newblock {A Proof of the Data Compression Theorem of Slepian and Wolf for

1726:   Ergodic Sources}.

1727: \newblock {\em IEEE Trans. Inform. Theory}, IT-21(2):226--228, 1975.

1728:

1729: \bibitem{CoverT:91}

1730: T.~M. Cover and J.~Thomas.

1731: \newblock {\em {Elements of Information Theory}}.

1732: \newblock John Wiley and Sons, Inc., 1991.

1733:

1734: \bibitem{CsiszarK:80}

1735: I.~Csisz\'ar and J.~K{\"o}rner.

1736: \newblock {Towards a General Theory of Source Networks}.

1737: \newblock {\em IEEE Trans. Inform. Theory}, 26(2):155--166, 1980.

1738:

1739: \bibitem{CsiszarK:81}

1740: I.~Csisz\'ar and J.~K{\"o}rner.

1741: \newblock {\em {Information Theory: Coding Theorems for Discrete Memoryless

1742:   Systems}}.

1743: \newblock Acad\'emiai Kiad\'o, Budapest, 1981.

1744:

1745: \bibitem{DobrushinT:62}

1746: R.~L. Dobrushin and B.~S. Tsybakov.

1747: \newblock {Information Transmission with Additional Noise}.

1748: \newblock {\em IEEE Trans. Inform. Theory}, 8(5):293--304, 1962.

1749:

1750: \bibitem{Salehi:78}

1751: M.~Salehi.

1752: \newblock {Cardinality Bounds on Auxiliary Variables in Multiple-User Theory

1753:   via the Method of Ahlswede and K{\"o}rner}.

1754: \newblock Technical Report~33, Statistics Department, Stanford University,

1755:   August 1978.

1756:

1757: \bibitem{Shannon:59}

1758: C.~E. Shannon.

1759: \newblock {Coding Theorems for a Discrete Source with a Fidelity Criterion}.

1760: \newblock {\em IRE Nat. Conv. Rec.}, 4:142--163, 1959.

1761:

1762: \bibitem{SlepianW:73b}

1763: D.~Slepian and J.~K. Wolf.

1764: \newblock {Noiseless Coding of Correlated Information Sources}.

1765: \newblock {\em IEEE Trans. Inform. Theory}, IT-19(4):471--480, 1973.

1766:

1767: \bibitem{Tung:PhD}

1768: S.~Y. Tung.

1769: \newblock {\em {Multiterminal Source Coding}}.

1770: \newblock PhD thesis, Cornell University, 1978.

1771:

1772: \bibitem{Wagner:PhD}

1773: A.~B. Wagner.

1774: \newblock {\em {Methods of Offline Distributed Detection: Interacting Particle

1775:   Models and Information-Theoretic Limits}}.

1776: \newblock PhD thesis, University of California, Berkeley, 2005.

1777:

1778: \bibitem{WagnerA:05}

1779: A.~B. Wagner and V.~Anantharam.

1780: \newblock {An Improved Outer Bound for the Multiterminal Source Coding

1781:   Problem}.

1782: \newblock In {\em Proc. IEEE Int. Symp. Inform. Theory (ISIT)}, Adelaide,

1783:   Australia, 2005.

1784: \newblock Extended version submitted to the IEEE Transactions on Information

1785:   Theory. Available from \href{http://arxiv.org/abs/cs.IT/0511103/} {{\tt

1786:   http://arxiv.org/abs/cs.IT/0511103/}}.

1787:

1788: \bibitem{Wyner:75}

1789: A.~D. Wyner.

1790: \newblock {On Source Coding with Side Information at the Decoder}.

1791: \newblock {\em IEEE Trans. Inform. Theory}, IT-21(3):294--300, 1975.

1792:

1793: \bibitem{WynerZ:76}

1794: A.~D. Wyner and J.~Ziv.

1795: \newblock {The Rate-Distortion Function for Source Coding with Side Information

1796:   at the Decoder}.

1797: \newblock {\em IEEE Trans. Inform. Theory}, IT-22(1):1--10, 1976.

1798:

1799: \bibitem{Yeung:PhD}

1800: R.~W. Yeung.

1801: \newblock {\em {Some Results on Multiterminal Source Coding}}.

1802: \newblock PhD thesis, Cornell University, 1988.

1803:

1804: \bibitem{Yeung:01}

1805: R.~W. Yeung.

1806: \newblock {\em {A First Course in Information Theory}}.

1807: \newblock Kluwer Academic Publishers, 2001.

1808:

1809: \end{thebibliography}

1810:

1811:

1812:

1813: \end{document}

1814:

1815: