
\documentstyle[aps,prl]{revtex}
%\documentclass[aps,prl]{revtex}
\input{epsf}


\begin{document}
\title{An efficient joint source-channel coding for a $D$-dimensional
array} \author{Ido Kanter, Haggai Kfir and Shahar Keren}
\address{Minerva Center and the
Department of Physics, Bar-Ilan University, Ramat-Gan 52900, Israel
}


\date{July 2003}
\maketitle


\begin{abstract}
An efficient joint source-channel (S/C) decoder based on the side
information of the source and on the MN-Gallager code over Galois
fields, $GF(q)$, is presented. The dynamical posterior probabilities are
derived either from the statistical mechanical approach for the
calculation of the entropy of the correlated sequences, or from the
Markovian joint S/C algorithm. The Markovian joint S/C decoder has
many advantages over the statistical mechanical approach, among them:
(a) there is no need to construct and diagonalize a
$q \times q$ matrix or to solve saddle point equations in
$q$ dimensions; (b) a generalization to a joint S/C coding of an array
of two-dimensional bits (or higher dimensions) is achievable; (c)
using parametric estimation, an efficient joint S/C decoder in the
absence of side information is discussed. Besides the various joint S/C
decoders presented, we also show that the available sets of
autocorrelations form a convex volume whose structure can be
found using the Simplex algorithm.
\end{abstract}


%\twocolumn


\section{Introduction}
Source coding is a process for removing redundant information from the
source information symbol stream. For example, converting a bitmapped
image to GIF, JPEG or any of the familiar image formats used on the web
is a source coding process. Not only images but also sound, video
frames, etc., can be coded; compressing any such stream of information
is source coding.


Channel coding is a procedure for adding redundancy to the information
stream to be transmitted, as protection against errors; in other words,
channel coding adds protection to the transmission process. For
example, a wireless communication channel is affected by many factors,
such as distance, the speed at which either party is moving, weather,
buildings and unintentional interference from other users, so errors
cannot be avoided. During the last decade engineers and physicists have
designed efficient error correction techniques, such as Low-Density
Parity-Check (LDPC) codes \cite{forney,David-Mackay2,Shokrollahi,KS} and
Turbo codes \cite{turbo}, that nearly saturate Shannon's limit.



In a typical communication scenario two major resources are highly
limited. The first is power, which includes both transmitter and
receiver power. The second is bandwidth (channel capacity), which
indicates the speed at which the channel can transmit information, or
more exactly, how many bits per second (bps) it can carry. Together,
these determine the capability of a channel. For example, increasing
the power reduces the error rate, but the power is limited. On the
other hand, if the channel capacity were unlimited, we could simply add
a large amount of protection (low rate); again, we cannot afford this,
since channel capacity is a commodity which in many scenarios is even
more precious than power.



The main tradeoff in communication is the following: given a channel
of fixed capacity and a fixed amount of power, how should these
resources be allocated between the source and the channel to obtain the
best result, i.e., the smallest distortion? A certain amount of channel
capacity is allocated to the source and the rest is used for
protection, but what should the ratio between them be?


Shannon's separation theorem states that source coding (compression)
and channel coding (error protection) can be performed separately and
sequentially while maintaining optimality
\cite{Shannon,Cover,err_cor_book,Frey}. However, this is true only in
the case of asymptotically long data blocks and point-to-point
transmission. In many practical applications the conditions of
Shannon's separation theorem do not hold, nor can the theorem be used
as a good approximation. Thus, considerable interest has developed in
various schemes of joint source-channel (S/C) coding, where compression
and error correction are combined into one mechanism (see, for
instance, the following selected publications
\cite{shamail1,shamail2,shamail3,shamail4,shamail5,shamail6}).


%It may seem ironic that after we spend so much effort squeezing and
%compressing the last redundant bit out of the source bit stream, now
%we are just going to add a lot of redundancy into it in order to
%protect the  information from a noise during the transmission.

%The objective of joint source-channel coding is to combine both source
%(compression) and channel (error correction) coding into one mechanism
%in order to reduce the overall complexity of the communication while
%maintaining satisfactory performance.  Another possible advantage of
%the joint source-channel coding is the reduction of the sensitivity to
%a bit error in a compressed message.


The paper is organized as follows.
In Section II the Statistical Mechanical (SM) joint S/C coding is
introduced, whereas in Section III the threshold of the code is
calculated using the scaling behavior of the number of message-passing
steps required for the convergence of the algorithm
\cite{KS,KS-Gaussian,KK}. In Section IV the efficiency of the SM
joint S/C coding is compared to various separation schemes. The
degradation in the performance of the SM joint S/C coding is examined
in Section V as a function of the spectrum of the eigenvalues of the
transfer matrix. In Section VI the Simplex algorithm is used to
calculate the available space of possible sets of autocorrelations.
The drawbacks of the SM joint S/C coding are discussed in Section VII,
and an advanced S/C coding is presented in Section VIII. The Markovian
joint S/C coding and its efficiency are discussed in Section IX. Based
on parametric estimation methods, the Markovian joint S/C decoder in
the absence of side information is discussed in Section X, and its
extension to higher dimensions in Section XI. The paper
closes with some concluding remarks.




\section{Joint S/C coding - Statistical Mechanical approach}


In our recent papers \cite{KR,KK} a particular scheme based on an
SM approach for the implementation of the joint
S/C coding was presented; its main steps are briefly
summarized below. The original Boolean source is first mapped to a
binary source \cite{sourlas,sourlas1} $\left\{ x_{i}=\pm1\right\}$,
$i=1,...,L$, which is characterized by a finite set of autocorrelations
bounded by the length $k_0$
\begin{equation}
C_{k_1, ...,k_m}=\frac{1}{L}\sum_{i=1}^{L}x_{i}\prod_{j=1}^m
x_{\left(i+k_j\right)\: \mathbf{mod}\: L}
\label{ck}
\end{equation}
\noindent where $k_m \le k_0$ is the highest length autocorrelation
taken and the total number of possible different autocorrelations is
$2^{k_0}$. For $k_0=2$, for instance, there are only $4$ possible
correlations, $C_0$, $C_1$, $C_2$ and $C_{12}$, and for $k_0=3$ there
are $8$ possible different correlations,
$C_0,~C_1,~C_2,~C_3,~C_{12},~C_{13},~C_{23},~C_{123}$, where we do not
assume left-right symmetry for the source. Note that for a general
$k_0$ and $m=1$, eq. \ref{ck} reduces to the two-point
autocorrelation functions \cite{liat}. The number of sequences obeying
these $2^{k_0}$ constraints is given by
\begin{equation}
\Omega = Tr_{\{ x_i = \pm 1 \} }\!\!\!\!
\prod_{\{k_1,k_2,...,k_m\}}\!\!\!\!\! \delta
\left(\sum_{i=1}^{L}x_{i}\prod_{j=1}^m x_{i+k_j} - LC_{k_1,
...,k_m}\right)
\label{omega}
\end{equation}

\noindent where $m=0$ stands for $C_0$. Using the integral
representation of the delta functions, eq. \ref{omega} can be written
as
\begin{eqnarray}
&\Omega\!\! &\!\! =\!\! \int_{-\infty}^{\infty}
\prod_{\{k_1,..,k_m\}}\!\!\!\!
dy_{\{k_1,..,k_m\}} \exp\left(-L\sum_{\{k_1,..,k_m\}}
y_{k_1,..,k_m}C_{k_1,..,k_m}\right) \nonumber  \\ & Tr &\!\!\!\! \exp \left(\sum_{\{k_1,..,k_m\}}
y_{k_1,...,k_m}\sum_i x_{i}\prod_{j=1}^m x_{i+k_j} \right)
\label{omega1}
\end{eqnarray}

Since $k_j \le k_0$, the last term of eq. \ref{omega1} indicates that
the trace can be performed using the standard transfer matrix (of size
$2^{k_0} \times 2^{k_0}$) method \cite{baxter}. More precisely, consider
two successive blocks of $k_0$ binary variables denoted by
$(x_1,...,x_{k_0})$ and $(x_{k_0+1},..., x_{2k_0})$. The element
$(i,j)$ of the transfer matrix is equal to the value of the last
exponential term (on the r.h.s.\ of the trace) of eq. \ref{omega1},
restricted to the terms whose first site lies in the first block,
where the first block is in state $i$ (among $2^{k_0}$ possible
states) and the second block is in state $j$. The transfer matrix is a
non-negative matrix (as long as the $y_{k_1,...,k_m}$ are real
numbers), and its leading eigenvalue is positive and
non-degenerate \cite{baxter}. Since the trace runs over $L/k_0$
blocks, in the leading order one finds
\begin{eqnarray}
\Omega &=&\int
dy_k\exp\{-L\lbrack \sum y_{k_1,...,k_m}C_{k_1,...,k_m} \nonumber \\
&-& \frac{1}{k_0}\ln \lambda_{max}(\{ y_{k_1,...,k_m}\})
\rbrack \}
\label{omega-sp}
\end{eqnarray}

\noindent where $\lambda_{max}$ is the maximal eigenvalue of the
corresponding transfer matrix. For large $L$ and using the saddle
point method, the entropy, $H_2(\{C_{k_1,...,k_m} \})$, is given in the
leading order by
\begin{eqnarray}
H_2\left(\{C_{k_1...,k_m}\}\right) &=& {1 \over \ln 2} \lbrack
\frac{1}{k_0}\ln
\lambda_{max}\left (\{
y_{k_1,...,k_m}\}\right) \nonumber \\
&-&\sum_{k_1,...,k_m}^{k_0}y_{k_1,...,k_m}
C_{k_1,...,k_m} \rbrack
\label{entropy-ck}
\label{h2}
\end{eqnarray}

\noindent where $\{y_{k_1,...,k_m}\}$ are determined from the saddle
point equations of $\Omega$ \cite{KK}. Assuming a Binary Symmetric
Channel (BSC) and using Shannon's lower bound, the channel capacity of
sequences with a given set of autocorrelations bounded by a distance
$k_0$ is given by
\begin{equation}
C=\frac{1-H_{2}\left(f\right)}{H_{2}\left(\{C_{k_1,...,k_m}\}\right)-
H_{2}\left(p_{b}\right)}
\label{capacity}
\end{equation}

\noindent where $f$ is the channel bit error rate and $p_b$ is the
decoded bit error rate. The saddle point solutions derived from
eq. \ref{omega-sp} indicate that, in the leading order, the equilibrium
properties of the one-dimensional Ising spin system ($x_i=\pm1$) with
up to order $k_0$ multi-spin interactions \cite{ido-msi}
\begin{equation}
H=-\sum_i \sum_{\{k_1,...,k_m\}} \frac{y_{k_1,...,k_m}}{\beta }
x_{i}\prod_{j=1}^m x_{i+k_j}
\label{hamiltonian}
\end{equation}
\noindent obey the autocorrelation constraints,
eq. \ref{ck}. This property of the effective Hamiltonian,
eq. \ref{hamiltonian}, is used in simulations to generate an ensemble of
signals (source messages) with the desired set of
autocorrelations. {\it Note that in the following we choose $\beta=1$,
and hence we refer to $\{y_{k_1,...,k_m}\}$ as interactions.}
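The forward direction of these saddle point relations can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the authors' code. It builds the $2^{k_0}\times 2^{k_0}$ block transfer matrix for given interactions $\{y_{k_1,...,k_m}\}$ and evaluates $H_2$ of eq. \ref{h2}. It obtains the autocorrelations from the saddle point condition $C_{k_1,...,k_m}=\partial[(1/k_0)\ln\lambda_{max}]/\partial y_{k_1,...,k_m}$ by central finite differences. It also estimates eq. \ref{ck} directly from a $\pm1$ sequence. The following are our own choices: the use of NumPy, the dictionary of offset tuples representing the interactions, and the function names. The inverse problem, finding $\{y\}$ for target $\{C\}$, would additionally require a root finder and is not shown.

```python
import itertools
import numpy as np


def autocorrelation(x, offsets):
    """Empirical C_{k_1,...,k_m} of eq. ck with periodic (mod L) indices.

    x is a +/-1 array; offsets=() gives C_0, (1,) gives C_1, (1, 2) gives C_12.
    """
    x = np.asarray(x)
    prod = x.astype(float)
    for k in offsets:
        prod = prod * np.roll(x, -k)  # np.roll(x, -k)[i] == x[(i + k) % L]
    return prod.mean()


def transfer_matrix(y, k0):
    """2^k0 x 2^k0 block transfer matrix (hypothetical helper).

    y maps an offset tuple (k_1,...,k_m), 1 <= k_j <= k0, to y_{k_1,...,k_m}.
    Element (a, b) collects the terms whose first site i lies in block a.
    """
    states = list(itertools.product([1, -1], repeat=k0))
    T = np.empty((len(states), len(states)))
    for a, sa in enumerate(states):
        for b, sb in enumerate(states):
            spins = sa + sb  # 2*k0 consecutive spins: block a, then block b
            energy = 0.0
            for i in range(k0):
                for offsets, coupling in y.items():
                    term = spins[i]
                    for k in offsets:
                        term *= spins[i + k]
                    energy += coupling * term
            T[a, b] = np.exp(energy)
    return T


def free_energy(y, k0):
    """(1/k0) ln lambda_max, the first term of eq. h2 (up to 1/ln 2)."""
    lam = np.max(np.abs(np.linalg.eigvals(transfer_matrix(y, k0))))
    return np.log(lam) / k0


def entropy_bits(y, k0, eps=1e-5):
    """H_2 in bits and the autocorrelations implied by the saddle point,
    C = d[(1/k0) ln lambda_max]/dy, via central differences."""
    f0 = free_energy(y, k0)
    corr = {}
    for s in y:
        up, down = dict(y), dict(y)
        up[s] += eps
        down[s] -= eps
        corr[s] = (free_energy(up, k0) - free_energy(down, k0)) / (2 * eps)
    h = f0 - sum(y[s] * corr[s] for s in y)
    return h / np.log(2), corr
```

For $k_0=1$ the sketch reduces to the nearest-neighbor Ising chain, giving $C_1=\tanh y_1$ and an $H_2$ equal to the binary entropy of the flip probability $(1-C_1)/2$. This is a convenient sanity check before moving to $k_0=2,3$.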



%%%%%%%%%%%%%%%%%%



The transfer matrix method indicates that the relevant scale of the
correlated source message is $k_0$. Hence, our encoding/decoding
procedure is based on the MN code \cite{MacKay} over a finite field
$q=2^{k_0}$ \cite{LDPC-GF(q),Davey}, which is based on the
construction of two sparse matrices $A$ and $B$ of dimensions
$L_0/R\!\times\! L_0$ and $L_0/R\! \times\! L_0/R$, respectively, where
$R$ is the code rate and $L_0=L/k_0$.
%The non-zero elements of
%the matrices $A$ and $B$ are chosen following the construction of the
%Kanter-Saad code\cite{KS-LDPC}.
%Following the suggestion of reference
%\cite{LDPC-GF(q)} the non-zero elements of the matrices are chosen
%from a careful selected random distribution in order to maximize the
%entropy of each bit in the syndrome.

The matrix $B^{-1}A$ is then used for encoding the message
\begin{equation}
t = B^{-1}A x\  (\: \mathbf{mod}\:\ \ q)
\label{trans}
\end{equation}
The finite field message vector $t$ is mapped to a binary vector and
then transmitted. The received message, $r$, is corrupted by a BSC
with bit error rate $f$.
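The mapping between blocks of $k_0$ bits and symbols of GF($2^{k_0}$), and the effect of the BSC on them, can be sketched as follows. Addition in GF($2^{k_0}$) is bitwise XOR, so flipping bits of the binary image of $t$ is equivalent to adding a noise symbol $n$ to each transmitted symbol. This is the origin of the term $Bn$ in eq. \ref{decoding}. The most-significant-bit-first packing and the function names are illustrative assumptions; the paper does not specify the bit order.

```python
import numpy as np


def bits_to_symbols(bits, k0):
    """Pack 0/1 bits into GF(2^k0) symbols, k0 bits per symbol, MSB first."""
    bits = np.asarray(bits, dtype=np.int64).reshape(-1, k0)
    weights = 1 << np.arange(k0 - 1, -1, -1)  # e.g. [4, 2, 1] for k0 = 3
    return bits @ weights


def symbols_to_bits(symbols, k0):
    """Inverse of bits_to_symbols: unpack each symbol into k0 bits."""
    symbols = np.asarray(symbols, dtype=np.int64)
    shifts = np.arange(k0 - 1, -1, -1)
    return ((symbols[:, None] >> shifts) & 1).reshape(-1)


def bsc(bits, f, rng):
    """Binary symmetric channel: flip each bit independently with probability f.

    Returns the received bits and the flip pattern (the bit-level noise).
    """
    flips = (rng.random(len(bits)) < f).astype(np.int64)
    return np.asarray(bits) ^ flips, flips
```

Packing the received bits and the flip pattern back into symbols gives $r = t \oplus n$ symbol by symbol, i.e. $r=t+n$ in GF($2^{k_0}$).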



The decoding of symbols of $k_0$ successive bits (referred to in the
following as a {\it block} of bits or binary variables) is based on the
solution of the syndrome
\begin{equation}
Z= Br = Ax + Bn\   (\: \mathbf{mod}\:\ \ q)
\label{decoding}
\end{equation}

\noindent where $n$ stands for the corresponding noise of $k_0$
successive bits. The solution of the $L_0/R$ equations with
$L_0(1/R+1)$ variables is based on the standard message passing
introduced for the MN decoder over Galois fields with
$q=2^{k_0}$ \cite{LDPC-GF(q),Davey}, with the following modification. The
horizontal pass is left unchanged, {\it but a dynamical set of
probabilities assigned to each block is used in the vertical
pass}. The Dynamical Block Probabilities (DBP), $\{P_n^c\}$, are
determined by the current belief regarding the neighboring
blocks and are given by
\begin{eqnarray}
\gamma_{n}^{c} & = & S_{I}\left(c\right)\left(\sum
_{l=1}^{q}q_{L}^{l}S_{L}\left(l,c\right)\right)\left(\sum
_{r=1}^{q}q_{R}^{r}S_{R}\left(c,r\right)\right)\nonumber \\ P_{n}^{c}
& = & \frac{\gamma _{n}^{c}}{\sum _{j=1}^{q}\gamma
_{n}^{j}}\label{tm-vertical-pass}
\label{dbp}
\end{eqnarray}

\noindent where $l/r/c$ denotes the state of the left/right/center
($n\!-\!1\,/\,n\!+\!1\,/\,n$) block, respectively, and
$q_{L}^{l}/q_{R}^{r}$ are their posterior probabilities.
$S_I(c)=e^{-\beta H_I}$ stands for the Gibbs factor of the inner
energy of a block of $k_0$ successive spins,
characterized by an energy $H_I$ at a state $c$, see
eq. \ref{hamiltonian}.
%$S_I(c)=e^{-\beta H_I}$,
%where $H_I$ is the inner energy of a block of $k_0$ spins at a state
%$c$, see eq. 6.
Similarly, $S_L(l,c)$ ($S_R(c,r)$) stands for the Gibbs factor of
consecutive Left/Center (Center/Right) blocks at states $l,c$
($c,r$) \cite{KK,KR}. The complexity of the calculation of the block prior
probabilities is $O(Lq^2/ \log q)$, where $L/\log q$ is the number of
blocks. The decoder complexity per iteration of the MN codes over a
finite field $q$ can be reduced to order
$O(Lqu)$ \cite{David-Mackay2,David-Mackay1}, where $u$
stands for the average number of checks per block. Hence the total
complexity of the DBP decoder is of the order of $O(Lqu+Lq^2/ \log
q)$.



Another way to represent the dynamical behavior of the SM joint S/C
decoder is in the framework of message passing on a graph. Typically,
the graph is bipartite and consists of variable nodes and check nodes.
In the horizontal pass the checks send messages to the variables, and
in the vertical pass the variables send messages to the checks. In the
SM joint S/C decoder there are {\it three} layers, as presented in
Fig. \ref{message-passing}. The first layer represents the checks and
the second layer represents the variables, where each variable and
check stands for a block of $k_0$ bits. The third layer, which consists
of the calculators of the dynamical block posterior probabilities
derived from the Transfer Matrix (TM) method, has a size equal to the
size of the source in blocks, $L_0=L/k_0$. Each element in the third
layer receives two arrows, representing the posterior probabilities of
the neighboring blocks, and sends one output arrow to the center block,
representing the current updated dynamical posterior probabilities,
which are then used in the vertical pass.
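Below is a minimal NumPy sketch of the third layer. It updates eq. \ref{dbp} for all $L_0$ blocks at once, with periodic (modulo $L_0$) neighbors. The Gibbs factors $S_I$, $S_L$ and $S_R$ are taken as precomputed inputs. How the interactions crossing block boundaries are split between them follows \cite{KK,KR} and is not reproduced here. The function name and array layout are hypothetical.

```python
import numpy as np


def dbp_update(post, S_I, S_L, S_R):
    """One sweep of the TM layer, eq. dbp, for all L0 blocks simultaneously.

    post : (L0, q) current posterior probabilities q^c of every block
    S_I  : (q,)    Gibbs factor of the inner energy of a block in state c
    S_L  : (q, q)  S_L[l, c], Gibbs factor of consecutive left/center blocks
    S_R  : (q, q)  S_R[c, r], Gibbs factor of consecutive center/right blocks
    Returns the (L0, q) dynamical block probabilities P_n^c.
    """
    left = np.roll(post, 1, axis=0)    # q_L: posterior of block n-1 (mod L0)
    right = np.roll(post, -1, axis=0)  # q_R: posterior of block n+1 (mod L0)
    # gamma[n, c] = S_I(c) * sum_l q_L^l S_L(l, c) * sum_r q_R^r S_R(c, r)
    gamma = S_I[None, :] * (left @ S_L) * (right @ S_R.T)
    return gamma / gamma.sum(axis=1, keepdims=True)
```

Consider $q=2$ with a nearest-neighbor coupling $y_1$, $S_L(l,c)=S_R(l,c)=e^{y_1 x_l x_c}$, and both neighbors known to be $+1$. In that case the update gives $P(+1)=(1+\tanh 2y_1)/2$, as expected from the Gibbs weights.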



\begin{figure}
\centering
%\vspace{2.5cm}
%\includegraphics[width=2.5in]{2/layer_graph_BW.eps}
%\includegraphics[width=2.5in]{2/Layer_graph.eps}
\centerline{\epsfxsize=2.5in \epsffile{layer_graph_BW.eps}}
\caption{Message passing in the SM joint S/C decoder, represented
by a graph with the following three layers. The check blocks are
represented by full squares, the full/open circles denote source/noise
block variables and the open diamonds denote the calculators of the
dynamical block posterior probabilities for the source block
variables. Each of these calculators receives an input message
from its two neighbors (modulo $L_0$) and sends its output message to
its block.}
\label{message-passing}
\end{figure}



To simplify the discussion below, in almost all of the
simulation results we concentrate on rate $1/3$, and the construction
of the matrices $A$ and $B$ follows reference \cite{KS}, as
sketched in Fig. \ref{ks}. The advantage of this construction is
that the matrices $A$ and $B$ are very sparse, and yet the threshold of
the code for large blocks is only $1-3\%$ below the channel
capacity \cite{KS,KS-Gaussian}. Furthermore, since $B$ has a systematic
structure, the complexity of the encoder scales linearly with $L$
although $B^{-1}$ is dense \cite{saad,saad1}. Of course, codes with
higher thresholds exist (for instance in references
\cite{forney,David-Mackay2}); hence the performance of the joint S/C
algorithm reported below should be interpreted as a lower
bound. (Results for a limited example with rate greater than one,
$R>1$, are briefly discussed in reference \cite{r89}.)


We conclude this section with the comment that the extension of the SM
joint S/C algorithm from the framework of the MN-Gallager decoder to the
Gallager decoder \cite{Gallager} is an open question. In the Gallager
decoder we first solve $L_0(1/R-1)$ equations for the noise variables,
and only in the final step is the message recovered. Since the noise
is not spatially correlated, we do not see a simple way to incorporate
the side information about the spatial correlations among the message
variables in the Gallager case. Hence, the equivalence of these two
similar decoders (MN-Gallager and Gallager) in the presence of side
information remains an open question.



\begin{figure}
\centering
%\vspace{2.5cm}
%\includegraphics[width=2.5in]{2/KSconstructionBW.eps}
%\includegraphics[width=2.5in]{2/ks.eps}
\centerline{\epsfxsize=3.75in \epsffile{KSconstructionBW.eps}}
\caption{The structure of the matrices $A$ and $B$ for the MN decoder
taken from reference \cite{KS}, for rate $1/3$. The black dots (areas)
denote the non-zero elements of the matrices $A,~B,~B^{-1}$.}
\label{ks}
\end{figure}


For illustration, in Fig. 3 we present results for rate $R=1/3$,
$L=10,000$, $q=4$ and $8$, where the decoding is based on the dynamical
block posterior probabilities, eq. \ref{dbp}, with the following
parameters. For $q=4$ (open circles) $C_1=0.55,~C_2=0.5,~C_{12}=0.4$
($y_1=0.275,~y_2=0.291,~y_{12}=0.422$) and $H_2=0.683$. Shannon's
lower bound, eq. \ref{capacity}, is denoted by the double dotted line,
where for $p_b=0$ the corresponding channel noise level is $0.227$. For $q=8$
(open diamonds) $C_1=0.77,~C_2=0.69,~C_3=0.56,~C_{123}=0.7$
($y_1=0.349,~y_2=0.36,~y_3=0.211,~y_{123}=0.443$) and $H_2=0.453$.
Shannon's lower bound is denoted by the dashed line, where for
$p_b=0$ the corresponding channel noise level is $0.275$. Each point was
averaged over at least $1,000$ messages. These results for both
$q=4$ and $8$ indicate that the threshold of the presented decoder
with $L=10,000$ is $\sim 15\%-20\%$ below the channel capacity
for infinite source messages.
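The Shannon noise levels quoted above follow from eq. \ref{capacity}. Setting $C=R$ and solving $H_2(f)=1-R\,[H_2(\{C_{k_1,...,k_m}\})-H_2(p_b)]$ for $f\le 1/2$ gives them. Below is a short Python sketch of that computation, using bisection; the root finder and the function names are our own choices.

```python
import math


def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def shannon_noise_level(source_entropy, rate, p_b=0.0, tol=1e-12):
    """Largest flip rate f <= 1/2 at which eq. capacity still allows `rate`,
    i.e. (1 - H2(f)) / (H - H2(p_b)) = rate, found by bisection on [0, 1/2]."""
    target = 1.0 - rate * (source_entropy - h2(p_b))  # required value of H2(f)
    if target <= 0.0:
        return 0.0
    lo, hi = 0.0, 0.5  # H2 is increasing on [0, 1/2]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h2(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With $R=1/3$ and $p_b=0$, the entropies $0.683$ and $0.453$ give approximately $0.227$ and $0.275$. These agree with the values quoted for Fig. 3.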



\begin{figure}
\centering
%\vspace{4.5cm}
%\includegraphics[width=2.5in]{2/msi_pb_f.eps}
\centerline{\epsfxsize=2.5in \epsffile{msi_pb_f1.eps}}
\caption{Simulation results for rate $R=1/3$, $L=10,000$, $q=4$ and
$8$ with the following parameters. For $q=4$ (open circles)
$C_1=0.55,~C_2=0.5,~C_{12}=0.4$ ($y_1=0.275,~y_2=0.291,~y_{12}=0.422$)
and $H_2=0.683$. Shannon's lower bound, eq. \ref{capacity}, is
denoted by the double dotted line. For $q=8$ (open diamonds)
$C_1=0.77,~C_2=0.69,~C_3=0.56,~C_{123}=0.7$
($y_1=0.349,~y_2=0.36,~y_3=0.211,~y_{123}=0.443$) and $H_2=0.453$.
Shannon's lower bound is denoted by the dashed line. Each point
was averaged over at least $1,000$ source messages with the desired set
of autocorrelations.}
\end{figure}




\section{The threshold of the code}


A natural question is how the efficiency of the decoder,
eq. \ref{dbp}, depends on the maximal correlation length taken, $k_0$,
on the strength of the correlations and on the size of the finite
field, $q$, and how it compares with the separation schemes. A
direct way to answer these questions is to perform exhaustive
simulations with increasing source length, various finite fields $q$
and sets of autocorrelations, resulting in the bit error
probability versus the flip rate $f$. Besides the enormous
computational time required, the conclusions would be debatable,
since it is unclear how to compare, for instance, the performance as a
function of $q$: with the same number of transmitted blocks or with
the same number of transmitted bits.


In order to overcome these difficulties, for a given MN-Gallager code
with DBP decoding over GF(q) and a given set of autocorrelations, the
threshold $f_c$ for $L \rightarrow \infty$ is estimated from the
scaling behavior of the convergence time, which was previously
observed for $q=2$ \cite{KS,KS-Gaussian}. The median number of
message passing steps, $t_{med}$, necessary for the convergence of the
MN-DBP algorithm is assumed to diverge as the level of noise
approaches $f_c$ from below. More precisely, we found that the scaling
for the divergence of $t_{med}$ is {\it independent of $q$} and is
consistent with
\begin{equation}
t_{med} = {A \over f_c-f}
\label{scaling}
\end{equation}
\noindent where, for a given set of autocorrelations and $q$, $A$ is a
constant. Moreover, for a given set of autocorrelations and a finite
field $q$, the extrapolated threshold $f_c$ is independent of $L$, as
demonstrated in Fig. 4. This observation is essential for determining
the threshold of a code from the above scaling behavior. Note that
the estimation of $t_{med}$ is a simple computational task in
comparison with the estimation of low bit error probabilities for
large $L$, especially close to the threshold. We also note that the
analysis is based on $t_{med}$ instead of the average number of
message passing steps, $t_{av}$ \cite{KS}, since we wish to avoid the
dramatic effect of a small fraction of samples with slow
convergence or no convergence \cite{domany,median}.
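Eq. \ref{scaling} can be rewritten as $f=f_c-A/t_{med}$, so $f_c$ is the intercept of a straight line in the $(1/t_{med},f)$ plane. Below is a minimal sketch of this extrapolation with an ordinary least-squares fit (NumPy's \texttt{polyfit}). The unweighted fit and the function name are our assumptions; the paper only states that a linear regression is used.

```python
import numpy as np


def extrapolate_threshold(f, t_med):
    """Fit f = f_c - A / t_med, the linearized form of eq. scaling.

    f     : flip rates (below threshold) at which the decoder was run
    t_med : median number of message passing steps measured at each f
    Returns (f_c, A): the intercept at 1/t_med -> 0 and the scaling constant.
    """
    x = 1.0 / np.asarray(t_med, dtype=float)
    slope, intercept = np.polyfit(x, np.asarray(f, dtype=float), 1)
    return intercept, -slope
```

Only points close enough to the threshold, where eq. \ref{scaling} holds, should enter the fit.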


\begin{figure}
\centering
%\vspace{0.5cm}
%\includegraphics[width=2.5in]{gf4_different_N.eps}
\centerline{\epsfxsize=2.5in \epsffile{gf4_different_N.eps}}
\caption{The flip rate $f$ as a function of $1/t_{med}$ for GF(4) with
$C_1=C_2=0.8$ and $L=1,000,~5,000,~50,000$. The lines are the result of
a linear regression fit. The threshold, $f_c \sim 0.272$, extrapolated
from the scaling behavior, eq. \ref{scaling}, is independent of $L$.}
\end{figure}


All simulation results presented below are derived for rate $1/3$, and
the construction of the matrices $A$ and $B$ of the MN code is taken
from \cite{KS}. In all examined sets of autocorrelations, with $10^3 \le L
\le 5\!\times\!10^4$ and $4 \le q \le 64$, the scaling of the median
convergence time was indeed recovered. For illustration, in Fig. 5 we
present the scaling behavior of the number of message passing steps for
the two examined cases presented in Fig. 3. (Note that this decoder can
be extended to rate $R>1$; results for a limited example are presented
in reference \cite{r89}.)




\begin{figure}
\centering
\vspace{3.5cm}
%\includegraphics[width=2.5in]{2/msi_fit.eps}
\centerline{\epsfxsize=2.5in \epsffile{msi_fit1.eps}}
\caption{The flip rate $f$ as a function of $1/t_{med}$ for the two
examined cases of Fig. 3. The extrapolated thresholds for $q=4,~8$ are
$0.223,~0.265$, which are about $98\%$ and $96\%$ of Shannon's lower
bound, $0.2267,~0.275$, respectively.
}
\end{figure}



For a given set of autocorrelations, $\{C_{k_1,...,k_m}\}$ where $k_m
\le k_0$, the MN decoder, eq. \ref{dbp}, can be implemented with any
field $q \ge 2^{k_0}$. In order to minimize the complexity of the
decoder, one clearly has to work with the minimal allowed
field, $q=2^{k_0}$. However, when the goal is to optimize the
threshold of the code, the choice of the optimal field $q$ is an open
question. To answer this question we present in Fig. 6 results for
$k_0=2$ ($C_1=C_2=0.86$) and $q=4,~16,~64$. The threshold, $f_c$,
clearly increases as a function of $q$, as was previously
found for the case of i.i.d.\ sources \cite{LDPC-GF(q),KABA}. More
precisely, the estimated thresholds for $q=4,~16,~64$ are $\sim
0.293,~0.3,~0.309$, respectively, and the corresponding Ratios
($\equiv f_c/f_{Sh}$) are $0.913,~0.934,~0.962$, where Shannon's lower
bound is $f_{Sh}=0.321$. Note that the extrapolation of $f_c$ for large
$q$ appears asymptotically to be consistent with $f_c(q) \sim 0.316
-0.18/q$.




\begin{figure}
\centering
%\vspace{1.2cm}
%\includegraphics[width=2.5in]{2/Fc_for_diffrent_q.eps}
\centerline{\epsfxsize=2.5in \epsffile{Fc_for_diffrent_q.eps}}
\caption{The scaling behavior, $f$ as a function of $1/t_{med}$, for
$C_1=C_2=0.86$ and $q=4,~16,~64$. The lines are the result of a linear
regression fit. The estimated thresholds for $q=4,~16,~64$ are
$0.293,~0.3,~0.309$, and the corresponding ${\rm Ratio} \equiv f_c/f_{Sh}=
0.913,~0.934,~0.962$, where $f_{Sh}=0.321$.}
\end{figure}



\section{Comparison between joint and separation schemes}



Results of simulations for $q=4,~8,~16$ and $32$ and selected sets of
autocorrelations are summarized in Table I (Fig. \ref{t1}), where the
symbols are defined as follows: $\{C_k\}$ denotes the imposed values of
two-point autocorrelations as defined in eqs. \ref{ck} and
\ref{omega}; $\{y_k\}$ are the interaction strengths,
eq. \ref{hamiltonian}; $H$ represents the entropy of sequences with
the given set of autocorrelations, eq. \ref{entropy-ck}; $f_c$ is the
estimated threshold of the MN decoder with the DBP, derived from the
scaling behavior of $t_{med}$, eq. \ref{scaling}; $f_{Sh}$ is
Shannon's lower bound, eq. \ref{capacity}; Ratio is the efficiency of
our code, $f_c/f_{Sh}$; $Z_R$ indicates the gzip compression rate
averaged over files of sizes $10^5-10^6$ bits with the desired set
of autocorrelations. We assume that the compression rate with $L=10^6$
achieves its asymptotic value, as was indeed confirmed by the
compression of files with different $L$; $1/R^{\star}$ indicates the
ideal (minimal) ratio between the lengths of the transmitted message
and the source signal after implementing the following two steps:
compression of the file using gzip and then using an {\it ideal optimal
encoder/decoder} for a given BSC with $f_c$. A number greater than
(less than) $3$ in this column indicates that the MN joint S/C decoder
is more efficient (less efficient) than the separation method using
the standard gzip compression. The last four columns of Table I
(Fig. \ref{t1}) are devoted to the comparison of the presented joint
S/C decoder with advanced compression methods. $PPM_R$ and $AC_R$
represent the compression rate of files of size $10^5-10^6$ bits
with the desired autocorrelations using Prediction by Partial
Matching \cite{PPM} and the Arithmetic Coder \cite{AC},
respectively. Similarly to the gzip case, $1/R_{PPM}$ and $1/R_{AC}$
denote the optimal (minimal) ratio required for the separation process
(first a compression and then an ideal optimal encoder/decoder)
assuming a BSC with $f_c$.
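One reading of this definition combines the measured compression rate with the capacity $1-H_2(f_c)$ of a BSC, giving $1/R^{\star}=Z_R/[1-H_2(f_c)]$. This value is then compared with $1/R=3$ for the joint scheme. Below is a sketch of this bookkeeping. The formula is our interpretation of the definition and is not quoted from the text, and the function names are hypothetical.

```python
import math


def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def separation_inverse_rate(compression_rate, f_c):
    """Transmitted bits per source bit for compression followed by an ideal
    channel code on a BSC with flip rate f_c (capacity 1 - H2(f_c))."""
    return compression_rate / (1.0 - h2(f_c))


def joint_scheme_wins(compression_rate, f_c, joint_rate=1.0 / 3.0):
    """True if the separation scheme needs more channel bits per source bit
    than the joint S/C code operating at joint_rate."""
    return separation_inverse_rate(compression_rate, f_c) > 1.0 / joint_rate
```

The same function applies to $1/R_{PPM}$ and $1/R_{AC}$ by substituting $PPM_R$ or $AC_R$ for $Z_R$.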



\begin{figure*}
\centering
%\includegraphics[width=5.75in]{2/gf4_table.ppm1.eps}
\centerline{\epsfxsize=5.75in \epsffile{gf4_table.ppm1.eps}}
\caption{Results for $q=4,~8,~16,~32$ and selected sets of
two-point autocorrelations $\{C_k\}$.}
\label{t1}
\end{figure*}




Table I indicates the following main results: (a) For $q=4$ (the upper
part of Table I) a degradation in the performance is observed as the
correlations are enhanced and, as a result, the entropy decreases. The
degradation appears to be significant when the entropy is below $\sim
0.3$ (or, for the test case $R=1/3$, when $f_c \ge 0.3$) \cite{bias}. A
similar degradation was also observed for larger values of $q$ as the
entropy decreases. (b) The efficiency of our joint S/C coding
technique is superior to that of the S/C separation technique using
the standard gzip compression. For high entropy the gain of the MN
decoder is about $5-10\%$. This gain disappears as the entropy and
the performance of the presented decoder, eq. \ref{dbp},
decrease. (c) In comparison to the standard gzip, the compression
rate is improved by $2-5\%$ using the AC method. A further improvement
of a few percent is achieved by the PPM compression. This latter
improvement appears to be significant in the case of low entropy. (d)
With respect to performance, the presented joint S/C decoder,
eq. \ref{dbp}, appears to be comparable with the presented separation
methods, but for low entropy the PPM compression appears to be
superior. However, one should bear in mind that a better threshold for
the MN code can be found by optimizing the code \cite{forney}. (e) With
respect to the computational time of the S/C coding, our limited
experience indicates that the joint S/C decoder is faster than the AC
separation method, and that the PPM separation method is substantially
slower. Finally, we note that using the side information, the set of
autocorrelations, one can design a special compression procedure which
may overcome the disadvantages of the abovementioned compression
methods \cite{manfred}.

623:

624:

625: \section{ The role of the spectrum of eigenvalues}

626:

For a given $q$, there are many sets of autocorrelations,
$\{C_{k_1,...,k_m} \}$, in $q$ dimensions with the same entropy
(see the discussion in section VI below).  An interesting question is
whether the performance of the presented MN decoder, measured by the
Ratio $(\equiv f_c/f_{Sh})$, is a function of the entropy only. Our
numerical simulations indicate that the entropy is not the only
parameter which controls the performance of the algorithm: for the
same entropy and $q$, the Ratio can fluctuate widely among different
sets of correlations.  For illustration, Table II (Fig. \ref{t2})
summarizes results for two sets of autocorrelations with {\it the same entropy}
for each of $q=4,~8,~16$ and $32$. When the
Ratio $(\equiv f_c/f_{Sh})$ is strongly degraded, the gzip-based separation
performs better (see the second examples with $q=8$ and $32$ in Table II
(Fig. \ref{t2}), where the Ratio is $0.8$ and $0.72$, respectively).
The crucial question is to find a criterion that classifies the
performance of the algorithm among all sets of autocorrelations
with the same entropy.  Our generic criterion is {\it the decay of
the correlation function over distances beyond two successive blocks}.
Before examining this criterion, however, we return to some
aspects of statistical physics.

647:

648:

649:

650:

The entropy of sequences with a given set of autocorrelations bounded by
a distance $k_0=\log_2(q)$ is determined via the effective Hamiltonian
consisting of $q$ interactions, eq. \ref{hamiltonian}.  As a result,
the entropy of these sequences is {\it the same} as the entropy of the
effective Hamiltonian, $H\{y_{k_1,...,k_m} \}$, at the inverse
temperature $\beta=1$, eq. \ref{h2}.  As usual in the
transfer matrix method, to leading order quantities such as the
free energy and the entropy depend only on the {\it largest
eigenvalue} of the transfer matrix. On the other hand, the decay
of the correlation function depends on the whole spectrum of the
$q=2^{k_0}$ eigenvalues (and eigenvectors) \cite{baxter}.
Asymptotically, the decay of the correlation function is determined
by the ratio $\lambda_2/\lambda_{max}$ between the second largest
eigenvalue, $\lambda_2$, and the largest eigenvalue, $\lambda_{max}$.
From the statistical mechanical point of view, one may wonder why the first $q$
correlations can be determined using only the information in
$\lambda_{max}$. The answer is that once the
transfer matrix is defined as a function of $\{y_{k_1,...,k_m} \}$,
eqs. 3--7, {\it all its eigenvalues} are determined together with
$\lambda_{max}$; there is no way to determine $\lambda_{max}$
independently of all the other eigenvalues.
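As a minimal illustration, chosen here for simplicity rather than taken from the simulations, consider $q=2$ with a single nearest-neighbor coupling $y$. The free energy involves only $\lambda_{max}=2\cosh y$. In contrast, the two-point correlation decays as $C_k=(\lambda_2/\lambda_{max})^k=\tanh^k y$, i.e., with correlation length $\xi=-1/\ln(\lambda_2/\lambda_{max})$. The Python sketch below (hypothetical function names) checks this by computing $C_k$ from powers of the row-normalized transition matrix of eq. \ref{t-ij}, which is exact for this symmetric case. For $q>2$ the remaining eigenvalues also contribute at short distances.

```python
import math

def ising_transfer_matrix(y):
    """2x2 transfer matrix of a chain with one nearest-neighbour coupling y (beta = 1)."""
    return [[math.exp(y), math.exp(-y)], [math.exp(-y), math.exp(y)]]

def eigenvalue_ratio(T):
    """lambda_2 / lambda_max for a symmetric [[a, b], [b, a]] with a > b > 0;
    its eigenvalues are a + b and a - b."""
    a, b = T[0][0], T[0][1]
    return (a - b) / (a + b)

def row_normalize(T):
    """Transition matrix P_ij = T_ij / sum_k T_ik."""
    return [[t / sum(row) for t in row] for row in T]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def correlation(P, k, spins=(1, -1), pi=(0.5, 0.5)):
    """C_k = <S_i S_{i+k}> of the stationary chain, computed from P^k.
    pi = (1/2, 1/2) is stationary because this P is symmetric (doubly stochastic)."""
    Pk = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        Pk = mat_mul(Pk, P)
    return sum(pi[a] * spins[a] * Pk[a][b] * spins[b] for a in range(2) for b in range(2))

y = 0.8
T = ising_transfer_matrix(y)
ratio = eigenvalue_ratio(T)            # equals tanh(y)
xi = -1.0 / math.log(ratio)            # correlation length in units of sites
for k in range(1, 6):
    print(k, correlation(row_normalize(T), k), ratio ** k)
```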

672:

In Table II (Fig. \ref{t2}) results of the MN decoder, eq. \ref{dbp}, for
$q=4,~8,~16,~32$ are presented. For each $q$, two different sets of
autocorrelations characterized by the {\it same entropy} and threshold
$f_{Sh}$ are examined.  The practical method we used to generate
different sets of autocorrelations with the same entropy was a simple
Monte Carlo search over the space of $\{C_{k_1,...,k_m} \}$ \cite{haggai1}.
The additional column in Table II (in comparison with Table I) is the
ratio $\lambda_2/\lambda_{max}$, which characterizes the decay
of the correlation function over large distances.  For a given entropy,
as $\lambda_2/\lambda_{max}$ increases (decreases),
the performance of the joint S/C decoder, measured by the Ratio
$f_c/f_{Sh}$, is degraded (enhanced), independently of $q$.  The new
criterion to classify the performance of the decoder among all sets of
autocorrelations with the same entropy is therefore the decay of the
correlation function.  This criterion is consistent with the tendency,
observed in Table I, that as the first $k_0$ two-point autocorrelations are
increased (decreased) the performance is degraded (enhanced).
The physical intuition is that as the
correlation length increases, the relaxation time to the equilibrium
macroscopic state increases, and flips on scales larger than
nearest-neighbor blocks are required. Finally, we note that in the general
scenario the two largest eigenvalues are not sufficient to
approximate the correlation function on short length scales, and a
comparison of the efficiency of the decoder should take into account
the entire spectrum of eigenvalues and eigenvectors \cite{baxter}.

698:

699:

700:

701: \begin{figure*}

702: \centering

703: %\includegraphics[width=5.75in]{2/Lamda_Rate_table.ps}

704: \centerline{\epsfxsize=5.75in \epsffile{Lamda_Rate_table.ps}}

\caption{Results for $q=4,~8,~16,~32$ and different sets of two-point
autocorrelations.  For each $q$, two different sets of two-point
autocorrelations characterized by the same entropy and threshold
$f_{Sh}$ are examined.  As $\lambda_2/\lambda_{max}$
increases (decreases), the performance of the joint S/C decoder, measured
by the Ratio $f_c/f_{Sh}$, is degraded (enhanced).}

711: \label{t2}

712: \end{figure*}

713:

Note that the decay of the correlation function in the intermediate
region of a small number of blocks depends on all the $2^{k_0}$
eigenvalues.  Hence, in order to enhance the effect of the fast decay
of the correlation function in the case of small
$\lambda_2/\lambda_{max}$, we also tried to enforce in our Monte Carlo
search that all the remaining $2^{k_0}-2$ eigenvalues be smaller than
$A\lambda_{max}$, with the minimal possible constant $A$.  This
additional constraint was easily fulfilled for $q=4$ with $A=0.1$, but
for $q=32$ the minimal $A$ was around $0.5$.

723:

724:

725: \section{ Possible sets of autocorrelations and the Simplex algorithm}

726:

The entropy of correlated sequences can be calculated from
eq. \ref{h2}.  For the simplest case of sequences obeying only
$C_1$ and $C_2$, the numerical solution of the saddle point equations
indicates that the entropy is non-zero only in the regime

731: \begin{equation}

732: -(1+C_2)/2 \le C_1 \le (1+C_2)/2

733: \label{c1c2}

734: \end{equation}

\noindent Outside this regime the entropy is zero. The
boundaries, $|C_1|=(1+C_2)/2$, are characterized by the following
phenomena: (a) the entropy falls abruptly to zero at the boundary, and
(b) $y_1$ and $-y_2$ diverge at the boundary (the one-dimensional
Hamiltonian, eq. \ref{hamiltonian}, consists of frustrated loops).

740:

741: These limited results obtained from the numerical solutions of the

742: saddle point equations suffer from the following limitations: (a)

743: finding the boundaries of the regime in the spaces of

744: $\{C_{k_1,...,k_m}\}$ with a finite entropy is very sensitive to the

745: numerical precision since on the boundary $\{|y_i|\}$ diverge; (b) it is

746: unclear whether the available space consists of a connected regime; (c)

747: the question of whether out of the space with a finite entropy, there are

748: a finite or infinite number of sequences (for instance $e^{\sqrt{L}}$)

749: obeying the set of autocorrelations cannot be answered using the

750: saddle point method; (d) extension of the saddle point solutions

751: to identify the boundaries of the finite entropy regime to many dimensions

752: is a very heavy numerical task.

753:

754:

755: To overcome these difficulties and to answer the above questions, we

756: show below how the possible sets of autocorrelations can be identified

757: using the Simplex algorithm.

758:

759:

Consider, for instance, the case of only two constraints, $C_1$ and $C_2$,
and concentrate on three successive binary variables
$S_i,S_{i+1},S_{i+2}$, where $S_i=\pm 1$.  Since in this case the Hamiltonian,
eq. \ref{hamiltonian}, obeys an inversion symmetry, $S_i \to -S_i$, we
examine only the $4$ configurations out of $8$ with $S_{i+1}=-1$,
$(\pm,-,\pm)$; by this symmetry their probabilities sum to $1/2$. To these
$4$ configurations one can assign the marginal probabilities
$P(\pm,-,\pm)$, where each
probability stands for the fraction of sequences obeying $C_1$ and
$C_2$ with a given state of these three successive binary
variables. In the SM language, we measure the probabilities of these
four states in thermal equilibrium of the micro-canonical ensemble
obeying eq. \ref{ck}. Since the Hamiltonian,
eq. \ref{hamiltonian}, is translationally invariant,
$P(S_i,S_{i+1},S_{i+2})$ is independent of $i$ after averaging over
all sequences obeying the constraints of eq. \ref{ck}.

775:

776:

For these $4$ marginal probabilities one can write the following
$8$ equalities and inequalities, eq. \ref{simplex12}.

779:

\noindent For a given $C_1$, these $4$ equalities and $4$ inequalities can
be solved for the minimum and the maximum available $C_2$ using the
Simplex method. Running over values of $-1 \le C_1 \le 1$, we indeed
recover the result of eq. \ref{c1c2}. However, the {\it Simplex
solution shows that beyond the regime with finite entropy there are no
sequences at all, not even a finite number of them}. A simple geometrical
calculation based on eq. \ref{c1c2} then shows that the fraction of the square
$-1 \le C_1,C_2 \le 1$ with available sequences is $1/2$: the allowed
region is the triangle with vertices $(C_1,C_2)=(0,-1)$ and $(\pm 1,1)$,
whose area is $2$ out of a total area of $4$.
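For this two-constraint case the four equalities of eq. \ref{simplex12} can also be solved by hand, as a consistency check of the Simplex result. Write the fourth equality as the next-nearest-neighbor correlation restricted to $S_{i+1}=-1$, i.e., $P(+,-,+)+P(-,-,-)-P(+,-,-)-P(-,-,+)=C_2/2$. The system then has the unique solution
\begin{eqnarray}
P(-,-,+)&=&P(+,-,-)={1-C_2 \over 8}, \nonumber\\
P(-,-,-)&=&{1+2C_1+C_2 \over 8}, \nonumber\\
P(+,-,+)&=&{1-2C_1+C_2 \over 8}. \nonumber
\end{eqnarray}
\noindent Requiring these probabilities to be non-negative gives $C_2 \le 1$ and $-(1+C_2)/2 \le C_1 \le (1+C_2)/2$, which is exactly eq. \ref{c1c2}; the upper bound $P \le 1$ is then automatically satisfied. With three or more constraints the equalities no longer fix the probabilities uniquely, and a genuine linear program is needed.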

788:

789:

790: For the case of three constraints, $C_1, C_2$ and $C_3$, one can

791: similarly write the following $15$ equalities and inequalities for the

792: $8$ probabilities of $4$ successive binary variables

793: $P(\pm,\pm,-,\pm)$, see eq. \ref{simplex123}.

794:

795:

\noindent For given $C_1$ and $C_2$, these $15$ equalities and
inequalities can be solved for the minimum and the maximum available
$C_3$ using the Simplex method. The Simplex solution indicates that: (a)
the available region in the three-dimensional box $[-1,1]^3$
of $(C_1,C_2,C_3)$ is a connected region bounded by a few planes whose
detailed equations will be given elsewhere \cite{shahar}; (b) the
fraction of the volume of the box with a positive number of sequences
obeying the three constraints is $\sim 0.222$.  Preliminary results
indicate that for $4$ ($C_i,~i=1,2,3,4$) and $5$ ($C_i,~i=1,2,3,4,5$)
constraints the available volume fractions are $\sim 0.085$ and
$\sim 0.034$, respectively.

806:

The fraction of possible sets of autocorrelations thus appears to decrease as
the number of constraints increases. Whether this
fraction drops exponentially with the number of constraints, as well as
the detailed shape of the available region, is the
subject of our current research \cite{shahar}.

812:

We conclude the discussion in this section with the
following general result \cite{manfred}: the available volume for the
general case of $q$ constraints $\{C_{k_1,...,k_m}\}$, $k_m<\log_2(q)$,
is convex. The main idea is that the set of
equalities can be written in matrix form as

819: \begin{equation}

820: {\bf M} P = C

821: \label{manfred}

822: \end{equation}

\noindent where ${\bf M}$ is a matrix with elements $\pm 1$, $P$ is the
vector of marginal probabilities $P(\pm,\pm,\ldots)$ and $C$ is the
vector of desired correlations and normalization constants
(for instance, $C_1/2$, $C_2/2$ and $1/2$ for the case of
eq. \ref{simplex12}). The inequalities confine the probabilities to
the range $\lbrack 0,1\rbrack$. Clearly, if
$P_1(\pm,\pm,\ldots)$ and $P_2(\pm,\pm,\ldots)$ are two sets of
probabilities obeying eq. \ref{manfred} with right-hand sides
$C^{(1)}$ and $C^{(2)}$, respectively, then
\begin{equation}
\lambda P_1+(1-\lambda)P_2
\label{convex}
\end{equation}
\noindent ($0 \le \lambda \le 1$) also lies in $\lbrack 0,1\rbrack$ and,
since ${\bf M}$ is linear, solves eq. \ref{manfred} with
$C=\lambda C^{(1)}+(1-\lambda)C^{(2)}$.  Hence, the available volume is convex.
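In the matrix notation of eq. \ref{manfred}, the two-constraint case can be checked numerically. The sketch below writes ${\bf M}$ for eq. \ref{simplex12}, taking the fourth row as $\langle S_iS_{i+2}\rangle$, i.e., $P(+,-,+)+P(-,-,-)-P(+,-,-)-P(-,-,+)$; all helper names are hypothetical. Here there are as many equalities as unknowns, so feasibility reduces to solving ${\bf M}P=C$ and checking $0\le P\le 1$; no Simplex iterations are required. The sketch then estimates the available fraction of the $(C_1,C_2)$ square on a grid and verifies convexity on one example. For underdetermined cases such as eq. \ref{simplex123}, a linear-programming routine (for instance SciPy's `scipy.optimize.linprog`) would be needed instead.

```python
# Unknowns P(s1,-,s3), ordered as (-,-,+), (-,-,-), (+,-,+), (+,-,-)
M = [
    [1, 1, -1, -1],    # <S_i S_{i+1}>      -> C_1/2
    [-1, 1, -1, 1],    # <S_{i+1} S_{i+2}>  -> C_1/2
    [1, 1, 1, 1],      # normalization      -> 1/2
    [-1, 1, 1, -1],    # <S_i S_{i+2}>      -> C_2/2
]

def solve(A, rhs):
    """Gauss-Jordan elimination with partial pivoting for a small square system."""
    n = len(A)
    aug = [[float(x) for x in row] + [float(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def marginals(c1, c2):
    return solve(M, [c1 / 2, c1 / 2, 0.5, c2 / 2])

def feasible(c1, c2, tol=1e-12):
    return all(-tol <= p <= 1 + tol for p in marginals(c1, c2))

def available_fraction(n=100):
    """Fraction of [-1,1]^2 in (C_1, C_2) with feasible marginals (midpoint grid)."""
    grid = [-1 + (k + 0.5) * 2 / n for k in range(n)]
    return sum(feasible(c1, c2) for c1 in grid for c2 in grid) / n ** 2

# Convexity check: mix two feasible solutions with weight lam
lam = 0.3
p1, p2 = marginals(0.5, 0.2), marginals(-0.3, 0.8)
mix = [lam * a + (1 - lam) * b for a, b in zip(p1, p2)]
mixed_c = (lam * 0.5 + (1 - lam) * -0.3, lam * 0.2 + (1 - lam) * 0.8)
print(available_fraction(), feasible(*mixed_c))
```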

837:

838:

839: %\onecolumn

840:

841:

842: \begin{eqnarray}

843: &P&(-,-,+)+P(-,-,-)-P(+,-,+)-P(+,-,-)=C_1/2 \nonumber\\

844: &P&(+,-,-)+P(-,-,-)-P(+,-,+)-P(-,-,+)=C_1/2  \nonumber\\

845: &P&(-,-,+)+P(-,-,-)+P(+,-,+)+P(+,-,-)=1/2  \nonumber\\

&P&(+,-,+)+P(-,-,-)-P(+,-,-)-P(-,-,+)=C_2/2  \nonumber\\

847: &0& \le P(\pm,-,\pm) \le 1

848: \label{simplex12}

849: \end{eqnarray}

850:

851: \begin{eqnarray}

852: &P&(+,+,-,+)+P(+,+,-,-)+P(-,-,-,+)+P(-,-,-,-)  \nonumber \\

853: -&P&(+,-,-,+)-P(+,-,-,-)-P(-,+,-,+)- P(-,+,-,-)=C_1/2 \nonumber \\

854: &P&(+,-,-,+)+P(+,-,-,-)+P(-,-,-,+)+P(-,-,-,-) \nonumber \\

855: -&P&(+,+,-,+)-P(+,+,-,-)-P(-,+,-,+)-P(-,+,-,-)=C_1/2 \nonumber \\

856: &P&(+,+,-,-)+P(+,-,-,-)+P(-,+,-,-)+P(-,-,-,-) \nonumber \\

857: -&P&(+,+,-,+)-P(+,-,-,+)-P(-,+,-,+)-P(-,-,-,+)=C_1/2 \nonumber \\

858: &P&(-,+,-,+)+P(-,+,-,-)+P(-,-,-,+)+P(-,-,-,-) \nonumber \\

859: -&P&(+,+,-,+)-P(+,+,-,-)-P(+,-,-,+)-P(+,-,-,-)=C_2/2 \nonumber \\

860: &P&(+,+,-,+)+P(+,-,-,-)+P(-,+,-,+)+P(-,-,-,-)  \nonumber \\

861: -&P&(+,+,-,-)-P(+,-,-,+)-P(-,+,-,-)-P(-,-,-,+)=C_2/2 \nonumber \\

862: &P&(+,+,-,+)+P(+,-,-,+)+P(-,+,-,-)+P(-,-,-,-) \nonumber \\

863: -&P&(+,+,-,-)-P(+,-,-,-)-P(-,+,-,+)-P(-,-,-,+)=C_3/2 \nonumber \\

864: &P&(+,+,-,+)+P(+,-,-,-)+P(-,+,-,+)+P(-,-,-,-) \nonumber \\

865: +&P&(+,+,-,-)+P(+,-,-,+)+P(-,+,-,-)+P(-,-,-,+)=1/2 \nonumber \\

866: &0& \le P(\pm,\pm,-,\pm) \le 1

867: \label{simplex123}

868: \end{eqnarray}

869: %\twocolumn

870:

871:

872: \section{ Drawbacks of the SM approach}

873:

874: The presented joint S/C decoder based on the SM approach

875: suffers from the following drawbacks:

876: %(a) As the strength of the interactions is enhanced,

877: %$|y_{k_1,...,k_m}| \gg 0$, a dramatic degradation in the performance

878: %of the DBP algorithm is observed. In Fig. 2, results of simulations

879: %for rate, $R=1/3$, $L=10,000$ and $C_1=0.64,~C_2=0.33$ ($y_1 \sim 1.4,

880: %y_2 \sim -0.44$) are presented. The performance of the DBP algorithm

881: %is degradated even in a comparison to the same algorithm with unbiased

882: %({\it i.i.d}) sources.

(a) For each transmitted block one must calculate a $q \times q$
matrix, where each element of this matrix is a function of all $q$
autocorrelations, $\{ C_{k_1,...,k_m}\}$.  Hence, the naive complexity
of the construction of the transfer matrix is $O(q^4)$.  Furthermore,
for each transmitted block the complexity of the calculation of the
leading eigenvalue of the transfer matrix is $O(q^3)$.  (b) The
required memory is of the order $O(q^2)$; for instance, for
$k_0=20$ the matrix has $q^2=2^{40} \approx 10^{12}$ elements.  (c) The
solution of the saddle point equations, eqs. 4--5, requires calculating
the leading eigenvalue of the $q \times q$ matrix many times and
with high precision. In our experience, solving the saddle point
equations for the $q=2^{k_0}$ variables $\{y_{k_1,...,k_m} \}$ with high
precision is a heavy numerical task for $k_0 \ge 4$.  (d)
The extension of the decoder based on the SM approach to
arrays of bits in two or more dimensions is in general impossible,
since the trace in eq. \ref{omega} can be performed only for very limited
two-dimensional cases \cite{baxter}.

900:

901: \section{ Joint S/C decoder with advanced threshold}

902:

In order to overcome some of the abovementioned difficulties, we
present in this section a decoder with an advanced threshold, which
benefits from the fluctuations among different finite source
messages.  For a given sequence of $L$ bits, $\{x_1,~x_2,~...,~x_L\}$,
and $k_m \le k_0$, there are $L_0=L/k_0$ blocks, denoted by
$\{A_1,~A_2,...,~A_{L_0}\}$.  For a given finite field $q=2^{k_0}$, we
denote the $q$ possible different blocks by
$B_m$, $m=1,~2,...,~q$. In the first step of the algorithm, the
probability of occurrence of every triplet of successive blocks is
calculated

913: \begin{equation}

914: {\hat P}(B_i,B_j,B_k) \equiv {1 \over L_0} \sum_{m=1}^{L_0} \delta_{A_m,B_i}

915: \delta_{A_{m+1},B_j} \delta_{A_{m+2},B_k}

916: \label{qqq}

917: \end{equation}

\noindent where we assume periodic boundary conditions. Note that
although the number of possible triplets of blocks is $2^{3k_0}=q^3$, the
complexity of this step for a given source message scales linearly
with $L$ \cite{comment}.

922:

The number of non-zero probabilities of occurrence of triplets is
bounded from above by $L_0$.  In a typical scenario with
enhanced autocorrelations, however, the number of non-zero ${\hat
P}(B_i,B_j,B_k)$ is expected to be $\ll L_0$. Hence, in the regime
where $q^3 \gg L_0$, most of the ${\hat P}(B_i,B_j,B_k)$ are equal to
zero, and the tensor ${\hat P}(B_i,B_j,B_k)$ can be stored efficiently
as a very sparse tensor.  The tensor is expected to be sparse
even for long sequences; for instance, for $L=10^5$ and $q=128$
($k_0=7$), $128^3 \approx 2.1 \times 10^6 \gg 10^5/7 \approx 1.4 \times 10^4$.
In the following we discuss the importance of this observation.
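The first step, eq. \ref{qqq}, together with the sparse storage just described, can be sketched in Python. The sketch assumes the source bits are given as a list of 0/1 integers and that each block is encoded as an integer in $[0,2^{k_0})$; the helper names are hypothetical. A dictionary keyed by triplets stores only the triplets that actually occur, so time and memory are $O(L_0)$ even though $q^3$ keys are possible.

```python
from collections import Counter

def blocks_from_bits(bits, k0):
    """Split L bits into L0 = L/k0 blocks, each encoded as an integer in [0, 2**k0)."""
    if len(bits) % k0:
        raise ValueError("L must be a multiple of k0")
    blocks = []
    for start in range(0, len(bits), k0):
        value = 0
        for b in bits[start:start + k0]:
            value = (value << 1) | b
        blocks.append(value)
    return blocks

def triplet_probabilities(blocks):
    """Sparse estimate of P(B_i, B_j, B_k) with periodic boundary conditions.
    Only triplets that occur are stored, so at most L0 entries are kept."""
    L0 = len(blocks)
    counts = Counter((blocks[m], blocks[(m + 1) % L0], blocks[(m + 2) % L0])
                     for m in range(L0))
    return {triplet: c / L0 for triplet, c in counts.items()}

# Usage: 8 bits, k0 = 2 (q = 4)
print(triplet_probabilities(blocks_from_bits([0, 1, 1, 0, 0, 1, 1, 0], 2)))
```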

933:

The decoding of symbols of $k_0$ successive bits is again based on the
standard message passing introduced for the MN decoder over Galois
fields with $q=2^{k_0}$ \cite{LDPC-GF(q)}, with the following
modification. The horizontal pass is left unchanged, {\it but a
dynamical set of probabilities is assigned to each block and used in the
vertical pass}. The Dynamical Block Probabilities (DBP), $\{P_n^c\}$,
are determined from the current beliefs regarding the neighboring
blocks as follows:
\begin{equation}
\gamma_{n}^{B_m} = \sum_{i,j=1}^q { {\hat P}(B_i,B_m,B_j) \over \sum_{b=1}^q
{\hat P}(B_i,b,B_j) } q_{n-1}^i q_{n+1}^j
\label{dbp3}
\end{equation}
\begin{equation}
{\hat P}_{n}^{B_m} = \frac{\gamma_{n}^{B_m}}{\sum _{j=1}^{q}\gamma
_{n}^{B_j}}
\label{dbp3a}
\end{equation}
\noindent where $q_{n-1}^i$ ($q_{n+1}^j$) stands for the posterior
probability that the left (right) neighboring block is in state $B_i$ ($B_j$).
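The update of eqs. \ref{dbp3}--\ref{dbp3a} can be sketched as below. The sketch assumes a sparse dictionary $(i,b,j)\mapsto{\hat P}(B_i,b,B_j)$ with blocks indexed by integers $0,\ldots,q-1$, and posterior vectors for the left and right neighboring blocks; the function name is hypothetical. Summing only over stored (non-zero) triplets gives the same $\gamma$ as the full double sum, because absent triplets contribute zero. Pairs $(i,j)$ that never occur are skipped. The uniform fallback when all $\gamma$ vanish is our own assumption.

```python
def dynamical_block_probabilities(triplets, left_post, right_post, q):
    """Sparse sketch of eqs. (dbp3)-(dbp3a).
    triplets: dict (i, b, j) -> P(B_i, b, B_j), non-zero entries only.
    left_post[i], right_post[j]: current posteriors of the neighbouring blocks."""
    # Denominator sum_b P(B_i, b, B_j) for every (i, j) pair that occurs
    pair_norm = {}
    for (i, b, j), p in triplets.items():
        pair_norm[(i, j)] = pair_norm.get((i, j), 0.0) + p
    # Numerator of eq. (dbp3), accumulated over stored triplets only
    gamma = [0.0] * q
    for (i, b, j), p in triplets.items():
        gamma[b] += p / pair_norm[(i, j)] * left_post[i] * right_post[j]
    total = sum(gamma)
    if total == 0.0:
        return [1.0 / q] * q   # no support: uniform prior (assumption)
    return [g / total for g in gamma]   # normalization of eq. (dbp3a)

# Usage: left and right neighbours believed to be block 1, q = 4
triplets = {(1, 2, 1): 0.5, (2, 1, 2): 0.5}
print(dynamical_block_probabilities(triplets, [0, 1, 0, 0], [0, 1, 0, 0], 4))
```

The cost per block is proportional to the number of stored triplets, bounded by $q^3$.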

954:

955:

We compared the performance of this decoder with that of
the previously discussed decoder based on the SM approach,
eq. \ref{dbp}, for different values of $q$ and rate $1/3$, where
the matrices $A$ and $B$ are again constructed following
\cite{KS}. The bit error rate, $P_b$, versus the channel
bit error rate, $f$, is presented in Fig. \ref{markov8} for $q=8$ and a
given set of autocorrelations, and in Fig. \ref{markov16} for a set of
autocorrelations with $q=16$. In both cases the threshold of
the decoder based on eq. \ref{dbp3} is superior to that of the decoder based
on the SM approach, eq. \ref{dbp}.

966:

967:

968: \begin{figure}

969: \centering

970: %\vspace{4.5cm}

971: %\includegraphics[width=2.5in]{2/c070707.eps}

972: \centerline{\epsfxsize=2.5in \epsffile{c070707a.eps}}

\caption{ The bit error rate, $p_b$, versus the channel bit error rate,
$f$, for $L=10,000$, $R=1/3$, $q=8$ with
$C_1=C_2=C_3=0.7$. Decoding following the dynamical block
probabilities defined in eq. \ref{dbp} (open squares), decoding
following the advanced joint S/C decoder, eq. \ref{dbp3} (filled
squares), and decoding following the Markovian decoder, eq. \ref{pabc}
(open triangles). Each point is averaged over at least $1,000$ source
messages.  Shannon's lower bound, $f_c=0.271$, derived from
eq. \ref{capacity}, is denoted by an arrow.  }

982: \vspace{2.5cm}

983: \label{markov8}

984: \end{figure}

985:

986: \begin{figure}

987: \centering

988: \vspace{3.5cm}

989: %\includegraphics[width=2.5in]{2/c068068060655.eps}

990: \centerline{\epsfxsize=2.5in \epsffile{c068068060655a.eps}}

\caption{ The bit error rate, $p_b$, versus the channel bit error rate,
$f$, for $L=10,000$, $R=1/3$, $q=16$ with
$C_1=0.68,~C_2=0.68,~C_3=0.6,~C_4=0.655$.  Decoding following the
dynamical block probabilities defined in eq. \ref{dbp} (open squares),
decoding following the advanced joint S/C decoder, eq. \ref{dbp3}
(filled squares), and decoding following the Markovian decoder,
eq. \ref{pabc} (open triangles). Each point is averaged over at least
$1,000$ source messages.  Shannon's lower bound, $f_c=0.266$,
derived from eq. \ref{capacity}, is denoted by an arrow.  }

1000: \label{markov16}

1001: \end{figure}

1002:

Note that for finite $L$ the triplet probabilities ${\hat P}(B_i,B_j,B_k)$
entering the dynamical block posterior probabilities of eq. \ref{dbp3}
fluctuate among different samples, whereas for the decoder based on the
SM approach, eq. \ref{dbp}, these probabilities are sample independent.
This is one of the sources of the superiority of the presented decoder over
the SM approach (at least for finite $L$).

1009:

Note also that the presented decoder, eq. \ref{dbp3}, takes into
account all higher order correlations ($q$ autocorrelations,
$\{C_{k_1,...,k_m}\}$) through a direct measure: the probability of
occurrence of triplets of blocks, ${\hat P}(B_i,b,B_j)$. Unlike in the SM
approach, eq. \ref{dbp}, there is no need to construct a
$q \times q$ matrix, to diagonalize large transfer matrices
or to seek a saddle point in a large number of dimensions, $q$.

1017:

The threshold of the advanced joint S/C decoder,
eq. \ref{dbp3}, is thus superior to that of the decoder based on the SM approach.
However, from a practical point of view the advanced joint S/C
decoder, eq. \ref{dbp3}, suffers from the following
disadvantages. Firstly, the complexity per message passing scales with
$L_0q^3$ (see eq. \ref{dbp3}), whereas the complexity of the previously
discussed algorithm is only $L_0q^2$. Secondly, the size of the header
(the transmitted side information, namely, the measured probabilities
of occurrence of triplets of successive blocks) also scales with
$q^3$.  Although the size of the header does not scale with $L$, it is
a critical overhead for a finite $L$.  In the following sections we
show how to overcome these difficulties and move towards a
practical algorithm in the large $q$ limit.

1031:

1032:

1033: \section{Markovian joint S/C decoder}

1034:

The calculation of the entropy using the transfer matrix method
indicates that, to leading order, the ensemble of sequences obeying
a given set of autocorrelations can also be generated by a Markovian
process \cite{kinzel}.  More precisely, the elements of the transition
matrix, $\{ P_{ij}\}$ (the probability of a transition from state $i$
to $j$), are related to the transfer matrix elements, $\{ T_{ij} \}$,
via the following normalization
\begin{equation}
P_{ij} = {T_{ij} \over \sum_{k} T_{ik} }
\label{t-ij}
\end{equation}

1046:

Using this analogy, one can now approximate the measured probability
of occurrence of any of the $q^3$ combinations of three successive blocks,
$(A,B,C)$, by the following formula:
\begin{equation}
{\hat P}(A,B,C) = {{\hat P}(A,B){\hat P}(B,C) \over {\hat P}(B) }
\label{pabc}
\end{equation}
\noindent Hence, the dynamical block probabilities, eq. \ref{dbp}, can
now be calculated with a complexity of $q^2$, and the overall
complexity of the Markovian joint S/C decoder per message passing is
$O(Lq^2/\log(q))$.  Note again that, unlike in the SM approach,
eq. \ref{dbp}, there is no need to construct a $q \times q$
matrix, to diagonalize large transfer matrices or to seek a saddle
point in a large number of dimensions, $q$.
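The Markovian approximation of eq. \ref{pabc} needs only the frequencies of successive pairs of blocks and of single blocks. Below is a minimal sketch of this approximation, with blocks encoded as integers and periodic boundary conditions; the helper names are hypothetical.

```python
from collections import Counter

def pair_and_block_frequencies(blocks):
    """Empirical P(A, B) for successive blocks and P(B), periodic boundaries."""
    L0 = len(blocks)
    pairs = Counter((blocks[m], blocks[(m + 1) % L0]) for m in range(L0))
    singles = Counter(blocks)
    return ({ab: c / L0 for ab, c in pairs.items()},
            {b: c / L0 for b, c in singles.items()})

def markov_triplet(pairs, singles, a, b, c):
    """Eq. (pabc): P(A, B, C) ~ P(A, B) P(B, C) / P(B); zero if B never occurs."""
    p_b = singles.get(b, 0.0)
    if p_b == 0.0:
        return 0.0
    return pairs.get((a, b), 0.0) * pairs.get((b, c), 0.0) / p_b

# Usage: a short sequence of blocks from a q = 2 alphabet
blocks = [0, 0, 1, 0, 0, 0, 1, 1]
pairs, singles = pair_and_block_frequencies(blocks)
print(markov_triplet(pairs, singles, 0, 0, 1))
```

Only the $O(q^2)$ pair table and the $O(q)$ single-block table are stored. With periodic boundaries $\sum_C {\hat P}(B,C)={\hat P}(B)$, so summing the approximation over $C$ returns ${\hat P}(A,B)$.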

1061:

Note that in the limit of infinite $L$, eq. \ref{pabc} is exact to
leading order in $L$. For a finite $L$, some corrections are expected.
The deviation between the directly measured probability of occurrence
of three successive blocks and the estimate given by the r.h.s. of
eq. \ref{pabc} is expected to be significant only for triplets with
a very low probability of occurrence (for instance, when the l.h.s. of
eq. \ref{pabc} indicates that a triplet of successive blocks is
absent from a given sequence, whereas the r.h.s. predicts one
appearance). However, we do not expect these very low-probability
events to affect the performance of the
algorithm dramatically, and this expectation was indeed confirmed in our
simulations. Results are exemplified in Figs. \ref{markov8} and
\ref{markov16}, where the performance of the Markovian S/C decoder is
compared with the SM approach, eq. \ref{dbp}.  The difference in the
threshold between these two methods is negligible for the examined
cases.

1078:

1079:

The complexity of our Markovian S/C decoder is thus reduced to $O(L_0q^2)$
per message passing. However, the side information, consisting of the
measured probabilities of occurrence of all successive pairs of blocks,
$\{{\hat P}(A,B)\}$, must still be transmitted; hence the size of the
header is $O(q^2)$.  For $L \rightarrow \infty$, or more precisely for $L \gg q^2$,
the overhead of the transmitted side information is negligible;
however, for a finite $L \le q^2$ it may cancel the benefits of the
Markovian joint S/C decoder.

1089:

1090:

One way to reduce the $O(q^2)$ overhead of the header is
to transmit only the dominant elements of the matrix ${\hat
P}(A,B)$. The remaining elements of the matrix are determined in the
following way. Let us denote the sum of the transmitted dominant
elements in the $i$th row by $M_i$ and their number by $N_i$.
The non-transmitted elements in each row are all set to
$(1-M_i)/(q-N_i)$, so that each row sums to $1$. As $\{N_i\}$ increase,
the approximated matrix clearly converges to the true one.  For
sequences with enhanced autocorrelations, the matrix
${\hat P}(A,B)$ was observed to be dominated by a small number of
large elements. Results of simulations for $q=8$, where the number
of transmitted elements is $\sum_i N_i=8$ (out of $q^2=64$), are presented
in Fig. \ref{q88}, and for $q=16$, where $\sum_i N_i=16$ (out
of $q^2=256$), in Fig. \ref{q1616}. The performance appears
to be only slightly affected by this approximation, which dramatically
reduces the required transmitted side information.
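The header-reduction rule can be sketched in a few lines. As the fill value $(1-M_i)/(q-N_i)$ implies, the sketch treats ${\hat P}(A,B)$ as a row-normalized transition matrix. It keeps the globally largest $n_{keep}=\sum_i N_i$ entries; this selection rule and the function name are assumptions for illustration.

```python
def compress_transition_matrix(P, n_keep):
    """Keep the n_keep largest entries of a row-stochastic q x q matrix P
    (the dominant elements sent as side information) and refill every other
    entry of row i with (1 - M_i) / (q - N_i), where M_i is the kept mass
    and N_i the number of kept entries in that row."""
    q = len(P)
    ranked = sorted(((P[i][j], i, j) for i in range(q) for j in range(q)), reverse=True)
    kept = {(i, j) for _, i, j in ranked[:n_keep]}
    approx = []
    for i in range(q):
        kept_cols = [j for j in range(q) if (i, j) in kept]
        M_i = sum(P[i][j] for j in kept_cols)
        N_i = len(kept_cols)
        fill = (1.0 - M_i) / (q - N_i) if N_i < q else 0.0
        approx.append([P[i][j] if (i, j) in kept else fill for j in range(q)])
    return approx

# Usage: q = 4, transmit only 3 of the 16 elements
P = [[0.7, 0.1, 0.1, 0.1],
     [0.1, 0.7, 0.1, 0.1],
     [0.1, 0.1, 0.7, 0.1],
     [0.25, 0.25, 0.25, 0.25]]
print(compress_transition_matrix(P, 3))
```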

1107:

1108:

1109:

1110:

1111: \begin{figure}

1112: \centering

1113: %\vspace{2.5cm}

1114: %\includegraphics[width=2.5in]{2/Markov_c07.new.eps}

1115: \centerline{\epsfxsize=2.5in \epsffile{Markov_c07.new1.eps}}

\caption{ The bit error rate, $p_b$, versus the channel bit error rate,
$f$, for $L=10,000$, $R=1/3$, $q=8$ with $C_1=C_2=C_3=0.7$. Decoding
following the Markovian process, eq. \ref{pabc} (open triangles), and
decoding following the Markovian process where only the $8$ dominant
elements of the transition matrix, ${\hat P}(A,B)$, are taken as side
information and the rest of the elements are set equal to a constant
such that the sum of each row is equal to $1$ (open circles).
Shannon's lower bound, $f_c=0.271$, is denoted by an arrow.}

1124: \vspace{2.5cm}

1125: \label{q88}

1126: \end{figure}

1127:

1128:

1129: \begin{figure}

1130: \centering

1131: \vspace{2.5cm}

1132: %\includegraphics[width=2.5in]{2/Markov_c068068060655.new.eps}

1133: \centerline{\epsfxsize=2.5in \epsffile{Markov_c068068060655.new1.eps}}

\caption{ The bit error rate, $p_b$, versus the channel bit error rate,
$f$, for $L=10,000$, $R=1/3$, $q=16$ with
$C_1=0.68,~C_2=0.68,~C_3=0.6,~C_4=0.655$.  Decoding following the
Markovian process, eq. \ref{pabc} (open triangles), and decoding
following the Markovian process where only the $16$ dominant elements of
the transition matrix, ${\hat P}(A,B)$, are taken as side information
and the rest of the elements are set equal to a constant such that the
sum of each row is equal to $1$ (open circles).  Shannon's lower
bound, $f_c=0.266$, is denoted by an arrow.}

1143: \label{q1616}

1144: \end{figure}

1145:

An interesting open question is how exploiting the
sparseness of the tensor ${\hat P}(B_i,B_j,B_k)$ affects the average and
the distribution of the number of message-passing iterations required for
convergence of the decoding process.

1150:

1151:

1152: \section{Efficient Joint S/C decoder with the lack of side information}

1153:

The discussion in the previous sections indicates that the performance
of the presented joint S/C coding is not too far from Shannon's lower
bound and that, most probably, using an optimized code (a better
construction of the matrices $A$ and $B$ of the MN code), the channel
capacity can be nearly saturated. However, for a finite block length
the main drawback of our algorithm is the overhead of the header, which
must be encoded and transmitted reliably. One has to remember that the
size of the header scales with $q^2$, where the precision of each
element is of the order $O(\log L)$. This overhead becomes
intolerable in the limit where
\begin{equation}
{q^2 \log (L) \over L} \sim O(1)
\label{largeblock}
\end{equation}
\noindent Note that this is indeed the situation even for very large
messages, $L=10^6$, when the largest autocorrelation length taken into
account is only $\log_2q=8$: with these parameters ($q=256$) the l.h.s.
of eq. \ref{largeblock} is around $1$ (e.g., $256^2\log_2(10^6)/10^6 \approx 1.3$).

1172:

In this section we explain how the abovementioned Markovian joint S/C decoder
can be implemented without the transmission of any side information,
either $\{y_{k_1,...,k_m}\}$ of eq. \ref{h2} or ${\hat P}(A,B)$ of
eq. \ref{pabc}.  The main idea can be easily exemplified in the
framework of the Gallager code; only at the end of the
discussion do we extend it to the MN code.


When the Gallager code is used with a systematic parity-check matrix,
the first $N$ received bits are the source message itself, which is
generated by a Markovian process, corrupted by channel noise with
flip rate $f$. Hence, from the receiver's point of view, the first $N$
received bits are generated by a hidden Markov process. The first task
of the receiver is to estimate ${\hat P}(A,B)$ from the noisy received
source message and the channel flip rate $f$. This type of {\it
parametric estimation} is a common problem in statistics and can be
solved (exactly or approximately) using the EM algorithm or one of its
variants \cite{EM}. More precisely, for an infinite source, $L
\rightarrow \infty$, the transition matrix eq. \ref{t-ij} (or the
interaction strengths eq. \ref{h2}) can be recovered within a bounded
error in $O(L)$ time. For a finite sequence of length $L$, the
parameters of the Markovian process can be estimated approximately,
with an error of order $O(q^2/L)$. Hence, the parameters required by
the presented joint S/C decoder based on the dynamical block posterior
probabilities can be estimated from the received noisy message. Note
that, as explained above, the accuracy of the dominant elements of the
transition matrix ${\hat P}(A,B)$ is the most important ingredient for
the performance of our joint S/C decoder; hence one may prefer an
estimation algorithm that is especially accurate for the dominant part
of the transition matrix.
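
A minimal sketch of this parametric-estimation step is given below in
Python, using Baum-Welch (the standard EM algorithm for hidden Markov
models) as one possible variant. It assumes a first-order Markov chain
over blocks of $k_0$ bits, $q=2^{k_0}$, observed through a binary
symmetric channel with known flip rate $f$, and it re-estimates only
the $q\times q$ conditional transition matrix \texttt{T}; the joint
${\hat P}(A,B)$ then follows from \texttt{T} and its stationary
distribution. The function names, the test source and all parameters
are illustrative assumptions, not the authors' implementation.

```python
import random

def hamming(a, b):
    return bin(a ^ b).count("1")

def block_emissions(q, k0, f):
    # E[s][y]: probability that block s (k0 bits) is received as y through a BSC with flip rate f
    return [[f ** hamming(s, y) * (1 - f) ** (k0 - hamming(s, y)) for y in range(q)]
            for s in range(q)]

def estimate_transitions(obs, q, k0, f, n_iter=30):
    """Baum-Welch estimate of the block transition matrix; the channel is known and kept fixed."""
    E = block_emissions(q, k0, f)
    T = [[(2.0 if i == j else 1.0) / (q + 1) for j in range(q)] for i in range(q)]
    pi = [1.0 / q] * q
    L = len(obs)
    for _ in range(n_iter):
        # Forward pass, normalised at every step (c[t] are the scale factors).
        alpha, c = [], []
        row = [pi[s] * E[s][obs[0]] for s in range(q)]
        for t in range(L):
            if t > 0:
                prev = alpha[-1]
                row = [E[j][obs[t]] * sum(prev[i] * T[i][j] for i in range(q)) for j in range(q)]
            c.append(sum(row))
            alpha.append([a / c[-1] for a in row])
        # Backward pass with the same scale factors.
        beta = [[1.0] * q for _ in range(L)]
        for t in range(L - 2, -1, -1):
            y, nxt = obs[t + 1], beta[t + 1]
            beta[t] = [sum(T[i][j] * E[j][y] * nxt[j] for j in range(q)) / c[t + 1]
                       for i in range(q)]
        # M-step: expected transition counts, then row normalisation.
        counts = [[0.0] * q for _ in range(q)]
        for t in range(L - 1):
            y = obs[t + 1]
            for i in range(q):
                for j in range(q):
                    counts[i][j] += alpha[t][i] * T[i][j] * E[j][y] * beta[t + 1][j] / c[t + 1]
        T = [[x / sum(r) for x in r] for r in counts]
        g0 = [alpha[0][s] * beta[0][s] for s in range(q)]
        pi = [x / sum(g0) for x in g0]
    return T

def simulate(T, k0, f, L, rng):
    # Hypothetical test source: a block Markov chain observed through a BSC.
    q, s, obs = len(T), 0, []
    for _ in range(L):
        noise = sum(1 << b for b in range(k0) if rng.random() < f)
        obs.append(s ^ noise)
        s = rng.choices(range(q), weights=T[s])[0]
    return obs

rng = random.Random(1)
true_T = [[0.9, 0.1], [0.2, 0.8]]
obs = simulate(true_T, k0=1, f=0.05, L=5000, rng=rng)
est_T = estimate_transitions(obs, q=2, k0=1, f=0.05)
print([[round(x, 3) for x in r] for r in est_T])
```

Keeping the emission probabilities fixed at their known BSC values
avoids any label ambiguity between hidden states, so only the Markov
parameters are fitted.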


The critical problem with the above description is that our efficient
decoder can be implemented only with the MN algorithm, in which the
decoder {\it simultaneously} estimates the source and noise bits. In
the Gallager decoder, by contrast, the noise bits are estimated first,
and only in a subsequent step are the source bits recovered. Hence, to
our current knowledge, the dynamical block posterior probabilities
cannot be used within the Gallager algorithm. Since in the MN scheme
the source is not transmitted by itself, the question is how to
estimate, from the received message, the parameters of the Markovian
process that generated the source message. Nevertheless, as explained
below, this problem can also be solved for the MN case in $O(L)$ time.


We first explain the solution for the MN construction examined here,
Fig. \ref{ks}, and then sketch the general solution. In this
construction the first $N$ rows of $A$ have one non-zero element per
row and per column, while the first $N$ rows of $B$ have $2$ non-zero
elements each (furthermore, no row of $B$ can be written as a linear
combination of the other rows). Hence, the first $N$ bits of the
syndrome, eq. \ref{decoding}, are equal to the source bits (up to a
permutation) corrupted with an effective flip rate $f_{eff}=2f(1-f)$,
the probability that exactly one of the two independent noise bits is
flipped. The EM algorithm with $f_{eff}$ can now be used to estimate
the finite number of parameters of the Markovian process generating
the sequence.
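
The effective flip rate follows from the piling-up lemma: the XOR of
$w$ independent noise bits, each flipped with probability $f$, is
flipped with probability $[1-(1-2f)^w]/2$, which reduces to $2f(1-f)$
for $w=2$. The Python sketch below evaluates this expression and
compares it with a Monte Carlo estimate; the function names are ours.
Applied row by row, the same formula gives the $f_{i,eff}$ of the
general construction, provided the noise bits entering a row are
independent.

```python
import random

def effective_flip_rate(f, w):
    # Piling-up lemma: flip probability of the XOR of w independent noise bits.
    return (1.0 - (1.0 - 2.0 * f) ** w) / 2.0

def monte_carlo_flip_rate(f, w, trials, rng):
    # Empirical frequency with which the parity of w noisy bits is flipped.
    flips = 0
    for _ in range(trials):
        parity = 0
        for _ in range(w):
            parity ^= rng.random() < f
        flips += parity
    return flips / trials

f = 0.1
print(effective_flip_rate(f, 2), 2 * f * (1 - f))   # both approximately 0.18
mc = monte_carlo_flip_rate(f, 2, 100_000, random.Random(0))
print(mc)
```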


For a general construction of the MN code, one adds or subtracts rows
of the concatenated matrix $[A,B]$, together with the corresponding
received bits $Z$ (see eq. \ref{decoding}), until the first $N$ rows
of $A$ form the identity matrix, whatever the resulting form of the
first $N$ rows of $B$, with a corresponding effective syndrome
$Z_{eff}$. From the noise level $f$ and the structure of the $i$th row
of $B$ one can then calculate the effective noise level, $f_{i,eff}$,
of the $i$th received source bit. Note that all $N$ effective noise
levels $\{f_{i,eff}\}$ are functions of the single noise level $f$,
and one can again estimate the parameters of the Markovian process
using a variant of the EM algorithm. The only approximation made in
calculating $\{f_{i,eff}\}$ {\it in the general case} is to neglect the
correlations among them, which arise because the new first $N$ rows of
$B$ contain loops. These correlations are assumed to be small, since
the typical loops are of length $O(\log L)$.
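
The row reduction described above can be sketched as Gauss-Jordan
elimination over GF(2). In the Python sketch below, rows of $A$ and
$B$ are stored as integer bitmasks (bit $c$ is column $c$) and the
received bits $Z$ as a list of $0/1$ values; this representation, the
toy matrices and the function names are our assumptions, and the
sketch covers only the binary case. After the reduction, row $i$ reads
$s_i \oplus (B_i\cdot n) = Z_{eff,i}$, and $f_{i,eff}$ is obtained from
the weight of $B_i$ with the piling-up formula, neglecting the
loop-induced correlations mentioned in the text.

```python
def parity(x):
    return bin(x).count("1") & 1

def reduce_first_rows_to_identity(A_rows, B_rows, z, N):
    """Gauss-Jordan elimination over GF(2) on the rows of [A, B | z].

    After the call the first N rows of A form the identity matrix.
    """
    A, B, z = list(A_rows), list(B_rows), list(z)
    for c in range(N):
        pivot = next((r for r in range(c, len(A)) if (A[r] >> c) & 1), None)
        if pivot is None:
            raise ValueError("A does not have full column rank over GF(2)")
        for rows in (A, B, z):
            rows[c], rows[pivot] = rows[pivot], rows[c]
        for r in range(len(A)):
            if r != c and (A[r] >> c) & 1:   # adding a row = subtracting it in GF(2)
                A[r] ^= A[c]
                B[r] ^= B[c]
                z[r] ^= z[c]
    return A, B, z

def effective_flip_rates(B, N, f):
    # Piling-up lemma on the weight of each of the first N rows of B
    # (noise bits in different rows treated as independent).
    return [(1 - (1 - 2 * f) ** bin(B[i]).count("1")) / 2 for i in range(N)]

A0 = [0b11, 0b10, 0b01]     # toy 3 x 2 matrix A
B0 = [0b001, 0b010, 0b100]  # toy 3 x 3 matrix B (identity for simplicity)
s, n = 0b01, 0b010          # source bits s0=1, s1=0; only noise bit n1 is flipped
z0 = [parity(a & s) ^ parity(b & n) for a, b in zip(A0, B0)]
A, B, z = reduce_first_rows_to_identity(A0, B0, z0, N=2)
print(A, B, z, effective_flip_rates(B, 2, 0.1))
```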


\section{Markovian joint S/C coding in higher dimensions}

The decoder based on the SM approach, eq. \ref{dbp}, is limited to a
one-dimensional stream of bits, since for a two-dimensional array of
bits the trace in eq. \ref{omega1} can be performed with the transfer
matrix method (or any other known method) only in very limited cases
\cite{baxter}. No analytical solution is known for a two-dimensional
Ising system with arbitrary interaction strengths, even when the
interactions are restricted to nearest neighbors, and in three
dimensions no analytical solution is known at all. In contrast, the
one-dimensional Markovian joint S/C decoder can easily be extended to
a joint S/C coding of a two-dimensional array of bits, or even of an
array of bits in higher dimensions.


For illustration, assume that we want to transmit a two-dimensional
picture through a noisy channel using a joint S/C mechanism. A simple
way would be to convert the picture into a one-dimensional sequence
and then to use, for instance, one of the decoders discussed above.
However, the mapping of a two-dimensional picture onto a
one-dimensional sequence is not unique, and the natural
two-dimensional correlations are destroyed by the mapping (at least
in the realistic case of finite $k_0$). An alternative is to
generalize the advanced threshold joint S/C decoder, eq. \ref{dbp3},
to two dimensions, where each block is updated according to its four
neighboring blocks. The generalization of eq. \ref{pabc} to this case
is
\begin{eqnarray}
{\hat P}(B_{i,j-1},B_{i-1,j},B_{i,j},B_{i+1,j},B_{i,j+1}) \equiv
\nonumber \\
{1 \over L_0^2} \sum_{i,j=1}^{L_0} \prod_{|k|+|m| \le 1}
\delta_{A_{i+k,j+m},B_{i+k,j+m}}
\label{pabc-2d}
\end{eqnarray}
\noindent where $L_0^2=(L/k_0)^2$ is the number of blocks in the
two-dimensional array of bits, the product runs over the central block
and its four nearest neighbors, and periodic boundary conditions are
again assumed. In analogy with eq. \ref{pabc}, the dynamical posterior
probabilities now take the form
\begin{eqnarray}
\gamma_{n}^{B_{k,m}} = \sum_{i,j,s,t=1}^q {
{\hat P}(B_{i},B_{j},B_{k,m},B_{s},B_{t}) \over \sum_{b=1}^q
{\hat P}(B_i,B_j,b,B_s,B_t) }\times \nonumber \\
q_{k,m-1}^{i} q_{k-1,m}^j q_{k+1,m}^{s} q_{k,m+1}^t
\label{gamma-2d}
\end{eqnarray}
\noindent The generalization of this decoder to higher dimensions is
straightforward; in naive decoding, the complexity of the decoder
scales as $L_0^dq^{2d+1}$, since for each of the $q$ values of a block
one sums over the $q^{2d}$ configurations of its $2d$ neighbors.
Nevertheless, as for the one-dimensional Markovian joint S/C decoder,
the complexity can be reduced in the higher-dimensional case as well.
For instance, in two dimensions, treating the four neighboring blocks
as conditionally independent given the central block,
\begin{eqnarray}
&{\hat P}(B_{i,j-1},B_{i-1,j},B_{i,j},B_{i+1,j},B_{i,j+1})= \nonumber \\
&{ {\hat P}(B_{i,j-1},B_{i,j}) {\hat P}(B_{i-1,j},B_{i,j})
{\hat P}(B_{i+1,j},B_{i,j}) {\hat P}(B_{i,j+1},B_{i,j}) \over
{\hat P}(B_{i,j})^3 }
\label{pabc-d}
\end{eqnarray}
\noindent and similarly in higher dimensions. Hence, the complexity of
one message-passing step in $d$ dimensions is reduced to $L_0^dq^2$;
equivalently, the complexity per block is $O(q^2)$.
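
The following Python sketch illustrates the saving behind
eq. \ref{pabc-d} for small $q$. It builds a hypothetical model from a
central-block distribution \texttt{pc} and four conditional tables
\texttt{K[x][c][a]} $=P(a|c)$, one per neighbor direction, and checks
that summing the factorized ${\hat P}$ against four neighboring
posteriors \texttt{msgs[x]} with one $q^4$-term sum per central value
gives the same result as four $q$-term sums. For simplicity it leaves
out the normalizing denominator of eq. \ref{gamma-2d}; all names are
ours, and this is an illustration rather than the authors' decoder.

```python
import itertools
import random

def normalise(v):
    total = sum(v)
    return [x / total for x in v]

def make_model(q, rng):
    # Hypothetical model: centre distribution pc[c] and, for each of the four
    # neighbour directions x, a conditional table K[x][c][a] = P(a | c).
    pc = normalise([rng.random() + 0.1 for _ in range(q)])
    K = [[normalise([rng.random() + 0.1 for _ in range(q)]) for _ in range(q)]
         for _ in range(4)]
    return pc, K

def p5(pc, K, c, a):
    # Eq. (pabc-d): prod_x P(a_x, c) / P(c)^3, rewritten as P(c) prod_x P(a_x | c).
    w = pc[c]
    for x in range(4):
        w *= K[x][c][a[x]]
    return w

def naive_weights(pc, K, msgs):
    # O(q^5) per block: explicit sum over all q^4 neighbour configurations.
    q = len(pc)
    out = []
    for c in range(q):
        s = 0.0
        for a in itertools.product(range(q), repeat=4):
            s += p5(pc, K, c, a) * msgs[0][a[0]] * msgs[1][a[1]] * msgs[2][a[2]] * msgs[3][a[3]]
        out.append(s)
    return out

def factorised_weights(pc, K, msgs):
    # O(4 q^2) per block: one q-term sum per direction and centre value.
    q = len(pc)
    out = []
    for c in range(q):
        w = pc[c]
        for x in range(4):
            w *= sum(K[x][c][a] * msgs[x][a] for a in range(q))
        out.append(w)
    return out

rng = random.Random(0)
q = 4
pc, K = make_model(q, rng)
msgs = [normalise([rng.random() for _ in range(q)]) for _ in range(4)]
naive = naive_weights(pc, K, msgs)
fact = factorised_weights(pc, K, msgs)
total = sum(p5(pc, K, c, a) for c in range(q) for a in itertools.product(range(q), repeat=4))
print(naive)
print(fact)
```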


Besides the above simplification, it is important to note that for
finite $L$ the tensor of the probabilities of occurrence of
neighboring blocks (for instance, the tensor of eq. \ref{pabc-d} in
the two-dimensional case) is expected to be very sparse. Hence, the
decoder can be accelerated as discussed for the one-dimensional case.
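
One simple way to exploit this sparseness is to store only the block
patterns that actually occur. The Python sketch below estimates the
five-block tensor of eq. \ref{pabc-2d}, with periodic boundary
conditions, as a dictionary keyed by (left, up, center, down, right)
block values, so that it has at most $L_0^2$ entries instead of $q^5$.
The function name, the list-of-lists representation of the block array
and the random test picture are our assumptions.

```python
import random
from collections import Counter

def five_block_tensor(blocks):
    """Sparse estimate of eq. (pabc-2d) for an L0 x L0 array of block values.

    Keys are (B[i][j-1], B[i-1][j], B[i][j], B[i+1][j], B[i][j+1]) with periodic
    boundary conditions; only patterns that actually occur are stored.
    """
    L0 = len(blocks)
    counts = Counter()
    for i in range(L0):
        for j in range(L0):
            key = (blocks[i][(j - 1) % L0], blocks[(i - 1) % L0][j], blocks[i][j],
                   blocks[(i + 1) % L0][j], blocks[i][(j + 1) % L0])
            counts[key] += 1
    return {k: v / L0 ** 2 for k, v in counts.items()}

rng = random.Random(0)
L0, q = 32, 256
picture = [[rng.randrange(q) for _ in range(L0)] for _ in range(L0)]
P = five_block_tensor(picture)
print(len(P), "non-zero entries out of", q ** 5)
```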


From an analytical point of view, we do not have an effective way to
generate an ensemble of arrays with a given set of autocorrelations in
two or more dimensions, since we do not know how to derive the
effective interactions, eq. \ref{hamiltonian}. From a practical point
of view, for a given two-dimensional picture and $k_0$, we can measure
the correlations entering eq. \ref{pabc-d} and then apply the
Markovian decoder. However, there is no reference point against which
to assess the efficiency of the decoder, since we do not have an
effective way to calculate the entropy, $H_2$, and hence Shannon's
lower bound, eq. \ref{capacity}, for a given set of correlations in
more than one dimension. In practice, the performance of the Markovian
decoder in higher dimensions can be compared with that of other
efficient lossless compression methods for two and more dimensions.
This important comparison certainly warrants further research.


\section{Concluding remarks}

The only remaining major drawback of the presented Markovian joint S/C
coding is that, to leading order for large $q$, the complexity of the
decoder per message-passing step scales as $O(Lq^2/\log_2(q))$, i.e.,
$O(q^2)$ operations for each of the $L/\log_2 q$ blocks.


We note that, asymptotically, the complexity of the Markovian joint
S/C decoder per message-passing step might be reduced to
$O(Lq\log(q))$. The main idea can be illustrated in the framework of
the original SM scheme, eq. \ref{dbp}. The calculation of each
$\gamma_n^c$ requires $O(q)$ operations, and $q$ such elements must be
calculated. Each summation in eq. \ref{dbp} contains two types of
terms. The first type are the static terms, $S_{L}(l,c)$ ($S_R(c,r)$),
which are the Boltzmann factors, or equivalently a row of the
transition matrix of the Markov process. The second type are the
dynamical posterior probabilities of the neighboring blocks,
$q_L^l,~q_R^r$. The static terms can be sorted in decreasing order
only once, at the initial stage of the decoder, and the $O(\log(q))$
largest dynamical posterior probabilities can be found at a cost of
$O(q\log(q))$ per block. We then run the usual decoder, eq. \ref{dbp},
with one of the following two options: (a) the summations in
eq. \ref{dbp} run only over the {\it current} $O(\log(q))$ leading
dynamical block posterior probabilities, or (b) the summations run as
in (a), with the $O(\log(q))$ leading static terms added; any
combination of (a) and (b) is also possible.
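
A minimal Python sketch of option (a) for a single block is given
below. It assumes that the update of eq. \ref{dbp} has the product form
$\gamma^c \propto [\sum_l S_L(l,c)\, q_L^l]\,[\sum_r S_R(c,r)\, q_R^r]$,
with any channel-dependent factor omitted, and keeps only the
$k\approx\log_2 q$ largest entries of $q_L$ and $q_R$, selected with
\texttt{heapq.nlargest} at a cost of $O(q\log k)$. The function name
and the toy Boltzmann factors are hypothetical.

```python
import heapq
import math

def block_update(S_L, S_R, qL, qR, k=None):
    """One update of the block posteriors gamma(c) for a single block.

    Assumed form: gamma(c) ~ [sum_l S_L[l][c] qL[l]] * [sum_r S_R[c][r] qR[r]].
    With k=None all q terms are kept; option (a) uses k ~ log2(q).
    """
    q = len(qL)
    left = range(q) if k is None else heapq.nlargest(k, range(q), key=qL.__getitem__)
    right = range(q) if k is None else heapq.nlargest(k, range(q), key=qR.__getitem__)
    g = [sum(S_L[l][c] * qL[l] for l in left) * sum(S_R[c][r] * qR[r] for r in right)
         for c in range(q)]
    total = sum(g)
    return [x / total for x in g]

q = 16
k = math.ceil(math.log2(q))   # 4 leading terms
S_L = [[1.0 + ((7 * l + 3 * c) % 5) for c in range(q)] for l in range(q)]
S_R = [[1.0 + ((2 * c + 5 * r) % 7) for r in range(q)] for c in range(q)]
qL = [0.0] * q
qL[3], qL[8], qL[11] = 0.7, 0.2, 0.1
qR = [0.0] * q
qR[0], qR[5], qR[9], qR[14] = 0.4, 0.3, 0.2, 0.1
full = block_update(S_L, S_R, qL, qR)
trunc = block_update(S_L, S_R, qL, qR, k=k)
print(max(abs(a - b) for a, b in zip(full, trunc)))
```

When the neighboring posteriors have at most $k$ non-zero entries, as
in this toy example, the truncated update coincides with the full one;
in general it is an approximation whose quality depends on how
concentrated $q_L$ and $q_R$ are.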


The idea behind this procedure is related to the results of section
IX, where only a limited degradation in the performance of the
Markovian decoder was observed when the transition matrix was
approximated by only a small number of its dominant terms. Similarly,
in the present approximation we expect most of the bits to be
correctly ordered by the dominant part of the static Boltzmann weights
and by the dominant part of the dynamical block posterior
probabilities. In the final stage of the decoder, rare pairs of blocks
(those with small Boltzmann factors) will be correctly ordered by the
dominant true block posterior probability of one of their two
neighbors.


The question that remains to be answered is the origin of the
suggested scaling, $O(\log(q))$, of the number of dominant terms kept
in the summations of eq. \ref{dbp}. The explanation is based on the
characteristic features of random graphs \cite{erdos,kanter1}. In the
full operation of the Markovian process, each pair of neighboring
blocks is assigned a dynamical transition matrix of size $q \times q$,
which resembles a fully connected graph of $q$ nodes. The purpose of
our approximation is to replace the fully connected graph with a
diluted one, {\it but the graph must remain connected}: the maximal
component of the graph must be of size $q$. The absence of finite
components is a necessary condition, since otherwise there are
isolated nodes (states), and the enhancement of the true block
posterior probability may be dynamically forbidden. Random graph
theory shows that the maximal component is of size $q$ when the
average connectivity (the average number of non-zero transitions per
row) is of order $O(\log(q))$ \cite{erdos,kanter1}. This prediction
has yet to be confirmed in large-scale simulations, with large $L$ and
$q$.
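
The random-graph statement can be checked numerically with the Python
sketch below. As a simplification it treats the diluted transition
structure as an undirected Erd\H{o}s-R\'enyi graph on $q$ nodes with a
prescribed average degree and measures the largest connected component
with a union-find structure; the model, the function name and the
parameters are our assumptions. For such random graphs the
connectivity threshold lies at an average degree of about $\ln q$.

```python
import math
import random
from collections import Counter

def largest_component(q, avg_degree, rng):
    """Size of the largest connected component of a random graph on q nodes.

    About avg_degree * q / 2 random edges are drawn (a G(n, m) random graph);
    components are tracked with a union-find structure.
    """
    parent = list(range(q))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _ in range(round(avg_degree * q / 2)):
        ra, rb = find(rng.randrange(q)), find(rng.randrange(q))
        if ra != rb:
            parent[ra] = rb
    return max(Counter(find(x) for x in range(q)).values())

rng = random.Random(0)
q = 1024
for c in (1.0, math.log(q), 3 * math.log(q)):
    print(f"average degree {c:5.1f}: largest component {largest_component(q, c, rng)} of {q}")
```

With $q=1024$ one typically finds many isolated nodes at an average
degree of $1$, whereas at $3\ln q\approx 21$ the graph is connected
with overwhelming probability.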


Finally, note that for large $q$ the transition matrix ${\hat P}(A,B)$
is very sparse in the limit $q^2 \gg L$. This limit is reached even
for very long source messages and short-range correlation lengths, for
instance, $k_0=12$, $q=2^{k_0}=4096$ and $L=10^5$. Furthermore, in the
limit where the number of possible different blocks, $q=2^{k_0}$, is
much larger than $L$, a large fraction of the $\gamma_n^c$,
eq. \ref{dbp}, can be set to zero. Hence, these two effects can
further reduce the complexity of the decoder.
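
To quantify the first effect: at most $L/k_0$ neighboring block pairs
are observed, so at most that many of the $q^2$ entries of
${\hat P}(A,B)$ can be non-zero. The short Python sketch below
evaluates this upper bound on the fill fraction for the example above,
assuming periodic boundary conditions so that the number of observed
pairs equals the number of blocks; the function name is ours.

```python
def max_fill_fraction(L, k0):
    # At most L // k0 neighbouring block pairs are observed (periodic boundary),
    # while the transition matrix has q^2 = (2^k0)^2 entries.
    q = 2 ** k0
    return (L // k0) / q ** 2

frac = max_fill_fraction(L=10 ** 5, k0=12)
print(f"at most {frac:.1e} of the entries are non-zero")   # about 5e-4
```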


\section*{Acknowledgment}

I.K. thanks David Forney, Wolfgang Kinzel, Manfred Opper, Shlomo
Shamai, R\"udiger Urbanke and Shun-ichi Amari for many helpful
discussions and comments.


\begin{thebibliography}{1}

\bibitem{forney} Chung SY, Forney GD Jr, Richardson TJ, Urbanke R, On
the design of low-density parity-check codes within 0.0045 dB of the
Shannon limit. {\em IEEE Communications Letters, vol.5, no.2,
Feb. 2001, pp.58-60.}

\bibitem{David-Mackay2} Richardson TJ, Urbanke R, The capacity of
low-density parity-check codes under message-passing decoding. {\em
IEEE Transactions on Information Theory, vol.47, no.2, Feb. 2001,
pp.599-618.}

\bibitem{Shokrollahi} Luby MG, Mitzenmacher M, Shokrollahi MA,
Spielman DA, Analysis of low density codes and improved designs using
irregular graphs. {\em Proceedings of the Thirtieth Annual ACM
Symposium on Theory of Computing, ACM, New York, NY, USA, 1998,
pp.249-58.}

\bibitem{KS} Kanter I, Saad D, Error-correcting codes that nearly
saturate Shannon's bound. {\em Physical Review Letters, vol.83, no.13,
27 Sept. 1999, pp.2660-3.}

\bibitem{turbo} Berrou C, Glavieux A, Thitimajshima P, Near Shannon
limit error-correcting coding and decoding: Turbo-codes. {\em ICC '93
Geneva, IEEE International Conference on Communications '93, IEEE,
1993, vol.2, pp.1064-70}; and Berrou C, Glavieux A, Near optimum error
correcting coding and decoding: turbo-codes. {\em IEEE Transactions on
Communications, vol.44, no.10, Oct. 1996, pp.1261-71.}

\bibitem{Shannon} Shannon CE, A mathematical theory of communication.
{\em Bell System Technical Journal, vol.27, 1948, pp.379-423,
623-656.}

\bibitem{Cover} Cover TM, Thomas JA, {\it Elements of Information
Theory.} Wiley, New York, 1991.

\bibitem{err_cor_book} Michelson AM and Levesque AH, {\it
Error-Control Techniques for Digital Communications.} Wiley, New York,
1985.

\bibitem{Frey} Frey BJ, {\em Graphical Models for Machine Learning and
Digital Communication.} MIT Press, 1998.

\bibitem{shamail1} Kliewer J, Thobaben R, Combining FEC and optimal
soft-input source decoding for the reliable transmission of correlated
variable-length encoded signals. {\it Proceedings DCC 2002, Data
Compression Conference, IEEE Comput. Soc., Los Alamitos, CA, USA,
2002, pp.83-91.}

\bibitem{shamail2} Shamai S, Verd\'u S, Capacity of channels with
uncoded side information. {\it European Transactions on
Telecommunications \& Related Technologies, vol.6, no.5,
Sept.-Oct. 1995, pp.587-600.}

\bibitem{shamail3} Liveris A, Xiong Z and Georghiades CN, Compression
of binary sources with side information using low-density parity-check
codes. {\it Proceedings of Globecom 2002, Taipei, Taiwan, November
17-21, 2002.}

\bibitem{shamail4} Garcia-Frias J and Villasenor JD, Joint turbo
decoding and estimation of hidden Markov sources. {\it IEEE Journal on
Selected Areas in Communications, vol.19, no.9, Sept. 2001,
pp.1671-1679.}

\bibitem{shamail5} Zhu C-C and Alajaji F, Turbo codes for non-uniform
memoryless sources over noisy channels. {\it IEEE Communications
Letters, vol.6, no.2, Feb. 2002, pp.64-66.}

\bibitem{shamail6} Garcia-Frias J and Zhao Y, Compression of binary
memoryless sources using punctured turbo codes. {\it IEEE
Communications Letters, vol.6, no.9, 2002, pp.394-396.}

\bibitem{KS-Gaussian} Kanter I, Saad D, Finite-size effects and
error-free communication in Gaussian channels. {\it Journal of Physics
A: Mathematical \& General, vol.33, no.8, 3 March 2000, pp.1675-81.}

\bibitem{KK} Kanter I and Kfir H, Statistical mechanical aspects of
joint source-channel coding. {\it Europhysics Letters, vol.63, no.2,
July 2003, p.310.}

\bibitem{KR} Kanter I and Rosemarin H, cond-mat/0301005.

\bibitem{sourlas} Sourlas N, Spin-glass models as error-correcting
codes. {\it Nature, vol.339, no.6227, 29 June 1989, pp.693-5.}

\bibitem{sourlas1} Sourlas N, Statistical mechanics and
capacity-approaching error-correcting codes. {\it Physica A, vol.302,
no.1-4, 15 Dec. 2001, pp.14-21.}

\bibitem{liat} Ein-Dor L, Kanter I, Kinzel W, Low autocorrelated
multiphase sequences. {\it Physical Review E, vol.65, no.2, Feb. 2002.}

\bibitem{kinzel} We thank Wolfgang Kinzel for his advice to simplify
the decoder by using the Markovian process.

\bibitem{baxter} Baxter RJ, {\it Exactly Solved Models in Statistical
Mechanics.} Academic Press, London, 1982.

\bibitem{ido-msi} Kanter I, The equivalence between discrete-spin
Hamiltonians and Ising Hamiltonians with multi-spin interactions. {\it
J. Phys. A, vol.20, 1987, p.L257.}

\bibitem{MacKay} MacKay DJC and Neal RM, Near Shannon limit
performance of low density parity check codes. {\it Electronics
Letters, vol.33, no.6, 13 March 1997, pp.457-8}; MacKay DJC, Good
error-correcting codes based on very sparse matrices. {\it IEEE
Transactions on Information Theory, vol.45, no.2, March 1999,
pp.399-431.}

\bibitem{LDPC-GF(q)} Davey MC and MacKay DJC, Low-density parity check
codes over GF(q). {\it IEEE Communications Letters, vol.2, no.6, June
1998, pp.165-7.}

\bibitem{KABA} Nakamura K, Kabashima Y and Saad D, Statistical
mechanics of low-density parity-check codes over Galois fields. {\it
Europhysics Letters, vol.56, 2001, pp.610-616.}

\bibitem{Davey} MacKay DJC, Wilson ST, Davey MC, Comparison of
constructions of irregular Gallager codes. {\it IEEE Transactions on
Communications, vol.47, no.10, Oct. 1999, pp.1449-54}; Davey MC and
MacKay DJC, {\it IEEE Communications Letters}, in press (1999).

\bibitem{David-Mackay1} MacKay DJC and Davey MC, Gallager codes for
short block length and high rate applications. In {\it Codes, Systems
and Graphical Models}, IMA Volumes in Mathematics and its
Applications, Springer-Verlag, 2000.

\bibitem{saad} Kabashima Y, Saad D, Statistical mechanics of
error-correcting codes. {\it Europhysics Letters, vol.45, no.1, 1
Jan. 1999, pp.97-103.}

\bibitem{saad1} Skantzos NS, van Mourik J, Saad D and Kabashima Y,
Average and reliability error exponents in low-density parity-check
codes. {\it J. Phys. A} (in press).

\bibitem{Gallager} Gallager RG, {\em Low Density Parity Check Codes.}
Research Monograph Series {\bf 21}, MIT Press, 1963.

\bibitem{domany} Priel A, Blatt M, Grossman T, Domany E, Kanter I,
Computational capabilities of restricted two-layered perceptrons.
{\it Physical Review E, vol.50, no.1, July 1994, pp.577-95.}

\bibitem{median} In practice we define $t_{med}$ to be the average
convergence time of all samples with $t \le$ the median time.

\bibitem{r89} For rate $9/8$, for instance, the chosen construction of
the matrices $A$ and $B$ is as follows. For $A$, the fractions
$(1/16,1/4,9/16,1/16,1/16)$ of the rows (counted from the first row of
$A$) have $1,2,3,5,9$ non-zero elements per row, respectively. The
structure of $B$ is the same as illustrated in Fig. \ref{ks}, but with
$1.75$ replaced by $7/9$. We ran simulations for this construction
with $C_1=C_2=0.7$, for which the entropy is $H_2=0.513$, and with
$L=9,000$. The extrapolation of $t_{med}$ indicates that the threshold
of this code for large $L$ is $f_c \sim 0.057$. In the separation
scheme using {\it optimal compression and error correction schemes}
with $f_c=0.057 ~(R_{i.i.d}=0.618)$, the overall inverse rate of the
communication channel is $1/R=0.513/0.618 \sim 0.83$, which is only
about $6\%$ below our joint S/C inverse rate $1/R=8/9 \sim 0.89$.
Note that our MN construction can be further optimized, and the
critical channel noise is expected to increase, $f_c > 0.057$.

\bibitem{PPM} The PPMZ software used can be downloaded from
www.cbloom.com/src/ppmz.html

\bibitem{AC} The AC software used can be downloaded from
www.cs.mu.oz.au/\~{}alistair/arith\_coder

\bibitem{bias} A similar degradation in performance was observed for
$q=2$ and biased binary messages (each source bit equals $0$ with
probability $p$ and $1$ with probability $1-p$). As $|p-0.5|$
increases, the entropy decreases and a degradation in the performance
of the MN algorithm was observed.

\bibitem{haggai1} Kfir H and Kanter I (unpublished).

\bibitem{shahar} Keren S and Kanter I (unpublished).

\bibitem{EM} McLachlan GJ and Krishnan T, {\it The EM Algorithm and
Extensions.} Wiley, New York, 1997.

\bibitem{manfred} I. Kanter thanks Manfred Opper for the discussion
and the explanations of this general result.

\bibitem{comment} In principle, one can generalize the tensor to
include nearest and next-nearest blocks.

\bibitem{erdos} Erd\H{o}s P and R\'enyi A, in {\it The Art of
Counting}, edited by J. Spencer, MIT Press, Cambridge MA, 1973.

\bibitem{kanter1} Kanter I, Sompolinsky H, Mean-field theory of
spin-glasses with finite coordination number. {\it Physical Review
Letters, vol.58, no.2, 12 Jan. 1987, pp.164-7.}

\end{thebibliography}


\end{document}
