
1: \documentclass[aps,pre,twocolumn,showpacs]{revtex4}

2: \usepackage{graphicx}

3:

4: \begin{document}

5:

6: \title{Error-correcting codes on scale-free networks}

7:

8: \author{Jung-Hoon Kim and Young-Jo Ko}

9: \affiliation{Future Technology Research Division,

10:          Electronics and Telecommunications Research Institute,

11:          Daejeon 305-350, Korea}

12: \date{\today}

13:

14: \begin{abstract}

15: We investigate the potential of scale-free networks as

16: error-correcting codes. We find that irregular low-density

17: parity-check codes with highest performance known to date have

18: degree distributions well fitted by a power-law function $p(k)\sim

19: k^{-\gamma}$ with $\gamma$ close to 2, which suggests that codes

20: built on scale-free networks with appropriate power exponents can

21: be good error-correcting codes, with performance possibly

22: approaching the Shannon limit. We demonstrate for an erasure

23: channel that codes with power-law degree distribution of the form

24: $p(k)=C(k+\alpha )^{-\gamma}$, with $k \geq 2$ and suitable

25: selection of the parameters $\alpha$ and $\gamma$, indeed have

26: very good error-correction capabilities.

27: \end{abstract}

28:

29: \pacs{89.75.Hc, 89.70.+c, 89.75.Fb}

30:

31: \maketitle

32:

33:

34: A variety of complex networks \cite{Alb1} exhibit a topological

35: structure in which the connectivity between their constituent

36: nodes follows a simple power law. Examples of such scale-free

37: networks include the Internet \cite{Fal,Cal}, the World Wide Web

38: (WWW) \cite{Alb2,Hub}, social networks \cite{New1}, metabolic

39: networks \cite{Jeong}, etc. Extensive studies have been made to

40: understand the topological features and evolving dynamics

41: \cite{Bara,Krap,Doro} of these networks. While many intriguing

42: properties concerning the structural aspect of complex networks

43: have been revealed thanks to these efforts, relatively little is

44: known about the effects of specific connectivity structures

45: on networks' functional behavior \cite{Stro}. In order to properly

46: operate under a certain environment or, in a more active sense, to

47: successfully accomplish a given task, a complex network may favor

48: one particular structure over another. For practical applications,

49: it now appears that more attention needs to be paid to the {\em

50: functional aspect} of these complex networks viewed as whole

51: systems or organisms working for particular purposes.

52:

53: Recent advances in channel coding theory have led to the

54: perception that the state-of-the-art capacity-approaching codes,

55: such as Turbo codes \cite{Berr} and low-density parity-check

56: (LDPC) codes \cite{Gall,Mac1,Rich}, can be understood in terms of

57: graphs (or networks) consisting of nodes and edges \cite{Forney}.

58: The function of these graphs is to carry out error correction,

59: i.e. to recover original data transmitted over noisy channels, by

60: iteratively passing certain messages through edges connecting

61: neighboring nodes. The art of developing a high-performance

62: error-correcting code lies in designing a connectivity structure

63: of a graph in such a way as to make the code built on it perform a

64: desired function. One very important issue concerns finding the

65: connectivity distribution that achieves Shannon's capacity

66: limit. Most attempts, however, have been limited to numerical

67: optimization, and a complete understanding of the connectivity

68: structure specific to capacity-achieving codes is still lacking.

69: Inspired by the ubiquitous nature of scale-free networks, one may

70: ask whether the connectivity structure of scale-free networks

71: could offer any insight into seeking good graph-based codes.

72:

73: In this paper we address the question of whether scale-free networks

74: whose connectivity distribution follows a power law can function

75: effectively as good error-correcting codes. The codes built on

76: scale-free networks considered here are basically LDPC codes in

77: that the associated parity-check matrices are sparse and that the

78: belief propagation algorithm \cite{Gall,Mac1,Rich} is employed for

79: decoding. We first show that the degree distributions of LDPC

80: codes with highest performance known to date are well fitted by

81: power-law functions. Motivated by this finding, we generate a

82: degree distribution according to the function $p(k)=C(k+\alpha

83: )^{-\gamma}$ and fine tune the parameters $\alpha$ and $\gamma$ to

84: maximize the code's performance. We investigate the error-correction

85: capability of these codes over a binary erasure channel and

86: compare them with the Tornado code \cite{Luby}, the first

87: commercialized LDPC code.

88:

89: An LDPC code can be represented by a bipartite graph in which

90: there are two different types of nodes: variable nodes and check

91: nodes. Nodes of one type are connected by edges only to nodes of

92: the other type. Variable nodes are associated with data bits, and

93: check nodes examine whether the variable nodes connected to them

94: satisfy parity-check equations. Error correction of corrupted data

95: bits is performed by passing certain messages, e.g. likelihood

96: ratios, through edges back and forth between variable and check

97: nodes. It is known from the density evolution analysis \cite{Rich}

98: that, under the assumption of a tree-structured random graph with

99: no closed loops, the error-correction capability of a code is

100: solely determined by the degree distribution.
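
As a small illustration (a toy example introduced here, not one of the
codes studied in this paper), consider the parity-check matrix
\[
H = \left( \begin{array}{cccccc}
1 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 1
\end{array} \right) ,
\]
whose six columns correspond to variable nodes and whose three rows
correspond to check nodes, with one edge for every nonzero entry. Each
check node has degree 3, variable nodes 1--3 have degree 2, and
variable nodes 4--6 have degree 1, so the graph has 9 edges. Counted by
nodes, half of the variable nodes have degree 2 and half have degree 1;
counted by edges, $6/9$ of the edges are attached to degree-2 variable
nodes and $3/9$ to degree-1 variable nodes. The three rows are linearly
independent over GF(2), so the rate of this toy code is $1-3/6=1/2$.
Such a small graph contains loops and serves only to fix the bookkeeping
of degrees.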

101:

102: We begin by inspecting the degree distributions of some

103: high-performance LDPC codes. Figure \ref{fig1}(a) shows the

104: variable-node degree distribution of the LDPC code designed by

105: Chung {\it et al.}~\cite{Chung}, which has been optimized for an

106: additive white Gaussian noise channel and approaches the Shannon

107: limit to within 0.0045 dB, presently the world record. Here, in order

108: to obtain a meaningful distribution from the irregularly spaced

109: data $\lambda (k_i)$ in Table II of Chung {\it et

110: al.}~\cite{Chung}, we took a local average over a bin of length

111: $(k_{i+1} - k_{i-1})/2$:

112: \begin{equation}\label{eq1}

113: p(k_i) = \frac{P(k_i)}{(k_{i+1} - k_{i-1})/2},

114: \end{equation}

115: where $P(k_i)$ is the fraction of nodes with degree $k_i$ and is

116: given by $P(k_i) = C\lambda (k_i)/k_i$, in which $\lambda (k_i)$

117: is the fraction of edges connected to a variable node of degree

118: $k_i$ and $C$ is a normalization constant. It can be seen from

119: Fig.~\ref{fig1}(a) that the degree distribution is well fitted by

120: a power-law function $p(k) \sim k^{-\gamma}$ with $\gamma \simeq

121: 2.14$. A more dramatic correspondence is observed for the

122: variable-node degree distribution of the Tornado code

123: \cite{Luby,Shok}, as is apparent in Fig.~\ref{fig1}(b). The

124: Tornado code has been optimized for an erasure channel and has a

125: Poisson distribution for its check-node degree distribution. The

126: power-law function that best fits the variable-node degree

127: distribution is found to have an exponent $\gamma \simeq 2.02$,

128: and the fitting appears to be nearly perfect for large $k$. We

129: also find that the right-regular sequence of Shokrollahi

130: \cite{Shok} that slightly beats the Tornado sequence is well

131: fitted by a similar power-law function.
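
The exponent close to 2 found for the Tornado code can be understood
from a short calculation, assuming the heavy-tail edge distribution of
Luby {\it et al.}~\cite{Luby}, $\lambda _k = 1/[H(D)(k-1)]$ for
$k=2,\ldots ,D+1$, where $H(D)=\sum_{j=1}^{D}1/j$ and $D+1=d_{max}$
(the symbol $D$ is introduced here only for convenience). Since the
degrees are consecutive integers, the bin length is unity, and
converting to the fraction of nodes with $P(k)\propto \lambda _k/k$
gives
\[
P(k) = \frac{\lambda _k/k}{\sum_{j=2}^{D+1}\lambda _j/j}
     = \frac{D+1}{D}\, \frac{1}{k(k-1)} ,
\]
where we used $\sum_{j=2}^{D+1}1/[j(j-1)] = 1-1/(D+1)$. For $k\gg 1$
this behaves as $P(k)\simeq k^{-2}(1+1/k+\cdots )$, i.e., a power law
with exponent 2 up to corrections that vanish for large $k$, in line
with the fitted value and with the quality of the fit at large $k$.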

132:

133: \begin{figure}

134: \includegraphics[width=0.95 \columnwidth]{Fig1.ps}

135: \caption{\label{fig1} Degree distributions for high-performance

136: LDPC codes. (a) The LDPC code designed by Chung {\it et al.}

137: (Table II of Ref.~\cite{Chung}). (b) The Tornado code with

138: maximum degree $d_{max}=610$ \cite{Luby,Shok}. The best fitting

139: lines have slopes (a) $\gamma \simeq 2.14$ and (b) $\gamma \simeq

140: 2.02$. }

141: \end{figure}

142:

143: The fact that many high-quality LDPC codes have degree

144: distributions well fitted by power-law functions motivates us to

145: test scale-free networks as error-correcting codes. Following the

146: approach of Newman {\it et al.}~\cite{New2}, we write the

147: generating function for a general scale-free network in the form:

148: \begin{equation}\label{eq2}

149: G(x) = \sum_{k=0}^{d_l-1}a_kx^k

150: +\sum_{k=d_l}^{d_{max}}p(k)x^k,

151: \end{equation}

152: where the fraction of nodes with degree $k \geq d_l$ follows a

153: power law $p(k)=Ck^{-\gamma}$, and the terms $a_k$ with low degree

154: $k < d_l$ are separated from the second sum to allow for possible

155: deviation from the power law for small $k$, as is often the case

156: for general scale-free networks. The generating function in

157: Eq.~(\ref{eq2}), however, contains too many parameters to be

158: amenable to numerical optimization unless $d_l$ is sufficiently

159: small. To reduce the number of parameters while still retaining

160: the possibility that the distribution for low degrees may not obey

161: an exact power law, we instead choose the following generating

162: function:

163: \begin{equation}\label{eq3}

164: G(x) = \sum_{k=0}^{d_{max}} p(k)x^k,

165: \end{equation}

166: where $p(k) = C(k+\alpha )^{-\gamma}$. If $\alpha <0$ ($\alpha

167: >0$), the degree distribution for small $k$ lies above (below) the

168: power-law function $k^{-\gamma}$. We henceforth use

169: Eq.~(\ref{eq3}) to generate a variable-node degree distribution of

170: our code and optimize the parameters $\alpha$ and $\gamma$ to

171: achieve the best performance.
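
To make the role of $\alpha$ explicit, one may look at the local slope
of the distribution on a log-log plot (a short calculation added here
for illustration, treating $k$ as continuous and assuming $k>-\alpha$):
\[
\frac{d\ln p(k)}{d\ln k} = -\gamma \, \frac{k}{k+\alpha } .
\]
For $\alpha <0$ the magnitude of the slope exceeds $\gamma$ at small $k$
and approaches $\gamma$ as $k\to \infty$, so that $\alpha$ mainly
reshapes the low-degree part of the distribution. As a numerical
illustration with assumed values $\gamma =2$ and $\alpha =-1$, the slope
is $-4$ at $k=2$ and about $-2.02$ at $k=100$.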

172:

173: Some empirical results known about LDPC codes help us to further

174: refine our code. The best-known findings related to features

175: of good LDPC codes may be that the variable nodes of degree one

176: should be removed since they do not contribute to error correction

177: and that the codes with almost uniform check-node degree yield

178: good performance \cite{Rich,Chung,Mac2}. Taking these into

179: account, we let the sum in Eq.~(\ref{eq3}) start from $k=2$, and

180: restrict the check-node degree to two consecutive integers: the

181: generating function for the check-node degree is written as

182: $F(x)=bx^i +(1-b)x^{i+1}$, where the parameters $b$ and $i$ are

183: easily determined once a variable-node degree distribution is

184: selected. This choice of the check-node degree distribution

185: enables us to design a code without restrictions on $d_{max}$ for

186: any given code rate; this property, however, is not shared by the

187: right-regular sequence \cite{Shok} for which $d_{max}$ is allowed

188: to have only a special set of values.
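
The parameters $b$ and $i$ follow from counting edges, as the following
sketch shows (here $N$ and $M$ denote the numbers of variable and check
nodes and $\langle k_c\rangle$ the average check-node degree, symbols
introduced only for this illustration). Every edge joins one variable
node to one check node, so $N\langle k\rangle = M\langle k_c\rangle$
with $\langle k\rangle =G'(1)$, and the design rate $R=1-M/N$ gives
\[
\langle k_c\rangle = F'(1) = i+1-b = \frac{\langle k\rangle }{1-R},
\qquad i = \lfloor \langle k_c\rangle \rfloor ,
\qquad b = i+1-\langle k_c\rangle .
\]
For instance, $R=0.5$ and an assumed $\langle k\rangle =5.68$ give
$\langle k_c\rangle =11.36$, hence $i=11$ and $b=0.64$.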

189:

190: The performance of an LDPC code over a binary erasure channel can

191: be evaluated by the density evolution method \cite{Rich} as

192: follows. Let $\delta$ be the erasure probability of a given

193: channel, and consider a code with a degree distribution pair

194: $\lambda (x)=\sum \lambda _kx^{k-1}$ and $\rho (x)=\sum \rho

195: _kx^{k-1}$, where $\lambda _k$ ($\rho _k$) is the fraction of

196: edges connected to a variable (check) node of degree $k$. Note

197: that the distribution here is defined in terms of the fraction of

198: edges, not the fraction of nodes as before. Then, if the belief

199: propagation algorithm is used for decoding, the messages passed

200: between the variable and check nodes are known to evolve as

201: \cite{Rich,Luby}

202: \begin{equation}\label{eq4}

203: x_l=x_0\lambda (1-\rho (1-x_{l-1})),

204: \end{equation}

205: where $x_l$ denotes the expected fraction of erasure messages at

206: the $l$th iteration and $x_0$ is its initial value given by

207: $x_0=\delta$. The original data are successfully recovered

208: if $x_l$ converges to zero. The threshold $\delta ^*$, defined as

209: the supremum of all $\delta$ that result in successful decoding,

210: quantifies the code's performance. For a given code rate $R$, the

211: threshold is upper bounded by $1-R$ \cite{Luby}.
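
For concreteness, the edge-perspective polynomials entering
Eq.~(\ref{eq4}) are obtained from the node-perspective generating
functions $G(x)$ and $F(x)$ by differentiation (a standard relation,
stated here as a reminder):
\[
\lambda (x) = \frac{G'(x)}{G'(1)}, \qquad
\rho (x) = \frac{F'(x)}{F'(1)} .
\]
The iteration converges to zero provided that
$\delta \lambda (1-\rho (1-x))<x$ for all $x\in (0,\delta ]$, so the
threshold can be written as
\[
\delta ^* = \inf_{0<x\leq 1} \frac{x}{\lambda (1-\rho (1-x))} .
\]
As a check that is independent of the codes studied here, the regular
code with $\lambda (x)=x^2$ and $\rho (x)=x^5$ (variable-node degree 3,
check-node degree 6, $R=0.5$) gives $\delta ^*\simeq 0.4294$,
noticeably below the bound $1-R=0.5$.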

212:

213: With the help of the above density evolution method, we calculate

214: the error-correction capability of the scale-free networks given

215: in the form of Eq.~(\ref{eq3}). The results are shown in

216: Fig.~\ref{fig2} as a function of the maximum variable-node degree

217: $d_{max}$, where the code rate is fixed at $R=0.5$. It is seen

218: that the threshold erasure probability $\delta ^*$ increases as

219: the maximum variable-node degree increases. For large $d_{max}$,

220: the threshold almost reaches the theoretical upper bound $1-R$,

221: indicating that the error-correction capability of our code is

222: very good. For comparison, we have also studied the

223: error-correction capability of codes that have degree

224: distributions other than the power-law distribution, namely the

225: exponential distribution of the form $p(k)\sim e^{-\beta (k+\alpha

226: )}$ and the Gaussian distribution of the form $p(k)\sim e^{-\beta

227: (k+\alpha )^2}$. We find that the threshold for the exponential

228: distribution rapidly increases with $d_{max}$ and converges to

229: 0.465, a value much lower than the threshold for the power-law

230: distribution. The case for the Gaussian distribution is observed

231: to exhibit a similar behavior with a similar, low convergence

232: limit.

233:

234: \begin{figure}

235: \includegraphics[width=0.9 \columnwidth]{Fig2.ps}

236: \caption{\label{fig2} Error-correction capability of optimized

237: scale-free networks over a binary erasure channel. The code rate

238: is $R=0.5$. }

239: \end{figure}

240:

241: \begin{table}[b]

242: \caption{\label{tab1}Error-correction capabilities of scale-free

243: networks (SFN) and the Tornado code \cite{Shok} of rate $R=0.5$,

244: and parameters $\gamma$ and $\alpha$ optimizing the SFN.

245: The optimization was done by using a direction set method.}

246: \begin{ruledtabular}

247: \begin{tabular}{rcccccc}

248: \multicolumn{3}{c}{ } &\multicolumn{2}{c}{SFN}

249: &\multicolumn{2}{c}{Tornado \cite{Shok}} \\

250: \multicolumn{1}{c}{$d_{max}$} & $\gamma$ & $\alpha$ & $\delta ^*$

251: & $\langle k\rangle$

252: & $\delta ^*$ & $\langle k\rangle$ \\

253: \hline

254: 9 & 1.347 & -1.473 & 0.47875 & 2.97 & 0.44546 & 3 \\

255: 16 & 1.788 & -1.102 & 0.48633 & 3.30 & 0.46950 & 3.5 \\

256: 28 & 2.024 & -0.868 & 0.49163 & 3.57 & 0.48235 & 4 \\

257: 47 & 2.088 & -0.775 & 0.49477 & 3.88 & 0.48960 & 4.5 \\

258: 79 & 2.084 & -0.753 & 0.49689 & 4.24 & 0.49380 & 5 \\

259: 133 & 2.080 & -0.741 & 0.49810 & 4.60 & 0.49628 & 5.5 \\

260: 222 & 2.086 & -0.712 & 0.49862 & 4.94 & 0.49776 & 6 \\

261: 368 & 2.081 & -0.698 & 0.49895 & 5.31 & 0.49865 & 6.5 \\

262: 610 & 2.076 & -0.691 & 0.49920 & 5.68 & 0.49918 & 7 \\

263: 1009 & 2.073 & -0.687 & 0.49931 & 6.02 & 0.49951 & 7.5 \\

264: \end{tabular}

265: \end{ruledtabular}

266: \end{table}

267:

268: To more clearly demonstrate the high performance of codes on

269: scale-free networks, we compare them with the Tornado code

270: \cite{Luby,Shok}. The threshold of our code is presented in Table

271: \ref{tab1} along with the parameters $\alpha$ and $\gamma$ that

272: maximize the code's performance. Table \ref{tab1} shows that our

273: code yields better performance than the Tornado code for $d_{max}$

274: smaller than about 1000. Also shown in Table \ref{tab1} is the

275: average variable-node degree $\langle k\rangle$ of the two codes.

276: From a practical viewpoint, it is important to design a code that

277: yields good performance for small $\langle k\rangle$, since the

278: physical complexity of a code, which grows with increasing

279: $\langle k\rangle$, limits the hardware implementation of the

280: code. For this reason, our code seems to be better suited to

281: applications than the Tornado code.
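
The entries of Table \ref{tab1} can also be restated as the fraction of
the upper bound $1-R$ that is attained (simple arithmetic on the
tabulated values, added here for illustration). For $d_{max}=9$,
\[
\frac{\delta ^*}{1-R} = \frac{0.47875}{0.5} = 0.9575 \ \ ({\rm SFN}),
\qquad
\frac{\delta ^*}{1-R} = \frac{0.44546}{0.5} \simeq 0.8909 \ \ ({\rm Tornado}),
\]
while for $d_{max}=610$ the corresponding ratios are $0.99840$ and
$0.99836$, reached with average degrees $\langle k\rangle =5.68$ and
$7$, respectively.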

282:

283: Another merit of our code is that the iteration number required

284: for convergence of decoding is very small. The iteration numbers

285: of our code and the Tornado code are compared in

286: Fig.~\ref{fig3}(a) as a function of the erasure probability for

287: the case of $d_{max}=610$, which clearly shows that our code has a

288: smaller iteration number than the Tornado code in the whole region

289: of $\delta$. Even for the case of $d_{max}>1000$ where the Tornado

290: code has a slightly higher threshold than our code, the iteration

291: number is smaller for our code than for the Tornado code over a

292: broad region of $\delta$, except near the threshold

293: [Fig.~\ref{fig3}(b)]. For an early convergence of decoding

294: processes, each node needs to gather messages from other nodes

295: quickly. This implies that graphs with smaller diameter may be

296: more advantageous for reducing the iteration number. This in turn

297: suggests that scale-free networks, which are known to have a very

298: small diameter $d \sim \ln{\ln{N}}$ \cite{Cohen} where $N$ is the

299: number of nodes, may require a smaller iteration number than

300: regular random networks or small-world networks \cite{Watt}.

301:

302: \begin{figure}

303: \includegraphics[width=0.95 \columnwidth]{Fig3.ps}

304: \caption{\label{fig3}

305: Iteration numbers of optimized scale-free networks

306: (solid curve) and the Tornado code (dashed curve) for maximum

307: degrees (a) $d_{max}=610$ and (b) $d_{max}=1009$. The criterion

308: for convergence of decoding is set to be $x_l < 1\times 10^{-6}$.

309: }

310: \end{figure}

311:

312: The error-correction capability of codes on scale-free networks can be

313: further enhanced by adjusting the degree distribution, especially in the

314: low degree region, so that it more closely models realistic

315: scale-free networks whose degree distribution does not necessarily

316: follow a power law for small $k$. While doing this, we try to keep

317: as small as possible the number of parameters added to the generating

318: function. After a number of numerical simulations we have found the

319: following generating function adequate for this purpose:

320: \begin{equation}\label{eq5}

321: G(x) = C\left[ w_2 p(2)x^2 +w_3 p(3)x^3 +\sum_{k=4}^{d_{max}} p(k)x^k

322: \right] ,

323: \end{equation}

324: where $p(k) = (k+\alpha )^{-\gamma}$. The performance of this code

325: is displayed in Table \ref{tab2}, which shows that by adding two

326: new parameters, $w_2$ and $w_3$, which permit the two lowest

327: degrees to deviate from the power law, the performance of the code is

328: improved. Adding more parameters is expected to improve the

329: performance further, but at the expense of rendering the

330: optimization process more time consuming.
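
The constant $C$ in Eq.~(\ref{eq5}) is fixed by the normalization
$G(1)=1$, and the average degree follows from $G'(1)$ (written out here
for convenience):
\[
C^{-1} = w_2p(2)+w_3p(3)+\sum_{k=4}^{d_{max}}p(k), \qquad
\langle k\rangle = C\left[ 2w_2p(2)+3w_3p(3)
+\sum_{k=4}^{d_{max}}k\, p(k)\right] .
\]
Setting $w_2=w_3=1$ recovers the distribution of Eq.~(\ref{eq3}) with
the sum starting from $k=2$.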

331:

332: \begin{table}

333: \caption{\label{tab2} Performance of codes on scale-free networks

334: [Eq.~(\ref{eq5})]. The same optimization parameters $\alpha$ and

335: $\gamma$ as in Table \ref{tab1} are used.}

336: \begin{ruledtabular}

337: \begin{tabular}{rcccc}

338: $d_{max}$ & $w_2$ & $w_3$ & $\delta ^*$ & $\langle k\rangle $ \\

339: \hline

340: 222 & 1.004 & 0.983 & 0.49885 & 4.94 \\

341: 368 & 1.004 & 0.982 & 0.49923 & 5.31 \\

342: 610 & 1.005 & 0.982 & 0.49945 & 5.68 \\

343: 1009 & 1.005 & 0.983 & 0.49955 & 6.01 \\

344: \end{tabular}

345: \end{ruledtabular}

346: \end{table}

347:

348: In summary, we have found that many high-performance LDPC codes

349: possess degree distributions well fitted by power-law functions

350: with exponents close to 2. Based on this finding, we have

351: developed codes on scale-free networks that have very good

352: error-correction capabilities. The codes with power-law degree

353: distribution yield better performance than those with exponential

354: and Gaussian degree distributions that have fast decreasing tails.

355: It would also be interesting to study the effect of degree correlations

356: on the performance of a code, which is left as future work.

357: As good error-correcting codes, the codes on scale-free networks

358: could find lucrative applications in areas as diverse as wireless

359: communication, media and data transfer over the Internet, and

360: storage.

361:

362:

363: \begin{thebibliography}{99}

364:

365: \bibitem{Alb1} R. Albert and A. L. Barab\'{a}si, Rev. Mod. Phys.

366:          {\bf 74}, 47 (2002).

367: \bibitem{Fal} M. Faloutsos, P. Faloutsos, and C. Faloutsos, Comput.

368:         Commun. Rev. {\bf 29}, 251 (1999).

369: \bibitem{Cal} G. Caldarelli, R. Marchetti, and L. Pietronero, Europhys.

370:         Lett. {\bf 52}, 386 (2000).

371: \bibitem{Alb2} R. Albert, H. Jeong, and A. L. Barab\'{a}si, Nature

372:          {\bf 401}, 130 (1999).

373: \bibitem{Hub} B. A. Huberman and L. A. Adamic, Nature {\bf 401}, 131 (1999).

374: \bibitem{New1} M. E. J. Newman, Phys. Rev. E {\bf 64}, 016131 (2001).

375: \bibitem{Jeong} H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and

376:         A. L. Barab\'{a}si, Nature {\bf 407}, 651 (2000).

377: \bibitem{Bara} A. L. Barab\'{a}si and R. Albert, Science {\bf 286},

378:          509 (1999).

379: \bibitem{Krap} P. L. Krapivsky, S. Redner, and F. Leyvraz, Phys. Rev.

380:          Lett. {\bf 85}, 4629 (2000).

381: \bibitem{Doro} S. N. Dorogovtsev, J. F. F. Mendes, and A. N. Samukhin,

382:         Phys. Rev. Lett. {\bf 85}, 4633 (2000).

383: \bibitem{Stro} S. H. Strogatz, Nature {\bf 410}, 268 (2001).

384: \bibitem{Berr} C. Berrou and A. Glavieux, IEEE Trans. Commun. {\bf 44},

385:          1261 (1996).

386: \bibitem{Gall} R. G. Gallager, {\it Low-Density Parity-Check Codes}

387:          (MIT Press, Cambridge, MA, 1963).

388: \bibitem{Mac1} D. J. C. MacKay, IEEE Trans. Inform. Theory {\bf 45}, 399

389:          (1999).

390: \bibitem{Rich} T. J. Richardson and R. L. Urbanke, IEEE Trans. Inform.

391:          Theory {\bf 47}, 599 (2001); T. J. Richardson, M. A. Shokrollahi,

392:          and R. L. Urbanke, {\it ibid}. {\bf 47}, 619 (2001).

393: \bibitem{Forney} G. D. Forney, Physica A {\bf 302}, 1 (2001);

394:          Special Issue on Codes on Graphs and Iterative Algorithms,

395:          IEEE Trans. Inform. Theory {\bf 47}, No. 2 (2001).

396: \bibitem{Luby} M. G. Luby, M. Mitzenmacher, M. A. Shokrollahi, and

397:          D. A. Spielman, IEEE Trans. Inform. Theory {\bf 47}, 569

398:          (2001).

399: \bibitem{Chung} S. Y. Chung, G. D. Forney, Jr., T. J. Richardson,

400:          and R. Urbanke, IEEE Commun. Lett. {\bf 5}, 58 (2001).

401: \bibitem{Shok} A. Shokrollahi, in {\it Proc. 13th Int. Symp. Applied

402:          Algebra, Algebraic Algorithms, and Error-Correcting Codes},

403:          edited by M. Fossorier, H. Imai, S. Lin, and A. Poli,

404:          Lecture Notes in Computer Science Vol. 1719 (Springer-Verlag,

405:          New York, 1999); P. Oswald and A. Shokrollahi, IEEE Trans.

406:          Inform. Theory {\bf 48}, 3017 (2002).

407: \bibitem{New2} M. E. J. Newman, S. H. Strogatz, and D. J. Watts,

408:          Phys. Rev. E {\bf 64}, 026118 (2001).

409: \bibitem{Mac2} D. J. C. MacKay, S. T. Wilson, and M. C. Davey,

410:          IEEE Trans. Commun. {\bf 47}, 1449 (1999).

411: \bibitem{Cohen} R. Cohen and S. Havlin, Phys. Rev. Lett. {\bf 90},

412:          058701 (2003).

413: \bibitem{Watt} D. J. Watts and S. H. Strogatz, Nature {\bf 393},

414:          440 (1998).

415: \end{thebibliography}

416:

417:

418: \end{document}

419: