0012:cs0012019/ms.tex

1: \documentstyle[12pt,aasms4]{article}%,aasms

2: \def\abs{\mid}

3: \def\be{\begin{equation}}

4: \def\ee{\end{equation}}

5: \def\beq{\begin{eqnarray}}

6: \def\eeq{\end{eqnarray}}

7: \def\lra#1{\left\langle #1\right\rangle}

8: \def\part{\partial}

9: \def\nn{\nonumber}

10: \def\l{\left}

11: \def\r{\right}

12:

13: \def\Q{{\bf Q}}

14: \def\br{{\bf r}}

15: \def\la{\lambda}

16: \def\f{{\bf f}}

17: \def\k{{\bf  k}}

18: \def\hbz{\hat {\bf z}}

19: \def\hbw{\hat {\bf w}}

20: \def\bj{{\bf j}}

21: \def\ep{\epsilon}

22: \def\w{{\bf w}}

23: \def\a{{\bf a}}

24: \def\A{{\bf A}}

25: \def\x{{\bf x}}

26: \def\ve{\varepsilon}

27: \def\etal{{\it et al.\ }}

28: \def\b{{\bf b}}

29: \def\B{{\bf B}}

30:

31: \def\v{{\bf v}}

32: \def\V{{\bf V}}

33: \def\OV{\overline{\V}}

34: \def\OB{\overline{\B}}

35: \def\ob{\overline{B}}

36: \def\hw{\hat w}

37: \def\kdotB{\k \cdot \OB}

38:

39:

40: \begin{document}

41: \baselineskip=24pt

42: \begin{center}

43: {\Large\bf A Note on Power-Laws of Internet Topology}

44: \bigbreak

45: {\large\bf by}

46: \medbreak

47: {\large\bf Hongsong Chou } \\

48: {\it Harvard University, Cambridge, MA 02138}\\

49: {\it chou5@fas.harvard.edu}

50: \end{center}

51: \begin{abstract}

52: The three Power-Laws proposed by Faloutsos \etal(1999) are important

53: discoveries among many recent works on finding hidden rules in the

54: seemingly chaotic Internet topology. In this note, we want to point out

55: that the first two laws discovered by Faloutsos \etal(1999, hereafter,

56: {\it Faloutsos' Power Laws}) are in fact equivalent. That is, as long as any one of

57: them is true, the other can be derived from it, and {\it vice

58: versa}. Although these two laws are equivalent, they provide different

59: ways to measure the exponents of their corresponding power law

60: relations. We also show that these two measures will give equivalent

61: results, but with different error bars. We argue that for nodes of not

62: very large out-degree ($\leq 32$ in our simulation), the first Faloutsos'

63: Power Law is superior to the second one in giving a better estimate of

64: the exponent, while for nodes of very large out-degree ($> 32$) the

65: power law relation may not be present, at least for the relation

66: between the frequency of out-degree and node out-degree.

67: \end{abstract}

68:

69: \section{Introduction}

70: The past five years have been a golden age for the 30-year-old Internet,

71: during which it has evolved rapidly, with exponential

72: growth in its traffic and continual expansion of its topology. Such

73: growth makes a more thorough and rigorous analysis of the nature of

74: Internet traffic and topology an urgent task. It is also a very

75: difficult one. It was once believed that the mathematical theories for

76: {\it circuit-switching} telephone networks might be good enough for

77: analyzing Internet traffic and topology. However, it was later found that

78: the Internet, as a {\it packet-switching} network, has a very different

79: nature, and successful mathematical theories for the Internet can be

80: quite different from those for telephone networks (see Willinger and

81: Paxson (1998) for more details).

82:

83: As the Internet grows at an astonishing speed, more and more

84: high-quality data have been collected in recent years. These data make thorough

85: studies possible. The pioneering work by Leland \etal(1994) shows that

86: the traffic of a Local Area Network (LAN) appears to be self-similar at different

87: scales. Discoveries of other self-similarities, such as the one found in

88: Wide Area Network (WAN) traffic by Paxson and Floyd (1995), have led Internet

89: engineers and interested mathematicians to consider that some special

90: power laws, such as heavy-tailed distributions, might be the hidden

91: rules in Internet traffic (see Willinger and Paxson (1998) or Willinger,

92: Paxson and Taqqu (1998)).

93:

94: The discovery of three Power Laws by Faloutsos, Faloutsos and

95: Faloutsos (1999) is one of the most recent works on Internet

96: topology. They view the Internet as an undirected graph. Each node

97: in the graph has properties such as its out-degree. Faloutsos'

98: discovery is not just power laws for the large-scale properties of

99: the Internet, but rather relationships between nodes at different

100: scales, running from a host on a LAN to the range encompassed by the whole

101: Internet. Without a doubt, such a discovery is important not only to our

102: understanding of the very nature of the rapidly growing Internet, but also

103: to any reasonable simulation of LANs, WANs or the whole Internet.

104:

105: Yet, as we will point out in section 2 of this note, the first two Power Laws in

106: Faloutsos' discovery are not independent of each other. In fact, they

107: are equivalent, so each one can be derived from the other. In section 3,

108: we

109: go further and show that the data analysis in Faloutsos' work

110: leading to Power Law 1 is superior to the data analysis

111: done for Power Law 2, simply because the former gives

112: a more accurate estimate than the latter. Conclusions are

113: summarized in section 4.

114:

115: \section{Equivalence Between the First and the Second Faloutsos' Power

116: Laws}

117:

118: Throughout this note, we adopt the notation used in Faloutsos'

119: work. The Internet is viewed as an undirected graph $G$, and the number

120: of nodes and the number of edges in $G$ are $N$ and $E$,

121: respectively. The out-degree of a node $v$, which is the number of edges

122: incident to the node, is denoted by $d_v$. Note that in $G$, different

123: nodes may have the same out-degree. If we group all the

124: nodes which have the same out-degree and index the groups, then

125: the nodes in the $l^{th}$ group all have the same out-degree, denoted

126: by $d_l$. The number of nodes in this $l^{th}$ group, which gives the

127: frequency of appearance of out-degree $d_l$ in $G$, is denoted by

128: $f_{d_l}$. Sometimes we just write $f_d$ to denote the frequency of $d$

129: in $G$. This is because the out-degree $d$, which always starts from 1

130: throughout this note, can be used to index the groups of nodes of different

131: out-degrees; thus $f_d$ is the number of nodes in the $d^{th}$ group, of

132: out-degree $d$. The {\it rank}, $r_{v}$, of a node $v$ which has

133: out-degree $d$ is the global index of the node among all of the nodes

134: in order of decreasing out-degree.

135:

136: The first and second Faloutsos' Power Laws can be stated as

137: \be

138: d_v=C_1 r_{v}^R

139: \ee

140: and

141: \be

142: f_{d}=C_2 d^O

143: \ee

144: respectively. Here $C_1$ is a constant and can be determined by any given pair of

145: $d_v$ and $r_v$ measured from data collected from the Internet. $C_2$

146: is another constant and can be calculated from a pair of $f_{d}$ and

147: $d$. $R$ and $O$ are the two exponents of the Power Laws.

148:

149: By definition, the frequency of out-degree $d$ in $G$, $f_d$, is

150: related to the ranks of those nodes which have out-degree $d$. Suppose

151: node $v_{d+1}$ is a node of out-degree $d+1$, and it is the last

152: indexed node, with rank $r_v^{\prime}$, in the group consisting of nodes

153: which all have out-degree $d+1$. Further suppose that node $v_d$ is a node of

154: out-degree $d$, and it is the last indexed node, with rank $r_v$, in

155: the group consisting of nodes of out-degree $d$. Since nodes are ranked in order of decreasing out-degree, the nodes of out-degree $d$ occupy the ranks from $r_v^{\prime}+1$ to $r_v$, so $f_d$ is related

156: to $r_v^{\prime}$ and $r_v$ through the relation

157: \be

158: f_{d} = r_v - r_v^{\prime}.

159: \ee

160: We may re-write relation (1) as

161: \be

162: r_v=\left ( \frac{1}{C_1} \right )^{\frac{1}{R}} d_v^{\frac{1}{R}}.

163: \ee

164: Note that the first order approximation to the right hand side of (3) is

165: in fact the negative of the first order derivative of the right hand side of (4) with

166: respect to $d$:

167: \be

168: f_d={r_v - r_v^{\prime}} \approx -\frac{1}{R}\left ( \frac{1}{C_1} \right

169: )^{\frac{1}{R}} d^{\frac{1}{R}-1}

170: \ee

171: From (4) to (5) we changed $d_v$ to $d$ because all nodes in the group containing

172: node $v_d$ have the same out-degree $d$. Comparing (5) and (2), we have

173: \be

174: O \approx \frac{1}{R}-1

175: \ee

176: and

177: \be

178: C_2 \approx -\frac{1}{R}\left ( \frac{1}{C_1} \right

179: )^{\frac{1}{R}}.

180: \ee

181: Thus we have derived the second Power Law from the first Power Law. To

182: derive the first Power Law from the second, we integrate

183: relation (2) over all out-degrees from $d_v$ upward (the rank $r_v$ counts the nodes of out-degree at least $d_v$) to get $r_v$, then compare the result with

184: relation (4). By doing this, we have

185: \be

186: R \approx \frac{1}{O+1}

187: \ee

188: and

189: \be

190: C_1 \approx \left (\frac{-O-1}{C_2} \right)^R.

191: \ee

192: Relations (6), (7) and (8), (9) show that whenever we have one of the two Power

193: Laws, the exponent and the constant of the other one can be derived from

194: the given parameters. That is, in data analysis of the Internet

195: topology, once we have measured $r_v$ at different out-degrees and found

196: a power law relation with exponent $R$ between them, we do not need to measure

197: $f_d$ at different out-degrees because the power law relation between

198: $r_v$ and out-degree will guarantee the power law relation between $f_d$

199: and $d$ with an exponent $O \approx \frac{1}{R}-1$. In simulations,

200: samples generated according to Power Law 1 will follow Power Law 2

201: automatically, and {\it vice versa}.
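
The reverse derivation can be sketched as follows, under two assumptions that we add here for illustration: that the rank of a node of out-degree $d$ is approximated by the number of nodes of out-degree at least $d$, and that $O<-1$ so that the integral converges. Integrating relation (2) then gives
\[
r(d) \approx \int_d^{\infty} C_2 s^{O}\, ds = \frac{C_2}{-O-1}\, d^{O+1},
\]
and solving for $d$,
\[
d \approx \left( \frac{-O-1}{C_2} \right)^{\frac{1}{O+1}} r^{\frac{1}{O+1}} ,
\]
which has the form of relation (1) with the exponent and the constant given by (8) and (9).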

202:

203: In Table 1 and Table 2, we compare

204: the parameters derived using the above relations with the measured

205: parameters given in Faloutsos' work. From Table 1 we find that our

206: calculated exponent $O$ of Power Law 2 is quite close to the measured

207: one, except in the last case, the Rout-95 dataset. The small

208: discrepancy in the other cases shows that mere coincidence is unlikely. For the

209: comparison of our calculated exponent $R$ with the measured $R$ in Table

210: 2, although the relative errors are larger than those in Table 1, for

211: the first three cases they are still below 15\%.
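
As a worked check of one row, take the Int-11-97 dataset and use only the rounded values listed in the tables, reading the last column as the difference between calculated and measured values divided by the measured value (this reading is our assumption). Relation (6) with the measured $R=-0.81$ gives
\[
O \approx \frac{1}{-0.81} - 1 \approx -2.23, \qquad \frac{|-2.23-(-2.15)|}{2.15} \approx 0.04 ,
\]
and relation (8) with the measured $O=-2.15$ gives
\[
R \approx \frac{1}{-2.15+1} \approx -0.87, \qquad \frac{|-0.87-(-0.81)|}{0.81} \approx 0.074 .
\]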

212:

213: \section{A Better Way to Estimate the Exponent}

214: When deriving relations (6) and (8) in the previous section, we first assumed

215: that one of the Power Laws must hold. In the derivation, we used

216: differentiation and integration, which can only be approximately

217: correct because the real datasets are discrete samples. Suppose at one

218: sampling position, such as an out-degree $d$, the measured rank is $r_v$,

219: and the rank calculated by integrating equation (2) is ${\hat r}_v$. We denote the

220: difference between $r_v$ and ${\hat r}_v$ by $\epsilon$, and call it an

221: {\it error term} for the estimation of rank at out-degree $d$. We

222: can define a similar error term, $\eta$, for the estimation of

223: frequency at out-degree $d$ with equation (5). The errors, both

224: $\epsilon$ and $\eta$, can be measurement errors, round-off

225: errors, or errors due to the discrete nature of our sampling, and

226: in most cases a combination of them all.

227:

228: Non-zero $\epsilon$ will affect our estimation of $O$ made in Table

229: 1. Non-zero $\eta$ will also affect our estimation of $R$ in Table

230: 2, but in a different way from how $\epsilon$ affects the estimation of

231: $O$. We find that the relative errors for the first three cases in

232: Table 1 are smaller than those in Table 2. In other words, the

233: derivation of $O$ from $R$ by differentiating equation (4) gives results closer

234: to the measured values than the derivation of $R$ from $O$ by integrating

235: equation (2). This is because, if rank $r_v$ has error $\epsilon$,

236: then from equation (3) the error in $f_d$ will be of order

237: $\epsilon$. However, if $f_d$ has error $\eta$, then the error in

238: $r_v$, which can be obtained by integrating equation (2), is in fact an

239: accumulation of $\eta$ in the summation, which is of order $n\eta$, where

240: $n$ is the number of out-degrees used in the integration. Hence, even

241: though the two Power Laws are equivalent, deriving the second Power Law

242: from the first one will give a better estimate of the second Power Law, i.e., an estimate with smaller

243: errors if $\epsilon \sim \eta$, than the

244: estimate of the first Power Law derived from the second one.
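
This argument can be made explicit with a simple worst-case bound. It is only a sketch, under the assumption (added here for illustration) that each measured rank is off by at most $\epsilon$ and each measured frequency by at most $\eta$; the symbol $\delta$ and the index $i$ over the out-degrees $d_i$ are our notation. Since relation (3) expresses $f_d$ as a difference of two ranks, while the rank is obtained as a sum of $n$ frequencies,
\[
|\delta f_d| \leq 2\epsilon , \qquad |\delta r_v| \leq \sum_{i=1}^{n} |\delta f_{d_i}| \leq n\eta ,
\]
where $\delta$ denotes the error in the corresponding quantity. The first bound does not grow with the number of out-degrees, whereas the second grows linearly with $n$.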

245:

246: In Faloutsos' work, linear regression was applied to obtain the Power

247: Laws. We have shown above that Power Laws 1 and 2 are equivalent,

248: therefore the two linear regressions applied in Faloutsos'  work should

249: give the same answer.  That is, if we start from a Power Law relation

250: between rank $r$ and out-degree $d$, for example, $d = C_1 r^R$,  and

251: deduce the relation between frequency $f_d$ and $d$, the relation should also

252: be a power law, for example, $f_d = C_2 d^O$, and the exponent $O$

253: should be related to $R$ through (6). In Figure 1, the $\star$'s show the relation

254: \be

255: d = C_1 r^{-1.0}.

256: \ee

257: Note the logarithmic scales on both axes. There are 2000 data points

258: generated, so the rank $r$ runs from 1 to 2000. The heavy solid line is

259: the linear fit to the $\star$'s, with slope $-0.85$, instead of $-1$ as

260: we expect. This is due to the discretization of the data. The $\star$'s with

261: out-degree $d>1$ have a linear fit of slope $-0.97$, which is shown

262: in Figure 1 as the dashed line. Evidently the data of out-degree 1 have

263: a large effect on the linear fit.

264:

265: In Figure 2, we plot the relation between $f_d$ and $d$ based on the

266: data ($\star$'s) shown in Figure 1. A few data points of frequency 1 and

267: out-degree $d>32$ are outliers and are discarded in the fit made in Figure

268: 2, which is shown as a heavy solid line. The number of these discarded outliers

269: is 26, only 1.3\% of the total data. The slope of the linear fit is

270: $-2.01$, close to the value $-2$ that we expect from the equivalence of the two

271: Power Laws, i.e., from equation (6) with $R=-1$.
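
For the simulated relation (10) the expected values can be written out explicitly. With $R=-1$, relations (6) and (7) give
\[
O \approx \frac{1}{-1} - 1 = -2 , \qquad C_2 \approx -\frac{1}{-1}\left(\frac{1}{C_1}\right)^{-1} = C_1 .
\]
As a rough illustration of the effect of discreteness, treat the rank as a continuous function $r(d)=C_1/d$ and count the nodes of out-degree $d$ as those with ranks between $r(d+1)$ and $r(d)$ (a simplification introduced here, not the procedure used to produce the figures). Then
\[
f_d \approx \frac{C_1}{d} - \frac{C_1}{d+1} = \frac{C_1}{d(d+1)} ,
\]
which approaches $C_1 d^{-2}$ only when $d \gg 1$.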

272:

273: \section{Discussion and Conclusions}

274: Given the definitions of frequency $f_d$ and rank $r$, it is not

275: surprising to see the equivalence of the first two Power Laws proposed by

276: Faloutsos \etal We have shown this equivalence and demonstrated the mutual

277: determination of these two relations; therefore it is neither possible nor

278: necessary for any simulation to follow these two power relations

279: independently. However, as we have shown in section 3, determining the

280: power relation between frequency $f_d$ and out-degree $d$ by analyzing

281: the data of rank $r_v$ as a function of out-degree $d$ will give

282: estimates of smaller error compared with the estimate made in the reverse

283: order, i.e., the estimate of the power law relation between rank $r$

284: and out-degree $d$ by analyzing the data of frequency $f_d$ as a

285: function of out-degree $d$. Any set of data measured from the

286: Internet will follow the power laws only {\it approximately}, not

287: exactly, especially for the nodes of very high out-degree and

288: frequency 1, or the nodes of out-degree 1, as we show in Figures 1 and

289: 2. In simulations, these nodes should be treated with special care.

290:

291: If the probability density function for the appearance of nodes with out-degree

292: $s$ in the Internet is $\rho(s)$, then the average number of nodes

293: whose out-degrees run from $d_1$ to $d_2$ is

294: \be

295: \int_{d_1}^{d_2} s \rho(s) ds,

296: \ee

297: with which we can deduce the relation between frequency $f_d$ at out-degree

298: $d$ and the probability density function $\rho(s)$ as

299: \be

300: f_d \approx \int_{d-\Delta d}^{d+\Delta d} s \rho(s) ds \approx \rho(d)

301: d,

302: \ee

303: where we assume $2 \Delta d = 1$. If $f_d = C_2 d^O$, we have the

304: probability density function $\rho(d)$ as

305: \be

306: \rho(d) \approx C_2 d^{O-1}

307: \ee

308: which is a heavy tail distribution. The rank $r$ is related to the

309: function $\rho(d)$ through the integration

310: \be

311: r(d)=\int_d^{\infty} s \rho(s) ds

312: \ee

313: for $d>1$.

314:

315: The Power Laws show the relations between nodes of different

316: out-degrees when the Internet is in steady state. In order to study

317: the dynamics of the Internet, it would be very interesting to inject

318: nodes of some specific out-degrees into the Internet, and follow the

319: temporal evolution of these nodes. When we inject nodes of some

320: specific out-degree, we alter the power law relationship between $f_d$

321: and $d$ by adding a spike-like disturbance (see Figure 3). If a steady

322: Internet does follow power laws, tracing the evolution of the

323: spike-like disturbance will tell us how the disturbance is

324: propagated, or {\it cascaded}, toward higher out-degree and lower

325: out-degree directions (shown by the two arrows in Figure 3), the spike

326: being broadened at the same time (shown by the dashed line in Figure

327: 3). In real life, such a spike-like disturbance could be due to a sharp

329: ISPs. Studies on the dynamic evolution of the Internet due to such

330: spike-like disturbances will be included in our future work.

331: \acknowledgements

332:

333: \clearpage

334: \begin{deluxetable}{ccccc}

335: \footnotesize

336: \tablecaption{Exponent O: measured and calculated with equation (6). \label{tbl-1}}

337: \tablewidth{0pt}

338: \tablehead{

339: \colhead{dataset} & \colhead{measured R} & \colhead{measured O} &

340: \colhead{calculated O } & \colhead{relative error}

341: }

342:

343: \startdata

344: Int-11-97 &$-$0.81 &$-$2.15 &$-$2.23 &4\% \nl

345: Int-04-98 &$-$0.82 &$-$2.16 &$-$2.22 &4\% \nl

346: Int-12-98 &$-$0.74 &$-$2.20 &$-$2.35 &7\% \nl

347: Rout-95 &$-$0.48 &$-$2.48 &$-$3.08 &25\% \nl

348: \enddata

349:

350: \end{deluxetable}

351:

352: \clearpage

353: \begin{deluxetable}{ccccc}

354: \footnotesize

355: \tablecaption{Exponent R: measured and calculated with equation (8). \label{tbl-2}}

356: \tablewidth{0pt}

357: \tablehead{

358: \colhead{dataset} & \colhead{measured O} & \colhead{measured R} &

359: \colhead{calculated R } & \colhead{relative error}

360: }

361:

362: \startdata

363: Int-11-97 &$-$2.15 &$-$0.81 &$-$0.87 &7.4\% \nl

364: Int-04-98 &$-$2.16 &$-$0.82 &$-$0.86 &5\% \nl

365: Int-12-98 &$-$2.20 &$-$0.74 &$-$0.83 &12\% \nl

366: Rout-95 &$-$2.48 &$-$0.48 &$-$0.68 &42\% \nl

367: \enddata

368:

369: \end{deluxetable}

370:

371: \begin{thebibliography}{}

372:

373: \bibitem[]{}Faloutsos, M., Faloutsos, P. and Faloutsos, C.: On

374: Power-Law Relationships of the Internet Topology, SIGCOMM'99,

375: Cambridge, MA. http://www.cs.ucr.edu/\~{}michalis/papers.html

376:

377: \bibitem[]{}Leland, W.E., Taqqu, M.S., Willinger, W. and Wilson, D.V.:

378: On the self-similar nature of Ethernet traffic. {\it IEEE/ACM Transactions

379: on Networking}, 2(1):1-15, February 1994.

380:

381: \bibitem[]{}Paxson, V. and Floyd, S.: Wide-area traffic: The failure of

382: Poisson modeling. {\it IEEE/ACM Transactions on Networking},

383: 3(3):226-244, June 1995.

384:

385: \bibitem[]{}Willinger, W. and Paxson, V.: Where Mathematics meets the

386: Internet. In {\it Notices of the American Mathematical Society}, 45(8),

387: pp.961-970, Sept. 1998

388:

389: \bibitem[]{}Willinger, W., Paxson, V. and Taqqu, M.S.: Self-similarity

390: and heavy tails: Structural modeling of network traffic. In {\it A

391: Practical Guide to Heavy Tails: Statistical Techniques and Applications},

392: 1998. Adler, R., Feldman, R. and Taqqu, M.S., editors, Birkh\"auser.

393:

394: \end{thebibliography}

395:

396: \clearpage

397:

398: \begin{figure}[htbp]

399:

400: \plotfiddle{f1.eps}{6in}{0}{70}{70}{-210}{-25}

401:

402: \caption{Out-degree $d$ {\it vs.} rank $r$. The $\star$'s are 2000 data

403: points obtained from relation (10) in the text. The heavy solid line of

404: slope $-0.85$ is the fit to these data. The dashed line is the fit to

405: the data of out-degree greater than 1, with slope $-0.97$.

406: }

407: \end{figure}

408:

409: \begin{figure}[htbp]

410:

411: \plotfiddle{f2.eps}{6in}{0}{70}{70}{-210}{-25}

412:

413: \caption{Frequency {\it vs.} out-degree $d$, following Fig. 1. The

414: $\star$'s are data calculated with relation (3) in the text. The heavy solid line is the fit to

415: data of out-degrees less than 33. The number of discarded data in the

416: linear fitting is 26, only 1.3\% of the total data.}

417: \end{figure}

418:

419: \begin{figure}[htbp]

420:

421: \plotfiddle{f3.eps}{6in}{0}{70}{70}{-210}{-25}

422:

423: \caption{The cascade of a spike-like disturbance of the steady-state

424: Internet toward large or small out-degree directions.}

425: \end{figure}

426:

427: \end{document}

428: