0608:cs0608088/rad.tex

1: \documentclass[rmp,showpacs,twocolumn,superscriptaddress,floatfix]{revtex4}

2:

3: \usepackage{dcolumn,graphicx,amsmath,amssymb,txfonts}

4:

5:

6: \begin{document}

7:

8: \title{Radial Structure of the Internet}

9:

10: \author{Petter Holme}

11: \affiliation{Department of Computer Science, University of New Mexico,

12:   Albuquerque, NM 87131, U.S.A.}

13:

14: \author{Josh Karlin}

15: \affiliation{Department of Computer Science, University of New Mexico,

16:   Albuquerque, NM 87131, U.S.A.}

17:

18: \author{Stephanie Forrest}

19: \affiliation{Department of Computer Science, University of New Mexico,

20:   Albuquerque, NM 87131, U.S.A.}

21: \affiliation{Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM

22:   87501, U.S.A.}

23:

24:

25: \begin{abstract}

26: The structure of the Internet at the Autonomous System (AS) level has

27: been studied by both the Physics and Computer Science communities. We

28: extend this work to include features of the core and the periphery,

29: taking a radial perspective on AS network structure.  New methods for

30: plotting AS data are described, and they are used to analyze data sets

31: that have been extended to contain edges missing from earlier

32: collections. In particular, the average distance from one vertex to

33: the rest of the network is used as the baseline metric for

34: investigating radial structure.  Common vertex-specific quantities are

35: plotted against this metric to reveal distinctive characteristics of

36: central and peripheral vertices. Two data sets are analyzed using

37: these measures as well as two common generative models

38: (Barab\'{a}si-Albert and Inet).  We find a clear distinction between

39: the highly connected core and a sparse periphery.  We also find that

40: the periphery has a more complex structure than that predicted by

41: degree distribution or the two generative models.

42: \end{abstract}

43:

44: \pacs{89.20.Hh, 89.75.Fb, 89.75.Hc}

45: % 89.20.Hh -- World Wide Web, Internet

46: % 89.75.Fb -- Structures and organization in complex systems

47: % 89.75.Hc -- Networks and genealogical trees

48:

49: \maketitle

50:

51: \section{Introduction}

52:

53: Since the turn of the century there has been increasing interest in the

54: statistical study of networks~\cite{ba:rev,doromen:book,mejn:rev},

55: stimulated in large part by the availability of large-scale network

56: data sets.  One network of great interest is the

57: Internet~\cite{vesp:inet}. The Internet is intriguing because its

58: complexity and size preclude comprehensive study. It is composed of

59: millions of individual end-nodes connected to tens of thousands of ISPs

60: whose relationships are continually in flux and only partially

61: observable.  One way to cope with these complexities is by analyzing a

62: single scale of Internet data, for example, a local office network of

63: computers and their inter-connections; a network of email address book

64: contacts; the network formed by URL links on the World Wide Web; or the

65: interdomain (Autonomous System) level of the Internet. This paper is

66: concerned with the last of these examples---the AS graph. The vertices

67: in the graph are themselves computer networks; roughly speaking, an AS is

68: an independently operated network or set of networks owned by a single entity. Edges represent pairs of ASs that can directly

69: communicate.

70:

71:

72: A major finding of earlier AS studies is that node degree (number of

73: links to other ASs) has a power law distribution~\cite{f3}. The degree

74: distribution is, however, not the only structure that affects

75: Internet dynamics~\cite{hot:inet}. In this paper we investigate

76: higher-order (beyond the degree distribution) network structures that

77: also impact network dynamics. We analyze the AS graph using methods

78: that are appropriate for networks with a clear hierarchical

79: organization~\cite{vesp:inet,ala:hier}. In particular, we study

80: network quantities as a function of the average distance to other

81: vertices. This approach allows us to separate vertices of different

82: hierarchical levels, in a radial fashion, ranging from central (in the

83: sense of the closeness centrality~\cite{sab:clo}) to peripheral

84: vertices. This is, furthermore, a way to resolve how clearly

85: separated the core and the periphery are. Most analysis methods

86: developed by physicists (degree frequencies, correlations, etc.)\ are

87: based on quantities averaged over the whole network and do not take a

88: hierarchical partitioning into account~\cite{vesp:inet}. Studies by

89: computer scientists, on the other hand, assume a division of the AS

90: level Internet into hierarchical levels~\cite{rex:infer}.  We will

91: argue that the observed AS level networks do have a pronounced

92: core-periphery dichotomy but that the periphery has more structure

93: than previously thought.

94:

95:

96: \section{Networks}

97:

98: This section briefly reviews the organization of the

99: AS-level Internet and describes how we obtained our data sets. We also

100: describe the network models to which we compare our observed

101: data. These models include one randomization scheme that samples random

102: networks with the same set of degrees as the original networks, and the generative BA and Inet models. Technically all three models are

103: null-models, but to contrast the randomized networks (having $N$

104: degrees of freedom) with the generative models (having only a few

105: degrees of freedom) we reserve the term null-model for the former.

106:

107: The data are represented as a network $G=(V,E)$ where $V$ is a set

108: of $N$ vertices (ASs) and $E$ is a set of $M$ undirected edges

109: (connections between ASs).

110: The Internet is currently composed of roughly $22{,}000$ individual

111: networks known as Autonomous Systems.  Each of these systems peers with

112: a (usually small) set of ASs to form a connected network.  The

113: protocol used to establish peering sessions and discover routes to

114: distant ASs is called the Border Gateway Protocol (BGP).  Two typical

115: peering relationships are: customer-provider in which the provider

116: provides connectivity to the rest of the Internet for the customer;

117: and peer-peer in which the peering ASs transfer traffic between their

118: respective customers. The extreme core of the network, the Tier-1 ASs, have many peer-peer and

119: customer links but no providers.  Nodes closer to the periphery of the

120: network have fewer customers and peers but more providers.

121:

122:

123: \subsection{AS networks}

124:

125: We analyze four real-world data sets (that is, data sets collected

126: using observed network data rather than simulated networks that are

127: generated synthetically), of which two are original. The first two are

128: well-known and well-studied~\cite{mich:as} dating from 2002 and the

129: second two data sets are recent, inferred from 2006 data. The first

130: graph in each pair consists of edges learned solely from dumps of

131: router state, known as Routing Information Bases (RIBs)

132: (\url{http://www.routeviews.org/data.html}).  RIBs are a standard

133: source of AS connectivity data.  The second graph in each pair

134: contains RIB information augmented with edges derived from other

135: sources (such as routing registries, looking glass servers, and update

136: messages) which produces a more accurate representation of the real

137: network.  The additional sources are described below.

138:

139:

140: \subsubsection{Obtaining RIBs from Route Views}

141:

142: A BGP router stores the most recent AS path for each IP block (prefix)

143: announced by its peers.  These data are stored in the router's RIB,

144: and periodic RIB dumps from a large number of voluntary sources are

145: available from Route Views (\url{http://www.routeviews.org}). Each RIB

146: represents a static snapshot of all routes available to the router

147: from which it was obtained.  Since BGP only disseminates each router's

148: best path, and this value is dynamic as links go up and down, a

149: sizable portion of the network is hidden from each router.  In order

150: to obtain a more complete topology, common practice is to take the

151: union of the relationships found in a large number of RIB samples.

152: From the samples, AS relationships are then inferred from the routing

153: paths.  A path consists of connected ASs and therefore each pair

154: of adjacent ASs in a path corresponds to an edge in the graph.

155:

156:

157: The 2002 graph taken from a single RIB (RIB '02) was inferred from Route

158: Views on May 15th of 2002.  We constructed the 2006 RIB graph (RIB '06)

159: from the Route Views RIB on May 16th of 2006. The RIB '02 graph has

160: $N=13233$ and $M=27724$ while RIB '06 has $N=22403$ and $M=46343$.

161:

162: \subsubsection{Extending the RIB Dataset}

163: \label{sec:extended}

164:

165: There are other sources of AS connectivity data besides Route Views.

166: RIPE (\url{http://www.ripe.net}) has data collected from additional RIBs

167: beyond those contained in the Route Views data.  Peering information

168: is directly available for a small number of ASs that are participating

169: Looking Glass (\url{http://www.traceroute.org}) routers.  Finally, some

170: ASs register their peering relationships in regional registries such

171: as RIPE. The extended 2002 AS graph (AS '02) was constructed using

172: inferred topologies from all three of these sources, together with the

173: original Route Views data.

174:

175: RIB data represent a brief snapshot of routing state.  There are many

176: paths that a router sees only briefly, and capturing

177: all of them from just a few RIB dumps is unlikely.  In the extended

178: AS-graph of 2006 (AS '06), we augmented the Route Views RIB data with

179: all of the paths found in BGP update messages for the entire month of

180: April 2006 from both Route Views and RIPE.  This gives a more

181: complete picture over time, although it is still biased by the limited

182: number of routers from which the data were collected.

183:

184: The extended 2002 AS-graph (AS '02) has $N=13579$, $M=37448$ and the

185: corresponding 2006 network (AS '06) has $N=22688$, $M=62637$. Thus the

186: extended data sets have $35\%$ (2002) and $35\%$ (2006) more edges than

187: their RIB counterparts.
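For orientation, the mean degree implied by these counts follows from the identity $\langle k\rangle = 2M/N$ for undirected graphs (the symbol $\langle k\rangle$ is introduced only for this illustration):
\begin{align*}
  \text{RIB '02:}\quad & \langle k\rangle = 2\cdot 27724/13233 \approx 4.19, \\
  \text{AS '02:}\quad  & \langle k\rangle = 2\cdot 37448/13579 \approx 5.52, \\
  \text{RIB '06:}\quad & \langle k\rangle = 2\cdot 46343/22403 \approx 4.14, \\
  \text{AS '06:}\quad  & \langle k\rangle = 2\cdot 62637/22688 \approx 5.52 .
\end{align*}
The extended data sets thus have a mean degree of about $5.5$, compared with roughly $4.1$ to $4.2$ for the RIB-only graphs.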

188:

189: \subsection{Null-model networks}

190:

191: We are interested in network structure beyond degree distribution, so

192: we compare our AS network data against a null model

193: with the same degree distribution.  Our null model is a random network

194: constrained to have the same set of degrees as the original network.

195: By comparing results for the observed networks with the same quantities

196: for the null model, we can observe additional network structure if it

197: exists.  The standard way to sample such networks is by randomizing

198: the original network with stochastic rewiring of the edges (see

199: Ref.~\cite{gale:rew} for an early example). In our implementation we

200: create a new random network by enumerating the edges $E$ of the

201: original graph, and for each edge $(i,j)$ we:

202: \begin{enumerate}

203: \item Choose another edge $(i',j')$ randomly and replace $(i,j)$

204:   and $(i',j')$ with $(i,j')$ and $(i',j)$. If this creates a

205:   multi- or self-edge, we revert to the original edges $(i,j)$ and

206:   $(i',j')$, and repeat with a new $(i',j')$.

207: \item \label{step:rew3} Choose two edges

208:   $(i_1,j_1)$ and $(i_2,j_2)$ and replace them, along with $(i,j')$, by

209:   $(i_1,j')$, $(i,j_2)$ and $(i_2,j_1)$.

210: \end{enumerate}

211: Step \ref{step:rew3} guarantees ergodicity of the

212: sampling~\cite{roberts:mcmc}, i.e.\ that one can go between any pair of graphs

213: with a given set of degrees by successive edge-rewirings.
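To see why the first step preserves the degree sequence, note that each of the four endpoints appears in exactly one removed edge and one added edge. A minimal illustration, using hypothetical vertex labels $1,\dots,4$ that are not taken from the data:
\begin{equation*}
  \{(1,2),\,(3,4)\} \;\longrightarrow\; \{(1,4),\,(3,2)\}, \qquad
  k(1)=k(2)=k(3)=k(4)=1 \ \text{before and after.}
\end{equation*}
By contrast, if the two chosen edges are $(i,j)=(1,2)$ and $(i',j')=(3,1)$, the swap would produce $(1,1)$ and $(3,2)$. Since $(1,1)$ is a self-edge, the swap is rejected and a new $(i',j')$ is drawn.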

214:

215: \subsection{Generative network models}

216:

217: In addition to the observed (inferred from data) and null-model networks described above,

218: we also study networks produced according to two previously proposed

219: network-generation schemes~\cite{ba:model,inet}. The first is the well-known

220: Barab\'{a}si-Albert preferential attachment model~\cite{ba:model}.

221: The second, known as the Inet model (version 3.0)~\cite{inet}, is

222: more complex and designed specifically for creating networks with AS

223: graph properties.

224:

225: \subsubsection{Barab\'{a}si-Albert model}

226:

227: The Barab\'{a}si-Albert (BA) model is a general growth model for

228: producing networks with power-law degree

229: distributions~\cite{ba:model}. Vertices and edges are iteratively added to the

230: network according to a preferential attachment rule, which ensures

231: that a power-law degree distribution emerges.

232:

233: More precisely, the initial configuration consists of $m$ isolated

234: vertices. From this configuration the network is iteratively grown.

235: At each time step one vertex is added together with $m$ edges leading

236: out from the new vertex. The edges are attached to vertices in the

237: graph such that:

238: \begin{enumerate}

239: \item The probability of attaching to a vertex $i$ is proportional to

240:   $k(i)$.

241: \item No multiple edges, or self-edges, are formed.

242: \end{enumerate}

243: This procedure produces a network which has,

244: in the $N\rightarrow\infty$ limit, a degree

245: distribution $P(k)\sim k^{-3}$ for $k\geq m$, and $P(k)= 0$ for $k<m$.

246:

247: Because the BA model has only one integer parameter it is not very

248: flexible at fitting data. We use $m=3$ to make the average degree

249: as similar to the AS networks as possible. Other preferential

250: attachment models (e.g., Ref.~\cite{pas:inet}) can model the average

251: degree and slope of the degree distribution more closely. Such

252: improvements, we believe, are unlikely to change the conclusions drawn from the

253: original BA model.
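The choice $m=3$ can be motivated by a short calculation, sketched here under the assumption that every time step adds exactly $m$ edges. After $t$ steps the BA network has $N=m+t$ vertices and $M=mt$ edges, so its mean degree (denoted $\langle k\rangle$ for this illustration only) is
\begin{equation*}
  \langle k\rangle = \frac{2M}{N} = \frac{2mt}{m+t} \;\longrightarrow\; 2m \qquad (t\rightarrow\infty).
\end{equation*}
Thus $m=2$ gives $\langle k\rangle\approx 4$ and $m=3$ gives $\langle k\rangle\approx 6$, while the AS '06 graph has $2M/N = 2\cdot 62637/22688\approx 5.52$. Of the integer choices, $m=3$ is the closest.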

254:

255:

256: \subsubsection{Inet model}

257:

258: The Inet model~\cite{inet} is less general than the BA model. Its objective is

259: to regenerate the AS graph as accurately as possible rather than to focus

260: on a single mechanism to create and explain scale-free networks.  The

261: scheme is rather detailed and we only sketch its strategy here.  Starting with $N$ vertices, Inet first generates random

262: numbers that represent the final degree of the vertices such that

263: the degree distribution matches the observed distribution of the

264: AS-graph as closely as possible. This means that the low-degree end of

265: the distribution is more accurately modeled by Inet than the BA model

266: because the BA model will not produce a vertex with degree less than $m$.

267: In the real AS-graph there is a considerable fraction of

268: degree-one vertices. After the degrees are assigned to the vertices,

269: edges are added in such a way that the degree-degree correlation

270: properties of the original AS-graph are matched as closely as

271: possible.

272:

273: A more detailed explanation of this procedure and its rationale is

274: given in Ref.~\cite{inet}. We use Inet's default parameter settings,

275: except $N$ which we extracted from our datasets, producing an average

276: degree that is approximately six.

277:

278: \section{Numerical results}

279:

280: \begin{figure}

281:   \resizebox*{0.9 \linewidth}{!}{\includegraphics{density.eps}}

282:   \caption{ Normalized histograms of vertices with a specific average

283:     distance $\bar{d}$ to the rest of the vertices. (a) shows curves for the

284:     Oregon Route Views data (RIB '02), extended data (AS '02), and values for

285:     random networks with the same degree sequences as AS '02. (b)

286:     displays curves for the Oregon Route Views data (RIB '06), extended

287:     data (AS '06), as well as randomized

288:     networks with the degree sequence of AS '06. (c) shows the same AS

289:     '06 curve as (b) along with the BA and Inet model results for

290:     parameter values as close as possible to those of the AS '06

291:     network. The null-model curves in (a) and (b), as well as the

292:     model-network curves in (c), are averages over 100 realizations. Lines are guides for

293:     the eyes. The error-bars represent standard error (the point

294:     symbols are often larger than the error bars).

295:   }

296:   \label{fig:density}

297: \end{figure}

298:

299: % INTRO

300: In this section we present the numerical results of our analysis. We

301: first discuss the average distance metric we use for displaying

302: network properties with a radial perspective. Then we define and

303: present the results for each network structural measure as a function

304: of the average distance to other vertices.

305:

306: % AVG DIST THEORY

307: Let $d(i,j)$ denote the graph distance between two vertices $i$ and

308: $j$---the number of edges in the shortest path between $i$ and $j$. A

309: simple measure for how peripheral a vertex is in the network is its

310: \textit{eccentricity}---the distance to the most distant vertex,

311: $\max_{j\in V} d(i,j)$~\cite{harary}. Eccentricity is thus an extremal

312: property of the network and is determined by a small fraction of

313: vertices. To reflect the typical path length of a vertex we rank

314: vertices according to an average property of the vertex.  The

315: average property corresponding to eccentricity is the average distance

316: from one vertex to all of the others:

317: \begin{equation}\label{eq:dist}

318:   \bar{d}(i)=\frac{1}{N-1}\sum_j d(i,j) ,

319: \end{equation}

320: where the sum is over all vertices, except $i$, in $V$. We note that

321: the reciprocal value of $\bar{d}(i)$, the \textit{closeness

322:   centrality}, is a common measure for centrality in social network

323: studies~\cite{sab:clo,harary}. Average distance is a more intuitive measure in this context---$\bar{d}(i)\approx 2$ means that $i$ is on average

324: two hops away from other vertices, whereas the closeness value $0.5$

325: does not have such a direct interpretation.
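As a simple illustration of Eq.~(\ref{eq:dist}) (a toy example, not one of the networks studied here), consider a star graph with one hub $h$ connected to $N-1$ leaves. The hub is one step from every other vertex, whereas a leaf $\ell$ is one step from the hub and two steps from each of the other $N-2$ leaves:
\begin{equation*}
  \bar{d}(h) = 1, \qquad
  \bar{d}(\ell) = \frac{1+2(N-2)}{N-1} = \frac{2N-3}{N-1} .
\end{equation*}
For $N=5$ this gives $\bar{d}(\ell)=7/4$, and $\bar{d}(\ell)\rightarrow 2$ for large $N$. The corresponding eccentricities are $1$ and $2$, so the two measures order the vertices identically in this case, but $\bar{d}$ takes non-integer values.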

326:

327: Another way to study eccentricity is by iteratively removing vertices

328: of low-degree to construct a sequence of $k$-cores (subgraphs in which

329: all vertices have degree $\geq k$)~\cite{rex:infer,vesp:kcore}. We use the average distance metric instead because it gives a finer

330: separation of vertices: the values on the x-axis

331: are not restricted to integers, as they are for the eccentricity. Further, because it is

332: a global measure (in the sense that the entire network topology

333: affects $\bar{d}(i)$ for every $i$) it is likely more robust to errors

334: in the input data.

335:

336:

337:

338: \subsection{Radial vertex density}

339: % OBSERVATIONS

340: We first plot the fraction of

341: vertices as a function of $\bar{d}$. Fig.~\ref{fig:density} shows the

342: distribution of $\bar{d}$ for our data sets and the model AS

343: graphs. The observed networks produce curves that are far from smooth,

344: unimodal distributions. Instead they have one peak close to

345: $\bar{d}=3$, a smaller peak around $\bar{d}=4$, and for the 2006 data,

346: a third peak near $\bar{d}=5$. The difference between the RIB-only and

347: the extended datasets is small, except around the second peak

348: in Fig.~\ref{fig:density}(b) which is higher in the RIB-only data. The

349: null-model curves are much closer to unimodal, although

350: they do not follow a simple, smooth functional form. Such a unimodal

351: form could be a result of the averaging of many null-model

352: curves, but the observation holds even if single realizations of the

353: randomization are plotted (data not shown). Thus, the observed AS graph is less homogeneous than what we would predict by considering only vertex degree.

354:

355: We interpret the two peaks as an effect of the hierarchical

356: organization of the Internet. The core (Tier-1 providers and other

357: large ISPs) is in the low-$\bar{d}$ tail, the $\bar{d}=3$ peak consists of

358: vertices directly connected to the core, and the $\bar{d}=4$ peak consists of

359: vertices whose closest neighbors are in the $\bar{d}=3$ peak. This

360: explains the approximately integer distance between the peaks.

361: Determining the type of the edges between the peaks (customer-provider or peer-peer) is a difficult problem~\cite{rex:infer}; however, we believe that they are likely to run from customers to providers, as ASs generally only have peer-peer edges with networks of equal class.

362: The Tier-1 ASs, which do not have any providers (AS numbers 209, 701,

363: 1239, 1668, 2914, 3356, 3549, 3561, 6461 and 7018 in our data sets),

364: have an average $\bar{d}=2.35\pm 0.03$ in the AS '02 data and

365: $\bar{d}=2.41\pm 0.03$ in the AS '06 data, and are thus in the center

366: of the network (left of the most central peak), at

367: the extreme low end of the $\bar{d}$-spectrum.

368:

369: Results for the BA and Inet model networks are shown in

370: Fig.~\ref{fig:density}(c). The Inet model has a peak to the left of

371: the middle of the range of distances, but no second or third peak.

372: The BA model matches the observed network even less accurately---its peak is at a relatively high $\bar{d}$ value.

373:

374: \begin{figure}

375:   \resizebox*{0.9 \linewidth}{!}{\includegraphics{degree.eps}}

376:   \caption{ Degree $k$ as a function of the average distance

377:     $\bar{d}$. The panels and symbols represent the same

378:     data sets as in Fig.~\ref{fig:density}.}

379:   \label{fig:degree}

380: \end{figure}

381:

382: \subsection{Degree}

383:

384: % THEORY

385: Degree distribution is now a classical quantity in the study of the

386: Internet topology. Ref.~\cite{f3} reports a highly

387: skewed distribution of degree, fitting well to a power-law with an

388: exponent around $2.2$. Since this finding, the degree distribution has

389: become a core component in models of the AS graph---both the BA and

390: Inet models as well as others~\cite{fkp:model,ahs:model,meina:pow}

391: create networks with power-law degree distributions. One

392: interpretation of degree is that it is a local centrality

393: measure~\cite{harary}. Further,

394: different measures of centrality are known to be highly

395: correlated~\cite{centr:keiko,lee:corr,our:attack} so one can expect

396: the average degree $k$ to be a decreasing function of the average

397: distance $\bar{d}$.

398:

399: % OBSERVATIONS

400: Figure~\ref{fig:degree} confirms this prediction for both the observed

401: and model networks. In Fig.~\ref{fig:degree}(a) and (b) we observe

402: that the $k(\bar{d})$-curves decrease dramatically until the

403: approximate location of the first peak in the distribution plots

404: Fig.~\ref{fig:density}(a) and (b). Therefore, $\bar{d}$ identifies a

405: natural border between the core vertices of high-degree and low

406: average distance, and the sparsely connected periphery. The observed graphs,

407: however, have higher degree in the periphery compared to the

408: null-model curves. This suggests that the network periphery

409: may have a more complex wiring topology than is predicted by

410: degree distribution alone. This pattern occurs in our

411: other network measurements as well.

412:

413: The Inet model (Fig.~\ref{fig:degree}(c)) fails to capture the

414: higher degree (implying additional complexity) in the

415: periphery. Because the BA model has a minimal degree of three, it is

416: difficult to compare to the observed networks. However, the decrease of the

417: $k(\bar{d})$-curves at the largest $\bar{d}$-peak is not conspicuous

418: in the BA model curves.  Thus, there is no clear core-periphery

419: dichotomy in the BA model. This too is not surprising, because the BA

420: model was designed to produce ``scale-free'' networks in the sense of

421: fractals (if one zooms in on any part of the system, it looks similar to the whole).

422:

423: \begin{figure}

424:   \resizebox*{0.9 \linewidth}{!}{\includegraphics{nbdeg.eps}}

425:   \caption{ Neighbor degree $K$ as a function of the average

426:     distance $\bar{d}$. The panels and symbols represent the

427:     same data sets as in Fig.~\ref{fig:density}.}

428:   \label{fig:nbdeg}

429: \end{figure}

430:

431: \subsection{Neighbor degree}

432:

433: % THEORY

434: Degree is a property of individual vertices, with no information about how they are interconnected. In this sense degree is a measure of local

435: network structure. A common way to broaden the perspective to understand

436: the network's non-local organization~\cite{caida:corr} is to measure the

437: correlations of degrees between neighbors in the network. There are

438: three common approaches. The first, known as the \textit{assortative mixing

439: coefficient}~\cite{mejn:rev}, is the Pearson correlation

440: coefficient of the degrees at the two ends of an edge, taken over all edges. This provides one number for the entire

441: network and is thus appropriate for comparisons between networks. The

442: second approach makes a density plot that displays the fraction

443: of edges joining vertices of degrees $k_1$ and $k_2$. This kind of two-dimensional plot is

444: called a \textit{correlation

445: profile}~\cite{maslov:inet,three:mah}. Correlation profiles provide

446: more detailed information than the assortative mixing coefficient, but they

447: are less concise and more sensitive to noisy data. The third approach measures

448: average neighbor degree

449: \begin{equation}

450:   K(i) = \frac{1}{k(i)}\sum_{j\in\Gamma_i} k(j)~,

451: \end{equation}

452: (where $\Gamma_i$ is the neighborhood of $i$) as a function of degree

453: $k(i)$~\cite{pas:inet}. All approaches

454: must be compared to null models because skewed degree

455: distributions are known to induce

456: anti-correlations~\cite{maslov:inet}. The third approach produces a one-dimensional plot

457: and thus forms a middle ground between the assortative mixing coefficient

458: and the correlation profile. It is also a method that can

459: be adapted to our radial-plot framework---by plotting $K$

460: against $\bar{d}$ we can monitor the correlation between centrality and

461: neighbor degree. For the AS-level Internet it has been observed

462: that the $K(k)$-curves decay~\cite{pas:inet}. In other words, high-degree

463: vertices are, on average, connected to vertices of low degree and vice

464: versa. Since degree decreases with $\bar{d}$,

465: one would expect $K$ to be an increasing function

466: of $\bar{d}$.
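This chain of reasoning can be checked on a toy example (a star graph, assumed here purely for illustration): a hub $h$ with $N-1$ leaf neighbors, where each leaf $\ell$ has the hub as its only neighbor. Then
\begin{equation*}
  K(h) = \frac{1}{N-1}\sum_{j\in\Gamma_h} 1 = 1, \qquad
  K(\ell) = \frac{1}{1}\,k(h) = N-1 ,
\end{equation*}
so $K$ decreases with $k$ (from $N-1$ at $k=1$ to $1$ at $k=N-1$). Since the hub has the smaller average distance, $\bar{d}(h)=1$ versus $\bar{d}(\ell)=(2N-3)/(N-1)$, $K$ increases with $\bar{d}$ in this example.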

467:

468: % OBSERVATIONS

469: As seen in Fig.~\ref{fig:nbdeg}, vertices at intermediate distances

470: have neighbors of highest degree. The peak in $K(\bar{d})$ coincides

471: with the largest peak in the histograms found in

472: Fig.~\ref{fig:density}, and the change of slope in

473: Fig.~\ref{fig:degree}.  This suggests that the periphery is composed

474: of two levels: the intermediate majority which is primarily connected

475: to the core, and the extreme periphery that is connected to other

476: periphery vertices.

477:

478: It is also apparent in Fig.~\ref{fig:nbdeg}(a) and (b) that the

479: null-model qualitatively has the same shape as the observed network, but,

480: just as for $k$, $K$ is larger in the observed networks than in the

481: null-model.  Also, the Inet model underestimates the average neighbor

482: degree in the periphery. Finally, the BA model exhibits less

483: correlation between $K$ and $\bar{d}$.

484:

485: \begin{figure}

486:   \resizebox*{0.9 \linewidth}{!}{\includegraphics{delete.eps}}

487:   \caption{ Deletion impact $\phi$ as a function of the average

488:     distance $\bar{d}$. The panels and symbols represent the

489:     same data sets as in Fig.~\ref{fig:density}. }

490:   \label{fig:delete}

491: \end{figure}

492:

493: \subsection{Deletion impact}

494: \label{sec:delete}

495: % THEORY

496: If a vertex is not actively routing packets due to fault or attack,

497: other vertices might be affected. We are interested in knowing how susceptible

498: a given network structure is to random node failures. Assuming that

499: the network is connected, let $S_i$ be the number of vertices in the

500: largest connected subgraph after the deletion of $i$.  We define the

501: \textit{deletion impact} as

502: \begin{equation} \label{eq:del}

503:   \phi(i) = \frac{N-1-S_i}{N-2}.

504: \end{equation}

505: This measure can take values in the interval $[0,1]$. A value of $0$

506: means that the entire network, except $i$, is still connected after

507: the deletion. A value of $1$ means that all of the network's edges

508: were attached to $i$ and that all of the vertices are isolated after

509: the deletion.
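Two toy cases (hypothetical small graphs, not drawn from the data) illustrate Eq.~(\ref{eq:del}). In a star graph with $N$ vertices, deleting the hub leaves $N-1$ isolated vertices, so $S_i=1$, while deleting a leaf leaves the other $N-1$ vertices connected, so $S_i=N-1$:
\begin{equation*}
  \phi(\mathrm{hub}) = \frac{N-1-1}{N-2} = 1, \qquad
  \phi(\mathrm{leaf}) = \frac{N-1-(N-1)}{N-2} = 0 .
\end{equation*}
In a path of $N=5$ vertices, deleting the middle vertex leaves two components of two vertices each, so $S_i=2$ and $\phi=(5-1-2)/(5-2)=2/3$.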

510:

511: % OBSERVATIONS

512:

513: Fig.~\ref{fig:delete} plots deletion impact as a function of the

514: average distance for the same data sets as the previous figures. All

515: curves are roughly decreasing. This means that the

516: network is more sensitive to the deletion of central, than peripheral,

517: vertices. This observation is anticipated from earlier studies

518: showing that the Internet is vulnerable to targeted attacks at the

519: vertices of highest degree~\cite{alb:attack}  but robust to random

520: failures. This is because the majority of vertices have low

521: $\phi$-values.  However, the deletion impact measure can detect more

522: subtle effects in the periphery.

523:

524: The first peak in the $\bar{d}$-distribution is, as mentioned above,

525: around $\bar{d}=3$. At this distance $\phi$ has decreased by a factor of a

526: thousand relative to the core, where $\phi\sim 10^{-2}$. In this quantity we see

527: a substantial difference from the null-model; the peripheral vertices

528: of the inferred networks have significantly lower deletion impact than

529: the peripheral vertices of the null-model networks. This, we believe, is another effect of the high degree of peripheral vertices. The

530: fact that the periphery is relatively highly connected suggests that

531: there are alternate routes that could be used if a regular path is

532: obstructed by a vertex failure. In the case

533: of the Inet model, which has very few vertices of high $\bar{d}$, the

534: peripheral $\phi$ values are quite low because the periphery is well

535: connected to the core. As expected, $\phi=0$ for all vertices in the

536: BA model since all vertices have degree of at least three. The BA model thus produces

537: networks that are more robust to vertex deletion than the observed networks are.

538:

539: \begin{figure}

540:   \resizebox*{0.9 \linewidth}{!}{\includegraphics{clust.eps}}

541:   \caption{ Clustering coefficient $C$ as a function of the average

542:     distance $\bar{d}$. The panels and symbols represent the

543:     same data sets as shown in Fig.~\ref{fig:density}.}

544:   \label{fig:clust}

545: \end{figure}

546:

547: \subsection{Clustering coefficient}

548:

549: % THEORY

550: The \textit{clustering

551:   coefficient} $C(i)$~\cite{wattsstrogatz} is another frequently studied

552: network property:

553: \begin{equation}

554:   C(i) = M(\Gamma_i)\Big/\dbinom{k(i)}{2} ,

555: \end{equation}

556: where $M(X)$ denotes the number of edges in a subgraph $X$ and $\Gamma_i$ is, as above, the neighborhood of $i$. The

557: clustering coefficient measures how interconnected the neighborhood of

558: a vertex is. One interpretation is that $C(i)$ is the number of

559: connected neighbor pairs rescaled by the theoretical maximum.  $C(i)$ can

560: also be seen as the fraction of triangles that $i$ is a member of, normalized

561: to the interval $[0,1]$.
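
As a worked illustration (the numbers are hypothetical and not taken from the AS data), assume that $\Gamma_i$ denotes the subgraph induced by the neighbors of $i$. For a vertex with $k(i)=4$ neighbors, among which two pairs are directly connected, $M(\Gamma_i)=2$ and
\[
  C(i) = 2\Big/\dbinom{4}{2} = \frac{2}{6} = \frac{1}{3}.
\]
Under this assumption each edge of $\Gamma_i$ closes exactly one triangle through $i$, so this vertex belongs to two triangles out of a possible six.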

562:

563: % OBSERVATIONS

564: In Fig.~\ref{fig:clust} we display the clustering

565: coefficient as a function of the average distance. The curves for the

566: observed graph, null-model, and Inet model networks show a peak around the

567: same point as the peak in the $\bar{d}$-distribution. However, the

568: null-models do not exhibit as high a degree of clustering in the

569: periphery as the inferred networks. In other words, there are more triangles in

570: the periphery than can be expected from only the network's degree

571: distribution. In fact, in 100 null-model networks based on the AS

572: '06 network, no triangle included any vertex

573: with $\bar{d}>3.8$. This should be compared with 1124 triangles for

574: the AS '06 network itself (there are even 83 triangles where all

575: vertices have $\bar{d}>3.8$). This further suggests that

576: the periphery of the observed AS graphs is complex. As

577: triangles represent redundancy (the three vertices will still be

578: connected if any one of the edges is cut), this could help to explain

579: the increased robustness to deletion seen in Section~\ref{sec:delete}. As seen in

580: Fig.~\ref{fig:clust}(b), neither the Inet nor the BA model predicts a

581: significant number of peripheral triangles. The low deletion impact

582: values for peripheral vertices in these models may be

583: attributed to the presence of longer cycles.

584:

585: \begin{figure}

586:   \resizebox*{0.9 \linewidth}{!}{\includegraphics{balance.eps}}

587:   \caption{ Distance balance $b$ as a function of the average

588:     distance $\bar{d}$. The panels and symbols represent the

589:     same data sets as shown in Fig.~\ref{fig:density}.}

590:   \label{fig:balance}

591: \end{figure}

592:

593:

594: \subsection{Distance balance}

595:

596: % THEORY

597: In the context of scientific collaboration networks it has been

598: shown~\cite{mejn:scicolpre2} that the number of shortest paths

599: leaving a vertex via a specific neighbor is skew distributed. In

600: other words, most of the shortest paths from a vertex $i$ to the rest

601: of the network traverse a single neighbor of $i$. To rephrase this in

602: terms of the average distance, central

603: vertices are likely to have few neighbors with smaller

604: $\bar{d}$ values. This leads us to another view of centrality. Let the

605: \textit{distance balance} $b(i)$ be the fraction of neighbors $j$ of $i$ with $\bar{d}(j)<\bar{d}(i)$. Clearly one can expect this to be an

606: increasing function of $\bar{d}$, but is it a linear increase?
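
As a worked illustration (with hypothetical values, not taken from the AS data), consider a vertex $i$ with $\bar{d}(i)=3.0$ and four neighbors whose average distances are $2.4$, $2.9$, $3.3$, and $3.6$. Two of the four neighbors are more central than $i$, so
\[
  b(i) = \frac{2}{4} = 0.5 .
\]
At the extreme, a degree-one vertex $i$ in a connected network of $N>2$ vertices always has $b(i)=1$. Writing $d(i,v)$ for the graph distance between $i$ and $v$ (notation assumed here for the example), every shortest path from $i$ passes through its only neighbor $j$, so $d(i,v)=d(j,v)+1$ for each other vertex $v$, and hence $\bar{d}(j)<\bar{d}(i)$.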

607:

608: % OBSERVATIONS

609: In Fig.~\ref{fig:balance} we plot the distance balance as a function

610: of $\bar{d}$. As expected, all of the curves generally increase but

611: not linearly. Almost all the increase from 0 to 1 takes

612: place around the highest peak in Fig.~\ref{fig:density}, which gives

613: another characterization of the core and periphery: in the core, the

614: typical vertex has relatively few neighbors of higher centrality than

615: itself (and vice versa in the periphery). The $b(i)$ values in the

616: peripheral region of all curves approach values close to $1$. In

617: Fig.~\ref{fig:balance}(b) the curves of the observed data are somewhat

618: lower. This supports the earlier observation that, as seen

619: in quantities such as degree, neighbor degree, and the

620: clustering coefficient, the periphery is structurally less different

621: from the core than what can be expected from random networks

622: constrained to the degree sequence of the observed networks. As seen

623: in Fig.~\ref{fig:density}(c), the Inet model behaves like the

624: null-model---the same observation holds for the average neighbor

625: degree (Fig.~\ref{fig:nbdeg}) and clustering coefficient

626: (Fig.~\ref{fig:clust}).  Unlike that of the Inet model, the BA model's curve

627: increases more smoothly, which suggests (in accordance with what has

628: been observed above) a less pronounced core-periphery structure than

629: the observed networks.

630:

631:

632: \section{Summary and conclusions}

633:

634: This paper investigated how vertex-specific network

635: measures of the AS level Internet vary with the average distance from

636: a vertex to the other vertices of the graph. This projection of

637: vertices to the space of average distances gives a picture of how the network structure changes from the most central to the most peripheral vertices. Using the

638: distance separation measure we find that there is a well-defined

639: core-periphery dichotomy in the inferred networks. To some extent

640: this can be explained as an effect of the set of degrees of the

641: network---we notice that the average degree as a function of the

642: average distance has the same qualitative form for the observed

643: networks as our null-model networks. However, the

644: periphery is more complex than what is predicted by

645: degree alone. This is manifested in higher average degree, higher

646: average neighbor degree, lower deletion impact, higher clustering

647: coefficient, and lower distance balance than in the null-model

648: networks. To summarize, the AS graph has a clearer

649: split into a core and a periphery than can be anticipated by its

650: degree distribution and simple models of scale-free networks. At the

651: same time, the split is less dramatic and more nuanced than expected from a strict hierarchy. The additional network structure in the periphery may have consequences for the spread of attacks and for methods of defending against them.

652: Further, the two topology generators (Inet and

653: BA model) that we tested could be extended to

654: model the periphery more accurately.

655:

656:

657: We used two kinds of observed AS data---easily accessible router RIBs

658: and more complete data sets where edges missing from the RIBs are

659: added. The effect of the missing edges is clearly visible: the

660: peripheries of the RIB-networks (with missing edges) have lower

661: average degree and fewer triangles, among other differences. On the

662: other hand, the missing links do not change the network structure

663: qualitatively. Our conclusions would be unchanged if we used only the

664: RIB data.

665:

666: Future modeling and measuring research needs to be undertaken to

667: elucidate the detailed structure of the core and periphery of the AS

668: graph. Furthermore, the structures should be related to the strategies

669: of AS management~\cite{daub:as,peer:chang,inet}.

670:

671: \acknowledgements{

672:   PH acknowledges financial support from the Wenner-Gren

673:   foundations. The authors acknowledge the support of the

674:   National Science Foundation (grants CCR--0331580 and CCR--0311686),

675:   and the Santa Fe Institute.

676: }

677:

678: \begin{thebibliography}{10}

679:

680: \bibitem{ba:rev}

681: R.~Albert and A.-L. Barab\'{a}si.

682: \newblock Statistical mechanics of complex networks.

683: \newblock \textit{Rev. Mod. Phys.}, 74:47--98, 2002.

684:

685: \bibitem{alb:attack}

686: R.~Albert, H.~Jeong, and A.-L. Barab\'{a}si.

687: \newblock Attack and error tolerance of complex networks.

688: \newblock \textit{Nature}, 406:378--382, 2000.

689:

690: \bibitem{vesp:kcore}

691: J.~I. Alvarez-Hamelin, L.~Dall'Asta, A.~Barrat, and A.~Vespignani.

692: \newblock k-core decomposition: a tool for the analysis of large scale

693:   {I}nternet graphs.

694: \newblock e-print cs/0511007.

695:

696: \bibitem{ahs:model}

697: J.~I. Alvarez-Hamelin and N.~Schabanel.

698: \newblock An {I}nternet graph model based on trade-off optimization.

699: \newblock \textit{Eur. Phys. J. B}, 38:231--237, 2004.

700:

701: \bibitem{ba:model}

702: A.-L. Barab\'{a}si and R.~Albert.

703: \newblock Emergence of scaling in random networks.

704: \newblock \textit{Science}, 286:509--512, October 1999.

705:

706: \bibitem{harary}

707: F.~Buckley and F.~Harary.

708: \newblock \textit{Distance in graphs}.

709: \newblock Addison-Wesley, Redwood City, 1989.

710:

711: \bibitem{mich:as}

712: H.~Chang, R.~Govindan, S.~Jamin, S.~J. Shenker, and W.~Willinger.

713: \newblock Towards capturing representative {AS}-level {I}nternet topologies.

714: \newblock Technical Report UM-CSE-TR-454-02, Electrical Engineering and

715:   Computer Science Department, University of Michigan, 2002.

716:

717: \bibitem{peer:chang}

718: H.~Chang, S.~Jamin, and W.~Willinger.

719: \newblock To peer or not to peer: Modeling the evolution of the {I}nternet's

720:   {AS}-level topology.

721: \newblock to appear in Proceedings of SIGCOMM06.

722:

723: \bibitem{daub:as}

724: I.~Daubechies, K.~Drakakis, and T.~Khovanova.

725: \newblock A detailed study of the attachment strategies of new autonomous

726:   systems in the {AS} connectivity graph.

727: \newblock \textit{Internet Mathematics}, 2:185--246, 2006.

728:

729: \bibitem{doromen:book}

730: S.~N. Dorogovtsev and J.~F.~F. Mendes.

731: \newblock \textit{Evolution of Networks: From Biological Nets to the Internet and

732:   WWW}.

733: \newblock Oxford University Press, Oxford, 2003.

734:

735: \bibitem{hot:inet}

736: J.~C. Doyle, D.~L. Alderson, L.~Li, S.~Low, M.~Roughan, S.~Shalunov, R.~Tanaka,

737:   and W.~Willinger.

738: \newblock The ``robust yet fragile'' nature of the {I}nternet.

739: \newblock \textit{Proc. Natl. Acad. Sci. USA}, 102(41):14497--14502, 2005.

740:

741: \bibitem{fkp:model}

742: A.~Fabrikant, E.~Koutsoupias, and C.~H. Papadimitriou.

743: \newblock Heuristically optimized trade-offs: A new paradigm for power laws in

744:   the {I}nternet.

745: \newblock In \textit{Proceedings of the 29th International Conference on Automata,

746:   Languages, and Programming}, volume 2380 of \textit{Lecture notes in Computer

747:   science}, pages 110--122, Heidelberg, 2002. Springer.

748:

749: \bibitem{f3}

750: M.~Faloutsos, P.~Faloutsos, and C.~Faloutsos.

751: \newblock On power-law relationships of the {I}nternet topology.

752: \newblock \textit{Comput. Commun. Rev.}, 29:251--262, 1999.

753:

754: \bibitem{gale:rew}

755: D.~Gale.

756: \newblock A theorem of flows in networks.

757: \newblock \textit{Pacific J. Math.}, 7:1073--1082, 1957.

758:

759: \bibitem{our:attack}

760: P.~Holme, B.~J. Kim, C.~N. Yoon, and S.~K. Han.

761: \newblock Attack vulnerability of complex networks.

762: \newblock \textit{Phys. Rev. E}, 65:056109, 2002.

763:

764: \bibitem{lee:corr}

765: C.-Y. Lee.

766: \newblock Correlations among centrality measures in complex networks.

767: \newblock e-print physics/0605220.

768:

769: \bibitem{caida:corr}

770: P.~Mahadevan, D.~Krioukov, M.~Fomenkov, B.~Huffaker, X.~Dimitropoulos, K.~C.

771:   Claffy, and A.~Vahdat.

772: \newblock Lessons from three views of the {I}nternet topology.

773: \newblock tr-2005-02, Cooperative Association for Internet Data Analysis, 2005.

774:

775: \bibitem{three:mah}

776: P.~Mahadevan, D.~Krioukov, M.~Fomenkov, B.~Huffaker, X.~Dimitropoulos, k~c

777:   claffy, and A.~Vahdat.

778: \newblock The {I}nternet {AS}-level topology: Three data sources and one

779:   definitive metric.

780: \newblock \textit{ACM SIGCOMM Computer Communications Review}, 36:17--26, 2006.

781:

782: \bibitem{maslov:inet}

783: S.~Maslov, K.~Sneppen, and A.~Zaliznyak.

784: \newblock Detection of topological patterns in complex networks: Correlation

785:   profile of the {I}nternet.

786: \newblock \textit{Physica A}, 333:529--540, 2004.

787:

788: \bibitem{meina:pow}

789: A.~Medina, I.~Matta, and J.~Byers.

790: \newblock On the origin of power laws in {I}nternet topologies.

791: \newblock \textit{ACM Computer Communication Review}, 30(2):18--28, 2000.

792:

793: \bibitem{centr:keiko}

794: K.~Nakao.

795: \newblock Distribution of measures of centrality: Enumerated distributions of

796:   {F}reeman's graph centrality measures.

797: \newblock \textit{Connections}, 13:10--22, 1990.

798:

799: \bibitem{mejn:scicolpre2}

800: M.~E.~J. Newman.

801: \newblock Scientific collaboration networks. {II}. {S}hortest paths, weighted

802:   networks, and centrality.

803: \newblock \textit{Phys. Rev. E}, 64:016132, 2001.

804:

805: \bibitem{mejn:rev}

806: M.~E.~J. Newman.

807: \newblock The structure and function of complex networks.

808: \newblock \textit{SIAM Review}, 45:167--256, 2003.

809:

810: \bibitem{vesp:inet}

811: R.~Pastor-Satorras and A.~Vespignani.

812: \newblock \textit{Evolution and structure of the Internet : a statistical physics

813:   approach}.

814: \newblock Cambridge University Press, Cambridge, 2004.

815:

816: \bibitem{pas:inet}

817: R.~Pastor-Satorras, A.~V\'{a}zquez, and A.~Vespignani.

818: \newblock Dynamical and correlation properties of the {I}nternet.

819: \newblock \textit{Phys. Rev. Lett.}, 87:258701, 2001.

820:

821: \bibitem{roberts:mcmc}

822: J.~M. {Roberts Jr.}

823: \newblock Simple methods for simulating sociomatrices with given marginal

824:   totals.

825: \newblock \textit{Social Networks}, 22:273--283, 2000.

826:

827: \bibitem{sab:clo}

828: G.~Sabidussi.

829: \newblock The centrality index of a graph.

830: \newblock \textit{Psychometrika}, 31:581--603, 1966.

831:

832: \bibitem{rex:infer}

833: L.~Subramanian, S.~Agarwal, J.~Rexford, and R.~H. Katz.

834: \newblock Characterizing the {I}nternet hierarchy from multiple vantage points.

835: \newblock In \textit{Proc. IEEE INFOCOM}, pages 618--627, New York, 2002. IEEE.

836:

837: \bibitem{ala:hier}

838: A.~Trusina, S.~Maslov, P.~Minnhagen, and K.~Sneppen.

839: \newblock Hierarchy measures in complex networks.

840: \newblock \textit{Phys. Rev. Lett.}, 92:178702, 2004.

841:

842: \bibitem{wattsstrogatz}

843: D.~J. Watts and S.~H. Strogatz.

844: \newblock Collective dynamics of {`small-world'} networks.

845: \newblock \textit{Nature}, 393:440--442, 1998.

846:

847: \bibitem{inet}

848: J.~Winick and S.~Jamin.

849: \newblock Inet-3.0: {I}nternet topology generator.

850: \newblock Technical Report UM-CSE-TR-456-02, Electrical Engineering and

851:   Computer Science Department, University of Michigan, 2002.

852:

853: \end{thebibliography}

854:

855:

856: \end{document}

857:

858:

859: