1: \documentclass[rmp,showpacs,twocolumn,superscriptaddress,floatfix]{revtex4}
2:
3: \usepackage{dcolumn,graphicx,amsmath,amssymb,txfonts}
4:
5:
6: \begin{document}
7:
8: \title{Radial Structure of the Internet}
9:
10: \author{Petter Holme}
11: \affiliation{Department of Computer Science, University of New Mexico,
12: Albuquerque, NM 87131, U.S.A.}
13:
14: \author{Josh Karlin}
15: \affiliation{Department of Computer Science, University of New Mexico,
16: Albuquerque, NM 87131, U.S.A.}
17:
18: \author{Stephanie Forrest}
19: \affiliation{Department of Computer Science, University of New Mexico,
20: Albuquerque, NM 87131, U.S.A.}
21: \affiliation{Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM
22: 87501, U.S.A.}
23:
24:
25: \begin{abstract}
26: The structure of the Internet at the Autonomous System (AS) level has
27: been studied by both the Physics and Computer Science communities. We
28: extend this work to include features of the core and the periphery,
29: taking a radial perspective on AS network structure. New methods for
30: plotting AS data are described, and they are used to analyze data sets
31: that have been extended to contain edges missing from earlier
32: collections. In particular, the average distance from one vertex to
33: the rest of the network is used as the baseline metric for
34: investigating radial structure. Common vertex-specific quantities are
35: plotted against this metric to reveal distinctive characteristics of
36: central and peripheral vertices. Two data sets are analyzed using
37: these measures as well as two common generative models
38: (Barab\'{a}si-Albert and Inet). We find a clear distinction between
39: the highly connected core and a sparse periphery. We also find that
40: the periphery has a more complex structure than that predicted by
41: degree distribution or the two generative models.
42: \end{abstract}
43:
44: \pacs{89.20.Hh,89.75.Fb, 89.75.Hc}
45: % 89.20.Hh -- World Wide Web, Internet
46: % 89.75.Fb -- Structures and organization in complex systems
47: % 89.75.Hc -- Networks and genealogical trees
48:
49: \maketitle
50:
51: \section{Introduction}
52:
53: Since the turn of the century there has been increasing interest in the
54: statistical study of networks~\cite{ba:rev,doromen:book,mejn:rev},
55: stimulated in large part by the availability of large-scale network
56: data sets. One network of great interest is the
57: Internet~\cite{vesp:inet}. The Internet is intriguing because its
complexity and size preclude comprehensive study. It comprises
59: millions of individual end-nodes connected to tens of thousands of ISPs
60: whose relationships are continually in flux and only partially
61: observable. One way to cope with these complexities is by analyzing a
62: single scale of Internet data, for example, a local office network of
63: computers and their inter-connections; a network of email address book
64: contacts; the network formed by URL links on the World Wide Web; or the
65: interdomain (Autonomous System) level of the Internet. This paper is
66: concerned with the last of these examples---the AS graph. The vertices
in the graph are themselves computer networks; roughly speaking, an AS is
an independently operated network or set of networks owned by a single
entity. Edges represent pairs of ASs that can communicate directly.
70:
71:
72: A major finding of earlier AS studies is that node degree (number of
73: links to other ASs) has a power law distribution~\cite{f3}. The degree
74: distribution is, however, not the only structure that affects
75: Internet dynamics~\cite{hot:inet}. In this paper we investigate
76: higher-order (beyond the degree distribution) network structures that
77: also impact network dynamics. We analyze the AS graph using methods
78: that are appropriate for networks with a clear hierarchical
79: organization~\cite{vesp:inet,ala:hier}. In particular, we study
80: network quantities as a function of the average distance to other
81: vertices. This approach allows us to separate vertices of different
82: hierarchical levels, in a radial fashion, ranging from central (in the
83: sense of the closeness centrality~\cite{sab:clo}) to peripheral
vertices. This is, furthermore, a way to resolve how clearly
separated the core and the periphery are. Most analysis methods
86: developed by physicists (degree frequencies, correlations, etc.)\ are
87: based on quantities averaged over the whole network and do not take a
88: hierarchical partitioning into account~\cite{vesp:inet}. Studies by
89: computer scientists, on the other hand, assume a division of the AS
90: level Internet into hierarchical levels~\cite{rex:infer}. We will
argue that the observed AS level networks do have a pronounced
core-periphery dichotomy but that the periphery has more structure
93: than previously thought.
94:
95:
96: \section{Networks}
97:
98: This section briefly reviews the organization of the
99: AS-level Internet and describes how we obtained our data sets. We also
100: describe the network models to which we compare our observed
101: data. These models include one randomization scheme that samples random
102: networks with the same set of degrees as the original networks, and the generative BA and Inet models. Technically all three models are
103: null-models, but to contrast the randomized networks (having $N$
104: degrees of freedom) with the generative models (having only a few
degrees of freedom) we reserve the term null-model for the former.
106:
107: The data are represented as a network $G=(V,E)$ where $V$ is a set
108: of $N$ vertices (ASs) and $E$ is a set of $M$ undirected edges
109: (connections between ASs).
The Internet is currently composed of roughly 22,000 individual
networks known as Autonomous Systems. Each of these systems peers with
a (usually small) set of ASs to form a connected network. The
113: protocol used to establish peering sessions and discover routes to
114: distant ASs is called the Border Gateway Protocol (BGP). Two typical
115: peering relationships are: customer-provider in which the provider
116: provides connectivity to the rest of the Internet for the customer;
117: and peer-peer in which the peering ASs transfer traffic between their
respective customers. The Tier-1 ASs, which form the extreme core of
the network, have many peer-peer and customer links but no providers.
Nodes closer to the periphery of the network have fewer customers and
peers but more providers.
121:
122:
123: \subsection{AS networks}
124:
We analyze four data sets inferred from observed routing data (as
opposed to synthetically generated networks), of which two are
original. The first two, dating from 2002, are well known and well
studied~\cite{mich:as}; the second two are recent, inferred from 2006
data. The first
130: graph in each pair consists of edges learned solely from dumps of
131: router state, known as Routing Information Bases (RIBs)
132: (\url{http://www.routeviews.org/data.html}). RIBs are a standard
133: source of AS connectivity data. The second graph in each pair
134: contains RIB information augmented with edges derived from other
135: sources (such as routing registries, looking glass servers, and update
136: messages) which produces a more accurate representation of the real
137: network. The additional sources are described below.
138:
139:
140: \subsubsection{Obtaining RIBs from Route Views}
141:
BGP routers store the most recent AS path for each IP block (prefix)
announced by their peers. These data are stored in the router's RIB,
144: and periodic RIB dumps from a large number of voluntary sources are
145: available from Route Views (\url{http://www.routeviews.org}). Each RIB
146: represents a static snapshot of all routes available to the router
147: from which it was obtained. Since BGP only disseminates each router's
148: best path, and this value is dynamic as links go up and down, a
149: sizable portion of the network is hidden from each router. In order
150: to obtain a more complete topology, common practice is to take the
151: union of the relationships found in a large number of RIB samples.
152: From the samples, AS relationships are then inferred from the routing
153: paths. A path is comprised of connected ASs and therefore each pair
154: of adjacent ASs in a path corresponds to an edge in the graph.
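
As an illustration of this construction (the notation is introduced here
only for this purpose and is not taken from the data sources), write an
AS path observed in a RIB as a sequence $p=(a_1,a_2,\ldots,a_n)$ of AS
numbers. Assuming that consecutive entries are directly connected, the
path contributes the edges
\[
E_p=\bigl\{\{a_l,a_{l+1}\} : 1\leq l<n,\ a_l\neq a_{l+1}\bigr\},
\]
and the inferred edge set is the union $E=\bigcup_p E_p$ over all sampled
paths. For a hypothetical path $p=(a,b,b,c)$ this gives
$E_p=\{\{a,b\},\{b,c\}\}$; the condition $a_l\neq a_{l+1}$ discards
repeated entries, which would otherwise appear as self-edges.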
155:
156:
157: The 2002 graph taken from a single RIB (RIB '02) was inferred from Route
158: Views on May 15th of 2002. We constructed the 2006 RIB graph (RIB '06)
159: from the Route Views RIB on May 16th of 2006. The RIB '02 graph has
160: $N=13233$ and $M=27724$ while RIB '06 has $N=22403$ and $M=46343$.
161:
162: \subsubsection{Extending the RIB Dataset}
163: \label{sec:extended}
164:
165: There are other sources of AS connectivity data besides Route Views.
166: RIPE (\url{http://www.ripe.net}) has data collected from additional RIBs
167: beyond those contained in the Route Views data. Peering information
168: is directly available for a small number of ASs that are participating
169: Looking Glass (\url{http://www.traceroute.org}) routers. Finally, some
170: ASs register their peering relationships in regional registries such
171: as RIPE. The extended 2002 AS graph (AS '02) was constructed using
172: inferred topologies from all three of these sources, together with the
173: original Route Views data.
174:
175: RIB data represent a brief snapshot of routing state. There are many
paths that a router sees only briefly, and it is unlikely that just a
few RIB dumps capture all of them. In the extended
178: AS-graph of 2006 (AS '06), we augmented the Route Views RIB data with
179: all of the paths found in BGP update messages for the entire month of
180: April 2006 from both Route Views and RIPE. This gives a more
181: complete picture over time, although it is still biased by the limited
182: number of routers from which the data were collected.
183:
The extended 2002 AS-graph (AS '02) has $N=13579$ and $M=37448$, and the
corresponding 2006 network (AS '06) has $N=22688$ and $M=62637$. Thus the
186: extended data sets have $35\%$ (2002) and $67\%$ (2006) more edges than
187: their RIB counterparts.
188:
189: \subsection{Null-model networks}
190:
191: We are interested in network structure beyond degree distribution, so
192: we compare our AS network data against a null model
193: with the same degree distribution. Our null model is a random network
194: constrained to have the same set of degrees as the original network.
195: By comparing results for the observed networks with the same quantities
196: for the null model, we can observe additional network structure if it
197: exists. The standard way to sample such networks is by randomizing
198: the original network with stochastic rewiring of the edges (see
199: Ref.~\cite{gale:rew} for an early example). In our implementation we
200: create a new random network by enumerating the edges $E$ of the
original graph, and for each edge $(i,j)$ we:
\begin{enumerate}
\item Choose another edge $(i',j')$ at random and replace $(i,j)$
and $(i',j')$ with $(i,j')$ and $(i',j)$. If this creates a
multi- or self-edge, we revert to the original edges $(i,j)$ and
$(i',j')$, and repeat with a new $(i',j')$.
\item \label{step:rew3} Choose two edges
$(i_1,j_1)$ and $(i_2,j_2)$ and replace them, along with $(i,j')$, by
$(i_1,j')$, $(i,j_2)$ and $(i_2,j_1)$.
\end{enumerate}
211: Step \ref{step:rew3} guarantees ergodicity of the
212: sampling~\cite{roberts:mcmc}, i.e.\ that one can go between any pair of graphs
213: with a given set of degrees by successive edge-rewirings.
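
As a minimal illustration of the first step (a hypothetical four-vertex
example, not drawn from our data), let $(i,j)=(1,2)$ and
$(i',j')=(3,4)$. The swap gives
\[
\{(1,2),(3,4)\}\;\longrightarrow\;\{(1,4),(3,2)\},
\]
and every vertex keeps its degree, since each of the vertices $1,2,3,4$
is still an endpoint of exactly one of the two edges. If instead
$(i',j')=(2,3)$, the swap would produce $(1,3)$ and the self-edge
$(2,2)$, so the move is rejected and a new $(i',j')$ is drawn. The same
rejection applies if $(1,4)$ or $(3,2)$ is already present in the graph,
since the swap would then create a multi-edge.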
214:
215: \subsection{Generative network models}
216:
217: In addition to the observed (inferred from data) and null-model networks described above,
218: we also study networks produced according to two previously proposed
network-generation schemes~\cite{ba:model,inet}. The first is the well-known
Barab\'{a}si-Albert preferential attachment model~\cite{ba:model}.
221: The second, known as the Inet model (version 3.0)~\cite{inet}, is
222: more complex and designed specifically for creating networks with AS
223: graph properties.
224:
225: \subsubsection{Barab\'{a}si-Albert model}
226:
227: The Barab\'{a}si-Albert (BA) model is a general growth model for
producing networks with power-law degree
distributions~\cite{ba:model}. Vertices and edges are iteratively added to the
230: network according to a preferential attachment rule, which ensures
231: that a power-law degree distribution emerges.
232:
233: More precisely, the initial configuration consists of $m$ isolated
234: vertices. From this configuration the network is iteratively grown.
235: At each time step one vertex is added together with $m$ edges leading
236: out from the new vertex. The edges are attached to vertices in the
237: graph such that:
238: \begin{enumerate}
239: \item The probability of attaching to a vertex $i$ is proportional to
240: $k(i)$.
241: \item No multiple edges, or self-edges, are formed.
242: \end{enumerate}
243: This procedure produces a network which has,
244: in the $N\rightarrow\infty$ limit, a degree
245: distribution $P(k)\sim k^{-3}$ for $k\geq m$, and $P(k)= 0$ for $k<m$.
246:
247: Because the BA model has only one integer parameter it is not very
248: flexible at fitting data. We use $m=3$ to make the average degree
249: as similar to the AS networks as possible. Other preferential
attachment models (e.g., Ref.~\cite{pas:inet}) can model the average
251: degree and slope of the degree distribution more closely. Such
252: improvements, we believe, are unlikely to change the conclusions drawn from the
253: original BA model.
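
The choice $m=3$ can be motivated by a rough count, sketched here under
the assumption that every added vertex contributes exactly $m$ new
edges. After $t$ time steps the network has $N=m+t$ vertices and $M=mt$
edges, so that
\[
\langle k\rangle=\frac{2M}{N}=\frac{2mt}{m+t}\longrightarrow 2m
\quad (t\rightarrow\infty).
\]
For the extended data sets of Section~\ref{sec:extended},
$2M/N=2\times 37448/13579\approx 5.52$ (AS '02) and
$2\times 62637/22688\approx 5.52$ (AS '06). The integer values $m=2$ and
$m=3$ give $\langle k\rangle\approx 4$ and $\langle k\rangle\approx 6$
respectively, and the latter is closer to the observed value.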
254:
255:
256: \subsubsection{Inet model}
257:
258: The Inet model~\cite{inet} is less general than BA's. Its objective is
259: to regenerate the AS graph as accurately as possible rather than to focus
260: on a single mechanism to create and explain scale-free networks. The
261: scheme is rather detailed and we only sketch its strategy here. Starting with $N$ vertices, Inet first generates random
262: numbers that represent the final degree of the vertices such that
263: the degree distribution matches the observed distribution of the
264: AS-graph as closely as possible. This means that the low-degree end of
265: the distribution is more accurately modeled by Inet than the BA model
266: because the BA model will not produce a vertex with degree less than $m$.
In the real AS-graph there is a considerable fraction of
degree-one vertices. After the degrees are assigned to the vertices,
edges are added in such a way that the degree-degree correlation
properties of the original AS-graph are matched as closely as
possible.
272:
273: A more detailed explanation of this procedure and its rationale are
274: given in Ref.~\cite{inet}. We use Inet's default parameter settings,
275: except $N$ which we extracted from our datasets, producing an average
276: degree that is approximately six.
277:
278: \section{Numerical results}
279:
280: \begin{figure}
281: \resizebox*{0.9 \linewidth}{!}{\includegraphics{density.eps}}
282: \caption{ Normalized histograms of vertices with a specific average
283: distance $\bar{d}$ to the rest of the vertices. (a) shows curves for the
284: Oregon Route Views data (RIB '02), extended data (AS '02), and values for
285: random networks with the same degree sequences as AS '02. (b)
286: displays curves for the Oregon Route Views data (RIB '06), extended
287: data (AS '06), as well as randomized
288: networks with the degree sequence of AS '06. (c) shows the same AS
289: '06 curve as (b) along with the BA and Inet model results for
290: parameter values as close as possible to those of the AS '06
network. The null-model curves in (a) and (b) and the model-network
curves in (c) are averages over 100 realizations. Lines are guides for
293: the eyes. The error-bars represent standard error (the point
294: symbols are often larger than the error bars).
295: }
296: \label{fig:density}
297: \end{figure}
298:
299: % INTRO
300: In this section we present the numerical results of our analysis. We
301: first discuss the average distance metric we use for displaying
302: network properties with a radial perspective. Then we define and
303: present the results for each network structural measure as a function
304: of the average distance to other vertices.
305:
306: % AVG DIST THEORY
307: Let $d(i,j)$ denote the graph distance between two vertices $i$ and
308: $j$---the number of edges in the shortest path between $i$ and $j$. A
309: simple measure for how peripheral a vertex is in the network is its
310: \textit{eccentricity}---the distance to the most distant vertex,
311: $\max_{j\in V} d(i,j)$~\cite{harary}. Eccentricity is thus an extremal
312: property of the network and is determined by a small fraction of
313: vertices. To reflect the typical path length of a vertex we rank
314: vertices according to an average property of the vertex. The
315: average property corresponding to eccentricity is the average distance
316: from one vertex to all of the others:
317: \begin{equation}\label{eq:dist}
318: \bar{d}(i)=\frac{1}{N-1}\sum_j d(i,j) ,
319: \end{equation}
320: where the sum is over all vertices, except $i$, in $V$. We note that
321: the reciprocal value of $\bar{d}(i)$, the \textit{closeness
322: centrality}, is a common measure for centrality in social network
323: studies~\cite{sab:clo,harary}. Average distance is a more intuitive measure in this context---$\bar{d}(i)\approx 2$ means that $i$ is on average
324: two hops away from other vertices, whereas the closeness value $0.5$
325: does not have such a direct interpretation.
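
To make the definition concrete, consider a star graph as a toy example
(an illustration only, not one of our data sets): one hub connected to
$N-1$ leaves, with no other edges. The hub is at distance $1$ from every
other vertex, while a leaf is at distance $1$ from the hub and $2$ from
the remaining $N-2$ leaves, so Eq.~(\ref{eq:dist}) gives
\[
\bar{d}(\mathrm{hub})=1, \qquad
\bar{d}(\mathrm{leaf})=\frac{1+2(N-2)}{N-1}=\frac{2N-3}{N-1}.
\]
For $N=5$ the leaves have $\bar{d}=7/4$, whereas their eccentricity is
$2$. The average distance can thus take non-integer values, which the
eccentricity cannot.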
326:
Another way to study eccentricity is by iteratively removing vertices
of low degree to construct a sequence of $k$-cores (subgraphs in which
all vertices have degree $\geq k$)~\cite{rex:infer,vesp:kcore}. We use
the average distance instead because it separates vertices more finely:
its values, plotted on the $x$-axis, are not restricted to integers as
those of the eccentricity are. Further, because it is
332: a global measure (in the sense that the entire network topology
333: affects $\bar{d}(i)$ for every $i$) it is likely more robust to errors
334: in the input data.
335:
336:
337:
338: \subsection{Radial vertex density}
339: % OBSERVATIONS
340: We first plot the fraction of
341: vertices as a function of $\bar{d}$. Fig.~\ref{fig:density} shows the
342: distribution of $\bar{d}$ for our data sets and the model AS
343: graphs. The observed networks produce graphs that are far from smooth,
344: unimodal distributions. Instead they have one peak close to
345: $\bar{d}=3$, a smaller peak around $\bar{d}=4$, and for the 2006 data,
346: a third peak near $\bar{d}=5$. The difference between the RIB-only and
347: the extended datasets is small, except around the second peak
348: in Fig.~\ref{fig:density}(b) which is higher in the RIB-only data. The
null-model curves are much closer to unimodal, although
350: they do not follow a simple, smooth functional form. Such a unimodal
351: form could be a result of the averaging of many null-model
352: curves, but the observation holds even if single realizations of the
353: randomization are plotted (data not shown). Thus, the observed AS graph is less homogeneous than what we would predict by considering only vertex degree.
354:
355: We interpret the two peaks as an effect of the hierarchical
356: organization of the Internet. The core (Tier-1 providers and other
large ISPs) is in the low-$\bar{d}$ tail, the $\bar{d}=3$ peak consists of
vertices directly connected to the core, and the $\bar{d}=4$ peak consists of
vertices whose closest neighbors are in the $\bar{d}=3$ peak. This
explains the approximately integer distance between the peaks.
Determining the type of the edges between the peaks (customer-provider
or peer-peer) is a difficult problem~\cite{rex:infer}; however, we
believe that these edges are likely to run from customers to providers,
as ASs generally only have peer-peer edges with networks of equal class.
The Tier-1 ASs, which have no providers and thus form the innermost part
of the core (AS numbers 209, 701,
1239, 1668, 2914, 3356, 3549, 3561, 6461 and 7018 in our data sets),
have an average $\bar{d}=2.35\pm 0.03$ in the AS '02 data and
$\bar{d}=2.41\pm 0.03$ in the AS '06 data. They thus lie to the left of
the most central peak, in the extreme low end of the $\bar{d}$-spectrum.
368:
369: Results for the BA and Inet model networks are shown in
370: Fig.~\ref{fig:density}(c). The Inet model has a peak to the left of
371: the middle of the range of distances, but no second or third peak.
372: The BA model matches the observed network even less accurately---its peak is at a relatively high $\bar{d}$ value.
373:
374: \begin{figure}
375: \resizebox*{0.9 \linewidth}{!}{\includegraphics{degree.eps}}
376: \caption{ Degree $k$ as a function of the average distance
377: $\bar{d}$. The panels and symbols represent the same
378: data sets as in Fig.~\ref{fig:density}.}
379: \label{fig:degree}
380: \end{figure}
381:
382: \subsection{Degree}
383:
384: % THEORY
385: Degree distribution is now a classical quantity in the study of the
386: Internet topology. Ref.~\cite{f3} reports a highly
387: skewed distribution of degree, fitting well to a power-law with an
388: exponent around $2.2$. Since this finding, the degree distribution has
389: become a core component in models of the AS graph---both the BA and
390: Inet models as well as others~\cite{fkp:model,ahs:model,meina:pow}
391: create networks with power-law degree distributions. One
392: interpretation of degree is that it is a local centrality
393: measure~\cite{harary}. Further,
394: different measures of centrality are known to be highly
395: correlated~\cite{centr:keiko,lee:corr,our:attack} so one can expect
396: the average degree $k$ to be a decreasing function of the average
397: distance $\bar{d}$.
398:
399: % OBSERVATIONS
400: Figure~\ref{fig:degree} confirms this prediction for both the observed
401: and model networks. In Fig.~\ref{fig:degree}(a) and (b) we observe
402: that the $k(\bar{d})$-curves decrease dramatically until the
403: approximate location of the first peak in the distribution plots
404: Fig.~\ref{fig:density}(a) and (b). Therefore, $\bar{d}$ identifies a
405: natural border between the core vertices of high-degree and low
406: average distance, and the sparsely connected periphery. The observed graphs,
407: however, have higher degree in the periphery compared to the
408: null-model curves. This suggests that the network periphery
may have a more complex wiring topology than is predicted by
410: degree distribution alone. This pattern occurs in our
411: other network measurements as well.
412:
413: The Inet model (Fig.~\ref{fig:degree}(c)) fails to capture the
414: higher degree (implying additional complexity) in the
415: periphery. Because the BA model has a minimal degree of three, it is
416: difficult to compare to the observed networks. However, the decrease of the
417: $k(\bar{d})$-curves at the largest $\bar{d}$-peak is not conspicuous
418: in the BA model curves. Thus, there is no clear core-periphery
419: dichotomy in the BA model. This too is not surprising, because the BA
420: model was designed to produce ``scale-free'' networks in the sense of
fractals (if one zooms in on any part of the system, it looks similar to the whole).
422:
423: \begin{figure}
424: \resizebox*{0.9 \linewidth}{!}{\includegraphics{nbdeg.eps}}
425: \caption{ Neighbor degree $K$ as a function of the average
426: distance $\bar{d}$. The panels and symbols represent the
427: same data sets as in Fig.~\ref{fig:density}.}
428: \label{fig:nbdeg}
429: \end{figure}
430:
431: \subsection{Neighbor degree}
432:
433: % THEORY
434: Degree is a property of individual vertices, with no information about how they are interconnected. In this sense degree is a measure of local
435: network structure. A common way to broaden the perspective to understand
436: the network's non-local organization~\cite{caida:corr} is to measure the
437: correlations of degrees between neighbors in the network. There are
three common approaches. The first, known as the \textit{assortative mixing
coefficient}~\cite{mejn:rev}, measures the Pearson correlation
coefficient of the degrees at the two ends of an edge. This provides one number for the entire
441: network and is thus appropriate for comparisons between networks. The
442: second approach makes a density plot that displays the fraction
443: of edges with degree $(k_1,k_2)$. This kind of two-dimensional plot is
444: called a \textit{correlation
445: profile}~\cite{maslov:inet,three:mah}. Correlation profiles provide
446: more detailed information than the assortative mixing coefficient, but they
447: are less concise and more sensitive to noisy data. The third approach measures
448: average neighbor degree
449: \begin{equation}
450: K(i) = \frac{1}{k(i)}\sum_{j\in\Gamma_i} k(j)~,
451: \end{equation}
452: (where $\Gamma_i$ is the neighborhood of $i$) as a function of degree
453: $k(i)$~\cite{pas:inet}. All approaches
454: must be compared to null models because skewed degree
455: distributions are known to induce
456: anti-correlations~\cite{maslov:inet}. The third approach produces a one-dimensional plot
457: and thus forms a middle ground between the assortative mixing coefficient
458: and the correlation profile. It is also a method that can
459: be adapted to our radial-plot framework---by plotting $K$
460: against $\bar{d}$ we can monitor the correlation between centrality and
461: neighbor degree. For the AS-level Internet it has been observed
462: that the $K(k)$-curves decay~\cite{pas:inet}. In other words, high-degree
463: vertices are, on average, connected to vertices of low degree and vice
versa. Since degree decreases with $\bar{d}$,
465: one would then expect $K$ to be an increasing function
466: of $\bar{d}$.
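
A star graph gives a simple, idealized picture of this expectation (a
toy example, not one of our data sets). With one hub connected to $N-1$
leaves, the hub has $k=N-1$ and only degree-one neighbors, while each
leaf has $k=1$ and the hub as its only neighbor:
\[
K(\mathrm{hub})=\frac{1}{N-1}\sum_{j\in\Gamma_{\mathrm{hub}}}1=1, \qquad
K(\mathrm{leaf})=\frac{1}{1}\,(N-1)=N-1 .
\]
Here $K$ decreases with $k$, and since the hub has the smallest average
distance, $K$ increases with $\bar{d}$.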
467:
468: % OBSERVATIONS
469: As seen in Fig.~\ref{fig:nbdeg}, vertices at intermediate distances
470: have neighbors of highest degree. The peak in $K(\bar{d})$ coincides
471: with the largest peak in the histograms found in
472: Fig.~\ref{fig:density}, and the change of slope in
473: Fig.~\ref{fig:degree}. This suggests that the periphery is composed
474: of two levels: the intermediate majority which is primarily connected
475: to the core, and the extreme periphery that is connected to other
476: periphery vertices.
477:
478: It is also apparent in Fig.~\ref{fig:nbdeg}(a) and (b) that the
null-model has qualitatively the same shape as the observed network, but,
just as for $k$, $K$ is larger in the observed networks than in the
481: null-model. Also, the Inet model underestimates the average neighbor
482: degree in the periphery. Finally, the BA model exhibits less
483: correlation between $K$ and $\bar{d}$.
484:
485: \begin{figure}
486: \resizebox*{0.9 \linewidth}{!}{\includegraphics{delete.eps}}
487: \caption{ Deletion impact $\phi$ as a function of the average
488: distance $\bar{d}$. The panels and symbols represent the
489: same data sets as in Fig.~\ref{fig:density}. }
490: \label{fig:delete}
491: \end{figure}
492:
493: \subsection{Deletion impact}
494: \label{sec:delete}
495: % THEORY
496: If a vertex is not actively routing packets due to fault or attack,
497: other vertices might be affected. We are interested in knowing how susceptible
498: a given network structure is to random node failures. Assuming that
499: the network is connected, let $S_i$ be the number of vertices in the
500: largest connected subgraph after the deletion of $i$. We define the
501: \textit{deletion impact} as
502: \begin{equation} \label{eq:del}
503: \phi(i) = \frac{N-1-S_i}{N-2}.
504: \end{equation}
505: This measure can take values in the interval $[0,1]$. A value of $0$
506: means that the entire network, except $i$, is still connected after
507: the deletion. A value of $1$ means that all of the network's edges
508: were attached to $i$ and that all of the vertices are isolated after
509: the deletion.
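
Two toy examples (illustrations only, not taken from our data) show how
Eq.~(\ref{eq:del}) behaves. In a star graph with $N$ vertices, deleting
the hub leaves $N-1$ isolated vertices, so $S_{\mathrm{hub}}=1$, while
deleting a leaf leaves the remaining $N-1$ vertices connected, so
$S_{\mathrm{leaf}}=N-1$:
\[
\phi(\mathrm{hub})=\frac{N-1-1}{N-2}=1, \qquad
\phi(\mathrm{leaf})=\frac{N-1-(N-1)}{N-2}=0 .
\]
In a path of $N=5$ vertices, deleting the middle vertex leaves two
components of two vertices each, so $S_i=2$ and
$\phi(i)=(5-1-2)/(5-2)=2/3$.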
510:
511: % OBSERVATIONS
512:
513: Fig.~\ref{fig:delete} plots deletion impact as a function of the
514: average distance for the same data sets as the previous figures. All
515: curves are roughly decreasing. This means that the
network is more sensitive to the deletion of central vertices than of
peripheral ones. This observation is anticipated from earlier studies
518: showing that the Internet is vulnerable to targeted attacks at the
519: vertices of highest degree~\cite{alb:attack} but robust to random
520: failures. This is because the majority of vertices have low
521: $\phi$-values. However, the deletion impact measure can detect more
522: subtle effects in the periphery.
523:
524: The first peak in the $\bar{d}$-distribution is, as mentioned above,
around $\bar{d}=3$. At this distance $\phi$ has decreased by a factor of
a thousand from its value in the core, where $\phi\sim 10^{-2}$. In this quantity we see
527: a substantial difference from the null-model; the peripheral vertices
528: of the inferred networks have significantly lower deletion impact than
529: the peripheral vertices of the null-model networks. This, we believe, is another effect of the high degree of peripheral vertices. The
530: fact that the periphery is relatively highly connected suggests that
531: there are alternate routes that could be used if a regular path is
532: obstructed by a vertex failure. In the case
533: of the Inet model, which has very few vertices of high $\bar{d}$, the
534: peripheral $\phi$ values are quite low because the periphery is well
535: connected to the core. As expected, $\phi=0$ for all vertices in the
536: BA model since all vertices have degree of at least three. The BA model thus produces
networks that are more robust to vertex deletion than the observed networks are.
538:
539: \begin{figure}
540: \resizebox*{0.9 \linewidth}{!}{\includegraphics{clust.eps}}
541: \caption{ Clustering coefficient $C$ as a function of the average
542: distance $\bar{d}$. The panels and symbols represent the
543: same data sets as shown in Fig.~\ref{fig:density}.}
544: \label{fig:clust}
545: \end{figure}
546:
547: \subsection{Clustering coefficient}
548:
549: % THEORY
550: The \textit{clustering
551: coefficient} $C(i)$~\cite{wattsstrogatz} is another frequently studied
552: network property:
\begin{equation}
C(i) = M(\Gamma_i)\Big/\dbinom{k(i)}{2} ,
\end{equation}
where $M(X)$ denotes the number of edges in a subgraph $X$. The
557: clustering coefficient measures how interconnected the neighborhood of
558: a vertex is. One interpretation is that $C(i)$ is the number of
559: connected neighbor pairs rescaled by the theoretical maximum. $C(i)$ can
560: also be seen as the fraction of triangles that $i$ is a member of, normalized
561: to the interval $[0,1]$.
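
As a small worked case (a hypothetical vertex, chosen only for
illustration), suppose $i$ has $k(i)=4$ neighbors and exactly two edges
run between them, so that $M(\Gamma_i)=2$. Then
\[
C(i)=2\Big/\dbinom{4}{2}=\frac{2}{6}=\frac{1}{3}.
\]
Each edge within $\Gamma_i$ closes one triangle through $i$, so this
vertex belongs to two triangles out of a possible six. The formula is
defined only for $k(i)\geq 2$, since $\binom{k(i)}{2}=0$ otherwise.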
562:
563: % OBSERVATIONS
564: In Fig.~\ref{fig:clust} we display the clustering
565: coefficient as a function of the average distance. The curves for the
566: observed graph, null-model, and Inet model networks show a peak around the
567: same point as the peak in the $\bar{d}$-distribution. However, the
568: null-models do not exhibit as high a degree of clustering in the
569: periphery as the inferred networks. In other words, there are more triangles in
570: the periphery than can be expected from only the network's degree
distribution. In fact, in 100 null-model networks based on the AS
'06 network, no triangle contained a vertex with $\bar{d}>3.8$. This
should be compared with 1124 such triangles in
the AS '06 network itself (there are even 83 triangles where all
575: vertices have $\bar{d}>3.8$). This further suggests that
576: the periphery of the observed AS graphs is complex. As
577: triangles represent redundancy (the three vertices will still be
578: connected if any one of the edges are cut) this could help to explain
579: the increased robustness to deletion seen in Section~\ref{sec:delete}. As seen in
580: Fig.~\ref{fig:clust}(b), neither the Inet, nor the BA model predict a
581: significant number of peripheral triangles. The low deletion impact
582: values for peripheral vertices in these models may be
583: attributed to the presence of longer cycles.
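
The triangle counts quoted above are connected to the clustering
coefficient in a simple way. As a sketch, with $t(i)$ and $T$ as
notation introduced here only for illustration, and taking $\Gamma_i$
to be the subgraph induced by the neighbors of $i$: every edge of
$\Gamma_i$ closes one triangle through $i$, and each triangle is
counted once from each of its three vertices, so for $k(i)\geq 2$
\begin{equation}
t(i) = M(\Gamma_i) = C(i)\dbinom{k(i)}{2}, \qquad
T = \frac{1}{3}\sum_i t(i),
\end{equation}
where $t(i)$ is the number of triangles containing $i$ (with $t(i)=0$
for vertices of degree less than two) and $T$ is the total number of
triangles in the network.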
584:
585: \begin{figure}
586: \resizebox*{0.9 \linewidth}{!}{\includegraphics{balance.eps}}
587: \caption{ Distance balance $b$ as a function of the average
588: distance $\bar{d}$. The panels and symbols represent the
589: same data sets as shown in Fig.~\ref{fig:density}.}
590: \label{fig:balance}
591: \end{figure}
592:
593:
594: \subsection{Distance balance}
595:
596: % THEORY
597: In the context of scientific collaboration networks it has been
598: shown~\cite{mejn:scicolpre2} that the number of shortest paths
leaving a vertex via a specific neighbor has a skewed distribution. In
600: other words, most of the shortest paths from a vertex $i$ to the rest
601: of the network traverse a single neighbor of $i$. To rephrase this in
602: terms of the average distance, central
603: vertices are likely to have few neighbors with smaller
$\bar{d}$ values. This leads us to another view of centrality. Let the
\textit{distance balance} $b(i)$ be the fraction of the neighbors $j$ of $i$
with $\bar{d}(j)<\bar{d}(i)$. Clearly one can expect this to be an
increasing function of $\bar{d}$, but is the increase linear?
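
Written out as a formula, in a sketch that treats $\Gamma_i$ as the set
of neighbors of $i$ (an assumption about notation made only for this
illustration), the definition of the distance balance reads
\begin{equation}
b(i) = \frac{\left|\{\, j\in\Gamma_i : \bar{d}(j)<\bar{d}(i) \,\}\right|}{k(i)} .
\end{equation}
For example, a hypothetical vertex with $k(i)=5$ neighbors, four of
which have a smaller average distance than $i$ itself, has
$b(i)=4/5=0.8$.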
607:
608: % OBSERVATIONS
609: In Fig.~\ref{fig:balance} we plot the distance balance as a function
610: of $\bar{d}$. As expected, all of the curves generally increase but
611: not linearly. Almost all the increase from 0 to 1 takes
612: place around the highest peak in Fig.~\ref{fig:density}, which gives
613: another characterization of the core and periphery: in the core, the
614: typical vertex has relatively few neighbors of higher centrality than
615: itself (and vice versa in the periphery). The $b(i)$ values in the
616: peripheral region of all curves approach values close to $1$. In
617: Fig.~\ref{fig:balance}(b) the curves of the observed data are somewhat
lower. This supports the earlier observation, made for
quantities such as degree, neighbor degree, and the
clustering coefficient, that the periphery is structurally less different
from the core than can be expected from random networks
constrained to the degree sequence of the observed networks. As seen
in Fig.~\ref{fig:balance}(c), the Inet model behaves like the
624: null-model---the same observation holds for the average neighbor
625: degree (Fig.~\ref{fig:nbdeg}) and clustering coefficient
(Fig.~\ref{fig:clust}). In contrast to the Inet curve, the BA model's curve
increases more smoothly, which suggests (in accordance with what has
been observed above) a less pronounced core-periphery structure than
in the observed networks.
630:
631:
632: \section{Summary and conclusions}
633:
634: This paper investigated how vertex-specific network
635: measures of the AS level Internet vary with the average distance from
636: a vertex to the other vertices of the graph. This projection of
637: vertices to the space of average distances gives a picture of how the network structure changes from the most central to the most peripheral vertices. Using the
638: distance separation measure we find that there is a well-defined
639: core-periphery dichotomy in the inferred networks. To some extent
640: this can be explained as an effect of the set of degrees of the
641: network---we notice that the average degree as a function of the
642: average distance has the same qualitative form for the observed
643: networks as our null-model networks. However, the
644: periphery is more complex than what is predicted by
degree alone. This is manifested in higher average degree, higher
average neighbor degree, lower deletion impact, higher clustering
coefficient, and lower distance balance in the periphery of the
observed networks than in the null-model networks. To summarize, the
AS graph has a clearer split into a core and a periphery than can be
anticipated from its degree distribution and simple models of
scale-free networks. At the same time, the split is less dramatic and
more nuanced than expected from a strict hierarchy. The additional
network structure in the periphery may have consequences for the
spread of attacks and for methods of defending against them.
Further, the two topology generators that we tested (Inet and the
BA model) could be extended to
model the periphery more accurately.
655:
656:
657: We used two kinds of observed AS data---easily accessible router RIBs
658: and more complete data sets where edges missing from the RIBs are
659: added. The effect of the missing edges is clearly visible: the
peripheries of the RIB-networks (with missing edges) have lower
average degree and fewer triangles, among other differences. On the
662: other hand, the missing links do not change the network structure
663: qualitatively. Our conclusions would be unchanged if we used only the
664: RIB data.
665:
Further modeling and measurement research is needed to
667: elucidate the detailed structure of the core and periphery of the AS
668: graph. Furthermore, the structures should be related to the strategies
669: of AS management~\cite{daub:as,peer:chang,inet}.
670:
671: \acknowledgements{
PH acknowledges financial support from the Wenner-Gren
Foundations. The authors acknowledge the support of the
674: National Science Foundation (grants CCR--0331580 and CCR--0311686),
675: and the Santa Fe Institute.
676: }
677:
678: \begin{thebibliography}{10}
679:
680: \bibitem{ba:rev}
681: R.~Albert and A.-L. Barab\'{a}si.
682: \newblock Statistical mechanics of complex networks.
\newblock \textit{Rev. Mod. Phys.}, 74:47--98, 2002.
684:
685: \bibitem{alb:attack}
686: R.~Albert, H.~Jeong, and A.-L. Barab\'{a}si.
687: \newblock Attack and error tolerance of complex networks.
688: \newblock \textit{Nature}, 406:378--382, 2000.
689:
690: \bibitem{vesp:kcore}
691: J.~I. Alvarez-Hamelin, L.~Dall'Asta, A.~Barrat, and A.~Vespignani.
692: \newblock k-core decomposition: a tool for the analysis of large scale
693: {I}nternet graphs.
694: \newblock e-print cs/0511007.
695:
696: \bibitem{ahs:model}
697: J.~I. Alvarez-Hamelin and N.~Schabanel.
\newblock An {I}nternet graph model based on trade-off optimization.
699: \newblock \textit{Eur. Phys. J. B}, 38:231--237, 2004.
700:
701: \bibitem{ba:model}
702: A.-L. Barab\'{a}si and R.~Albert.
703: \newblock Emergence of scaling in random networks.
\newblock \textit{Science}, 286:509--512, 1999.
705:
706: \bibitem{harary}
707: F.~Buckley and F.~Harary.
708: \newblock \textit{Distance in graphs}.
709: \newblock Addison-Wesley, Redwood City, 1989.
710:
711: \bibitem{mich:as}
712: H.~Chang, R.~Govindan, S.~Jamin, S.~J. Shenker, and W.~Willinger.
\newblock Towards capturing representative {AS}-level {I}nternet topologies.
714: \newblock Technical Report UM-CSE-TR-454-02, Electrical Engineering and
715: Computer Science Department, University of Michigan, 2002.
716:
717: \bibitem{peer:chang}
718: H.~Chang, S.~Jamin, and W.~Willinger.
719: \newblock To peer or not to peer: Modeling the evolution of the {I}nternet's
720: {AS}-level topology.
721: \newblock to appear in Proceedings of SIGCOMM06.
722:
723: \bibitem{daub:as}
724: I.~Daubechies, K.~Drakakis, and T.~Khovanova.
725: \newblock A detailed study of the attachment strategies of new autonomous
726: systems in the {AS} connectivity graph.
727: \newblock \textit{Internet Mathematics}, 2:185--246, 2006.
728:
729: \bibitem{doromen:book}
730: S.~N. Dorogovtsev and J.~F.~F. Mendes.
731: \newblock \textit{Evolution of Networks: From Biological Nets to the Internet and
732: WWW}.
733: \newblock Oxford University Press, Oxford, 2003.
734:
735: \bibitem{hot:inet}
736: J.~C. Doyle, D.~L. Alderson, L.~Li, S.~Low, M.~Roughan, S.~Shalunov, R.~Tanaka,
737: and W.~Willinger.
738: \newblock The ``robust yet fragile'' nature of the {I}nternet.
739: \newblock \textit{Proc. Natl. Acad. Sci. USA}, 102(41):14497--14502, 2005.
740:
741: \bibitem{fkp:model}
742: A.~Fabrikant, E.~Koutsoupias, and C.~H. Papadimitriou.
743: \newblock Heuristically optimized trade-offs: A new paradigm for power laws in
744: the {I}nternet.
\newblock In \textit{Proceedings of the 29th International Conference on Automata,
  Languages, and Programming}, volume 2380 of \textit{Lecture Notes in Computer
  Science}, pages 110--122, Heidelberg, 2002. Springer.
748:
749: \bibitem{f3}
750: M.~Faloutsos, P.~Faloutsos, and C.~Faloutsos.
751: \newblock On power-law relationships of the {I}nternet topology.
752: \newblock \textit{Comput. Commun. Rev.}, 29:251--262, 1999.
753:
754: \bibitem{gale:rew}
755: D.~Gale.
756: \newblock A theorem of flows in networks.
757: \newblock \textit{Pacific J. Math.}, 7:1073--1082, 1957.
758:
759: \bibitem{our:attack}
760: P.~Holme, B.~J. Kim, C.~N. Yoon, and S.~K. Han.
761: \newblock Attack vulnerability of complex networks.
762: \newblock \textit{Phys. Rev. E}, 65:056109, 2002.
763:
764: \bibitem{lee:corr}
765: C.-Y. Lee.
766: \newblock Correlations among centrality measures in complex networks.
767: \newblock e-print physics/0605220.
768:
769: \bibitem{caida:corr}
770: P.~Mahadevan, D.~Krioukov, M.~Fomenkov, B.~Huffaker, X.~Dimitropoulos, K.~C.
771: Claffy, and A.~Vahdat.
772: \newblock Lessons from three views of the {I}nternet topology.
773: \newblock tr-2005-02, Cooperative Association for Internet Data Analysis, 2005.
774:
775: \bibitem{three:mah}
776: P.~Mahadevan, D.~Krioukov, M.~Fomenkov, B.~Huffaker, X.~Dimitropoulos, k~c
777: claffy, and A.~Vahdat.
778: \newblock The {I}nternet {AS}-level topology: Three data sources and one
779: definitive metric.
780: \newblock \textit{ACM SIGCOMM Computer Communications Review}, 36:17--26, 2006.
781:
782: \bibitem{maslov:inet}
783: S.~Maslov, K.~Sneppen, and A.~Zaliznyak.
784: \newblock Detection of topological patterns in complex networks: Correlation
785: profile of the {I}nternet.
786: \newblock \textit{Physica A}, 333:529--540, 2004.
787:
788: \bibitem{meina:pow}
789: A.~Medina, I.~Matta, and J.~Byers.
790: \newblock On the origin of power laws in {I}nternet topologies.
791: \newblock \textit{ACM Computer Communication Review}, 30(2):18--28, 2000.
792:
793: \bibitem{centr:keiko}
794: K.~Nakao.
795: \newblock Distribution of measures of centrality: Enumerated distributions of
  {F}reeman's graph centrality measures.
797: \newblock \textit{Connections}, 13:10--22, 1990.
798:
799: \bibitem{mejn:scicolpre2}
800: M.~E.~J. Newman.
801: \newblock Scientific collaboration networks. {II}. {S}hortest paths, weighted
802: networks, and centrality.
803: \newblock \textit{Phys. Rev. E}, 64:016132, 2001.
804:
805: \bibitem{mejn:rev}
806: M.~E.~J. Newman.
807: \newblock The structure and function of complex networks.
808: \newblock \textit{SIAM Review}, 45:167--256, 2003.
809:
810: \bibitem{vesp:inet}
R.~Pastor-Satorras and A.~Vespignani.
\newblock \textit{Evolution and Structure of the Internet: A Statistical Physics
  Approach}.
\newblock Cambridge University Press, Cambridge, 2004.
815:
816: \bibitem{pas:inet}
817: R.~Pastor-Satorras, A.~V\'{a}zquez, and A.~Vespignani.
818: \newblock Dynamical and correlation properties of the {I}nternet.
819: \newblock \textit{Phys. Rev. Lett.}, 87:258701, 2001.
820:
821: \bibitem{roberts:mcmc}
822: J.~M. {Roberts Jr.}
823: \newblock Simple methods for simulating sociomatrices with given marginal
824: totals.
825: \newblock \textit{Social Networks}, 22:273--283, 2000.
826:
827: \bibitem{sab:clo}
828: G.~Sabidussi.
829: \newblock The centrality index of a graph.
830: \newblock \textit{Psychometrika}, 31:581--603, 1966.
831:
832: \bibitem{rex:infer}
833: L.~Subramanian, S.~Agarwal, J.~Rexford, and R.~H. Katz.
\newblock Characterizing the {I}nternet hierarchy from multiple vantage points.
835: \newblock In \textit{Proc. IEEE INFOCOM}, pages 618--627, New York, 2002. IEEE.
836:
837: \bibitem{ala:hier}
838: A.~Trusina, S.~Maslov, P.~Minnhagen, and K.~Sneppen.
839: \newblock Hierarchy measures in complex networks.
840: \newblock \textit{Phys. Rev. Lett.}, 92:178702, 2004.
841:
842: \bibitem{wattsstrogatz}
843: D.~J. Watts and S.~H. Strogatz.
844: \newblock Collective dynamics of {`small-world'} networks.
845: \newblock \textit{Nature}, 393:440--442, 1998.
846:
847: \bibitem{inet}
848: J.~Winick and S.~Jamin.
849: \newblock Inet-3.0: {I}nternet topology generator.
850: \newblock Technical Report UM-CSE-TR-456-02, Electrical Engineering and
851: Computer Science Department, University of Michigan, 2000.
852:
853: \end{thebibliography}
854:
855:
856: \end{document}
857:
858:
859: