physics0605029/sym.tex
1: \documentclass[rmp,twocolumn,showpacs]{revtex4}
2: 
3: \usepackage{dcolumn,graphicx,amsmath,amssymb,txfonts}
4: 
5: \begin{document}
6: 
7: \title{Detecting degree symmetries in networks}
8: 
9: \author{Petter Holme}
10: \affiliation{Department of Computer Science, University of New Mexico,
11:   Albuquerque, NM 87131, U.S.A.}
12: 
13: \begin{abstract}
14:   The surroundings of a vertex in a network can be more or less
15:   symmetric. We derive measures of a specific kind of symmetry of a
16:   vertex which we call \textit{degree symmetry}---the property that
17:   many paths going out from a vertex have overlapping degree
18:   sequences. These measures are evaluated on artificial and real
19:   networks. Specifically we consider vertices in the human metabolic
20:   network. We also measure the average degree-symmetry coefficient for
21:   different classes of real-world networks. We find that most studied
22:   examples are weakly positively degree-symmetric. The exceptions are
23:   an airport network (having a negative degree-symmetry coefficient)
24:   and one-mode projections of social affiliation networks that are
25:   rather strongly degree-symmetric.
26: \end{abstract}
27: 
28: \pacs{89.75.Fb, 89.75.Hc}
29: % 89.75.Fb -- Structures and organization in complex systems
30: % 89.75.Hc -- Networks and genealogical trees
31: 
32: \maketitle
33: 
34: \section{Introduction}
35: 
36: \begin{figure}
37:   \resizebox*{0.4\linewidth}{!}{\includegraphics{ill.eps}}
38:   \caption{ Illustrations of degree symmetry. Consider
39:     paths of length two (i.e.\ $l=2$). All paths out from the
40:     central (black) vertex have the degree sequence $(3,2)$ meaning
41:     the central vertex has high degree symmetry.}
42:   \label{fig:ill}
43: \end{figure}
44: 
45: With the advent of modern database technology, numerous large-scale
46: network data-sets have been made available. This development has
47: triggered a surge of activity in studies of statistical network
48: properties~\cite{ba:rev,mejn:rev,doromen:book}. The underlying idea of
49: these studies is that the network structure (the way the networks
50: differ from completely random networks) contains some information about the
51: function, both locally and globally, of the network. Hence a common
52: theme in these works has been the development of structural measures
53: to characterize network structure.
54: In this paper we propose and evaluate a measure of a previously
55: unstudied network structure---a special case of symmetry we call
56: \textit{degree symmetry}. In geometry an object
57: is symmetrical if it is invariant to rotations, reflections, and so
58: on. In networks, with no given geometrical embedding, these concepts
59: have to be relaxed. Furthermore, we would like to have a continuous
60: measure saying not only if a vertex is a local center of symmetry or
61: not, but also how symmetric the vertex is. The aspect of symmetry we
62: address is, roughly speaking, that if you look at the object
63: (network in our case) in different ways from a symmetric vertex it
64: still looks the same. The process of ``looking'' will in our case be
65: walking along paths (non-self-intersecting sequences of
66: edges). Furthermore, since degree (number of neighbors) is commonly
67: regarded as the most fundamental quantity relating a vertex to its
68: function, we say two vertices ``look the same'' if they
69: have the same degree. We will thus derive our measure by performing
70: walks along all paths from a vertex and compare the sequence of
71: degrees of the vertices along these paths. The situation we have in
72: mind is depicted in Fig.~\ref{fig:ill}---all paths from the central
73: vertex have degree sequences starting with $(3,2,\cdots)$, thus the
74: central vertex is highly degree symmetric.
75: 
76: The rest of the paper is organized as follows: First we give a
77: detailed derivation of the degree-symmetry coefficient (in two
78: different versions, appropriate for different needs). Then we evaluate
79: these on example networks and a biochemical network. Finally we
80: discuss the average degree symmetry of different classes of real-world
81: networks.
82: 
83: 
84: \section{Derivation of the measure}
85: 
86: We will consider the network represented by a graph $G=(V,E)$ of $N$
87: vertices, $V$, and $M$ edges, $E$. For a vertex $i$ to have high degree
88: symmetry it has, as mentioned, to have many paths with the same
89: sequence of degrees. We will use a cut-off $l$ for the pathlength and
90: consider only paths of that length. The reason
91: for this cutoff is threefold: First, in all (with possibly some
92: curious exception) network processes, a vertex is more affected by its closest
93: surroundings than by vertices further away. Thus one would like to have a
94: lower weight on the contribution from distant vertices. Second, the
95: number of vertices $n$ steps away grows fast
96: with the distance from $i$. For finite networks this means that the
97: paths soon reach the periphery of the network where unwanted
98: finite-size effects set in. Third, for computation speed, one benefits
99: from a cutoff.
100: 
101: \begin{figure}
102:   \resizebox*{0.8\linewidth}{!}{\includegraphics{def.eps}}
103:   \caption{ Illustration of concepts in the derivation of the
104:     degree symmetry coefficient. (a) illustrates the branching
105:     number. Consider paths of length three out from $i$. The branching
106:     number of the path $(i,j)$ is five (there are five paths from $i$
107:     of length three that go through $j$). The branching number at
108:     $j'$ is two. (b) shows the set $\Delta(P,i)$, where $P$ is the
109:     path $(i,j,j')$.
110: }
111:   \label{fig:def}
112: \end{figure}
113: 
114: Assume there are $p$ paths of length $l$ from a vertex $i$. We then
115: denote the degree sequences of these paths
116: \begin{eqnarray}
117:   Q_l(i)&=&\Big\{[k(v^1_{1,i,l}),\cdots,k(v^l_{1,i,l})],\nonumber\\
118: &&\vdots\\
119:  && ,[k(v^1_{p,i,l}),\cdots,k(v^l_{p,i,l})]\Big\},\nonumber
120: \end{eqnarray}
121: where $k(v)$ denotes the degree
122: of a vertex $v$ and $v^j_{m,i,l}$ is the $j$'th vertex along the
123: $m$'th path of length $l$ leading out from $i$. Then if there are
124: unexpectedly many vertices at the same ($j$-) index in the sequence
125: with the same degree, the vertex $i$ is a local center of degree
126: symmetry. A rough symmetry measure would thus be to count the fraction
127: of index-pairs with the same degree, i.e.\
128: \begin{equation}\label{eq:para}
129:   \frac{\tilde{s}_l(i)}{\Lambda}=\frac{1}{\Lambda}\sum_{1\leq n<n'\leq p}\sum_{j=1}^l
130:     \delta\big(k(v^j_{n,i,l}), k(v^j_{n',i,l})\big),
131: \end{equation}
132: where
133: \begin{equation}\label{eq:lambda}
134: \Lambda=(l-1)\:\dbinom{p}{2}\mbox{~~and~~} \delta(x,y)=\left\{
135: \begin{array}{cl} 1 & \mbox{if $x=y$}\\ 0 &\mbox{if $x\neq
136:     y$}\end{array}
137: \right. .
138: \end{equation}
139: This measure is very crude and lacks many desired statistical
140: features. For example, all paths that go
141: via a particular neighbor of $i$ will give a contribution to the
142: sum. In practice this means that a vertex with a high-degree
143: vertex rather far from itself (but closer than $l$) will trivially
144: have a high $\tilde{s}_l(i)/\Lambda$. A first step would thus be to omit the
145: contribution of vertices occurring in many sequences of $Q_l(i)$ at a
146: specific index. I.e., for all $l'\in (0,l)$ one wants to exclude the
147: terms
148: \begin{equation}\label{eq:para2}
149:   \sum_{n,n'} \delta\big(k(v^{l'}_{n,i,l}),
150:   k(v^{l'}_{n',i,l})\big),
151: \end{equation}
152: where $n$ and $n'$ are indices of paths that are identical in the
153: first $l'$ steps, from Eq.~(\ref{eq:para}). Let $S_l(i)$ denote the
154: number of such terms.
155: 
156: 
157: To calculate $S_l(i)$ consider a path $P=(i,\cdots,j)$ of length $l'<l$. Let
158: $b_l(P,i)$ be the number of paths from $i$ of length $l$ that start
159: with the path $P$. We call $b_l(P,i)$ the \textit{branching number} of
160: $P$, see Fig.~\ref{fig:def}(a). All pairs of paths starting with $P$
161: will contribute to $\tilde{s}_l(i)$ a distance $l'$ from $i$ (since
162: they all pass
163: through $j$). Let $\Delta(P,i)$ be the set of neighbors of
164: $j$ that are not on the path $P$ from $i$ to $j$, see Fig.~\ref{fig:def}(b). (The number of
165: elements in $\Delta(P,i)$ is thus $k_j-1$.) 
166: This situation gives a contribution
167: \begin{equation}\label{eq:cont}
168:   S_l(P,i) = \dbinom{b_l(P,i)}{2}+ \sum_{j'\in\Delta(P,i)}S_l((P,j'),i)
169: \end{equation}
170: from vertices of indices in the interval $[l',l]$ of $Q_l(i)$ to
171: $\tilde{s}_l(i)$, where $(P,j')$ denotes the path $(i,\cdots,j,j')$.
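
As a hedged illustration of Eq.~(\ref{eq:cont}), take $l=2$ and the
central vertex of Fig.~\ref{fig:ex}(a), written $i$ here. We assume
(our reading, not an additional definition) that $S_l(i)$ is obtained
by summing $S_l((i,j),i)$ over the neighbors $j$ of $i$, and that each
of the four neighbors of $i$ leads on to two further vertices. A path
of full length $l$ has branching number one and $\binom{1}{2}=0$, so
the recursion stops after one step:
\[
  S_2(i)=\sum_{j}\dbinom{b_2((i,j),i)}{2}=4\dbinom{2}{2}=4 ,
  \qquad p=\sum_{j}b_2((i,j),i)=8 ,
  \qquad \Lambda=(2-1)\dbinom{8}{2}=28 .
\]
These are the values of $S_2$ and $\Lambda$ used for this vertex in
the discussion of the small test graphs.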
172: 
173: 
174: To further improve the
175: measure one would like to, assuming some null-model, subtract the
176: expected random contribution to $\tilde{s}_l(i)/\Lambda$. If this can be
177: achieved one would have a symmetry coefficient $s_l(i)$ that is zero when
178: the symmetry is what can be expected from the null-model, larger if $i$
179: is a center of unexpectedly high symmetry, and less than zero if $i$ is
180: degree anti-symmetric. A final symmetry coefficient could thus
181: be written
182: \begin{equation}\label{eq:proto}
183: s_l(i)=\frac{\tilde{s}_l(i)- S_l(i)}{\Lambda- S_l(i)}-\nu , \mbox{~~
184:   provided $\Lambda > S_l(i)$}
185: \end{equation}
186: where $\nu$ is the expected value of $(\tilde{s}_l(i)- S_l(i)) / (\Lambda-
187: S_l(i))$ in a null-model. $\Lambda= S_l(i)$ can only happen if there is
188: one or no path of length $l$. In both these cases the degree-symmetry
189: concept makes no sense so, if $\Lambda =
190: S_l(i)\in\{0,1\}$, we set $s_l(i)=0$.
191: The null-model we assume is a random network
192: constrained only by the degree distribution of the network. I.e., given the
193: fraction $p_k$ of $k$-degree vertices the network is as random as
194: possible. As it turns out $\nu$ is tricky to calculate
195: analytically. There are two ways to proceed---either one calculates
196: an approximative $\nu$ or one obtains $\nu$ via averaging
197: $(\tilde{s}_l(i)- S_l(i)) / (\Lambda- S_l(i))$ over realizations of the
198: null-model. Besides being more accurate, the latter approach has the
199: advantage of giving an error estimate of $s_l(i)$---one can by
200: specifying a
201: p-value define significantly symmetric, or anti-symmetric,
202: vertices. We will use both approaches: The approximative method for
203: analyzing example networks and the numerical method for
204: analyzing real-world data.
205: 
206: 
207: We obtain an approximative value of $\nu$,  $\nu^\mathrm{app.}$, by
208: assuming $\nu$ is
209: approximately equal to the probability that a pair of vertices,
210: reached by walking along paths, is the same.
211: Note that, since there are $k$ ways into a
212: degree-$k$ vertex, when following a path the probability to reach a
213: degree-$k$ vertex is
214: \begin{equation}\label{eq:kpk}
215:   \frac{kp_k}{\sum_{k'} k'p_{k'}} = \frac{kp_k}{\langle k\rangle}.
216: \end{equation}
217: Thus the probability $\nu^\mathrm{app.}$ that two vertices of
218: the same degree are reached by following different paths is 
219: \begin{equation}\label{eq:nu}
220:   \nu^\mathrm{app.}=\sum_kp_k\left(\frac{kp_k}{\langle k\rangle}\right)^2 =
221:   \frac{1}{\langle k\rangle^2} \sum_kk^2p_k^3.
222: \end{equation}
223: One reason this approach is not exact is that the number of terms in
224: the expression for $\tilde{s}_l(i)$ increases with the degree of the
225: $j$ in $\Delta(P,i)$ of Eq.~(\ref{eq:cont}). There are other
226: higher-order effects related to other correlations between the path
227: structure and the degree of the vertices.
228: 
229: To summarize we have two measures of local vertex symmetry, one
230: approximative:
231: \begin{equation}\label{eq:app}
232:   s^\mathrm{app.}_l(i)=\frac{\tilde{s}_l(i)- S_l(i)}{\Lambda-
233:     S_l(i)}-\frac{1}{\langle k\rangle^2} \sum_kk^2p_k^3 ,
234: \end{equation}
235: and one based on Monte Carlo sampling:
236: \begin{equation}\label{eq:mc}
237:   s^\mathrm{MC}_l(i)=\frac{\tilde{s}_l(i)- S_l(i)}{\Lambda-
238:     S_l(i)}-\left\langle \frac{\tilde{s}_l(i)- S_l(i)}{\Lambda-
239:     S_l(i)}\right\rangle .
240: \end{equation}
241: The sampling is conveniently done by randomly rewiring the edges of the
242: original network~\cite{roberts:mcmc}.
243: 
244: \section{Algorithm}\label{sect:algo}
245: 
246: The heart of the algorithm, as suggested in the previous section, is a
247: depth-first search with depth $l$. When returning along the traced
248: out paths the branching number can be calculated recursively through
249: \begin{equation}
250:   b_l(P,i) = \left\{\begin{array}{ll} 1 & \mbox{if $P$ has length $l$}\\
251:    \sum_{j'\in\Delta(P,i)}b_l((P,j'),i) &  
252:   \mbox{otherwise}\end{array}\right. . \label{eq:bn}
253: \end{equation}
254: $S_l(P,i)$ can be calculated simultaneously using
255: Eq.~(\ref{eq:cont}). A slight complication is that the same vertex may
256: appear in different branches of the depth-first search while calculating
257: $b$ and $\tilde{s}$. For small cut-off values this is easy to handle:
258: For $l=2$ it does not affect the calculation at all. For $l=3$ one
259: would only have
260: to keep different depths (of Eqs.~(\ref{eq:cont}) and (\ref{eq:bn}))
261: separate. For the calculation of $\tilde{s}_l(i)$ the terms
262: of $Q_l(i)$ have to be stored. Since the number of paths $p$ grows
263: fast with $l$, this can be quite a constraint for a large
264: $l$. Luckily it suffices to store a histogram $h(l',k)$ counting the
265: number of vertices of degree $k$ at position $l'$ of the paths
266: $Q_l(i)$. $p$ (and thus $\Lambda$) can be calculated as the number of
267: times the depth $l$ of the depth-first search is reached. The running
268: time of the algorithm is $O(p)$. A mean field
269: approximation for networks with few triangles gives
270: $O(p) \approx O(\langle k\rangle^l)$.
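
A short sketch of why the histogram suffices, assuming $h(l',k)$
counts every one of the $p$ paths exactly once at each position: at a
fixed position $l'$, every unordered pair of paths whose vertices
share the degree $k$ contributes one to the pair sum, so
\[
  \sum_{1\leq n<n'\leq p}
  \delta\big(k(v^{l'}_{n,i,l}),k(v^{l'}_{n',i,l})\big)
  =\sum_k\dbinom{h(l',k)}{2} ,
  \qquad \sum_k h(l',k)=p .
\]
Only one counter per position and degree therefore needs to be kept,
rather than the full list of degree sequences.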
271: 
272: \section{Extensions and considerations}
273: 
274: The method outlined above can quite straightforwardly be extended to
275: networks with directed edges, distinct types of edges or (integer) edge
276: weights.
277: 
278: Imagine a network with $z$ different edge sets $E_1,\cdots,E_z$. Such
279: networks frequently occur in cellular biochemistry---e.g.\ protein
280: interaction networks where different types of protein interaction can
281: be recorded~\cite{hh:pfp}, or gene regulation networks where
282: the edges can be activating or inhibitory. One sensible way to extend
283: the above procedure is to use the union of the edge sets as the
284: graph but to say that two vertices in $Q_l(i)$ are identical if
285: their degrees with respect to all of the edge sets are the
286: same. To formalize this, $Q_l(i)$ would be generalized to
287: \begin{eqnarray}
288:   Q_l(i)&=&\Big\{[\mathbf{k}(v^1_{1,i,l}),\cdots,
289:   \mathbf{k}(v^l_{1,i,l})],\nonumber\\
290: &&\vdots\\
291:  && ,[\mathbf{k}(v^1_{p,i,l}),\cdots,
292:  \mathbf{k}(v^l_{p,i,l})]\Big\},\nonumber
293: \end{eqnarray}
294: where $\mathbf{k}(v)$ is a vector with $v$'s degrees with respect to
295: the different edge-types,
296: and the $\delta$-function of Eq.~(\ref{eq:para2}) would be one if the
297: arguments are equal at all their indices, and zero otherwise. The
298: $\nu^\mathrm{app.}$ has to be redefined too:
299: \begin{equation}
300:   \nu^\mathrm{app.} =
301:   \frac{1}{\langle k\rangle^2} \sum_{k',k''}k'p_{k'}\,k''p_{k''}
302:   \prod_{i=1}^z\sum_{j=1}^zp_i(k_j|k')p_i(k_j|k''),
303: \end{equation}
304: where $p_i(k|k')$ is the conditional probability that a vertex
305: has degree $k$ with respect to edge set $E_i$ given that its degree
306: in the union network is $k'$. The case of a directed network can be
307: treated similarly---one considers paths following edges in both
308: directions but a vertex pair gives a contribution to $\tilde{s}$ only
309: if both the in- and out-degrees are the same.
310: 
311: 
312: The approach of Sect.~\ref{sect:algo} can straightforwardly be applied
313: to networks where
314: multiple edges are allowed. Since multiple edges can be used to model
315: weighted graphs~\cite{mejn:wei} the generalization to weighted graphs
316: (at least where edge-weights represent the probability of following an
317: edge) is simple. The other aspect of multigraphs, self-edges, is
318: trivially dealt with---by the requirement that paths should not
319: intersect themselves, a self-edge will never be followed and can thus
320: be omitted already when the graph is constructed.
321: 
322: 
323: The overlap required for a vertex pair to be considered equal in the
324: calculation of the symmetry coefficient is rather strict. Sometimes one
325: would like to treat two paths as similar even if their degrees differ
326: slightly. Particularly, this applies to broad degree
327: distributions. The functional difference between degree-2 and degree-3
328: vertices may be significant; but whether a vertex has degree 1002 or
329: 1003 probably does not matter. To achieve such a relaxation one can
330: construct an integer sequence $K_1<K_2<\cdots$ and let
331: \begin{equation}\label{eq:ks}
332:   \delta(k,k') =\left\{\begin{array}{cl} 1 & \mbox{if $K_i\leq
333:   k,k'<K_{i+1}$ for some $i$}\\ 0 &
334:   \mbox{otherwise}\end{array}\right. .
335: \end{equation}
336: I.e., one constructs a series of equivalence classes of vertices.
337: For a power-law, or similarly broad, degree distribution one can let
338: $K_{i+1}-K_i$ increase exponentially with $i$. In this case one also
339: has to modify the definition of $\nu^\mathrm{app.}$:
340: \begin{equation}
341:   \nu^\mathrm{app.}=\frac{1}{\langle k\rangle^2}\sum_i \left(\sum_{K_i\leq
342:   k<K_{i+1}} p_k\right) \left(\sum_{K_i\leq k<K_{i+1}} k p_k\right)^2 .
343: \end{equation}
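
Two quick checks of this expression, offered as illustrations rather
than as part of the derivation. First, if every class holds a single
degree ($K_{i+1}=K_i+1$), both inner sums reduce to one term each and
Eq.~(\ref{eq:nu}) is recovered. Second, for the degree distribution
$p_2=8/13$, $p_3=4/13$, $p_4=1/13$ and the hypothetical classes
$[2,4)$ and $[4,8)$, one has $\langle k\rangle=32/13$ and
\[
  \nu^\mathrm{app.}=\left(\frac{13}{32}\right)^2
  \left[\frac{12}{13}\left(\frac{28}{13}\right)^2
  +\frac{1}{13}\left(\frac{4}{13}\right)^2\right]
  =\frac{589}{832}\approx 0.708 .
\]
This is larger than the value $165/832\approx 0.198$ that
Eq.~(\ref{eq:nu}) gives for the same distribution without binning, as
one would expect when coarser classes make matches more likely.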
344: 
345: \begin{figure}
346:   \resizebox*{\linewidth}{!}{\includegraphics{ex.eps}}
347:   \caption{ Degree symmetries of small example networks. (a) is
348:     consistent with the example Fig.~\ref{fig:ill}(a). (b) is an
349:     example of a graph with only positive degree symmetries. (c) shows
350:     a graph with only negative degree symmetries. The cut-off length
351:     $l=2$ is used.}
352:   \label{fig:ex}
353: \end{figure}
354: 
355: 
356: \section{Degree symmetries of example networks}
357: 
358: In this section we evaluate the measure for example networks and
359: real-world networks. We will use the smallest non-trivial cut-off
360: $l=2$ throughout this section. Most conclusions hold for $l=3$ or
361: $4$.
362: 
363: \subsection{Small test graphs}
364: 
365: To get a feeling for the $s_l$ measure we start by considering a few
366: small test networks, see Fig.~\ref{fig:ex}. In Fig.~\ref{fig:ex}(a) we
367: display a network with the same degree symmetry, with respect to the
368: central vertex (triangle), as Fig.~\ref{fig:ill}. As expected the
369: central vertex has a strong degree symmetry coefficient. To carry
370: through the calculation of Eq.~(\ref{eq:app}) once, we first obtain the
371: degree distribution $p_2=8/13$, $p_3=4/13$ and
372: $p_4=1/13$ giving $\nu^\mathrm{app.}=165/832\approx0.198$. All length-2 paths
373: out from the central vertex have the degree sequence $(3,2)$ so
374: $\tilde{s}_2(\triangle) = 28$, $S_2(\triangle) = 4$ and
375: $\Lambda= 28$ giving $s_2^\mathrm{app.}(\triangle)=667/832\approx0.802$. The
376: degree-3 vertices (squares) have two degree sequences of their outgoing
377: paths $(4,3)$ and $(2,2)$, whereas paths from degree-2 vertices
378: (triangles) have degree sequences $(3,4)$ and $(2,3)$. This difference
379: is larger than expected from the null model (random networks with
380: eight degree-2 vertices, four degree-3 vertices and one degree-4
381: vertex), thus the negative $s_2$ values for these vertices.
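
The arithmetic behind the value of $\nu^\mathrm{app.}$ quoted above
can be spelled out as follows. The mean degree is
$\langle k\rangle=(2\cdot 8+3\cdot 4+4\cdot 1)/13=32/13$, so
Eq.~(\ref{eq:nu}) gives
\[
  \nu^\mathrm{app.}=\left(\frac{13}{32}\right)^2
  \left[4\left(\frac{8}{13}\right)^3+9\left(\frac{4}{13}\right)^3
  +16\left(\frac{1}{13}\right)^3\right]
  =\frac{169}{1024}\cdot\frac{2640}{2197}=\frac{165}{832} .
\]
Since every pair of paths from the central vertex that is not
excluded has matching degrees, the first term of Eq.~(\ref{eq:app})
equals one, and $s_2^\mathrm{app.}(\triangle)=1-165/832=667/832$.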
382: 
383: In Fig.~\ref{fig:ex}(b) we show a graph where all vertices have
384: positive degree-symmetry coefficient. Paths from degree-2 vertices
385: have only the degree sequence $(3,2)$ and paths from degree-3 vertices
386: have only the degree sequence $(2,3)$. Thus, for every vertex, the
387: view of degrees along the path out to the rest of the network is the
388: same no matter which direction one looks in from that vertex. A
389: radically different view is seen in Fig.~\ref{fig:ex}(c). In this case
390: the vertices have three distinct positions in the network. The vertices marked
391: with squares have degree two and four outgoing paths of degree
392: sequences $(2,4)$, $(4,4)$, $(4,2)$ and $(4,2)$. The circles, despite
393: their different network position (as being part of triangles), have
394: the same set of degree sequences for their paths of length two. The
395: degree-4 vertices have six length-2 paths: three having the degree
396: sequence $(2,2)$, three having degree sequence $(4,2)$. It is easy to
397: convince oneself that this is close to as dissimilar as a network with four
398: degree-2 and two degree-4 vertices can be. Consequently all vertices
399: have negative degree-symmetry indices. It is worth pointing out that
400: the graph of Fig.~\ref{fig:ex}(c) possesses other symmetries than
401: degree-symmetry. The layout has, for example, reflection symmetry along
402: a vertical axis. We emphasize that such symmetries would need to be
403: captured by other measures.
404: 
405: \subsection{Regular networks}
406: 
407: If all vertices have the same degree a network is called
408: \textit{regular}~\cite{janson}. Then by definition all paths are
409: known to fully overlap. This trivial overlap should be canceled in
410: our symmetry measure so $s_l(i)=0$ for all $l$ and $i$. Since
411: $S_l(i)$ is the number of terms in $\tilde{s}_l(i)$ and all
412: these terms are one we have $S_l(i)=\tilde{s}_l(i)=\Lambda$.
413: Furthermore, $\nu^\mathrm{app.}=1$ which gives $s_l(i)=0$ for all
414: vertices and cut-off lengths.
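
Explicitly, for a network in which every vertex has the same degree
$k_0$ (a label introduced here for illustration), $p_{k_0}=1$ and
$\langle k\rangle=k_0$, so Eq.~(\ref{eq:nu}) reduces to
\[
  \nu^\mathrm{app.}=\frac{1}{k_0^2}\,k_0^2\,p_{k_0}^3=1 .
\]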
415: 
416: \subsection{Random graphs}
417: 
418: \begin{figure}
419:   \resizebox*{0.7\linewidth}{!}{\includegraphics{er.eps}}
420:   \caption{ The average approximative symmetry coefficient for $l=3$
421:     and random graphs with $M=2N$. The line is a fit to a power-law
422:     decay form ($0.124+0.435N^{-1.02}$, to be exact).}
423:   \label{fig:er}
424: \end{figure}
425: 
426: Next we evaluate the average approximative symmetry coefficient
427: $\langle s^\mathrm{app.}\rangle$ for random
428: graphs~\cite{janson}---graphs obtained by successively adding $M$ edges
429: between $N$ vertices with the restriction that no multiple edge, or
430: self-edge, may occur. Such networks have no correlations at all and
431: can serve as a reference point for neutrality~\cite{mejn:rev}. Ideally
432: we would like such networks to, on average, have a degree-symmetry
433: coefficient of zero. As seen in Fig.~\ref{fig:er}, $\langle
434: s^\mathrm{app.}_l\rangle$ converges to a small but positive value.
435: The decay is roughly inversely proportional to $N$---the same scaling as
436: the fraction of triangles in the network---which suggests that the
437: presence of triangles, and perhaps other short-cycles, is an important
438: source of finite size effects of $s^\mathrm{app.}_l$.
439: We conclude that the Monte Carlo sampling measure $s^\mathrm{MC}_l$
440: (or a more elaborate measure) is 
441: needed if one wants to compare different networks. If, on the other
442: hand, one aims to compare different vertices of the same network the
443: faster $s^\mathrm{app.}_l(i)$ calculation is sufficient. This is not
444: an uncommon situation in the design of network measures. Another
445: example of a measure whose neutral value is non-zero in the large-$N$ limit is
446: \textit{modularity}, measuring how well a network divides into subgraphs that
447: are densely connected within but not between each
448: other~\cite{gui:mod}.
449: 
450: 
451: \section{Degree symmetries of real networks}
452: 
453: In this section we apply our measures to real-world networks. First we
454: take a look at the symmetry coefficients of specific vertices in the
455: metabolic network of humans, then we look at the average symmetry
456: coefficients of various classes of networks.
457: 
458: \subsection{Human metabolic networks}\label{sect:meta}
459: 
460: \begin{figure}
461:   \resizebox*{0.9 \linewidth}{!}{\includegraphics{hsa.eps}}
462:   \caption{ The 2-neighborhood of spermine---a vertex with high
463:     degree-symmetry---(a), and C04850---a vertex with low degree
464:     symmetry---(b), in the human metabolic network. The symbols
465:     indicate the equivalence classes
466:     defined by exponentially growing intervals. Filled circles have
467:     degree two, unfilled circles have degree four or five, and a vertex
468:     symbolized by an
469:     $n$-gon has degree in the interval $[2^n,2^{n+1})$. 
470:     In case the chemical names are overly long the
471:     KEGG codes are given (``C'' and five digits):  C07282 represents
472:     eIF5A-precursor-deoxyhypusine, C04850 represents
473:     1,3-$\beta$-D-galactosyl-($\alpha$-1,4-L-fucosyl)-N-acetyl-D-glucosaminyl-R,
474:     C04556 represents 4-amino-2-methyl-5-phosphomethylpyrimidine,
475:     C04467 represents $\alpha$-L-fucosyl-1,2-$\beta$-D-galactosyl-R
476:     and C01311 represents
477:     1,4-$\beta$-D-galactosyl-($\alpha$-1,3-L-fucosyl)-N-acetyl-D-glu\-cos\-aminyl-R. }
478:   \label{fig:hsa}
479: \end{figure}
480: 
481: An important use of statistical graph theory is to characterize
482: chemical reaction networks. Of many possible network
483: representations~\cite{zhao:meta} we let vertices be chemical
484: substances, and for all reactions of an organism we link
485: substrates with products. For example, the hypothetical reaction
486: $\mathrm{A}+ \mathrm{B} \longleftrightarrow \mathrm{C}+\mathrm{D}$ would
487: contribute with the edges $(\mathrm{A},\mathrm{C})$,
488: $(\mathrm{A},\mathrm{D})$ and  $(\mathrm{B},\mathrm{C})$,
489: $(\mathrm{B},\mathrm{D})$ to the metabolic network. The data is
490: derived from the KEGG database
491: (\url{http://www.genome.jp/}), and described in detail in
492: Ref.~\cite{our:bio}. Since the degree distributions of metabolic
493: networks are highly skewed~\cite{jeong:meta}
494: we use an exponentially increasing set of intervals as equivalence
495: classes (as discussed in the context of Eq.~(\ref{eq:ks})): $K_n=2^n$.
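
With this choice the class of a vertex of degree $k\geq 1$ can be
labeled by the index
\[
  n(k)=\lfloor\log_2 k\rfloor ,
\]
where $n(k)$ is notation introduced here for illustration only.
Degrees 4 to 7 thus share the class $n=2$, i.e.\ the interval $[4,8)$,
and degrees 8 to 15 share the class $n=3$, i.e.\ $[8,16)$.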
496: 
497: It has been argued that degree is strongly related to the function of
498: the chemical substance~\cite{jeong:meta,gui:meta}. This means that the degree
499: symmetry potentially can give additional information about the function of the vertices. For the
500: human metabolic network, and $l=2$, roughly half of the vertices have
501: a p-value of less than 5\% (i.e., in the null-model sampling of the
502: calculation of $s_2^\mathrm{MC}$, less than 5\% or more than 95\% of
503: the values of
504: \begin{equation}
505: \frac{\tilde{s}_l(i)- S_l(i)}{\Lambda -  S_l(i)}
506: \end{equation}
507: are smaller than the value of the real network). In Fig.~\ref{fig:hsa}(a)
508: we show the 2-neighborhood of one vertex with significantly higher
509: $s_2^\mathrm{MC}$ than expected; Fig.~\ref{fig:hsa}(b)
510: depicts the 2-neighborhood of a vertex with significantly lower
511: $s_2^\mathrm{MC}$. The reason these particular vertices are used as examples is
512: that their 2-neighborhoods are of appropriate sizes, neither too big,
513: nor too small, to be displayed and described. Spermine,
514: Fig.~\ref{fig:hsa}(a), is a substance with high
515: degree-symmetry---$s_2^\mathrm{MC} = 0.89\pm0.02$. Both its neighbors are
516: in the same degree-equivalence class of vertices with degree four to
517: seven. Among the vertices two steps away from spermine there is also a
518: significant overlap, with two (out of four) neighbors of the neighbor
519: spermidine being in the equivalence class defined by degrees in the
520: interval $[8,16)$; whereas two vertices are in the equivalence class
521: of degrees in
522: $[4,8)$. The three paths from spermine via S-adenosylmethioninamine
523: also contribute to the overlap in the two steps from spermine as two
524: vertices (methylthioadenosine and spermidine) have degrees in the
525: same equivalence class. The neighborhood of C04850, seen in
526: Fig.~\ref{fig:hsa}(b), is visually less balanced and also has a
527: negative degree-symmetry---$s_2^\mathrm{MC} = -0.11\pm0.01$. We note that
528: there are some vertex pairs in the second neighborhood whose
529: degree-classes overlap, but apparently this is not enough to make the
530: symmetry coefficient non-negative.
531: 
532: \subsection{Average symmetry values}
533: 
534: \begin{table*}
535: \caption{\label{tab:avg} The network sizes $N$ and $M$ and the average
536:   numerical degree-symmetry coefficient $s_2^\mathrm{MC}$ of
537:   real-world networks. In the interstate
538:   network the vertices are American interstate highway junctions and
539:   two junctions are connected if there is a road with no junction in
540:   between. In the street networks the vertices are Swedish city-street
541:   segments connected if they share a junction.
542:   In the airport network (obtained from
543:   http://vlado.fmf.uni-lj.si/pub/networks/pajek/data/gphs.htm) the
544:   vertices are American airports and edges represent a regular, non-stop
545:   route. In the citation
546:   networks the vertices are papers and two papers are connected if
547:   one cites the other. The ``scientometrics'' network consists of
548:   papers from the journal \textit{Scientometrics}. The ``small-world''
549:   network consists of all papers citing Ref.~\cite{milg:1} or having the
550:   phrase ``small world'' in the title. (The citation networks were
551:   obtained from
552:   http://vlado.fmf.uni-lj.si/pub/networks/data/cite/. These networks
553:   are the result of searches in the Web of Science, used with
554:   the permission of ISI Philadelphia.) The board of directors and
555:   Ajou student networks are derived from one-mode projections of
556:   affiliation networks (where edges go from persons to corporate
557:   boards and university classes respectively). The Ajou student
558:   network is averaged over graphs of 16 semesters. One edge
559:   represents two students taking at least three classes together that
560:   semester. The high school networks are gathered from
561:   questionnaires---an edge means that two persons have listed each
562:   other as acquaintances. It is averaged over 84 individual
563:   schools. In the electronic communication networks one edge
564:   represents that at least one of the vertices has contacted the other
565:   over some electronic medium. The food webs are networks of
566:   water-living species and an edge means that one species preys on the
567:   other. For the protein networks an edge means that two proteins
568:   interact (the two graphs correspond to two different
569:   types of experiments determining the interaction edges). The
570:   metabolic networks consist of chemical substances and edges are
571:   constructed as described in Sect.~\ref{sect:meta}. Values for
572:   animal metabolism are averaged over six networks, those for fungi
573:   metabolism over two, and those for bacteria metabolism over 96
574:   networks.
575: }
576: \begin{ruledtabular}
577:  \begin{tabular}{lrc|ccc}
578:   \multicolumn{2}{c}{network} & Ref. & $N$ & $M$ &
579:   $s_2^\mathrm{MC}$\\\hline
580:   geographical networks & interstate highways & & 935 & 1315 &
581:   $0.016\pm 0.003$ \\
582:     & streets, Stockholm & \cite{rosv:city} & 3325 & 5100 & $0.014\pm
583:     0.003$ \\
584:     & streets, Malm\"{o} & \cite{rosv:city} & 1868 & 3026 & $0.020\pm
585:     0.003$ \\
586:     & streets, G\"{o}teborg & \cite{rosv:city} & 1258 & 1516 &
587:   $0.026\pm 0.003$\\
588:    & airport & & 332 & 2126 & $-0.0573\pm 0.0002$ \\\hline
589:     citation networks & scientometrics &  & 2728 & 10398 &
590:     $0.015\pm 0.020$\\
591:     & small-world &  & 233 & 994 &
592:     $0.007\pm 0.002$\\\hline
593:     one-mode projections of & board of directors & \cite{davis} & 6193 &
594:     43074 & $0.175 \pm 0.004$ \\ 
595:     affiliation networks & Ajou University students &
596:     \cite{our:ajou2}& $7285 \pm 128$
597:     & $75898\pm 6566$& $0.13\pm 0.01$ \\\hline
598:     acquaintance networks & high school friendship & \cite{addh} &
599:     $571\pm 43$ &  $1104\pm60$ &$0.020\pm 0.002$ \\\hline
600:     electronic communication networks& e-mail & \cite{eckmann:dialog} & 3186 &
601:     31856 &  $-0.01\pm 0.01$\\
602:     & Internet community & \cite{pok} & 28295 & 115335
603:     & $0.01898\pm 0.0001$ \\\hline
604:     food webs & Little Rock lake & \cite{martinez:rock} & 92 & 960 &
605:      $0.042\pm 0.001$ \\
606:     & Ythan estuary & \cite{ythan1} &  134 & 593 &  $0.027 \pm
607:     0.002$\\\hline
608:     neural network & \textit{C.\ elegans} & \cite{cenn:brenner} & 280
609:     & 1973 & $0.0839\pm 0.0001$ \\\hline
610:     biochemical networks & \textit{S.\ cerevisiae} protein &
611:     \cite{pagel:mips,hh:pfp} & 4580 & 7434
612:     &  $0.0205 \pm 0.0001$\\
613:     & \textit{S.\ cerevisiae} genetic & \cite{pagel:mips,hh:pfp} & 4580
614:     & 5129 & $0.0996\pm 0.0001$ \\
615:     & animal metabolism & \cite{our:bio} & $1621\pm 123$& $4662\pm
616:     473$ & $0.02\pm 0.01$ \\
617:     & plant metabolism, \textit{A.\ thaliana} & \cite{our:bio} & 1561 &
618:     4302 & $0.0133\pm 0.0003$ \\
619:     & fungi metabolism & \cite{our:bio} & $1281\pm 97$& $3654 \pm 289$&
620:     $0.03 \pm 0.02$ \\
621:     &  bacteria metabolism & \cite{our:bio} & $1070 \pm 35$ & $2776\pm
622:     109$ & $0.018\pm 0.002$
623:     \\
624:   \end{tabular}
625: \end{ruledtabular}
626: \end{table*}
627: 
628: So far we have discussed degree symmetries of vertices. In this
629: section we average $s_l$ over $V$ to obtain
630: a graph-wide measure for degree symmetry. In Table~\ref{tab:avg} we
631: display values of
632: $s_2^\mathrm{MC}$ for a number of different network types. Some of
633: these have highly skewed degree distributions. For these,
634: the exponentially increasing degree equivalence classes of
635: Sect.~\ref{sect:meta} are appropriate. Since we intend to compare all
636: networks we use the same equivalence classes for all networks. The
637: first observation is that almost all networks have a positive average
638: symmetry coefficient. The only clear exception is the airport
639: network. This means that if you start a two-leg airplane
640: trip at a particular airport, choosing between two random itineraries
641: (without caring about the frequency of flights), then the probability
642: of the airports along these itineraries being different in number of
643: connections is larger than in a random network. The strongest
644: degree-symmetries are found in one-mode projections of social
645: affiliation networks. Note that the other social networks, derived
646: from questionnaires and electronic communication, do not have such
647: strong symmetry coefficients.
648: In one-mode projections high-degree vertices are
649: known to have a strong tendency to attach to other
650: high-degree vertices, and
651: low-degree vertices to attach to other low-degree vertices---so-called
652: assortative mixing~\cite{mejn:assmix}. If this property is strong
653: there will be regions of vertices with high degree and other regions
654: with low-degree vertices. The paths within these regions would also
655: have similar degree sequences. Thus high assortative mixing can be related
656: to high degree symmetry, the first causing the second or vice
657: versa. They are, of course, not equivalent---e.g., the example network
658: with all vertices having positive symmetry coefficients
659: (Fig.~\ref{fig:ex}(b)) is maximally disassortatively mixed (in the
660: sense of Ref.~\cite{mejn:assmix}). Where the weak symmetry coefficients
661: of other networks come from is outside the scope of this
662: investigation. One possible explanation would be that functional
663: units~\cite{alon} might often be degree-symmetric centers.
664: 
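As a sketch of the averaging used in this section, write $s_l(v)$ for
the degree-symmetry coefficient of a vertex $v$ (this per-vertex
notation and the symbol $\langle s_l\rangle$ are assumptions made here
for illustration only). With $N$ the number of vertices in $V$, the
graph-wide value is then the plain mean
\begin{equation}
  \langle s_l\rangle = \frac{1}{N}\sum_{v\in V} s_l(v) .
\end{equation}
Since the coefficient is constructed to vanish for random networks with
the same degrees, a positive $\langle s_l\rangle$ would indicate that
vertices are, on average, more degree-symmetric than in that null
model, and a negative value that they are less so.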
665: 
666: \section{Summary and conclusions}
667: 
668: We have derived a measure for a specific notion of symmetry in
669: networks---the property that the paths out from a vertex have
670: overlapping degree sequences. The measure is designed so that random
671: networks, conditioned only to have the same set of degrees as the
672: original network, have the value zero. We propose two versions of the
673: symmetry coefficient, the first being approximately zero for random
674: networks, the second requiring a randomization procedure (and thus
675: longer simulation time) but being more accurately zero for random networks. The measures were
676: evaluated on example graphs. We show that they are able to
677: detect vertices in degree-symmetric, and potentially functionally
678: meaningful, positions in the human metabolic network. The average
679: degree symmetry of various networks was also investigated. We found that almost
680: all networks have a weakly positive degree-symmetry coefficient. The
681: exceptions are the network of American airports and their
682: interconnections (having a negative degree-symmetry coefficient) and
683: one-mode projections of social affiliation networks (having rather
684: strongly positive values).
685: Our measure is not the first to be based on the properties of paths
686: going out from a vertex. For example, path counts have been used
687: for assessing the functional similarity of pairs of
688: vertices~\cite{blondel:sim,simrank,our:sim}. In social network studies
689: such measures are commonly called ``ego-centric''~\cite{wf}.
690: 
691: 
692: Symmetry concepts have been successfully utilized in many fields of
693: physics. We believe degree symmetry, and other classes of network
694: symmetries, will be a fruitful direction of future network
695: studies. Degree symmetry is in particular, we believe, an important
696: concept for networks where degree is strongly related to the function
697: of the vertex. Two open questions from this study are what causes the
698: rather ubiquitous weakly positive degree symmetries, and what process in
699: airline decision making causes the negative average symmetry
700: coefficient of the airline network.
701: 
702: 
703: \begin{acknowledgements}
704:   The author acknowledges financial support from the Wenner-Gren
705:   Foundations and help with data acquisition from Gerald Davis,
706:   Jean-Pierre Eckmann, Michael Gastner, Mikael Huss, Beom Jun Kim,
707:   Sungmin Park and Martin Rosvall. This research uses data from Add
708:   Health, a program project designed by J. Richard Udry, Peter
709:   S. Bearman, and Kathleen Mullan Harris, and funded by a grant
710:   P01--HD31921 from the National Institute of Child Health and Human
711:   Development, with cooperative funding from 17 other
712:   agencies. Special acknowledgment is due Ronald R. Rindfuss and
713:   Barbara Entwisle for assistance in the original design. Persons
714:   interested in obtaining data files from Add Health should contact
715:   Add Health, Carolina Population Center, 123 W. Franklin Street,
716:   Chapel Hill, NC 27516--2524 (addhealth@unc.edu).
717: \end{acknowledgements}
718: 
719: \begin{thebibliography}{10}
720: 
721: \bibitem{ba:rev}
722: R.~Albert and A.-L. Barab\'{a}si.
723: Statistical mechanics of complex networks.
724: \textit{Rev. Mod. Phys.}, 74:47--98, 2002.
725: 
726: \bibitem{addh}
727: P.~Bearman, J.~Moody, and K.~Stovel.
728: Chains of affection: The structure of adolescent romantic and sexual
729:   networks.
730: \textit{American Journal of Sociology}, 110:44--91, 2004.
731: 
732: \bibitem{blondel:sim}
733: V.~D. Blondel, A.~Gajardo, M.~Heymans, P.~Senellart, and P.~{Van Dooren}.
734: A measure of similarity between graph vertices: Applications to
735:   synonym extraction and web searching.
736: \textit{SIAM Rev.}, 46:647--666, 2004.
737: 
738: \bibitem{davis}
739: G.~F. Davis, M.~Yoo, and W.~E. Baker.
740: The small world of the {A}merican corporate elite, 1982--2001.
741: \textit{Strategic Organization}, 1:301--326, 2003.
742: 
743: \bibitem{doromen:book}
744: S.~N. Dorogovtsev and J.~F.~F. Mendes.
745: \textit{Evolution of Networks: From Biological Nets to the Internet and
746:   WWW}.
747: Oxford University Press, Oxford, 2003.
748: 
749: \bibitem{eckmann:dialog}
750: J.-P. Eckmann, E.~Moses, and D.~Sergi.
751: Entropy of dialogues creates coherent structures in e-mail traffic.
752: \textit{Proc. Natl. Acad. Sci. USA}, 101:14333--14337, 2004.
753: 
754: \bibitem{gui:meta}
755: R.~Guimer\`{a} and L.~A. {Nunes Amaral}.
756: Functional cartography of complex metabolic networks.
757: \textit{Nature}, 433:895--900, 2005.
758: 
759: \bibitem{gui:mod}
760: R.~Guimer\`{a}, M.~Sales-Pardo, and L.~A. {Nunes Amaral}.
761: Modularity from fluctuations in random graphs and complex networks.
762: \textit{Phys. Rev. E}, 70:025101, 2004.
763: 
764: \bibitem{ythan1}
765: S.~J. Hall and D.~Raffaelli.
766: Food web patterns: Lessons from a species-rich web.
767: \textit{Journal of Animal Ecology}, 60:823--842, 1991.
768: 
769: \bibitem{pok}
770: P.~Holme, C.~R. Edling, and F.~Liljeros.
771: Structure and time evolution of an {I}nternet dating community.
772: \textit{Social Networks}, 26:155--174, 2004.
773: 
774: \bibitem{hh:pfp}
775: P.~Holme and M.~Huss.
776: Role-similarity based functional prediction in networked systems:
777:   application to the yeast proteome.
778: \textit{J. Roy. Soc. Interface}, 2:327--333, 2005.
779: 
780: \bibitem{our:ajou2}
781: P.~Holme, S.~M. Park, B.~J. Kim, and C.~R. Edling.
782: Korean university life in a network perspective: Dynamics of a large
783:   affiliation network.
784: e-print cond-mat/0411634.
785: 
786: \bibitem{our:bio}
787: M.~Huss and P.~Holme.
788: Currency and commodity metabolites: Their identification and relation
789:   to the modularity of metabolic networks.
790: e-print q-bio/0603038.
791: 
792: \bibitem{janson}
793: S.~Janson, T.~{\L}uczak, and A.~Ruci\'{n}ski.
794: \textit{Random Graphs}. Wiley, New York, 1999.
795: 
796: \bibitem{simrank}
797: G.~Jeh and J.~Widom.
798: Sim{R}ank: {A} measure of structural-context similarity.
799: In \textit{Proceedings of the eighth ACM SIGKDD international conference
800:   on knowledge discovery and data mining}, pages 538--543, Edmonton, 2002.
801: 
802: \bibitem{jeong:meta}
803: H.~Jeong, B.~Tombor, Z.~N. Oltvai, and A.-L. Barab\'{a}si.
804: The large-scale organization of metabolic networks.
805: \textit{Nature}, 407:651--654, 2000.
806: 
807: \bibitem{our:sim}
808: E.~A. Leicht, P.~Holme, and M.~E.~J. Newman.
809: Vertex similarity in networks.
810: \textit{Phys. Rev. E}, 73:026120, 2006.
811: 
812: \bibitem{martinez:rock}
813: N.~D. Martinez.
814: Artifacts or attributes? {E}ffects of resolution on the {L}ittle
815:   {R}ock {L}ake food web.
816: \textit{Ecological Monographs}, 61:367--392, 1991.
817: 
818: \bibitem{milg:1}
819: S.~Milgram.
820: The small world problem.
821: \textit{Psychol. Today}, 2:60--67, 1967.
822: 
823: \bibitem{mejn:assmix}
824: M.~E.~J. Newman.
825: Assortative mixing in networks.
826: \textit{Phys. Rev. Lett.}, 89:208701, 2002.
827: 
828: \bibitem{mejn:rev}
829: M.~E.~J. Newman.
830: The structure and function of complex networks.
831: \textit{SIAM Rev.}, 45:167--256, 2003.
832: 
833: \bibitem{mejn:wei}
834: M.~E.~J. Newman.
835: Analysis of weighted networks.
836: \textit{Phys. Rev. E}, 70:056131, 2004.
837: 
838: \bibitem{pagel:mips}
839: P.~Pagel, S.~Kovac, M.~Oesterheld, B.~Brauner, I.~Dunger-Kaltenbach,
840:   G.~Frishman, C.~Montrone, P.~Mark, V.~St\"{u}mpflen, H.~W. Mewes, A.~Ruepp,
841:   and D.~Frishman.
842: The {MIPS} mammalian protein-protein interaction database.
843: \textit{Bioinformatics}, 21:832--834, 2004.
844: 
845: \bibitem{roberts:mcmc}
846: J.~M. {Roberts Jr.}
847: Simple methods for simulating sociomatrices with given marginal
848:   totals.
849: \textit{Social Networks}, 22:273--283, 2000.
850: 
851: \bibitem{rosv:city}
852: M.~Rosvall, A.~Trusina, P.~Minnhagen, and K.~Sneppen.
853: Networks and cities: An information perspective.
854: \textit{Phys. Rev. Lett.}, 94:028701, 2005.
855: 
856: \bibitem{alon}
857: S.~Shen-Orr, R.~Milo, S.~Mangan, and U.~Alon.
858: Network motifs in the transcriptional regulation network of
859:   {E}scherichia coli.
860: \textit{Nature Genetics}, 31:64--68, 2002.
861: 
862: \bibitem{wf}
863: S.~Wasserman and K.~Faust.
864: \textit{Social network analysis: Methods and applications}.
865: Cambridge University Press, Cambridge, 1994.
866: 
867: \bibitem{cenn:brenner}
868: J.~G. White, E.~Southgate, J.~N. Thomson, and S.~Brenner.
869: The structure of the nervous system of the nematode {C}aenorhabditis
870:   elegans.
871: \textit{Phil. Trans. R. Soc. Lond. Ser. B}, 314:1--340, 1986.
872: 
873: \bibitem{zhao:meta}
874: J.~Zhao, H.~Yu, J.~Luo, Z.~W. Cao, and Y.-X. Li.
875: Complex networks theory for analyzing metabolic networks.
876: e-print q-bio/0603015.
877: 
878: \end{thebibliography}
879: 
880: 
881: \end{document}
882: