\documentclass[rmp,twocolumn,showpacs]{revtex4}

\usepackage{dcolumn,graphicx,amsmath,amssymb,txfonts}

\begin{document}

\title{Detecting degree symmetries in networks}

\author{Petter Holme}
\affiliation{Department of Computer Science, University of New Mexico,
  Albuquerque, NM 87131, U.S.A.}

\begin{abstract}
  The surroundings of a vertex in a network can be more or less
  symmetric. We derive measures of a specific kind of symmetry of a
  vertex, which we call \textit{degree symmetry}: the property that
  many paths going out from a vertex have overlapping degree
  sequences. These measures are evaluated on artificial and real
  networks. Specifically, we consider vertices in the human metabolic
  network. We also measure the average degree-symmetry coefficient for
  different classes of real-world networks. We find that most studied
  examples are weakly positively degree-symmetric. The exceptions are
  an airport network (having a negative degree-symmetry coefficient)
  and one-mode projections of social affiliation networks, which are
  rather strongly degree-symmetric.
\end{abstract}


\pacs{89.75.Fb, 89.75.Hc}
% 89.75.Fb -- Structures and organization in complex systems
% 89.75.Hc -- Networks and genealogical trees

\maketitle

\section{Introduction}

\begin{figure}
  \resizebox*{0.4\linewidth}{!}{\includegraphics{ill.eps}}
  \caption{ Illustration of degree symmetry. Consider
    paths of length two (i.e.\ $l=2$). All paths out from the
    central (black) vertex have the degree sequence $(3,2)$, meaning
    that the central vertex has high degree symmetry.}
  \label{fig:ill}
\end{figure}


With the advent of modern database technology, numerous large-scale
network data sets have become available. This development has
triggered a surge of activity in studies of statistical network
properties~\cite{ba:rev,mejn:rev,doromen:book}. The underlying idea of
these studies is that the network structure (the way a network
differs from a completely random network) contains information about
the function of the network, both locally and globally. Hence a common
theme in these works has been the development of structural measures
to characterize network structure.
In this paper we propose and evaluate a measure of a previously
unstudied network structure: a special case of symmetry we call
\textit{degree symmetry}. In geometry an object
is symmetric if it is invariant under rotations, reflections, and so
on. In networks, which have no given geometrical embedding, these concepts
have to be relaxed. Furthermore, we would like a continuous
measure that says not only whether a vertex is a local center of symmetry
or not, but also how symmetric the vertex is. The aspect of symmetry we
address is, roughly speaking, that if you look at the object
(the network, in our case) in different directions from a symmetric
vertex, it still looks the same. The process of ``looking'' will in our
case be walking along paths (non-self-intersecting sequences of
edges). Furthermore, since degree (number of neighbors) is commonly
regarded as the most fundamental quantity relating a vertex to its
function, we say that two vertices ``look the same'' if they
have the same degree. We will thus derive our measure by walking
along all paths from a vertex and comparing the sequences of
degrees of the vertices along these paths. The situation we have in
mind is depicted in Fig.~\ref{fig:ill}: all paths from the central
vertex have degree sequences starting with $(3,2,\cdots)$, thus the
central vertex is highly degree symmetric.


The rest of the paper is organized as follows: First we give a
detailed derivation of the degree-symmetry coefficient (in two
different versions, appropriate for different needs). Then we evaluate
these on example networks and a biochemical network. Finally we
discuss the average degree symmetry of different classes of real-world
networks.


\section{Derivation of the measure}

We consider a network represented by a graph $G=(V,E)$ of $N$
vertices, $V$, and $M$ edges, $E$. For a vertex $i$ to have high degree
symmetry it must, as mentioned, have many paths with the same
sequence of degrees. We use a cutoff $l$ for the path length and
consider only paths of that length. The reason
for this cutoff is threefold: First, in all network processes (with
possibly some curious exceptions), a vertex is more affected by its
closest surroundings than by vertices further away. Thus one would like
to put a lower weight on the contribution from distant vertices. Second,
the number of vertices $n$ steps away grows fast
with the distance from $i$. For finite networks this means that the
paths soon reach the periphery of the network, where unwanted
finite-size effects set in. Third, a cutoff speeds up the computation.


\begin{figure}
  \resizebox*{0.8\linewidth}{!}{\includegraphics{def.eps}}
  \caption{ Illustrations of concepts in the derivation of the
    degree-symmetry coefficient. (a) illustrates the branching
    number. Consider paths of length three out from $i$. The branching
    number of the path $(i,j)$ is five (there are five paths from $i$
    of length three that go through $j$). The branching number at
    $j'$ is two. (b) shows the set $\Delta(P,i)$, where $P$ is the
    path $(i,j,j')$.
}
  \label{fig:def}
\end{figure}

Assume there are $p$ paths of length $l$ from a vertex $i$. We
denote the degree sequences of these paths by
\begin{eqnarray}
  Q_l(i)&=&\Big\{[k(v^1_{1,i,l}),\cdots,k(v^l_{1,i,l})],\nonumber\\
&&\vdots\\
 && ,[k(v^1_{p,i,l}),\cdots,k(v^l_{p,i,l})]\Big\},\nonumber
\end{eqnarray}
where $k(v)$ denotes the degree
of a vertex $v$ and $v^j_{m,i,l}$ is the $j$th vertex along the
$m$th path of length $l$ leading out from $i$. If there are
unexpectedly many vertices with the same degree at the same index ($j$)
of the sequences, the vertex $i$ is a local center of degree
symmetry. A rough symmetry measure would thus be the fraction
of index pairs with the same degree, i.e.\
\begin{equation}\label{eq:para}
  \frac{\tilde{s}_l(i)}{\Lambda}=\frac{1}{\Lambda}\sum_{1\leq n<n'\leq p}\sum_{j=1}^l
    \delta\big(k(v^j_{n,i,l}), k(v^j_{n',i,l})\big),
\end{equation}
where
\begin{equation}\label{eq:lambda}
\Lambda=(l-1)\:\dbinom{p}{2}\mbox{~~and~~} \delta(x,y)=\left\{
\begin{array}{cl} 1 & \mbox{if $x=y$}\\ 0 &\mbox{if $x\neq
    y$}\end{array}
\right. .
\end{equation}
This measure is very crude and lacks many desirable statistical
features. For example, all pairs of paths that pass through the same
vertex contribute to the sum at that vertex's index, regardless of any
symmetry. In practice this means that a vertex with a high-degree
vertex some distance away (but closer than $l$) will trivially
have a high $\tilde{s}_l(i)/\Lambda$. A first step would thus be to omit
the contributions of vertices occurring in many sequences of $Q_l(i)$ at a
specific index. That is, for all $l'\in (0,l)$ one wants to exclude
from Eq.~(\ref{eq:para}) the terms
\begin{equation}\label{eq:para2}
  \sum_{n,n'} \delta\big(k(v^{l'}_{n,i,l}),
  k(v^{l'}_{n',i,l})\big),
\end{equation}
where $n$ and $n'$ are indices of paths that are identical for the
first $l'$ steps. Let $S_l(i)$ denote the
number of such terms.
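
A minimal Python sketch may help fix the notation. It enumerates the
non-self-intersecting paths of length $l$ from $i$ in an adjacency
dictionary, collects $Q_l(i)$, and counts the matching index pairs of
Eq.~(\ref{eq:para}). The function names and the 13-vertex test graph (a
hypothetical stand-in with the degree distribution quoted below for
Fig.~\ref{fig:ex}(a), since the figure itself is not reproduced here) are
ours. As an assumption, the count is returned together with the number
of terms in the double sum, $l\binom{p}{2}$; if the $(l-1)$ prefactor of
Eq.~(\ref{eq:lambda}) is intended, \texttt{n\_terms} should be replaced
accordingly.

```python
from itertools import combinations
from math import comb


def degree_sequences(adj, i, l):
    """Q_l(i): degree sequences of all non-self-intersecting paths of length l from i.

    adj maps each vertex to the set of its neighbours; i itself is not part of
    the sequences, so each tuple holds the degrees of v^1, ..., v^l.
    """
    deg = {v: len(nbs) for v, nbs in adj.items()}
    sequences = []

    def walk(path, visited):
        if len(path) == l:
            sequences.append(tuple(deg[v] for v in path))
            return
        last = path[-1] if path else i
        for nb in sorted(adj[last]):
            if nb not in visited:  # paths may not intersect themselves
                walk(path + [nb], visited | {nb})

    walk([], {i})
    return sequences


def crude_symmetry(adj, i, l):
    """Return (s_tilde, n_terms): index pairs with equal degree, and the number
    of (path pair, index) terms in the double sum of Eq. (para)."""
    q = degree_sequences(adj, i, l)
    s_tilde = sum(a[j] == b[j] for a, b in combinations(q, 2) for j in range(l))
    n_terms = l * comb(len(q), 2)
    return s_tilde, n_terms


def example_graph():
    """Hypothetical test graph: a degree-4 centre 0, four degree-3 vertices 1-4,
    each joined to two degree-2 vertices that are also joined to each other."""
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for h in range(1, 5):
        u, w = 3 + 2 * h, 4 + 2 * h
        link(0, h)
        link(h, u)
        link(h, w)
        link(u, w)
    return adj


if __name__ == "__main__":
    g = example_graph()
    print(degree_sequences(g, 0, 2))  # eight copies of (3, 2)
    print(crude_symmetry(g, 0, 2))    # (56, 56): every index pair matches
```

For the centre every index pair matches, so the crude fraction is one; the
subtraction of $S_l(i)$ and $\nu$ introduced next is what turns this into
a coefficient measured against a null model.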


To calculate $S_l(i)$, consider a path $P=(i,\cdots,j)$ of length $l'<l$.
Let $b_l(P,i)$ be the number of paths from $i$ of length $l$ that start
with the path $P$. We call $b_l(P,i)$ the \textit{branching number} of
$P$; see Fig.~\ref{fig:def}(a). All pairs of paths starting with $P$
contribute to $\tilde{s}_l(i)$ at distance $l'$ from $i$ (since
they all pass
through $j$). Let $\Delta(P,i)$ be the set of neighbors of
$j$ that are not on the path $P$ from $i$ to $j$; see
Fig.~\ref{fig:def}(b). (The number of
elements in $\Delta(P,i)$ is thus at most $k_j-1$.)
This situation gives a contribution
\begin{equation}\label{eq:cont}
  S_l(P,i) = \dbinom{b_l(P,i)}{2}+ \sum_{j'\in\Delta(P,i)}S_l((P,j'),i)
\end{equation}
from vertices with indices in the interval $[l',l]$ of $Q_l(i)$ to
$\tilde{s}_l(i)$, where $(P,j')$ denotes the path $(i,\cdots,j,j')$ and
$S_l(P,i)=0$ if $P$ has length $l$. Summing Eq.~(\ref{eq:cont}) over the
paths $P=(i,j)$ to all neighbors $j$ of $i$ gives $S_l(i)$.
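
The recursion of Eq.~(\ref{eq:cont}) can be sketched in Python as
follows. The function returns $p=\sum_j b_l((i,j),i)$ and $S_l(i)$,
obtained by summing Eq.~(\ref{eq:cont}) over the neighbors $j$ of $i$;
paths of length $l$ have branching number one and contribute nothing
further. The adjacency-dictionary representation and the names are our
own illustrative choices.

```python
from math import comb


def branching_and_S(adj, i, l):
    """Return (p, S_l(i)) for vertex i, following Eq. (cont).

    adj maps each vertex to the set of its neighbours.
    """

    def visit(path):
        # path = [v^1, ..., v^m] after i; returns (b_l(P,i), S_l(P,i))
        if len(path) == l:
            return 1, 0
        b_total, s_children = 0, 0
        for nb in adj[path[-1]]:
            if nb != i and nb not in path:  # nb belongs to Delta(P, i)
                b, s = visit(path + [nb])
                b_total += b
                s_children += s
        return b_total, comb(b_total, 2) + s_children

    p = S = 0
    for j in adj[i]:
        b, s = visit([j])
        p += b
        S += s
    return p, S


if __name__ == "__main__":
    # Hypothetical example: centre 0 with two neighbours, each of which is
    # joined to two further vertices that are linked to each other.
    adj = {0: {1, 2}, 1: {0, 3, 4}, 2: {0, 5, 6},
           3: {1, 4}, 4: {1, 3}, 5: {2, 6}, 6: {2, 5}}
    print(branching_and_S(adj, 0, 2))  # (4, 2)
```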


To further improve the
measure one would like to subtract, assuming some null model, the
expected random contribution to $\tilde{s}_l(i)/\Lambda$. If this can be
achieved, one would have a symmetry coefficient $s_l(i)$ that is zero
when the symmetry is what can be expected from the null model, positive
if $i$ is a center of unexpectedly high symmetry, and negative if $i$ is
degree anti-symmetric. A final symmetry coefficient could thus
be written
\begin{equation}\label{eq:proto}
s_l(i)=\frac{\tilde{s}_l(i)- S_l(i)}{\Lambda- S_l(i)}-\nu , \mbox{~~
  provided $\Lambda > S_l(i)$,}
\end{equation}
where $\nu$ is the expected value of $(\tilde{s}_l(i)- S_l(i)) / (\Lambda-
S_l(i))$ in a null model. $\Lambda= S_l(i)$ can only happen if there is
one or no path of length $l$. In both these cases the degree-symmetry
concept makes no sense, so if $\Lambda =
S_l(i)\in\{0,1\}$ we set $s_l(i)=0$.
The null model we assume is a random network
constrained by the degree distribution of the original network; i.e.,
given the fraction $p_k$ of vertices with degree $k$, the network is as
random as possible. As it turns out, $\nu$ is tricky to calculate
analytically. There are two ways to proceed: either one calculates
an approximate $\nu$, or one obtains $\nu$ by averaging
$(\tilde{s}_l(i)- S_l(i)) / (\Lambda- S_l(i))$ over realizations of the
null model. Besides being more accurate, the latter approach has the
advantage of giving an error estimate of $s_l(i)$; by specifying a
p-value one can then define significantly symmetric, or anti-symmetric,
vertices. We will use both approaches: the approximate method for
analyzing example networks and the numerical method for
analyzing real-world data.


We obtain an approximate value of $\nu$, $\nu^\mathrm{app.}$, by
assuming that $\nu$ is
approximately equal to the probability that a pair of vertices,
reached by walking along paths, have the same degree.
Note that, since there are $k$ ways into a
degree-$k$ vertex, the probability of reaching a
degree-$k$ vertex when following a path is
\begin{equation}\label{eq:kpk}
  \frac{kp_k}{\sum_{k'} k'p_{k'}} = \frac{kp_k}{\langle k\rangle}.
\end{equation}
Thus the probability $\nu^\mathrm{app.}$ that two vertices of
the same degree are reached by following different paths is
\begin{equation}\label{eq:nu}
  \nu^\mathrm{app.}=\sum_kp_k\left(\frac{kp_k}{\langle k\rangle}\right)^2 =
  \frac{1}{\langle k\rangle^2} \sum_kk^2p_k^3.
\end{equation}
One reason this approach is not exact is that the number of terms in
the expression for $\tilde{s}_l(i)$ increases with the degree of the
$j$ in $\Delta(P,i)$ of Eq.~(\ref{eq:cont}). There are other
higher-order effects related to other correlations between the path
structure and the degrees of the vertices.
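
Equation~(\ref{eq:nu}) needs only the degree distribution. A short Python
sketch using exact rational arithmetic (\texttt{fractions.Fraction} from
the standard library) reproduces the value quoted below for the small
test graph and the value $\nu^\mathrm{app.}=1$ for regular networks; the
function name is ours.

```python
from collections import Counter
from fractions import Fraction


def nu_app(degrees):
    """Eq. (nu): nu_app = sum_k k^2 p_k^3 / <k>^2, computed exactly.

    degrees: list with the degree of every vertex (all degrees >= 1).
    """
    n = len(degrees)
    p = {k: Fraction(count, n) for k, count in Counter(degrees).items()}
    mean_k = sum(k * pk for k, pk in p.items())
    return sum(k * k * pk ** 3 for k, pk in p.items()) / mean_k ** 2


if __name__ == "__main__":
    # Eight degree-2, four degree-3 and one degree-4 vertex (cf. Fig. (ex)(a))
    print(nu_app([2] * 8 + [3] * 4 + [4]))  # 165/832
    print(nu_app([3] * 10))                 # 1, as for any regular network
```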


To summarize, we have two measures of local vertex symmetry, one
approximate:
\begin{equation}\label{eq:app}
  s^\mathrm{app.}_l(i)=\frac{\tilde{s}_l(i)- S_l(i)}{\Lambda-
    S_l(i)}-\frac{1}{\langle k\rangle^2} \sum_kk^2p_k^3 ,
\end{equation}
and one based on Monte Carlo sampling:
\begin{equation}\label{eq:mc}
  s^\mathrm{MC}_l(i)=\frac{\tilde{s}_l(i)- S_l(i)}{\Lambda-
    S_l(i)}-\left\langle \frac{\tilde{s}_l(i)- S_l(i)}{\Lambda-
    S_l(i)}\right\rangle ,
\end{equation}
where the average in the last term is taken over realizations of the
null model. The sampling is conveniently done by randomly rewiring the
edges of the original network~\cite{roberts:mcmc}.
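
The null-model average in Eq.~(\ref{eq:mc}) can be estimated with
degree-preserving double-edge swaps. The Python sketch below is one
common way to perform such rewiring; it is not necessarily the exact
procedure of Ref.~\cite{roberts:mcmc}. The per-vertex ratio
$(\tilde{s}_l(i)- S_l(i))/(\Lambda- S_l(i))$ is passed in as a
caller-supplied function \texttt{statistic} (a hypothetical placeholder),
and the default of $10M$ attempted swaps per sample is an arbitrary
choice of ours.

```python
import random


def rewire(edges, n_swaps, rng):
    """Degree-preserving randomization of a simple undirected graph.

    Repeatedly picks two edges (a, b), (c, d) and replaces them by (a, d) and
    (c, b), rejecting swaps that would create self-edges or multiple edges.
    """
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        x, y = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[x], edges[y]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible re-pairings
        if a == d or c == b:
            continue  # would create a self-edge
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a multiple edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[x], edges[y] = (a, d), (c, b)
    return edges


def null_mean_and_std(edges, statistic, n_samples=100, n_swaps=None, seed=0):
    """Average a user-supplied statistic (edge list -> float) over independently
    rewired copies of the network; returns mean and sample standard deviation."""
    rng = random.Random(seed)
    if n_swaps is None:
        n_swaps = 10 * len(edges)
    values = [statistic(rewire(edges, n_swaps, rng)) for _ in range(n_samples)]
    mean = sum(values) / n_samples
    var = sum((v - mean) ** 2 for v in values) / (n_samples - 1)
    return mean, var ** 0.5
```

Each sample starts again from the original edge list, so the samples are
independent realizations of the null model; $s^\mathrm{MC}_l(i)$ is then
the observed ratio minus the returned mean.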


\section{Algorithm}\label{sect:algo}

The heart of the algorithm, as suggested in the previous section, is a
depth-first search with depth $l$. When returning along the traced-out
paths, the branching number can be calculated recursively through
\begin{equation}
  b_l(P,i) = \left\{\begin{array}{ll} 1 & \mbox{if $P$ has length $l$}\\
   \sum_{j'\in\Delta(P,i)}b_l((P,j'),i) &
  \mbox{otherwise}\end{array}\right. . \label{eq:bn}
\end{equation}
$S_l(P,i)$ can be calculated simultaneously using
Eq.~(\ref{eq:cont}). A slight complication is that the same vertex may
appear in different branches of the depth-first search while
calculating $b$ and $\tilde{s}$. For small cutoff values this is easy to
handle: For $l=2$ it does not affect the calculation at all. For $l=3$
one only has
to keep the different depths (of Eqs.~(\ref{eq:cont}) and (\ref{eq:bn}))
separate. For the calculation of $\tilde{s}_l(i)$ the terms
of $Q_l(i)$ have to be stored. Since the number of paths $p$ grows
fast with $l$, this can be quite a constraint for large
$l$. Luckily it suffices to store a histogram $h(l',k)$ counting the
number of vertices of degree $k$ at position $l'$ of the paths in
$Q_l(i)$; the number of matching index pairs at position $l'$ is then
$\sum_k\dbinom{h(l',k)}{2}$. $p$ (and thus $\Lambda$) can be calculated
as the number of times the depth $l$ of the depth-first search is
reached. The running time of the algorithm is $O(p)$. A mean-field
approximation for networks with few triangles gives
$O(p) \approx O(\langle k\rangle^l)$.
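
The single depth-first search described above can be sketched in
Python. On the way back up, a vertex of degree $k$ at position $l'$ with
branching number $b$ adds $b$ to $h(l',k)$; the same returns give $b_l$
and $S_l$ through Eqs.~(\ref{eq:bn}) and (\ref{eq:cont}); and finally
$\tilde{s}_l(i)=\sum_{l',k}\binom{h(l',k)}{2}$. This is a sketch under our
own naming; it counts every path separately, so a vertex reached along
several branches is simply counted once per path.

```python
from collections import defaultdict
from math import comb


def symmetry_counts(adj, i, l):
    """One depth-first search from i returning (p, s_tilde, S).

    h[(depth, k)] counts the length-l paths whose vertex at position `depth`
    has degree k; pairs of paths agreeing there number comb(h, 2).
    """
    deg = {v: len(nbs) for v, nbs in adj.items()}
    h = defaultdict(int)
    on_path = {i}

    def dfs(v, depth):
        # v is the vertex at position `depth`; returns (b_l, S_l) of this path
        if depth == l:
            b, s = 1, 0
        else:
            b, s_sub = 0, 0
            for nb in adj[v]:
                if nb not in on_path:
                    on_path.add(nb)
                    bb, ss = dfs(nb, depth + 1)
                    on_path.remove(nb)
                    b += bb
                    s_sub += ss
            s = comb(b, 2) + s_sub  # Eq. (cont)
        h[(depth, deg[v])] += b  # all b continuations pass v at this position
        return b, s

    p = S = 0
    for j in adj[i]:
        on_path.add(j)
        b, s = dfs(j, 1)
        on_path.remove(j)
        p += b
        S += s
    s_tilde = sum(comb(count, 2) for count in h.values())
    return p, s_tilde, S
```

Storing the histogram instead of $Q_l(i)$ keeps the memory bounded by
the number of distinct (position, degree) pairs rather than by $p$.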


\section{Extensions and considerations}

The method outlined above can be extended quite straightforwardly to
networks with directed edges, distinct types of edges, or (integer) edge
weights.

Imagine a network with $z$ different edge sets $E_1,\cdots,E_z$. Such
networks frequently occur in cellular biochemistry, e.g.\ protein
interaction networks where different types of protein interaction can
be recorded~\cite{hh:pfp}, or gene regulation networks where
the edges can be activating or inhibitory. One sensible way to extend
the above procedure is to use the union of the edge sets as the
graph, but to consider two vertices in $Q_l(i)$ equal only if
their degrees with respect to all of the edge sets are the
same. To formalize this, $Q_l(i)$ is generalized to
\begin{eqnarray}
  Q_l(i)&=&\Big\{[\mathbf{k}(v^1_{1,i,l}),\cdots,
  \mathbf{k}(v^l_{1,i,l})],\nonumber\\
&&\vdots\\
 && ,[\mathbf{k}(v^1_{p,i,l}),\cdots,
 \mathbf{k}(v^l_{p,i,l})]\Big\},\nonumber
\end{eqnarray}
where $\mathbf{k}(v)$ is a vector of $v$'s degrees with respect to
the different edge types,
and the $\delta$-function of Eq.~(\ref{eq:lambda}) is one if the
arguments are equal at all their indices, and zero otherwise. The
approximation $\nu^\mathrm{app.}$ has to be redefined too:
\begin{equation}
  \nu^\mathrm{app.} =
  \frac{1}{\langle k\rangle^2} \sum_{k',k''}k'p_{k'}\,k''p_{k''}
  \prod_{i=1}^z\sum_{k}p_i(k|k')\,p_i(k|k''),
\end{equation}
where $p_i(k|k')$ is the conditional probability that a vertex
has degree $k$ with respect to edge set $E_i$ given that its degree
in the union network is $k'$. The case of a directed network can be
treated similarly: one considers paths following edges in both
directions, but a vertex pair contributes to $\tilde{s}$ only
if both the in- and out-degrees are the same.
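
For several edge sets, or for directed edges, only the notion of a
vertex's degree changes: it becomes a tuple, and $\delta$ compares
tuples. A small Python sketch of the degree vectors $\mathbf{k}(v)$, of
the union graph whose paths are followed, and of the (in, out) degree
pair for directed networks follows. The function names are ours, and we
assume that an edge present in several sets is counted once in each.

```python
def degree_vectors(edge_sets):
    """k(v): tuple with v's degree with respect to each edge set E_1, ..., E_z."""
    z = len(edge_sets)
    counts = {}
    for t, edges in enumerate(edge_sets):
        for u, v in edges:
            for x in (u, v):
                counts.setdefault(x, [0] * z)[t] += 1
    return {x: tuple(c) for x, c in counts.items()}


def union_adjacency(edge_sets):
    """Adjacency sets of the union graph, along which the paths are traced."""
    adj = {}
    for edges in edge_sets:
        for u, v in edges:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def directed_degree_vectors(arcs):
    """(in-degree, out-degree) of every vertex of a directed network."""
    counts = {}
    for source, target in arcs:
        counts.setdefault(source, [0, 0])[1] += 1
        counts.setdefault(target, [0, 0])[0] += 1
    return {x: tuple(c) for x, c in counts.items()}


def same_vector(k1, k2):
    """Generalized delta: one only if the vectors agree at every index."""
    return int(k1 == k2)
```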


The approach of Sect.~\ref{sect:algo} can straightforwardly be applied
to networks where
multiple edges are allowed. Since multiple edges can be used to model
weighted graphs~\cite{mejn:wei}, the generalization to weighted graphs
(at least where edge weights represent the probability of following an
edge) is simple. The other aspect of multigraphs, self-edges, is
trivially dealt with: since paths must not
intersect themselves, a self-edge will never be followed and can thus
be omitted already when the graph is constructed.


The overlap required for a vertex pair to be considered equal in the
calculation of the symmetry coefficient is rather strict. Sometimes one
would like to treat two paths as similar even if their degrees differ
slightly. In particular, this applies to broad degree
distributions. The functional difference between degree-2 and degree-3
vertices may be significant, but whether a vertex has degree 1002 or
1003 probably does not matter. To achieve such a relaxation one can
construct an integer sequence $K_1<K_2<\cdots$ and let
\begin{equation}\label{eq:ks}
  \delta(k,k') =\left\{\begin{array}{cl} 1 & \mbox{if $K_i\leq
  k,k'<K_{i+1}$ for some $i$}\\ 0 &
  \mbox{otherwise}\end{array}\right. .
\end{equation}
That is, one constructs a series of equivalence classes of vertices.
For a power-law, or similarly broad, degree distribution one can let
$K_{i+1}-K_i$ increase exponentially with $i$. In this case one also
has to modify the definition of $\nu^\mathrm{app.}$:
\begin{equation}
  \nu^\mathrm{app.}=\frac{1}{\langle k\rangle^2}\sum_i \left(\sum_{K_i\leq
  k<K_{i+1}} p_k\right) \left(\sum_{K_i\leq k<K_{i+1}} k p_k\right)^2 .
\end{equation}
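
With the exponentially growing intervals $K_n=2^n$ used later for the
metabolic networks, the equivalence class of a degree $k\geq1$ is simply
$\lfloor\log_2 k\rfloor$. A Python sketch of the relaxed $\delta$ of
Eq.~(\ref{eq:ks}) and of the binned $\nu^\mathrm{app.}$ above follows;
the names, and the use of the integer bit length instead of a
floating-point logarithm, are our choices. With one class per degree it
reduces to Eq.~(\ref{eq:nu}).

```python
from collections import Counter
from fractions import Fraction


def degree_class(k):
    """Index n of the interval [2**n, 2**(n+1)) containing the degree k >= 1."""
    return k.bit_length() - 1


def relaxed_delta(k1, k2, cls=degree_class):
    """Eq. (ks): 1 if both degrees fall into the same interval, else 0."""
    return int(cls(k1) == cls(k2))


def nu_app_binned(degrees, cls=degree_class):
    """Binned nu_app: sum over classes of (sum p_k)(sum k p_k)^2, over <k>^2."""
    n = len(degrees)
    p = {k: Fraction(c, n) for k, c in Counter(degrees).items()}
    mean_k = sum(k * pk for k, pk in p.items())
    mass, weight = Counter(), Counter()
    for k, pk in p.items():
        mass[cls(k)] += pk        # probability mass of the class
        weight[cls(k)] += k * pk  # degree-weighted mass of the class
    return sum(mass[c] * weight[c] ** 2 for c in mass) / mean_k ** 2
```

For example, degrees 1002 and 1003 both fall in $[512,1024)$, so
\texttt{relaxed\_delta(1002, 1003)} is one.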


\begin{figure}
  \resizebox*{\linewidth}{!}{\includegraphics{ex.eps}}
  \caption{ Degree symmetries of small example networks. (a) is
    consistent with the example in Fig.~\ref{fig:ill}. (b) is an
    example of a graph with only positive degree symmetries. (c) shows
    a graph with only negative degree symmetries. The cutoff length
    $l=2$ is used.}
  \label{fig:ex}
\end{figure}


\section{Degree symmetries of example networks}

In this section we evaluate the measure for example networks. Unless
otherwise stated, we use the smallest non-trivial cutoff,
$l=2$. Most conclusions also hold for $l=3$ or
$4$.

\subsection{Small test graphs}

To get a feeling for the $s_l$ measure we start by considering a few
small test networks; see Fig.~\ref{fig:ex}. In Fig.~\ref{fig:ex}(a) we
display a network with the same degree symmetry, with respect to the
central vertex (triangle), as Fig.~\ref{fig:ill}. As expected, the
central vertex has a large degree-symmetry coefficient. To carry
through the calculation of Eq.~(\ref{eq:app}) once, we obtain the
degree distribution $p_2=8/13$, $p_3=4/13$ and
$p_4=1/13$, giving $\nu^\mathrm{app.}=165/832\approx0.198$. All length-2
paths out from the central vertex have the degree sequence $(3,2)$, so
$\tilde{s}_2(\triangle) = 28$, $S_2(\triangle) = 4$ and
$\Lambda= 28$, giving $s_2^\mathrm{app.}(\triangle)=667/832\approx0.802$.
The degree-3 vertices (squares) have two degree sequences for their
outgoing paths, $(4,3)$ and $(2,2)$, whereas paths from degree-2 vertices
(triangles) have degree sequences $(3,4)$ and $(2,3)$. This difference
is larger than expected from the null model (random networks with
eight degree-2 vertices, four degree-3 vertices and one degree-4
vertex); hence the negative $s_2$ values for these vertices.
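
For reference, the arithmetic behind the central-vertex numbers follows
directly from Eq.~(\ref{eq:nu}); since all length-2 paths from the
central vertex share the degree sequence $(3,2)$, the first term of
Eq.~(\ref{eq:app}) equals one:
\begin{eqnarray*}
  \langle k\rangle &=& 2\cdot\frac{8}{13}+3\cdot\frac{4}{13}
    +4\cdot\frac{1}{13}=\frac{32}{13},\\
  \nu^\mathrm{app.} &=& \left(\frac{13}{32}\right)^2
    \left[4\left(\frac{8}{13}\right)^3+9\left(\frac{4}{13}\right)^3
    +16\left(\frac{1}{13}\right)^3\right]
    =\frac{169}{1024}\cdot\frac{2640}{2197}=\frac{165}{832},\\
  s_2^\mathrm{app.}(\triangle) &=& 1-\frac{165}{832}=\frac{667}{832}.
\end{eqnarray*}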


In Fig.~\ref{fig:ex}(b) we show a graph where all vertices have
positive degree-symmetry coefficients. Paths from degree-2 vertices
have only the degree sequence $(3,2)$ and paths from degree-3 vertices
have only the degree sequence $(2,3)$. Thus, for every vertex, the
view of degrees along the paths out to the rest of the network is the
same no matter in which direction one looks from that vertex. A
radically different situation is seen in Fig.~\ref{fig:ex}(c). In this
case the vertices have three distinct positions in the network. The
vertices marked with squares have degree two and four outgoing paths
with degree sequences $(2,4)$, $(4,4)$, $(4,2)$ and $(4,2)$. The circles,
despite their different network position (being part of triangles),
have the same set of degree sequences for their paths of length two. The
degree-4 vertices have six length-2 paths: three with the degree
sequence $(2,2)$ and three with the degree sequence $(4,2)$. It is easy
to convince oneself that this is close to as dissimilar as a network with
four degree-2 and two degree-4 vertices can be. Consequently all vertices
have negative degree-symmetry coefficients. It is worth pointing out
that the graph in Fig.~\ref{fig:ex}(c) possesses symmetries other than
degree symmetry. The layout has, for example, reflection symmetry about
a vertical axis. We emphasize that such symmetries would need to be
captured by other measures.

\subsection{Regular networks}

If all vertices have the same degree, a network is called
\textit{regular}~\cite{janson}. Then, by definition, all degree
sequences overlap fully. This trivial overlap should be canceled in
our symmetry measure, so that $s_l(i)=0$ for all $l$ and $i$. Since all
degrees are equal, every index pair matches, so
$\tilde{s}_l(i)=\Lambda$ and the first term of Eq.~(\ref{eq:app})
equals one. Furthermore, $\nu^\mathrm{app.}=1$, which gives $s_l(i)=0$
for all vertices and cutoff lengths.

\subsection{Random graphs}

\begin{figure}
  \resizebox*{0.7\linewidth}{!}{\includegraphics{er.eps}}
  \caption{ The average approximate symmetry coefficient for $l=3$
    and random graphs with $M=2N$. The line is a fit to a power-law
    decay form ($0.124+0.435N^{-1.02}$, to be exact).}
  \label{fig:er}
\end{figure}

Next we evaluate the average approximate symmetry coefficient
$\langle s^\mathrm{app.}\rangle$ for random
graphs~\cite{janson}, i.e., graphs obtained by successively adding $M$
edges between $N$ vertices with the restriction that no multiple edges
or self-edges may occur. Such networks have no correlations at all and
can serve as a reference point for neutrality~\cite{mejn:rev}. Ideally
we would like such networks to have, on average, a degree-symmetry
coefficient of zero. As seen in Fig.~\ref{fig:er}, $\langle
s^\mathrm{app.}_l\rangle$ converges to a small but positive value.
The decay towards this value is roughly inversely proportional to $N$
(the same scaling as the fraction of triangles in the network), which
suggests that the presence of triangles, and perhaps other short cycles,
is an important source of finite-size effects in $s^\mathrm{app.}_l$.
We conclude that the Monte Carlo measure $s^\mathrm{MC}_l$
(or a more elaborate measure) is
needed if one wants to compare different networks. If, on the other
hand, one aims to compare different vertices of the same network, the
faster $s^\mathrm{app.}_l(i)$ calculation is sufficient. This is not
an uncommon situation in the design of network measures. Another
example where the neutral reference value is non-zero in the large-$N$
limit is \textit{modularity}, which measures how well a network can be
divided into subgraphs that are densely connected internally but
sparsely connected to each other~\cite{gui:mod}.



\section{Degree symmetries of real networks}

In this section we apply our measures to real-world networks. First we
take a look at the symmetry coefficients of specific vertices in the
metabolic network of humans; then we look at the average symmetry
coefficients of various classes of networks.

\subsection{Human metabolic networks}\label{sect:meta}

\begin{figure}
  \resizebox*{0.9 \linewidth}{!}{\includegraphics{hsa.eps}}
  \caption{ The 2-neighborhood of (a) spermine, a vertex with high
    degree symmetry, and (b) C04850, a vertex with low degree
    symmetry, in the human metabolic network. The symbols
    indicate the equivalence classes
    defined by exponentially growing intervals. Filled circles have
    degree two, unfilled circles have degree four or five, and a vertex
    symbolized by an
    $n$-gon has degree in the interval $[2^n,2^{n+1})$.
    Where the chemical names are overly long, the
    KEGG codes are given (``C'' and five digits): C07282 represents
    eIF5A-precursor-deoxyhypusine, C04850 represents
    1,3-$\beta$-D-galactosyl-($\alpha$-1,4-L-fucosyl)-N-acetyl-D-glucosaminyl-R,
    C04556 represents 4-amino-2-methyl-5-phosphomethylpyrimidine,
    C04467 represents $\alpha$-L-fucosyl-1,2-$\beta$-D-galactosyl-R
    and C01311 represents
    1,4-$\beta$-D-galactosyl-($\alpha$-1,3-L-fucosyl)-N-acetyl-D-glu\-cos\-aminyl-R. }
  \label{fig:hsa}
\end{figure}

An important use of statistical graph theory is to characterize
chemical reaction networks. Of the many possible network
representations~\cite{zhao:meta}, we let vertices be chemical
substances and, for all reactions of an organism, we link
substrates with products. For example, the hypothetical reaction
$\mathrm{A}+ \mathrm{B} \longleftrightarrow \mathrm{C}+\mathrm{D}$ would
contribute the edges $(\mathrm{A},\mathrm{C})$,
$(\mathrm{A},\mathrm{D})$, $(\mathrm{B},\mathrm{C})$ and
$(\mathrm{B},\mathrm{D})$ to the metabolic network. The data are
derived from the KEGG database
(\url{http://www.genome.jp/}) and described in detail in
Ref.~\cite{our:bio}. Since the degree distributions of metabolic
networks are highly skewed~\cite{jeong:meta},
we use an exponentially increasing set of intervals as equivalence
classes (as discussed in the context of Eq.~(\ref{eq:ks})): $K_n=2^n$.
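
The substrate--product construction can be sketched in Python as below.
Each reaction is given as a pair of substrate and product lists, a
hypothetical input format rather than the KEGG file format. Every
substrate is linked to every product; as our own assumptions, a
substance appearing on both sides of a reaction is not linked to itself,
and repeated reactions do not create multiple edges.

```python
from itertools import product


def reaction_edges(reactions):
    """Undirected substrate-product edges for a list of reactions.

    reactions: iterable of (substrates, products) pairs of substance names.
    """
    edges = set()
    for substrates, products in reactions:
        for s, t in product(substrates, products):
            if s != t:  # skip self-edges; paths never follow them anyway
                edges.add(frozenset((s, t)))
    return edges


if __name__ == "__main__":
    # The hypothetical reaction A + B <-> C + D from the text
    print(sorted(sorted(e) for e in reaction_edges([(["A", "B"], ["C", "D"])])))
```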


It has been argued that degree is strongly related to the function of
a chemical substance~\cite{jeong:meta,gui:meta}. This means that degree
symmetry can potentially give additional information about the function
of the vertices. For the
human metabolic network and $l=2$, roughly half of the vertices have
a p-value of less than 5\% (i.e., in the null-model sampling of the
calculation of $s_2^\mathrm{MC}$, less than 5\% or more than 95\% of
the values of
\begin{equation}
\frac{\tilde{s}_l(i)- S_l(i)}{\Lambda -  S_l(i)}
\end{equation}
are smaller than the value of the real network). In
Fig.~\ref{fig:hsa}(a) we show the 2-neighborhood of a vertex with
significantly higher $s_2^\mathrm{MC}$ than expected;
Fig.~\ref{fig:hsa}(b) depicts the 2-neighborhood of a vertex with
significantly lower $s_2^\mathrm{MC}$. These particular vertices are
used as examples because their 2-neighborhoods are of appropriate size,
neither too big nor too small, to be displayed and described. Spermine,
Fig.~\ref{fig:hsa}(a), is a substance with high
degree symmetry, $s_2^\mathrm{MC} = 0.89\pm0.02$. Both its neighbors are
in the same degree-equivalence class, that of vertices with degree four
to seven. Among the vertices two steps away from spermine there is also
a significant overlap: two (out of four) neighbors of the neighbor
spermidine are in the equivalence class defined by degrees in the
interval $[8,16)$, whereas two vertices are in the equivalence class
of degrees in
$[4,8)$. The three paths from spermine via S-adenosylmethioninamine
also contribute to the overlap two steps from spermine, as two
vertices (methylthioadenosine and spermidine) have degrees in the
same equivalence class. The neighborhood of C04850, seen in
Fig.~\ref{fig:hsa}(b), is visually less balanced and also has a
negative degree symmetry, $s_2^\mathrm{MC} = -0.11\pm0.01$. We note that
there are some vertex pairs in the second neighborhood whose
degree classes overlap, but apparently this is not enough to make the
symmetry coefficient non-negative.
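
The significance criterion used above can be written in a few lines of
Python. The null-model values are assumed to have been produced already
(for instance by rewiring, as in Eq.~(\ref{eq:mc})); the function names
and the default threshold of 5\% are our illustrative choices, following
the two-sided rule stated in the text.

```python
def fraction_below(observed, null_values):
    """Fraction of null-model values strictly smaller than the observed one."""
    return sum(v < observed for v in null_values) / len(null_values)


def is_significant(observed, null_values, alpha=0.05):
    """True if fewer than alpha (anti-symmetric) or more than 1 - alpha
    (symmetric) of the null-model values lie below the observed value."""
    f = fraction_below(observed, null_values)
    return f < alpha or f > 1 - alpha


def s_mc(observed, null_values):
    """Eq. (mc): observed ratio minus its null-model average."""
    return observed - sum(null_values) / len(null_values)
```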

531:

532: \subsection{Average symmetry values}

533:

534: \begin{table*}

535: \caption{\label{tab:avg} The network sizes $N$ and $M$ and the average

536:   numerical degree-symmetry coefficient $s_2^\mathrm{MC}$ of

537:   real-world networks. In the interstate

538:   network the vertices are American interstate highway junctions and

539:   two junctions are connected if there is a road with no junction in

540:   between. In the street networks the vertices are Swedish city-street

541:   segments connected if they share a junction.

542:   In the airport network (obtained from

543:   http://vlado.fmf.uni-lj.si/pub/networks/pajek/data/gphs.htm) the

544:   vertices are American airports and edges represent a regular, non-stop

545:   route. In the citation

546:   networks the vertices are papers and two papers are connected if

547:   they one cites the other. The ``scientometrics'' network consists of

548:   papers from the journal \textit{Scientometrics}. The ``small-world''

549:   network are all papers citing Ref.~\cite{milg:1} or having the

550:   phrase ``small world'' in the title. (The citation networks were

551:   obtained from

552:   http://vlado.fmf.uni-lj.si/pub/networks/data/cite/. These networks

553:   are the result of searches in the WebofScience used with

554:   the permission of ISI Philadelphia.) The board of directors and

555:   Ajou student networks are derived from one-mode projections of

556:   affiliation networks (where edges goes from persons to corporate

557:   boards and university classes respectively). The Ajou student

558:   network is averaged over graphs of 16 semesters. One edge

559:   represent two students taking at least three classes together that

560:   semester. The high school networks are gathered from

561:   questionnaires---an edge means that two persons have listed each

562:   other as acquaintances. It is averaged over 84 individual

563:   schools. In the electronic communication networks one edge

564:   represent that at least one of the vertices has contacted the other

565:   over some electronic medium. The food webs are networks of

566:   water-living species and an edge means that one species prey on the

567:   other. For the protein networks an edge means that two proteins

568:   interact (the two graphs correspond to two different

569:   types of experiments determining the interaction edges). The

570:   metabolic networks consist of chemical substances and edges are

571:   constructed as described in Sect.~\ref{sect:meta}. Values for

572:   animal metabolism is averaged over six networks, fungi metabolism

573:   is averaged over two, and bacteria metabolism is averaged over 96

574:   networks.

575: }

576: \begin{ruledtabular}

577:  \begin{tabular}{lrc|ccc}

578:   \multicolumn{2}{c}{network} & Ref. & $N$ & $M$ &

579:   $s_2^\mathrm{MC}$\\\hline

580:   geographical networks & interstate highways & & 935 & 1315 &

581:   $0.016\pm 0.003$ \\

582:     & streets, Stockholm & \cite{rosv:city} & 3325 & 5100 & $0.014\pm

583:     0.003$ \\

584:     & streets, Malm\"{o} & \cite{rosv:city} & 1868 & 3026 & $0.020\pm

585:     0.003$ \\

586:     & streets, G\"{o}teborg & \cite{rosv:city} & 1258 & 1516 &

587:   $0.026\pm 0.003$\\

588:    & airport & & 332 & 2126 & $-0.0573\pm 0.0002$ \\\hline

589:     citation networks & scientometrics &  & 2728 & 10398 &

590:     $0.015\pm 0.020$\\

591:     & small-world &  & 233 & 994 &

592:     $0.007\pm 0.002$\\\hline

593:     one-mode projections of & board of directors & \cite{davis} & 6193 &

594:     43074 & $0.175 \pm 0.004$ \\

595:     affiliation networks & Ajou University students &

596:     \cite{our:ajou2}& $7285 \pm 128$

597:     & $75898\pm 6566$& $0.13\pm 0.01$ \\\hline

598:     acquaintance networks & high school friendship & \cite{addh} &

599:     $571\pm 43$ &  $1104\pm60$ &$0.020\pm 0.002$ \\\hline

600:     electronic communication networks& e-mail & \cite{eckmann:dialog} & 3186 &

601:     31856 &  $-0.01\pm 0.01$\\

602:     & Internet community & \cite{pok} & 28295 & 115335

603:     & $0.01898\pm 0.0001$ \\\hline

604:     food webs & Little Rock lake & \cite{martinez:rock} & 92 & 960 &

605:      $0.042\pm 0.001$ \\

606:     & Ythan estuary & \cite{ythan1} &  134 & 593 &  $0.027 \pm

607:     0.002$\\\hline

608:     neural network & \textit{C.\ elegans} & \cite{cenn:brenner} & 280

609:     & 1973 & $0.0839\pm 0.0001$ \\\hline

610:     biochemical networks & \textit{S.\ cerevisiae} protein &

611:     \cite{pagel:mips,hh:pfp} & 4580 & 7434

612:     &  $0.0205 \pm 0.0001$\\

613:     & \textit{S.\ cerevisiae} genetic & \cite{pagel:mips,hh:pfp} & 4580

614:     & 5129 & $0.0996\pm 0.0001$ \\

615:     & animal metabolism & \cite{our:bio} & $1621\pm 123$& $4662\pm

616:     473$ & $0.02\pm 0.01$ \\

617:     & plant metabolism, \textit{A.\ thaliana} & \cite{our:bio} & 1561 &

618:     4302 & $0.0133\pm 0.0003$ \\

619:     & fungi metabolism & \cite{our:bio} & $1281\pm 97$& $3654 \pm 289$&

620:     $0.03 \pm 0.02$ \\

621:     &  bacteria metabolism & \cite{our:bio} & $1070 \pm 35$ & $2776\pm

622:     109$ & $0.018\pm 0.002$

623:     \\

624:   \end{tabular}

625: \end{ruledtabular}

626: \end{table*}
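The caption of Table~\ref{tab:avg} defines the Ajou student network as a thresholded one-mode projection: two students are linked if they take at least three classes together in a semester. As an illustration only, the Python sketch below builds such a projection from a person-to-groups mapping. The function \texttt{one\_mode\_projection}, its parameter \texttt{min\_shared}, and the toy data are hypothetical and are not the code used to construct the networks in Table~\ref{tab:avg}.

\begin{verbatim}
```python
from collections import defaultdict
from itertools import combinations


def one_mode_projection(affiliations, min_shared=3):
    """Hypothetical helper: project persons-to-groups onto persons.

    affiliations maps each (sortable) person to a set of groups.
    Two persons are linked if they share at least min_shared groups.
    Returns a set of edges as sorted (person, person) tuples.
    """
    # Invert the bipartite graph: group -> set of members
    members = defaultdict(set)
    for person, groups in affiliations.items():
        for g in groups:
            members[g].add(person)
    # Count shared groups for every pair co-occurring in some group
    shared = defaultdict(int)
    for group_members in members.values():
        for a, b in combinations(sorted(group_members), 2):
            shared[(a, b)] += 1
    # Keep only pairs that reach the threshold
    return {pair for pair, n in shared.items() if n >= min_shared}


# Toy affiliation data (hypothetical): student -> set of classes
students = {
    "ann": {"c1", "c2", "c3", "c4"},
    "bob": {"c1", "c2", "c3"},
    "cid": {"c1", "c2", "c5"},
    "dan": {"c3", "c4", "c5"},
}
print(one_mode_projection(students))  # {('ann', 'bob')}
```
\end{verbatim}

Counting shared groups through the inverted (group-to-members) index avoids comparing every pair of persons, although the cost still grows with the square of the group sizes.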

627:

628: So far we have discussed degree symmetries of vertices. In this

629: section we average $s_l$ over $V$ to obtain

630: a graph-wide measure for degree symmetry. In Table~\ref{tab:avg} we

631: display values of

632: $s_2^\mathrm{MC}$ for a number of different network types. Some of

633: these have highly skewed degree distributions. For these,

634: the exponentially increasing degree equivalence classes of

635: Sect.~\ref{sect:meta} are appropriate. Since we intend to compare all

636: networks, we use the same equivalence classes throughout. The

637: first observation is that almost all networks have a positive average

638: symmetry coefficient. The only clear exception is the airport

639: network. This means that if one starts a two-leg airplane

640: trip at a particular airport and chooses between two random itineraries

641: (without regard to the frequency of flights), then the probability

642: that the airports along the two itineraries have similar numbers of

643: connections is smaller than in a randomized network with the same degrees. The strongest

644: degree symmetries are found in one-mode projections of social

645: affiliation networks. Note that the other social networks, derived

646: from questionnaires and electronic communication, do not have such

647: large symmetry coefficients.

648: In one-mode projections, high-degree vertices are

649: known to have a strong tendency to attach to other

650: high-degree vertices, and

651: low-degree vertices to other low-degree vertices, so-called

652: assortative mixing~\cite{mejn:assmix}. If this tendency is strong,

653: there will be regions of high-degree vertices and other regions

654: of low-degree vertices. The paths within these regions will then

655: have similar degree sequences. Thus strong assortative mixing can be related

656: to high degree symmetry, the first causing the second or vice

657: versa. The two properties are, of course, not equivalent; e.g., the example network

658: with all vertices having positive symmetry coefficients

659: (Fig.~\ref{fig:ex}(b)) is maximally disassortatively mixed (in the

660: sense of Ref.~\cite{mejn:assmix}). The origin of the weak positive symmetry

661: coefficients of the other networks is beyond the scope of this

662: investigation. One possible explanation is that functional

663: units~\cite{alon} might often be centered on degree-symmetric vertices.
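To make the notion of assortative mixing used above concrete, the Python sketch below computes the degree assortativity coefficient $r$ of Ref.~\cite{mejn:assmix}, i.e.\ the Pearson correlation between the degrees at the two ends of an edge. It uses total degrees instead of the remaining degrees of Ref.~\cite{mejn:assmix}; since a correlation coefficient is unchanged when one is subtracted from every degree, the value of $r$ is the same. The function name and the edge-list input format are assumptions made for this illustration.

\begin{verbatim}
```python
def degree_assortativity(edges):
    """Degree assortativity r of a simple undirected graph.

    edges: non-empty iterable of (u, v) pairs, no duplicates or
    self-loops. r is undefined (division by zero) if every edge
    end has the same degree, e.g. in a regular graph.
    """
    edges = list(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    m = len(edges)
    # Edge averages of j*k, (j+k)/2 and (j^2+k^2)/2, where j and k
    # are the degrees at the two ends of an edge
    s_prod = sum(degree[u] * degree[v] for u, v in edges) / m
    s_mean = sum(degree[u] + degree[v] for u, v in edges) / (2 * m)
    s_sq = sum(degree[u] ** 2 + degree[v] ** 2
               for u, v in edges) / (2 * m)
    return (s_prod - s_mean ** 2) / (s_sq - s_mean ** 2)


# A star: every edge joins the hub to a leaf
print(degree_assortativity([(0, 1), (0, 2), (0, 3)]))  # -1.0
```
\end{verbatim}

A star graph is thus maximally disassortative ($r=-1$), whereas a complete graph on four vertices together with one separate edge is perfectly assortative ($r=1$).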

664:

665:

666: \section{Summary and conclusions}

667:

668: We have derived a measure for a specific notion of symmetry in

669: networks---the property that the paths out from a vertex have

670: overlapping degree sequences. The measure is designed so that random

671: networks, conditioned only to have the same set of degrees as the

672: original network, have the value zero. We propose two versions of the

673: symmetry coefficient, the first being approximately zero for random

674: networks, the second requiring a randomization procedure (and thus

675: longer simulation time) but being more accurately zero for random networks. The measure was

676: evaluated on example graphs. We show that they are able to

677: detect vertices in degree-symmetric, and potentially functionally

678: meaningful positions in the human metabolic network. The average

679: degree-symmetry of various networks were also investigated. We found almost

680: all networks having a weakly positive degree coefficient. The

681: exceptions being the network of American airports and their

682: interconnections (having a negative degree-symmetry coefficient) and

683: one-mode projections of social affiliation networks (having rather

684: strongly positive values).

685: Our measure is not the first to be based on a the properties of paths

686: going out from a vertex. For example people have been using path

687: counts for assessing the functional similarity of pairs of

688: vertices~\cite{blondel:sim,simrank,our:sim}. In social network studies

689: such measures are commonly called ``ego-centric''~\cite{wf}.
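The summary above distinguishes a coefficient that is only approximately zero for random networks from a version that subtracts an average over randomized networks with the same degrees. The Python sketch below illustrates that second idea under explicit assumptions. The quantity computed by \texttt{mean\_overlap} is a hypothetical stand-in, not the definition of $s_l$ given in the earlier sections: for each vertex it takes the fraction of pairs of length-two paths whose degree-class sequences coincide, with the classes $\{1\}$, $\{2,3\}$, $\{4,\ldots,7\}$, \ldots\ assumed as exponentially growing bins. The randomization assumed here is the common degree-preserving double-edge swap; the number of swaps ($10M$) and of randomized samples are arbitrary choices.

\begin{verbatim}
```python
import random
from collections import defaultdict


def degree_class(k):
    # ASSUMED exponential binning: {1}, {2,3}, {4..7}, ... (k >= 1)
    return k.bit_length() - 1


def adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def mean_overlap(edges):
    # HYPOTHETICAL stand-in for a raw degree-symmetry value: for each
    # vertex i, the fraction of pairs of length-2 paths i-j-k (k != i)
    # with equal degree-class sequences, averaged over vertices.
    adj = adjacency(edges)
    values = []
    for i in list(adj):
        counts = defaultdict(int)
        for j in adj[i]:
            for k in adj[j]:
                if k != i:
                    seq = (degree_class(len(adj[j])),
                           degree_class(len(adj[k])))
                    counts[seq] += 1
        n = sum(counts.values())
        if n < 2:
            continue  # at least two paths are needed to form a pair
        same = sum(c * (c - 1) // 2 for c in counts.values())
        values.append(same / (n * (n - 1) // 2))
    return sum(values) / len(values) if values else 0.0


def rewire(edges, n_swaps, rng):
    # Degree-preserving double-edge swaps on a simple graph with at
    # least two edges: (u,v),(x,y) -> (u,y),(x,v); swaps that would
    # create self-loops or multi-edges are rejected.
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        a, b = rng.sample(range(len(edges)), 2)
        (u, v), (x, y) = edges[a], edges[b]
        if rng.random() < 0.5:
            x, y = y, x  # choose one of the two possible swaps
        if len({u, v, x, y}) < 4:
            continue
        if frozenset((u, y)) in present or frozenset((x, v)) in present:
            continue
        present -= {frozenset((u, v)), frozenset((x, y))}
        present |= {frozenset((u, y)), frozenset((x, v))}
        edges[a], edges[b] = (u, y), (x, v)
    return edges


def symmetry_mc(edges, n_random=20, seed=0):
    # Observed value minus its average over randomized networks with
    # the same degree sequence, so random graphs give roughly zero.
    rng = random.Random(seed)
    observed = mean_overlap(edges)
    baseline = sum(mean_overlap(rewire(edges, 10 * len(edges), rng))
                   for _ in range(n_random)) / n_random
    return observed - baseline


star = [(0, 1), (0, 2), (0, 3)]
print(symmetry_mc(star))  # 0.0
```
\end{verbatim}

In a star no swap is allowed, so the randomized baseline equals the observed value and the sketch returns exactly zero; for real networks the outcome depends on the assumed stand-in and binning.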

690:

691:

692: Symmetry concepts have been successfully utilized in many fields of

693: physics. We believe degree symmetry, and other classes of network

694: symmetries, will be a fruitful direction for future network

695: studies. In particular, we believe degree symmetry is an important

696: concept for networks where the degree of a vertex is strongly related

697: to its function. Two open questions from this study are what causes the

698: rather ubiquitous weakly positive degree symmetries, and what processes in

699: airline decision making cause the negative average symmetry

700: coefficient of the airport network.

701:

702:

703: \begin{acknowledgements}

704:   The author acknowledges financial support from the Wenner-Gren

705:   Foundations and help with data acquisition from Gerald Davis,

706:   Jean-Pierre Eckmann, Michael Gastner, Mikael Huss, Beom Jun Kim,

707:   Sungmin Park and Martin Rosvall. This research uses data from Add

708:   Health, a program project designed by J. Richard Udry, Peter

709:   S. Bearman, and Kathleen Mullan Harris, and funded by a grant

710:   P01--HD31921 from the National Institute of Child Health and Human

711:   Development, with cooperative funding from 17 other

712:   agencies. Special acknowledgment is due Ronald R. Rindfuss and

713:   Barbara Entwisle for assistance in the original design. Persons

714:   interested in obtaining data files from Add Health should contact

715:   Add Health, Carolina Population Center, 123 W. Franklin Street,

716:   Chapel Hill, NC 27516--2524 (addhealth@unc.edu).

717: \end{acknowledgements}

718:

719: \begin{thebibliography}{10}

720:

721: \bibitem{ba:rev}

722: R.~Albert and A.-L. Barab\'{a}si.

723: Statistical mechanics of complex networks.

724: \textit{Rev. Mod. Phys.}, 74:47--98, 2002.

725:

726: \bibitem{addh}

727: P.~Bearman, J.~Moody, and K.~Stovel.

728: Chains of affection: The structure of adolescent romantic and sexual

729:   networks.

730: \textit{American Journal of Sociology}, 110:44--91, 2004.

731:

732: \bibitem{blondel:sim}

733: V.~D. Blondel, A.~Gajardo, M.~Heymans, P.~Senellart, and P.~{Van Dooren}.

734: A measure of similarity between graph vertices: Applications to

735:   synonym extraction and web searching.

736: \textit{SIAM Rev.}, 46:647--666, 2004.

737:

738: \bibitem{davis}

739: G.~F. Davis, M.~Yoo, and W.~E. Baker.

740: The small world of the {A}merican corporate elite, 1982--2001.

741: \textit{Strategic Organization}, 1:301--326, 2003.

742:

743: \bibitem{doromen:book}

744: S.~N. Dorogovtsev and J.~F.~F. Mendes.

745: \textit{Evolution of Networks: From Biological Nets to the Internet and

746:   WWW}.

747: Oxford University Press, Oxford, 2003.

748:

749: \bibitem{eckmann:dialog}

750: J.-P. Eckmann, E.~Moses, and D.~Sergi.

751: Entropy of dialogues creates coherent structures in e-mail traffic.

752: \textit{Proc. Natl. Acad. Sci. USA}, 101:14333--14337, 2004.

753:

754: \bibitem{gui:meta}

755: R.~Guimer\`{a} and L.~A. {Nunes Amaral}.

756: Functional cartography of complex metabolic networks.

757: \textit{Nature}, 433:895--900, 2005.

758:

759: \bibitem{gui:mod}

760: R.~Guimer\`{a}, M.~Sales-Pardo, and L.~A. {Nunes Amaral}.

761: Modularity from fluctuations in random graphs and complex networks.

762: \textit{Phys. Rev. E}, 70:025101, 2004.

763:

764: \bibitem{ythan1}

765: S.~J. Hall and D.~Raffaelli.

766: Food web patterns: Lessons from a species-rich web.

767: \textit{Journal of Animal Ecology}, 60:823--842, 1991.

768:

769: \bibitem{pok}

770: P.~Holme, C.~R. Edling, and F.~Liljeros.

771: Structure and time evolution of an {I}nternet dating community.

772: \textit{Social Networks}, 26:155--174, 2004.

773:

774: \bibitem{hh:pfp}

775: P.~Holme and M.~Huss.

776: Role-similarity based functional prediction in networked systems:

777:   application to the yeast proteome.

778: \textit{J. Roy. Soc. Interface}, 2:327--333, 2005.

779:

780: \bibitem{our:ajou2}

781: P.~Holme, S.~M. Park, B.~J. Kim, and C.~R. Edling.

782: Korean university life in a network perspective: Dynamics of a large

783:   affiliation network.

784: e-print cond-mat/0411634.

785:

786: \bibitem{our:bio}

787: M.~Huss and P.~Holme.

788: Currency and commodity metabolites: Their identification and relation

789:   to the modularity of metabolic networks.

790: e-print q-bio/0603038.

791:

792: \bibitem{janson}

793: S.~Janson, T.~{\L}uczak, and A.~Ruci\'{n}ski.

794: \textit{Random Graphs}. Wiley, New York, 1999.

795:

796: \bibitem{simrank}

797: G.~Jeh and J.~Widom.

798: Sim{R}ank: {A} measure of structural-context similarity.

799: In \textit{Proceedings of the eighth ACM SIGKDD international conference

800:   on knowledge discovery and data mining}, pages 538--543, Edmonton, 2002.

801:

802: \bibitem{jeong:meta}

803: H.~Jeong, B.~Tombor, Z.~N. Oltvai, and A.-L. Barab\'{a}si.

804: The large-scale organization of metabolic networks.

805: \textit{Nature}, 407:651--654, 2000.

806:

807: \bibitem{our:sim}

808: E.~A. Leicht, P.~Holme, and M.~E.~J. Newman.

809: Vertex similarity in networks.

810: \textit{Phys. Rev. E}, 73:026120, 2006.

811:

812: \bibitem{martinez:rock}

813: N.~D. Martinez.

814: Artifacts or attributes? {E}ffects of resolution on the {L}ittle

815:   {R}ock {L}ake food web.

816: \textit{Ecological Monographs}, 61:367--392, 1991.

817:

818: \bibitem{milg:1}

819: S.~Milgram.

820: The small world problem.

821: \textit{Psychol. Today}, 2:60--67, 1967.

822:

823: \bibitem{mejn:assmix}

824: M.~E.~J. Newman.

825: Assortative mixing in networks.

826: \textit{Phys. Rev. Lett.}, 89:208701, 2002.

827:

828: \bibitem{mejn:rev}

829: M.~E.~J. Newman.

830: The structure and function of complex networks.

831: \textit{SIAM Review}, 45:167--256, 2003.

832:

833: \bibitem{mejn:wei}

834: M.~E.~J. Newman.

835: Analysis of weighted networks.

836: \textit{Phys. Rev. E}, 70:056131, 2004.

837:

838: \bibitem{pagel:mips}

839: P.~Pagel, S.~Kovac, M.~Oesterheld, B.~Brauner, I.~Dunger-Kaltenbach,

840:   G.~Frishman, C.~Montrone, P.~Mark, V.~St\"{u}mpflen, H.~W. Mewes, A.~Ruepp,

841:   and D.~Frishman.

842: The {MIPS} mammalian protein-protein interaction database.

843: \textit{Bioinformatics}, 21:832--834, 2004.

844:

845: \bibitem{roberts:mcmc}

846: J.~M. {Roberts Jr.}

847: Simple methods for simulating sociomatrices with given marginal

848:   totals.

849: \textit{Social Networks}, 22:273--283, 2000.

850:

851: \bibitem{rosv:city}

852: M.~Rosvall, A.~Trusina, P.~Minnhagen, and K.~Sneppen.

853: Networks and cities: An information perspective.

854: \textit{Phys. Rev. Lett.}, 94:028701, 2005.

855:

856: \bibitem{alon}

857: S.~Shen-Orr, R.~Milo, S.~Mangan, and U.~Alon.

858: Network motifs in the transcriptional regulation network of

859:   \textit{Escherichia coli}.

860: \textit{Nature Genetics}, 31:64--68, 2002.

861:

862: \bibitem{wf}

863: S.~Wasserman and K.~Faust.

864: \textit{Social Network Analysis: Methods and Applications}.

865: Cambridge University Press, Cambridge, 1994.

866:

867: \bibitem{cenn:brenner}

868: J.~G. White, E.~Southgate, J.~N. Thomson, and S.~Brenner.

869: The structure of the nervous system of the nematode

870:   \textit{Caenorhabditis elegans}.

871: \textit{Phil. Trans. R. Soc. Lond. Ser. B}, 314:1--340, 1986.

872:

873: \bibitem{zhao:meta}

874: J.~Zhao, H.~Yu, J.~Luo, Z.~W. Cao, and Y.-X. Li.

875: Complex networks theory for analyzing metabolic networks.

876: e-print q-bio/0603015.

877:

878: \end{thebibliography}

879:

880:

881: \end{document}

882: