1: \documentclass[twocolumn,pre,aps,showpacs]{revtex4}
2:
3: \usepackage{dcolumn,graphicx,amsmath,amssymb,pxfonts}
4:
5: \newcommand{\mfr}{M_\mathrm{fr}}
6:
7: \begin{document}
8:
9: \title{On network bipartivity}
10:
11: \author{Petter \surname{Holme}}
12: \email{holme@tp.umu.se}
13: \affiliation{Department of Physics, Ume{\aa} University,
14: 901~87 Ume{\aa}, Sweden}
15:
16: \author{Fredrik \surname{Liljeros}}
17: \affiliation{Department of Epidemiology, Swedish Institute for
18: Infectious Disease Control, 171~82 Solna, Sweden}
19: \affiliation{Department of Sociology, Stockholm University, 106~91
20: Stockholm, Sweden}
21:
22: \author{Christofer R.\ \surname{Edling}}
23: \affiliation{Department of Sociology, Stockholm University, 106~91
24: Stockholm, Sweden}
25:
26: \author{Beom Jun \surname{Kim}}
27: \affiliation{Department of Molecular Science
28: and Technology, Ajou University, Suwon 442-749, Korea}
29:
30: \begin{abstract}
Systems with two types of agents with a preference for heterophilous
interaction produce networks that are more or less close to
33: bipartite. We propose two measures quantifying the notion of
34: bipartivity. The two measures---one well-known and natural, but
35: computationally intractable; one computationally less complex, but
also less intuitive---are examined on model networks that
continuously interpolate between bipartite graphs and graphs with
38: many odd circuits. We find that the bipartivity measures increase
39: as we tune the control parameters of the test networks to
40: intuitively increase the bipartivity, and thus conclude that the
41: measures are quite relevant. We also measure and discuss the values
42: of our bipartivity measures for empirical social networks
43: (constructed from professional collaborations, Internet communities
44: and field surveys). Here we find, as expected, that networks arising
45: from romantic online interaction have high, and professional
46: collaboration networks have low bipartivity values. In some other
47: cases, probably due to low average degree of the network, the
48: bipartivity measures cannot distinguish between romantic and
49: friendship oriented interaction.
50: \end{abstract}
51:
52: \pacs{89.75.Fb, 89.75.Hc, 05.50.+q}
53: %89.75.Fb Structures and organization in complex systems
54: %89.75.Hc Networks and genealogical trees
55: %05.50.+q Lattice theory and statistics (Ising, Potts, etc.)
56:
57: \maketitle
58:
59: \section{Introduction\label{sec:intro}}
60:
61: Any system, natural or man-made, consisting of entities that interact
pairwise can be described in terms of a network. Real-world networks
often contain some degree of randomness, but also have some
structure arising from the strategies or laws the entities follow to
make new contacts. Such networks---that can only be described as having
both randomness and structure---are called complex networks and have
lately received much attention in the physics
community~\cite{review,review2}. Among the most important developments
69: in this recent surge of activity in network research is arguably
70: the categorization and quantification of static network structures
71: such as clustering~\cite{WS}, degree distribution~\cite{sf},
72: assortative mixing coefficient~\cite{assmix}, grid
73: coefficient~\cite{grid}, etc. A network with no
74: circuit of odd length is called \textit{bipartite}. Many systems are
75: naturally modeled as bipartite networks: Biochemical networks can be
76: described by vertices representing chemical substances separated by
77: vertices representing chemical reactions~\cite{jeong}. As another
example, we have the so-called ``two-mode'' representation of
affiliation networks where one kind of vertex represents e.g.\
organizations and the other kind represents individual actors, and the
edges indicate to which organizations an actor belongs. But there are
82: also networks that are not necessarily bipartite, but closer to
83: bipartite than what can be expected from a completely random
84: network. Examples of such networks are those that are formed by two
85: types of agents with a preference for heterophilous interaction (human
86: sexual contacts~\cite{liljeros,lea}, and human romance or partnership
87: networks~\cite{partner} being two cases). In many cases one knows the
88: type of the individual vertices (the gender of the actors in the
89: examples above)~\cite{freeman}, but in other cases such information
might be lacking (the data studied in Ref.~\cite{HEL} is a concrete
example). Nevertheless, the `bipartivity'---how far away from being
92: bipartite a graph is---is a measurable structure; and therefore, we
93: believe, deserves attention.
94:
95: How can we measure bipartivity? The idea we use in this paper is the
following: We suppose that all agents of one type try their best to
form a connection to an agent of the other type. Then we measure to
what extent this assumption fails. We can assign a label
99: $\sigma_v\in\{-1,+1\}$ to each vertex $v$ and check for the maximal
100: fraction of edges between vertices of different sign. This fraction
101: will be equal to or higher than the actual fraction of edges between
102: vertices of different type. But, at least for strong heterophilous
103: preference in the network formation, the difference should be
104: small. For weak heterophilous preference this approach will likely
105: fail to produce a correct classification of the individual vertices.
Still, the number of odd circuits should be smaller than in a
network created under the same circumstances but with no heterophilous
preference; and this will (as we will see) give a higher value of such
a bipartivity measure. So even if we cannot reproduce the correct
fraction of vertices of different type, we have a measure that is a
monotonic function of the strength of the heterophilous
preference. It is convenient (at least for people familiar with
113: statistical mechanics) to phrase a problem like this in terms of the
114: antiferromagnetic Ising model. Our bipartivity measure---the maximal
115: fraction of edges between vertices of different sign---is directly
116: related to the ground state energy of the antiferromagnetic Ising
117: model (the relation is given in Sect.~\ref{sec:b1def}). Throughout the
118: paper we will often use the terminology of such spin systems, such as
119: the antiferromagnetic Ising model. For example we talk of an edge
120: between two vertices of the same tag as a `frustrated' edge.
121:
122: The spin system analogy to combinatorial optimization problems such
as the one we are facing---to find the minimal fraction of frustrated
124: edges---is nothing new. With this approach the fraction of frustrated
125: edges defines a cost function corresponding to the energy of the spin
126: system. The two most studied problems in this area are the
127: $p$-coloring problem and the graph bisection problem. In the
128: $p$-coloring problem the question is whether or not the vertices of a
129: graph can be assigned one of $p$ colors in such a way that no edge
130: goes between two vertices of the same color. This problem is solvable
131: in linear time for $p=2$, but NP-complete (i.e.\ in the general case
132: not calculable in polynomial time~\cite{hope}) for $p>2$. The graph
133: bisection problem (also NP-complete) is to partition the vertex-set
134: into two sets of equal size such that the number of edges between the
135: two sets is minimized~\cite{jerrum,schreiber,FA}. Both these problems
136: can, just as ours, be phrased in terms of spin-models with
137: antiferromagnetic interaction. Our minimization problem is a little
138: bit different from the bisection problem in that the two sections can
139: have arbitrary sizes. However, as in the bisection and $p$-coloring
140: problems we are also faced with an NP-complete optimization
problem. (Our aim---to find the ground state energy of the
antiferromagnetic Ising model---can be mapped to a max-cut
problem~\cite{alava}, which is NP-hard on general
networks~\cite{karp}.)
145:
146: As the spin models of statistical physics are familiar to statistical
147: physicists, it is not surprising that topics like the Ising and
148: \textsl{XY} models on various model networks~\cite{bw,spin} have received
149: much attention in physicists' network literature. The motivation for
150: such studies, as models of real-world systems, is that they can capture
151: some features of opinion formation or similar social
152: processes~\cite{socstatmech}. The present work can also be described
as a study of a spin model on a complex network, but unlike the
above-mentioned studies, the spin model is used as a tool to measure a
155: static network structure.
156:
157: \section{The measures}
158: In the following sections we will go through the two bipartivity
159: measures. We state the definitions, dissect the algorithms and give
160: analytic discussions about the limit properties.
161:
We represent an undirected network by $G=(V,E)$ and a directed network by
163: $G_\mathrm{dir}=(V,A)$, where $V$ is the set of vertices, $E$ is a set
164: of edges (or undirected pairs of vertices), and $A$ is a set of arcs
165: (or ordered pairs of vertices). A \textit{path} of length $l$ is a
166: sequence of vertices $v_1,\cdots,v_l$ such that $(v_i,v_{i+1})\in E$
167: (or $(v_i,v_{i+1})\in A$ for directed graphs); a \textit{circuit} is a
168: path where the first and last vertex are identical. In an
169: \textit{elementary} path, or circuit, no vertex appears twice (except
170: the first and last in case of circuits). In the present paper we will
171: only talk about elementary paths and circuits---so, for brevity we omit
172: the word `elementary.' Throughout the paper, when necessary, we let
173: sub- or superscript `dir' denote directed versions of quantities. In
174: many cases the generalization from undirected to directed networks is
175: straightforward; in these cases we will pursue the discussion in the
176: framework of undirected networks.
177:
178: \subsection{The measure $b_1$}
179: \subsubsection{Definition\label{sec:b1def}}
180: The first measure we consider is simply the fraction of unfrustrated
181: edges in the ground state of the antiferromagnetic Ising model on the
182: network. In terms of the antiferromagnetic Ising model the quantity
183: can be written as
184: \begin{equation}
185: b_1 = 1-\frac{\mfr}{M}=\frac{1}{2}-\frac{E_0}{2M}~,\label{eq:b1}
186: \end{equation}
187: where $\mfr$ is the number of frustrated edges in the ground state
188: (the usual cost function in the two-coloring problem).
189: $E_0$ is the ground state energy
190: \begin{equation}
E_0=\min_{\{\sigma_v\}} H~,\label{eq:e0}
192: \end{equation}
193: where $H$ is the Hamiltonian of the antiferromagnetic Ising model:
194: \begin{subequations}
195: \begin{eqnarray}
196: H&=&\sum_{(v,w)\in E} \sigma_v\sigma_w\\
197: H_\mathrm{dir}&=&\sum_{(v,w)\in A} \sigma_v\sigma_w
198: \end{eqnarray}
199: \end{subequations}
200: The directed quantity is obtained by substituting $H$ by
201: $H_\mathrm{dir}$ in Eqs.~(\ref{eq:b1}) and (\ref{eq:e0}), and edges by
202: arcs in the above discussion. The topology of the energy landscape is
203: determined by the underlying network, and can in general be very
204: complex~\cite{barahona}.
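
To make the second equality in Eq.~(\ref{eq:b1}) explicit (a short check
that uses only the definitions above): in any configuration each of the
$M$ edges contributes $+1$ to $H$ if it is frustrated and $-1$
otherwise, so in the ground state
\begin{equation*}
E_0=\mfr-(M-\mfr)=2\mfr-M
\quad\Longrightarrow\quad
1-\frac{\mfr}{M}=\frac{1}{2}-\frac{E_0}{2M}~.
\end{equation*}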
205:
206: \subsubsection{Limit properties}
207:
The $b_1$ measure takes values in the interval $[1/2,1]$. The upper
209: bound is attained for bipartite graphs. It is easy to see that $b_1$
210: cannot be lower than $1/2$: Consider a ground state configuration for
211: which the opposite is true. Then there must be at least one vertex
212: with more than half of its edges frustrated. Flipping this spin
213: would reduce the energy, which contradicts the fact that the system is
214: in the ground state. We do not know if this bound is realized for any
215: finite graphs, but $b_1=1/2$ is the limit value for $b_1$ for a fully
216: connected graph as $N\rightarrow\infty$: Partition the fully connected
217: graph $K_N$ of $N$ vertices (and $M=N(N-1)/2$ edges) into one set of
218: $N'$ and one set of $N-N'$ vertices and assign opposite spins to the
219: elements of these sets. The number of frustrated edges is precisely
220: the number of edges within each set which is:
221: \begin{eqnarray}
222: \mfr(K_N)&=&\frac{N'(N'-1)}{2}+\frac{(N-N')(N-1-N')}{2}
223: \nonumber\\&=&M-N'(N-N')~.
224: \end{eqnarray}
225: Thus the minimum number of frustrated edges is exactly $N^2/4-N/2$ for
226: $N'=N/2$, and the fraction of unfrustrated edges is
227: \begin{equation}
228: b_1 = \frac{1}{2-2/N}\rightarrow\frac{1}{2}
229: \mbox{~as~} N\rightarrow\infty~.
230: \end{equation}
231: The above arguments can be generalized to directed networks
232: straightforwardly.
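
As a small sanity check of the expressions above (an illustrative case,
not one of the networks studied in this paper), take $N=4$: $K_4$ has
$M=6$ edges, and the balanced partition $N'=2$ leaves
$\mfr=6-2\cdot 2=2$ frustrated edges, so that
\begin{equation*}
b_1(K_4)=1-\frac{2}{6}=\frac{2}{3}=\frac{1}{2-2/4}~.
\end{equation*}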
233:
234: \subsubsection{Minimization by exchange Monte Carlo}
235: The complexity of the ``energy landscape'' of the antiferromagnetic
236: Ising model on an arbitrary network is difficult to judge \textit{a
237: priori}. There are indications that no natural network would be
238: too hard for a regular simulated annealing
239: approach~\cite{simann,jerrum}. To be on safer ground, we use a Monte Carlo
240: scheme that is evidently very efficient to sweep even an extremely
241: `rugged' energy landscape without getting stuck in local minima---the
242: so called exchange Monte Carlo (XMC)~\cite{xmc}. The idea of exchange
243: Monte Carlo is to run standard Metropolis Monte Carlo for $N_T$
244: replicas of the system, each at a specific temperature. Then from time
245: to time two replicas at adjacent temperatures are compared, and with a
246: probability\begin{equation}
247: P_\mathrm{exch.}=\left\{\begin{array}{ll} 1 & \mbox{if $\Delta<0$}\\
248: e^{-\Delta} & \mbox{otherwise}\end{array}\right.~,
249: \end{equation}
250: where
251: \begin{equation}
252: \Delta=\left(\frac{1}{T}-\frac{1}{T'}\right)(E'-E)~,
253: \end{equation}
254: and $E$ is the energy of the configuration at temperature $T$
(similarly for $T'$ and $E'$), and $T<T'$,
the two replicas are swapped between the temperatures. This condition
is designed so that the Monte Carlo scheme preserves the Boltzmann
distribution. This is not decisive for us, since we are looking for the
ground state energy rather than performing a proper sampling of the
configuration space, but it is nevertheless kept in our measurements. Besides just
261: running the XMC scheme we also periodically quench the system,
262: i.e.\ we sweep through all vertices of the network consecutively and flip
263: spins that lower the energy. The sweeps are continued until a sweep
264: with no spin-flips has occurred. For later reference we introduce the
265: notations $t_\mathrm{avg}$ for the total number of MC sweeps---we
266: refer to the number of MC sweeps as `time'---$t_\mathrm{quench}$ for
the time between each quench, $t_\mathrm{exch}$ for the time between
exchange trials, and $t_\mathrm{measure}$ for the time between measurement
sweeps (where the energy is sampled).
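
To illustrate the exchange rule with made-up numbers (they are
hypothetical and not taken from the simulations reported here): let
$T=1$, $T'=2$, $E=-10$ and $E'=-14$. Then
\begin{equation*}
\Delta=\left(1-\tfrac{1}{2}\right)(-14+10)=-2<0~,
\end{equation*}
so the swap is always accepted and the lower-energy configuration moves
to the lower temperature. With the energies interchanged ($E=-14$,
$E'=-10$) one gets $\Delta=2$, and the swap is accepted with
probability $e^{-2}\approx 0.14$.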
270:
For the exchange Monte Carlo scheme to efficiently sample the
configuration space all replicas need to tour the whole range of
273: temperatures in a reasonably short time. At the same time one would
274: not like the exchange trials, at any neighboring temperatures, to be
275: constantly affirmative---then the separation of the two temperatures
276: would be of no use. We follow Ref.~\cite{xmc} and choose the
277: temperature set
278: \begin{equation}
279: T_i=T_\mathrm{low}\left(
280: \frac{T_\mathrm{high}}{T_\mathrm{low}}
281: \right)^{(i-1)/(N_T-1)}~,
282: \end{equation}
where $1\leq i\leq N_T$ enumerates the replicas, and $T_\mathrm{low}$ and
$T_\mathrm{high}$ are the lowest and highest temperatures,
respectively. To find the actual parameter values (which will be
stated in Secs.\ \ref{sec:mod2} and \ref{sec:real_res}) one has to
check that the replicas travel throughout the temperature range with
reasonable exchange ratios for all temperature gaps.
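
For illustration only (these values are hypothetical and are not the
ones used in the measurements), $N_T=5$, $T_\mathrm{low}=1$ and
$T_\mathrm{high}=16$ give
\begin{equation*}
T_i=16^{(i-1)/4}=2^{\,i-1}\in\{1,2,4,8,16\}~,
\end{equation*}
i.e.\ adjacent temperatures have the constant ratio
$T_{i+1}/T_i=(T_\mathrm{high}/T_\mathrm{low})^{1/(N_T-1)}=2$.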
289:
290: \begin{figure}
291: \centering{\resizebox*{8cm}{!}{\includegraphics{ex.eps}}}
292: \caption{Some graphs in the discussion of the $b_2$ quantity. The
293: coloring of the vertices minimizes $\mfr$. Black edges indicate
294: frustration. (a) An almost bipartite graph with many
295: triangles. (b) A graph where all odd-circuits contribute to the
frustration. (c) A graph where only the shortest circuits
297: contribute to the frustration.}
298: \label{fig:ex}
299: \end{figure}
300:
301: \subsection{The measure $b_2$}
Apart from finding an approximate value of $b_1$, one can also
define a quantity that is exactly solvable in polynomial time. Our
primary intention is not to make a heuristic algorithm for
calculating $b_1$, but rather to define a quantity that captures the same
structure, i.e.\ that grows monotonically with $b_1$.
307:
308: That a graph contains no odd circuits is the defining property of
309: bipartiteness~\cite{intro}. It is thus natural that we base a
310: bipartivity measure on an odd-circuit count in some
311: way. Unfortunately, defining a quantity in this way becomes a little
312: bit more complicated than at first expected. One complication is
313: that a graph can be very close to bipartite and still contain many
314: odd-circuits (see Fig.~\ref{fig:ex}(a)). A way of dealing with this
315: problem is to mark as few edges as possible such that each odd circuit
316: contains at least one marked edge. In many cases a marked edge will
317: correspond to a frustrated edge of the ground state of the
318: antiferromagnetic Ising model. In Fig.~\ref{fig:ex}(a) only the upper,
319: horizontal edge needs to be marked. Another problem one faces is
320: how to deal with odd circuits of different length---in a network with
321: very few odd circuits a circuit of, say, length seven would contribute
322: as much to the global frustration of the network as a triangle (a
323: subgraph of three adjacent vertices---see
Fig.~\ref{fig:ex}(b)). But in many real networks the total length of
the odd circuits is very large (this is true for all networks we
measure, see Sect.~\ref{sec:rwn}), much larger than $M$; in these cases
the short circuits are in general the most important in determining
the ground state configuration. For example, in Fig.~\ref{fig:ex}(c)
$M=23$, and while we have 11 triangles, summing the lengths of all odd
circuits gives 218 (33 from the 11 triangles, 45 from the nine circuits of
length five, and so on). However, only the triangles contribute to the
ground state configuration in the sense that each triangle has the
same configuration as the ground state of an isolated triangle, while
the odd circuits of length larger than four (e.g.\ the periphery) do not
have the best coloring for a circulant of that length. To deal with this we
need to weight short circuits higher than long ones. We will do this by
assigning a cut-off length and neglecting all circuits exceeding this
length.
339:
340: \subsubsection{Definition\label{sec:def}}
341:
342: Now, we make an algorithm of the above ideas as follows: Let $C_n$ be
343: the set of odd circuits of length $\leq n$. Let $\Sigma(C_n)$ be the
accumulated length of the circuits in $C_n$ (so, for example,
$\Sigma(C_3)=3$ in Fig.~\ref{fig:ex}(b)). Now we assign the cut-off
$M$ to $\Sigma(C_n)$, and let $\hat{n}$ be the smallest $n$
such that $\Sigma(C_n)\geq M$. Next we turn to the marking procedure
348: sketched above. Let $\nu(e)$ denote the number of circuits in
349: $C_{\hat{n}}$ passing through the edge $e$. Clearly edges of
350: high $\nu$ are likely to be frustrated in the ground state
351: (viz.\ Fig.~\ref{fig:ex}(a)). We now estimate $\mfr$ roughly
as the number of edges that have to be marked so that each odd circuit
353: of length $\leq\hat{n}$ is marked at least once. To be precise we
354: perform the following algorithm:
355: \begin{enumerate}
356: \item Start with $C=C_{\hat{n}}$.
357: \item Sort the edges in order of $\nu$.\label{step:0}
358: \item Repeat the following while $C\neq\varnothing$:\label{step:1}
359: \begin{enumerate}
360: \item \label{step:a} Mark the edge $e$ with highest $\nu$.
361: \item \label{step:b} Remove all circuits in $C$ containing $e$.
362: \item \label{step:c} Recalculate $\nu$ for each edge.
363: \end{enumerate}
364: \end{enumerate}
365: Then the number of iterations $m'$ is the assessment of $\mfr$, and we
366: define our bipartivity measure as
367: \begin{equation}
368: b_2=1-\frac{m'}{M}~.
369: \end{equation}
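
As a minimal worked example (added for illustration; it is not one of
the graphs of Fig.~\ref{fig:ex}), consider two triangles sharing one
edge, i.e.\ four vertices and $M=5$ edges. The only odd circuits are the
two triangles, so $C_{\hat{n}}$ consists of these two circuits. The
shared edge has $\nu=2$ and the four other edges have $\nu=1$. Marking
the shared edge removes both circuits, so the loop ends after $m'=1$
iteration and
\begin{equation*}
b_2=1-\frac{1}{5}=\frac{4}{5}~.
\end{equation*}
In this case the result coincides with $b_1$: giving the two end points
of the shared edge the same spin and the two remaining vertices the
opposite spin frustrates only the shared edge, so $\mfr=1$.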
370:
371: This algorithm is not an attempt to actually identify the frustrated
372: edges, rather it is supposed to give a high $\mfr$ for a system with
373: high (total) geometric frustration, and vice versa: Firstly, it does
not necessarily find the minimal number of edges needed to be marked
for all odd circuits of length at most $\hat{n}$ to contain a marked
376: edge. But we expect this steepest descent optimization to come close
377: in most cases. Secondly, an odd circuit can in reality only have an odd
378: number of frustrated edges, but in the algorithm there is no such
379: restriction on the number of marked edges.
380:
In case there is more than one edge with the highest $\nu$ (in step
382: \ref{step:a} of the algorithm) we choose the edge to mark at
383: random. The variance between different random seeds turns out to be
384: negligible in most cases. We will run the algorithm for different seeds to
385: choose the highest $b_2$ value, and get an idea about the error in
386: $b_2$ from the selection of edge to mark. An alternative (and more
387: ambitious) approach would be to iterate the whole calculation until
388: the highest $b_2$ has reappeared a fixed number of times (cf.\
389: \cite{ww}).
390:
391: If we assume a sparse network (i.e.\ $N\propto M$) the running time of
392: the algorithm above is $O(M^2)$. To see this we first note that there
393: can be at most $O(M)$ iterations at step~\ref{step:1}. To find the
394: edge with highest $\nu$ (in step~\ref{step:a}) we do not need to sort
395: all edges more than once (as done in step~\ref{step:0}). Instead we
396: can find this out while recalculating $\nu$ (in
397: step~\ref{step:c}). Removing all circuits containing $e$ (as in
398: step~\ref{step:b}) can be done in time bounded by the total length of
399: circuits containing $e$, which cannot be larger than
400: $3M$. Step~\ref{step:c} also needs to go through all circuits passing
401: $e$ and thus needs the same running time as step~\ref{step:b}. To sum
this up, the running time for this section of the algorithm is of order
$M^2$.
404:
405: \subsubsection{Limit properties}
406:
407: In the $N\rightarrow\infty$ limit the $b_2$ measure lies in almost the
408: same interval as $b_1$. The upper limit $b_2=1$ is attained if and
409: only if the graph is bipartite. (If the graph is bipartite
$C_{\hat{n}}$ is empty and $\nu(e)=0$ for all $e$, so $m'=0$ and
$b_2=1$. If there exist odd circuits $m'\geq 1$, so $b_2<1$.) $b_2$
cannot be lower than 0 (if one marks all edges, every circuit must be
413: marked). Since the $b_2$-definition is inspired by the ground-state
414: configuration of the antiferromagnetic Ising model, we expect a
415: similar lower bound to $b_2$ as to $b_1$. In Appendix~\ref{sec:bound}
416: we argue that the lower bound on the $b_2$, as for the $b_1$ measure,
417: is $1/2$ in the $N \rightarrow \infty$ limit.
418:
419:
420: \subsubsection{The complete algorithm}
421: So far we have overlooked the central part in calculating the $b_2$
422: measure---namely to find odd circuits. To do this we use a modified
423: version of Johnson's algorithm~\cite{johnson}. In principle Johnson's
424: algorithm is a depth first search where, to avoid futile searching,
425: some vertices are blocked while stepping down the search tree. The
426: running time for Johnson's algorithm is $O(M(C+1))$ (if $M>N$) where
427: $C$ is the total number of circuits. Now $C$ can grow fast with
428: $N$ which would make the finding of all odd circuits a quite
429: intractable computation. In many cases the cut-off of the circuit length,
430: that we introduced above to give less priority to long circuits, saves
431: us by setting a limit on the search depth. To implement this we
432: let $\bar{n}$ be the current upper bound on circuit length (or search
depth), and $\bar{\Sigma}$ be the current summed length of the odd circuits
of length $\leq\bar{n}$. As soon as $\bar{\Sigma}\geq M$ we iteratively
decrease $\bar{n}$ by $2$ and recalculate $\bar{\Sigma}$ until
$\bar{\Sigma}<M$. If $\bar{\Sigma}< M$ when the search is over we
437: rerun the procedure where we use $\bar{n}+2$ as our new (fixed)
438: $\bar{n}$~\cite{note:alt}. When the search is over we assign
439: $\hat{n}$ the value $\bar{n}$. For dense bipartite graphs the
algorithm is intractable. In the worst case, the complete bipartite
441: graph, $K_{N/2,N/2}$, there are
442: \begin{equation}
C(K_{N/2,N/2})= \sum_{k=4}^{N}\frac{1}{k}
444: \left[\frac{(N/2)!}{(N/2-k/2)!}\right]^2
445: \end{equation}
446: circuits (where the sum is over even values of $k$)~\cite{note:bip}
447: giving a running time of $O(N^2C(K_{N/2,N/2}))$. One can of course
448: decide whether or not a graph is bipartite in linear time, but
449: non-bipartite cases of similar complexity are easily constructed (by,
e.g., adding an isolated triangle). In practice these worst cases are
probably very rare---a, relatively speaking, very low density of odd
circuits is needed to get a large $\hat{n}$---even in the real-world
453: network with highest bipartivity we have $\hat{n}=3$. In this case
454: ($\hat{n}=3$) all odd circuits are found in $O(M^2)$ time.
455:
456: Now we turn to a more complete description of the algorithm. Johnson's
457: algorithm takes the `least' (smallest in some enumeration) vertex in a
458: strongly connected subgraph as its starting point. To find strongly
459: connected components we use the algorithm in Ref.~\cite{SCC}. To sum
460: up, the algorithm reads:
461: \begin{enumerate}
462: \item Mark all vertices as unchecked.
463: \item While there are unchecked vertices, iterate the
464: following:\label{step:wh}
465: \begin{enumerate}
466: \item Pick an unchecked vertex $v$.
467: \item Find the largest strongly connected component $\Lambda_v$
468: containing $v$.
469: \item Set $\Lambda:=\Lambda_v$ and repeat the following steps as long
470: as $\Lambda\neq\varnothing$:
471: \begin{enumerate}
472: \item Pick the least vertex $u$ of $\Lambda$.
473: \item Call a subroutine implementing the modified Johnson's
474: algorithm. Recalculate $\bar{n}$ and add $C_{\bar{n}}$ to a list
475: $\mathcal{C}$. Delete circuits longer than $\bar{n}$ from $\mathcal{C}$.
476: \item Delete $u$ from $\Lambda$.
477: \end{enumerate}
478: \end{enumerate}
479: \item Set $\hat{n}:=\bar{n}$.
480: \item Run the algorithm described above (in Sect.~\ref{sec:def}) to
481: mark edges and calculate $b_2$.\label{step:calc}
482: \end{enumerate}
483: In all cases, step~\ref{step:wh} sets the limit on running time. As
mentioned, in most applications we expect the running time of
485: step~\ref{step:wh} to be $O(M^2)$ (similarly to that of
486: step~\ref{step:calc}).
487:
488: \section{The Networks}
489:
490: \begin{figure}
491: \centering{\resizebox*{8cm}{!}{\includegraphics{mo.eps}}}
492: \caption{Construction of the test networks. (a) shows the
493: generalization of the ER model (Model 1). (b) shows interpolation
494: between quadratic and triangular lattices (Model 2). (c) shows the
495: model with predominantly longer circuits (Model 3). All models
are bipartite for $r_{1,2,3}=1$. Additional edges create
odd circuits (frustration) for lower $r_{1,2,3}$-values. The black
lines illustrate these additional edges. The white and non-white
499: vertices symbolize a partition giving $b_1=1$ in the $r_{1,2,3}=1$
500: case (it is not meant to represent the optimal coloring when
501: $r_{1,2,3}<1$).
502: }
503: \label{fig:mo}
504: \end{figure}
505:
506: \subsection{Test networks with tunable bipartivity\label{sec:mod}}
507: To test and compare the $b_1$ and $b_2$ quantities we construct three
508: types of test networks where the bipartivity can be tuned by model
509: parameters. The principle behind all models is to start from
bipartite networks and add a smaller or larger number of edges within a
511: partition to create odd circuits.
512:
513: One type (Model 1) is a quite straightforward generalization of the
Erd\H{o}s-R\'enyi (ER) model~\cite{ER}: We partition the vertices into two
disjoint sets of sizes $\tilde{N}$ and $N-\tilde{N}$. Then we add $r_1
M$ edges randomly between vertices of the different sets, and
$(1-r_1)M$ edges regardless of what set the vertices belong to (see
518: Fig.~\ref{fig:mo}(a)). In this way we interpret $r_1$ as the
519: strength of the heterophilous preference in a model where bipartivity
is the only structural bias. The vertex pairs
are chosen uniformly at random, the only restriction being that loops and
522: multiple edges are not allowed. If $r_1=0$ the model reduces to the ER
523: model, while for $r_1=1$ the networks are bipartite (cf.\
524: Ref.~\cite{nws}). This model is probably the most random (i.e.\
525: having least structural biases) model with tunable bipartivity. The
526: disadvantage is that the expectation values of $b_1$ and $b_2$ are
527: hard to calculate (even in the frustrated limit $r_1=0$).
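
A rough estimate nevertheless gives a lower bound (a sketch under the
assumption that the restriction on multiple edges can be neglected,
i.e.\ for sparse networks): labelling the vertices by the set they were
assigned to is one admissible spin configuration, and of the
$(1-r_1)M$ unrestricted edges an expected fraction
$2\tilde{N}(N-\tilde{N})/[N(N-1)]$ connects the two sets, so
\begin{equation*}
\langle b_1\rangle\gtrsim
r_1+(1-r_1)\frac{2\tilde{N}(N-\tilde{N})}{N(N-1)}~,
\end{equation*}
which for $\tilde{N}=N/2$ and large $N$ approaches $(1+r_1)/2$.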
528:
529: Model 2 interpolates between two-dimensional square- and triangular
530: lattices. We start, for $r_2=0$, with a triangular grid with periodic
531: boundary condition. Let $L$, the linear dimension of the system (i.e.\
$N=L^2$), be even. For a non-zero parameter value we (by uniform
randomness) delete $r_2L^2$ of the `diagonal' edges that create frustration, as
illustrated in Fig.~\ref{fig:mo}(b). To be more precise, if we index
the vertices as $(i_x,i_y)$, $1\leq i_x,i_y\leq L$; then the edges are
$[(i_x,i_y),(i_x+1,i_y)]$ and $[(i_x,i_y),(i_x,i_y+1)]$ (giving the
square grid) plus $(1-r_2)L^2$ edges of the form
$[(i_x,i_y+1),(i_x+1,i_y)]$ chosen by uniform randomness (addition is
modulo $L$). This model has a high density of short circuits. The
extremes $r_2=0$ and $r_2=1$ represent two generic lattice types. The
541: symmetries of the regular networks simplify the calculations of
542: e.g.\ limit properties for the bipartivity measures. If $r_2=1$ the
543: system is bipartite (note that $L$ has to be even for this to hold) so
544: $b_{1,2}=1$. When $r_2=0$ we have $b_1=b_2=2/3$: For the lower limit
545: of the $b_1$ quantity, see Ref.~\cite{WAHO}. For the lower limit $b_2$
546: we note that $\Sigma(C_3)=6N$ (since each vertex can be associated
547: with two triangles). This gives $\hat{n}=3$ and $\nu=2$ for all
548: edges. Now it is enough to mark $N$ edges (e.g.\ all
549: $[(i_x,i_y),(i_x+1,i_y)]$ edges). In this case we note that each edge
will have $\nu=2$ when it is marked, which means that the marking
sequence is optimal and that the number of iterations cannot be smaller
with another choice of edges to mark. So $b_2=1-N/(3N)=2/3$. The major
553: disadvantage with Model 2 is that the average degree is a function of
554: $r_2$ ($M=(3-r_2)L^2$). This change in the average degree can make it
555: harder to separate effects of the shift in bipartivity from the shift
556: in average degree.
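
Explicitly (a one-line consequence of the edge count above), the average
degree is
\begin{equation*}
\langle k\rangle=\frac{2M}{N}=2(3-r_2)=6-2r_2~,
\end{equation*}
which decreases from $6$ for the triangular lattice ($r_2=0$) to $4$ for
the square lattice ($r_2=1$).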
557:
In both Model 1 and (even more) Model 2 triangles will dominate
559: the set of odd circuits. To test networks with predominantly longer
560: circuits we construct a Model 3 as follows (see Fig.~\ref{fig:mo}(c)):
561: We make two circulants of size $N/2$ with the vertices
562: $\{v_1^i,\cdots,v_{N/2}^i\}$ and edges
563: $\{(v_1^i,v_2^i),\cdots,(v_{N/2-1}^i,v_{N/2}^i), (v_{N/2}^i,v_1^i)\}$,
564: $i\in\{1,2\}$. Then we add $M_\mathrm{trans}$ transverse edges between
the circulants. $M_\mathrm{trans}/2$ of these edges are placed
at equal spacing $N/M_\mathrm{trans}$, separating the double
circulants into $M_\mathrm{trans}/2$ `sectors.' Then we fill up each
sector with another transverse edge: With probability $r_3$ we add a
$(v^1_j,v^2_j)$ edge (such that $(v^1_j,v^2_j)$ is none of the
previously added transverse edges), otherwise we add a
$(v^1_j,v^2_{j+1})$ edge (addition modulo $N/2$). We note, to a first
approximation, that if $r_3=0$, marking (in the process of calculating
$b_2$) one edge between every pair of neighboring transverse edges on
one of the circulants is needed to mark the shortest odd circuits. This
will make $b_2\in O(1-M_\mathrm{trans}/N)$.
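
A rough version of this estimate can be written out as follows. It is only a sketch: it assumes that $b_2=1-m'/M$, with $m'$ the number of marked edges as in the Model 2 example above, and that for $r_3=0$ about one marking is needed per transverse edge. The two circulants contribute $N/2$ edges each, so
\[
M=N+M_\mathrm{trans},\qquad m'\approx M_\mathrm{trans}
\quad\Rightarrow\quad
b_2\approx 1-\frac{M_\mathrm{trans}}{N+M_\mathrm{trans}}
=1-\frac{M_\mathrm{trans}}{N}+O\!\left(\frac{M_\mathrm{trans}^2}{N^2}\right).
\]
For example, $N=100$ and $M_\mathrm{trans}=10$ would give $b_2\approx 1-10/110\approx 0.91$ under these assumptions.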
576:
577: \subsection{Real-world networks\label{sec:rwn}}
Physicists' network studies have, in the spirit of statistical mechanics,
emphasized the properties remaining when the system grows beyond any
limit. Bipartivity, as discussed above, is well defined for all system
sizes. Still it is a quantity that can potentially suffer from
finite-size effects (from the fact that not all real neighbors of all
actors in an empirically constructed social network are a part of the
graph) and is therefore preferably measured for large networks. Now
the problem is to find data for large-scale real-world networks of
social interaction. In general two methods have been successful for this
purpose: one either uses professional collaborations of some sort or
data from interaction over the Internet (either in Internet
communities~\cite{HEL,smith}, or through email exchange~\cite{ebel}).
590:
591: \subsubsection{Professional collaboration networks}
In the professional collaboration networks we study, the
vertices are professionals of some field. Networks of scientists
and company directors are considered in this paper; the movie-actor
network is another frequently studied example. The edges represent
that two actors have been involved in the same professional
collaboration. This is sometimes referred to as a ``one-mode''
598: representation of an affiliation network (as opposed to the bipartite
599: two-mode representation discussed in Sect.~\ref{sec:intro}).
600:
601: Professional collaboration networks are no doubt interesting in their
602: own right as accounts for the interaction dynamics of the respective
fields. Assuming that the formation of professional ties follows
similar principles as general human interaction, we can use
professional collaboration networks to draw conclusions about the
structure of more general social networks. However, in at least one
respect professional collaboration differs from general social
interaction: A collaboration tie does not necessarily imply a strong
personal acquaintance, but in these networks each collaboration
constitutes a fully connected cluster. This leads to a higher fraction
of short circuits than in, say, a friendship network.
612:
One of the professional collaboration networks we use consists of
scientists who have uploaded manuscripts to the preprint repository
arxiv.org. Two scientists are linked if their names (identified by
surname and initials) appear together on at least one preprint. A
detailed description of this network can be found in
Ref.~\cite{newman3}. In the other professional collaboration network
the vertices represent company directors from the Fortune top 1000
list of companies in the USA for the year 2001. An edge (collaboration) in
this network means that two directors sit on the board of the same
company. A detailed description of this network can be found in
Ref.~\cite{davis}. Sizes of the networks can be seen in
Table~\ref{tab:b}.
625:
626: \subsubsection{Online interaction networks}
627:
628: In online interaction networks, the vertices are users of Internet
629: communities and an arc (A,B) is added if A contacts B, or
if A adds B to his/her list of friends~\cite{smith,HEL}. Another kind
of online interaction network is the email network~\cite{ebel}, where
an arc can be assigned if an email is sent, or if a person adds
another to his/her address book. Just as for professional collaboration
networks, one can argue that online interaction networks are
representative of general social networks. One can assume that new
636: contacts are formed through preference-matching searches to a larger
637: extent, and introduction by mutual friends to a lesser extent, than in
638: general friendship networks. Since the introduction of mutual friends
639: to each other is believed to be the major cause of high clustering
640: (large density of triangles, or, large transitivity)~\cite{newman1}
641: one can expect a lower clustering in networks of online interaction
(still the clustering in these networks seems to be finite in the
643: $N\rightarrow\infty$ limit~\cite{HEL}).
644:
645: The specific online interaction networks we consider are constructed
646: from the Internet communities nioki.com and pussokram.com. The
647: nioki.com data is described in Ref.~\cite{smith}. In this data an arc
648: (A,B) means that B is listed as a friend by A, which
649: allows A to see if B is online and send instant messages to B. In
650: the pussokram.com data the arcs correspond to communication between
651: the users. There are four different types of communication in this
652: specific network (all described in detail in Ref.~\cite{HEL}). We use
the networks obtained from two types of interaction (``messages,'' like
ordinary emails within the community, and ``guest book,'' where one user
contacts another by writing in his/her guest book), and the network
of any of the four types. Network sizes can be found in
Table~\ref{tab:b}.
658:
659: Another large difference between the pussokram.com and nioki.com data
660: is that the former community has a very pronounced romantic profile,
encouraging flirting and romantic correspondence. nioki.com also has a
search engine to ``trouve l'amour'' (find love), but that is all.
663:
664: Apart from the two Internet communities, we study another type of
online interaction network based on the flow of email. For this network
all incoming and outgoing email traffic of a server was logged for around
667: three months~\cite{ebel}. The server handles undergraduate students'
668: email accounts at Kiel University, Germany. Thus there are two
669: categories of vertices---internal vertices, whose activity is accurately
670: mapped; and external vertices, that only have edges leading to internal
671: vertices. In this study we restrict ourselves to the network of
672: internal-internal contacts. The reason we do not include external
673: contacts is that we would miss the (probably many) circuits containing
674: external-external edges which would bias the bipartivity.
675:
\subsubsection{Networks from interviews and field surveys\label{sec:soc}}
Apart from the above networks, all obtained from databases, we also
measure the bipartivity of two networks obtained from interviews and
679: field surveys. The first data set is gathered by observations of
680: interaction between members of a university karate
681: club~\cite{zach}. We also study the network of acquaintance ties in a
prison~\cite{prison}. The outgoing arcs from A correspond to
prisoners listed by A in response to the question: ``What fellows on
the tier are you closest friends with?'' Due to their acquisition
methods these kinds of real-world networks have to be rather small. This
can, as mentioned, result in finite-size effects. On the other hand
they, most likely, more truly reflect the structure of real
acquaintance networks.
689:
690: \section{Results}
691:
692: In this section we present the results of the test networks and the
693: measurement for the real-world social networks.
694:
695: \subsection{Test networks\label{sec:mod2}}
696:
As expected, both $b_1$ and $b_2$ are monotonically increasing as
functions of the $r_1$, $r_2$ and $r_3$ parameters of
(almost~\cite{note:not_really}) all our test networks (see
Fig.~\ref{fig:mx}). This is encouraging and suggests that both
$b_1$ and $b_2$ are quite relevant measures of bipartivity.
702:
703: The Model 1 measurements shown in Fig.~\ref{fig:mx}(a) are made with the
704: model parameters $N=2\tilde{N}=100$ and $M=800$. We have checked many
705: other sizes too, but all have the characteristic appearance of
Fig.~\ref{fig:mx}(a): a linear increase of $b_1$ and $b_2$ for larger
$r_1$ and a flatter slope for $r_1$ close to zero. This shape is
expected from the discussion in Sect.~\ref{sec:intro}: in networks
where a heterophilous preference is the only structure-inducing
force, only the strong preference limit gives a strong measurable
effect. Close to the ER limit $r_1\approx 0$, the original two
partitions will not be identified correctly; only when the different
partitions (to a large extent) have different signs will the bipartivity
be proportional to the strength of the heterophilous preference.
715:
716: As seen in Fig.~\ref{fig:mx}(b) Model 2 shows an almost linear
717: functional form of $b_{1,2}(r_2)$. In this case, triangles dominate
the odd circuits even at small values of $r_2$. Tuning $r_2$ changes
the number of triangles proportionally. Thus a linear
$r_2$ dependence of $b_2$ would be expected.
721:
Model 3 also has linear $b_{1,2}$ vs.\ $r_3$ curves. The model
parameters used are $N=100$ and $M_\mathrm{trans}=10$. As mentioned in
Section~\ref{sec:mod}, we expect $b_2\approx 1-M_\mathrm{trans}/N$ for
$r_3=0$, which is confirmed in Fig.~\ref{fig:mx}(c).
726:
727: The measurements for both $b_1$ and $b_2$ are averaged over 100
network realizations. The XMC scheme for the $b_1$ quantity is run at
24 temperatures in parallel, between temperatures $0.01$ and
$2$. Other simulation parameters are $t_\mathrm{avg}=4\times 10^5$,
$t_\mathrm{measure}=4$, $t_\mathrm{quench}=20$ and
$t_\mathrm{exch}=1000$. These are more modest parameter values than we
will use for the real-world networks, but the test networks are also
much smaller, and since the distributions of $b_1$ and $b_2$ are
(likely) symmetric, the network average helps to reduce the error.
736:
737: \begin{figure}
738: \centering{\resizebox*{8cm}{!}{\includegraphics{mx.eps}}}
  \caption{The bipartivity measures versus the model parameters of the
    three models defined in Section~\protect\ref{sec:mod}. (a) shows the
    result for Model 1, (b) shows the result for Model 2, and (c) shows
    the result for Model 3. All error bars would be smaller than the
    symbol size. The monotonic growth of the bipartivity measures shows
    that the measures behave as expected.}
745: \label{fig:mx}
746: \end{figure}
747:
748: \begin{table*}
749: \caption{Sizes, clustering coefficients and bipartivity measures
750: $b_1$ and $b_2$ for real-world social networks.}
751: \label{tab:b}
752: \begin{ruledtabular}
753: \begin{tabular}{l|rrr|dddd|dddd}
754: network & $N$ & $M_\mathrm{dir}$ & $M$ & C_\mathrm{dir} & C &
755: D_\mathrm{dir} & D & b_1^\mathrm{dir} & b_1 &
756: b_2^\mathrm{dir} & b_2 \\\hline
757: all contacts & $29{\:}341$ & $174{\:}662$ & $115{\:}684$& 0.012 &
758: 0.0060 & 0.016 & 0.017 & 0.859 & 0.860 & 0.948 & 0.928\\
759: messages & $20{\:}691$ & $73{\:}346$ & $52{\:}435$& 0.0052& 0.0061 &
760: 0.0081 & 0.0061 & 0.897 & 0.892 & 0.984 & 0.964\\
761: guestbook & $21{\:}545$ & $76{\:}257$ & $55{\:}076$& 0.014 & 0.014 &
762: 0.015 & 0.021 & 0.863 & 0.889 & 0.943 & 0.965\\
763: nioki.com & $50{\:}259$ & $405{\:}742$ & $239{\:}452$& 0.0076 &
764: 0.0065 & 0.016 & 0.013 & 0.842 & 0.855 & 0.956 & 0.975\\
765: emails & 637 & 554 & 443& 0.11 & 0.16 & 0.071 & 0.14 & 0.944 & 0.944
766: & 0.971 & 0.941 \\
    arxiv.org & $52{\:}909$ & $\times$ & $490{\:}600$ & \times & 0.45 &
    \times & 0.35 & \times & 0.630 & \times & 0.623\\
    directors & $7{\:}475$ & $\times$ & $48{\:}899$ & \times & 0.21 &
    \times & 0.37 & \times & 0.549 & \times & 0.507\\
771: karate club & 34 & $\times$ & 78& \times & 0.26 & \times & 0.26 &
772: \times & 0.782 & \times & 0.782 \\
773: prison & 64 & 182 & 85& 0.19 & 0.31 & 0.089 & 0.14 & 0.786 & 0.878 &
774: 0.918 & 0.847
775: \end{tabular}
776: \end{ruledtabular}
777: \end{table*}
778:
779: \subsection{Real-world social networks\label{sec:real_res}}
780:
781: Now we turn to the result for the bipartivity measures of real-world
782: networks. The values are presented in Table~\ref{tab:b}. For
783: comparison we also give values for the clustering coefficient (density
784: of triangles) $C$ and the density of squares $D$ in both directed and
undirected versions~\cite{note:cd}. Undirected networks are constructed
by taking the symmetric closure. At first glance at the table
787: we arrive at the pleasing conclusion that the bipartivity for the
788: pussokram.com networks is very high (as expected from a network of
789: romantic interaction of mostly heterosexuals). But disappointingly,
790: the bipartivity measures show similarly high values for the nioki.com
and email networks. This can be explained by the fact that the nioki.com
data, just like the pussokram.com data, has very low $C$ and $D$ values, and
793: presumably very few circuits at all. Now branches (subgraphs without
circuits that can be isolated by cutting one edge) contain no odd
circuits and therefore cannot lower either $b_1$ or $b_2$, regardless of the
gender of the agents. The email network does have a high clustering, but
797: still rather high bipartivity. The reason is that the email network is
798: rather heavily fragmented and contains many isolated subnetworks of
two vertices and one edge, and three vertices and two edges. Such
subnetworks do not affect the clustering coefficient but tend to
increase the bipartivity measures~\cite{note:improvement}.
802:
803: The collaboration networks consist of a number of fully connected
804: clusters (corresponding to a specific collaboration) that are
805: interconnected. It is thus natural that we see low bipartivity and a
high density of short circuits. The lower bipartivity values for the
company director network can be explained by the larger average size of such
fully connected clusters: The average number of vertices per
809: collaboration is $9.5$ for the corporate director network and $2.5$
810: for the scientific collaboration data~\cite{davis,newman3}.
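
The effect of cluster size can be illustrated with a single isolated clique. This is a sketch under the assumption that $b_1=1-\mfr/M$, where $\mfr$ is the number of frustrated edges in the optimal two-coloring; $s$ (the number of vertices in the clique) is a symbol introduced here for illustration. The best two-coloring of a complete graph on $s$ vertices splits the vertices as evenly as possible, leaving $\lfloor s^2/4\rfloor$ of the $s(s-1)/2$ edges unfrustrated, so
\[
b_1(K_s)=\frac{\lfloor s^2/4\rfloor}{s(s-1)/2}=
\begin{cases}
\dfrac{s}{2(s-1)}, & s\ \mathrm{even},\\[2ex]
\dfrac{s+1}{2s}, & s\ \mathrm{odd}.
\end{cases}
\]
This gives $b_1=1$ for $s=2$, $2/3$ for $s=3$, and $5/9\approx 0.56$ for $s=10$, approaching $1/2$ as $s$ grows. Real collaboration networks consist of overlapping clusters, so these numbers indicate the trend rather than predict the values in Table~\ref{tab:b}.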
811:
812: The two small networks constructed from field surveys (the ``karate
813: club'' and ``prison'' network of Table~\ref{tab:b}, discussed in
Section~\ref{sec:soc}) show mid-range bipartivities and relatively high
815: values of $C$ and $D$. From the above discussion we can expect that
816: the bipartivity of large, real, acquaintance networks is somewhere
817: between those of the collaboration networks and the Internet community
818: networks (because they probably have higher clustering than Internet
819: community networks, and lower number of fully connected clusters than
the collaboration networks). Encouragingly enough, this is exactly what
we see in Table~\ref{tab:b}. Of course, the very small system sizes
might affect the results, but it seems hard to believe that the
bipartivity measures of real-world acquaintance networks would be close
to either the upper or lower limits.
825:
We conclude this section with a note on the parameters for the XMC
optimization. The measurement of $b_1$ for all real-world networks
(except the nioki.com data where we study the convergence more
carefully) is done just once with the following simulation
parameters: $N_T=24$ (with temperatures from $0.002$ to $5$),
$t_\mathrm{avg}=1\times 10^7$, $t_\mathrm{measure}=16$,
$t_\mathrm{quench}=40$ and $t_\mathrm{exch} = 2\times 10^4$.
833:
834: \section{Summary and discussion}
835:
836: This paper concerns the quantification of the network structure
`bipartivity,' i.e., how close to bipartite a given graph is. We propose
two measures for this quantity. One quantity, $b_1$, is based on the
optimal two-coloring of the network or, equivalently, the ground
state of the antiferromagnetic Ising model on the network. Computing the
exact value of this quantity (that has been used in different roles
elsewhere) is an NP-complete problem and thus in general not
feasible. Instead we seek an approximate solution by a
844: simulated annealing approach. The simulated annealing is based on the
845: exchange Monte Carlo scheme. We argue that this unorthodox
846: minimization method helps us avoid local minima of the energy
847: landscape of the antiferromagnetic Ising model. Furthermore we develop
848: a measure $b_2$ based on the count of odd circuits that, for almost
849: all networks, is calculable in polynomial time.
850:
851: We propose three different random graph test models where one can
852: interpolate between arguably non-bipartite and bipartite graphs by
853: tuning a control parameter. Both our bipartivity measures are shown to
854: increase monotonically with tuning the control parameters towards the
855: bipartite extreme. From this we conclude that the bipartivity measures
856: really quantify the notion of bipartivity.
857:
858: By considering example networks we infer that bipartivity is a
859: structure that cannot be measured by currently popular structural
860: measures, such as the clustering coefficient. At the same time any
sensible quantification of bipartivity probably has to have a negative
862: correlation with the clustering coefficient for most networks (with
exceptions for exotic cases like Fig.~\ref{fig:ex}(a)), so in that
sense bipartivity and clustering are not independent.
865:
866: We measure $b_1$ and $b_2$ of a number of real-world networks,
867: constructed from online interaction, professional collaborations, and
868: field surveys. As expected, we see high bipartivity values for data
869: from the Internet community pussokram.com, where romantic contacts
are encouraged, and hence a high degree of heterophilous interaction
is expected. We also see the expected low bipartivity values for the
872: professional collaboration and empirical acquaintance networks we
873: study. Disappointingly we cannot use our bipartivity measures to
874: distinguish between the networks driven by romantic or friendship (or
professional) contacts. To do this, other structures and the network
sizes have to be taken into account in a more elaborate analysis (that
is beyond the scope of this study).
878:
So far our examples of networks with high bipartivity have been
romantic networks and networks of sexual contacts. Network-based
studies of sexually transmitted diseases~\cite{lea} are a potentially
882: interesting area for bipartivity measures, as the transmission rates
883: for homosexual and heterosexual contacts differ~\cite{anma}. Apart
884: from romantic and sexual networks, there are other areas where the
885: bipartivity measure may prove useful: One can consider a trade network
886: where some agents are more or less pronounced sellers and others are
primarily buyers (cf.\ Ref.~\cite{white}); such networks would not
888: have a neutral bipartivity. Another application is for the
889: `genealogical' network of a disease outbreak: Some contagious diseases
890: have a relatively stable duration between when an individual is
891: infected and when he or she becomes infectious. Epidemics of these
892: types of diseases can therefore roughly be divided into different
893: generations of infected individuals~\cite{anma}. A network
894: consisting of possible edges of infections, for an outbreak of this
895: type of disease, should therefore have very few odd-length
896: circuits. The reason is that the infection is only transmitted between
897: succeeding generations, which generates only circuits of even length
(in the symmetric closure of the network). When reconstructing the
paths this kind of disease has taken in a population, a maximization
of the bipartivity measures can be a method for excluding redundant
infectious edges.
902:
We conclude with an analogy to linear algebra: we have identified a new
904: dimension (structure) and proposed base vectors (measures), that
905: unfortunately are not orthogonal to the other dimensions.
906:
907: \section*{Acknowledgements}
908: We would like to thank Niklas Angemyr, Stefan Bornholdt, Gerald Davis,
909: Holger Ebel, Michael Lokner, Stefan Praszalowicz, and Christian
910: Wollter for help with data acquisition; and Johan Giesecke, James
911: Moody, Mats Nyl\'{e}n, and Pontus Svenson, for comments and
912: suggestions. P.H.\ was partly supported by the Swedish Research
913: Council through contract no.\ 2002-4135. F.L.\ was supported
914: by the National Institute of Public Health. C.R.E.\ was supported by
915: the Bank of Sweden Tercentenary Foundation. B.J.K.\ was supported by
916: the Korea Science and Engineering Foundation through Grant No.\
917: R14-2002-062-01000-0.
918:
919: \appendix
920:
921: \begin{figure}
922: \centering{\resizebox*{8cm}{!}{\includegraphics{m2.eps}}}
923: \caption{Marking of edges (in matrix representation) while
924: calculating the $b_2$ quantity for a fully connected graph. `$-1$'
925: means that $\nu$ at that position is decreased by one unit, `$=0$'
926: means that $\nu=0$ at that position.
927: }
928: \label{fig:ma}
929: \end{figure}
930:
931: \section{The lower bound of the measure $b_2$\label{sec:bound}}
932:
933: In this Appendix we argue that, in the $N\rightarrow \infty$ limit,
934: the lower bound for $b_2$ is $1/2$ (just like $b_1$). First we
935: conjecture that the minimal value for $b_2$, just as for $b_1$, is
936: attained for complete graphs. (This will be further motivated below.)
937:
938: To assess $b_2$ for complete graphs, we note that~\cite{note:sigsum}
939: \begin{subequations}
940: \begin{eqnarray}
941: \Sigma(C_n)&=&\sum_{\mathrm{odd}\;3\leq i\leq n}
942: \frac{N!}{2(N-i)!}~\Rightarrow\label{eq:sigsum}\\
943: \Sigma(C_3)&=&\frac{N(N-1)(N-2)}{2}\geq\nonumber\\&\geq&
944: \frac{N(N-1)}{2}=M~,
945: \end{eqnarray}
946: \end{subequations}
so $\hat{n}=3$, which gives $\nu=N-2$ for each edge.
948:
949: Now we apply the marking procedure of Sec.~\ref{sec:def}. Marking an
edge $(u,v)$ makes $\nu(u,v)= \nu(v,u)=0$. Furthermore, $\nu$ of every edge
$(u,w)$ and $(v,w)$ ($w\neq u,v$) will be decreased by one since the
952: triangle $\{u,v,w\}$ now contains a marked edge. The discussion will
953: be simplified by considering a matrix representation of $\nu
954: (u,v)$. Marking $(u,v)$ sets $\nu(u,v)=\nu(v,u)=0$ and decreases the
955: $u$'th and $v$'th columns, and $u$'th and $v$'th rows by one (an
example is given in Fig.~\ref{fig:ma}(a)). Marking another edge
957: $(u',v')$ ($u'$ and $v'$ are different from both $u$ and $v$,
958: otherwise $\nu(u',v')$ would not be maximal) will have the same effect
as marking the first. For positions like $(u,v')$ the original $\nu$
is decreased by 2 (see Fig.~\ref{fig:ma}(b)), since the edge has lost the
two passing triangles $\{u,u',v'\}$ and $\{v',u,v\}$. Continuing this
962: process we see that it takes $N/2+O(1)$ markings for $\nu$ of each
963: edge to be decreased by two units, and thus $m'=N^2/4+O(N)$ markings to
964: make $\nu=0$ for all edges. This gives $b_2=1/2$ in the $N\rightarrow
965: \infty$ limit. Since the appropriateness of $b_2$ as a bipartivity
966: measure is not really dependent on the limit values, we will not give
967: a rigorous proof that the correction is of a lower order for all
968: levels of the marking procedure (one level is the $N/2+O(1)$ edges
969: needed to be marked for $\nu$ to be decreased by at least two units
970: for each edge).
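
The counting behind this limit can be summarized as follows (a sketch that keeps only leading orders and assumes $b_2=1-m'/M$, the form used in the Model 2 example). Each edge starts at $\nu=N-2$ and loses two units per level of $N/2+O(1)$ markings, so roughly $(N-2)/2$ levels are needed:
\[
m'\approx\frac{N-2}{2}\left(\frac{N}{2}+O(1)\right)=\frac{N^2}{4}+O(N),
\qquad
b_2=1-\frac{m'}{M}=1-\frac{N^2/4+O(N)}{N(N-1)/2}
\;\longrightarrow\;\frac{1}{2}
\quad (N\rightarrow\infty).
\]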
971:
972: \begin{figure}
973: \centering{\resizebox*{7.5cm}{!}{\includegraphics{co.eps}}}
974: \caption{The current value of $b_1$ (at the lowest-temperature level
975: of the cooling) as a function of running time for ten independent
976: measurements of the directed version of the nioki.com data.
977: }
978: \label{fig:co}
979: \end{figure}
980:
Now we argue that $b_2$ takes its minimal value for complete
982: graphs. First we note that the number of circuits of length $n$ per
983: edge, for any $n$, is largest in a complete graph~\cite{review2}. So
984: if we set $\hat{n}$ arbitrarily and discard circuits of length $\leq
985: n$ in the calculation of $\nu(v)$, the fully connected graph would
986: give the highest $m'$ value and thus the lowest bipartivity
987: measure. The strongest candidate for a lower bipartivity measure than
988: that of a fully connected graph would thus be a graph such that the
989: $\Sigma(C_n)< 3M$ and $\Sigma(C_{n+2})$ is as big as possible for some
$n$. But removing enough edges from a fully connected
graph for $\Sigma(C_n)< 3M$ to hold not only reduces the contribution
to $\nu$ from circuits of length $n$ but also from circuits of length
$n+2$ to a similar extent. If one performs the approximate marking
procedure outlined above for circuits of length five one starts from
$\nu=(N-2)(N-3)(N-4)$ and it takes $N/2+O(1)$ markings to decrease
every $\nu$ by at least $2N^2$. This means that the number of edges
needed to be marked to make $\nu = 0$ for every edge is the same if
circuits of length five are considered. It also means that a graph as
outlined above (with $\Sigma(C_n)< 3M$ and $\Sigma(C_{n+2})$ as big
as possible) probably does not have a lower $b_2$ than a complete graph.
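
The same leading-order bookkeeping for circuits of length five (again only a sketch, using the figures quoted above) reads
\[
\nu=(N-2)(N-3)(N-4)\approx N^3,\qquad
\frac{N^3}{2N^2}=\frac{N}{2}\ \mathrm{levels},\qquad
m'\approx\frac{N}{2}\cdot\frac{N}{2}=\frac{N^2}{4},
\]
i.e., to leading order the same number of markings as for triangles.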
1001:
To summarize, the $b_2$ measure lies in the interval $[1/2,1]$ in the
$N\rightarrow \infty$ limit. The finite-size corrections to $b_2$ for
fully connected graphs, however, turn out to make $b_2$ slightly less
than $1/2$.
1006:
1007: \section{Convergence of the simulated annealing\label{sec:simann}}
1008:
1009: To analyze the convergence of the simulated annealing scheme we run
1010: ten independent calculations of the $b_1$ quantity (with the same
1011: parameter values as in Sect.~\ref{sec:real_res}). The individual time
1012: evolutions of $b_1$ (at the lowest temperature $T=0.002$) for the
1013: different runs are shown in Fig.~\ref{fig:co}. We note that already
1014: after the first quench $b_1$ is only $3\%$ away from the value at the
end of the run, and after 50 time steps $b_1$ is within $0.5\%$ of the value
after $1\times 10^7$ time steps. We note that there is no way of
constructing a statistically valid confidence interval for the true
$b_1$ value since an arbitrarily complex energy landscape could have a
1019: global minimum with a basin of attraction of measure zero. There are
1020: however indications that this is seldom a major problem, at least not
1021: for the bisection problem~\cite{jerrum}.
1022:
1023: An interesting observation from Fig.~\ref{fig:co} is the step-like
1024: structure. This is a result of the exchange trials: After $t\approx
1025: 100$ the local minimum has been found, but at the temperature in
1026: question the system is in principle stuck in a confined part of the
1027: configuration space, and cannot enter lower lying energy
valleys. On the time scale $t = 10^5$ there is another jump in the
$b_1$ value. This occurs because replicas from other parts
of the configuration space reach the lowest level. At around
$t=10^6$ the current highest $b_1$ values (lowest energy) reach
another plateau. At this time, each replica should have covered the
1033: whole temperature range several times. This second plateau gives two
1034: encouraging implications: Firstly, that the correct value of $b_1$
1035: probably is not very far off the measured value. Secondly, that the
1036: exchange steps really are helpful. If one wants to run this algorithm
1037: more efficiently the $t_\mathrm{exch}$ we use is far too large (but
1038: beneficial for separating the time scales in the discussion
1039: above). Ideally $t_\mathrm{exch}$ should probably be chosen to be of
1040: the same order as the first jump (from the regular Monte Carlo
1041: steps)---in the nioki.com network (displayed in Fig.~\ref{fig:co})
1042: this would be $t\approx 100$.
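
For reference, exchange trials of this kind are commonly implemented with the standard replica-exchange acceptance rule. The form below is the textbook one and is stated here as an assumption, since the exact rule is not repeated in this Appendix; $E_i$ and $T_i$ denote the current energy and the temperature of replica $i$ (symbols introduced here, with the Boltzmann constant set to one):
\[
P_\mathrm{exch}(i\leftrightarrow j)=\min\left\{1,\exp\left[\left(\frac{1}{T_i}-\frac{1}{T_j}\right)(E_i-E_j)\right]\right\}.
\]
With this rule a lower-energy configuration held at the higher of two temperatures is always handed down to the colder replica, which is consistent with the jumps seen at the lowest temperature level in Fig.~\ref{fig:co}.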
1043:
1044: \begin{thebibliography}{99}
1045: \bibitem{review} S.~H.\ Strogatz, Nature (London) \textbf{410}, 268
1046: (2001); R.\ Albert and A.-L.\ Barab\'{a}si, Rev.\ Mod.\
1047: Phys.\ \textbf{74}, 47 (2002); S.~N.\ Dorogovtsev and J.~F.~F.\
1048: Mendes, Adv.\ Phys.\ \textbf{51}, 1079 (2002).
1049: \bibitem{review2} M.\ E.\ J.\ Newman, SIAM Rev., (to appear).
1050: \bibitem{WS} D.~J.\ Watts and S.~H.\ Strogatz, Nature (London) \textbf{393},
1051: 440 (1998).
\bibitem{sf} A.-L.\ Barab\'{a}si and R.\ Albert, Science
1053: \textbf{286}, 509 (1999).
1054: \bibitem{assmix} M.\ E.\ J.\ Newman, Phys.\ Rev.\ Lett.\ \textbf{89},
1055: 208701 (2002).
\bibitem{grid} G.\ Caldarelli, R.\ Pastor-Satorras, and A.\
1057: Vespignani, ``Cycles structure and local ordering in complex
1058: networks'' e-print arXiv:cond-mat/0212026 (unpublished).
1059: \bibitem{jeong} H.\ Jeong, B.\ Tombor, R.\ Albert, Z.\ N.\ Oltvai, and
1060: A.-L.\ Barab\'{a}si, Nature \textbf{407}, 651 (2000).
1061: \bibitem{liljeros} F.~Liljeros, C.~R.\ Edling, L.~A.~N.\ Amaral,
1062: H.~E.\ Stanley, and Y.~{\AA}berg, Nature (London) \textbf{411}, 907
1063: (2001).
1064: \bibitem{lea} F.~Liljeros, C.~R.\ Edling, L.~A.~N.\ Amaral, Microbes
1065: and Infection, (2003, to appear).
1066: \bibitem{partner} P.\ S.\ Bearman, J.\ Moody, and K.\ Stovel,
  ``Chains of affection: The structure of adolescent romantic and
1068: sexual networks'', Institute for Social and Economic Research and
1069: Policy, report no.\ 02-04 (2002, unpublished).
1070: \bibitem{freeman} When the type of every vertex is known, this
1071: structure can be measured by Freeman's segregation index $S$, which
1072: is (roughly speaking) the fraction cross-type edges missing in a
1073: graph, compared with a completely random graph---a graph that is
1074: close to bipartite would then have $S<0$. L.\ C.\ Freeman,
1075: Sociological Methods and Research \textbf{6}, 411 (1978). See also:
1076: J.\ C.\ Mitchell, Connections \textbf{2}, 9 (1978); L.\ C.\
1077: Freeman, Connections \textbf{2}, 13 (1978).
1078: \bibitem{HEL} P.\ Holme, C.\ R.\ Edling, and F.\ Liljeros, ``Structure
1079: and Time-Evolution of the Internet Community pussokram.com'',
1080: e-print arXiv:cond-mat/0210514 (unpublished).
1081: \bibitem{hope} It should be noted that many NP-hard optimization
1082: problems display phase transitions between ``easy'' and ``hard''
1083: regimes, e.g.\ the 3-coloring problem is known to be hard in the
1084: small-world regime of the WS model~\cite{WS}. T.\ Walsh, in
1085: \textit{Proceedings of the 16th International Joint Conference on
1086: Artificial Intelligence} edited by T.\ Dean (Morgan Kaufmann, San
1087: Francisco, 1999). For general references, see e.g.: P.\ Cheeseman,
1088: B.\ Kanefsky, and W.~M.\ Taylor, in \textit{Proceedings of IJCAI-91}
1089: edited by J.\ Mylopoulos and R.\ Reiter (Kaufmann, San Mateo, 1991),
1090: pp.\ 331--337; T.\ Hogg, B.\ A.\ Huberman, and C.\ P.\ Williams,
1091: Artificial Intelligence \textbf{88}, 1 (1996).
1092: \bibitem{jerrum} M.\ Jerrum and G.\ Sorkin, ECS-LFCS-93-260 (1993,
1093: unpublished).
1094: \bibitem{schreiber} G.\ R.\ Schreiber and O.\ C.\ Martin, SIAM J.\
1095: Optim.\ \textbf{10}, 231 (1999).
1096: \bibitem{FA} Y.\ Fu and P.~W.\ Anderson, J.\ Phys.\ A: Math.\ Gen.\
1097: \textbf{19}, 1605 (1986).
1098: \bibitem{alava} M.~J.\ Alava, P.~M.\ Duxbury, C.~F.\ Moukarzel, and
1099: H.\ Rieger in \textit{Phase Transitions and Critical Phenomena},
1100: Vol.~18, edited by C.~Domb and J.~L.\ Lebowitz (Academic
1101: Press, London, 2001), pp.\ 143--317.
1102: \bibitem{karp} R.~M.\ Karp in \textit{Complexity of Computer
1103: Computations}, edited by R.~E.\ Miller and J.~W.\ Thatcher (Plenum
1104: Press, New York, 1972), pp.\ 85--103.
1105: \bibitem{bw} A.~Barrat and M.~Weigt, Eur.\ Phys.\ J.\ B \textbf{13},
1106: 547 (2000).
1107: \bibitem{spin}
1108: See e.g.:
1109: M.~Gitterman, J.\ Phys.\ A: Math.\ Gen.\ \textbf{33}, 8373 (2000);
1110: P.~Svenson, Phys.\ Rev.\ E \textbf{64}, 036122 (2001);
1111: B.~J.\ Kim, H.\ Hong, P.\ Holme, G.\ S.\ Jeon, P.\ Minnhagen, and
1112: M.\ Y.\ Choi, Phys.\ Rev.\ E \textbf{64}, 056135 (2001);
1113: C.~P.\ Herrero, Phys.\ Rev.\ E \textbf{65}, 066110 (2002); A.\
1114: Aleksiejuk, J.~A.\ Ho{\l}yst, and D.\ Stauffer, Physica A
1115: \textbf{310}, 260 (2002); G.\ Bianconi, ``Mean field solution of the
1116: Ising model on a Barabasi-Albert network'', e-print
1117: arXiv:cond-mat/0204455 (unpublished); A.\ Aleksiejuk-Fronczak,
1118: ``Microscopic model for the logarithmic size effect on the Curie
1119: point in Barab\'{a}si-Albert networks'', e-print
1120: arXiv:cond-mat/0206027 (unpublished); D.\ Boyer and O.\ Miramontes,
1121: ``Interface Motion and Pinning in Small World Networks'', e-print
1122: arXiv:cond-mat/0210352 (unpublished); K.\ Medvedyeva, P.\ Holme, P.\
1123: Minnhagen, and B.\ J.\ Kim, ``Dynamic critical behavior of the
1124: \textsl{XY} model in small-world networks'', e-print
1125: arXiv:cond-mat/0301510 (unpublished).
1126: \bibitem{socstatmech}
1127: D.\ B.\ Bahr and E.\ Passerini, J.\ Math.\ Sociol.\ \textbf{23}, 1
1128: (1998); D.\ B.\ Bahr and E.\ Passerini, J.\ Math.\ Sociol.\
1129: \textbf{23}, 29 (1998); S.\ N.\ Durlauf, Proc.\ Natl.\ Acad.\ Sci.\
1130: USA \textbf{96}, 10582 (1999); H.\ P.\ Young, in \textit{The Economy
1131: as an Evolving Complex System} edited by L.\ E.\ Blume and S.\ N.\
1132: Durlauf (Oxford University Press, Oxford, 2003).
1133: \bibitem{barahona} For an interesting discussion on this problem in
1134: the somewhat more complex Ising spin glass model, see: F.~Barahona,
1135: J.\ Phys.\ A: Math.\ Gen.\ \textbf{15}, 3241 (1982).
1136: \bibitem{simann} S.\ Kirkpatrick, C.\ D.\ Gelatt, and M.\ P.\ Vecchi,
1137: Science \textbf{220}, 671 (1983).
1138: \bibitem{xmc} K.~Hukushima and K.~Nemoto, J.\ Phys.\ Soc.\ Jpn.\
1139: \textbf{65}, 1604 (1996).
1140: \bibitem{intro} See any introductory text on graph theory, for
1141: example: A.\ Tucker, \textit{Applied Combinatorics}, 3rd ed.\ (Wiley,
1142: New York, 1995), p.~31.
1143: \bibitem{ww} L.~R.\ Walker and R.\ E.\ Walstedt, Phys.\ Rev.\ B
1144: \textbf{22}, 3816 (1980).
1145: \bibitem{johnson} D.\ B.\ Johnson, SIAM J.\ Comput.\ \textbf{4}, 77
1146: (1975).
1147: \bibitem{note:alt} The intuitive way to find a least upper bound might
1148: be to search at two levels simultaneously ($\bar{n}$ and
1149: $\bar{n}-2$) and decrease the bound $\bar{n}\mapsto\bar{n} -2$ when
1150: $\Sigma_{\bar{n} -2}\geq M$ ($\Sigma_{\bar{n} -2}$ denotes the sum
1151: of the lengths of all circuits of length at most $\bar{n}
1152: -2$). This would slow down the computation considerably since our
1153: modified Johnson's algorithm mostly finds circuits of the length of
1154: the search depth $\bar{n}$, and thus it takes a long time to
1155: increase $\Sigma_{\bar{n} -2}$.
1156: \bibitem{note:bip} Consider $K_{N/2,N/2}=(V,U,E)$ where $V$ and $U$
1157: are the two vertex sets. We write a circuit as a $k$-tuple
1158: $(v_1,u_1,\cdots,v_{k/2},u_{k/2})$ where $v\in V$ and $u\in U$. Then
1159: there are $(N/2)\,(N/2)\,\cdots\, [N/2-(k/2-1)]\,[N/2-(k/2-1)] =
1160: [(N/2)!/ (N/2-k/2)!]^2$ distinct $k$-tuples. For circuits, neither
1161: the choice of start vertex $v_1$ nor the direction of traversal
1162: matters. To compensate for this we multiply by $1/2k$ to get
1163: the right number of circuits of length $k$ in $K_{N/2,N/2}$.
1164: \bibitem{SCC} A.\ V.\ Aho, J.\ E.\ Hopcroft, and J.\ D.\ Ullman,
1165: \textit{The Design and Analysis of Computer Algorithms}
1166: (Addison-Wesley, Reading, 1974), pp.\ 189--195.
1167: \bibitem{ER} P.~Erd\H{o}s and A.~R\'{e}nyi, Publ.\ Math.\ Inst.\
1168: Hung.\ Acad.\ Sci.\ \textbf{5}, 17 (1960).
1169: \bibitem{nws} M.\ E.\ J.\ Newman, S.\ H.\ Strogatz, and D.\ J.\ Watts,
1170: Phys.\ Rev.\ E \textbf{64}, 026118 (2001).
1171: \bibitem{WAHO} G.~H.\ Wannier, Phys.\ Rev.\ \textbf{79}, 357 (1950);
1172: R.~M.~F.\ Houtappel, Physica (Amsterdam) \textbf{16}, 425 (1950).
1173: \bibitem{smith} R.\ Smith, ``Instant Messaging as a Scale-Free
1174: Network'', e-print arXiv:cond-mat/0206378 (unpublished).
1175: \bibitem{ebel} H.\ Ebel, L.\ I.\ Mielsch, and S.\ Bornholdt, Phys.\
1176: Rev.\ E \textbf{66}, 035103 (2002).
1177: \bibitem{newman3} M.\ E.\ J.\ Newman, Phys.\ Rev.\ E \textbf{64},
1178: 016131 (2001).
1179: \bibitem{davis} G.\ F.\ Davis, M.\ Yoo, and W.\ E.\ Baker, ``The small
1180: world of the corporate elite''; preprint, University of Michigan
1181: Business School (2001).
1182: \bibitem{newman1} M.\ E.\ J.\ Newman, Phys.\ Rev.\ E \textbf{64},
1183: 025102 (2001).
1184: \bibitem{zach} W.\ Zachary, Journal of Anthropological Research
1185: \textbf{33}, 452 (1977).
1186: \bibitem{prison} J.\ MacRae, Sociometry \textbf{23}, 360 (1960).
1187: \bibitem{note:not_really} Actually the $b_1$ value for Model 1 is
1188: $0.2\%$ lower (around three standard deviations) for $r_1=0.1$ than
1189: for $r_1=0$. We will not speculate about the reason for this since the
1190: effect is small and the overall picture is clear.
1191: \bibitem{note:cd} Let $c(n)$ denote the number of representations of
1192: (directed or undirected) circuits of length $n$ and $p(n)$ the
1193: number of representations of paths of length $n$. (By
1194: representations we mean different ways of listing adjacent vertices;
1195: so, for example, a triangle has six representations.) Then we can
1196: define $C=c(3)/p(3)$ and $D=c(4)/p(4)$. For more detailed
1197: definitions, see Refs.~\cite{bw} and \cite{HEL}.
1198: \bibitem{note:improvement} A potential improvement would be to measure
1199: $b_1$ and $b_2$ on the 2-core (the maximal subgraph with minimal
1200: degree 2) of $G$. This would eliminate circuit-free subgraphs that
1201: contain no information about the degree of heterophilous preference
1202: among the agents forming the network.
1203: \bibitem{anma} R.\ M.\ Anderson and R.\ M.\ May, \textit{Infectious
1204: diseases of humans} (Oxford University Press, Oxford, 1991).
1205: \bibitem{white} H.\ C.\ White, American Journal of Sociology
1206: \textbf{87}, 517 (1981).
1207: \bibitem{note:sigsum} In $K_N$, the number of circuits of length
1208: $i$ is the number of $i$-permutations, $N!/(N-i)!$, divided by $2i$ (a factor
1209: $i$ to compensate for the over-counting since a circuit is
1210: independent of starting vertex; a factor 2 to compensate for the
1211: double counting of the two directions). For $K_N$ the contribution
1212: of circuits of length $i$ to the sum is $i$ times their number,
1213: which gives Eq.~(\ref{eq:sigsum}).
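
As a small worked check of this counting (an illustration added here, assuming circuits are counted as undirected simple cycles as described above): in $K_4$ there are $4!/(1!\cdot 6)=4$ circuits of length three and $4!/(0!\cdot 8)=3$ circuits of length four. The contribution of circuits of length $i$ then simplifies to
\begin{equation*}
i\,\frac{N!}{2i\,(N-i)!}=\frac{N!}{2\,(N-i)!},
\end{equation*}
which for $N=4$ equals $12$ for both $i=3$ (four circuits of length three) and $i=4$ (three circuits of length four).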
1214: \end{thebibliography}
1215: \end{document}
1216: