1: \documentclass[12pt,final]{iopart}
2: \usepackage{graphicx}% Include figure files
3: \usepackage{rotate}
4: 
5: \begin{document}
6: 
7: \title{Comparing community structure identification}
8: 
9: \author{
10:   Leon Danon\dag\ddag,\  
11:   Albert D\'{i}az-Guilera,\dag\  
12:   Jordi Duch\ddag,\ and
13:   Alex Arenas\ddag 
14: } 
15:   
\address{\dag\ Departament de Fisica
  Fonamental, Universitat de Barcelona, Marti i Franques 1, 08086
  Barcelona, Spain} 
19: \address{\ddag\ Departament d'Enginyeria
20:   Inform\`{a}tica i Matem\`{a}tiques, Universitat Rovira i Virgili,
21:   Campus Sescelades, 43007 Tarragona, Spain}
22: \ead{\tt leon.danon@urv.net}
23: 
24: \begin{abstract}
25: We compare recent approaches to community structure identification in
terms of sensitivity and computational cost. The recently proposed
modularity measure is revisited, and the performance of the methods
when applied to {\em ad hoc} networks with known community structure is
compared. We find that the most accurate methods tend to be more
30: computationally expensive, and that both aspects need to be considered
31: when choosing a method for practical purposes. The work is intended as
32: an introduction as well as a proposal for a standard benchmark test of
33: community detection methods.
34: \end{abstract}
35: 
36: \section{Introduction}
37: 
38: The study of complex networks has received an enormous amount of
39: attention from the scientific community in recent years
40: \cite{BARev,NRev,DMRev,Strogatz01,BookBorn,Sitges}. Physicists in
41: particular have become interested in the study of networks describing
42: the topologies of a wide variety of systems, such as the world wide
43: web, social and communication networks, biochemical networks and many
44: more.  An important open problem is the analysis of modular structure
45: found in many networks \cite{Newman04}. Distinct modules or
46: communities within networks can loosely be defined as subsets of nodes
47: which are more densely linked, when compared to the rest of the
48: network. Such communities have been observed in different kinds of
49: networks, most notably in social networks, but also in networks of
50: other origin such as metabolic or economic networks
51: \cite{Thurner04,Ravasz02,Guimera05,Holme03}. As a result, the problem of
52: identification of communities has been the focus of many recent
53: efforts.
54: 
55: Community detection in large networks is potentially very
56: useful. Nodes belonging to a tight-knit community are more than likely
57: to have other properties in common. For instance, in the world wide
58: web, community analysis has uncovered thematic clusters
59: \cite{Flake02,Eckmann02}. In biochemical or neural networks,
60: communities may be functional groups \cite{Zhou05}, and separating the
61: network into such groups could simplify functional analysis
62: considerably.
63: 
64: The problem of community detection is quite challenging and has been
65: the subject of discussion in various disciplines. A simpler version of
66: this problem, the graph bi-partitioning problem (GBP) has been the
67: topic of study in the realm of computer science for decades. Here, one
68: looks to separate the graph into two densely connected communities of
equal size, which are connected with the minimum number of links. This
is an NP-complete problem\footnote{In computational complexity theory,
NP (`Non-deterministic Polynomial time') is the set of decision
problems solvable in polynomial time on a non-deterministic Turing
machine. NP-complete problems are the most difficult problems in NP.}
\cite{Garey79}; however, several methods have been proposed to reduce
the complexity of the task
\cite{KernighanLin,Fiedler73,Boettcher01a,Pothen90}. In real complex
77: networks we often have no idea how many communities we wish to
78: discover, but in general it is more than two. This makes the process
79: all the more costly. What is more, communities may also be
80: hierarchical, that is communities may be further divided into
81: sub-communities and so on
82: \cite{Guimera03b,Gleiser03,Arenas03,Newman04a}.
83: 
84: Nevertheless, many attempts to tackle these problems have been
85: proposed recently. The proposed methods vary considerably in terms of
86: approach and application, which makes them difficult to
87: compare. Community identification is potentially very useful and
88: researchers from a number of fields may be interested in using one or
89: several of the methods for their own purposes. But which? In order for
90: the reader to be able to make an informed decision as to which method
91: is most appropriate for which purpose, we distil information from the
92: literature and compare the performance of those methods which lend
93: themselves to objective comparison.
94: 
95: To this end, this paper is organised as follows. In section 2
96: we revisit the modularity measure designed to evaluate how good a
97: particular partition of a network is. Then, we describe how to measure
98: the sensitivity of the various methods and suggest the use of a more
99: accurate representation of algorithm sensitivity based on information
100: theory. We then compare the methods from a computational cost
101: perspective and compare their sensitivity when applied to {\it ad hoc}
102: networks with community structure. Finally, we suggest appropriate
103: choices of community identification methods for a few different
104: problems.
105: 
106: \section{Evaluating community identification}
107: \label{Q}
108: 
109: A question that has been raised in recent years is how a given
110: partition of a network into communities can be evaluated. A simple
111: approach that has become widely accepted was proposed in \cite{NG}. It
112: is based on the intuitive idea that random networks do not exhibit
113: community structure. Let us imagine that we have an arbitrary network
114: and an arbitrary partition of that network into $n_c$ communities. It
115: is then possible to define a $n_c \times n_c$ size matrix ${\mathbf
116: e}$ where the elements $e_{ij}$ represent the fraction of total links
117: starting at a node in partition $i$ and ending at a node in partition
118: $j$.  Then, the sum of any row (or column) of ${\mathbf e}$, $a_i
119: = \sum_j e_{ij}$ corresponds to the fraction of links connected to
120: $i$.
121: 
122: If the network does not exhibit community structure, or if the
123: partitions are allocated without any regard to the underlying
124: structure, the expected value of the fraction of links within
125: partitions can be estimated. It is simply the probability that a link
126: begins at a node in $i$, $a_i$, multiplied by the fraction of links
that end at a node in $i$, $a_i$. So the expected fraction of
intra-community links is just $a_ia_i$. On the other hand we know that
the {\it real} fraction of links exclusively within a partition is
$e_{ii}$. So, we can compare the two directly and sum over all the
partitions in the graph:
132: 
133: \begin{equation}
134: Q\equiv\sum_i(e_{ii} - a_i^2)
135: \end{equation}
136: 
137: This is a measure known as {\it modularity}. As an example, let us
138: consider a network comprised of $n_c$ fully connected components with
139: no links between them. If we then have $n_c$ partitions, corresponding
140: exactly to the components, modularity will have a value of
141: $1-1/n_c$. As $n_c$ gets large, this value tends to $1$. On the other
142: hand, for particularly ``bad'' partitions, for example, when all the
143: nodes are in a community of their own, the value of modularity can
144: take negative values. This is due to the fact that when nodes are
145: alone in partitions there can be no internal links.  To avoid this
146: issue, Massen \& Doye propose an alternative measure \cite{Massen04}.
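
As a short worked check of the limiting cases above (a sketch, under the
added assumption that the $n_c$ components contain equal numbers of links,
which the example implicitly requires): when the partition coincides with
the components, every link is internal, so $e_{ii} = a_i = 1/n_c$ and
\[
Q = \sum_{i=1}^{n_c}\left(\frac{1}{n_c} - \frac{1}{n_c^2}\right)
  = 1 - \frac{1}{n_c}.
\]
If instead every node is placed in a community of its own, then $e_{ii}=0$
for all $i$ and
\[
Q = -\sum_i a_i^2 < 0,
\]
while grouping the whole network into a single community gives
$e_{11}=a_1=1$ and therefore $Q=0$.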
147: 
148: It is tempting to think that random networks exhibit very small values
of modularity. As Guimer\`{a} {\it et al.} show, this is not the case
\cite{Guimera04}. For random networks of finite size it is possible to
find a partition whose modularity is not only nonzero but quite high;
for example, a random network of $128$ nodes and $1024$ links has a
maximum modularity of 0.208. This suggests that random networks, which
have no built-in modular structure, can nevertheless appear to have
one because of fluctuations.
156: 
157: \section{Comparative evaluation}
158: \label{comparison}
159: 
160: The methods that have been presented recently are extremely varied,
161: and are based on a range of different ideas. In a longer article, we
162: describe the methods in more detail and classify them according to the
163: type of approach they present \cite{Danon05book}. Also, the full
164: description of each can be found in the respective references. Here we
165: concentrate on comparing the methods in terms of performance. In order
166: for the reader to be able to compare the algorithms, both in terms of
167: their speed and sensitivity, we would like to present a qualitative
168: comparison for all the methods presented until now.  However, this is
169: not possible as they are very varied, both conceptually and in their
170: applications.
171: 
172: One way that has been employed to test sensitivity in many cases is to
173: see how well a particular method performs when applied to {\it ad hoc}
174: networks with a well known, fixed community structure \cite{NG}. Such
175: networks are typically generated with $n = 128$ nodes, split into four
176: communities containing 32 nodes each. Pairs of nodes belonging to the
177: same community are linked with probability $p_{in}$ whereas pairs
178: belonging to different communities are joined with probability
179: $p_{out}$. The value of $p_{out}$ is taken so that the average number
180: of links a node has to members of any other community, $z_{out}$, can
181: be controlled. While $p_{out}$ (and therefore $z_{out}$) is varied
freely, the value of $p_{in}$ is chosen to keep the total average node
degree, $k$, constant at 16. As $z_{out}$ is increased
from zero, the communities become more and more diffuse and harder to
identify (Figure \ref{fig_ad_hoc}). Since the ``real'' community
186: structure is well known in this case, it is possible to measure the
187: number of nodes correctly classified by the method of community
188: identification.
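
To make the construction concrete, the two probabilities can be written in
terms of $z_{out}$ (a sketch that follows directly from the definitions
above; the symbol $z_{in}$ for the average number of intra-community links
per node is introduced here for convenience). Each node has $31$ possible
partners inside its own community and $96$ outside it, so
\[
z_{in} = 31\,p_{in}, \qquad z_{out} = 96\,p_{out}, \qquad
z_{in} + z_{out} = k = 16,
\]
which gives
\[
p_{out} = \frac{z_{out}}{96}, \qquad p_{in} = \frac{16 - z_{out}}{31}.
\]
For example, $z_{out}=8$ corresponds to $p_{out} = 1/12 \approx 0.083$ and
$p_{in} = 8/31 \approx 0.258$.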
189: 
190: In \cite{Newman04a}, the author describes a method to calculate this
191: value. The largest group found within each of the four ``real''
192: communities is considered correctly classified. If more than one
193: original community is clustered together by the algorithm, all nodes
194: in that cluster are considered incorrectly classified. For example,
195: for the case when $z_{out}/k$ is small, if a method finds three
196: communities, two of which correspond exactly to two original
197: communities, and a third, which corresponds to the other two clustered
198: together, this measure would consider half the nodes correctly
199: classified. As the author notes, this measure is quite harsh, and some
200: nodes which one may consider to be correctly clustered are not
201: counted. On the other end of the spectrum, as $z_{out}/k$ becomes
202: large, and the networks become essentially random networks, this
203: method rewards the identification of smaller clusters found within
204: each of the original communities, which could be misleading.
205: 
206: We suggest that a more discriminatory measure is more appropriate, and
207: propose the use of the {\it normalised mutual information} measure, as
208: described in \cite{Kuncheva04,Fred03}. It is based on defining a {\it
209: confusion matrix} $\bf{N}$, where the rows correspond to the ``real''
210: communities, and the columns correspond to the ``found''
211: communities. The element of $\bf{N}$, $N_{ij}$ is the number of nodes
212: in the real community $i$ that appear in the found community $j$. A
213: measure of similarity between the partitions, based on information
214: theory, is then:
215: 
216: \begin{equation}
217: I(A,B)=\frac{-2\sum^{c_A}_{i=1}\sum^{c_B}_{j=1}
218: N_{ij}\log\left(\frac{N_{ij}N}{N_{i.}N_{.j}}\right)}
219: {\sum^{c_A}_{i=1}N_{i.}\log\left(\frac{N_{i.}}{N}\right)
220:  + \sum^{c_B}_{j=1}N_{.j}\log\left(\frac{N_{.j}}{N}\right)}
221: \end{equation}
222: 
where the number of real communities is denoted $c_A$ and the number
of found communities is denoted $c_B$, the sum over row $i$ of matrix
$N_{ij}$ is denoted $N_{i.}$, the sum over column $j$ is denoted
$N_{.j}$, and $N = \sum_{ij} N_{ij}$ is the total number of nodes.
227: 
228: 
229: If the found partitions are identical to the real communities, then
230: $I(A,B)$ takes its maximum value of 1. If the partition found by the
231: algorithm is totally independent of the real partition, for example
232: when the entire network is found to be one community, $I(A,B)= 0$.
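
Both limits can be verified directly for the benchmark described above
($N=128$, four real communities of $32$ nodes each); this is a worked check
rather than part of the original argument. If the found partition is
identical to the real one, $N_{ij} = 32\,\delta_{ij}$, with $\delta_{ij}$
the Kronecker delta, and $N_{i.} = N_{.j} = 32$, so
\[
I(A,B) = \frac{-2 \cdot 4 \cdot 32 \log 4}{2 \cdot 4 \cdot 32 \log(1/4)} = 1 .
\]
If the whole network is found to be a single community, then $c_B = 1$,
$N_{i1} = N_{i.} = 32$ and $N_{.1} = N = 128$, so every logarithm in the
numerator is $\log 1 = 0$ while the denominator remains nonzero, and
$I(A,B) = 0$.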
233: 
234: Both measures of accuracy give a good idea of how a method
235: performs. However, the measure we propose for use here is more
236: representative of sensitivity if the performance is dubious, since it
237: measures the amount of information correctly extracted by the
238: algorithm explicitly. As an example, for small $z_{out}$, where two
239: original communities are clustered together by the algorithm, this
240: measure does not punish the algorithm as severely, taking into account
241: the ability to extract at least some information about the community
242: structure. On the other hand, for large $z_{out}$, this method is able
243: to detect that the clusters found by the algorithm have little to do
244: with the original communities, and $I(A,B) \rightarrow 0$.
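
The first case can be made quantitative with a short worked example (an
illustration, assuming the benchmark with $N=128$ and an algorithm that
recovers two of the real communities exactly and merges the other two). The
confusion matrix is then
\[
{\bf N} = \left(\begin{array}{ccc}
32 & 0 & 0 \\
0 & 32 & 0 \\
0 & 0 & 32 \\
0 & 0 & 32
\end{array}\right),
\qquad N_{i.} = 32, \qquad (N_{.1}, N_{.2}, N_{.3}) = (32, 32, 64).
\]
With the usual convention $0 \log 0 = 0$, terms with $N_{ij}=0$ do not
contribute, so the numerator is
$-2 \cdot 32\,(2\log 4 + 2\log 2) = -384\log 2$, and the denominator is
$4 \cdot 32 \log(1/4) + 2 \cdot 32\log(1/4) + 64 \log(1/2) = -448\log 2$.
Hence
\[
I(A,B) = \frac{384}{448} = \frac{6}{7} \approx 0.86 ,
\]
whereas the fraction of correctly classified nodes of \cite{Newman04a}
assigns the same partition a score of $0.5$.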
245: 
246: \begin{table}
247:   \centering
248: 
\begin{tabular}{|c|c|c|c|}
  \hline
  Author & Ref. & Label & Order \\
  \hline
  \hline
  Eckmann \& Moses & \cite{Eckmann02} & EM & $O(m\langle k^2\rangle)$ \\
  Zhou \& Lipowsky & \cite{Zhou05} & ZL & $O(n^3)$ \\
  Latapy \& Pons & \cite{Latapy04} & LP & $O(n^3)$ \\
  Newman & \cite{Newman04a} & NF & $O(n\log^2n)$ \\
  Newman \& Girvan & \cite{NG} & NG & $O(m^2n)$ \\
  Girvan \& Newman & \cite{GN} & GN & $O(n^2m)$ \\
  Guimer\`{a} et al. & \cite{Guimera04,Guimera05b} & SA & parameter dependent \\
  Duch \& Arenas & \cite{Duch05} & DA & $O(n^2\log n)$ \\
  Fortunato et al. & \cite{Fortunato04} & FLM & $O(n^4)$ \\
  Radicchi et al. & \cite{Radicchi04} & RCCLP & $O(n^2)$ \\
  Donetti \& Mu\~noz & \cite{Donetti04,Donetti05} & DM/DMN & $O(n^3)$ \\
  Bagrow \& Bollt & \cite{Bagrow04} & BB & $O(n^3)$ \\
  Capocci et al. & \cite{Capocci04} & CSCC & $O(n^2)$ \\
  Wu \& Huberman & \cite{Wu03} & WH & $O(n+m)$ \\
  Palla et al. & \cite{Palla05} & PK & $O(\exp(n))$ \\
  Reichardt \& Bornholdt & \cite{Reichardt04} & RB & parameter dependent \\
  \hline
\end{tabular}
299: 
300: \caption{Table summarising how the computational cost of different
301: approaches scales with number of nodes $n$, number of links $m$ and
302: average degree $\langle k \rangle$ \cite{Dijkstra}. The labels shown
303: here are used in Figures \ref{fig_compare} and \ref{678}.}
304: \label{Table_Orders}
305: 
306: \end{table}
307: 
308: 
309: \begin{figure}
310: \centerline{\includegraphics*[width=0.7\columnwidth]{mi_compare}}
\caption{Algorithm sensitivity as applied to ad hoc networks with $n =
 128$, the network divided into four communities of $32$ nodes each
 and total average degree $k$ fixed to $16$. For low $z_{out}/k$
314:  the communities are easily distinguished. For higher $z_{out}/k$ this
315:  becomes more complicated. Both measures of comparing original
316:  communities to ones found by the detection method are shown. The
317:  normalised mutual information measure is more discriminatory and
318:  appears more sensitive to errors in the community identification
319:  procedure. The results are shown for Newman's fast algorithm
320:  \cite{Newman04a} and the extremal optimisation algorithm
321:  \cite{Duch05}.}
322: \label{fig_ad_hoc}
323: \end{figure}
324: 
325: \begin{figure}[!h]
326:  \centerline{\includegraphics*[width=0.9\columnwidth]{compare2}}
327: \caption{Comparing algorithm sensitivity using ad hoc networks with
328:  predetermined community structure. The $x$-axis is the proportion of
329:  connections to outside communities $z_{out}/k$ and the
 $y$-axis is the fraction of nodes correctly identified by the method,
 measured as described in \cite{Newman04a}. The labels here correspond
332:  to the different methods and are listed in Table \ref{Table_Orders}.}
333: \label{fig_compare}
334: \end{figure}
335: 
336: \begin{figure}
337: \centerline{\includegraphics*[width=0.74\columnwidth]{final_678}}
338: %%\includegraphics*[width=0.9\columnwidth]{final_678}
339: \caption{The fraction of correctly identified nodes at three specific
340:   values of $z_{out}$, $6$, $7$ and $8$ for all available methods and
341:   for networks with fixed $k=16$. Note that for the FLM method,
342:   the data for $z_{out}=8$ were not available. Here we can see that
343:   most of the methods are very good at finding the ``correct''
344:   community structure for values of $z_{out}$ up to $6$. At $z_{out} =
345:   7$ some methods begin to falter but most still identify more than
346:   half of the nodes correctly. At $z_{out} = 8$, when on average half
347:   the links are external, two methods are still able to identify over 80
348:   \% of the nodes correctly.}
349: \label{678}
350: \end{figure}
351: 
352: In Figure \ref{fig_compare} we show the sensitivity of all methods we
353: have been able to gather. The percentage of correctly identified nodes
354: is calculated using the method described in \cite{Newman04a}, since
355: this is the method employed by the various authors. We can see that
356: accuracy varies in a similar way across the different methods as
357: $z_{out}$ increases and the communities become more diffuse. So, it
358: remains difficult to compare the performance by looking at the methods
359: separately, even with a reference performance. 
360: 
361: To summarise the large amount of information, in Figure \ref{678} we
362: plot the fraction of correctly identified nodes for only three values
363: of $z_{out}$ (6, 7 and 8), corresponding to $z_{out}/k = $ 0.375,
364: 0.4375 and 0.5 respectively, for each method. From this we can see
365: that most of the methods perform very well for $z_{out}=6$
366: ($z_{out}/k=0.375$), and even for $z_{out}=7$ ($z_{out}/k=0.4375$)
367: most can identify more than half the nodes correctly. For $z_{out}=8$
368: ($z_{out}/k=0.5$) two methods are still able to identify more than 80
369: $\%$ of the nodes correctly\footnote{One might expect that as the
proportion of out links approaches $0.5$ community structure no longer
exists. However, since the external links are distributed among the
372: other three communities, individual nodes remain more strongly
373: connected to their own community than to other communities, even at
374: this high value of $z_{out}/k$.}.
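
In numbers (a brief illustration of the footnote's argument, assuming the
external links are spread evenly over the other three communities): at
$z_{out}=8$ and $k=16$ a node has on average $k - z_{out} = 8$ links to its
own community but only
\[
\frac{z_{out}}{3} = \frac{8}{3} \approx 2.7
\]
links to each of the other communities.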
375: 
376: 
377: 
378: 
379: While accuracy is an essential consideration when choosing a method,
380: it is just as important to consider the computational effort needed to
381: perform the analysis \cite{Dijkstra}. For some of the approaches
382: described in the literature, we have collected estimates of how the
383: cost scales with network observables. For networks with $n$ nodes and
384: $m$ links, the methods scale between $O(m+n)$ for the fastest, and
385: $O(\exp(n))$ for the slowest (Table \ref{Table_Orders}). Such
386: diversity is due to the different approaches taken by the authors. The
387: faster methods tend to be approximate and less accurate, while the
388: slower methods have other advantages (see \cite{Danon05book} for a
389: more detailed discussion). Differences in speed only become important
390: when dealing with larger networks.
391: 
392: \section{Choosing an algorithm}
393: 
394: 
395: One has to take many factors into account when choosing an algorithm
396: to use. The above comparison ought to give the reader an idea as to
397: which algorithm is most appropriate for a given problem. In many
398: cases, a compromise must be reached between accuracy and running time,
399: especially for larger networks. To clarify this further, here are a
400: few examples of real networks, and our suggestion for the
401: appropriate community identification algorithm.
402: 
403: Say we want to analyse a relatively small network, for example the
404: metabolic network of the worm {\it Caenorhabditis elegans}, which has
405: 453 nodes. Since the network is small, and current desktop computer
406: technology is reasonably fast, the speed of the algorithm should pose
no restriction, and one is free to choose the slower, more accurate
408: methods. In this case the Simulated Annealing (SA) method would be the
409: most appropriate choice, since it gives the most accurate partitions,
410: especially if the system is allowed to cool slowly (see
411: \cite{Guimera04,Massen04,Guimera05b} for more details).
412: 
Larger networks, with the number of nodes of the order of $10^5$,
become intractable with the more accurate methods. For example, when
attempting to study the community structure of the actor collaboration
network with 374511 nodes, we estimate that the SA algorithm would
take a few months of uninterrupted computation. However, a reasonable
implementation of the fast algorithm (NF) would be able to perform this
analysis in just a few hours \cite{Clauset04}, making it the
appropriate choice, even if its accuracy is not the best.
421: 
422: Let us consider an intermediate sized network such as the Pretty Good
423: Privacy (PGP) web of trust social network \cite{Guardiola02},
424: containing 10680 nodes. Although the SA algorithm would run in a
425: reasonable time, it may be a better choice to compromise and employ a
faster running algorithm. The extremal optimisation (EO) method
\cite{Duch05}, labelled DA in Table \ref{Table_Orders}, is not quite as
accurate as SA, but the saving in computational effort for a network of
this size is considerable. It is, however, more accurate than the fast
algorithm, which makes it the better choice here.
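
A rough feel for this trade-off can be obtained from the scalings in
Table \ref{Table_Orders}. This is an order-of-magnitude sketch only: it
ignores all prefactors, and taking the logarithms in base $2$ is an
assumption made here. The ratio of the cost of the DA method to that of
the NF method is
\[
\frac{n^2\log n}{n\log^2 n} = \frac{n}{\log n},
\]
which is roughly $8\times 10^{2}$ for the PGP network ($n = 10680$,
$\log_2 n \approx 13.4$) and roughly $2\times 10^{4}$ for the actor
collaboration network ($n = 374511$, $\log_2 n \approx 18.5$).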
430: 
431: \section{Conclusion}
432: 
433: In this work we have given a brief overview and comparison of the
434: modern approaches to community identification in complex networks. A
435: large amount of knowledge has been collected in the field, and real
436: progress has been made, both in the identification of communities and
437: their characterisation. Some questions do remain open, and it is these
that we would suggest for further study. Despite these efforts, the
cost involved in computing communities in complex networks remains
440: significant. The fastest algorithm runs in linear time, but this
441: particular method needs a priori knowledge of the number of expected
442: communities, and assumes that all communities are of similar size
443: \cite{Wu03}. At present, the fastest method for finding an unknown
444: number of communities of unknown sizes has a cost which scales as
445: $O(n\log^2n)$ with network size. While this makes the analysis of
446: extremely large networks feasible, this algorithm does not guarantee
447: that the partition found is the best possible one. Other algorithms
448: which are more computationally expensive have other merits, such as
449: accuracy or the ability to identify overlapping communities. So, when
450: choosing a method one must consider carefully the context of its
451: use. Ideally, one would like to have a method which guarantees
452: accuracy and is fast at the same time, but finding such a method is
453: challenging. The search for faster and more accurate methods is an
454: important one and we would suggest this for further study.
455: 
456: 
457: \ack The authors are grateful to Luca Donetti, Haijun Zhou, Mark
458: Newman, Santo Fortunato, J\"org Reichardt, Claudio Castellano,
459: Matthieu Latapy, Jean-Pierre Eckmann and Roger Guimer\`{a} for providing
460: their data and Sam Seaver for useful comments. This work has been
461: supported by DGES of the Spanish Government Grant No. BFM-2003-08258
462: and EC-FET Open Project No. IST-2001-33555. LD gratefully acknowledges
463: the funding of Generalitat de Catalunya.
464: 
465: \section*{References}
466: \begin{thebibliography}{50}
467: 
468: \bibitem{BARev}
469: Barab\'{a}si A~L and Albert R, 2002, {\em Rev. Mod. Phys.}, {\bf 74},  47.
470: 
471: \bibitem{NRev}
472: Newman M~E~J, 2003, {\em SIAM Review}, {\bf 45},  167.
473: 
474: \bibitem{DMRev}
475: Dorogovtsev S~N and Mendes J~F~F, 2003, {\em Evolution of Networks: From
476:   biological nets to the internet and WWW}, (Oxford University Press, Oxford).
477: 
478: \bibitem{Strogatz01}
479: Strogatz S~H, 2001, {\em Nature}, {\bf 410}, 268.
480: 
481: \bibitem{BookBorn}
482: Bornholdt S and Schuster H~G eds. 2002, {\em Handbook of Graphs and Networks - From the Genome to the Internet}, (Wiley-VCH, Berlin).
483: 
484: \bibitem{Sitges} 
485: Pastor-Satorras R, Rub\'{i} M and
486: D\'{i}az-Guilera A eds. 2003, {\em Statistical Mechanics of Complex
487: Networks}, (Springer).
488: 
489: \bibitem{Newman04}
490: Newman M~E~J, 2004, {\em Eur. Phys. J. B}, {\bf 38}, 321.
491: 
492: \bibitem{Thurner04}
493: Boss M, Elsinger H, Summer M and Thurner S, 2003, Preprint cond-mat/0309582.
494: 
495: \bibitem{Ravasz02}
Ravasz E, Somera A L, Mongru D A, Oltvai Z N and Barab\'{a}si A L, 2002, {\em Science}, {\bf 297},  1551.
497: 
498: \bibitem{Guimera05}
Guimer\`{a} R and Amaral L A N, 2005, {\em Nature}, {\bf 433}, 895--900.
500: 
501: \bibitem{Holme03}
502: Holme P, Huss M and Jeong H, 2003, {\em Bioinformatics}, {\bf 19},  532.
503: 
504: \bibitem{Flake02} 
505: Flake G~W, Lawrence S, Giles C~L and Coetzee F~M, 2002, {\em IEEE Computer}, {\bf 35}, 66.
506: 
507: \bibitem{Eckmann02}
508: Eckmann J-P and Moses E, 2002, {\em Proc. Natl. Acad. Sci.}, {\bf 99},  5825.
509: 
510: \bibitem{Zhou05}
511: Zhou H and Lipowsky R, 2004, {\em Lecture Notes Comput. Sci.} {\bf 3038}, 1062 - 1069.
512: 
513: \bibitem{Latapy04}
Latapy M and Pons P, 2004, Preprint cond-mat/0412568.
515: 
516: \bibitem{Garey79} 
517: Garey M~R and  Johnson D~S, 1979, {\em Computers
518: and Intractability, A Guide to the Theory of NP-Completeness} (W. H
519: Freeman, New York).
520: 
521: \bibitem{KernighanLin}
522: Kernighan B W and Lin S, 1970, {\em The Bell System Tech. J.}, {\bf 49},  291.
523: 
524: \bibitem{Fiedler73}
Fiedler M, 1973, {\em Czech. Math. J.}, {\bf 23},  298.

\bibitem{Boettcher01a}
Boettcher S and  Percus A~G, 2001, {\em Phys. Rev. E}, {\bf 64}, 026114.

\bibitem{Pothen90} 
Pothen A, Simon H and Liou K-P, 1990, {\em SIAM J. Matrix Anal. Appl.}, {\bf 11}, 430.

\bibitem{Guimera03b}
Guimer\`{a} R, Danon L, D\'{i}az-Guilera A, Giralt F and Arenas A, 2003, {\em Phys. Rev. E}, {\bf 68}, 065103.
535: 
536: \bibitem{Gleiser03}
537: Gleiser P and Danon L, 2003, {\em Adv. Complex Systems}, {\bf 6},  565.
538: 
539: \bibitem{Arenas03} 
540: Arenas A, Danon L, D\'{i}az-Guilera A, Gleiser P M
541: and Guimer\`{a} R, 2004, {\em Eur. Phys. J. B}, {\bf 38}, 373.
542: 
543: \bibitem{Newman04a}
544: Newman M~E~J, 2004, {\em Phys. Rev. E}, {\bf 69}, 066133.
545: 
546: \bibitem{NG}
547: Newman M~E~J and Girvan M, 2004, {\em Phys. Rev. E}, {\bf 69}, 026113.
548: 
549: \bibitem{Massen04}
550: Massen C~P and Doye J~P~K, 2005, {\em Phys. Rev. E}, {\bf 71}, 046101.
551: 
552: \bibitem{Guimera04}
553: Guimer\`{a} R, Sales M and Amaral L~A~N, 2004, {\em Phys. Rev. E}, {\bf 70},
554:  025101.
555: 
556: \bibitem{Danon05book} Danon L, Duch J, Arenas A and D\'{i}az-Guilera
557: A, to appear in COSIN book, Preprint cond-mat/0505245.
558: 
559: \bibitem{Kuncheva04} 
Kuncheva L~I and Hadjitodorov S~T, 2004, {\em Proc. IEEE International
Conference on Systems, Man and Cybernetics}, {\bf 2}, 1214.

562: \bibitem{Fred03}
563: Fred A~L~N and Jain A~K, 2003, {\em Proc. IEEE Computer Society Conference on
564:   Computer Vision and Pattern Recognition}, p. II-128-133.
565: 
566: \bibitem{Duch05}
Duch J and Arenas A, 2005, {\em Phys. Rev. E}, {\bf 72}, 027104.
568: 
569: \bibitem{GN}
570: Girvan M and Newman M~E~J, 2002, {\em Proc. Natl. Acad. Sci.}, {\bf 99}, 7821.
571: 
572: \bibitem{Fortunato04}
Fortunato S, Latora V and Marchiori M, 2004, {\em Phys. Rev. E}, {\bf 70}, 056104.
574: 
575: \bibitem{Radicchi04} 
576: Radicchi F, Castellano C, Cecconi F, Loreto V and Parisi D, 2004,
577: {\em Proc. Natl. Acad. Sci.}, {\bf 101}, 2658.
578: 
579: \bibitem{Donetti04}
580: Donetti L and \protect{Mu\~{n}oz} M~A, 2004, {\em J. Stat. Mech}, P10012.
581: 
582: \bibitem{Donetti05}
583: Donetti L and \protect{Mu\~{n}oz} M~A, 2005, Preprint physics/0504059.
584: 
585: \bibitem{Bagrow04}
586: Bagrow J~P and Bollt E~M, 2004, Preprint cond-mat/0412482.
587: 
588: \bibitem{Capocci04}
589: Capocci A, Servedio V, Colaiori F and Caldarelli G, 2004, {\em Lecture Notes Comput. Sci.}, {\bf 3243}, 181-188.
590: 
591: \bibitem{Wu03}
592: Wu F and Huberman B, 2004, {\em Eur. Phys. J. B}, {\bf 38}, 331.
593: 
594: \bibitem{Palla05} 
595: Palla G, Derenyi I, Farkas I and Vicsek T, 2005, {\em Nature}, {\bf 435}, 814.
596: 
597: \bibitem{Reichardt04}
598: Reichardt J and Bornholdt S, 2004, {\em Phys. Rev. Lett.} {\bf 93}, 218701.
599: 
600: \bibitem{Dijkstra} 
601: Dijkstra E W, 1976, {\em A Discipline of
602: Programming}, (Prentice-Hall, New Jersey).
603: 
604: \bibitem{Guimera05b}
605: Guimer\`{a} R and Amaral L~A~N, 2005, {\em J. Stat. Mech.}, P02001.
606: 
607: \bibitem{Clauset04}
Clauset A, Newman M~E~J and Moore C, 2004, {\em Phys. Rev. E}, {\bf 70},  066111.
609: 
610: \bibitem{Guardiola02} Guardiola X, Guimer\`{a} R, Arenas A,
611: D\'{i}az-Guilera A, Streib D and Amaral L~A~N, 2002, Preprint
612: cond-mat/0206240.
613: 
614: \end{thebibliography}
615: 
616: \end{document}
621: