
1: \documentclass[12pt,final]{iopart}

2: \usepackage{graphicx}% Include figure files

3: \usepackage{rotate}

4:

5: \begin{document}

6:

7: \title{Comparing community structure identification}

8:

9: \author{

10:   Leon Danon\dag\ddag,\

11:   Albert D\'{i}az-Guilera,\dag\

12:   Jordi Duch\ddag,\ and

13:   Alex Arenas\ddag

14: }

15:

16: \address{\dag\ Departament de Fisica

  Fonamental, Universitat de Barcelona, Marti i Franques 1, 08086

18:   Barcelona, Spain}

19: \address{\ddag\ Departament d'Enginyeria

20:   Inform\`{a}tica i Matem\`{a}tiques, Universitat Rovira i Virgili,

21:   Campus Sescelades, 43007 Tarragona, Spain}

22: \ead{\tt leon.danon@urv.net}

23:

24: \begin{abstract}

25: We compare recent approaches to community structure identification in

26: terms of sensitivity and computational cost. The recently proposed

modularity measure is revisited, and the performance of the methods as
applied to {\em ad hoc} networks with known community structure is

29: compared. We find that the most accurate methods tend to be more

30: computationally expensive, and that both aspects need to be considered

31: when choosing a method for practical purposes. The work is intended as

32: an introduction as well as a proposal for a standard benchmark test of

33: community detection methods.

34: \end{abstract}

35:

36: \section{Introduction}

37:

38: The study of complex networks has received an enormous amount of

39: attention from the scientific community in recent years

40: \cite{BARev,NRev,DMRev,Strogatz01,BookBorn,Sitges}. Physicists in

41: particular have become interested in the study of networks describing

42: the topologies of a wide variety of systems, such as the world wide

43: web, social and communication networks, biochemical networks and many

44: more.  An important open problem is the analysis of modular structure

45: found in many networks \cite{Newman04}. Distinct modules or

46: communities within networks can loosely be defined as subsets of nodes

47: which are more densely linked, when compared to the rest of the

48: network. Such communities have been observed in different kinds of

49: networks, most notably in social networks, but also in networks of

50: other origin such as metabolic or economic networks

51: \cite{Thurner04,Ravasz02,Guimera05,Holme03}. As a result, the problem of

52: identification of communities has been the focus of many recent

53: efforts.

54:

55: Community detection in large networks is potentially very

56: useful. Nodes belonging to a tight-knit community are more than likely

57: to have other properties in common. For instance, in the world wide

58: web, community analysis has uncovered thematic clusters

59: \cite{Flake02,Eckmann02}. In biochemical or neural networks,

60: communities may be functional groups \cite{Zhou05}, and separating the

61: network into such groups could simplify functional analysis

62: considerably.

63:

64: The problem of community detection is quite challenging and has been

65: the subject of discussion in various disciplines. A simpler version of

66: this problem, the graph bi-partitioning problem (GBP) has been the

67: topic of study in the realm of computer science for decades. Here, one

68: looks to separate the graph into two densely connected communities of

69: equal size, which are connected with the minimum number of links. This

is an NP-complete problem\footnote{In computational complexity theory,
NP (`Non-deterministic Polynomial time') is the set of decision
problems solvable in polynomial time on a non-deterministic Turing
machine. NP-complete problems are the most difficult problems in NP.}
\cite{Garey79}; however, several methods have been proposed to reduce

75: the complexity of the task

76: \cite{KernighanLin,Fiedler73,Boettcher01a,Pothen90}. In real complex

77: networks we often have no idea how many communities we wish to

78: discover, but in general it is more than two. This makes the process

79: all the more costly. What is more, communities may also be

hierarchical; that is, communities may be further divided into

81: sub-communities and so on

82: \cite{Guimera03b,Gleiser03,Arenas03,Newman04a}.

83:

84: Nevertheless, many attempts to tackle these problems have been

85: proposed recently. The proposed methods vary considerably in terms of

86: approach and application, which makes them difficult to

87: compare. Community identification is potentially very useful and

88: researchers from a number of fields may be interested in using one or

89: several of the methods for their own purposes. But which? In order for

90: the reader to be able to make an informed decision as to which method

91: is most appropriate for which purpose, we distil information from the

92: literature and compare the performance of those methods which lend

93: themselves to objective comparison.

94:

To this end, this paper is organised as follows. In section \ref{Q}

96: we revisit the modularity measure designed to evaluate how good a

97: particular partition of a network is. Then, we describe how to measure

98: the sensitivity of the various methods and suggest the use of a more

99: accurate representation of algorithm sensitivity based on information

100: theory. We then compare the methods from a computational cost

101: perspective and compare their sensitivity when applied to {\it ad hoc}

102: networks with community structure. Finally, we suggest appropriate

103: choices of community identification methods for a few different

104: problems.

105:

106: \section{Evaluating community identification}

107: \label{Q}

108:

109: A question that has been raised in recent years is how a given

110: partition of a network into communities can be evaluated. A simple

111: approach that has become widely accepted was proposed in \cite{NG}. It

112: is based on the intuitive idea that random networks do not exhibit

113: community structure. Let us imagine that we have an arbitrary network

114: and an arbitrary partition of that network into $n_c$ communities. It

115: is then possible to define a $n_c \times n_c$ size matrix ${\mathbf

116: e}$ where the elements $e_{ij}$ represent the fraction of total links

117: starting at a node in partition $i$ and ending at a node in partition

118: $j$.  Then, the sum of any row (or column) of ${\mathbf e}$, $a_i

119: = \sum_j e_{ij}$ corresponds to the fraction of links connected to

120: $i$.

121:

122: If the network does not exhibit community structure, or if the

123: partitions are allocated without any regard to the underlying

124: structure, the expected value of the fraction of links within

partitions can be estimated. It is simply the probability that a link
begins at a node in $i$, $a_i$, multiplied by the probability that a
link ends at a node in $i$, also $a_i$. So the expected fraction of
intra-community links is just $a_i^2$. On the other hand, we know that
the {\it real} fraction of links exclusively within a partition is
$e_{ii}$. So, we can compare the two directly and sum over all the
partitions in the graph:

132:

133: \begin{equation}

134: Q\equiv\sum_i(e_{ii} - a_i^2)

135: \end{equation}

136:

137: This is a measure known as {\it modularity}. As an example, let us

consider a network composed of $n_c$ identical, fully connected components with
no links between them. If we then have $n_c$ partitions, corresponding
exactly to the components, modularity will have a value of
$1-1/n_c$. As $n_c$ gets large, this value tends to $1$. On the other
hand, for particularly ``bad'' partitions, for example when all the
nodes are in a community of their own, modularity can
take negative values. This is due to the fact that when nodes are
alone in partitions there can be no internal links.  To avoid this

146: issue, Massen \& Doye propose an alternative measure \cite{Massen04}.
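
Both limiting cases can be checked with a short calculation. This is a sketch under two assumptions: the $n_c$ components are identical, and every undirected link is counted once in each direction, so that ${\mathbf e}$ is symmetric and $\sum_{ij}e_{ij}=1$. For the partition into components, $e_{ii}=a_i=1/n_c$, giving
\[
Q=\sum_{i=1}^{n_c}\left(\frac{1}{n_c}-\frac{1}{n_c^2}\right)=1-\frac{1}{n_c}.
\]
For the partition into single nodes (assuming no self-links), $e_{ii}=0$ and $a_i=k_i/2m$, where $k_i$ denotes the degree of node $i$ and $m$ the total number of links; this notation is introduced here for illustration only. Then
\[
Q=-\sum_{i}\left(\frac{k_i}{2m}\right)^2<0.
\]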

147:

It is tempting to think that random networks exhibit very small values
of modularity. As Guimer\`{a} {\it et al.} show, this is not the case
\cite{Guimera04}. For random networks of finite size, it is possible to
find a partition whose modularity is not only nonzero but quite high;
for example, a network of $128$
nodes and $1024$ links has a maximum modularity of 0.208. This
suggests that random networks, which are not expected to have a modular
structure, can appear to have one due to fluctuations.

156:

157: \section{Comparative evaluation}

158: \label{comparison}

159:

160: The methods that have been presented recently are extremely varied,

161: and are based on a range of different ideas. In a longer article, we

162: describe the methods in more detail and classify them according to the

163: type of approach they present \cite{Danon05book}. Also, the full

164: description of each can be found in the respective references. Here we

165: concentrate on comparing the methods in terms of performance. In order

166: for the reader to be able to compare the algorithms, both in terms of

167: their speed and sensitivity, we would like to present a qualitative

168: comparison for all the methods presented until now.  However, this is

169: not possible as they are very varied, both conceptually and in their

170: applications.

171:

172: One way that has been employed to test sensitivity in many cases is to

173: see how well a particular method performs when applied to {\it ad hoc}

174: networks with a well known, fixed community structure \cite{NG}. Such

175: networks are typically generated with $n = 128$ nodes, split into four

176: communities containing 32 nodes each. Pairs of nodes belonging to the

177: same community are linked with probability $p_{in}$ whereas pairs

178: belonging to different communities are joined with probability

179: $p_{out}$. The value of $p_{out}$ is taken so that the average number

180: of links a node has to members of any other community, $z_{out}$, can

181: be controlled. While $p_{out}$ (and therefore $z_{out}$) is varied

freely, the value of $p_{in}$ is chosen to keep the total average node
degree, $k$, constant at 16. As $z_{out}$ is increased
from zero, the communities become more and more diffuse and harder to
identify (Figure \ref{fig_ad_hoc}). Since the ``real'' community

186: structure is well known in this case, it is possible to measure the

187: number of nodes correctly classified by the method of community

188: identification.
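
As an illustrative calculation of how the two probabilities are tied together, note that each node in this benchmark has $31$ possible neighbours in its own community and $96$ outside it. The symbol $z_{in}$, for the average number of intra-community links per node, is introduced here and is not part of the original notation. Then
\[
z_{in}=31\,p_{in},\qquad z_{out}=96\,p_{out},\qquad k=z_{in}+z_{out}=16 .
\]
For example, $z_{out}=8$ corresponds to $p_{out}=8/96\approx0.083$ and $p_{in}=8/31\approx0.258$. The expected total number of links is $nk/2=128\times16/2=1024$.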

189:

190: In \cite{Newman04a}, the author describes a method to calculate this

191: value. The largest group found within each of the four ``real''

192: communities is considered correctly classified. If more than one

193: original community is clustered together by the algorithm, all nodes

194: in that cluster are considered incorrectly classified. For example,

195: for the case when $z_{out}/k$ is small, if a method finds three

196: communities, two of which correspond exactly to two original

197: communities, and a third, which corresponds to the other two clustered

198: together, this measure would consider half the nodes correctly

199: classified. As the author notes, this measure is quite harsh, and some

200: nodes which one may consider to be correctly clustered are not

201: counted. On the other end of the spectrum, as $z_{out}/k$ becomes

202: large, and the networks become essentially random networks, this

203: method rewards the identification of smaller clusters found within

204: each of the original communities, which could be misleading.

205:

206: We suggest that a more discriminatory measure is more appropriate, and

207: propose the use of the {\it normalised mutual information} measure, as

described in \cite{Kuncheva04,Fred03}. It is based on defining a {\it
confusion matrix} ${\mathbf N}$, where the rows correspond to the ``real''
communities, and the columns correspond to the ``found''
communities. The element $N_{ij}$ of ${\mathbf N}$ is the number of nodes
in the real community $i$ that appear in the found community $j$. A

213: measure of similarity between the partitions, based on information

214: theory, is then:

215:

216: \begin{equation}

217: I(A,B)=\frac{-2\sum^{c_A}_{i=1}\sum^{c_B}_{j=1}

218: N_{ij}\log\left(\frac{N_{ij}N}{N_{i.}N_{.j}}\right)}

219: {\sum^{c_A}_{i=1}N_{i.}\log\left(\frac{N_{i.}}{N}\right)

220:  + \sum^{c_B}_{j=1}N_{.j}\log\left(\frac{N_{.j}}{N}\right)}

221: \end{equation}

222:

where the number of real communities is denoted $c_A$ and the number
of found communities is denoted $c_B$, the sum over row $i$ of matrix
$N_{ij}$ is denoted $N_{i.}$, the sum over column $j$ is denoted
$N_{.j}$, and $N$ is the total number of nodes.

227:

228:

229: If the found partitions are identical to the real communities, then

230: $I(A,B)$ takes its maximum value of 1. If the partition found by the

231: algorithm is totally independent of the real partition, for example

232: when the entire network is found to be one community, $I(A,B)= 0$.
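
Both limits can be checked directly from the definition. The following is a brief sketch, using the usual convention that terms with $N_{ij}=0$ contribute nothing to the sum, and with the Kronecker delta $\delta_{ij}$ introduced here for convenience. If the two partitions are identical up to a relabelling of the communities, then $c_A=c_B$, $N_{ij}=N_{i.}\delta_{ij}$ and $N_{.i}=N_{i.}$, so
\[
I(A,B)=\frac{-2\sum_{i}N_{i.}\log\left(\frac{N}{N_{i.}}\right)}
{2\sum_{i}N_{i.}\log\left(\frac{N_{i.}}{N}\right)}=1 .
\]
If the whole network is found to be a single community, then $c_B=1$, $N_{i1}=N_{i.}$ and $N_{.1}=N$. Every logarithm in the numerator is then $\log 1=0$, while the denominator is nonzero provided $c_A>1$, and so $I(A,B)=0$.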

233:

234: Both measures of accuracy give a good idea of how a method

235: performs. However, the measure we propose for use here is more

236: representative of sensitivity if the performance is dubious, since it

237: measures the amount of information correctly extracted by the

238: algorithm explicitly. As an example, for small $z_{out}$, where two

239: original communities are clustered together by the algorithm, this

240: measure does not punish the algorithm as severely, taking into account

241: the ability to extract at least some information about the community

242: structure. On the other hand, for large $z_{out}$, this method is able

243: to detect that the clusters found by the algorithm have little to do

244: with the original communities, and $I(A,B) \rightarrow 0$.
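
To make the comparison concrete, consider the case discussed earlier in which two of the four original communities of $32$ nodes are recovered exactly and the other two are merged into a single found community of $64$ nodes. This is an illustrative calculation, not a result taken from the benchmark data. The nonzero entries of the confusion matrix are $N_{11}=N_{22}=N_{33}=N_{43}=32$, with $N_{i.}=32$, $N_{.1}=N_{.2}=32$, $N_{.3}=64$ and $N=128$, so that
\[
I(A,B)=\frac{-2\left(64\log 4+64\log 2\right)}
{128\log\frac{1}{4}+64\log\frac{1}{4}+64\log\frac{1}{2}}
=\frac{384\log 2}{448\log 2}=\frac{6}{7}\approx0.86,
\]
compared with the value of $0.5$ given by the fraction of correctly classified nodes.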

245:

246: \begin{table}

247:   \centering

248:

249: \begin{tabular}{|c|c|c|c|}

250:

251:   % after \\: \hline or \cline{col1-col2} \cline{col3-col4} ...

252:

253:   \hline

254:

255:   Author &Ref. & Label & Order \\

256:

257:   \hline

258:

259:   \hline

260:

261:   Eckmann \& Moses&\cite{Eckmann02}&  EM & $O(m\langle k^2\rangle)$ \\

262:

263:   Zhou \& Lipowsky &\cite{Zhou05} & ZL & $O(n^3)$ \\

264:

265:   Latapy \& Pons & \cite{Latapy04} & LP & $O(n^3)$ \\

266:

267:   Newman &\cite{Newman04a} & NF & $O(n\log^2n)$ \\

268:

269:   Newman \& Girvan &\cite{NG} &  NG & $O(m^2n)$ \\

270:

271:   Girvan \& Newman &\cite{GN} &  GN & $O(n^2m)$ \\

272:

273:   Guimer\`{a} et al. & \cite{Guimera04,Guimera05b} & SA & parameter dependent\\

274:

275:   Duch \& Arenas &\cite{Duch05} & DA & $O(n^2\log n)$ \\

276:

277:   Fortunato et al. &\cite{Fortunato04} & FLM & $O(n^4)$ \\

278:

279:   Radicchi et al. &\cite{Radicchi04} & RCCLP & $O(n^2)$ \\

280:

281:   Donetti \& Mu\~noz&\cite{Donetti04,Donetti05} & DM/DMN & $O(n^3)$ \\

282:

283:   Bagrow \& Bollt &\cite{Bagrow04}&  BB & $O(n^3)$ \\

284:

285:

286:   Capocci et al. &\cite{Capocci04}& CSCC & $O(n^2)$ \\

287:

288:   Wu \& Huberman &\cite{Wu03}& WH & $O(n+m)$ \\

289:

290:   Palla et al. & \cite{Palla05} & PK & $O(\exp(n))$\\

291:

292:   Reichardt \& Bornholdt &\cite{Reichardt04} & RB & parameter dependent\\

293:

294:

295:

296:   \hline

297:

298: \end{tabular}

299:

300: \caption{Table summarising how the computational cost of different

301: approaches scales with number of nodes $n$, number of links $m$ and

302: average degree $\langle k \rangle$ \cite{Dijkstra}. The labels shown

303: here are used in Figures \ref{fig_compare} and \ref{678}.}

304: \label{Table_Orders}

305:

306: \end{table}

307:

308:

309: \begin{figure}

310: \centerline{\includegraphics*[width=0.7\columnwidth]{mi_compare}}

311: \caption{Algorithm sensitivity as applied to ad hoc networks with $n =

312:  128$, the network divided into four communities of $32$ nodes each

 and total average degree $k$ fixed to $16$. For low $z_{out}/k$

314:  the communities are easily distinguished. For higher $z_{out}/k$ this

315:  becomes more complicated. Both measures of comparing original

316:  communities to ones found by the detection method are shown. The

317:  normalised mutual information measure is more discriminatory and

318:  appears more sensitive to errors in the community identification

319:  procedure. The results are shown for Newman's fast algorithm

320:  \cite{Newman04a} and the extremal optimisation algorithm

321:  \cite{Duch05}.}

322: \label{fig_ad_hoc}

323: \end{figure}

324:

325: \begin{figure}[!h]

326:  \centerline{\includegraphics*[width=0.9\columnwidth]{compare2}}

327: \caption{Comparing algorithm sensitivity using ad hoc networks with

328:  predetermined community structure. The $x$-axis is the proportion of

329:  connections to outside communities $z_{out}/k$ and the

 $y$-axis is the fraction of nodes correctly identified by the method,
 measured as described in \cite{Newman04a}. The labels here correspond

332:  to the different methods and are listed in Table \ref{Table_Orders}.}

333: \label{fig_compare}

334: \end{figure}

335:

336: \begin{figure}

337: \centerline{\includegraphics*[width=0.74\columnwidth]{final_678}}

338: %%\includegraphics*[width=0.9\columnwidth]{final_678}

339: \caption{The fraction of correctly identified nodes at three specific

340:   values of $z_{out}$, $6$, $7$ and $8$ for all available methods and

341:   for networks with fixed $k=16$. Note that for the FLM method,

342:   the data for $z_{out}=8$ were not available. Here we can see that

343:   most of the methods are very good at finding the ``correct''

344:   community structure for values of $z_{out}$ up to $6$. At $z_{out} =

345:   7$ some methods begin to falter but most still identify more than

346:   half of the nodes correctly. At $z_{out} = 8$, when on average half

347:   the links are external, two methods are still able to identify over 80

348:   \% of the nodes correctly.}

349: \label{678}

350: \end{figure}

351:

352: In Figure \ref{fig_compare} we show the sensitivity of all methods we

353: have been able to gather. The percentage of correctly identified nodes

354: is calculated using the method described in \cite{Newman04a}, since

355: this is the method employed by the various authors. We can see that

356: accuracy varies in a similar way across the different methods as

357: $z_{out}$ increases and the communities become more diffuse. So, it

358: remains difficult to compare the performance by looking at the methods

359: separately, even with a reference performance.

360:

361: To summarise the large amount of information, in Figure \ref{678} we

362: plot the fraction of correctly identified nodes for only three values

363: of $z_{out}$ (6, 7 and 8), corresponding to $z_{out}/k = $ 0.375,

364: 0.4375 and 0.5 respectively, for each method. From this we can see

365: that most of the methods perform very well for $z_{out}=6$

366: ($z_{out}/k=0.375$), and even for $z_{out}=7$ ($z_{out}/k=0.4375$)

367: most can identify more than half the nodes correctly. For $z_{out}=8$

368: ($z_{out}/k=0.5$) two methods are still able to identify more than 80

369: $\%$ of the nodes correctly\footnote{One might expect that as the

proportion of out links approaches $0.5$, community structure no longer
exists. However, since the external links are distributed among the

372: other three communities, individual nodes remain more strongly

373: connected to their own community than to other communities, even at

374: this high value of $z_{out}/k$.}.
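
The argument in the footnote can be made explicit with a short calculation, using the illustrative notation $z_{in}=k-z_{out}$, which is introduced here. At $z_{out}=8$ and $k=16$, a node has on average $z_{in}=8$ links to its own community, but only
\[
\frac{z_{out}}{3}=\frac{8}{3}\approx2.7
\]
links to each of the other three communities.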

375:

376:

377:

378:

379: While accuracy is an essential consideration when choosing a method,

380: it is just as important to consider the computational effort needed to

381: perform the analysis \cite{Dijkstra}. For some of the approaches

382: described in the literature, we have collected estimates of how the

383: cost scales with network observables. For networks with $n$ nodes and

384: $m$ links, the methods scale between $O(m+n)$ for the fastest, and

385: $O(\exp(n))$ for the slowest (Table \ref{Table_Orders}). Such

386: diversity is due to the different approaches taken by the authors. The

387: faster methods tend to be approximate and less accurate, while the

388: slower methods have other advantages (see \cite{Danon05book} for a

389: more detailed discussion). Differences in speed only become important

390: when dealing with larger networks.

391:

392: \section{Choosing an algorithm}

393:

394:

395: One has to take many factors into account when choosing an algorithm

396: to use. The above comparison ought to give the reader an idea as to

397: which algorithm is most appropriate for a given problem. In many

398: cases, a compromise must be reached between accuracy and running time,

399: especially for larger networks. To clarify this further, here are a

400: few examples of real networks, and our suggestion for the

401: appropriate community identification algorithm.

402:

403: Say we want to analyse a relatively small network, for example the

404: metabolic network of the worm {\it Caenorhabditis elegans}, which has

405: 453 nodes. Since the network is small, and current desktop computer

406: technology is reasonably fast, the speed of the algorithm should pose

no restriction, and one is free to choose the slower, more accurate

408: methods. In this case the Simulated Annealing (SA) method would be the

409: most appropriate choice, since it gives the most accurate partitions,

410: especially if the system is allowed to cool slowly (see

411: \cite{Guimera04,Massen04,Guimera05b} for more details).

412:

Larger networks, with the number of nodes of the order of $10^5$,

414: become intractable with the more accurate methods. For example, when

415: attempting to study the community structure of the actor collaboration

416: network with 374511 nodes, we estimate that the SA algorithm would

417: take a few months of uninterrupted computation. However, a reasonable

418: implementation of the fast algorithm would be able to perform this

419: analysis in just a few hours \cite{Clauset04}, making it the

appropriate choice, even if its accuracy is not the best.

421:

422: Let us consider an intermediate sized network such as the Pretty Good

423: Privacy (PGP) web of trust social network \cite{Guardiola02},

424: containing 10680 nodes. Although the SA algorithm would run in a

425: reasonable time, it may be a better choice to compromise and employ a

faster-running algorithm. The extremal optimisation (EO) method
\cite{Duch05}, labelled DA in Table \ref{Table_Orders}, is not quite as accurate as
SA, but the saving in computational effort for a network of this size
is considerable. It is more accurate than the fast algorithm, however,

429: and so would make it a better choice.
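
A rough, order-of-magnitude calculation illustrates this trade-off. It is a sketch only: it uses the asymptotic orders listed in Table \ref{Table_Orders}, ignores all constant prefactors and implementation details, and assumes natural logarithms. Under these assumptions the ratio of the costs of the DA and NF methods scales as
\[
\frac{n^2\log n}{n\log^2 n}=\frac{n}{\log n},
\]
which is approximately $10680/9.28\approx1.2\times10^{3}$ for $n=10680$, and approximately $374511/12.8\approx2.9\times10^{4}$ for the actor network with $n=374511$.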

430:

431: \section{Conclusion}

432:

433: In this work we have given a brief overview and comparison of the

434: modern approaches to community identification in complex networks. A

435: large amount of knowledge has been collected in the field, and real

436: progress has been made, both in the identification of communities and

437: their characterisation. Some questions do remain open, and it is these

438: that we would suggest for further study. Despite these efforts, the

cost involved in computing communities in complex networks remains

440: significant. The fastest algorithm runs in linear time, but this

441: particular method needs a priori knowledge of the number of expected

442: communities, and assumes that all communities are of similar size

443: \cite{Wu03}. At present, the fastest method for finding an unknown

444: number of communities of unknown sizes has a cost which scales as

445: $O(n\log^2n)$ with network size. While this makes the analysis of

446: extremely large networks feasible, this algorithm does not guarantee

447: that the partition found is the best possible one. Other algorithms

448: which are more computationally expensive have other merits, such as

449: accuracy or the ability to identify overlapping communities. So, when

450: choosing a method one must consider carefully the context of its

451: use. Ideally, one would like to have a method which guarantees

452: accuracy and is fast at the same time, but finding such a method is

453: challenging. The search for faster and more accurate methods is an

454: important one and we would suggest this for further study.

455:

456:

457: \ack The authors are grateful to Luca Donetti, Haijun Zhou, Mark

458: Newman, Santo Fortunato, J\"org Reichardt, Claudio Castellano,

459: Matthieu Latapy, Jean-Pierre Eckmann and Roger Guimer\`{a} for providing

460: their data and Sam Seaver for useful comments. This work has been

461: supported by DGES of the Spanish Government Grant No. BFM-2003-08258

462: and EC-FET Open Project No. IST-2001-33555. LD gratefully acknowledges

463: the funding of Generalitat de Catalunya.

464:

465: \section*{References}

466: \begin{thebibliography}{50}

467:

468: \bibitem{BARev}

469: Barab\'{a}si A~L and Albert R, 2002, {\em Rev. Mod. Phys.}, {\bf 74},  47.

470:

471: \bibitem{NRev}

472: Newman M~E~J, 2003, {\em SIAM Review}, {\bf 45},  167.

473:

474: \bibitem{DMRev}

475: Dorogovtsev S~N and Mendes J~F~F, 2003, {\em Evolution of Networks: From

476:   biological nets to the internet and WWW}, (Oxford University Press, Oxford).

477:

478: \bibitem{Strogatz01}

479: Strogatz S~H, 2001, {\em Nature}, {\bf 410}, 268.

480:

481: \bibitem{BookBorn}

482: Bornholdt S and Schuster H~G eds. 2002, {\em Handbook of Graphs and Networks - From the Genome to the Internet}, (Wiley-VCH, Berlin).

483:

484: \bibitem{Sitges}

485: Pastor-Satorras R, Rub\'{i} M and

486: D\'{i}az-Guilera A eds. 2003, {\em Statistical Mechanics of Complex

487: Networks}, (Springer).

488:

489: \bibitem{Newman04}

490: Newman M~E~J, 2004, {\em Eur. Phys. J. B}, {\bf 38}, 321.

491:

492: \bibitem{Thurner04}

493: Boss M, Elsinger H, Summer M and Thurner S, 2003, Preprint cond-mat/0309582.

494:

495: \bibitem{Ravasz02}

Ravasz E, Somera A L, Mongru D A, Oltvai Z N and Barab\'{a}si A L, 2002, {\em Science}, {\bf 297},  1551.

497:

498: \bibitem{Guimera05}

Guimer\`{a} R and Amaral L~A~N, 2005, {\em Nature}, {\bf 433}, 895-900.

500:

501: \bibitem{Holme03}

502: Holme P, Huss M and Jeong H, 2003, {\em Bioinformatics}, {\bf 19},  532.

503:

504: \bibitem{Flake02}

505: Flake G~W, Lawrence S, Giles C~L and Coetzee F~M, 2002, {\em IEEE Computer}, {\bf 35}, 66.

506:

507: \bibitem{Eckmann02}

508: Eckmann J-P and Moses E, 2002, {\em Proc. Natl. Acad. Sci.}, {\bf 99},  5825.

509:

510: \bibitem{Zhou05}

511: Zhou H and Lipowsky R, 2004, {\em Lecture Notes Comput. Sci.} {\bf 3038}, 1062 - 1069.

512:

513: \bibitem{Latapy04}

Latapy M and Pons P, 2004, Preprint cond-mat/0412568.

515:

516: \bibitem{Garey79}

517: Garey M~R and  Johnson D~S, 1979, {\em Computers

518: and Intractability, A Guide to the Theory of NP-Completeness} (W. H

519: Freeman, New York).

520:

521: \bibitem{KernighanLin}

522: Kernighan B W and Lin S, 1970, {\em The Bell System Tech. J.}, {\bf 49},  291.

523:

524: \bibitem{Fiedler73}

Fiedler M, 1973, {\em Czech. Math. J.}, {\bf 23},  298.

526:

527: \bibitem{Boettcher01a}

528: Boettcher S and  Percus A~G, 2001, {\em Phys. Rev. E}, {\bf 64} 026114.

529:

530: \bibitem{Pothen90}

531: Pothen A, Simon H and Liou K-P, 1990, {\em SIAM J. Matrix Anal. Appl.}, {\bf 11}, 430.

532:

533: \bibitem{Guimera03b}

534: Guimer\`{a} R, Danon L, D\'{i}az-Guilera A, Giralt F and Arenas A, 2003, {\em Phys. Rev. E}, {\bf 68},065103.

535:

536: \bibitem{Gleiser03}

537: Gleiser P and Danon L, 2003, {\em Adv. Complex Systems}, {\bf 6},  565.

538:

539: \bibitem{Arenas03}

540: Arenas A, Danon L, D\'{i}az-Guilera A, Gleiser P M

541: and Guimer\`{a} R, 2004, {\em Eur. Phys. J. B}, {\bf 38}, 373.

542:

543: \bibitem{Newman04a}

544: Newman M~E~J, 2004, {\em Phys. Rev. E}, {\bf 69}, 066133.

545:

546: \bibitem{NG}

547: Newman M~E~J and Girvan M, 2004, {\em Phys. Rev. E}, {\bf 69}, 026113.

548:

549: \bibitem{Massen04}

550: Massen C~P and Doye J~P~K, 2005, {\em Phys. Rev. E}, {\bf 71}, 046101.

551:

552: \bibitem{Guimera04}

553: Guimer\`{a} R, Sales M and Amaral L~A~N, 2004, {\em Phys. Rev. E}, {\bf 70},

554:  025101.

555:

556: \bibitem{Danon05book} Danon L, Duch J, Arenas A and D\'{i}az-Guilera

557: A, to appear in COSIN book, Preprint cond-mat/0505245.

558:

559: \bibitem{Kuncheva04}

Kuncheva L~I and Hadjitodorov S~T, 2004, {\em Systems, Man and
Cybernetics, 2004 IEEE International Conference}, {\bf 2}, 1214.

562: \bibitem{Fred03}

563: Fred A~L~N and Jain A~K, 2003, {\em Proc. IEEE Computer Society Conference on

564:   Computer Vision and Pattern Recognition}, p. II-128-133.

565:

566: \bibitem{Duch05}

Duch J and Arenas A, 2005, {\em Phys. Rev. E}, {\bf 72}, 027104.

568:

569: \bibitem{GN}

570: Girvan M and Newman M~E~J, 2002, {\em Proc. Natl. Acad. Sci.}, {\bf 99}, 7821.

571:

572: \bibitem{Fortunato04}

Fortunato S, Latora V and Marchiori M, 2004, {\em Phys. Rev. E}, {\bf 70}, 056104.

574:

575: \bibitem{Radicchi04}

576: Radicchi F, Castellano C, Cecconi F, Loreto V and Parisi D, 2004,

577: {\em Proc. Natl. Acad. Sci.}, {\bf 101}, 2658.

578:

579: \bibitem{Donetti04}

580: Donetti L and \protect{Mu\~{n}oz} M~A, 2004, {\em J. Stat. Mech}, P10012.

581:

582: \bibitem{Donetti05}

583: Donetti L and \protect{Mu\~{n}oz} M~A, 2005, Preprint physics/0504059.

584:

585: \bibitem{Bagrow04}

586: Bagrow J~P and Bollt E~M, 2004, Preprint cond-mat/0412482.

587:

588: \bibitem{Capocci04}

589: Capocci A, Servedio V, Colaiori F and Caldarelli G, 2004, {\em Lecture Notes Comput. Sci.}, {\bf 3243}, 181-188.

590:

591: \bibitem{Wu03}

592: Wu F and Huberman B, 2004, {\em Eur. Phys. J. B}, {\bf 38}, 331.

593:

594: \bibitem{Palla05}

595: Palla G, Derenyi I, Farkas I and Vicsek T, 2005, {\em Nature}, {\bf 435}, 814.

596:

597: \bibitem{Reichardt04}

598: Reichardt J and Bornholdt S, 2004, {\em Phys. Rev. Lett.} {\bf 93}, 218701.

599:

600: \bibitem{Dijkstra}

601: Dijkstra E W, 1976, {\em A Discipline of

602: Programming}, (Prentice-Hall, New Jersey).

603:

604: \bibitem{Guimera05b}

605: Guimer\`{a} R and Amaral L~A~N, 2005, {\em J. Stat. Mech.}, P02001.

606:

607: \bibitem{Clauset04}

Clauset A, Newman M~E~J and Moore C, 2004, {\em Phys. Rev. E}, {\bf 70},  066111.

609:

610: \bibitem{Guardiola02} Guardiola X, Guimer\`{a} R, Arenas A,

611: D\'{i}az-Guilera A, Streib D and Amaral L~A~N, 2002, Preprint

612: cond-mat/0206240.

613:

614: \end{thebibliography}

615:

616: \end{document}


621: