
1: \documentclass[12pt]{article}

2:

3: \usepackage{graphicx}

4: \usepackage{scicite}

5: \usepackage{times}

6: \usepackage{natbib}

7: \usepackage{fancyhdr}

8:

9: \topmargin 0.0cm

10: \oddsidemargin 0.2cm

11: \textwidth 16cm

12: \textheight 21cm

13: \footskip 1.0cm

14:

15: \pagestyle{fancy}

16:

17: \lhead{\sffamily Letter to Nature}

18: \chead{}

19: \rhead{\sffamily Guimer\`a and Amaral}

20:

21: \lfoot{}

22: \cfoot{\thepage}

23: \rfoot{}

24:

25: \renewcommand{\headrulewidth}{0.5pt}

26: \renewcommand{\footrulewidth}{0pt}

27:

28: \newenvironment{sciabstract}{%

29: \begin{quote} \bf}

30: {\end{quote}}

31:

32: %

33: \bibliographystyle{nature}

34: %

35:

36: %

37: %

38: \title{Functional cartography\\of complex metabolic networks}

39: %

40: \author{Roger Guimer\`a and Lu\'{\i}s A. Nunes Amaral\\

41: %%

42: \normalsize{NICO and Dept. Chemical and Biological Engineering} \\

43: \normalsize{Northwestern University, Evanston, IL 60208, USA}\\ \\

44: %

45: }

46:

47: \date{}

48:

49:

50: %%%% Double space the manuscript

51: \renewcommand{\baselinestretch}{1.5}

52:

53: \begin{document}

54:

55: \maketitle

56: %

57:

58: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

59: %%%%%%%%%%%%%%%%%%% ABSTRACT %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

60: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

61:

62: \begin{sciabstract}

63: High-throughput techniques are leading to an explosive growth in the

64: size of biological databases and creating the opportunity to

65: revolutionize our understanding of life and disease. Interpretation of

66: these data remains, however, a major scientific challenge. Here, we

67: propose a methodology that enables us to extract and display

68: information contained in complex

69: networks~\cite{amaral00,albert02,amaral04}. Specifically, we

70: demonstrate that one can (i) find functional

71: modules\cite{hartwell99,girvan02} in complex networks, and (ii)

72: classify nodes into universal roles according to their pattern of

73: intra- and inter-module connections. The method thus yields a

74: ``cartographic representation'' of complex networks. Metabolic

networks~\cite{jeong00,wagner01,ma03} are among the most challenging
biological networks and, arguably, the ones with the most potential for
immediate applicability~\cite{hatzimanikatis04}. We use our method to

78: analyze the metabolic networks of twelve organisms from three

79: different super-kingdoms. We find that, typically, 80\% of the nodes

80: are only connected to other nodes within their respective modules, and

81: that nodes with different roles are affected by different evolutionary

82: constraints and pressures. Remarkably, we find that low-degree

83: metabolites that connect different modules are more conserved than

84: hubs whose links are mostly within a single module.

85: \end{sciabstract}

86:

87:

88: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

89: %%%%%%%%%%%%%%%%%%% BODY OF THE PAPER %%%%%%%%%%%%%%%%%%%%%%

90: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

91:

92:

93: If one is to extract the significant information from the topology of

94: a large complex network, the knowledge of the role of each node is of

95: crucial importance. A cartographic analogy is helpful to illustrate

96: this point. Consider the network formed by all cities and towns in a

97: country---the nodes---and all the roads that connect them---the

98: links. It is clear that a map in which each city and town is

99: represented by a circle of fixed size and each road is represented by

100: a line of fixed width is hardly useful. Rather, real maps emphasize

101: capitals and important communication lines so that one can obtain

102: scale-specific information at a glance. Similarly, it is difficult, if

103: not impossible, to obtain information from a network with hundreds or

104: thousands of nodes and links, unless the information about nodes and

105: links is conveniently summarized. This is particularly true for

106: biological networks.

107:

108: Here, we propose a methodology, which is based on the connectivity of

109: the nodes, that yields a ``cartographic representation'' of a complex

110: network.  The first step in our method is to identify the functional

111: modules\cite{hartwell99,girvan02} in the network. In the cartographic

112: picture, modules are analogous to countries or regions, and enable a

113: coarse-grained, and thus simplified, description of the network. Then,

114: we classify the nodes in the network into a small number of {\it

115: system-independent\/} ``universal roles.''

116:

117:

118: \bigskip

119: \noindent

120: {\it Modules.}

121: %

122: It is a matter of common experience that social networks have

123: communities of highly interconnected nodes that are less connected to

124: nodes in other communities. Such modular structures have been reported

125: not only in social

126: networks~\cite{girvan02,guimera03,newman03,arenas04}, but also in food

127: webs~\cite{krause03} and biochemical

128: networks~\cite{hartwell99,ravasz02,holme03,papin04}. It is widely

129: believed that the modular structure of complex networks plays a

130: critical role in their

131: functionality~\cite{hartwell99,ravasz02,papin04}. There is therefore a

132: clear need to develop algorithms to identify modules

133: accurately~\cite{girvan02,newman03,eriksen03,newman04,radicchi04,donetti04}.

134:

135: We identify modules by maximizing the network's {\it

136: modularity}~\cite{newman03,newman04,guimera04c} using simulated

annealing~\cite{kirkpatrick83} (see Methods). Simulated annealing
enables us to search the space of partitions thoroughly and to reduce
the risk of settling on a sub-optimal partition. It is noteworthy that, in

140: our method, one does not need to specify a priori the number of

141: modules; rather, this number is an outcome of the algorithm. Our

142: algorithm, which significantly outperforms the best algorithm in the

143: literature, is able to reliably identify modules in a network whose

144: nodes have as many as 50\% of their connections outside their own

145: module (Fig.~\ref{f-perf-mod}).

146:

147:

148: \bigskip

149: \noindent

150: {\it Roles in modular networks.}

151: %

152: It is plausible to surmise that the nodes in a network are connected

according to the {\it role\/} they fulfill. This idea has long been
recognized in the analysis of social

155: networks~\cite{wasserman94}. For example, in a classical

156: hierarchical organization, the CEO is not directly connected to plant

157: employees but is connected to the members of the board of

158: directors. Importantly, such a statement holds for virtually any

159: organization, that is, the role of CEO is defined irrespective of the

160: particular organization one considers.

161:

162: We propose a new method to determine the role of a node in a complex

163: network. Our approach is based on the idea that nodes with the same

164: role should have similar topological properties\cite{guimera??e} (see

165: Supplementary Information for a discussion on how our approach relates

166: to previous work). We hypothesize that the role of a node can be

167: determined, to a great extent, by its {\it within-module degree} and

168: its {\it participation coefficient}, which define how the node is

169: positioned in its own module and with respect to other

170: modules\cite{rives03,han04} (see Methods). These two properties are

171: easily computed once the modules of a network are known.

172:

173: The within-module degree $z_i$ measures how ``well-connected'' node

174: $i$ is to other nodes in the module. High values of $z_i$ indicate

175: high within-module degrees and vice versa. The participation

176: coefficient $P_i$ measures how ``well-distributed'' the links of node

$i$ are among different modules. The participation coefficient $P_i$
is close to one if the links of node $i$ are uniformly distributed among
all the modules, and zero if all of them are within its own module.

180:
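As a simple illustration of these limits, using the definition of
$P_i$ given in Methods, consider the idealized case of a node whose
$k_i$ links are split evenly among $m$ modules, so that
$\kappa_{is}=k_i/m$ for each of those modules and zero for the
rest. Then
%
\[
P_i = 1-\sum_{s=1}^{m}\left(\frac{1}{m}\right)^2 = 1-\frac{1}{m}\,,
\]
%
which gives $P_i=0$ for $m=1$ (all links within a single module),
$P_i=0.75$ for $m=4$, and values close to one only when the links are
spread evenly over many modules.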

181: We define heuristically seven different ``universal roles,'' each

defined by a different region in the $zP$ parameter space
(Fig.~\ref{f-roledef}). According to the within-module degree, we
classify nodes with $z \ge 2.5$ as module hubs and nodes with $z<2.5$ as

185: non-hubs.  Both hub and non-hub nodes are then more finely

186: characterized by using the values of the participation coefficient

187: (see Supplementary Information for a detailed justification of this

188: classification scheme, and for a discussion on possible alternatives).

189:

190: We find that non-hub nodes can be naturally divided into four

different roles: (R1) {\it ultra-peripheral nodes}, i.e., nodes with
all their links within their module ($P \le 0.05$); (R2) {\it peripheral

193: nodes}, i.e., nodes with most links within their module ($0.05<P \le

194: 0.62$); (R3) {\it non-hub connector nodes}, i.e., nodes with many

195: links to other modules ($0.62<P \le 0.80$); and (R4) {\it non-hub

196: kinless nodes}, i.e., nodes with links homogeneously distributed among

197: all modules ($P>0.80$). We find that hub nodes can be naturally

divided into three different roles: (R5) {\it provincial hubs}, i.e.,

199: hub nodes with the vast majority of links within their module ($P \le

200: 0.30$); (R6) {\it connector hubs}, i.e., hubs with many links to most

201: of the other modules ($0.30<P \le 0.75$); and (R7) {\it kinless hubs},

202: i.e., hubs with links homogeneously distributed among all modules

203: ($P>0.75$).

204:
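The seven regions of the $zP$ plane listed above can be summarized in a
few lines of code. The following Python sketch only encodes the
thresholds quoted in the text; the function name
\texttt{classify\_role} and the string labels are illustrative
choices, not part of the original analysis.

```python
def classify_role(z, P):
    """Return the role label 'R1'..'R7' for a node.

    z -- within-module degree (z-score) of the node
    P -- participation coefficient of the node
    The thresholds are the heuristic ones quoted in the text.
    """
    if z < 2.5:          # non-hub nodes
        if P <= 0.05:
            return "R1"  # ultra-peripheral node
        if P <= 0.62:
            return "R2"  # peripheral node
        if P <= 0.80:
            return "R3"  # non-hub connector node
        return "R4"      # non-hub kinless node
    # module hubs (z >= 2.5)
    if P <= 0.30:
        return "R5"      # provincial hub
    if P <= 0.75:
        return "R6"      # connector hub
    return "R7"          # kinless hub
```

For instance, a node with $z=3.0$ and $P=0.1$ falls in role R5, while a
node with $z=0.3$ and $P=0.7$ falls in role R3.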

205:

206: \bigskip

207: \noindent

208: {\it Cartographic representation of metabolic networks.}

209: %

210: To test the applicability of our approach to complex biological

211: networks, we consider the metabolic

212: network~\cite{jeong00,wagner01,ravasz02,ma03,hatzimanikatis04} of

213: twelve organisms: four bacteria ({\it E. coli}, {\it B. subtilis},

214: {\it L. lactis}, and {\it T. elongatus}), four eukaryotes ({\it

215: S. cerevisiae}, {\it C. elegans}, {\it P. falciparum}, and {\it

216: H. sapiens}), and four archaea ({\it P. furiosus}, {\it A. pernix},

217: {\it A. fulgidus}, and {\it S. solfataricus}). In metabolic networks,

218: nodes represent metabolites and two nodes $i$ and $j$ are connected by

219: a link if there is a chemical reaction in which $i$ is a substrate and

220: $j$ a product, or vice versa. In our analysis, we use the database

221: developed by Ma and Zeng~\cite{ma03} (MZ) from the Kyoto Encyclopedia

222: of Genes and Genomes~\cite{kanehisa00} (KEGG). Importantly, the

223: results we report are not altered if we consider the complete KEGG

224: database instead (Figs.~\ref{f-roledef}c and \ref{f-conservation}b,

225: and Supplementary Information).

226:
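The rule used to build the network (a link between every substrate
and every product of a reaction) can be sketched as follows. This is a
minimal illustration under stated assumptions: the input format (a
list of substrate and product lists) and the name
\texttt{metabolite\_links} are hypothetical and do not reflect the
format of the MZ or KEGG databases.

```python
from itertools import product

def metabolite_links(reactions):
    """Build the set of undirected metabolite-metabolite links.

    reactions -- iterable of (substrates, products) pairs, each a list
                 of metabolite names (hypothetical input format)
    Two metabolites are linked if one is a substrate and the other a
    product of the same reaction; duplicates and self-links are dropped.
    """
    links = set()
    for substrates, products in reactions:
        for s, p in product(substrates, products):
            if s != p:
                links.add(frozenset((s, p)))
    return links

# Toy reactions with placeholder metabolite names (not real data).
toy_reactions = [(["A", "B"], ["C"]), (["C"], ["A"])]
toy_links = metabolite_links(toy_reactions)
```

In this toy case the second reaction does not add a new link, because
A and C are already connected by the first one.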

227: First, we identify the functional modules in the different metabolic

228: networks (Fig.~\ref{f-metab}). Finding modules in metabolic networks

229: based on purely topological properties is an extremely important

230: task. For example, Schuster {\it et al.} have reported on the

231: impossibility of obtaining elementary flux modes \cite{schuster00}

232: from complete metabolic networks due to the combinatorial explosion of

233: the number of such modes \cite{schuster02}. Our algorithm identifies

234: an average of 15 different modules in each metabolic network---with a

235: maximum of 19 for {\it E. coli} and {\it H. sapiens}, and a minimum of

236: 11 for {\it A. fulgidus}. As expected, the density of links within

237: each of the modules is significantly larger than between modules,

typically 100--1000 times larger (see Supplementary Information).

239:

240: To assess how each of the modules is related to the pathways

241: traditionally defined in biology, we use the classification scheme

242: proposed in KEGG, which includes nine major pathways: carbohydrate

243: metabolism, energy metabolism, lipid metabolism, nucleotide

244: metabolism, amino acid metabolism, glycan biosynthesis and metabolism,

245: metabolism of cofactors and vitamins, biosynthesis of secondary

246: metabolites, and biodegradation of xenobiotics. Each metabolite in the

KEGG database is assigned to at least one pathway; thus, we can

248: determine to which pathways the metabolites in a given module

249: belong. We find that most modules contain metabolites mostly from one

250: major pathway. For example, in 17 of the 19 modules identified for

251: {\it E. coli}, more than one third of the metabolites belong to a

252: single pathway. Interestingly, some other modules---two in the case of

253: {\it E. coli}---cannot be trivially associated with a single

254: traditional pathway. These modules are typically central in the

255: metabolism and contain, mostly, metabolites that are classified in

256: KEGG as belonging to carbohydrate and amino acid metabolism.

257:

258: Next, we identify the role of each metabolite. In

259: Fig.~\ref{f-roledef}b we show the roles identified in the metabolic

260: network of {\it E. coli}. Remarkably, other organisms display a

261: similar distribution of the nodes in the different roles, even though

262: they correspond to organisms that are very distant from an

263: evolutionary standpoint (see Supplementary Information). Role R1,

264: which contains ultra-peripheral metabolites with small degree and no

between-module links, comprises 76--86\% of all the metabolites in the

266: networks. This considerably simplifies the coarse-grained

267: representation of the network as these nodes do not need to be

268: identified separately. Note that this finding alone represents an

269: important step towards the goal of extracting scale-specific

270: information from complex networks.

271:

272: \bigskip

273: \noindent

274: {\it Metabolite role and inter-species conservation.}

275: %

276: The information about modules and roles enables us to build a

277: ``cartographic representation'' of the metabolic network of, for

278: example, {\it E. coli} (Fig.~\ref{f-metab}). This representation

279: enables us to recover relevant biological information. For instance,

280: we find that the metabolism is mostly organized around the module

281: containing pyruvate, which, in turn, is strongly connected to the

282: module whose hub is acetyl-CoA. These two molecules are key to connect

283: the metabolism of carbohydrates, amino acids, and lipids to the TCA

284: cycle from which ATP is obtained. These two modules are connected to

285: more peripheral ones by key metabolites such as D-glyceraldehyde

286: 3-phosphate and D-fructose 6-phosphate (which connect to the glucose

287: and galactose metabolisms), D-ribose 5-phosphate (which connects to

288: the metabolism of certain nucleotides), and glycerone phosphate (which

289: connects to the metabolism of certain lipids).

290:

291: Importantly, our analysis also uncovers nodes with key connector roles

292: that take part in only a small but fundamental set of reactions. For

293: example, N-carbamoyl-L-aspartate takes part in only three reactions

294: but is vital because it connects the pyrimidine metabolism, whose hub

295: is uracil, to the core of the metabolism through the alanine and

296: aspartate metabolism. The potential importance of such non-hub

297: connectors points to another consideration. It is a plausible {\it

298: hypothesis\/} that nodes with different roles are under different

299: evolutionary constraints and pressures. In particular, one expects

300: that nodes with structurally relevant roles are more necessary and

301: therefore more conserved across species.

302:

303: To quantify the relation between roles and conservation, we define the

304: loss rate $p_{\rm lost}(R)$ (see Methods). Structurally relevant roles

305: are expected to have low values of $p_{\rm lost}(R)$ and vice

306: versa. Remarkably, we find that the different roles have, indeed,

307: different loss rates (Fig.~\ref{f-conservation}). As expected,

308: ultra-peripheral nodes (role R1) have the highest loss rate while

309: connector hubs (role R6) are the most conserved across all species

310: considered.

311:

The comparison of $p_{\rm lost}(R)$ for
ultra-peripheral nodes and connector hubs is illustrative, but hardly
surprising. The comparison of $p_{\rm lost}(R)$ for non-hub connectors

315: (role R3) and provincial hubs (role R5), however, yields a surprising

316: and remarkable finding. The metabolites in the provincial hubs class

have many within-module connections, sometimes as many as five

318: standard deviations more connections than the average node in the

319: module. Conversely, non-hub connector metabolites have few links

320: relative to other nodes in their modules---and fewer total connections

321: than the metabolites in role R5 (see Supplementary Figs.~S12b,c). The

322: links of non-hub connectors, however, are distributed among {\it

323: several different modules}, while the links of provincial hubs are

324: mainly within their modules. We find that non-hub connectors are

325: systematically and significantly more conserved than provincial hub

326: metabolites (Fig.~\ref{f-conservation}).

327:

328: A possible explanation for the high degree of conservation of non-hub

329: connectors is the following. Connector nodes are responsible for

330: inter-module fluxes. These modules are, otherwise, poorly connected or

331: not connected at all to each other, so the elimination of connector

332: metabolites will likely have a large impact on the global structure of

333: fluxes in the network. On the contrary, the pathways in which

334: provincial hubs are involved may be backed up within the module, in

335: such a way that elimination of these metabolites may have a

336: comparatively smaller impact, which, in addition, would likely be

337: confined to the module containing the provincial hub.

338:

339: Our results therefore point to the need to consider each complex

340: biological network as a whole, instead of focusing on local

341: properties. In protein networks, for example, it has been reported

342: that hubs are more essential than non-hubs

343: \cite{jeong01}. Notwithstanding the relevance of such a finding, our

344: results suggest that the global role of nodes in the network might be

345: a better indicator of their importance than degree~\cite{han04}.

346:

347: Our ``cartography'' provides a scale-specific method to process the

348: information contained in the structure of complex networks, and to

349: extract knowledge about the function carried out by the network and

350: its constituents. An open question is how to adapt current

351: module-detection algorithms to networks with a hierarchical structure.

352:

353: For metabolic networks, a comparatively well studied and well

354: understood case, our method allows us to recover firmly established

355: biological facts, and to uncover important new results, such as the

356: significant conservation of non-hub connector metabolites. Similar

357: results can be expected when our method is applied to other complex

358: networks that are not as well studied as metabolic networks. Among

359: those, protein interaction and gene regulation networks may be the

360: most significant.

361:

362:

363: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

364: %%%%%%%%%%%%%%%%%%%%% METHODS %%%%%%%%%%%%%%%%%%%%%%%%%%%%

365: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

366: %

367: %

368: \section*{Methods}

369:

370: \subsection*{Modularity}

371:

372: For a given partition of the nodes of a network into modules, the

373: modularity $M$ of this partition

374: is~\cite{newman03,newman04,guimera04c}

375: %

376: \begin{equation}

377: M\equiv\sum_{s=1}^{N_M}\left[\frac{l_{s}}{L}-

378: \left(\frac{d_s}{2L}\right)^2\right]\,,

379: \label{e-modularity}

380: \end{equation}

381: %

382: where $N_M$ is the number of modules, $L$ is the number of links in

383: the network, $l_{s}$ is the number of links between nodes in module

384: $s$, and $d_s$ is the sum of the degrees of the nodes in module

385: $s$. The rationale for this definition of modularity is the

386: following. A good partition of a network into modules must comprise

387: many within-module links and as few as possible between-module

388: links. However, if one just tries to minimize the number of

389: between-module links (or, equivalently, maximize the number of

390: within-module links) the optimal partition consists of a single module

391: and no between-module links. Equation (\ref{e-modularity}) addresses

this difficulty by imposing that $M=0$ if nodes are placed at random
into modules {\it or} if all nodes are in the same
module~\cite{newman03,newman04,guimera04c}.

395:
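Equation (\ref{e-modularity}) can be evaluated directly from a list of
links and a module assignment. The Python sketch below is one possible
way to do so; the function name, the argument format, and the toy
network are assumptions made for illustration only.

```python
from collections import defaultdict

def modularity(edges, module_of):
    """Modularity M of a partition of an undirected network.

    edges     -- list of links (i, j), each link listed once
    module_of -- dict mapping every node to its module label
    """
    edges = list(edges)
    L = len(edges)                    # number of links in the network
    links_within = defaultdict(int)   # l_s: links between nodes of module s
    degree_sum = defaultdict(int)     # d_s: sum of degrees of nodes in s
    for i, j in edges:
        degree_sum[module_of[i]] += 1
        degree_sum[module_of[j]] += 1
        if module_of[i] == module_of[j]:
            links_within[module_of[i]] += 1
    return sum(links_within[s] / L - (degree_sum[s] / (2 * L)) ** 2
               for s in degree_sum)

# Toy network (not from the paper): two triangles joined by one link.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {node: "a" for node in range(6)}
```

For this toy network, the partition into the two triangles gives
$M=5/14\approx 0.357$, whereas placing all nodes in a single module
gives $M=0$, in line with the rationale above.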

396: The objective of a module identification algorithm is to find the

397: partition with largest modularity, and several methods have been

398: proposed to attain such a goal. Most of them rely on heuristic

399: procedures and use $M$---or a similar measure---only to assess their

400: performance. In contrast, we use simulated

401: annealing~\cite{kirkpatrick83} to find the partition with the largest

402: modularity.

403:

404: \subsection*{Simulated annealing for module identification}

405:

Simulated annealing~\cite{kirkpatrick83} is a stochastic optimization
technique that enables one to find ``low-cost'' configurations without
getting trapped in ``high-cost'' local minima. This is achieved by
introducing a {\it computational temperature} $T$. When $T$ is high,
the system can explore configurations of high cost, while at low $T$
the system only explores low-cost regions. By starting at high $T$ and
slowly decreasing $T$, the system descends gradually toward deep
minima, eventually overcoming small cost barriers.

414:

415: When identifying modules, the objective is to maximize the modularity

416: and, thus, the cost is $C=-M$, where $M$ is the modularity as defined

417: in Eq.~(\ref{e-modularity}). At each temperature, we perform a number

418: of random updates and accept them with probability

419: %

420: \begin{equation}

421: p=\left\{ \begin{array}{lcl}

422:         1		&       \quad\mbox{if} &  C_f \le C_i\\

423:         \exp{\left(-\frac{C_f-C_i}{T}\right)} & \quad\mbox{if} &       C_f > C_i

424:         \end{array}\right.

425: \end{equation}

426: %

427: where $C_f$ is the cost after the update and $C_i$ is the cost before

428: the update.

429:
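The acceptance rule above is the standard Metropolis criterion. A
minimal Python sketch follows; the function names and the use of
Python's \texttt{random} module as the source of random numbers are
illustrative assumptions.

```python
import math
import random

def acceptance_probability(cost_before, cost_after, temperature):
    """Probability of accepting an update with cost C = -M."""
    if cost_after <= cost_before:
        return 1.0                     # never reject an improvement
    return math.exp(-(cost_after - cost_before) / temperature)

def accept(cost_before, cost_after, temperature, rng=random):
    """Draw a uniform random number and decide whether to accept."""
    p = acceptance_probability(cost_before, cost_after, temperature)
    return rng.random() < p
```

With $C=-M$, an update that raises the modularity is always accepted,
whereas one that lowers it by $\Delta M$ is accepted with probability
$\exp(-\Delta M/T)$.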

430: Specifically, at each $T$ we propose $n_i=fS^2$ individual node

431: movements from one module to another, where $S$ is the number of nodes

in the network. We also propose $n_c=fS$ collective movements, which
involve either merging two modules or splitting a module. For $f$

434: we typically choose $f=1$. After the movements are evaluated at a

435: certain $T$, the system is cooled down to $T'=cT$, with $c=0.995$.

436:
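To give a sense of the computational effort implied by these choices,
the following Python sketch counts the moves proposed at each
temperature and the number of cooling steps between two
temperatures. The initial and final temperatures
(\texttt{t\_start}, \texttt{t\_stop}) are assumptions introduced here
for illustration; they are not specified in the text.

```python
def moves_per_temperature(num_nodes, f=1.0):
    """Return (n_i, n_c): individual and collective moves per temperature."""
    n_individual = int(f * num_nodes ** 2)   # n_i = f S^2
    n_collective = int(f * num_nodes)        # n_c = f S
    return n_individual, n_collective

def cooling_steps(t_start, t_stop, c=0.995):
    """Count the updates T -> c*T needed to go from t_start to at most t_stop.

    t_start and t_stop are hypothetical values chosen by the user.
    """
    steps = 0
    t = t_start
    while t > t_stop:
        t *= c
        steps += 1
    return steps

# Example: a network with S = 100 nodes and f = 1.
n_i, n_c = moves_per_temperature(100)
steps = cooling_steps(1.0, 0.01)             # assumed temperature range
```

Because $n_i$ grows as $S^2$, the cost of each temperature step
increases quickly with the size of the network.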

437: \subsection*{Within-module degree and participation coefficient}

438:

439: Each module can be organized in very different ways, ranging from

440: totally centralized---with one or a few nodes connected to all the

441: others---to totally decentralized---with all nodes having similar

442: connectivities. Nodes with similar roles are expected to have similar

443: relative within-module connectivity. If $\kappa_{i}$ is the number of

444: links of node $i$ to other nodes in its module $s_i$,

445: $\overline{\kappa}_{s_i}$ is the average of $\kappa$ over all the

446: nodes in $s_i$, and $\sigma_{\kappa_{s_i}}$ is the standard deviation

447: of $\kappa$ in $s_i$, then

448: %

449: \begin{equation}

450: z_i = \frac{\kappa_i - \overline{\kappa}_{s_i}}{\sigma_{\kappa_{s_i}}}

451: \end{equation}

452: %

453: is the so-called $z$-score. The within-module degree $z$-score

454: measures how ``well-connected'' node $i$ is to other nodes in the

455: module.

456:
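The within-module degree can be computed in a single pass over the
links once the modules are known. The Python sketch below makes two
assumptions that the text does not specify: it uses the population
standard deviation, and it sets $z_i=0$ when all nodes in a module have
the same within-module degree. The function name and the toy network
are also illustrative.

```python
from collections import defaultdict
from statistics import mean, pstdev

def within_module_z(edges, module_of):
    """Return a dict node -> within-module degree z-score."""
    kappa = {node: 0 for node in module_of}   # links inside own module
    for i, j in edges:
        if module_of[i] == module_of[j]:
            kappa[i] += 1
            kappa[j] += 1
    members = defaultdict(list)
    for node, s in module_of.items():
        members[s].append(node)
    z = {}
    for nodes in members.values():
        values = [kappa[n] for n in nodes]
        avg = mean(values)
        sd = pstdev(values)                   # assumption: population s.d.
        for n in nodes:
            # assumption: z = 0 when the module has no spread in kappa
            z[n] = (kappa[n] - avg) / sd if sd > 0 else 0.0
    return z

# Toy module (not from the paper): a star with centre 0 and four leaves.
star_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
star_modules = {n: "a" for n in range(5)}
z_star = within_module_z(star_edges, star_modules)
```

In this toy star, the centre has $z=2$ and each leaf has $z=-0.5$.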

457: Different roles can also arise because of the connections of a node to

458: modules other than its own. For example, two nodes with the same

459: $z$-score will play different roles if one of them is connected to

460: several nodes in other modules while the other is not. We define the

461: participation coefficient $P_i$ of node $i$ as

462: %

463: \begin{equation}

464: P_i=1-\sum_{s=1}^{N_M}\left(\frac{\kappa_{is}}{k_i} \right)^2

465: \end{equation}

466: %

467: where $\kappa_{is}$ is the number of links of node $i$ to nodes in

468: module $s$, and $k_i$ is the total degree of node $i$. The

469: participation coefficient of a node is therefore close to one if its

470: links are uniformly distributed among all the modules and zero if all

471: its links are within its own module.

472:
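The participation coefficient follows directly from the counts
$\kappa_{is}$. A minimal Python sketch is given below; the function
name, the argument format, and the toy network are illustrative
assumptions, and nodes without links are omitted because $P_i$ is
undefined for $k_i=0$.

```python
from collections import Counter, defaultdict

def participation_coefficient(edges, module_of):
    """Return a dict node -> participation coefficient P_i."""
    links_to_module = defaultdict(Counter)    # kappa_is for each node i
    for i, j in edges:
        links_to_module[i][module_of[j]] += 1
        links_to_module[j][module_of[i]] += 1
    P = {}
    for node, counts in links_to_module.items():
        k = sum(counts.values())              # total degree k_i
        P[node] = 1.0 - sum((c / k) ** 2 for c in counts.values())
    return P

# Toy network (not from the paper): node 0 links to four different modules.
toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
toy_modules = {0: "a", 1: "a", 2: "b", 3: "c", 4: "d"}
P_toy = participation_coefficient(toy_edges, toy_modules)
```

Here node 0 has one link to each of four modules, so
$P_0=1-4\,(1/4)^2=0.75$, while node 1, whose only link stays within
its own module, has $P_1=0$.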

473:

474:

475: \subsection*{Loss rate}

476:

To quantify the relation between roles and conservation, we calculate
the extent to which metabolites are conserved across species
depending on the role they play. Specifically, for a pair of species,

480: $A$ and $B$, we define the loss rate as the probability

481: $p(R_A=0|R_B=R) \equiv p_{\rm lost}(R)$ that a metabolite is not

482: present in one of the species ($R_A=0$) given that it plays role $R$

483: in the other species ($R_B=R$). Structurally relevant roles are

484: expected to have low values of $p_{\rm lost}(R)$ and vice versa.

485:
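For a single ordered pair of species, the loss rate can be estimated
as a simple fraction. The Python sketch below assumes that each
species is described by a dictionary from metabolite to role, with
absent metabolites simply missing from the dictionary; this data
layout, the function name, and the toy data are hypothetical, and the
way results are aggregated over pairs of species is not covered here.

```python
def loss_rate(roles_a, roles_b, role):
    """Estimate p(R_A = 0 | R_B = role) for species A and B.

    roles_a, roles_b -- dicts metabolite -> role label; a metabolite that
                        is not a key is absent from that species
    Returns None if no metabolite plays the given role in species B.
    """
    with_role_in_b = [m for m, r in roles_b.items() if r == role]
    if not with_role_in_b:
        return None
    lost = sum(1 for m in with_role_in_b if m not in roles_a)
    return lost / len(with_role_in_b)

# Toy data with placeholder metabolite names (not real data).
roles_b = {"m1": "R1", "m2": "R1", "m3": "R6", "m4": "R1"}
roles_a = {"m1": "R2", "m3": "R6"}
```

In this toy case, two of the three metabolites with role R1 in species
$B$ are absent from species $A$, so the estimated loss rate for R1 is
$2/3$, while it is zero for R6.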

486:

487:

488: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

489: %%%%%%%%%%%%%%%%%% REFERENCES %%%%%%%%%%%%%%%%%%%%%%%%%%%%

490: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

491: %

492: \begin{thebibliography}{30}

493: \expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi

494: \expandafter\ifx\csname url\endcsname\relax

495:   \def\url#1{\texttt{#1}}\fi

496: \expandafter\ifx\csname urlprefix\endcsname\relax\def\urlprefix{URL }\fi

497:

498: \bibitem[{Amaral \emph{et~al.}(2000)Amaral, Scala, Barthel\'emy \&

499:   Stanley}]{amaral00}

500: Amaral, L. A.~N., Scala, A., Barthel\'emy, M. \& Stanley, H.~E.

501: \newblock Classes of small-world networks.

502: \newblock \emph{Proc. Natl. Acad. Sci. USA} \textbf{97}, 11149--11152 (2000).

503:

504: \bibitem[{Albert \& Barab\'asi(2002)}]{albert02}

505: Albert, R. \& Barab\'asi, A.-L.

506: \newblock Statistical mechanics of complex networks.

507: \newblock \emph{Rev. Mod. Phys.} \textbf{74}, 47--97 (2002).

508:

509: \bibitem[{Amaral \& Ottino(2004)}]{amaral04}

510: Amaral, L. A.~N. \& Ottino, J.

511: \newblock Complex networks: \protect{Augmenting} the framework for the study of

512:   complex systems.

513: \newblock \emph{Eur. Phys. J. B} \textbf{38}, 147--162 (2004).

514:

515: \bibitem[{Hartwell \emph{et~al.}(1999)Hartwell, Hopfield, Leibler \&

516:   Murray}]{hartwell99}

517: Hartwell, L.~H., Hopfield, J.~J., Leibler, S. \& Murray, A.~W.

518: \newblock From molecular to modular biology.

519: \newblock \emph{Nature} \textbf{402}, C47--C52 (1999).

520:

521: \bibitem[{Girvan \& Newman(2002)}]{girvan02}

522: Girvan, M. \& Newman, M. E.~J.

523: \newblock Community structure in social and biological networks.

524: \newblock \emph{Proc. Natl. Acad. Sci. USA} \textbf{99}, 7821--7826 (2002).

525:

526: \bibitem[{Jeong \emph{et~al.}(2000)Jeong, Tombor, Albert, Oltvai \&

527:   Barab\'asi}]{jeong00}

528: Jeong, H., Tombor, B., Albert, R., Oltvai, Z.~N. \& Barab\'asi, A.~L.

529: \newblock The large-scale organization of metabolic networks.

530: \newblock \emph{Nature} \textbf{407}, 651--654 (2000).

531:

532: \bibitem[{Wagner \& Fell(2001)}]{wagner01}

533: Wagner, A. \& Fell, D.~A.

\newblock The small world inside large metabolic networks.

535: \newblock \emph{Proc. Roy. Soc. B} \textbf{268}, 1803--1810 (2001).

536:

537: \bibitem[{Ma \& Zeng(2003)}]{ma03}

538: Ma, H. \& Zeng, A.-P.

539: \newblock Reconstruction of metabolic networks from genome data and analysis of

540:   their global structure for various organisms.

541: \newblock \emph{Bioinformatics} \textbf{19}, 270--277 (2003).

542:

543: \bibitem[{Hatzimanikatis \emph{et~al.}(2004)Hatzimanikatis, Li, Ionita \&

544:   Broadbelt}]{hatzimanikatis04}

545: Hatzimanikatis, V., Li, C., Ionita, J.~A. \& Broadbelt, L.

546: \newblock Metabolic networks: enzyme function and metabolite structure.

547: \newblock \emph{Curr. Opin. Struc. Biol.} \textbf{14}, 300--306 (2004).

548:

549: \bibitem[{Guimer\`a \emph{et~al.}(2003)Guimer\`a, Danon, D\'{\i}az-Guilera,

550:   Giralt \& Arenas}]{guimera03}

551: Guimer\`a, R., Danon, L., D\'{\i}az-Guilera, A., Giralt, F. \& Arenas, A.

552: \newblock Self-similar community structure in a network of human interactions.

553: \newblock \emph{Phys. Rev. E} \textbf{68}, art. no. 065103 (2003).

554:

555: \bibitem[{Newman \& Girvan(2004)}]{newman03}

556: Newman, M. E.~J. \& Girvan, M.

557: \newblock Finding and evaluating community structure in networks.

558: \newblock \emph{Phys. Rev. E} \textbf{69}, art. no. 026113 (2004).

559:

560: \bibitem[{Arenas \emph{et~al.}(2004)Arenas, Danon, D\'{\i}az-Guilera, Gleiser

561:   \& Guimer\`a}]{arenas04}

562: Arenas, A., Danon, L., D\'{\i}az-Guilera, A., Gleiser, P.~M. \& Guimer\`a, R.

563: \newblock Community analysis in social networks.

564: \newblock \emph{Eur. Phys. J. B} \textbf{38}, 373--380 (2004).

565:

566: \bibitem[{Krause \emph{et~al.}(2003)Krause, Frank, Mason, Ulanowicz \&

567:   Taylor}]{krause03}

568: Krause, A.~E., Frank, K.~A., Mason, D.~M., Ulanowicz, R.~E. \& Taylor, W.~W.

569: \newblock Compartments revealed in food-web structure.

570: \newblock \emph{Nature} \textbf{426}, 282--285 (2003).

571:

572: \bibitem[{Ravasz \emph{et~al.}(2002)Ravasz, Somera, Mongru, Oltvai \&

573:   Barab\'asi}]{ravasz02}

574: Ravasz, E., Somera, A.~L., Mongru, D.~A., Oltvai, Z.~N. \& Barab\'asi, A.-L.

575: \newblock Hierarchical organization of modularity in metabolic networks.

576: \newblock \emph{Science} \textbf{297}, 1551--1555 (2002).

577:

578: \bibitem[{Holme \& Huss(2003)}]{holme03}

579: Holme, P. \& Huss, M.

580: \newblock Subnetwork hierarchies of biochemical pathways.

581: \newblock \emph{Bioinformatics} \textbf{19}, 532--538 (2003).

582:

583: \bibitem[{Papin \emph{et~al.}(2004)Papin, Reed \& Palsson}]{papin04}

584: Papin, J.~A., Reed, J.~L. \& Palsson, B.~O.

585: \newblock Hierarchical thinking in network biology: the unbiased modularization

586:   of biochemical networks.

587: \newblock \emph{Trends Biochem. Sci.} \textbf{29}, 641--647 (2004).

588:

589: \bibitem[{Eriksen \emph{et~al.}(2003)Eriksen, Simonsen, Maslov \&

590:   Sneppen}]{eriksen03}

591: Eriksen, K.~A., Simonsen, I., Maslov, S. \& Sneppen, K.

592: \newblock Modularity and extreme edges of the \protect{Internet}.

593: \newblock \emph{Phys. Rev. Lett.} \textbf{90}, art. no. 148701 (2003).

594:

595: \bibitem[{Newman(2004)}]{newman04}

596: Newman, M. E.~J.

597: \newblock Fast algorithm for detecting community structure in networks.

598: \newblock \emph{Phys. Rev. E} \textbf{69}, art. no. 066133 (2004).

599:

600: \bibitem[{Radicchi \emph{et~al.}(2004)Radicchi, Castellano, Cecconi, Loreto \&

601:   Parisi}]{radicchi04}

602: Radicchi, F., Castellano, C., Cecconi, F., Loreto, V. \& Parisi, D.

603: \newblock Defining and identifying communities in networks.

604: \newblock \emph{Proc. Natl. Acad. Sci. USA} \textbf{101}, 2658--2663 (2004).

605:

606: \bibitem[{Donetti \& \protect{Mu\~{n}oz}(2004)}]{donetti04}

607: Donetti, L. \& \protect{Mu\~{n}oz}, M.~A.

608: \newblock Detecting network communities: \protect{A} new systematic and

609:   efficient algorithm.

610: \newblock \emph{J. Stat. Mech. Theor. Exp.} P10012 (2004).

611:

612: \bibitem[{Guimer\`a \emph{et~al.}(2004)Guimer\`a, Sales-Pardo \&

613:   Amaral}]{guimera04c}

614: Guimer\`a, R., Sales-Pardo, M. \& Amaral, L. A.~N.

615: \newblock Modularity from fluctuations in random graphs and complex networks.

616: \newblock \emph{Phys. Rev. E} \textbf{70}, art. no. 025101 (2004).

617:

618: \bibitem[{Kirkpatrick \emph{et~al.}(1983)Kirkpatrick, Gelatt \&

619:   Vecchi}]{kirkpatrick83}

620: Kirkpatrick, S., Gelatt, C.~D. \& Vecchi, M.~P.

621: \newblock Optimization by simulated annealing.

622: \newblock \emph{Science} \textbf{220}, 671--680 (1983).

623:

624: \bibitem[{Wasserman \& Faust(1994)}]{wasserman94}

625: Wasserman, S. \& Faust, K.

626: \newblock \emph{Social Network Analysis} (Cambridge University Press,

627:   Cambridge, U.K., 1994).

628:

629: \bibitem[{Guimer\`a \& Amaral(2004)}]{guimera??e}

630: Guimer\`a, R. \& Amaral, L. A.~N.

631: \newblock \emph{J. Stat. Mech. Theor. Exp.} submitted (2004).

632:

633: \bibitem[{Rives \& Galitski(2003)}]{rives03}

634: Rives, A.~W. \& Galitski, T.

635: \newblock Modular organization of cellular networks.

636: \newblock \emph{Proc. Natl. Acad. Sci. USA} \textbf{100}, 1128--1133 (2003).

637:

638: \bibitem[{Han \emph{et~al.}(2004)}]{han04}

639: Han, J.-D.~J. \emph{et~al.}

640: \newblock Evidence for dynamically organized modularity in the yeast

641:   protein-protein interaction network.

642: \newblock \emph{Nature} \textbf{430}, 88--93 (2004).

643:

644: \bibitem[{Kanehisa \& Goto(2000)}]{kanehisa00}

645: Kanehisa, M. \& Goto, S.

646: \newblock KEGG: Kyoto Encyclopedia of Genes and Genomes.

647: \newblock \emph{Nucleic Acids Res.} \textbf{28}, 27--30 (2000).

648:

649: \bibitem[{Schuster \emph{et~al.}(2000)Schuster, Fell \& Dandekar}]{schuster00}

650: Schuster, S., Fell, D.~A. \& Dandekar, T.

651: \newblock A general definition of metabolic pathways useful for systematic

652:   organization and analysis of complex metabolic networks.

653: \newblock \emph{Nat. Biotechnol.} \textbf{18}, 326--332 (2000).

654:

655: \bibitem[{Schuster \emph{et~al.}(2002)Schuster, Pfeiffer, Moldenhauer, Koch \&

656:   Dandekar}]{schuster02}

657: Schuster, S., Pfeiffer, T., Moldenhauer, F., Koch, I. \& Dandekar, T.

658: \newblock Exploring the pathway structure of metabolism: decomposition into

659:   subnetworks and application to {\it Mycoplasma pneumoniae}.

660: \newblock \emph{Bioinformatics} \textbf{18}, 351--361 (2002).

661:

662: \bibitem[{Jeong \emph{et~al.}(2001)Jeong, Mason, Barab\'asi \&

663:   Oltvai}]{jeong01}

664: Jeong, H., Mason, S.~P., Barab\'asi, A.-L. \& Oltvai, Z.~N.

665: \newblock Lethality and centrality in protein networks.

666: \newblock \emph{Nature} \textbf{411}, 41--42 (2001).

667:

668: \end{thebibliography}

669:

670:

671: \bigskip

672: \noindent

673: {\bf Acknowledgments}~~We thank L. Broadbelt, V. Hatzimanikatis,

674: A.~A. Moreira, E. T. Papoutsakis, M. Sales-Pardo, and D.~B. Stouffer

675: for stimulating discussions and helpful suggestions, and H. Ma and

676: A. P. Zeng for providing us with their metabolic network

677: database. R.G. thanks the Fulbright Program and the Spanish Ministry

678: of Education, Culture \& Sports. L.A.N.A. gratefully acknowledges the

679: support of a Searle Leadership Fund Award and of an NIH/NIGMS K-25

680: award.

681:

682: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

683: %%%%%%%%%%%%%%%%%%%%% FIGURES %%%%%%%%%%%%%%%%%%%%%%%%%%%%

684: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

685: \clearpage

686:

687: \begin{figure}

688: \centerline{

689: %

690: \includegraphics*[height=.7\textwidth]{modules}

691: }

692: \renewcommand{\baselinestretch}{1.0}

693: \caption{

694: %

695: Performance of module identification methods.

696: %

697: To test the performance of the method, we build ``random networks''

698: with known module structure. Each test network comprises 128 nodes

699: divided into 4 modules of 32 nodes. Each node is connected to the

700: other nodes in its module with probability $p_{i}$, and to nodes in

701: other modules with probability $p_{o}<p_{i}$. On average, thus, each

702: node is connected to $k_{out}=96\,p_{o}$ nodes in other modules and to

703: $k_{in}=31\,p_{i}$ in the same module. Additionally, $p_{i}$ and

704: $p_{o}$ are selected so that the average degree of the nodes is

705: $k=16$. We display networks with: {\bf a,} $k_{in}=15$ and

706: $k_{out}=1$; {\bf b,} $k_{in}=11$ and $k_{out}=5$; and {\bf c,}

707: $k_{in}=k_{out}=8$.

708: %

709: {\bf d,} The performance of a module identification algorithm is

710: typically defined as the fraction of correctly classified nodes. We

711: compare our algorithm to the Girvan-Newman

712: algorithm~\cite{girvan02,newman04}, which is the reference algorithm

713: for module identification~\cite{newman03,newman04,radicchi04}. Note

714: that our method is 90\% accurate even when half of a node's links are

715: to nodes in other modules.

716: %

717: {\bf e,} Our module-identification algorithm is stochastic, so

718: different runs yield, in principle, different partitions. To test the

719: robustness of the algorithm, we obtain 100 partitions of the network

720: depicted in {\bf c} and plot, for each pair of nodes in the network,

721: the fraction of times that they are classified in the same module. As

722: shown in the figure, most pairs of nodes are either always classified

723: in the same module (red) or never classified in the same module (dark

724: blue), which indicates that the solution is robust.

725: %

726: }

727: \label{f-perf-mod}

728: \end{figure}
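
\noindent
As a worked check of the relations quoted in Fig.~\ref{f-perf-mod}, and assuming that each node has exactly 31 potential neighbours in its own module and 96 in the other three modules, the connection probabilities implied by the displayed networks follow from
\[
p_{i}=\frac{k_{in}}{31}, \qquad p_{o}=\frac{k_{out}}{96}, \qquad k_{in}+k_{out}=k=16 .
\]
For panel {\bf a}, $p_{i}=15/31\approx 0.484$ and $p_{o}=1/96\approx 0.010$; for panel {\bf b}, $p_{i}=11/31\approx 0.355$ and $p_{o}=5/96\approx 0.052$; and for panel {\bf c}, $p_{i}=8/31\approx 0.258$ and $p_{o}=8/96\approx 0.083$. The condition $p_{o}<p_{i}$ therefore holds in all three cases, including {\bf c}, where half of each node's links point outside its module.
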

729:

730:

731:

732:

733: \begin{figure}

734: \centerline{

735: %

736: \includegraphics*[width=\textwidth]{role-regions}

737: %

738: }

739: %

740: \renewcommand{\baselinestretch}{1.0}

741: \caption{

742: %

743: Roles and regions in the $zP$ parameter space. {\bf a,} Each node in

744: a network can be characterized by its within-module degree and its

745: participation coefficient (see Methods for definitions). We classify

746: nodes with $z\ge 2.5$ as module hubs and nodes with $z<2.5$ as non-hubs. We

747: find that non-hub nodes can be naturally assigned to four different

748: roles: (R1) {\it ultra-peripheral nodes}, i.e., nodes with all their

749: links within their module; (R2) {\it peripheral nodes}, i.e., nodes

750: with most links within their module; (R3) {\it non-hub connector

751: nodes}, i.e., nodes with many links to other modules; and (R4) {\it

752: non-hub kinless nodes}, i.e., nodes with links homogeneously

753: distributed among all modules. We find that hub nodes can be naturally

754: assigned to three different roles: (R5) {\it provincial hubs}, i.e.,

755: hub nodes with the vast majority of links within their module; (R6)

756: {\it connector hubs}, i.e., hubs with many links to most of the other

757: modules; and (R7) {\it kinless hubs}, i.e., hubs with links

758: homogeneously distributed among all modules (see Supplementary

759: Information).

760: %

761: {\bf b,} Metabolite role determination for the metabolic network of {\it

762: E. coli}, as obtained from the MZ database. Each metabolite is

763: represented as a point in the $zP$ parameter space, and is colored

764: according to its role.

765: %

766: {\bf c,} Same as {\bf b} but for the complete KEGG database.

767: %

768: }

769: \label{f-roledef}

770: \end{figure}

771:

772:

773:

774: \begin{figure}

775: \centerline{\includegraphics*[width=\textwidth]{modules-roles}}

776: %

777: \renewcommand{\baselinestretch}{1.0}

778: \caption{

779: %

780: ``Cartographic representation'' of the metabolic network of {\it

781: E. coli}.

782: %

783: Each circle represents a module and is colored according to the KEGG

784: pathway classification of the metabolites it contains. Certain

785: important nodes are depicted as triangles (non-hub connectors),

786: hexagons (connector hubs), and squares (provincial hubs). Interactions

787: between modules and nodes are depicted using lines, with thickness

788: proportional to the number of actual links.

789: %

790: (Inset) Pajek-obtained representation of the entire metabolic network

791: of {\it E. coli}, which contains 473 metabolites and 574 links. Each node is

792: colored according to the ``main'' color of its module, as obtained

793: from the ``cartographic representation.''

794: %

795: }

796: \label{f-metab}

797: \end{figure}

798:

799:

800:

801: \begin{figure}

802: \centerline{

803: %

804: \includegraphics*[width=0.45\textwidth]{roleconserv-Ma}\quad

805: %

806: \includegraphics*[width=0.45\textwidth]{roleconserv}

807: }

808: %

809: \renewcommand{\baselinestretch}{1.0}

810: \caption{

811: %

812: Roles of metabolites and inter-species conservation. To quantify the

813: relation between roles and conservation, we calculate the loss rate

814: $p_{\rm lost}(R)$ of each metabolite (see Methods).

815: %

816: Each thin line in the graph corresponds to a comparison between two

817: species. Since we are interested in metabolites that are present in

818: some species but missing in others, metabolic networks of species

819: within the same super-kingdom---bacteria, eukaryotes, and

820: archaea---are usually too similar to provide statistically sound

821: information, especially for roles containing only a few

822: metabolites. Therefore, we consider in our analysis only pairs of

823: species that belong to different super-kingdoms. The thick line is the

824: average over all pairs of species.

825: %

826: The loss rate $p_{\rm lost}(R)$ is maximum for ultra-peripheral (R1)

827: nodes and minimum for connector hubs (R6). Remarkably, provincial hubs

828: (R5) have a significantly and consistently higher $p_{\rm lost}(R)$

829: than non-hub connectors (R3), even though the within-module degree and

830: the total degree of provincial hubs are larger.

831: %

832: Note that, out of the total 48 pair comparisons, in only two cases is

833: $p_{\rm lost}(R)$ lower for provincial hubs than for non-hub

834: connectors, while the opposite is true in 44 cases.

835: %

836: {\bf a,} Results obtained for the MZ database and {\bf b,} the

837: complete KEGG database.

838: %

839: }

840: \label{f-conservation}

841: \end{figure}

842:

843:

844: \end{document}

845:

846: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

847: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

848: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

849: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

850:

851: