1: \documentclass[12pt]{article}
2: 
3: \usepackage{graphicx}
4: \usepackage{scicite}
5: \usepackage{times}
6: \usepackage{natbib}
7: \usepackage{fancyhdr}
8: 
9: \topmargin 0.0cm
10: \oddsidemargin 0.2cm
11: \textwidth 16cm 
12: \textheight 21cm
13: \footskip 1.0cm
14: 
15: \pagestyle{fancy}
16: 
17: \lhead{\sffamily Letter to Nature}
18: \chead{}
19: \rhead{\sffamily Guimer\`a and Amaral}
20: 
21: \lfoot{}
22: \cfoot{\thepage}
23: \rfoot{}
24: 
25: \renewcommand{\headrulewidth}{0.5pt}
26: \renewcommand{\footrulewidth}{0pt}
27: 
28: \newenvironment{sciabstract}{%
29: \begin{quote} \bf}
30: {\end{quote}}
31: 
32: %
33: \bibliographystyle{nature}
34: %
35: 
36: %
37: %
38: \title{Functional cartography\\of complex metabolic networks}
39: %
40: \author{Roger Guimer\`a and Lu\'{\i}s A. Nunes Amaral\\
41: %%
42: \normalsize{NICO and Dept. Chemical and Biological Engineering} \\
43: \normalsize{Northwestern University, Evanston, IL 60208, USA}\\ \\
44: %
45: }
46: 
47: \date{}
48: 
49: 
50: %%%% Double space the manuscript
51: \renewcommand{\baselinestretch}{1.5}
52: 
53: \begin{document}
54: 
55: \maketitle
56: %
57: 
58: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
59: %%%%%%%%%%%%%%%%%%% ABSTRACT %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
60: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
61: 
62: \begin{sciabstract}
63: High-throughput techniques are leading to an explosive growth in the
64: size of biological databases and creating the opportunity to
65: revolutionize our understanding of life and disease. Interpretation of
66: these data remains, however, a major scientific challenge. Here, we
67: propose a methodology that enables us to extract and display
68: information contained in complex
69: networks~\cite{amaral00,albert02,amaral04}. Specifically, we
70: demonstrate that one can (i) find functional
71: modules\cite{hartwell99,girvan02} in complex networks, and (ii)
72: classify nodes into universal roles according to their pattern of
73: intra- and inter-module connections. The method thus yields a
74: ``cartographic representation'' of complex networks. Metabolic
75: networks \cite{jeong00,wagner01,ma03} are among the most challenging
76: biological networks and, arguably, the ones with the most potential for
77: immediate applicability\cite{hatzimanikatis04}. We use our method to
78: analyze the metabolic networks of twelve organisms from three
79: different super-kingdoms. We find that, typically, 80\% of the nodes
80: are only connected to other nodes within their respective modules, and
81: that nodes with different roles are affected by different evolutionary
82: constraints and pressures. Remarkably, we find that low-degree
83: metabolites that connect different modules are more conserved than
84: hubs whose links are mostly within a single module.
85: \end{sciabstract}
86: 
87: 
88: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
89: %%%%%%%%%%%%%%%%%%% BODY OF THE PAPER %%%%%%%%%%%%%%%%%%%%%%
90: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
91: 
92: 
93: If one is to extract the significant information from the topology of
94: a large complex network, the knowledge of the role of each node is of
95: crucial importance. A cartographic analogy is helpful to illustrate
96: this point. Consider the network formed by all cities and towns in a
97: country---the nodes---and all the roads that connect them---the
98: links. It is clear that a map in which each city and town is
99: represented by a circle of fixed size and each road is represented by
100: a line of fixed width is hardly useful. Rather, real maps emphasize
101: capitals and important communication lines so that one can obtain
102: scale-specific information at a glance. Similarly, it is difficult, if
103: not impossible, to obtain information from a network with hundreds or
104: thousands of nodes and links, unless the information about nodes and
105: links is conveniently summarized. This is particularly true for
106: biological networks.
107: 
108: Here, we propose a methodology, which is based on the connectivity of
109: the nodes, that yields a ``cartographic representation'' of a complex
110: network.  The first step in our method is to identify the functional
111: modules\cite{hartwell99,girvan02} in the network. In the cartographic
112: picture, modules are analogous to countries or regions, and enable a
113: coarse-grained, and thus simplified, description of the network. Then,
114: we classify the nodes in the network into a small number of {\it
115: system-independent\/} ``universal roles.''
116: 
117: 
118: \bigskip
119: \noindent
120: {\it Modules.}
121: %
122: It is a matter of common experience that social networks have
123: communities of highly interconnected nodes that are less connected to
124: nodes in other communities. Such modular structures have been reported
125: not only in social
126: networks~\cite{girvan02,guimera03,newman03,arenas04}, but also in food
127: webs~\cite{krause03} and biochemical
128: networks~\cite{hartwell99,ravasz02,holme03,papin04}. It is widely
129: believed that the modular structure of complex networks plays a
130: critical role in their
131: functionality~\cite{hartwell99,ravasz02,papin04}. There is therefore a
132: clear need to develop algorithms to identify modules
133: accurately~\cite{girvan02,newman03,eriksen03,newman04,radicchi04,donetti04}.
134: 
135: We identify modules by maximizing the network's {\it
136: modularity}~\cite{newman03,newman04,guimera04c} using simulated
137: annealing~\cite{kirkpatrick83} (see Methods). Simulated annealing
138: enables us to carry out an exhaustive search and to minimize the
139: problem of finding sub-optimal partitions. It is noteworthy that, in
140: our method, one does not need to specify a priori the number of
141: modules; rather, this number is an outcome of the algorithm. Our
142: algorithm, which significantly outperforms the best algorithm in the
143: literature, is able to reliably identify modules in a network whose
144: nodes have as many as 50\% of their connections outside their own
145: module (Fig.~\ref{f-perf-mod}).
146: 
147: 
148: \bigskip
149: \noindent
150: {\it Roles in modular networks.}
151: %
152: It is plausible to surmise that the nodes in a network are connected
153: according to the {\it role\/} they fulfill. This fact has long been
154: recognized in the analysis of social
155: networks~\cite{wasserman94}. For example, in a classical
156: hierarchical organization, the CEO is not directly connected to plant
157: employees but is connected to the members of the board of
158: directors. Importantly, such a statement holds for virtually any
159: organization, that is, the role of CEO is defined irrespective of the
160: particular organization one considers.
161: 
162: We propose a new method to determine the role of a node in a complex
163: network. Our approach is based on the idea that nodes with the same
164: role should have similar topological properties\cite{guimera??e} (see
165: Supplementary Information for a discussion on how our approach relates
166: to previous work). We hypothesize that the role of a node can be
167: determined, to a great extent, by its {\it within-module degree} and
168: its {\it participation coefficient}, which define how the node is
169: positioned in its own module and with respect to other
170: modules\cite{rives03,han04} (see Methods). These two properties are
171: easily computed once the modules of a network are known.
172: 
173: The within-module degree $z_i$ measures how ``well-connected'' node
174: $i$ is to other nodes in the module. High values of $z_i$ indicate
175: high within-module degrees and vice versa. The participation
176: coefficient $P_i$ measures how ``well-distributed'' the links of node
177: $i$ are among different modules. The participation coefficient $P_i$
178: is close to one if its links are uniformly distributed among all the
179: modules and zero if all its links are within its own module.
180: 
181: We define heuristically seven different ``universal roles,'' each
182: defined by a different region in the $z$--$P$ parameter space
183: (Fig.~\ref{f-roledef}). According to the within-module degree, we
184: classify nodes with $z \ge 2.5$ as module hubs and nodes with $z<2.5$ as
185: non-hubs.  Both hub and non-hub nodes are then more finely
186: characterized by using the values of the participation coefficient
187: (see Supplementary Information for a detailed justification of this
188: classification scheme, and for a discussion on possible alternatives).
189: 
190: We find that non-hub nodes can be naturally divided into four
191: different roles: (R1) {\it ultra-peripheral nodes}, i.e., nodes with
192: all their links within their module ($P \le 0.05$); (R2) {\it peripheral
193: nodes}, i.e., nodes with most links within their module ($0.05<P \le
194: 0.62$); (R3) {\it non-hub connector nodes}, i.e., nodes with many
195: links to other modules ($0.62<P \le 0.80$); and (R4) {\it non-hub
196: kinless nodes}, i.e., nodes with links homogeneously distributed among
197: all modules ($P>0.80$). We find that hub nodes can be naturally
198: divided into three different roles: (R5) {\it provincial hubs}, i.e.,
199: hub nodes with the vast majority of links within their module ($P \le
200: 0.30$); (R6) {\it connector hubs}, i.e., hubs with many links to most
201: of the other modules ($0.30<P \le 0.75$); and (R7) {\it kinless hubs},
202: i.e., hubs with links homogeneously distributed among all modules
203: ($P>0.75$).
204: 
205: 
206: \bigskip
207: \noindent
208: {\it Cartographic representation of metabolic networks.}
209: %
210: To test the applicability of our approach to complex biological
211: networks, we consider the metabolic
212: networks~\cite{jeong00,wagner01,ravasz02,ma03,hatzimanikatis04} of
213: twelve organisms: four bacteria ({\it E. coli}, {\it B. subtilis},
214: {\it L. lactis}, and {\it T. elongatus}), four eukaryotes ({\it
215: S. cerevisiae}, {\it C. elegans}, {\it P. falciparum}, and {\it
216: H. sapiens}), and four archaea ({\it P. furiosus}, {\it A. pernix},
217: {\it A. fulgidus}, and {\it S. solfataricus}). In metabolic networks,
218: nodes represent metabolites and two nodes $i$ and $j$ are connected by
219: a link if there is a chemical reaction in which $i$ is a substrate and
220: $j$ a product, or vice versa. In our analysis, we use the database
221: developed by Ma and Zeng~\cite{ma03} (MZ) from the Kyoto Encyclopedia
222: of Genes and Genomes~\cite{kanehisa00} (KEGG). Importantly, the
223: results we report are not altered if we consider the complete KEGG
224: database instead (Figs.~\ref{f-roledef}c and \ref{f-conservation}b,
225: and Supplementary Information).
226: 
227: First, we identify the functional modules in the different metabolic
228: networks (Fig.~\ref{f-metab}). Finding modules in metabolic networks
229: based on purely topological properties is an extremely important
230: task. For example, Schuster {\it et al.} have reported on the
231: impossibility of obtaining elementary flux modes \cite{schuster00}
232: from complete metabolic networks due to the combinatorial explosion of
233: the number of such modes \cite{schuster02}. Our algorithm identifies
234: an average of 15 different modules in each metabolic network---with a
235: maximum of 19 for {\it E. coli} and {\it H. sapiens}, and a minimum of
236: 11 for {\it A. fulgidus}. As expected, the density of links within
237: each of the modules is significantly larger than between modules,
238: typically 100--1000 times larger (see Supplementary Information).
239: 
240: To assess how each of the modules is related to the pathways
241: traditionally defined in biology, we use the classification scheme
242: proposed in KEGG, which includes nine major pathways: carbohydrate
243: metabolism, energy metabolism, lipid metabolism, nucleotide
244: metabolism, amino acid metabolism, glycan biosynthesis and metabolism,
245: metabolism of cofactors and vitamins, biosynthesis of secondary
246: metabolites, and biodegradation of xenobiotics. Each metabolite in the
247: KEGG database is assigned to at least one pathway; thus, we can
248: determine to which pathways the metabolites in a given module
249: belong. We find that most modules contain metabolites mostly from one
250: major pathway. For example, in 17 of the 19 modules identified for
251: {\it E. coli}, more than one third of the metabolites belong to a
252: single pathway. Interestingly, some other modules---two in the case of
253: {\it E. coli}---cannot be trivially associated with a single
254: traditional pathway. These modules are typically central in the
255: metabolism and contain, mostly, metabolites that are classified in
256: KEGG as belonging to carbohydrate and amino acid metabolism.
257: 
258: Next, we identify the role of each metabolite. In
259: Fig.~\ref{f-roledef}b we show the roles identified in the metabolic
260: network of {\it E. coli}. Remarkably, other organisms display a
261: similar distribution of the nodes in the different roles, even though
262: they correspond to organisms that are very distant from an
263: evolutionary standpoint (see Supplementary Information). Role R1,
264: which contains ultra-peripheral metabolites with small degree and no
265: between-module links, comprises 76--86\% of all the metabolites in the
266: networks. This considerably simplifies the coarse-grained
267: representation of the network as these nodes do not need to be
268: identified separately. Note that this finding alone represents an
269: important step towards the goal of extracting scale-specific
270: information from complex networks.
271: 
272: \bigskip
273: \noindent
274: {\it Metabolite role and inter-species conservation.}
275: %
276: The information about modules and roles enables us to build a
277: ``cartographic representation'' of the metabolic network of, for
278: example, {\it E. coli} (Fig.~\ref{f-metab}). This representation
279: enables us to recover relevant biological information. For instance,
280: we find that the metabolism is mostly organized around the module
281: containing pyruvate, which, in turn, is strongly connected to the
282: module whose hub is acetyl-CoA. These two molecules are key to connecting
283: the metabolism of carbohydrates, amino acids, and lipids to the TCA
284: cycle from which ATP is obtained. These two modules are connected to
285: more peripheral ones by key metabolites such as D-glyceraldehyde
286: 3-phosphate and D-fructose 6-phosphate (which connect to the glucose
287: and galactose metabolisms), D-ribose 5-phosphate (which connects to
288: the metabolism of certain nucleotides), and glycerone phosphate (which
289: connects to the metabolism of certain lipids).
290: 
291: Importantly, our analysis also uncovers nodes with key connector roles
292: that take part in only a small but fundamental set of reactions. For
293: example, N-carbamoyl-L-aspartate takes part in only three reactions
294: but is vital because it connects the pyrimidine metabolism, whose hub
295: is uracil, to the core of the metabolism through the alanine and
296: aspartate metabolism. The potential importance of such non-hub
297: connectors points to another consideration. It is a plausible {\it
298: hypothesis\/} that nodes with different roles are under different
299: evolutionary constraints and pressures. In particular, one expects
300: that nodes with structurally relevant roles are more necessary and
301: therefore more conserved across species.
302: 
303: To quantify the relation between roles and conservation, we define the
304: loss rate $p_{\rm lost}(R)$ (see Methods). Structurally relevant roles
305: are expected to have low values of $p_{\rm lost}(R)$ and vice
306: versa. Remarkably, we find that the different roles have, indeed,
307: different loss rates (Fig.~\ref{f-conservation}). As expected,
308: ultra-peripheral nodes (role R1) have the highest loss rate while
309: connector hubs (role R6) are the most conserved across all species
310: considered.
311: 
312: The comparison of $p_{\rm lost}(R)$ for
313: ultra-peripheral nodes and connector hubs is illustrative, but hardly
314: surprising. The comparison of $p_{\rm lost}(R)$ for non-hub connectors
315: (role R3) and provincial hubs (role R5), however, yields a surprising
316: and remarkable finding. The metabolites in the provincial hubs class
317: have many within-module connections, sometimes as many as five
318: standard deviations more connections than the average node in the
319: module. Conversely, non-hub connector metabolites have few links
320: relative to other nodes in their modules---and fewer total connections
321: than the metabolites in role R5 (see Supplementary Figs.~S12b,c). The
322: links of non-hub connectors, however, are distributed among {\it
323: several different modules}, while the links of provincial hubs are
324: mainly within their modules. We find that non-hub connectors are
325: systematically and significantly more conserved than provincial hub
326: metabolites (Fig.~\ref{f-conservation}).
327: 
328: A possible explanation for the high degree of conservation of non-hub
329: connectors is the following. Connector nodes are responsible for
330: inter-module fluxes. These modules are, otherwise, poorly connected or
331: not connected at all to each other, so the elimination of connector
332: metabolites will likely have a large impact on the global structure of
333: fluxes in the network. In contrast, the pathways in which
334: provincial hubs are involved may be backed up within the module, in
335: such a way that elimination of these metabolites may have a
336: comparatively smaller impact, which, in addition, would likely be
337: confined to the module containing the provincial hub.
338: 
339: Our results therefore point to the need to consider each complex
340: biological network as a whole, instead of focusing on local
341: properties. In protein networks, for example, it has been reported
342: that hubs are more essential than non-hubs
343: \cite{jeong01}. Notwithstanding the relevance of such a finding, our
344: results suggest that the global role of nodes in the network might be
345: a better indicator of their importance than degree~\cite{han04}.
346: 
347: Our ``cartography'' provides a scale-specific method to process the
348: information contained in the structure of complex networks, and to
349: extract knowledge about the function carried out by the network and
350: its constituents. An open question is how to adapt current
351: module-detection algorithms to networks with a hierarchical structure.
352: 
353: For metabolic networks, a comparatively well studied and well
354: understood case, our method allows us to recover firmly established
355: biological facts, and to uncover important new results, such as the
356: significant conservation of non-hub connector metabolites. Similar
357: results can be expected when our method is applied to other complex
358: networks that are not as well studied as metabolic networks. Among
359: those, protein interaction and gene regulation networks may be the
360: most significant.
361: 
362: 
363: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
364: %%%%%%%%%%%%%%%%%%%%% METHODS %%%%%%%%%%%%%%%%%%%%%%%%%%%%
365: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
366: %
367: %
368: \section*{Methods}
369: 
370: \subsection*{Modularity}
371: 
372: For a given partition of the nodes of a network into modules, the
373: modularity $M$ of this partition
374: is~\cite{newman03,newman04,guimera04c}
375: %
376: \begin{equation}
377: M\equiv\sum_{s=1}^{N_M}\left[\frac{l_{s}}{L}-
378: \left(\frac{d_s}{2L}\right)^2\right]\,,
379: \label{e-modularity}
380: \end{equation}
381: %
382: where $N_M$ is the number of modules, $L$ is the number of links in
383: the network, $l_{s}$ is the number of links between nodes in module
384: $s$, and $d_s$ is the sum of the degrees of the nodes in module
385: $s$. The rationale for this definition of modularity is the
386: following. A good partition of a network into modules must comprise
387: many within-module links and as few as possible between-module
388: links. However, if one just tries to minimize the number of
389: between-module links (or, equivalently, maximize the number of
390: within-module links) the optimal partition consists of a single module
391: and no between-module links. Equation (\ref{e-modularity}) addresses
392: this difficulty by imposing that $M=0$ if nodes are placed at random
393: into modules {\it or} if all nodes are in the same
394: module~\cite{newman03,newman04,guimera04c}.
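As a worked illustration of Eq.~(\ref{e-modularity}), consider a
hypothetical toy network (not one of the networks analyzed here) made
of two triangles joined by a single link, so that $L=7$. If each
triangle is taken as a module, then $l_s=3$ and $d_s=7$ for both
modules, and
%
\[
M = 2\left[\frac{3}{7}-\left(\frac{7}{14}\right)^2\right]
  = 2\left(\frac{3}{7}-\frac{1}{4}\right)
  = \frac{5}{14} \approx 0.36\,.
\]
%
If, instead, all six nodes are placed in a single module, then
$l_s=L=7$ and $d_s=2L=14$, so that $M=7/7-(14/14)^2=0$, as required
for a partition with a single module.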
395: 
396: The objective of a module identification algorithm is to find the
397: partition with largest modularity, and several methods have been
398: proposed to attain such a goal. Most of them rely on heuristic
399: procedures and use $M$---or a similar measure---only to assess their
400: performance. In contrast, we use simulated
401: annealing~\cite{kirkpatrick83} to find the partition with the largest
402: modularity.
403: 
404: \subsection*{Simulated annealing for module identification}
405: 
406: Simulated annealing\cite{kirkpatrick83} is a stochastic optimization
407: technique that enables one to find ``low-cost'' configurations without
408: getting trapped in ``high-cost'' local minima. This is achieved by
409: introducing a {\it computational temperature} $T$. When $T$ is high,
410: the system can explore configurations of high cost while at low $T$
411: the system only explores low-cost regions. By starting at high $T$ and
412: slowly decreasing $T$, the system descends gradually toward deep
413: minima, eventually overcoming small cost barriers.
414: 
415: When identifying modules, the objective is to maximize the modularity
416: and, thus, the cost is $C=-M$, where $M$ is the modularity as defined
417: in Eq.~(\ref{e-modularity}). At each temperature, we perform a number
418: of random updates and accept them with probability
419: %
420: \begin{equation}
421: p=\left\{ \begin{array}{lcl}
422:         1		&       \quad\mbox{if} &  C_f \le C_i\\
423:         \exp{\left(-\frac{C_f-C_i}{T}\right)} & \quad\mbox{if} &       C_f > C_i
424:         \end{array}\right. 
425: \end{equation}
426: %
427: where $C_f$ is the cost after the update and $C_i$ is the cost before
428: the update.
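To illustrate the acceptance rule with hypothetical values, chosen
only for this example, consider an update that lowers the modularity
by $0.01$, so that $C_f-C_i=0.01$. The update is accepted with
probability
%
\[
p=\exp\left(-\frac{0.01}{0.01}\right)=e^{-1}\approx 0.37
\quad\mbox{at}\quad T=0.01\,,
\qquad
p=\exp\left(-\frac{0.01}{0.001}\right)=e^{-10}\approx 4.5\times 10^{-5}
\quad\mbox{at}\quad T=0.001\,.
\]
%
Updates that do not decrease the modularity ($C_f \le C_i$) are
accepted with probability one at any temperature.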
429: 
430: Specifically, at each $T$ we propose $n_i=fS^2$ individual node
431: movements from one module to another, where $S$ is the number of nodes
432: in the network. We also propose $n_c=fS$ collective movements, which
433: involve either merging two modules or splitting a module. For $f$
434: we typically choose $f=1$. After the movements are evaluated at a
435: certain $T$, the system is cooled down to $T'=cT$, with $c=0.995$.
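To give a sense of scale, take a hypothetical network with $S=1000$
nodes and $f=1$: each temperature then involves $n_i=fS^2=10^6$
individual movements and $n_c=fS=10^3$ collective movements. With
$c=0.995$, the temperature after $n$ cooling steps is $T_n=c^n\,T_0$,
where $T_0$ denotes the initial temperature (a symbol introduced here
only for this illustration). Halving the temperature therefore takes
%
\[
n=\frac{\ln 0.5}{\ln 0.995}\approx 138
\]
%
cooling steps.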
436: 
437: \subsection*{Within-module degree and participation coefficient}
438: 
439: Each module can be organized in very different ways, ranging from
440: totally centralized---with one or a few nodes connected to all the
441: others---to totally decentralized---with all nodes having similar
442: connectivities. Nodes with similar roles are expected to have similar
443: relative within-module connectivity. If $\kappa_{i}$ is the number of
444: links of node $i$ to other nodes in its module $s_i$,
445: $\overline{\kappa}_{s_i}$ is the average of $\kappa$ over all the
446: nodes in $s_i$, and $\sigma_{\kappa_{s_i}}$ is the standard deviation
447: of $\kappa$ in $s_i$, then
448: %
449: \begin{equation}
450: z_i = \frac{\kappa_i - \overline{\kappa}_{s_i}}{\sigma_{\kappa_{s_i}}}
451: \end{equation}
452: %
453: is the so-called $z$-score. The within-module degree $z$-score
454: measures how ``well-connected'' node $i$ is to other nodes in the
455: module.
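As a worked example, consider a hypothetical star-shaped module of ten
nodes in which a central node is linked to the nine others, and count
only within-module links. Then $\kappa=9$ for the central node and
$\kappa=1$ for each of the other nodes, so that
$\overline{\kappa}=18/10=1.8$. Assuming that the standard deviation is
computed with the number of nodes in the denominator (the definition
above does not fix this convention),
%
\[
\sigma_{\kappa}=\sqrt{\frac{(9-1.8)^2+9\,(1-1.8)^2}{10}}
=\sqrt{5.76}=2.4\,,
\]
%
and therefore $z=(9-1.8)/2.4=3$ for the central node and
$z=(1-1.8)/2.4\approx -0.33$ for each of the others. With the
threshold $z \ge 2.5$ used in the main text, only the central node
qualifies as a module hub.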
456: 
457: Different roles can also arise because of the connections of a node to
458: modules other than its own. For example, two nodes with the same
459: $z$-score will play different roles if one of them is connected to
460: several nodes in other modules while the other is not. We define the
461: participation coefficient $P_i$ of node $i$ as
462: %
463: \begin{equation}
464: P_i=1-\sum_{s=1}^{N_M}\left(\frac{\kappa_{is}}{k_i} \right)^2
465: \end{equation}
466: %
467: where $\kappa_{is}$ is the number of links of node $i$ to nodes in
468: module $s$, and $k_i$ is the total degree of node $i$. The
469: participation coefficient of a node is therefore close to one if its
470: links are uniformly distributed among all the modules and zero if all
471: its links are within its own module.
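Two simple cases illustrate the definition. A hypothetical node with
$k_i=4$ links, two of them within its own module and one to each of
two other modules, has
%
\[
P_i=1-\left[\left(\frac{2}{4}\right)^2+\left(\frac{1}{4}\right)^2
+\left(\frac{1}{4}\right)^2\right]=1-\frac{6}{16}=0.625\,.
\]
%
More generally, a node whose links are split evenly among all $N_M$
modules has $\kappa_{is}/k_i=1/N_M$ for every $s$, so that
%
\[
P_i=1-N_M\left(\frac{1}{N_M}\right)^2=1-\frac{1}{N_M}\,,
\]
%
which is the largest value that $P_i$ can take. For $N_M=15$, the
average number of modules identified in the metabolic networks, this
gives $P_i\approx 0.93$.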
472: 
473: 
474: 
475: \subsection*{Loss rate}
476: 
477: To quantify the relation between roles and conservation, we calculate
478: to what extent metabolites are conserved in the different species
479: depending on the role they play. Specifically, for a pair of species,
480: $A$ and $B$, we define the loss rate as the probability
481: $p(R_A=0|R_B=R) \equiv p_{\rm lost}(R)$ that a metabolite is not
482: present in one of the species ($R_A=0$) given that it plays role $R$
483: in the other species ($R_B=R$). Structurally relevant roles are
484: expected to have low values of $p_{\rm lost}(R)$ and vice versa.
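In practice, this conditional probability can be estimated as a ratio
of counts. One natural estimator, stated here as an assumption because
the precise counting procedure is not spelled out above, is
%
\[
p_{\rm lost}(R)\approx\frac{n(R_A=0,\,R_B=R)}{n(R_B=R)}\,,
\]
%
where $n(R_B=R)$ is the number of metabolites that play role $R$ in
species $B$ and $n(R_A=0,\,R_B=R)$ is the number of those metabolites
that are absent from species $A$. For instance, with purely
hypothetical counts, if 50 metabolites play role $R$ in $B$ and 10 of
them are absent from $A$, then $p_{\rm lost}(R)\approx 10/50=0.2$.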
485: 
486: 
487: 
488: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
489: %%%%%%%%%%%%%%%%%% REFERENCES %%%%%%%%%%%%%%%%%%%%%%%%%%%%
490: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
491: %
492: \begin{thebibliography}{30}
493: \expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi
494: \expandafter\ifx\csname url\endcsname\relax
495:   \def\url#1{\texttt{#1}}\fi
496: \expandafter\ifx\csname urlprefix\endcsname\relax\def\urlprefix{URL }\fi
497: 
498: \bibitem[{Amaral \emph{et~al.}(2000)Amaral, Scala, Barthel\'emy \&
499:   Stanley}]{amaral00}
500: Amaral, L. A.~N., Scala, A., Barthel\'emy, M. \& Stanley, H.~E.
501: \newblock Classes of small-world networks.
502: \newblock \emph{Proc. Natl. Acad. Sci. USA} \textbf{97}, 11149--11152 (2000).
503: 
504: \bibitem[{Albert \& Barab\'asi(2002)}]{albert02}
505: Albert, R. \& Barab\'asi, A.-L.
506: \newblock Statistical mechanics of complex networks.
507: \newblock \emph{Rev. Mod. Phys.} \textbf{74}, 47--97 (2002).
508: 
509: \bibitem[{Amaral \& Ottino(2004)}]{amaral04}
510: Amaral, L. A.~N. \& Ottino, J.
511: \newblock Complex networks: \protect{Augmenting} the framework for the study of
512:   complex systems.
513: \newblock \emph{Eur. Phys. J. B} \textbf{38}, 147--162 (2004).
514: 
515: \bibitem[{Hartwell \emph{et~al.}(1999)Hartwell, Hopfield, Leibler \&
516:   Murray}]{hartwell99}
517: Hartwell, L.~H., Hopfield, J.~J., Leibler, S. \& Murray, A.~W.
518: \newblock From molecular to modular biology.
519: \newblock \emph{Nature} \textbf{402}, C47--C52 (1999).
520: 
521: \bibitem[{Girvan \& Newman(2002)}]{girvan02}
522: Girvan, M. \& Newman, M. E.~J.
523: \newblock Community structure in social and biological networks.
524: \newblock \emph{Proc. Natl. Acad. Sci. USA} \textbf{99}, 7821--7826 (2002).
525: 
526: \bibitem[{Jeong \emph{et~al.}(2000)Jeong, Tombor, Albert, Oltvai \&
527:   Barab\'asi}]{jeong00}
528: Jeong, H., Tombor, B., Albert, R., Oltvai, Z.~N. \& Barab\'asi, A.~L.
529: \newblock The large-scale organization of metabolic networks.
530: \newblock \emph{Nature} \textbf{407}, 651--654 (2000).
531: 
532: \bibitem[{Wagner \& Fell(2001)}]{wagner01}
533: Wagner, A. \& Fell, D.~A.
534: \newblock The small world inside large metabolic networks.
535: \newblock \emph{Proc. Roy. Soc. B} \textbf{268}, 1803--1810 (2001).
536: 
537: \bibitem[{Ma \& Zeng(2003)}]{ma03}
538: Ma, H. \& Zeng, A.-P.
539: \newblock Reconstruction of metabolic networks from genome data and analysis of
540:   their global structure for various organisms.
541: \newblock \emph{Bioinformatics} \textbf{19}, 270--277 (2003).
542: 
543: \bibitem[{Hatzimanikatis \emph{et~al.}(2004)Hatzimanikatis, Li, Ionita \&
544:   Broadbelt}]{hatzimanikatis04}
545: Hatzimanikatis, V., Li, C., Ionita, J.~A. \& Broadbelt, L.
546: \newblock Metabolic networks: enzyme function and metabolite structure.
547: \newblock \emph{Curr. Opin. Struc. Biol.} \textbf{14}, 300--306 (2004).
548: 
549: \bibitem[{Guimer\`a \emph{et~al.}(2003)Guimer\`a, Danon, D\'{\i}az-Guilera,
550:   Giralt \& Arenas}]{guimera03}
551: Guimer\`a, R., Danon, L., D\'{\i}az-Guilera, A., Giralt, F. \& Arenas, A.
552: \newblock Self-similar community structure in a network of human interactions.
553: \newblock \emph{Phys. Rev. E} \textbf{68}, art. no. 065103 (2003).
554: 
555: \bibitem[{Newman \& Girvan(2004)}]{newman03}
556: Newman, M. E.~J. \& Girvan, M.
557: \newblock Finding and evaluating community structure in networks.
558: \newblock \emph{Phys. Rev. E} \textbf{69}, art. no. 026113 (2004).
559: 
560: \bibitem[{Arenas \emph{et~al.}(2004)Arenas, Danon, D\'{\i}az-Guilera, Gleiser
561:   \& Guimer\`a}]{arenas04}
562: Arenas, A., Danon, L., D\'{\i}az-Guilera, A., Gleiser, P.~M. \& Guimer\`a, R.
563: \newblock Community analysis in social networks.
564: \newblock \emph{Eur. Phys. J. B} \textbf{38}, 373--380 (2004).
565: 
566: \bibitem[{Krause \emph{et~al.}(2003)Krause, Frank, Mason, Ulanowicz \&
567:   Taylor}]{krause03}
568: Krause, A.~E., Frank, K.~A., Mason, D.~M., Ulanowicz, R.~E. \& Taylor, W.~W.
569: \newblock Compartments revealed in food-web structure.
570: \newblock \emph{Nature} \textbf{426}, 282--285 (2003).
571: 
572: \bibitem[{Ravasz \emph{et~al.}(2002)Ravasz, Somera, Mongru, Oltvai \&
573:   Barab\'asi}]{ravasz02}
574: Ravasz, E., Somera, A.~L., Mongru, D.~A., Oltvai, Z.~N. \& Barab\'asi, A.-L.
575: \newblock Hierarchical organization of modularity in metabolic networks.
576: \newblock \emph{Science} \textbf{297}, 1551--1555 (2002).
577: 
578: \bibitem[{Holme \& Huss(2003)}]{holme03}
579: Holme, P. \& Huss, M.
580: \newblock Subnetwork hierarchies of biochemical pathways.
581: \newblock \emph{Bioinformatics} \textbf{19}, 532--538 (2003).
582: 
583: \bibitem[{Papin \emph{et~al.}(2004)Papin, Reed \& Palsson}]{papin04}
584: Papin, J.~A., Reed, J.~L. \& Palsson, B.~O.
585: \newblock Hierarchical thinking in network biology: the unbiased modularization
586:   of biochemical networks.
587: \newblock \emph{Trends Biochem. Sci.} \textbf{29}, 641--647 (2004).
588: 
589: \bibitem[{Eriksen \emph{et~al.}(2003)Eriksen, Simonsen, Maslov \&
590:   Sneppen}]{eriksen03}
591: Eriksen, K.~A., Simonsen, I., Maslov, S. \& Sneppen, K.
592: \newblock Modularity and extreme edges of the \protect{Internet}.
593: \newblock \emph{Phys. Rev. Lett.} \textbf{90}, art. no. 148701 (2003).
594: 
595: \bibitem[{Newman(2004)}]{newman04}
596: Newman, M. E.~J.
597: \newblock Fast algorithm for detecting community structure in networks.
598: \newblock \emph{Phys. Rev. E} \textbf{69}, art. no. 066133 (2004).
599: 
600: \bibitem[{Radicchi \emph{et~al.}(2004)Radicchi, Castellano, Cecconi, Loreto \&
601:   Parisi}]{radicchi04}
602: Radicchi, F., Castellano, C., Cecconi, F., Loreto, V. \& Parisi, D.
603: \newblock Defining and identifying communities in networks.
604: \newblock \emph{Proc. Natl. Acad. Sci. USA} \textbf{101}, 2658--2663 (2004).
605: 
606: \bibitem[{Donetti \& \protect{Mu\~{n}oz}(2004)}]{donetti04}
607: Donetti, L. \& \protect{Mu\~{n}oz}, M.~A.
608: \newblock Detecting network communities: \protect{A} new systematic and
609:   efficient algorithm.
610: \newblock \emph{J. Stat. Mech. Theor. Exp.} P10012 (2004).
611: 
612: \bibitem[{Guimer\`a \emph{et~al.}(2004)Guimer\`a, Sales-Pardo \&
613:   Amaral}]{guimera04c}
614: Guimer\`a, R., Sales-Pardo, M. \& Amaral, L. A.~N.
615: \newblock Modularity from fluctuations in random graphs and complex networks.
616: \newblock \emph{Phys. Rev. E} \textbf{70}, art. no. 025101 (2004).
617: 
618: \bibitem[{Kirkpatrick \emph{et~al.}(1983)Kirkpatrick, Gelatt \&
619:   Vecchi}]{kirkpatrick83}
620: Kirkpatrick, S., Gelatt, C.~D. \& Vecchi, M.~P.
621: \newblock Optimization by simulated annealing.
622: \newblock \emph{Science} \textbf{220}, 671--680 (1983).
623: 
624: \bibitem[{Wasserman \& Faust(1994)}]{wasserman94}
625: Wasserman, S. \& Faust, K.
626: \newblock \emph{Social Network Analysis} (Cambridge University Press,
627:   Cambridge, U.K., 1994).
628: 
629: \bibitem[{Guimer\`a \& Amaral(2004)}]{guimera??e}
630: Guimer\`a, R. \& Amaral, L. A.~N.
631: \newblock \emph{J. Stat. Mech. Theor. Exp.} submitted (2004).
632: 
633: \bibitem[{Rives \& Galitski(2003)}]{rives03}
634: Rives, A.~W. \& Galitski, T.
635: \newblock Modular organization of cellular networks.
636: \newblock \emph{Proc. Natl. Acad. Sci. USA} \textbf{100}, 1128--1133 (2003).
637: 
638: \bibitem[{Han \emph{et~al.}(2004)}]{han04}
639: Han, J.-D.~J. \emph{et~al.}
640: \newblock Evidence for dynamically organized modularity in the yeast
641:   protein-protein interaction network.
642: \newblock \emph{Nature} \textbf{430}, 88--93 (2004).
643: 
644: \bibitem[{Kanehisa \& Goto(2000)}]{kanehisa00}
645: Kanehisa, M. \& Goto, S.
646: \newblock KEGG: Kyoto Encyclopedia of Genes and Genomes.
647: \newblock \emph{Nucleic Acids Res.} \textbf{28}, 27--30 (2000).
648: 
649: \bibitem[{Schuster \emph{et~al.}(2000)Schuster, Fell \& Dandekar}]{schuster00}
650: Schuster, S., Fell, D.~A. \& Dandekar, T.
651: \newblock A general definition of metabolic pathways useful for systematic
652:   organization and analysis of complex metabolic networks.
653: \newblock \emph{Nat. Biotechnol.} \textbf{18}, 326--332 (2000).
654: 
655: \bibitem[{Schuster \emph{et~al.}(2002)Schuster, Pfeiffer, Moldenhauer, Koch \&
656:   Dandekar}]{schuster02}
657: Schuster, S., Pfeiffer, T., Moldenhauer, F., Koch, I. \& Dandekar, T.
658: \newblock Exploring the pathway structure of metabolism: decomposition into
659:   subnetworks and application to {\it Mycoplasma pneumoniae}.
660: \newblock \emph{Bioinformatics} \textbf{18}, 351--361 (2002).
661: 
662: \bibitem[{Jeong \emph{et~al.}(2001)Jeong, Mason, Barab\'asi \&
663:   Oltvai}]{jeong01}
664: Jeong, H., Mason, S.~P., Barab\'asi, A.-L. \& Oltvai, Z.~N.
665: \newblock Lethality and centrality in protein networks.
666: \newblock \emph{Nature} \textbf{411}, 41--42 (2001).
667: 
668: \end{thebibliography}
669: 
670: 
671: \bigskip
672: \noindent
673: {\bf Acknowledgments}~~We thank L. Broadbelt, V. Hatzimanikatis,
674: A.~A. Moreira, E. T. Papoutsakis, M. Sales-Pardo, and D.~B. Stouffer
675: for stimulating discussions and helpful suggestions, and H. Ma and
676: A. P. Zeng for providing us with their metabolic network
677: database. R.G. thanks the Fulbright Program and the Spanish Ministry
678: of Education, Culture \& Sports. L.A.N.A. gratefully acknowledges the
679: support of a Searle Leadership Fund Award and of an NIH/NIGMS K-25
680: award.
681: 
682: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
683: %%%%%%%%%%%%%%%%%%%%% FIGURES %%%%%%%%%%%%%%%%%%%%%%%%%%%%
684: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
685: \clearpage
686: 
687: \begin{figure} 
688: \centerline{
689: %
690: \includegraphics*[height=.7\textwidth]{modules}
691: }
692: \renewcommand{\baselinestretch}{1.0}
693: \caption{
694: %
695: Performance of module identification methods.
696: %
697: To test the performance of the method, we build ``random networks''
698: with known module structure. Each test network comprises 128 nodes
699: divided into 4 modules of 32 nodes. Each node is connected to the
700: other nodes in its module with probability $p_{i}$, and to nodes in
701: other modules with probability $p_{o}<p_{i}$. On average, thus, each
702: node is connected to $k_{out}=96\,p_{o}$ nodes in other modules and to
703: $k_{in}=31\,p_{i}$ in the same module. Additionally, $p_{i}$ and
704: $p_{o}$ are selected so that the average degree of the nodes is
705: $k=16$. We display networks with: {\bf a,} $k_{in}=15$ and
706: $k_{out}=1$; {\bf b,} $k_{in}=11$ and $k_{out}=5$; and {\bf c,}
707: $k_{in}=k_{out}=8$.
708: %
709: {\bf d,} The performance of a module identification algorithm is
710: typically defined as the fraction of correctly classified nodes. We
711: compare our algorithm to the Girvan-Newman
712: algorithm~\cite{girvan02,newman04}, which is the reference algorithm
713: for module identification~\cite{newman03,newman04,radicchi04}. Note
714: that our method is 90\% accurate even when half of a node's links are
715: to nodes in other modules.
716: %
717: {\bf e,} Our module-identification algorithm is stochastic, so
718: different runs yield, in principle, different partitions. To test the
719: robustness of the algorithm, we obtain 100 partitions of the network
720: depicted in {\bf c} and plot, for each pair of nodes in the network,
721: the fraction of times that they are classified in the same module. As
722: shown in the figure, most pairs of nodes are either always classified
723: in the same module (red) or never classified in the same module (dark
724: blue), which indicates that the solution is robust. 
725: %
726: }
727: \label{f-perf-mod}
728: \end{figure}
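
\noindent
As a worked check of the benchmark parameters quoted in the caption of
Fig.~\ref{f-perf-mod}, the following sketch uses only the relations
stated there: $k_{in}=31\,p_{i}$, $k_{out}=96\,p_{o}$, and
$k=k_{in}+k_{out}=16$. The rounded probabilities are illustrative
arithmetic derived from those relations; they are an assumption-free
restatement of the caption, not values quoted from the original analysis.
\[
\begin{array}{llll}
\mbox{Panel} & (k_{in},\,k_{out}) & p_{i}=k_{in}/31 & p_{o}=k_{out}/96 \\
\mbox{\bf a} & (15,\,1) & 15/31\approx 0.484 & 1/96\approx 0.0104 \\
\mbox{\bf b} & (11,\,5) & 11/31\approx 0.355 & 5/96\approx 0.0521 \\
\mbox{\bf c} & (8,\,8)  & 8/31\approx 0.258  & 8/96\approx 0.0833 \\
\end{array}
\]
In all three cases $p_{o}<p_{i}$. In particular, for
$k_{in}=k_{out}=8$ the ratio is $p_{i}/p_{o}=96/31\approx 3.1$,
because each node has 31 potential partners inside its module but 96
outside it.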
729: 
730: 
731: 
732: 
733: \begin{figure} 
734: \centerline{
735: %
736: \includegraphics*[width=\textwidth]{role-regions}
737: %
738: }
739: %
740: \renewcommand{\baselinestretch}{1.0}
741: \caption{
742: %
743: Roles and regions in the $zP$ parameter space. {\bf a,} Each node in
744: a network can be characterized by its within-module degree $z$ and its
745: participation coefficient $P$ (see Methods for definitions). We classify
746: nodes with $z\ge 2.5$ as module hubs and nodes with $z<2.5$ as non-hubs. We
747: find that non-hub nodes can be naturally assigned to four different
748: roles: (R1) {\it ultra-peripheral nodes}, i.e., nodes with all their
749: links within their module; (R2) {\it peripheral nodes}, i.e., nodes
750: with most links within their module; (R3) {\it non-hub connector
751: nodes}, i.e., nodes with many links to other modules; and (R4) {\it
752: non-hub kinless nodes}, i.e., nodes with links homogeneously
753: distributed among all modules. We find that hub nodes can be naturally
754: assigned to three different roles: (R5) {\it provincial hubs}, i.e.,
755: hub nodes with the vast majority of links within their module; (R6)
756: {\it connector hubs}, i.e., hubs with many links to most of the other
757: modules; and (R7) {\it kinless hubs}, i.e., hubs with links
758: homogeneously distributed among all modules (see Supplementary
759: Information).
760: %
761: {\bf b,} Metabolite role determination for the metabolic network of {\it
762: E. coli}, as obtained from the MZ database. Each metabolite is
763: represented as a point in the $zP$ parameter space, and is colored
764: according to its role.
765: %
766: {\bf c,} Same as {\bf b} but for the complete KEGG database.
767: %
768: }
769: \label{f-roledef}
770: \end{figure}
771: 
772: 
773: 
774: \begin{figure}
775: \centerline{\includegraphics*[width=\textwidth]{modules-roles}}
776: %
777: \renewcommand{\baselinestretch}{1.0}
778: \caption{
779: %
780: ``Cartographic representation'' of the metabolic network of {\it
781: E. coli}.
782: %
783: Each circle represents a module and is colored according to the KEGG
784: pathway classification of the metabolites it contains. Certain
785: important nodes are depicted as triangles (non-hub connectors),
786: hexagons (connector hubs), and squares (provincial hubs). Interactions
787: between modules and nodes are depicted using lines, with thickness
788: proportional to the number of actual links.
789: %
790: (Inset) Pajek representation of the entire metabolic network
791: of {\it E. coli}, which contains 473 metabolites and 574 links. Each node is
792: colored according to the ``main'' color of its module, as obtained
793: from the ``cartographic representation.''
794: %
795: }
796: \label{f-metab}
797: \end{figure}
798: 
799: 
800: 
801: \begin{figure}
802: \centerline{
803: %
804: \includegraphics*[width=0.45\textwidth]{roleconserv-Ma}\quad
805: %
806: \includegraphics*[width=0.45\textwidth]{roleconserv}
807: }
808: %
809: \renewcommand{\baselinestretch}{1.0}
810: \caption{
811: %
812: Roles of metabolites and inter-species conservation. To quantify the
813: relation between roles and conservation, we calculate the loss rate
814: $p_{\rm lost}(R)$ of each metabolite (see Methods).
815: %
816: Each thin line in the graph corresponds to a comparison between two
817: species. Since we are interested in metabolites that are present in
818: some species but missing in others, metabolic networks of species
819: within the same super-kingdom---bacteria, eukaryotes, and
820: archaea---are usually too similar to provide statistically sound
821: information, especially for roles containing only a few
822: metabolites. Therefore, we consider in our analysis only pairs of
823: species that belong to different super-kingdoms. The thick line is the
824: average over all pairs of species.
825: %
826: The loss rate $p_{\rm lost}(R)$ is maximum for ultra-peripheral (R1)
827: nodes and minimum for connector hubs (R6). Remarkably, provincial hubs
828: (R5) have a significantly and consistently higher $p_{\rm lost}(R)$
829: than non-hub connectors (R3), even though the within-module degree and
830: the total degree of provincial hubs are larger.
831: %
832: Note that, out of the 48 pairwise comparisons, $p_{\rm lost}(R)$ is lower
833: for provincial hubs than for non-hub connectors in only two cases,
834: while the opposite is true in 44 cases.
835: %
836: {\bf a,} Results obtained for the MZ database and {\bf b,} the
837: complete KEGG database.
838: %
839: }
840: \label{f-conservation}
841: \end{figure}
842: 
843: 
844: \end{document}
845: 
846: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
847: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
848: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
849: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
850: 
851: