q-bio0312009/pin.tex
1: \documentclass[prl,aps,twocolumn,showpacs,showkeys]{revtex4}
2: \usepackage{epsf,amssymb,amsmath}
3: 
4: \begin{document}
5: \title{ \sffamily\bfseries\Large
6: Evolution of the Protein Interaction Network of Budding Yeast: \\
7: Role of the Protein Family Compatibility Constraint\\}
8: 
9: \author{\sc K.-I. Goh, B. Kahng, and D. Kim}
10: 
11: \affiliation{\mbox{School of Physics and Center for 
12: Theoretical Physics, Seoul National University NS50, 
13: Seoul 151-747, Korea}}
14: \date{\today}
15: 
16: \begin{abstract}
17: Understanding how the protein interaction networks (PINs) of living 
18: organisms have evolved and are organized can be a first stepping 
19: stone toward unveiling how life works at a fundamental level.
20: Here, we introduce a new {\em in-silico} evolution model of
21: the PIN of budding yeast, {\em Saccharomyces cerevisiae}; 
22: the model is composed of the PIN and the protein family network. 
23: The basic ingredients of the 
24: model include family compatibility, which constrains
25: the potential binding ability of a protein,
26: as well as the previously proposed processes of
27: gene duplication, divergence, and mutation.
28: We investigate various structural properties of our model network 
29: with parameter values relevant to budding yeast and 
30: find that the model successfully reproduces the 
31: empirical data. 
32: \end{abstract}
33: \pacs{89.75.Hc, 87.15.Aa }
34: \keywords{Protein interaction network, Family compatibility}
35: \maketitle
36: 
37: Studying complex systems by means of their network representation
38: has attracted much attention recently \cite{rmp,advphys,siam,saemulli,dslee,han}.
39: The cell, one of the best examples of complex systems, can also
40: be viewed as a network:
41: The cellular components, such as genes, proteins, and other 
42: biological molecules, connected by all physiologically
43: relevant interactions, form a full weblike molecular architecture
44: in a cell~\cite{pyramid,network-biology}. 
45: Among the various levels, the protein interaction network (PIN) 
46: plays a pivotal role as it acts as a basic physical protocol 
47: of cooperative functioning in many physiological processes.
48: In the PIN, proteins are viewed 
49: as nodes, and two proteins are linked if they physically 
50: contact each other. 
51: Thanks to recent progress in high-throughput experimental techniques, 
52: the data set of protein interactions for budding yeast,
53: {\em Saccharomyces cerevisiae}, has been firmly 
54: established in the last few years \cite{uetz,ito,gavin,ho,tong,mips,dip,bind}. 
55: Thus, the yeast PIN offers a good testbed for understanding how such a network
56: has evolved into its present form from basic evolutionary rules.
57: In this paper, our aim is to introduce a simple evolutionary model
58: to reproduce the structural properties of the PIN of budding yeast,
59: thereby deepening our understanding of the driving force for
60: cellular evolution.
61: 
62: At a certain level of abstraction, one may view a protein as 
63: an assembly of domains. It is domains that offer structural 
64: and functional units. They act as basic units in 
65: the interactions between proteins and in the evolution 
66: of protein structures. Proteins are grouped into so-called protein 
67: families or superfamilies
68: according to the domain structure within them \cite{alberts}.
69: The proteins within a family are monophyletic;
70: that is, they originate from a common ancestor
71: and are fairly well conserved during evolution. 
72: The protein family network (PFN) is defined as the one 
73: whose nodes are protein families, and two families are connected 
74: if any of the domains within them simultaneously
75: occur in a single protein or any proteins within
76: them interact with each other \cite{jpark}.
77: The distributions of the degrees and the sizes of the families in the PFN
78: also follow power laws \cite{jpark,huynen}.
79: Given that the entities of proteins and protein families
80: are not separable but linked via domains as intermediates,
81: it is desirable to unify their evolutions into a single framework. 
82: 
83: So far, several {\it in-silico} evolution models have been proposed 
84: for the yeast PIN \cite{sole,vazquez,berg,kim,chung}.
85: A distinguishing aspect in the evolution of the PIN compared
86: with that of other complex networks is the concept of ``evolution 
87: by duplication''~\cite{ohno}:
88: A new protein is thought to be created mainly by gene duplication.
89: Subsequently, the duplicate protein may lose redundant interactions
90: endowed from its ancestor to reduce redundancy, 
91: which is called divergence (or diversification).
92: A protein also gains new interactions with other
93: proteins via mutation. These three processes,
94: duplication--divergence--mutation, have been regarded as the basic
95: ingredients in the evolution of the PIN. While those {\it in-silico}
96: models~\cite{sole,vazquez,kim,chung,berg}
97: were successful in generating a fat-tail or power-law behavior in
98: the degree distribution,  
99: they hardly reproduced other structural properties of the yeast 
100: PIN, such as the clustering coefficient, the assortativity,
101: {\it etc.}, which we will specify in more detail shortly. 
102: The model we introduce here, however, can incorporate other 
103: structural properties of the yeast PIN as well as the degree distribution.
104: To this end, we introduce the concept of 
105: ``family compatibility'' (FC):
106: An interaction between two proteins is possible only when
107: the corresponding families they belong to are compatible,
108: and only those families linked via the PFN are compatible with one another.
109: With this, we realize the effective structural constraint 
110: in physical binding between proteins, which is coupled with
111: the evolutionary lineage of proteins through the notion of protein family.
112: 
113: \begin{figure}[t]
114: \centerline{\epsfxsize=9cm \epsfbox{fig1.eps}}
115: \caption{Schematic picture of the evolution rule of the model.
116: The elementary steps are composed of i) duplication 
117: (light blue protein $\rightarrow$ red protein), 
118: ii) divergence (dashed pink links), and 
119: iii) mutation (violet link from the pink protein).
120: In addition, the mutation is constrained by family 
121: compatibility; for example, the pink protein cannot 
122: interact with the black protein because they are not compatible.
123: }
124: \end{figure}
125: 
126: \begin{figure*}
127: \centerline{\epsfxsize=15cm \epsfbox{fig2.eps}}
128: \caption{
129: Simulation results ($\bigcirc$) of the model agree well with the 
130: empirical data ($\diamond$).
131: Shown are 
132: (a) the degree distribution $P(k)$,
133: (b) the hierarchical clustering $C(k)$, and
134: (c) the average neighbor-degree function 
135: $\langle k_{\rm nn}\rangle$ for the protein interaction network.
136: The dotted line in (a) is a fit to Eq.~(\ref{pk}).
137: The results of the model without FC ($\Box$), which fail
138: to reproduce the empirical features, are also shown for 
139: comparison.
140: }
141: \end{figure*}
142: 
143: {\em Model}--- The model can be depicted schematically as in Fig.~1.
144: The whole system is composed of two types of networks, 
145: the PIN and the PFN. A number of proteins are grouped, forming 
146: a protein family. Protein families link to other protein families,
147: forming the PFN.
148: Two proteins belonging to different protein families can 
149: interact only when the respective families are also linked.
150: Each family has a fitness-like parameter, the number of domains
151: within it, $D_f$, which is not fixed, but evolves with the PFN.
152: The evolution takes place in two stages. In the first stage,
153: the protein families are created along with the proteins;
154: thus, the PFN coevolves with the PIN.
155: In the second stage, the PFN is kept fixed, and the evolution of 
156: the PIN continues on top of it. 
157: A detailed description of the procedure is as follows:
158: 
159: \begin{enumerate}
160: \item Initially, there are $n_0$ proteins, each of which constitutes
161: its own protein family. All $n_0$ proteins
162: are interconnected with one another, as are the $n_0$ protein families.
163: We choose $n_0=3$ to be minimal. 
164: Each family has $D_f=2$ domains, equal to the number of family links it has.
165: 
166: \item In the first stage, proteins and protein families coevolve: 
167: At each step, with rate $\alpha$, a new protein, say $a$, is created 
168: by duplicating an existing protein $b$ chosen randomly. The new protein $a$
169: creates its own protein family $F_a$. 
170: Each of the inherited interactions of the protein $a$
171: is removed with probability $\delta$, a process called divergence.
172: Through divergence, the degree of the new protein $a$, $k_a$,
173: usually becomes less than that of the mother protein $k_b$.
174: The linkage of the new protein family is determined by that of 
175: the protein created. By this process, the newly born family $F_a$ 
176: consists of a single protein, but has a number of linkages, say $K_{F_a}$, 
177: to existing families. 
178: The initial number of domains in the family is set to 
179: $D_{F_a}=K_{F_a}$. In some cases, the newly created protein is left with no 
180: interaction at all $(K_{F_a}=0)$. 
181: In this case, we do not let it establish a new 
182: family, but regard it as a remnant in the previous family.
183: When this case happens, the population of the family to which 
184: the duplicated protein belongs is increased by 1. Note that the 
185: remnant can later gain new interactions via mutation described below
186: and join the protein interaction network.
187: 
188: With rate $1$, a randomly chosen existing protein $i$ gains a new 
189: interaction to another previously unlinked protein $j$, which is 
190: chosen among the proteins within compatible families,
191: according to the probability,
192: \begin{equation}
193: \Pi_j= \dfrac{D_{F_j}}{\underset{F_{l}\leftrightarrow F_i}{\displaystyle \sum\nolimits} D_{F_{l}}},
194: \end{equation}
195: where $F_i$ means the family
196: to which the protein $i$ belongs and $X\leftrightarrow Y$ means that 
197: the families $X$ and $Y$ are compatible, i.e., linked in the PFN.
198: Eq.~(1), the preferential attachment in the domain abundance
199: constrained by FC, makes our model distinct
200: and successful.
201: In this process, which we call mutation, the number of domains 
202: in the family $F_i$ increases by 1, but the number of domains in $F_j$
203: does not. 
204: This accounts for the acquisition of a new domain via mutation in 
205: the family $F_i$. This stage lasts until 1,000 proteins have been
206: made, during which about $500$$\sim$$600$ families are created, a number 
207: comparable with the number of superfamilies in yeast~\cite{superfamily}.
208: 
209: \item 
210: In the second stage, the same protein evolution process as in 
211: the first stage occurs, except that the PFN is 
212: kept fixed and the daughter protein remains in the same family as 
213: its mother in the duplication process.
214: This stage lasts until there are about 6,000 proteins in
215: the network, the approximate size of the yeast proteome. 
216: \end{enumerate}
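
The elementary moves listed above can be sketched in code. The following Python fragment is a minimal illustration under stated assumptions, not the implementation used for the simulations: the container names (\texttt{adj}, \texttt{fam\_of}, \texttt{fam\_adj}, \texttt{domains}) are hypothetical, mutation candidates are restricted to proteins whose family is linked to $F_i$ in the PFN (proteins of the same family are excluded unless the family is linked to itself), and $\Pi_j$ of Eq.~(1) is normalized over the eligible candidate proteins.

```python
import random

def initial_state(n0=3):
    """n0 fully connected proteins, each in its own fully connected family."""
    adj = {p: {q for q in range(n0) if q != p} for p in range(n0)}
    fam_of = {p: p for p in range(n0)}
    fam_adj = {f: {g for g in range(n0) if g != f} for f in range(n0)}
    domains = {f: n0 - 1 for f in range(n0)}  # D_f = number of family links
    return adj, fam_of, fam_adj, domains

def duplicate(b, adj, fam_of, fam_adj, domains, delta, rng, new_family=True):
    """Duplicate protein b; each inherited link is removed with probability delta."""
    a = max(adj) + 1
    kept = {n for n in adj[b] if rng.random() >= delta}
    adj[a] = set(kept)
    for n in kept:
        adj[n].add(a)
    if new_family and kept:
        # Stage 1: the daughter founds a family linked to its partners' families.
        f = max(fam_adj) + 1
        linked = {fam_of[n] for n in kept}
        fam_adj[f] = set(linked)
        for g in linked:
            fam_adj[g].add(f)
        domains[f] = len(linked)  # D_{F_a} = K_{F_a}
        fam_of[a] = f
    else:
        # Remnant (no links left) or stage 2: stay in the mother's family.
        fam_of[a] = fam_of[b]
    return a

def mutation_weights(i, adj, fam_of, fam_adj, domains):
    """Unnormalized weights D_{F_j} for the partners of i allowed by FC."""
    compatible = fam_adj[fam_of[i]]
    return {
        j: domains[fam_of[j]]
        for j in adj
        if j != i and j not in adj[i] and fam_of[j] in compatible
    }

def mutate(i, adj, fam_of, fam_adj, domains, rng):
    """Link i to a partner drawn with probability proportional to D_{F_j}."""
    weights = mutation_weights(i, adj, fam_of, fam_adj, domains)
    if not weights:
        return None  # no compatible, unlinked partner exists
    candidates = sorted(weights)
    j = rng.choices(candidates, weights=[weights[c] for c in candidates])[0]
    adj[i].add(j)
    adj[j].add(i)
    domains[fam_of[i]] += 1  # only F_i acquires a domain
    return j
```

In this sketch the second stage corresponds to calling \texttt{duplicate} with \texttt{new\_family=False}, so that the PFN stays fixed while the PIN keeps growing.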
217: 
218: A few remarks on the model are in order.
219: First, this model is designed to be as simple as possible while 
220: implementing FC into the 
221: trio of duplication, divergence, and mutation,
222: which we believe to be the most basic processes.
223: Many interesting processes, such as lateral gene transfer
224: and {\it de novo} creation of proteins and protein families,
225: are not covered in this model, however.
226: Second, we assumed that the time scale of
227: the PFN evolution is strictly separated from that of the later PIN evolution,
228: which might be an oversimplification.
229: Third, proteins and protein families may become extinct during evolution,
230: followed by the loss of the interactions between them.
231: However, we may view the parameters of the evolution rates,
232: such as $\alpha$ and $\delta$,
233: as {\it effective} ones incorporating all these details. 
234: Also, for the sake of minimizing the number of free parameters,
235: we assume that the duplication and the divergence rates of proteins 
236: and protein families are equal, i.e., $\alpha=\alpha_f$ and 
237: $\delta=\delta_f$, although we can fix $\alpha$ and $\delta$ for any 
238: given set of ($\alpha_f$, $\delta_f$) to incorporate the empirical 
239: data. 
240: 
241: {\em Structure of the yeast PIN}--- 
242: Several analyses on the topological properties of the yeast
243: PIN have been performed during recent
244: years \cite{lethal,maslov,wagner}. Since then, however, new 
245: protein--protein interactions in yeast have been discovered steadily,
246: so we repeat the analysis by integrating the most up-to-date data 
247: from various public resources, such as
248: (i) the database at the Munich Information Center for Protein Sequences \cite{mips}, 
249: (ii) the database of the interacting proteins \cite{dip}, 
250: (iii) the biomolecular interaction network database \cite{bind},
251: (iv) the two-hybrid datasets obtained by Uetz {\it et al.}~\cite{uetz}, 
252: by Ito {\it et al.}~\cite{ito}, and by Tong {\it et al.}~\cite{tong},
253: and (v) the mass spectrometry data (filtered) by Ho {\it et al.}~\cite{ho}.
254: After trimming the synonyms and other redundant entries manually,
255: the resulting network consists of 15,652 interactions
256: (excluding self-interactions) between 4,926 nodes (in terms of
257: distinct open reading frames and other biomolecules).
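
The integration step can be illustrated with a short Python sketch. It is only a schematic of the bookkeeping, assuming each source is available as a list of identifier pairs; the function name \texttt{integrate}, the \texttt{synonyms} map, and the identifiers in the usage example are hypothetical, and the manual curation described above is not reproduced.

```python
def integrate(datasets, synonyms=None):
    """Merge interaction lists into one undirected network without
    self-interactions or duplicate links.

    datasets: iterable of lists of (u, v) identifier pairs
    synonyms: optional map from alias to canonical identifier
    """
    synonyms = synonyms or {}
    edges = set()
    for pairs in datasets:
        for u, v in pairs:
            u = synonyms.get(u, u)
            v = synonyms.get(v, v)
            if u == v:
                continue  # drop self-interactions
            # store each link once, independent of the reported order
            edges.add((u, v) if u < v else (v, u))
    nodes = {x for edge in edges for x in edge}
    return nodes, edges

# Illustrative input with made-up identifiers.
source_1 = [("P1", "P2"), ("P2", "P1"), ("P1", "P1")]
source_2 = [("P1_alias", "P3")]
nodes, edges = integrate([source_1, source_2], {"P1_alias": "P1"})
```

With the numbers quoted above, the average degree follows as $\langle k\rangle = 2\times 15652/4926 \approx 6.35$.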
258: 
259: The topological properties of the integrated yeast PIN are shown 
260: in Fig.~2:
261: 
262: (a) The degree distribution of the PIN fits well to the generalized Pareto 
263: distribution (or a generalized power law) \cite{ab,koonin},
264: \begin{equation}
265: p_d(k)\sim (k+k_0)^{-\gamma},
266: \label{pk}
267: \end{equation}
268: with $k_0=8.0$ and $\gamma\simeq3.45$.
269: Note that functional forms of the degree distribution different from 
270: Eq.~(\ref{pk}) were proposed~\cite{sole,vazquez,berg,wagner,lethal} 
271: based on smaller-scale datasets than the current one.  
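
As an illustration of how such a comparison can be set up, the Python sketch below computes the empirical degree distribution of a network stored as an adjacency dictionary and tabulates a normalized form of Eq.~(\ref{pk}). The function names and the cutoff \texttt{kmax} are assumptions made for the example; no fitting procedure is implied.

```python
from collections import Counter

def degree_distribution(adj):
    """Fraction of interacting nodes (degree >= 1) having each degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values() if nbrs)
    n = sum(counts.values())
    return {k: c / n for k, c in sorted(counts.items())}

def generalized_power_law_pmf(k0=8.0, gamma=3.45, kmax=1000):
    """p(k) proportional to (k + k0)^(-gamma), normalized over 1 <= k <= kmax.

    kmax is an arbitrary truncation chosen for this sketch.
    """
    raw = {k: (k + k0) ** -gamma for k in range(1, kmax + 1)}
    norm = sum(raw.values())
    return {k: w / norm for k, w in raw.items()}

# Toy network: a star with three leaves.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
empirical = degree_distribution(star)
model = generalized_power_law_pmf()
```

For $k\ll k_0$ the tabulated form is nearly flat, while for $k\gg k_0$ it approaches a pure power law with exponent $\gamma$.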
272: 
273: (b) The yeast PIN is highly clustered and modular. 
274: To quantify this, we measured the local clustering of a protein $i$,
275: $c_i = {2e_i}/{k_i(k_i-1)}$, where $e_i$ is the number of links 
276: present between the $k_i$ neighbors of node $i$ out of its maximum 
277: possible number $k_i(k_i-1)/2$.
278: The clustering coefficient of a graph, $C$, is the average of 
279: $c_i$ over all nodes with $k_i\ge 2$. We obtain $C\approx 0.128$. 
280: $C(k)$ is the average of $c_i$ over vertices with degree 
281: $k$~\cite{vespignani2,ravasz}.
282: $C(k)$ exhibits a plateau for small $k$ while it drops rapidly 
283: for large $k$.
284: Such a plateau in the clustering function may reflect the 
285: functional module structure within the PIN, inside which the 
286: network is denser due to the high cooperativity to perform
287: a given cellular task. Such locally dense modules are interconnected
288: by a few global mediators, which are likely to be the hubs in the PIN \cite{han-vidal}.
289: This feature is what most existing PIN models fail to reproduce.
290: As we will show, the FC constraint that we introduce 
291: successfully accounts for the emergence of the plateau in $C(k)$.
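
The quantities $c_i$, $C$, and $C(k)$ defined above can be computed as in the following Python sketch, which assumes the network is stored as a dictionary mapping each node to the set of its neighbors. The function names are chosen for this example only.

```python
from collections import defaultdict

def local_clustering(adj, i):
    """c_i = 2 e_i / (k_i (k_i - 1)); None when k_i < 2."""
    nbrs = list(adj[i])
    k = len(nbrs)
    if k < 2:
        return None
    # e_i: number of links present among the neighbors of i
    e = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if nbrs[b] in adj[nbrs[a]]
    )
    return 2.0 * e / (k * (k - 1))

def clustering_stats(adj):
    """Return (C, C(k)): the mean of c_i over nodes with k_i >= 2,
    and the mean of c_i grouped by degree."""
    by_degree = defaultdict(list)
    for i in adj:
        c = local_clustering(adj, i)
        if c is not None:
            by_degree[len(adj[i])].append(c)
    values = [c for cs in by_degree.values() for c in cs]
    C = sum(values) / len(values) if values else 0.0
    Ck = {k: sum(cs) / len(cs) for k, cs in sorted(by_degree.items())}
    return C, Ck

# Toy network: a triangle (0, 1, 2) with a pendant node 3 attached to 0.
g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
C, Ck = clustering_stats(g)
```

In the toy network, node 0 has $c_0=1/3$, nodes 1 and 2 have $c_i=1$, and node 3 is excluded, so $C=7/9$.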
292: 
293: (c) The yeast PIN shows a disassortative degree correlation.
294: The average neighbor-degree function 
295: $\langle k_{\rm nn}\rangle(k)$ \cite{knn} is measured to be
296: $\langle k_{\rm nn} \rangle(k) \sim k^{-\nu}$
297: with $\nu\approx 0.3$, somewhat smaller than the value reported based 
298: on a single two-hybrid dataset alone~\cite{maslov}.
299: The assortativity $r$, defined as the Pearson correlation coefficient 
300: between the degrees of the two vertices on each side of 
301: a link~\cite{assort}, is measured to be $r \approx -0.13$.
302: In Table \ref{tab1}, we summarize our measurements for the topological properties 
303: of the integrated yeast PIN. 
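
A minimal Python sketch of the two degree-correlation measures is given below, again assuming an adjacency dictionary of neighbor sets. The function names are illustrative, and the assortativity is evaluated directly as the Pearson correlation over the two ends of every link, counting each link in both directions.

```python
from collections import defaultdict

def knn_by_degree(adj):
    """Average neighbor degree <k_nn>(k), grouped by node degree k."""
    groups = defaultdict(list)
    for i, nbrs in adj.items():
        if nbrs:
            mean_nbr_degree = sum(len(adj[j]) for j in nbrs) / len(nbrs)
            groups[len(nbrs)].append(mean_nbr_degree)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}

def assortativity(adj):
    """Pearson correlation r between the degrees at the two ends of a link."""
    xs, ys = [], []
    for i, nbrs in adj.items():
        for j in nbrs:  # every link appears once in each direction
            xs.append(len(adj[i]))
            ys.append(len(adj[j]))
    n = len(xs)
    mean = sum(xs) / n  # xs and ys have the same mean by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return float("nan")  # undefined when all end degrees are equal
    return cov / var

# Toy networks: a star with three leaves and a path of four nodes.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
```

The star is maximally disassortative ($r=-1$), which gives a quick sanity check of the sign convention.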
304: \begin{table}[b]
305: \caption{Topological quantities of the integrated 
306: yeast PIN and the model network. 
307: Error bars in the model results are the standard deviations of the
308: quantities from 1000 runs.}
309: \label{tab1}
310: \begin{ruledtabular}
311: \begin{tabular}{lll}
312: item & model & yeast PIN \\
313: \hline
314: total number of nodes $n$\phantom{aaa} & 6000\phantom{aaa} & $\approx$6000 \\
315: number of interacting nodes $N$\phantom{aaa} & 5079$\pm$54 & 4926 \\
316: average degree $\langle k\rangle$\phantom{aaa} & 6.5$\pm$0.3 & 6.35 \\
317: clustering coefficient $C$ & 0.13$\pm$0.02 & 0.128 \\
318: assortativity index $r$ & $-$0.09$\pm$0.04 & $-0.13$ \\
319: size of the largest component $N_1$ & 5051$\pm$53 & 4832 \\
320: \end{tabular}
321: \end{ruledtabular}
322: \end{table}
323: 
324: \begin{figure*}
325: \begin{minipage}[!t]{0.5\linewidth}
326: \flushright{\epsfxsize=6.3cm \epsfbox{fig3a.eps}}
327: \end{minipage}\hfill
328: \begin{minipage}[!t]{0.5\linewidth}
329: \flushleft{\epsfxsize=6.3cm \epsfbox{fig3b.eps}}
330: \end{minipage}\hfill
331: \caption{Comparison between the degree correlation profiles of (a) the 
332: yeast PIN and (b) the model network. The color code denotes the value
333: of $\log_{10}[P(k,k')/P_{\rm random}(k,k')]$. The randomized networks
334: are generated by the switching method \cite{maslov} 
335: that conserves the degree sequence.\\
336: }
337: \label{corr}
338: \end{figure*}
339: 
340: {\em Results}--- Now we compare the simulation results of our model with the empirical data. 
341: In typical simulations, 
342: we employed $\alpha=0.8$ and $\delta=0.7$. The value of $\delta$ was 
343: chosen to accommodate the fact that superfamilies exhibit extensive
344: sequence diversity~\cite{todd}. The value of $\alpha$ was set to match
345: the empirical value of the average degree of the PIN,
346: $\langle k\rangle\simeq 6.4$. Also, we matched approximately the numbers
347: of protein families and proteins with those of budding yeast, as we
348: described before.
349: The results obtained from the model show 
350: good agreement with the empirical data, as shown in Fig.~2 and Table \ref{tab1}. 
351: In Fig.~2, we also show the results of the model without implementing
352: FC, which is similar to the model of Sol\'e {\it et al.}~\cite{sole}.
353: One can clearly see that without FC, we cannot 
354: account for the clustering and the degree correlation characteristics.
355: We also examine the full degree-correlation profile of 
356: the joint probability $P(k,k')$ that two proteins with degrees $k$ and
357: $k'$ are connected to each other. 
358: The degree-correlation intensity is quantified by $P(k,k')/P_{\rm random}(k,k')$,
359: the ratio with the joint probability in the randomized ensemble of
360: the original network \cite{maslov,sole03}.
361: As shown in Fig.~3, the profile obtained from the model 
362: has a pattern that is quite similar to that of the empirical yeast PIN.
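
The degree-correlation profile can be tabulated as in the Python sketch below. It assumes that an ensemble of degree-preserving randomized copies of the network has already been generated and is passed in as a list of adjacency dictionaries; the function names are hypothetical. Because the randomization conserves the degree sequence, the ratio of link counts equals the ratio of joint probabilities.

```python
import math
from collections import Counter

def joint_degree_counts(adj):
    """Count links by the degrees (k, k') of their two ends.

    Every link is counted in both directions, so the table is symmetric.
    """
    counts = Counter()
    for i, nbrs in adj.items():
        for j in nbrs:
            counts[(len(adj[i]), len(adj[j]))] += 1
    return counts

def correlation_profile(adj, randomized):
    """log10 of P(k,k') / P_random(k,k') for every observed degree pair.

    randomized: list of adjacency dicts with the same degree sequence as adj.
    Pairs that never occur in the randomized ensemble are skipped.
    """
    observed = joint_degree_counts(adj)
    mean_random = Counter()
    for net in randomized:
        for key, c in joint_degree_counts(net).items():
            mean_random[key] += c / len(randomized)
    return {
        key: math.log10(observed[key] / mean_random[key])
        for key in observed
        if mean_random[key] > 0
    }

# Toy check: comparing a network with itself gives a flat profile.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
profile = correlation_profile(star, [star])
```

Positive entries mark degree pairs that are linked more often than in the randomized ensemble, negative entries those that are linked less often.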
363: 
364: \begin{figure}[t]
365: \centerline{\epsfxsize=\linewidth \epsfbox{fig4.eps}}
366: \caption{Network randomization test with and without FC.
367: (a) Clustering function $C(k)$ and 
368: (b) the clustering coefficient $C$ as functions of 
369: the number of edge shufflings are shown.
370: Symbols are for the unperturbed model network ($\bigcirc$), 
371: the network shuffled with FC ($\diamond$),
372: and the network shuffled without FC ($\Box$).
373: The horizontal line in (b) corresponds to the value of the clustering
374: coefficient in the unperturbed model network.
375: }
376: \end{figure}
377: 
378: To get further support for the relevance of the FC constraint,
379: we performed a network randomization test. We randomized the model network
380: by using the conventional edge switching method \cite{maslov}, but with the 
381: FC constraint. That is, when we are to switch the interactions
382: between the protein pairs, only the switching attempts that preserve 
383: FC are accepted. In this way, we can filter out the role of
384: FC. In Fig.~4, we show the results of randomization. We find that the
385: high clustering property of the network is preserved with randomization
386: with FC, but not without FC. Without FC, the clustering coefficient
387: drops as soon as we shuffle the network, as can be seen in Fig.~4(b). 
388: Thus, we conclude that FC indeed plays a crucial role in PIN evolution.
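
The constrained randomization can be sketched as follows. This Python fragment is an illustration under assumptions rather than the code used for Fig.~4: the names \texttt{switch\_edges}, \texttt{fam\_of}, and \texttt{fam\_adj} are hypothetical, and two proteins are taken to be compatible only when their families are linked in the PFN.

```python
import random

def switch_edges(edges, fam_of, fam_adj, attempts, rng, use_fc=True):
    """Degree-preserving edge switching, optionally constrained by FC.

    A pair of links (a, b), (c, d) is replaced by (a, d), (c, b) if the new
    links are not already present and, when use_fc is True, both new pairs
    belong to compatible families.
    """
    edges = [tuple(e) for e in edges]  # work on a copy
    present = {frozenset(e) for e in edges}

    def compatible(u, v):
        return fam_of[v] in fam_adj[fam_of[u]]

    accepted = 0
    for _ in range(attempts):
        x, y = rng.sample(range(len(edges)), 2)
        a, b = edges[x]
        c, d = edges[y]
        if rng.random() < 0.5:
            c, d = d, c  # try both ways of pairing the four ends
        if len({a, b, c, d}) < 4:
            continue  # would create a self-interaction
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a multiple link
        if use_fc and not (compatible(a, d) and compatible(c, b)):
            continue  # switching attempt violates family compatibility
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[x], edges[y] = (a, d), (c, b)
        accepted += 1
    return edges, accepted

# Toy example: families A-B and C-D are compatible, all other pairs are not.
fam_of = {0: "A", 1: "B", 2: "A", 3: "B", 4: "C", 5: "D"}
fam_adj = {"A": {"B"}, "B": {"A"}, "C": {"D"}, "D": {"C"}}
links = [(0, 1), (2, 3), (4, 5)]
with_fc, _ = switch_edges(links, fam_of, fam_adj, 200, random.Random(1))
without_fc, n_free = switch_edges(links, fam_of, fam_adj, 200,
                                  random.Random(1), use_fc=False)
```

Both variants conserve the degree of every protein; only the constrained variant keeps every link between compatible families, which is what allows the role of FC to be isolated.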
389: 
390: \begin{figure}[t]
391: \centerline{\epsfxsize=9.5cm \epsfbox{fig5.eps}}
392: \caption{Simulation results for the protein family network: 
393: (a) The family degree distribution $p_d(k_F)$ and 
394: (b) the family size distribution $p_s(s_F)$.
395: The dotted lines in (a) and (b) are fit lines to Eq.~(\ref{pk}).
396: }
397: \end{figure}
398: 
399: Finally, we check the properties of the PFN. In Fig.~5, we show the 
400: degree distribution of the PFN and the family size distribution 
401: generated {\it in silico}. The degree distribution of the PFN follows 
402: a similar form to Eq.~(\ref{pk}), but with a different value of the exponent,
403: $\gamma_f\approx 3$. The family size distribution also follows a power 
404: law with an exponent of 3$\sim$4. 
405: 
406: In summary, we have introduced an {\em in-silico} model for PIN 
407: evolution. The model network is composed of the PIN and the PFN. 
408: In the early stage of evolution, the PIN and the PFN coevolve, 
409: and in the later stage, the PFN becomes fixed.
410: The evolution proceeds by the three major mechanisms
411: previously proposed, duplication, divergence, and mutation.
412: However, it is constrained by FC and 
413: follows a modified preferential attachment rule in the domain abundance,
414: which is the new feature of our model.
415: We have checked various structural properties of the model network, finding 
416: that they show good agreement with those of the integrated empirical data 
417: of the yeast PIN. 
418: Finally, it would be interesting to apply our model to higher eukaryotes,
419: as the data for the protein interactions are accumulating for the 
420: multicellular species such as the nematode worm {\em Caenorhabditis elegans} 
421: \cite{vidal} and the fruit fly {\em Drosophila melanogaster} \cite{giot}.
422: \\
423: 
424: \begin{acknowledgments}
425: The authors would like to thank J.~Park for helpful conversation.
426: This work is supported by Korea Science and Engineering Foundation
427: grant No. R14-2002-059-01000-0 in the Advanced Basic Research Laboratory 
428: program and Ministry of Science and Technology grant No. M1 03B500000110.
429: \end{acknowledgments}
430: 
431: \begin{thebibliography}{99}
432: \bibitem{rmp} R. Albert and A.-L. Barab\'asi, Rev. Mod. Phys. {\bf 74}, 47 (2002).
433: \bibitem{advphys} S. N. Dorogovtsev and J. F. F. Mendes, Adv. Phys. {\bf 51}, 1079 (2002).
434: \bibitem{siam} M. E. J. Newman, SIAM Rev. {\bf 45}, 167 (2003).
435: \bibitem{saemulli} B. Kahng, K.-I. Goh, D.-S. Lee, and D. Kim, Saemulli, New Physics (in Korean) {\bf 48}, 115 (2004).
436: \bibitem{dslee} D.-S. Lee, K.-I. Goh, B. Kahng, and D. Kim, J. Korean Phys. Soc. {\bf 44}, 633 (2004).
437: \bibitem{han}  C. N. Yoon, S. K. Han, and H. Y. Kim, J. Korean Phys. Soc. {\bf 44}, 638 (2004).
438: \bibitem{pyramid} Z. N. Oltvai and A.-L. Barab\'asi, {Science} {\bf 298}, 763 (2002).
439: \bibitem{network-biology} A.-L. Barab\'asi and Z. N. Oltvai, Nat. Rev. Genet. {\bf 5}, 101 (2004).
440: \bibitem{uetz} P. Uetz, {\em et al.}, {Nature (London)} {\bf 403}, 623 (2000); B. Schwikowski, P. Uetz, and S. Fields, {Nat. Biotechnol.} {\bf 18}, 1257 (2000).
441: \bibitem{ito} T. Ito, T. Chiba, R. Ozawa, M. Yoshida, M. Hattori, and Y. Sakaki, {Proc. Natl. Acad. Sci.} USA {\bf 98}, 4569 (2001).
442: \bibitem{tong} A. H. Y. Tong, {\em et al.}, {Science} {\bf 295}, 321 (2002).
443: \bibitem{gavin} A.-C. Gavin, {\em et al.}, {Nature (London)} {\bf 415}, 141 (2002).
444: \bibitem{ho} Y. Ho, {\em et al.}, {Nature (London)} {\bf 415}, 180 (2002).
445: \bibitem{mips} H. W. Mewes, {\em et al.}, Nucl. Acids Res. {\bf 32}, D41 (2004).
446: \bibitem{dip} L. Salwinski, C. S. Miller, A. J. Smith, F. K. Pettit, J. U. Bowie, and D. Eisenberg, Nucl. Acids Res. {\bf 32}, D449 (2004).
447: \bibitem{bind} G. D. Bader, D. Betel, and C. W. V. Hogue, Nucl. Acids Res. {\bf 31}, 248 (2003).
448: \bibitem{alberts} B. Alberts, D. Bray, A. Johnson, J. Lewis, M. Raff, K. Roberts, and P. Walter, {\it Essential Cell Biology} (Garland, New York, 1998).
449: \bibitem{jpark} J. Park, M. Lappe, and S. A. Teichmann, {J. Mol. Biol.} {\bf 307}, 929 (2001).
450: \bibitem{huynen} M. A. Huynen and E. van Nimwegen, {Mol. Biol. Evol.} {\bf 15}, 583 (1998).
451: \bibitem{sole} R. V. Sol\'e, R. Pastor-Satorras, E. Smith, and T. Kepler, {Adv. Compl. Syst.} {\bf 5}, 43 (2002); R. Pastor-Satorras, E. D. Smith, and R. V. Sol\'e, {J. Theor. Biol.} {\bf 222}, 199 (2003).
452: \bibitem{vazquez} A. V\'azquez, A. Flammini, A. Maritan, and A. Vespignani, {ComPlexUs} {\bf 1}, 38 (2003).
453: \bibitem{kim} J. Kim, P. L. Krapivsky, B. Kahng, and S. Redner, Phys. Rev. E {\bf 66}, 055101(R) (2002).
454: \bibitem{chung} F. Chung, L. Lu, T. G. Dewey, and D. J. Galas, {J. Comput. Biol.} {\bf 18}, 1486 (2003).
455: \bibitem{berg} J. Berg, M. L\"assig, and A. Wagner, BMC Evol. Biol. {\bf 4}, 51 (2004).
456: \bibitem{ohno} S. Ohno, {\it Evolution by Gene Duplication} (Springer-Verlag, Berlin, 1970).
457: \bibitem{superfamily} J. Gough, K. Karplus, R. Hughey, and C. Chothia, J. Mol. Biol. {\bf 313}, 903 (2001). 
458: \bibitem{lethal} H. Jeong, S. P. Mason, A.-L. Barab\'asi, and Z. N. Oltvai, {Nature (London)} {\bf 411}, 41 (2001).
459: \bibitem{wagner} A. Wagner, {Mol. Biol. Evol.} {\bf 18}, 1283 (2001).
460: \bibitem{maslov} S. Maslov and K. Sneppen, {Science} {\bf 296}, 910 (2002).
461: \bibitem{ab} R. Albert and A.-L. Barab\'asi, Phys. Rev. Lett. {\bf 85}, 5234 (2000).
462: \bibitem{koonin} E. V. Koonin, Y. I. Wolf, and G. P. Karev, {Nature (London)} {\bf 420}, 218 (2002).
463: \bibitem{vespignani2} A. V\'azquez, R. Pastor-Satorras, and A. Vespignani, Phys. Rev. E {\bf 65}, 066130 (2002).
464: \bibitem{ravasz} E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A.-L. Barab\'asi, Science {\bf 297}, 1551 (2002); E. Ravasz and A.-L. Barab\'asi, Phys. Rev. E {\bf 67}, 026112 (2003). 
465: \bibitem{han-vidal} J.-D. Han, {\em et al.}, Nature (London) {\bf 430}, 88 (2004).
466: \bibitem{knn} R. Pastor-Satorras, A. V\'azquez, and A. Vespignani, Phys. Rev. Lett. {\bf 87}, 258701 (2001).
467: \bibitem{assort} M. E. J. Newman, {Phys. Rev. Lett.} {\bf 89}, 208701 (2002).
468: \bibitem{todd} A. E. Todd, C. A. Orengo, and J.~M. Thornton, {J. Mol. Biol.} {\bf 307}, 1113 (2001).
469: \bibitem{sole03} R.~V. Sol\'e and P. Fern\'andez, (arXiv:q-bio.GN/0312032). 
470: \bibitem{vidal} S. Li, {\em et al.}, Science {\bf 303}, 540 (2004).
471: \bibitem{giot} L. Giot, {\em et al.}, Science {\bf 302}, 1727 (2003).
472: \end{thebibliography}
473: \end{document}
474: 
475: