1: \documentclass[aps,twocolumn]{revtex4}
2: \usepackage[dvips]{graphicx}
3: \usepackage{color}
4: %\usepackage{AMSfonts}
5: %\usepackage{amssymb,amsmath}
6:
7: \begin{document}
8:
9: \title{Connectivity and expression in protein networks: \\
10: Proteins in a complex are uniformly expressed}
11:
12: % use optional labels to link authors explicitly to addresses:
13: % \author[label1,label2]{}
14: % \address[label1]{}
15: % \address[label2]{}
16:
17: \author{Shai Carmi$^1$, Erez Y. Levanon$^2$, Shlomo Havlin$^1$, Eli Eisenberg$^3$}
18:
19: \affiliation{$^1$Minerva Center and Dept.\ of Physics,
20: Bar-Ilan University, Ramat-Gan 52900, Israel}
21: \affiliation{$^2$Compugen Ltd., 72 Pinhas Rosen St., Tel-Aviv 69512, Israel}
22: \affiliation{$^3$School of Physics and Astronomy, Raymond and Beverly Sackler
23: Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel}
24:
25: \begin{abstract}
26: We explore the interplay between the protein-protein interactions network
27: and the expression of the interacting proteins.
28: It is shown that interacting proteins are expressed in
29: significantly more similar cellular concentrations. This is
30: largely due to interacting pairs which are part of protein
31: complexes. We solve a generic model of complex formation and show
32: explicitly that complexes form most efficiently when their
33: members have roughly the same concentrations. Therefore, the
34: observed similarity in interacting protein concentrations could be
35: attributed to optimization for efficiency of complex formation.
36: \end{abstract}
37:
38: %\begin{keyword}
39: % keywords here, in the form: keyword \sep keyword
40:
41: % PACS codes here, in the form: \PACS code \sep code
42: %\PACS
43: %\end{keyword}
44: \maketitle
45:
46: \section{Introduction}
47:
Statistical analysis of the topology of real-world networks has attracted
much interest in recent years, supplying new insights and
50: ideas to many diverse fields. In particular, the protein-protein
51: interaction network, combining many different interactions of
52: proteins within a cell, has been the subject of many studies
53: (for a recent review see \cite{Barabasi}).
54: While this network shares many of the universal
55: features of natural networks such as the scale-free distribution
56: of degrees \cite{Jeong}, and the small world characteristics
57: \cite{flag}, it also has some unique features. One of the most
58: important of these is arguably the fact that the protein interactions
59: underlying this network can be separated into two roughly disjoint
60: classes. One of them relates to transmission of information within
61: the cell: protein A interacts with protein B and changes it, by a
62: conformational or chemical transformation. The usual scenario
63: after such an interaction is that the two proteins disassociate
64: shortly after the completion of the transformation. On the other
65: hand, many protein interactions are aimed at the formation of a
66: protein complex. In this mode of operation the physical attachment
67: of two or more proteins is needed in order to allow for the
68: biological activity of the combined complex, and is typically
69: stable over relatively long time scales \cite{Han}.
70:
71: The yeast {\it Saccharomyces cerevisiae} serves as the model
organism for most analyses of protein-protein interaction
networks. The complete set of genes and proteins with extensive
74: data on gene expression are available \cite{Cherry} for this
75: unicellular organism, accompanied by large datasets of
76: protein-protein interactions based on a wide range of experimental
77: and computational methods
78: \cite{synexp,mrna,genefus,hms,y2h,synleth,2nei,tap,von Mering}.
79: In addition, the intracellular locations
80: and the expression levels of most proteins of the yeast were recently
81: reported \cite{Ghaemmaghami}. The availability of such data enables
82: us to study the relationship between network topology and the
83: expression levels of each protein.
84:
85: In this work we demonstrate the importance of the distinction
86: between different types of protein interaction, by highlighting
87: one property which is unique to interactions of the protein
88: complexes. Combining databases of yeast protein interactions
89: with the recently reported information on the protein
90: concentration, we find that proteins belonging to the same complex
91: tend to have a more uniform concentration distribution. We further
92: explain this finding by a model of complex formation, showing that
93: uneven concentrations of the complex members result in inefficient
94: complex formation. Surprisingly, in some cases increasing the
95: concentration of one of the complex ingredients decreases the
absolute number of complexes formed. Thus, the experimental
observation of uniform concentrations among complex members can be
explained in terms of selection for efficiency.
99:
100: \section{Concentrations of Interacting Proteins}
101:
102: We start by studying the concentrations of pairs of interacting proteins,
103: and demonstrate that different types of protein-protein interactions differ
104: in their properties. For this purpose we use the recently published database
105: providing the (average) concentration \cite{Ghaemmaghami},
106: as well as the localization within the cell,
107: for most of the {\it Saccharomyces cerevisiae}
108: (baker's yeast) proteins \cite{Huh}.
109: The concentrations $c_i$ (given in arbitrary units)
110: are approximately distributed according to a log-normal distribution
with $\langle \log(c_i)\rangle=7.89$
and standard deviation $1.53$ (Fig. \ref{lognormal}).
113:
114:
115: \begin{figure}[htb]
116: \centering
117: \includegraphics[totalheight=1.5in]{conc-pdf.eps}
118: \caption{ {(Color online) Distribution of the logarithm of the protein
119: concentration (in units of protein molecules per cell) for all measured
120: proteins within the yeast cell.}}
121: \label{lognormal}
122: \end{figure}
123:
124:
Baker's yeast serves as a model organism for most of the
126: protein-protein interaction network studies. Thus a set of many of
127: its protein-protein interactions is also readily available. Here
128: we use a dataset of recorded yeast protein interactions, given
129: with various levels of confidence \cite{von Mering}. The dataset
130: lists about $80000$ interactions between approximately $5300$
131: of the yeast proteins (or about $12000$ interactions between
132: $2600$ proteins when excluding interactions of the lowest
133: confidence). These interactions were deduced by many different
134: experimental methods, and describe different biological relations
135: between the proteins involved. The protein interaction network
136: exhibits a high level of clustering (clustering coefficient
137: $\approx0.39$). This is partly due to the existence of many sets
138: of proteins forming complexes, where each of the complex members
139: interacts with many other members.
140:
141: Combining these two databases, we study the correlation between the
142: (logarithm of) concentrations of pairs of interacting
143: proteins. In order to gain insight into the different components
144: of the network, we perform this calculation separately for the interactions
145: deduced by different experimental methods.
146: For simplicity, we report here the results after excluding the interactions
147: annotated as
148: low-confidence (many of which are expected to be false-positives).
149: We have explicitly checked that their inclusion does not change the results
150: qualitatively. The results are summarized in Table \ref{table}, and show a
151: significant correlation between the expression levels of
152: interacting proteins.
153:
154: \begin{widetext}
155: \begin{table}[b]
\begin{tabular}{|*{7}{c|}}
157: \hline Interaction & Number of & Number of & Number
158: of & Correlation & STD of & P-Value \\
159: & interacting & interactions & interactions in &
160: between & random & \\
161: & proteins && which expression & expression & correlations \\
162: &&& level is & levels of && \\
163: &&& known for & interacting && \\
164: &&& both proteins & proteins && \\
165: \hline
166: All & 2617 & 11855 & 6347 & 0.167 & 0.012 & $10^{-42}$ \\
167: \hline
168: Synexpression\cite{synexp,mrna}
169: & 260 & 372 & 200 & 0.4 & 0.065 & $3.5\cdot10^{-10}$ \\
170: \hline
171: Gene Fusion\cite{genefus} & 293 & 358 & 174 & -0.079 & - & - \\
172: \hline
173: HMS\cite{hms} & 670 & 1958 & 1230 & 0.164 & 0.027 & $3.3\cdot10^{-10}$ \\
174: \hline
175: yeast 2-Hybrid\cite{y2h}& 954 & 907 & 501 & 0.097 & 0.046 & $1.7 \cdot 10^{-2}$ \\
176: \hline
177: Synthetic Lethality\cite{synleth} & 678 & 886 & 497 & 0.285 & 0.045 & $1.2\cdot10^{-10}$ \\
178: \hline
179: 2-neighborhood\cite{2nei} & 998 & 6387 & 3110 & 0.054 & 0.016 & $5.4\cdot10^{-4}$ \\
180: \hline
181: TAP\cite{tap} & 806 & 3676 & 2239 & 0.291 & 0.02 & $10^{-49}$ \\
182: \hline
183: \end{tabular}
184: \caption{ (Color online) Correlation coefficients between the logarithm of the
185: concentrations of interacting proteins. Only interactions of medium
186: or high confidence were included. The statistical
187: significance of the results was estimated by randomly permuting
188: the concentrations of the proteins and reevaluating the
189: correlation on the same underlying network, repeated for 1,000
190: different permutations. The mean correlation of the randomly
191: permuted networks was zero, and the standard deviation (STD) is given.
The P-value was calculated assuming a Gaussian distribution of the
correlation values for the randomized networks. We have verified
that the distributions of the 1,000 realizations calculated are
roughly Gaussian.} \label{table}
196: \end{table}
197: \end{widetext}
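
As an illustration of how the entries of Table \ref{table} relate to each
other, assume that the quoted P-value is the one-sided Gaussian tail
probability of the observed correlation $r$, measured in units of the
standard deviation $\sigma_{\rm rand}$ of the permuted networks (this reading
is an assumption, and the symbols $z$ and $\sigma_{\rm rand}$ are introduced
here only for the illustration):
\[
z = \frac{r}{\sigma_{\rm rand}}, \qquad
P \approx \frac{1}{\sqrt{2\pi}}\int_z^{\infty} e^{-x^2/2}\,dx .
\]
For the yeast 2-Hybrid row, $z = 0.097/0.046 \approx 2.1$, which gives
$P\approx 1.7\cdot 10^{-2}$; for the synthetic lethality row,
$z = 0.285/0.045 \approx 6.3$ and $P\approx 1.2\cdot 10^{-10}$. Both agree
with the tabulated values.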
198:
199: The strongest correlation is seen for the subset of protein
200: interactions which were derived from synexpression, i.e. inferred
201: from correlated mRNA expression. This result confirms the common
202: expectation that genes with correlated mRNA expression would yield
203: correlated protein levels as well\cite{mrna}. However, our results show that
204: interacting protein pairs whose interaction was deduced by other methods
205: exhibit significant positive correlation as well. The effect is weak for
206: the yeast 2-Hybrid (Y2H) method\cite{y2h} which includes all possible
207: physical interactions between the proteins (and is also known to
208: suffer from many artifacts and false-positives), but stronger for
209: the HMS (High-throughput Mass Spectrometry)\cite{hms} and TAP
210: (Tandem-Affinity Purification)\cite{tap} interactions corresponding to
211: actual physical interactions (i.e., experimental evidence that the
proteins actually bind together {\it in vivo}). These experimental
213: methods are specifically designed to detect cellular protein
214: complexes. The above results thus hint that the overall
215: correlation between concentrations of interacting proteins is due
216: to the tendency of proteins which are part of a stable complex to
217: have similar concentrations.
218:
219: The same picture emerges when one counts the number of
220: interactions a protein has with other proteins of similar
221: concentration, compared to the number of interactions with
222: randomly chosen proteins. A protein interacts, on average, with
223: $0.49\%$ of the proteins with similar expression level (i.e.,
224: $|$log-difference$| < 1$), as opposed to only $0.36\pm 0.01~\%$ of
225: random proteins, in agreement with the above observation of
226: complex members having similar protein concentrations.
227:
228: In order to directly test this hypothesis (i.e. that proteins in a
229: complex have similar concentrations), we use existing datasets of
230: protein complexes and study the uniformity of concentrations of
231: members of each complex. The complexes data were taken from
232: \cite{complexes}, and were found to have many TAP interactions
233: within them.
234: %Each complex is a list of
235: %proteins; we got $\approx1000$ complexes, of average size $7.4$
236: %and a wide distribution of sizes (standard deviation $9.06$).
237: %First, we compared the complex list to the TAP network. More than
238: %one half of the proteins that participate in a complex were also
239: %found to interact by the TAP method, and the average connectivity
240: %within a complex (i.e., the fraction of pairs of complex members
241: %which interact according to the TAP database, out of the possible
242: %$n(n-1)/2$ possible pairs) is roughly one third.
243: As a measure of
244: the uniformity of the expression levels within each complex, we
245: calculate the variance of the (logarithm of the)
246: concentrations among the members of each complex.
247: The average variance (over all complexes) is found to be
248: $2.35$, compared to $2.88\pm0.07$ and $2.74\pm0.11$
for randomized complexes in two different randomization schemes (see
Fig. \ref{unity}), confirming that the concentrations of complex members
251: tend to be more uniform than a random set of proteins.
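
Explicitly, for a complex with $n$ members of concentrations
$c_1,\ldots,c_n$, the uniformity measure can be written as follows (a sketch;
the $1/n$ normalization and the notation $\sigma^2_{\rm complex}$ are
assumptions made for the illustration):
\[
\sigma^2_{\rm complex} = \frac{1}{n}\sum_{i=1}^{n}
\left(\log c_i - \frac{1}{n}\sum_{j=1}^{n}\log c_j\right)^2 .
\]
In units of the spread of the randomized ensembles, the observed average
variance lies $(2.88-2.35)/0.07\approx 7.6$ and $(2.74-2.35)/0.11\approx 3.5$
standard deviations below the two randomized values, respectively.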
252:
253: \begin{figure}[htb]
254: \centering
255: \includegraphics[totalheight=1.5in]{comp-penta-std.eps}
256: \caption{{(Color online) (a) Variance of the logarithm of the
protein expression levels (in units of molecules per cell)
for members of real complexes, averaged over all complexes,
compared with the averaged variance of the complexes after
260: randomization of their members, letting each protein participate on average in
261: the same number of complexes (random(1)),
262: as well as randomized complexes where the number
263: of complexes each protein participates in is kept fixed (random(2)).
264: Real complexes have a lower variance, indicating higher uniformity in the
265: expression levels of the underlying proteins. (b) Same as (a) for
266: expression levels in pentagons (see text).}} \label{unity}
267: \end{figure}
268:
269:
270: As another test, we study a different yeast protein interaction
271: network, the one from the DIP database \cite{Xenarios}. We look
272: for fully-connected sub-graphs of size $5$, which are expected to
273: represent complexes, sub-complexes or groups of proteins working
274: together. The network contains approximately $1600$ (highly overlapping)
such pentagons, made of about $300$ different proteins. The
variance of the logarithm of the concentrations of the members of each
pentagon, averaged over the different pentagons, is $1.234$.
As before, this variance is significantly lower than that of random
sets of five proteins (average variance $1.847 \pm 0.02$ and $1.718\pm0.21$);
see Fig. \ref{unity}.
281:
282: Finally, we have used mRNA expression data \cite{mrna} and looked
283: for correlated expression patterns within complexes. We have
284: calculated the correlation coefficient between the expression data
285: of the two proteins for each pair of proteins which are part of
286: the same pentagon. The average correlation coefficient between
287: proteins belonging to the same fully-connected pentagon is $0.15$
288: compared to $0.056\pm0.005$ for a random pair.
289:
290: In summary, combination of a number of yeast protein interaction
291: networks with protein and mRNA expression data yields the
292: conclusion that interacting proteins tend to have similar
293: concentrations. The effect is stronger when focusing on
294: interactions which represent stable physical interactions, i.e.
295: complex formation, suggesting that the overall effect is largely
296: due to the uniformity in the concentrations of proteins belonging
297: to the same complex. In the next Section we explain this finding
298: by a model of complex formation. We show, on general grounds, that
complex formation is more effective when the concentrations of its
constituents are roughly the same. Thus, the observation made in
301: the present Section can be explained by selection for efficiency
302: of complex formation.
303:
304: \section {Model}
305: Here we study a model of complex formation, and explore the
306: effectiveness of complex production as a function of the relative abundances of
its constituents. For simplicity, we start with a detailed analysis of the
production of a three-component complex, which already captures most
of the important effects.
310:
311: Denote the concentrations of the three components of the complex
312: by $A$, $B$ and $C$, and the concentrations
313: of the complexes they form by $AB$, $AC$, $BC$ and $ABC$.
314: The latter is the concentration of the full complex, which is
315: the desired outcome of
316: the production, while the first three describe the different sub-complexes
317: which are formed (in this case, each of which is composed of two components).
318: Three-body processes, i.e., direct generation
(or decomposition) of $ABC$ out of $A$, $B$ and $C$, can usually be neglected
320: \cite{book}, but their inclusion here does not complicate the analysis.
321: The resulting set of reaction kinetic equations is given by
322:
323: \begin{widetext}
324: \begin{eqnarray}
325: \frac{d(A)}{dt} & = & k_{d_{A,B}} AB + k_{d_{A,C}} AC +
326: (k_{d_{A,BC}} + k_{d_{A,B,C}}) \cdot ABC \nonumber\\ && - k_{a_{A,B}} A
327: \cdot B - k_{a_{A,C}} A \cdot C - k_{a_{A,BC}} A \cdot BC -
328: k_{a_{A,B,C}} A \cdot B \cdot C \\
329: \frac{d(B)}{dt}& = & k_{d_{A,B}} AB + k_{d_{B,C}} BC +
330: (k_{d_{B,AC}} + k_{d_{A,B,C}}) \cdot ABC \nonumber\\ &&- k_{a_{A,B}} A
331: \cdot B - k_{a_{B,C}} B \cdot C - k_{a_{B,AC}} B \cdot AC - k_{a_{A,B,C}} A \cdot B \cdot C\\
332: \frac{d(C)}{dt}& = & k_{d_{A,C}} AC + k_{d_{B,C}} BC +
333: (k_{d_{C,AB}} + k_{d_{A,B,C}}) \cdot ABC \nonumber\\ && - k_{a_{A,C}} A
334: \cdot C - k_{a_{B,C}} B \cdot C -
335: k_{a_{C,AB}} C \cdot AB - k_{a_{A,B,C}} A \cdot B \cdot C\\
336: \frac{d(AB)}{dt} & = & k_{a_{A,B}} A \cdot B + k_{d_{C,AB}} ABC -
337: k_{d_{A,B}} AB - k_{a_{C,AB}} C \cdot AB\\
338: \frac{d(AC)}{dt} & = & k_{a_{A,C}} A \cdot C + k_{d_{B,AC}} ABC -
339: k_{d_{A,C}} AC - k_{a_{B,AC}} B \cdot AC\\
340: \frac{d(BC)}{dt} & = & k_{a_{B,C}} B \cdot C + k_{d_{A,BC}} ABC -
341: k_{d_{B,C}} BC - k_{a_{A,BC}} A \cdot BC\\
342: \frac{d(ABC)}{dt} & = & k_{a_{A,BC}} A \cdot BC + k_{a_{B,AC}} B
343: \cdot AC + k_{a_{C,AB}} C \cdot AB + k_{a_{A,B,C}} A \cdot B \cdot
344: C \nonumber\\ && - (k_{d_{A,BC}} + k_{d_{B,AC}} + k_{d_{C,AB}} +
345: k_{d_{A,B,C}}) \cdot ABC
346: \end{eqnarray}
347: \end{widetext}
348: where $k_{a_{x,y}}$ ($k_{d_{x,y}}$) are the association
349: (dissociation) rates of the subcomponents $x$ and $y$ to form the
350: complex $xy$. Denoting the total number of type $A$, $B$ and $C$
351: particles by $A_0$, $B_0$, $C_0$, respectively, we may write the
352: conservation of material equations:
353: \begin{eqnarray}
354: A + AB + AC + ABC = A_0\\
355: B + BC + AB + ABC = B_0\\
C + AC + BC + ABC = C_0
357: \end{eqnarray}
358:
359: We look for the steady-state solution of these equations, where
360: all time derivatives vanish. For simplicity, we consider first the
totally symmetric situation, where all the ratios of dissociation
coefficients to their corresponding association coefficients are equal,
i.e., the ratios $k_{d_{x,y}}/k_{a_{x,y}}$ are all equal to
$X_0$ and $k_{d_{x,y,z}}/k_{a_{x,y,z}}=X_0^2$, where $X_0$ is a constant with
units of concentration. In this case,
366: measuring all concentrations in units of $X_0$, all the
367: reaction equations are solved by the substitutions $AB = A \cdot
368: B$, $AC = A \cdot C$, $BC = B \cdot C$ and $ABC = A \cdot B \cdot
369: C$, and one needs only to solve the material conservation
370: equations, which take the form:
371: \begin{eqnarray}
372: A + A \cdot B + A \cdot C + A \cdot B \cdot C = A_0\\
373: B + B \cdot C + A \cdot B + A \cdot B \cdot C = B_0\\
C + A \cdot C + B \cdot C + A \cdot B \cdot C = C_0
375: \end{eqnarray}
These equations allow for an exact and straightforward (albeit
377: cumbersome) analytical solution. In the following, we explore the
378: properties of this solution. The efficiency of the production of
379: $ABC$, the desired complex, can be measured by the number of
380: formed complexes relative to the maximal number of complexes
possible given the initial concentrations of supplied particles,
${\rm eff} \equiv ABC / \min{(A_0,B_0,C_0)}$. This definition does
not take into account the obvious waste resulting from proteins of
the more abundant species which are bound to be left over due to
shortage of proteins of the other species. In the following we
show that having unmatched concentrations of the different complex
components results in lower efficiency beyond this obvious waste.
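
The substitutions $AB = A\cdot B$, etc., used above can be verified directly;
the following is a short check rather than a new result. Inserting
$k_{d_{A,B}} = X_0\, k_{a_{A,B}}$ and $k_{d_{C,AB}} = X_0\, k_{a_{C,AB}}$
into the equation for $AB$ gives, before rescaling the concentrations,
\[
\frac{d(AB)}{dt} = k_{a_{A,B}}\left(A\cdot B - X_0\, AB\right)
+ k_{a_{C,AB}}\left(X_0\, ABC - C\cdot AB\right).
\]
Each bracket vanishes separately for $AB = A\cdot B/X_0$ and
$ABC = A\cdot B\cdot C/X_0^2$, i.e., for $AB=A\cdot B$ and
$ABC = A\cdot B\cdot C$ once all concentrations are measured in units of
$X_0$. The same cancellation occurs term by term in the other reaction
equations.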
388:
389: In the linear regime, $A_0, B_0, C_0 \ll 1$, the fraction of
390: particles forming complexes is small, and all concentrations are
391: just proportional to the initial concentrations. The overall
392: efficiency of the process in this regime is extremely low,
393: $ABC=A\cdot B\cdot C \sim A_0\cdot B_0\cdot C_0\ll A_0,B_0,C_0$.
394: We thus go beyond this trivial linear regime, and focus on the
395: region where all concentrations are greater than unity. Fig.
396: \ref{1k} presents the efficiency as a function of $A_0$ and $B_0$,
397: for fixed $C_0 = 10^2$. The efficiency is maximized when the two
398: more abundant components have approximately the same concentration,
399: i.e., for $A_0 \approx B_0$ (if $C_0<A_0,B_0$), for $A_0\approx C_0=10^2$
400: (if $B_0<A_0,C_0$) and for $B_0\approx C_0=10^2$
401: (if $A_0<B_0,C_0$).
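
As a simple worked example of the two regimes (the symbol $a$ is introduced
here only for illustration), consider fully matched concentrations,
$A_0=B_0=C_0$, for which $A=B=C\equiv a$ and each conservation equation
reduces to
\[
a + 2a^2 + a^3 = a\,(1+a)^2 = A_0 .
\]
For $A_0=10^{-2}$ one finds $a\approx A_0$ and
${\rm eff}=a^3/A_0\approx A_0^2 = 10^{-4}$, whereas for $A_0=10^2$ the
solution is $a=4$, so that $AB=AC=BC=16$, $ABC=64$ and ${\rm eff}=0.64$.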
402:
403: \begin{figure}[htb]
404: \centering
405: \includegraphics[totalheight=1.5in]{1-k-A0B0.eps}
406: \caption{{(Color online) The efficiency of the synthesis ${\rm eff} \equiv
407: ABC / \min{(A_0,B_0,C_0)}$ as a function of $A_0$ and $B_0$, for
408: $C_0=10^2$. The efficiency is maximized when the two most
409: abundant species have roughly the same concentration.}} \label{1k}
410: \end{figure}
411:
412: Moreover, looking at the absolute quantity of the complex product,
one observes (fixing the concentrations of two of the substances,
414: e.g., $B_0$ and $C_0$) that $ABC$ itself has a maximum at some
415: finite $A_0$, i.e., there is a finite optimal concentration for
416: $A$ particles (see Fig. \ref{a0max}). Adding more molecules of
417: type $A$ beyond the optimal concentration {\it decreases} the
418: amount of the desired complexes. The concentration that maximizes
419: the overall production of the three-component complex is
420: $A_{0,max} \approx \max{(B_0,C_0)}$.
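
This behavior can be checked by hand in a special case (a sketch; the symbol
$b$ is introduced only for illustration). Each conservation equation of the
symmetric model factorizes,
\begin{eqnarray*}
A\,(1+B)(1+C) &=& A_0,\\
B\,(1+A)(1+C) &=& B_0,\\
C\,(1+A)(1+B) &=& C_0,
\end{eqnarray*}
so that for $B_0=C_0$, where $B=C\equiv b$, every value of $b$ yields a
solution through
\[
1 + A = \frac{B_0}{b\,(1+b)}, \qquad A_0 = A\,(1+b)^2, \qquad ABC = A\, b^2 .
\]
Taking $B_0=C_0=10^2$ and $b=5,4,3,1$ gives $A_0=84$, $100$,
$352/3\approx 117$, $196$ and $ABC = 175/3\approx 58$, $64$, $66$, $49$,
respectively: the yield first rises and then falls as $A_0$ grows past a
value close to $\max{(B_0,C_0)}$.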
421:
422: \begin{figure}[htb]
423: \centering
424: \includegraphics[totalheight=1.5in]{A0-max-image.eps}
425: \caption{(Color online) {$\log{(ABC)}$ as a function of $A_0,B_0$, for fixed
426: $C_0 = 10^2$.
427: For each row (fixed $A_0$) or column (fixed $B_0$) in the graph,
428: $ABC$ has a maximum, which occurs where $A_{0,max}
429: \approx\max{(B_0,C_0)}$ (for columns), and $B_{0,max}
430: \approx\max{(A_0,C_0)}$ (for rows).}} \label{a0max}
431: \end{figure}
432:
433:
434: An analytical solution is available for a somewhat more general situation,
435: allowing the ratios $k_{d_{x,y}}/k_{a_{x,y}}$ to take different
values for the two-component association/dissociation ($X_0$) and
the three-component association/dissociation ($X_0/\alpha$ and
$X_0^2/\alpha$ for association/dissociation of the three-component complex
439: from/to a two-component complex plus one single particle or to three
440: single particles, respectively).
441: It can be easily seen that under these conditions, and measuring the
442: concentration in units of $X_0$ again,
443: the solution of the reaction kinetics equations is given by
444: \begin{eqnarray}
445: AB & = & A \cdot B, \\
446: AC & = & A \cdot C, \\
447: BC & = & B \cdot C, \\
448: ABC & = & \alpha ~ A \cdot B \cdot C,
449: \end{eqnarray}
450: and therefore the conservation of material equations take the form
451: \begin{eqnarray}
452: A + A \cdot B + A \cdot C + \alpha A \cdot B \cdot C = A_0\\
453: B + B \cdot C + A \cdot B + \alpha A \cdot B \cdot C = B_0\\
C + A \cdot C + B \cdot C + \alpha A \cdot B \cdot C = C_0
455: \end{eqnarray}
These equations are also amenable to an analytical solution, and
457: one finds that taking $\alpha$ not equal to $1$
458: does not qualitatively change the above results. In particular,
459: the synthesis is most efficient when the two highest concentrations are
460: roughly equal, see Fig. \ref{4k}. Note that our results hold
461: even for $\alpha\gg 1$, where the three-component complex is much more stable
462: than the intermediate $AB$, $AC$, and $BC$ states.
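
For instance, in the fully matched case $A_0=B_0=C_0$ with $A=B=C\equiv a$
(the symbol $a$ is used here only for illustration), each conservation
equation becomes
\[
a + 2a^2 + \alpha a^3 = A_0 .
\]
For $A_0 = 10^2$ and $\alpha=97$ the solution is $a=1$, giving
$ABC=\alpha a^3=97$ and ${\rm eff}=0.97$, compared with $a=4$ and
${\rm eff}=0.64$ for $\alpha=1$. In this example a more stable
three-component complex raises the efficiency at matched concentrations.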
463:
464: \begin{figure}[htb]
465: \centering
466: \includegraphics[totalheight=2.5in]{4-k-A0B0.eps}
467: \caption{(Color online) {Synthesis efficiency ${\rm eff}\equiv ABC /
468: \min{(A_0,B_0,C_0)}$ as a function of $A_0$ and $B_0$, for
469: different values of $\alpha$. $C_0$ is fixed, $C_0=100$. The
470: efficiency is maximized when the two most abundant substances are
471: of roughly the same concentration, regardless of the values of
472: $\alpha$.}} \label{4k}
473: \end{figure}
474:
475: We have explicitly checked that the same picture holds for
476: 4-component complexes as well: fixing the concentrations $B_0$,
477: $C_0$, and $D_0$, the concentration of the target complex $ABCD$
478: is again maximized for $A_{0,max} \approx\max{(B_0,C_0,D_0)}$.
479: This behavior is expected to hold qualitatively for a general
480: number of components and arbitrary reaction rates, due to the
481: following argument: Assume a complex is to be produced from many
482: constituents, one of which ($A$) is far more abundant than the
483: others ($B$, $C$, ...). Since $A$ is in excess, almost all $B$
particles will bind to $A$ and form $AB$ complexes. Similarly,
almost all $C$ particles will bind to $A$ to form $AC$
complexes. Thus, there will be very few free $C$ particles to bind
to the $AB$ complexes, and very few free $B$ particles available
for binding with the $AC$ complexes. As a result, one gets
relatively many half-done $AB$ and $AC$ complexes, but not the
desired $ABC$ (note that $AB$ and $AC$ cannot bind together).
Lowering the concentration of $A$ particles allows more $B$ and
$C$ particles to remain in an unbound state, and thus {\it
increases} the total production rate of $ABC$ complexes (Fig.
494: \ref{ABCAB}).
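
The argument can be made quantitative in the symmetric three-component model
(a sketch, valid only for $A_0\gg B_0, C_0, 1$). Writing the conservation
equations as $A(1+B)(1+C)=A_0$, $B(1+A)(1+C)=B_0$ and $C(1+A)(1+B)=C_0$, the
excess of $A$ implies $B, C \ll 1$, so that $A\approx A_0$,
$B\approx B_0/A_0$ and $C\approx C_0/A_0$. Hence
\[
AB \approx B_0, \qquad AC \approx C_0, \qquad
ABC = A\cdot B\cdot C \approx \frac{B_0\, C_0}{A_0},
\]
i.e., the partial complexes saturate while the yield of $ABC$ decays as
$1/A_0$.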
495:
496: \begin{figure}[htb]
497: \centering
498: \includegraphics[totalheight=1.5in]{ABCvsAB.eps}
499: \caption{(Color online) The dimensionless
500: concentrations of the complex $ABC$ (solid line),
501: partial complex $AB$ (dashed line), and $C$ (dotted line) as a
502: function of the total concentration of $A$ particles, $A_0$ ($C$
503: is multiplied by 10 for visibility). $B_0$ and $C_0$ are fixed
504: $B_0 = C_0 = 10^3$. The maximum of $ABC$ for finite $A_0$ is a
505: result of the balance between increase in the number of $AB$ and
506: $AC$ complexes and the decrease in the number of available free
507: $B$ and $C$ particles as $A_0$ increases.} \label{ABCAB}
508: \end{figure}
509:
510: Many proteins take part in more than one complex. One might thus wonder
what the optimal concentration for these proteins is, and how it affects
512: the general correlation observed between the concentrations of
513: members of the same complex. In order to clarify this issue,
514: we have studied a model in which four proteins $A$, $B$, $C$ and $D$
515: bind together to form two desired products: the $ABC$ and $BCD$ complexes.
516: $A$ and $D$ do not interact, so that there are no complexes or
517: sub-complexes of the type $AD$, $ABD$, $ACD$ and $ABCD$. Solution of this
518: model (see appendix) reveals that the efficiency of the production of $ABC$
519: and $BCD$ is maximized when (for a fixed ratio of $A_0$ and $D_0$)
520: $A_0+D_0\approx B_0\approx C_0$. One thus sees, as could have been expected,
521: that proteins that are
522: involved in more than one complex
523: (like $B$ and $C$ in the above model) will tend to have higher concentrations
524: than other members of the same complex participating in only one complex.
525: Nevertheless, since the protein-protein interaction network is scale-free,
most proteins take part in a small number of complexes, and only a very
small fraction participate in many complexes. Moreover, given the three
orders of magnitude spread in protein concentrations (see Fig.
\ref{lognormal}),
only proteins participating in a very large number of complexes (relative to
the average participation) or participating in two complexes of very different
concentrations (i.e., $A_0\gg D_0$) will result in order-of-magnitude
deviations from the equal concentration optimum.
533: The effects of these relatively
534: few proteins on the average over all interacting proteins
535: is small enough not to destroy the concentration correlation, as we observed
536: in the experimental data.
537:
538: In summary, the solution of our simplified complex formation model
shows that the rate and efficiency of complex formation depend
540: strongly, and in a non-obvious way, on the relative concentrations
541: of the constituents of the complex. The efficiency is maximized when all
542: concentrations of the different complex constituents are roughly
543: equal. Adding more of the ingredients beyond this optimal point
544: not only reduces the efficiency, but also results in lower product
545: yield. This unexpected behavior is qualitatively explained by a
546: simple argument, and is expected to hold generally. Therefore,
547: effective formation of complexes in a network puts constraints on
the concentrations of the underlying building blocks. Accordingly,
549: one can understand the tendency of members of cellular
550: protein-complexes to have uniform concentrations, as presented in
551: the previous Section, as a selection towards efficiency.
552:
553: \appendix*
554: \section{Two coupled complexes}
555: We consider a model in which four proteins $A$, $B$, $C$ and $D$
556: bind together to form two desired products: the $ABC$ and $BCD$ complexes.
557: $A$ and $D$ do not interact, so that there are no complexes or
558: sub-complexes of the type $AD$, $ABD$, $ACD$ and $ABCD$.
559: For simplicity, we assume the totally symmetric situation,
where all the ratios of dissociation
coefficients to their corresponding association coefficients are equal,
i.e., the ratios $k_{d_{x,y}}/k_{a_{x,y}}$ are all equal to
$X_0$ and $k_{d_{x,y,z}}/k_{a_{x,y,z}}=X_0^2$, where $X_0$ is a constant with
units of concentration. The extension
to the more general case discussed in the paper is straightforward.
566: Using the same scaling
567: as above, the reaction equations are solved by the substitutions $AB = A \cdot
568: B$, $AC = A \cdot C$, $BC = B \cdot C$, $BD=B\cdot D$, $CD=C\cdot D$,
569: $ABC = A \cdot B \cdot C$, and $BCD=B\cdot C\cdot D$,
570: and one needs only to solve the material conservation
571: equations, which take the form:
572:
573: \begin{eqnarray}
574: \label{eqA}
575: &A& + A \cdot B + A \cdot C + A \cdot B \cdot C = A_0\\
576: &B& + A \cdot B + B \cdot C + B \cdot D + A \cdot B \cdot C + B \cdot C \cdot D = B_0\nonumber\\ \label{eqB}\\
577: &C& + A \cdot C + B \cdot C + C \cdot D + A \cdot B \cdot C + B \cdot C \cdot D = C_0\nonumber\\ \label{eqC} \\
578: \label{eqD}
579: &D& + B \cdot D + C \cdot D + B \cdot C \cdot D = D_0
580: \end{eqnarray}
581:
582: Denoting $\gamma \equiv \frac{D_0}{A_0}, D' \equiv \frac{D}{\gamma}$,
583: Eq (\ref{eqD}) becomes
584: \begin{equation}
585: D' + D' \cdot B + D' \cdot C + D' \cdot B \cdot C = A_0
586: \end{equation}
This is exactly the equation we wrote for $A$, Eq. (\ref{eqA}), and thus
$D = \gamma A$.
Substituting this into Eqs. (\ref{eqB}) and (\ref{eqC}), one gets
590: \begin{eqnarray}
591: \label{newB}
592: B + B \cdot C + (\gamma + 1)A \cdot B + (\gamma + 1)A \cdot B \cdot C = B_0\\
593: \label{newC}
594: C + B \cdot C + (\gamma + 1)A \cdot C + (\gamma+1)A \cdot B \cdot C = C_0
595: \end{eqnarray}
596: We now define $A' \equiv (\gamma + 1)A$, $A'_0 \equiv(\gamma+1)A_0$ and obtain
597: from (\ref{eqA},\ref{newB},\ref{newC})
598:
599: \begin{eqnarray}
600: A' + A' \cdot B + A' \cdot C + A' \cdot B \cdot C &=& A'_0\\
601: B + A' \cdot B + B \cdot C + A' \cdot B \cdot C &=& B_0\\
602: C + A' \cdot C + B \cdot C + A' \cdot B \cdot C &=& C_0
603: \end{eqnarray}
604:
605: These are the very same equations that we wrote for the 3-particles
606: case where the desired product was $ABC$. Their solution showed that
607: efficiency is maximized at $A_0 \approx B_0 \approx C_0$. We thus
608: conclude that in the present 4-component scenario, the efficiency of
609: $ABC$ and $BCD$ (for fixed $\gamma$) is maximized when
610: $(A_0+D_0)\approx B_0\approx C_0$.
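
As a worked example, take $\gamma=1$ with $A_0=D_0=50$ and $B_0=C_0=100$, so
that $A'_0=100$. The three-component equations are then solved by
$A'=B=C=4$, i.e., $A=D=2$, and
\[
ABC = BCD = 2\cdot 4\cdot 4 = 32 .
\]
This can be verified directly in Eqs. (\ref{eqA})--(\ref{eqD}): the equations
for $A$ and $D$ give $2+8+8+32=50$, and those for $B$ and $C$ give
$4+8+16+8+32+32=100$.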
611:
612: \acknowledgements{
613: We thank Ehud Schreiber for critical reading of the manuscript and
614: many helpful comments.
615: E.E. is supported by an Alon fellowship at Tel-Aviv University.}
616:
617: \begin{thebibliography}{10}
618: \bibliographystyle{apsrev}
\bibitem{Barabasi} A.L. Barabasi and Z.N. Oltvai, Nat. Rev. Genet. {\bf 5}, 101 (2004).
620: \bibitem{Jeong} H. Jeong {\it et al}, Nature {\bf 411}, 41 (2001).
621: \bibitem{flag} S.H. Yook, Z.N. Oltvai and A.L. Barabasi, Proteomics {\bf 4}, 928 (2004).
622: \bibitem{Han} J.D. Han {\it et al}, Nature {\bf 430}, 88 (2004).
623: \bibitem{Cherry} J.M. Cherry {\it et al}, Nature {\bf 387}, 67 (1997).
\bibitem{synexp} R.J. Cho {\it et al}, Mol. Cell {\bf 2}, 65 (1998).
\bibitem{mrna} T.R. Hughes {\it et al}, Cell {\bf 102}, 109 (2000).
626: \bibitem{genefus}A.J. Enright, I. Iliopoulos, N.C. Kyrpides, and C.A. Ouzounis,
627: Nature {\bf 402}, 86 (1999);
628: E.M. Marcotte {\it et al}, Science {\bf 285}, 751 (1999).
629: \bibitem{hms}Y. Ho {\it et al}, Nature {\bf 415}, 180 (2002).
630: \bibitem{y2h}P. Uetz {\it et al}, Nature {\bf 403}, 623 (2000);
631: T. Ito {\it et al}, Proc. Natl Acad. Sci. USA {\bf 98}, 4569 (2001).
632: \bibitem{synleth}A.H. Tong {\it et al}, Science {\bf 294}, 2364 (2001).
\bibitem{2nei} R. Overbeek {\it et al}, Proc. Natl Acad. Sci. USA {\bf 96},
634: 2896 (1999).
635: \bibitem{tap}A.C. Gavin {\it et al}, Nature {\bf 415}, 141 (2002).
\bibitem{von Mering} C. von Mering {\it et al}, Nature {\bf 417}, 399 (2002).
\bibitem{Ghaemmaghami} S. Ghaemmaghami {\it et al}, Nature {\bf 425}, 737 (2003).
\bibitem{Huh} W.K. Huh {\it et al}, Nature {\bf 425}, 686 (2003).
\bibitem{complexes} H.W. Mewes {\it et al}, Nucleic Acids Res. {\bf 30}, 31 (2002).
640: %\bibitem{Gavin} A.C. Gavin {\it et al}, Nature 415, 141 (2002).
\bibitem{Xenarios} I. Xenarios {\it et al}, Nucleic Acids Res. {\bf 29}, 239 (2001).
642: \bibitem{book}See, e.g., P.L. Brezonik, {\it Chemical Kinetics and
643: Process Dynamics in Aquatic Systems}, Lewis Publishers, 1993 Boca Raton,
644: FL, USA.
645:
646:
647:
648:
649: \end{thebibliography}
650:
651: \end{document}
652: