1: \documentclass[aps,twocolumn]{revtex4}

2: \usepackage[dvips]{graphicx}

3: \usepackage{color}

4: %\usepackage{AMSfonts}

5: %\usepackage{amssymb,amsmath}

6:

7: \begin{document}

8:

9: \title{Connectivity and expression in protein networks: \\

10: Proteins in a complex are uniformly expressed}

11:

12: % use optional labels to link authors explicitly to addresses:

13: % \author[label1,label2]{}

14: % \address[label1]{}

15: % \address[label2]{}

16:

17: \author{Shai Carmi$^1$, Erez Y. Levanon$^2$, Shlomo Havlin$^1$, Eli Eisenberg$^3$}

18:

19: \affiliation{$^1$Minerva Center and Dept.\ of Physics,

20: Bar-Ilan University, Ramat-Gan 52900, Israel}

21: \affiliation{$^2$Compugen Ltd., 72 Pinhas Rosen St., Tel-Aviv 69512, Israel}

22: \affiliation{$^3$School of Physics and Astronomy, Raymond and Beverly Sackler

23: Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel}

24:

25: \begin{abstract}

26: We explore the interplay between the protein-protein interactions network

27: and the expression of the interacting proteins.

28: It is shown that interacting proteins are expressed in

29: significantly more similar cellular concentrations. This is

30: largely due to interacting pairs which are part of protein

31: complexes. We solve a generic model of complex formation and show

32: explicitly that complexes form most efficiently when their

33: members have roughly the same concentrations. Therefore, the

34: observed similarity in interacting protein concentrations could be

35: attributed to optimization for efficiency of complex formation.

36: \end{abstract}

37:

38: %\begin{keyword}

39: % keywords here, in the form: keyword \sep keyword

40:

41: % PACS codes here, in the form: \PACS code \sep code

42: %\PACS

43: %\end{keyword}

44: \maketitle

45:

46: \section{Introduction}

47:

48: Statistical analysis of the topology of real-world networks has attracted

49: much interest in recent years, proving to supply new insights and

50: ideas to many diverse fields. In particular, the protein-protein

51: interaction network, combining many different interactions of

52: proteins within a cell, has been the subject of many studies

53: (for a recent review see \cite{Barabasi}).

54: While this network shares many of the universal

55: features of natural networks such as the scale-free distribution

56: of degrees \cite{Jeong}, and the small world characteristics

57: \cite{flag}, it also has some unique features. One of the most

58: important of these is arguably the fact that the protein interactions

59: underlying this network can be separated into two roughly disjoint

60: classes. One of them relates to transmission of information within

61: the cell: protein A interacts with protein B and changes it, by a

62: conformational or chemical transformation. The usual scenario

63: after such an interaction is that the two proteins dissociate

64: shortly after the completion of the transformation. On the other

65: hand, many protein interactions are aimed at the formation of a

66: protein complex. In this mode of operation the physical attachment

67: of two or more proteins is needed in order to allow for the

68: biological activity of the combined complex, and is typically

69: stable over relatively long time scales \cite{Han}.

70:

71: The yeast {\it Saccharomyces cerevisiae} serves as the model

72: organism for most of the analyses of protein-protein interaction

73: networks. The complete set of genes and proteins, together with extensive

74: data on gene expression, is available \cite{Cherry} for this

75: unicellular organism, accompanied by large datasets of

76: protein-protein interactions based on a wide range of experimental

77: and computational methods

78: \cite{synexp,mrna,genefus,hms,y2h,synleth,2nei,tap,von Mering}.

79: In addition, the intracellular locations

80: and the expression levels of most proteins of the yeast were recently

81: reported \cite{Ghaemmaghami}. The availability of such data enables

82: us to study the relationship between network topology and the

83: expression levels of each protein.

84:

85: In this work we demonstrate the importance of the distinction

86: between different types of protein interaction, by highlighting

87: one property which is unique to interactions of the protein

88: complexes. Combining databases of yeast protein interactions

89: with the recently reported information on the protein

90: concentration, we find that proteins belonging to the same complex

91: tend to have a more uniform concentration distribution. We further

92: explain this finding by a model of complex formation, showing that

93: uneven concentrations of the complex members result in inefficient

94: complex formation. Surprisingly, in some cases increasing the

95: concentration of one of the complex ingredients decreases the

96: absolute number of complexes formed. Thus, the experimental

97: observation of uniform concentrations of complex members can be

98: explained in terms of selection for efficiency.

99:

100: \section{Concentrations of Interacting Proteins}

101:

102: We start by studying the concentrations of pairs of interacting proteins,

103: and demonstrate that different types of protein-protein interactions differ

104: in their properties. For this purpose we use the recently published database

105: providing the (average) concentration \cite{Ghaemmaghami},

106: as well as the localization within the cell,

107: for most of the {\it Saccharomyces cerevisiae}

108: (baker's yeast) proteins \cite{Huh}.

109: The concentrations $c_i$ (given in arbitrary units)

110: are approximately distributed according to a log-normal distribution

111: with $\langle \log(c_i)\rangle=7.89$

112: and standard deviation $1.53$ (Fig. \ref{lognormal}).

113:

114:

115: \begin{figure}[htb]

116: \centering

117: \includegraphics[totalheight=1.5in]{conc-pdf.eps}

118: \caption{ {(Color online) Distribution of the logarithm of the protein

119: concentration (in units of protein molecules per cell) for all measured

120: proteins within the yeast cell.}}

121: \label{lognormal}

122: \end{figure}

123:

124:

125: Baker's yeast serves as a model organism for most of the

126: protein-protein interaction network studies. Thus a set of many of

127: its protein-protein interactions is also readily available. Here

128: we use a dataset of recorded yeast protein interactions, given

129: with various levels of confidence \cite{von Mering}. The dataset

130: lists about $80000$ interactions between approximately $5300$

131: of the yeast proteins (or about $12000$ interactions between

132: $2600$ proteins when excluding interactions of the lowest

133: confidence). These interactions were deduced by many different

134: experimental methods, and describe different biological relations

135: between the proteins involved. The protein interaction network

136: exhibits a high level of clustering (clustering coefficient

137: $\approx0.39$). This is partly due to the existence of many sets

138: of proteins forming complexes, where each of the complex members

139: interacts with many other members.

140:

141: Combining these two databases, we study the correlation between the

142: (logarithm of) concentrations of pairs of interacting

143: proteins. In order to gain insight into the different components

144: of the network, we perform this calculation separately for the interactions

145: deduced by different experimental methods.

146: For simplicity, we report here the results after excluding the interactions

147: annotated as

148: low-confidence (many of which are expected to be false-positives).

149: We have explicitly checked that their inclusion does not change the results

150: qualitatively. The results are summarized in Table \ref{table}, and show a

151: significant correlation between the expression levels of

152: interacting proteins.

153:

154: \begin{widetext}

155: \begin{table}[b]

156: \begin{tabular}{|*{7}{c|}}

157: \hline Interaction & Number of & Number of & Number

158: of & Correlation & STD of & P-Value \\

159: & interacting & interactions & interactions in &

160: between & random &  \\

161: & proteins &&  which expression & expression &  correlations \\

162: &&& level is & levels of &&   \\

163: &&&  known for & interacting &&  \\

164: &&&  both proteins &     proteins && \\

165: \hline

166: All & 2617 & 11855 & 6347 & 0.167 & 0.012 & $10^{-42}$ \\

167: \hline

168: Synexpression\cite{synexp,mrna}

169: & 260 & 372 & 200 & 0.4 & 0.065 & $3.5\cdot10^{-10}$ \\

170: \hline

171: Gene Fusion\cite{genefus} & 293 & 358 & 174 & -0.079 & - & - \\

172: \hline

173: HMS\cite{hms} & 670 & 1958 & 1230 & 0.164 & 0.027 & $3.3\cdot10^{-10}$ \\

174: \hline

175: yeast 2-Hybrid\cite{y2h}& 954 & 907 & 501 & 0.097 & 0.046 & $1.7 \cdot 10^{-2}$ \\

176: \hline

177: Synthetic Lethality\cite{synleth} & 678 & 886 & 497 & 0.285 & 0.045 & $1.2\cdot10^{-10}$ \\

178: \hline

179: 2-neighborhood\cite{2nei} & 998 & 6387 & 3110 & 0.054 & 0.016 & $5.4\cdot10^{-4}$ \\

180: \hline

181: TAP\cite{tap} & 806 & 3676 & 2239 & 0.291 & 0.02 & $10^{-49}$ \\

182: \hline

183: \end{tabular}

184: \caption{ (Color online) Correlation coefficients between the logarithm of the

185: concentrations of interacting proteins. Only interactions of medium

186: or high confidence were included. The statistical

187: significance of the results was estimated by randomly permuting

188: the concentrations of the proteins and reevaluating the

189: correlation on the same underlying network,  repeated for 1,000

190: different permutations. The mean correlation of the randomly

191: permuted networks was zero, and the standard deviation (STD) is given.

192: The P-value was calculated assuming a Gaussian distribution of the

193: correlation values for the randomized networks. We have verified

194: that the distributions of the 1,000 realizations calculated are

195: roughly Gaussian.} \label{table}

196: \end{table}

197: \end{widetext}
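
The permutation procedure described in the caption of Table \ref{table} can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the analysis: the function names (\texttt{pearson}, \texttt{edge\_correlation}, \texttt{permutation\_test}) and the toy data are hypothetical, each undirected interaction is assumed to be counted in both orientations, and the Gaussian P-value is assumed to be one-sided.

\begin{verbatim}
```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def edge_correlation(edges, log_conc):
    """Correlation of log-concentrations across interacting pairs.

    Assumption: every undirected edge is counted in both orientations,
    so the result does not depend on how each pair happens to be ordered.
    """
    xs, ys = [], []
    for u, v in edges:
        xs += [log_conc[u], log_conc[v]]
        ys += [log_conc[v], log_conc[u]]
    return pearson(xs, ys)


def permutation_test(edges, log_conc, n_perm=1000, seed=0):
    """Return (observed correlation, STD of the null, Gaussian P-value)."""
    rng = random.Random(seed)
    observed = edge_correlation(edges, log_conc)
    proteins = list(log_conc)
    values = [log_conc[p] for p in proteins]
    null = []
    for _ in range(n_perm):
        rng.shuffle(values)  # reassign concentrations, keep the network fixed
        null.append(edge_correlation(edges, dict(zip(proteins, values))))
    mean = sum(null) / n_perm
    std = math.sqrt(sum((r - mean) ** 2 for r in null) / (n_perm - 1))
    z = (observed - mean) / std
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # one-sided tail (assumed)
    return observed, std, p_value


# Hypothetical toy data: a chain in which neighbours have similar levels.
toy_log_conc = {i: 0.1 * i for i in range(40)}
toy_edges = [(i, i + 1) for i in range(39)]
observed, null_std, p_value = permutation_test(toy_edges, toy_log_conc)
```
\end{verbatim}

With real data, \texttt{log\_conc} would map each protein to the logarithm of its measured concentration and \texttt{edges} would list the interacting pairs for which both concentrations are known.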

198:

199: The strongest correlation is seen for the subset of protein

200: interactions which were derived from synexpression, i.e. inferred

201: from correlated mRNA expression. This result confirms the common

202: expectation that genes with correlated mRNA expression would yield

203: correlated protein levels as well\cite{mrna}. However, our results show that

204: interacting protein pairs whose interaction was deduced by other methods

205: exhibit significant positive correlation as well. The effect is weak for

206: the yeast 2-Hybrid (Y2H) method\cite{y2h} which includes all possible

207: physical interactions between the proteins (and is also known to

208: suffer from many artifacts and false-positives), but stronger for

209: the HMS (High-throughput Mass Spectrometry)\cite{hms} and TAP

210: (Tandem-Affinity Purification)\cite{tap} interactions corresponding to

211: actual physical interactions (i.e., experimental evidence that the

212: proteins actually bind together in vivo). These experimental

213: methods are specifically designed to detect cellular protein

214: complexes. The above results thus hint that the overall

215: correlation between concentrations of interacting proteins is due

216: to the tendency of proteins which are part of a stable complex to

217: have similar concentrations.

218:

219: The same picture emerges when one counts the number of

220: interactions a protein has with other proteins of similar

221: concentration, compared to the number of interactions with

222: randomly chosen proteins. A protein interacts, on average, with

223: $0.49\%$ of the proteins with similar expression level (i.e.,

224: $|$log-difference$| < 1$), as opposed to only $0.36\pm 0.01~\%$ of

225: random proteins, in agreement with the above observation of

226: complex members having similar protein concentrations.

227:

228: In order to directly test this hypothesis (i.e. that proteins in a

229: complex have similar concentrations), we use existing datasets of

230: protein complexes and study the uniformity of concentrations of

231: members of each complex. The complexes data were taken from

232: \cite{complexes}, and were found to have many TAP interactions

233: within them.

234: %Each complex is a list of

235: %proteins; we got $\approx1000$ complexes, of average size $7.4$

236: %and a wide distribution of sizes (standard deviation $9.06$).

237: %First, we compared the complex list to the TAP network. More than

238: %one half of the proteins that participate in a complex were also

239: %found to interact by the TAP method, and the average connectivity

240: %within a complex (i.e., the fraction of pairs of complex members

241: %which interact according to the TAP database, out of the possible

242: %$n(n-1)/2$ possible pairs) is roughly one third.

243: As a measure of

244: the uniformity of the expression levels within each complex, we

245: calculate the variance of the (logarithm of the)

246: concentrations among the members of each complex.

247: The average variance (over all complexes) is found to be

248: $2.35$, compared to $2.88\pm0.07$ and $2.74\pm0.11$

249: for randomized complexes in two different randomization schemes (see

250: Fig. \ref{unity}), confirming that the concentrations of complex members

251: tend to be more uniform than a random set of proteins.
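
The comparison above can be illustrated with a minimal Python sketch. It is not the analysis code: the names \texttt{mean\_within\_variance} and \texttt{randomized\_baseline} and the toy data are hypothetical, the sample variance is assumed, and the randomization shown is a simplified size-preserving scheme rather than either of the two schemes used for the reported numbers.

\begin{verbatim}
```python
import random
import statistics


def mean_within_variance(complexes, log_conc):
    """Average, over complexes, of the variance of member log-concentrations."""
    variances = []
    for members in complexes:
        values = [log_conc[p] for p in members if p in log_conc]
        if len(values) >= 2:  # a variance needs at least two measured members
            variances.append(statistics.variance(values))  # sample variance (assumed)
    return statistics.mean(variances)


def randomized_baseline(complexes, log_conc, n_rand=200, seed=0):
    """Mean and STD of the same statistic for size-preserving random complexes.

    Simplified scheme (an assumption, not one of the paper's two schemes):
    each complex keeps its size, and its members are drawn uniformly without
    replacement from all proteins that appear in any complex.
    """
    rng = random.Random(seed)
    pool = sorted({p for members in complexes for p in members})
    means = []
    for _ in range(n_rand):
        shuffled = [rng.sample(pool, len(members)) for members in complexes]
        means.append(mean_within_variance(shuffled, log_conc))
    return statistics.mean(means), statistics.stdev(means)


# Hypothetical toy data: three complexes whose members have similar levels.
toy_log_conc = {f"p{i}": level + 0.2 * (i % 4)
                for i, level in enumerate([5.0] * 4 + [8.0] * 4 + [11.0] * 4)}
toy_complexes = [[f"p{i}" for i in range(k, k + 4)] for k in (0, 4, 8)]
real = mean_within_variance(toy_complexes, toy_log_conc)
rand_mean, rand_std = randomized_baseline(toy_complexes, toy_log_conc)
```
\end{verbatim}

A real-complex average well below the randomized mean, measured in units of the randomized STD, is the signature of uniform concentrations within complexes.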

252:

253: \begin{figure}[htb]

254: \centering

255: \includegraphics[totalheight=1.5in]{comp-penta-std.eps}

256: \caption{{(Color online) (a) Variance of the logarithm of the

257: protein expression levels (in units of molecules per cell)

258: for members of real complexes, averaged over all complexes,

259: compared with the averaged variance of the complexes after

260: randomization of their members, letting each protein participate on average in

261: the same number of complexes (random(1)),

262: as well as randomized complexes where the number

263: of complexes each protein participates in is kept fixed (random(2)).

264: Real complexes have a lower variance, indicating higher uniformity in the

265: expression levels of the underlying proteins. (b) Same as (a) for

266: expression levels in pentagons (see text).}} \label{unity}

267: \end{figure}

268:

269:

270: As another test, we study a different yeast protein interaction

271: network, the one from the DIP database \cite{Xenarios}. We look

272: for fully-connected sub-graphs of size $5$, which are expected to

273: represent complexes, sub-complexes or groups of proteins working

274: together. The network contains approximately $1600$ (highly overlapping)

275: such pentagons, made of about $300$ different proteins. The

276: variance of the logarithm of the concentrations of each

277: pentagon members, averaged over the different pentagons, is 1.234.

278: As before, this is a significantly low variance compared with random

279: sets of five proteins (average variance $1.847 \pm 0.02$ and $1.718\pm0.21$),

280: see figure \ref{unity}.
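
The search for fully-connected sub-graphs of size $5$ can be sketched as follows. This is a hedged illustration only: the function name \texttt{find\_cliques} and the toy network are hypothetical, node labels are assumed to be mutually comparable (e.g., integers), and the simple recursive enumeration shown here is not necessarily the search used for the DIP network.

\begin{verbatim}
```python
def find_cliques(edges, size=5):
    """List every fully-connected subgraph with `size` nodes, each once."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    cliques = []

    def extend(clique, candidates):
        # `candidates` are nodes adjacent to every member of `clique`
        # and larger than its last member, so each clique appears once.
        if len(clique) == size:
            cliques.append(tuple(clique))
            return
        for node in sorted(candidates):
            narrowed = {c for c in candidates
                        if c > node and c in neighbors[node]}
            extend(clique + [node], narrowed)

    extend([], set(neighbors))
    return cliques


# Hypothetical toy network: a complete graph on six nodes plus one extra edge.
toy_edges = [(u, v) for u in range(6) for v in range(u + 1, 6)] + [(5, 6)]
pentagons = find_cliques(toy_edges)
```
\end{verbatim}

Each returned tuple is one pentagon; the variance of the log-concentrations of its five members can then be averaged over all pentagons as described above.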

281:

282: Finally, we have used mRNA expression data \cite{mrna} and looked

283: for correlated expression patterns within complexes. We have

284: calculated the correlation coefficient between the expression data

285: of the two proteins for each pair of proteins which are part of

286: the same pentagon. The average correlation coefficient between

287: proteins belonging to the same fully-connected pentagon is $0.15$

288: compared to $0.056\pm0.005$ for a random pair.

289:

290: In summary, combination of a number of yeast protein interaction

291: networks with protein and mRNA expression data yields the

292: conclusion that interacting proteins tend to have similar

293: concentrations. The effect is stronger when focusing on

294: interactions which represent stable physical interactions, i.e.

295: complex formation, suggesting that the overall effect is largely

296: due to the uniformity in the concentrations of proteins belonging

297: to the same complex. In the next Section we explain this finding

298: by a model of complex formation. We show, on general grounds, that

299: complex formation is more effective when the concentrations of its

300: constituents are roughly the same. Thus, the observation made in

301: the present Section can be explained by selection for efficiency

302: of complex formation.

303:

304: \section {Model}

305: Here we study a model of complex formation, and explore the

306: effectiveness of complex production as a function of the relative abundances of

307: its constituents. For simplicity, we start with a detailed analysis of

308: three-component complex production, which already captures most

309: of the important effects.

310:

311: Denote the concentrations of the three components of the complex

312: by $A$, $B$ and $C$, and the concentrations

313: of the complexes they form by $AB$, $AC$, $BC$ and $ABC$.

314: The latter is the concentration of the full complex, which is

315: the desired outcome of

316: the production, while the first three describe the different sub-complexes

317: which are formed (in this case, each of which is composed of two components).

318: Three-body processes, i.e., direct generation

319: (or decomposition) of $ABC$ out of $A$, $B$ and $C$, can usually be neglected

320: \cite{book}, but their inclusion here does not complicate the analysis.

321: The resulting set of reaction kinetic equations is given by

322:

323: \begin{widetext}

324: \begin{eqnarray}

325: \frac{d(A)}{dt} & = & k_{d_{A,B}} AB + k_{d_{A,C}} AC +

326: (k_{d_{A,BC}} + k_{d_{A,B,C}}) \cdot ABC \nonumber\\ && - k_{a_{A,B}} A

327: \cdot B - k_{a_{A,C}} A \cdot C - k_{a_{A,BC}} A \cdot BC -

328: k_{a_{A,B,C}} A \cdot B \cdot C \\

329: \frac{d(B)}{dt}& = & k_{d_{A,B}} AB + k_{d_{B,C}} BC +

330: (k_{d_{B,AC}} + k_{d_{A,B,C}}) \cdot ABC \nonumber\\ &&- k_{a_{A,B}} A

331: \cdot B - k_{a_{B,C}} B \cdot C - k_{a_{B,AC}} B \cdot AC - k_{a_{A,B,C}} A \cdot B \cdot C\\

332: \frac{d(C)}{dt}& = & k_{d_{A,C}} AC + k_{d_{B,C}} BC +

333: (k_{d_{C,AB}} + k_{d_{A,B,C}}) \cdot ABC \nonumber\\ && - k_{a_{A,C}} A

334: \cdot C - k_{a_{B,C}} B \cdot C -

335: k_{a_{C,AB}} C \cdot AB - k_{a_{A,B,C}} A \cdot B \cdot C\\

336: \frac{d(AB)}{dt} & = & k_{a_{A,B}} A \cdot B + k_{d_{C,AB}} ABC -

337: k_{d_{A,B}} AB - k_{a_{C,AB}} C \cdot AB\\

338: \frac{d(AC)}{dt} & = & k_{a_{A,C}} A \cdot C + k_{d_{B,AC}} ABC -

339: k_{d_{A,C}} AC - k_{a_{B,AC}} B \cdot AC\\

340: \frac{d(BC)}{dt} & = & k_{a_{B,C}} B \cdot C + k_{d_{A,BC}} ABC -

341: k_{d_{B,C}} BC - k_{a_{A,BC}} A \cdot BC\\

342: \frac{d(ABC)}{dt} & = & k_{a_{A,BC}} A \cdot BC + k_{a_{B,AC}} B

343: \cdot AC + k_{a_{C,AB}} C \cdot AB + k_{a_{A,B,C}} A \cdot B \cdot

344: C \nonumber\\ && - (k_{d_{A,BC}} + k_{d_{B,AC}} + k_{d_{C,AB}} +

345: k_{d_{A,B,C}}) \cdot ABC

346: \end{eqnarray}

347: \end{widetext}

348: where $k_{a_{x,y}}$ ($k_{d_{x,y}}$) are the association

349: (dissociation) rates of the subcomponents $x$ and $y$ to form the

350: complex $xy$. Denoting the total number of type $A$, $B$ and $C$

351: particles by $A_0$, $B_0$, $C_0$, respectively, we may write the

352: conservation of material equations:

353: \begin{eqnarray}

354: A + AB + AC + ABC = A_0\\

355: B + BC + AB + ABC = B_0\\

356: C + AC + BC + ABC = C_0

357: \end{eqnarray}

358:

359: We look for the steady-state solution of these equations, where

360: all time derivatives vanish. For simplicity, we consider first the

361: totally symmetric situation, where all the ratios of dissociation

362: coefficients to their corresponding association coefficients are equal,

363: i.e., the ratios $k_{d_{x,y}}/k_{a_{x,y}}$ are all equal to

364: $X_0$ and $k_{d_{x,y,z}}/k_{a_{x,y,z}}=X_0^2$, where $X_0$ is a constant with

365: units of concentration. In this case,

366: measuring all concentrations in units of $X_0$, all the

367: reaction equations are solved by the substitutions $AB = A \cdot

368: B$, $AC = A \cdot C$, $BC = B \cdot C$ and $ABC = A \cdot B \cdot

369: C$, and one needs only to solve the material conservation

370: equations, which take the form:

371: \begin{eqnarray}

372: A + A \cdot B + A \cdot C + A \cdot B \cdot C = A_0\\

373: B + B \cdot C + A \cdot B + A \cdot B \cdot C = B_0\\

374: C + A \cdot C + B \cdot C + A \cdot B \cdot C = C_0

375: \end{eqnarray}
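
As a brief check of these substitutions (a sketch that assumes only the symmetric rates stated above), note that every term in the kinetic equations belongs to an association/dissociation pair. With concentrations measured in units of $X_0$, the dissociation and association rates of each pair become equal, so that, for example,
\begin{eqnarray*}
k_{a_{A,B}} A \cdot B - k_{d_{A,B}} AB & = & k_{a_{A,B}} (A \cdot B - AB), \\
k_{a_{C,AB}} C \cdot AB - k_{d_{C,AB}} ABC & = & k_{a_{C,AB}} (C \cdot AB - ABC), \\
k_{a_{A,B,C}} A \cdot B \cdot C - k_{d_{A,B,C}} ABC & = & k_{a_{A,B,C}} (A \cdot B \cdot C - ABC).
\end{eqnarray*}
Each bracket vanishes under the substitutions, and every time derivative is a signed sum of such brackets, so all seven steady-state conditions hold at once. Only the three conservation equations above then remain to be solved for $A$, $B$ and $C$.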

376: These equations allow for an exact and straightforward (albeit

377: cumbersome) analytical solution. In the following, we explore the

378: properties of this solution. The efficiency of the production of

379: $ABC$, the desired complex, can be measured by the number of

380: formed complexes relative to the maximal number of complexes

381: possible given the initial concentrations of supplied particles

382: ${\rm eff} \equiv ABC / \min{(A_0,B_0,C_0)}$. This definition does

383: not take into account the obvious waste resulting from proteins of

384: the more abundant species which are bound to be leftover due to

385: shortage of proteins of the other species. In the following we

386: show that having unmatched concentrations of the different complex

387: components results in lower efficiency beyond this obvious waste.

388:

389: In the linear regime, $A_0, B_0, C_0 \ll 1$, the fraction of

390: particles forming complexes is small, and all concentrations are

391: just proportional to the initial concentrations. The overall

392: efficiency of the process in this regime is extremely low,

393: $ABC=A\cdot B\cdot C \sim A_0\cdot B_0\cdot C_0\ll A_0,B_0,C_0$.

394: We thus go beyond this trivial linear regime, and focus on the

395: region where all concentrations are greater than unity. Fig.

396: \ref{1k} presents the efficiency as a function of $A_0$ and $B_0$,

397: for fixed $C_0 = 10^2$. The efficiency is maximized when the two

398: more abundant components have approximately the same concentration,

399: i.e., for $A_0 \approx B_0$ (if $C_0<A_0,B_0$), for $A_0\approx C_0=10^2$

400: (if $B_0<A_0,C_0$) and for $B_0\approx C_0=10^2$

401: (if $A_0<B_0,C_0$).

402:

403: \begin{figure}[htb]

404: \centering

405: \includegraphics[totalheight=1.5in]{1-k-A0B0.eps}

406: \caption{{(Color online) The efficiency of the synthesis ${\rm eff} \equiv

407: ABC / \min{(A_0,B_0,C_0)}$ as a function of $A_0$ and $B_0$, for

408: $C_0=10^2$. The efficiency is  maximized when the two most

409: abundant species have roughly the same concentration.}} \label{1k}

410: \end{figure}

411:

412: Moreover, looking at the absolute quantity of the complex product,

413: one observes (fixing the concentrations of two of the substances,

414: e.g., $B_0$ and $C_0$) that $ABC$ itself has a maximum at some

415: finite $A_0$, i.e., there is a finite optimal concentration for

416: $A$ particles (see Fig. \ref{a0max}). Adding more molecules of

417: type $A$ beyond the optimal concentration {\it decreases} the

418: amount of the desired complexes. The concentration that maximizes

419: the overall production of the three-component complex is

420: $A_{0,max} \approx \max{(B_0,C_0)}$.
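
The behaviour described above can be explored numerically with a short Python sketch. It does not use the analytical solution. Instead, it solves the three conservation equations of the totally symmetric case, written in the factorized form $A(1+B)(1+C)=A_0$ and its two permutations, by repeated sweeps in which each free concentration is updated from its own equation. The sweep solver is an assumption made for this illustration, and the function names (\texttt{solve\_free}, \texttt{complex\_yield}, \texttt{efficiency}, \texttt{best\_a0}) are hypothetical.

\begin{verbatim}
```python
def solve_free(a0, b0, c0, tol=1e-12, max_sweeps=200000):
    """Free concentrations (A, B, C), in units of X_0, for given totals."""
    a, b, c = a0, b0, c0  # free amounts never exceed the totals
    for _ in range(max_sweeps):
        # Update each unknown from its own conservation equation,
        # always using the most recent values of the other two.
        a_new = a0 / ((1.0 + b) * (1.0 + c))
        b_new = b0 / ((1.0 + a_new) * (1.0 + c))
        c_new = c0 / ((1.0 + a_new) * (1.0 + b_new))
        change = max(abs(a_new - a) / a_new,
                     abs(b_new - b) / b_new,
                     abs(c_new - c) / c_new)
        a, b, c = a_new, b_new, c_new
        if change < tol:
            break
    return a, b, c


def complex_yield(a0, b0, c0):
    """Concentration of the full complex, ABC = A * B * C."""
    a, b, c = solve_free(a0, b0, c0)
    return a * b * c


def efficiency(a0, b0, c0):
    """eff = ABC / min(A_0, B_0, C_0)."""
    return complex_yield(a0, b0, c0) / min(a0, b0, c0)


def best_a0(b0, c0, grid):
    """Grid value of A_0 that maximizes ABC for fixed B_0 and C_0."""
    return max(grid, key=lambda a0: complex_yield(a0, b0, c0))


# Logarithmic grid of A_0 from 1 to 10^4, with B_0 = C_0 = 10^2.
grid = [10.0 ** (k / 10.0) for k in range(0, 41)]
a0_opt = best_a0(100.0, 100.0, grid)
```
\end{verbatim}

For $A_0=B_0=C_0=10^2$ the free concentrations are $A=B=C=4$, giving $ABC=64$ and ${\rm eff}=0.64$. On this grid the maximum of $ABC$ lies near $A_0\approx 10^2$, and the yield at $A_0=10^4$ is lower than at $A_0=10^2$.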

421:

422: \begin{figure}[htb]

423: \centering

424: \includegraphics[totalheight=1.5in]{A0-max-image.eps}

425: \caption{(Color online) {$\log{(ABC)}$ as a function of $A_0,B_0$, for fixed

426: $C_0 = 10^2$.

427: For each row (fixed $A_0$) or column (fixed $B_0$) in the graph,

428: $ABC$ has a maximum, which occurs where $A_{0,max}

429: \approx\max{(B_0,C_0)}$ (for columns), and $B_{0,max}

430: \approx\max{(A_0,C_0)}$ (for rows).}} \label{a0max}

431: \end{figure}

432:

433:

434: An analytical solution is available for a somewhat more general situation,

435: allowing the ratios $k_{d_{x,y}}/k_{a_{x,y}}$ to take different

436: values for the two-component association/dissociation ($X_0$) and

437: the three-component association/dissociation ($X_0/\alpha$ and

438: $X_0^2/\alpha$ for association/dissociation of the three-component complex

439: from/to a two-component complex plus one single particle or to three

440: single particles, respectively).

441: It can be easily seen that under these conditions, and measuring the

442: concentration in units of $X_0$ again,

443: the solution of the reaction kinetics equations is given by

444: \begin{eqnarray}

445: AB & = & A \cdot B, \\

446: AC & = & A \cdot C, \\

447: BC & = & B \cdot C, \\

448: ABC & = & \alpha ~ A \cdot B \cdot C,

449: \end{eqnarray}

450: and therefore the conservation of material equations take the form

451: \begin{eqnarray}

452: A + A \cdot B + A \cdot C + \alpha A \cdot B \cdot C = A_0\\

453: B + B \cdot C + A \cdot B + \alpha A \cdot B \cdot C = B_0\\

454: C + A \cdot C + B \cdot C + \alpha A \cdot B \cdot C = C_0

455: \end{eqnarray}

456: These equations are also amenable to an analytical solution, and

457: one finds that taking $\alpha$ not equal to $1$

458: does not qualitatively change the above results. In particular,

459: the synthesis is most efficient when the two highest concentrations are

460: roughly equal, see Fig. \ref{4k}. Note that our results hold

461: even for $\alpha\gg 1$, where the three-component complex is much more stable

462: than the intermediate $AB$, $AC$, and $BC$ states.

463:

464: \begin{figure}[htb]

465: \centering

466: \includegraphics[totalheight=2.5in]{4-k-A0B0.eps}

467: \caption{(Color online) {Synthesis efficiency ${\rm eff}\equiv ABC /

468: \min{(A_0,B_0,C_0)}$ as a function of $A_0$ and $B_0$, for

469: different values of $\alpha$. $C_0$ is fixed, $C_0=100$. The

470: efficiency is maximized when the two most abundant substances are

471: of roughly the same concentration, regardless of the values of

472: $\alpha$.}} \label{4k}

473: \end{figure}

474:

475: We have explicitly checked that the same picture holds for

476: 4-component complexes as well: fixing the concentrations $B_0$,

477: $C_0$, and $D_0$, the concentration of the target complex $ABCD$

478: is again maximized for $A_{0,max} \approx\max{(B_0,C_0,D_0)}$.

479: This behavior is expected to hold qualitatively for a general

480: number of components and arbitrary reaction rates, due to the

481: following argument: Assume a complex is to be produced from many

482: constituents, one of which ($A$) is far more abundant than the

483: others ($B$, $C$, ...). Since $A$ is in excess, almost all $B$

484: particles will bind to $A$ and form $AB$ complexes. Similarly,

485: almost all $C$ particles will bind to $A$ to form an $AC$

486: complex. Thus, there will be very few free $C$ particles to bind

487: to the $AB$ complexes, and very few free $B$ particles available

488: for binding with the $AC$ complexes. As a result, one gets

489: relatively many half-done $AB$ and $AC$ complexes, but not the

490: desired $ABC$ (note that $AB$ and $AC$ cannot bind together).

491: Lowering the concentration of $A$ particles allows more $B$ and

492: $C$ particles to remain in an unbound state, and thus {\it

493: increases} the total production rate of $ABC$ complexes (Fig.

494: \ref{ABCAB}).
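
The same conclusion follows from a short estimate in the totally symmetric model (a sketch, assuming equal ratios and concentrations measured in units of $X_0$). The conservation equations then factorize,
\begin{eqnarray*}
A (1+B)(1+C) & = & A_0, \\
B (1+A)(1+C) & = & B_0, \\
C (1+A)(1+B) & = & C_0 .
\end{eqnarray*}
Every bound $A$ particle is attached to at least one $B$ or $C$ particle, so $A \geq A_0 - B_0 - C_0$, and for $A_0 \gg B_0, C_0$ and $A_0 \gg 1$ one has $A \simeq A_0$. The last two relations then give $B \simeq B_0/A_0 \ll 1$ and $C \simeq C_0/A_0 \ll 1$, hence
\[
AB \simeq B_0, \qquad AC \simeq C_0, \qquad ABC = A \cdot B \cdot C \simeq \frac{B_0 C_0}{A_0} .
\]
Nearly all $B$ and $C$ particles are locked in $AB$ and $AC$ sub-complexes, and in this limit the yield of $ABC$ falls off as $1/A_0$.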

495:

496: \begin{figure}[htb]

497: \centering

498: \includegraphics[totalheight=1.5in]{ABCvsAB.eps}

499: \caption{(Color online) The dimensionless

500: concentrations of the complex $ABC$ (solid line),

501: partial complex $AB$ (dashed line), and $C$ (dotted line) as a

502: function of the total concentration of $A$ particles, $A_0$ ($C$

503: is multiplied by 10 for visibility). $B_0$ and $C_0$ are fixed

504: $B_0 = C_0 = 10^3$. The maximum of $ABC$ for finite $A_0$ is a

505: result of the balance between increase in the number of $AB$ and

506: $AC$ complexes and the decrease in the number of available free

507: $B$ and $C$ particles as $A_0$ increases.} \label{ABCAB}

508: \end{figure}

509:

510: Many proteins take part in more than one complex. One might thus wonder

511: what is the optimal concentration for these, and how it affects

512: the general correlation observed between the concentrations of

513: members of the same complex. In order to clarify this issue,

514: we have studied a model in which four proteins $A$, $B$, $C$ and $D$

515: bind together to form two desired products: the $ABC$ and $BCD$ complexes.

516: $A$ and $D$ do not interact, so that there are no complexes or

517: sub-complexes of the type $AD$, $ABD$, $ACD$ and $ABCD$. Solution of this

518: model (see appendix) reveals that the efficiency of the production of $ABC$

519: and $BCD$ is maximized when (for a fixed ratio of $A_0$ and $D_0$)

520: $A_0+D_0\approx B_0\approx C_0$. One thus sees, as could have been expected,

521: that proteins that are

522: involved in more than one complex

523: (like $B$ and $C$ in the above model) will tend to have higher concentrations

524: than other members of the same complex participating in only one complex.

525: Nevertheless, since the protein-protein interaction network is scale-free,

526: most proteins take part in a small number of complexes, and only a very

527: small fraction participate in many complexes. Moreover, given the three

528: orders of magnitude spread in protein concentrations (see figure

529: \ref{lognormal}),

530: only proteins participating in a very large number of complexes (relative to the average participation) or participating in two complexes of very different

531: concentrations (i.e., $A_0\gg D_0$) will result in order-of-magnitude

532: deviations from the equal concentration optimum.

533: The effect of these relatively

534: few proteins on the average over all interacting proteins

535: is small enough not to destroy the concentration correlation, as we observed

536: in the experimental data.

537:

538: In summary, the solution of our simplified complex formation model

539: shows that the rate and efficiency of complex formation depend

540: strongly, and in a non-obvious way, on the relative concentrations

541: of the constituents of the complex. The efficiency is maximized when all

542: concentrations of the different complex constituents are roughly

543: equal. Adding more of the ingredients beyond this optimal point

544: not only reduces the efficiency, but also results in lower product

545: yield. This unexpected behavior is qualitatively explained by a

546: simple argument, and is expected to hold generally. Therefore,

547: effective formation of complexes in a network puts constraints on

548: the concentrations of the underlying building blocks. Accordingly,

549: one can understand the tendency of members of cellular

550: protein-complexes to have uniform concentrations, as presented in

551: the previous Section, as a selection towards efficiency.

552:

553: \appendix*

554: \section{Two coupled complexes}

555: We consider a model in which four proteins $A$, $B$, $C$ and $D$

556: bind together to form two desired products: the $ABC$ and $BCD$ complexes.

557: $A$ and $D$ do not interact, so that there are no complexes or

558: sub-complexes of the type $AD$, $ABD$, $ACD$ and $ABCD$.

559: For simplicity, we assume the totally symmetric situation,

560: where all the ratios of dissociation

561: coefficients to their corresponding association coefficients are equal,

562: i.e., the ratios $k_{d_{x,y}}/k_{a_{x,y}}$ are all equal to

563: $X_0$ and $k_{d_{x,y,z}}/k_{a_{x,y,z}}=X_0^2$, where $X_0$ is a constant with

564: units of concentration. The extension

565: to the more general case discussed in the paper is straightforward.

566: Using the same scaling

567: as above, the reaction equations are solved by the substitutions $AB = A \cdot

568: B$, $AC = A \cdot C$, $BC = B \cdot C$, $BD=B\cdot D$, $CD=C\cdot D$,

569: $ABC = A \cdot B \cdot C$, and $BCD=B\cdot C\cdot D$,

570: and one needs only to solve the material conservation

571: equations, which take the form:

572:

573: \begin{eqnarray}

574: \label{eqA}

575: &A& + A \cdot B + A \cdot C + A \cdot B \cdot C = A_0\\

576: &B& + A \cdot B + B \cdot C + B \cdot D + A \cdot B \cdot C + B \cdot C \cdot D = B_0\nonumber\\ \label{eqB}\\

577: &C& + A \cdot C + B \cdot C + C \cdot D + A \cdot B \cdot C + B \cdot C \cdot D = C_0\nonumber\\ \label{eqC} \\

578: \label{eqD}

579: &D& + B \cdot D + C \cdot D + B \cdot C \cdot D = D_0

580: \end{eqnarray}

581:

582: Denoting $\gamma \equiv \frac{D_0}{A_0}, D' \equiv \frac{D}{\gamma}$,

583: Eq.~(\ref{eqD}) becomes

584: \begin{equation}

585: D' + D' \cdot B + D' \cdot C + D' \cdot B \cdot C = A_0

586: \end{equation}

587: This is exactly the equation we wrote for $A$, Eq.~(\ref{eqA}), and thus

588: $D = \gamma A$.

589: Substituting this into equations (\ref{eqB}) and (\ref{eqC}), one gets

590: \begin{eqnarray}

591: \label{newB}

592: B + B \cdot C + (\gamma + 1)A \cdot B + (\gamma + 1)A \cdot B \cdot C = B_0\\

593: \label{newC}

594: C + B \cdot C + (\gamma + 1)A \cdot C + (\gamma+1)A \cdot B \cdot C = C_0

595: \end{eqnarray}

596: We now define $A' \equiv (\gamma + 1)A$, $A'_0 \equiv(\gamma+1)A_0$ and obtain

597: from (\ref{eqA},\ref{newB},\ref{newC})

598:

599: \begin{eqnarray}

600: A' + A' \cdot B + A' \cdot C + A' \cdot B \cdot C &=& A'_0\\

601: B + A' \cdot B + B \cdot C + A' \cdot B \cdot C &=& B_0\\

602: C + A' \cdot C + B \cdot C + A' \cdot B \cdot C &=& C_0

603: \end{eqnarray}

604:

605: These are the very same equations that we wrote for the 3-particle

606: case where the desired product was $ABC$. Their solution showed that

607: efficiency is maximized at $A_0 \approx B_0 \approx C_0$. We thus

608: conclude that in the present 4-component scenario, the efficiency of

609: $ABC$ and $BCD$ (for fixed $\gamma$) is maximized when

610: $(A_0+D_0)\approx B_0\approx C_0$.
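
The reduction can be checked numerically with a minimal Python sketch. The sweep solver and the name \texttt{solve\_two\_complexes} are assumptions made for this illustration, and the totals are hypothetical values chosen so that $A_0+D_0=B_0=C_0$.

\begin{verbatim}
```python
def solve_two_complexes(a0, b0, c0, d0, tol=1e-12, max_sweeps=200000):
    """Free concentrations (A, B, C, D) for the coupled ABC / BCD model."""
    a, b, c, d = a0, b0, c0, d0  # free amounts never exceed the totals
    for _ in range(max_sweeps):
        # A and D see the same partners, so both use the factor (1+B)(1+C).
        a_new = a0 / ((1.0 + b) * (1.0 + c))
        d_new = d0 / ((1.0 + b) * (1.0 + c))
        # B and C can bind either A or D, hence the factor (1 + A + D).
        b_new = b0 / ((1.0 + c) * (1.0 + a_new + d_new))
        c_new = c0 / ((1.0 + b_new) * (1.0 + a_new + d_new))
        change = max(abs(a_new - a) / a_new, abs(b_new - b) / b_new,
                     abs(c_new - c) / c_new, abs(d_new - d) / d_new)
        a, b, c, d = a_new, b_new, c_new, d_new
        if change < tol:
            break
    return a, b, c, d


# Hypothetical totals with A_0 + D_0 = B_0 = C_0 = 100 and gamma = D_0 / A_0.
a0, b0, c0, d0 = 60.0, 100.0, 100.0, 40.0
a, b, c, d = solve_two_complexes(a0, b0, c0, d0)
gamma = d0 / a0
total_yield = a * b * c + b * c * d  # ABC + BCD
```
\end{verbatim}

For these totals the solution satisfies $D=\gamma A$ with $A=2.4$, $D=1.6$ and $B=C=4$, so that $A'=(\gamma+1)A=4$ reproduces the symmetric three-component solution for $A'_0=B_0=C_0=100$.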

611:

612: \acknowledgements{

613: We thank Ehud Schreiber for critical reading of the manuscript and

614: many helpful comments.

615: E.E. is supported by an Alon fellowship at Tel-Aviv University.}

616:

617: \begin{thebibliography}{10}

618: \bibliographystyle{apsrev}

619: \bibitem{Barabasi}A.L. Barabasi and Z.N. Oltvai, Nat Rev Genet {\bf 5}, 101 (2004).

620: \bibitem{Jeong} H. Jeong {\it et al}, Nature {\bf 411}, 41 (2001).

621: \bibitem{flag} S.H. Yook, Z.N. Oltvai and A.L. Barabasi, Proteomics {\bf 4}, 928 (2004).

622: \bibitem{Han} J.D. Han {\it et al}, Nature {\bf 430}, 88 (2004).

623: \bibitem{Cherry} J.M. Cherry {\it et al}, Nature {\bf 387}, 67 (1997).

624: \bibitem{synexp}R.J. Cho {\it et al}, Mol. Cell {\bf 2}, 65 (1998).

625: \bibitem{mrna} T.R. Hughes {\it et al}, Cell {\bf 102}, 109 (2000).

626: \bibitem{genefus}A.J. Enright, I. Iliopoulos, N.C. Kyrpides, and C.A. Ouzounis,

627: Nature {\bf 402}, 86 (1999);

628: E.M. Marcotte {\it et al}, Science {\bf 285}, 751 (1999).

629: \bibitem{hms}Y. Ho {\it et al}, Nature {\bf 415}, 180 (2002).

630: \bibitem{y2h}P. Uetz {\it et al}, Nature {\bf 403}, 623 (2000);

631: T. Ito {\it et al}, Proc. Natl Acad. Sci. USA {\bf 98}, 4569 (2001).

632: \bibitem{synleth}A.H. Tong {\it et al}, Science {\bf 294}, 2364 (2001).

633: \bibitem{2nei}R. Overbeek {\it et al}, Proc. Natl Acad. Sci. USA {\bf 96},

634: 2896 (1999).

635: \bibitem{tap}A.C. Gavin {\it et al}, Nature {\bf 415}, 141 (2002).

636: \bibitem{von Mering} C. von Mering {\it et al}, Nature {\bf 417}, 399 (2002).

637: \bibitem{Ghaemmaghami} S. Ghaemmaghami {\it et al}, Nature {\bf 425}, 737 (2003).

638: \bibitem{Huh} W.K. Huh {\it et al}, Nature {\bf 425}, 686 (2003).

639: \bibitem{complexes} H. W. Mewes {\it et al}, Nucleic Acids Res. {\bf 30}, 31 (2002).

640: %\bibitem{Gavin} A.C. Gavin {\it et al}, Nature 415, 141 (2002).

641: \bibitem{Xenarios} I. Xenarios {\it et al}, Nucleic Acids Res. {\bf 29}, 239 (2001).

642: \bibitem{book}See, e.g., P.L. Brezonik, {\it Chemical Kinetics and

643: Process Dynamics in Aquatic Systems}, Lewis Publishers, 1993 Boca Raton,

644: FL, USA.

645:

646:

647:

648:

649: \end{thebibliography}

650:

651: \end{document}

652: