
1: %\documentclass[a4]{article}

2: %\usepackage{cb}

3:

4: \documentclass[12pt]{article}

5: \usepackage{graphicx}

6:

7: % autolabel: eq:4

8:

9: \author{Christoph Best\footnote{Electronic mail: {\tt

10:       christoph.best@ifi.lmu.de}},\ \ Ralf Zimmer, and Joannis Apostolakis\\

11:         Institute for Informatics, LMU, \\

12:         Amalienstr. 17, 80333 M\"unchen, Germany}

13: \date{June 15, 2004}

14:

15: \title{Probabilistic methods for predicting protein functions

16:        in protein-protein interaction networks}

17:

18: \newcommand{\nnm}{\nonumber}

19: \newcommand{\onehalf}{{\scriptstyle \frac{1}{2}}}

20: \newcommand{\const}{{\rm const}}

21:

22: \begin{document}

23: %\selectlanguage{english}

24:

25: \maketitle

26:

27: \begin{abstract}

28:   We discuss probabilistic methods for predicting protein functions

29:   from protein-protein interaction networks.  Previous work based on

30:   Markov Random Fields is extended and compared to a general

31:   machine-learning theoretic approach.  Using actual protein

32:   interaction networks for yeast from the MIPS database and GO-SLIM

33:   function assignments, we compare the predictions of the different

34:   probabilistic methods and of a standard support vector machine. It

35:   turns out that, with the currently available networks, the simple

36:   methods based on counting frequencies perform as well as the more

37:   sophisticated approaches.

38: \end{abstract}

39:

40: \section{Introduction}

41:

42: Large-scale comprehensive protein-protein interaction data, which have

43: become available recently, open the possibility of deriving new

44: information about proteins from their associations in the interaction

45: graph. In the following, we discuss and compare several probabilistic

46: methods for predicting protein functions from the functions of

47: neighboring proteins in the interaction graph.

48:

49: In particular, we compare two recently published methods that are

50: based on Markov Random Fields \cite{Letovsky,Deng} with a prediction

51: based on a machine-learning approach using maximum-likelihood

52: parameter estimation. It turns out that all three approaches can be

53: regarded as variants of one another that differ in their

54: approximations.  The main difference between the Markov Random Field (MRF)

55: and the machine-learning methods is that the former approach

56: takes a global look at the network, while the latter considers each

57: network node as an independent training example. However, in the

58: mean-field approximation required to make the MRF approach numerically

59: tractable, it is reduced to considering each node independently. The

60: local-enrichment method considered in \cite{Letovsky} can then be

61: interpreted as another approximation which enables us to make

62: predictions directly from observed frequencies, bypassing the

63: numerical minimization step required in the more general

64: machine-learning approach.

65:

66: We also extend these methods by considering a non-linear

67: generalization for the probability distribution in the

68: machine-learning approach, and by taking larger neighborhoods in the

69: network into account. Finally, we compare the performance of these

70: methods to a standard Support Vector Machine.

71:

72: \section{Methods}

73:

74: We consider a network specified by a graph whose nodes are proteins

75: and whose undirected edges indicate interactions between the

76: proteins. Each node is assigned one of a set of protein functions.  In

77: a machine-learning approach to prediction, this assignment follows a

78: simple probability function depending on the protein functions in the

79: network neighborhood of each node and parametrized by a small set of

80: parameters. The learning problem is to estimate these parameters from

81: a given sample of assignments. The prediction can then be

82: performed by evaluating the probability distribution using these

83: parameters.

84:

85: \subsection{Machine-learning approach}

86: \label{sec:ml}

87:

88: Assume we only consider a single protein function at a time. Node

89: assignments can then be chosen binary, $x\in\{0,1\}$, with $1$

90: indicating that a node has the function under consideration. In the

91: simplest case, the probability that a node $i$ has assignment $x$

92: depends only on its immediate neighbors, and since all edges of the

93: graph are equal, it can only depend on the number of neighbors $C$,

94: and the number of active neighbors $N$. Borrowing from statistical

95: mechanics, we write the probability using a potential $U(x;C,N)$

96: \begin{equation} \label{eq:2}

97:   p(x|C,N) = \frac{e^{-U(x;C,N)}}{Z(C,N)}, \qquad

98:   Z(C,N) = \sum_{y=0,1} e^{-U(y;C,N)}

99: \end{equation}

100: where the partition sum $Z(C,N)$ is a normalizing factor. This equation

101: states that the log-probability of $x$ equals, up to the

102: normalization term $-\ln Z$, the negative potential $-U(x;C,N)$. In a lowest-order

103: approximation, we can choose a linear function for the potential:

104: \begin{equation} \label{eq:1}

105:   U(x;C,N;\alpha) = (\alpha_0 + \alpha_1 C + \alpha_2 N)x \quad.

106: \end{equation}

107: Later, we will extend this approach to more general functions.
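
As a minimal worked illustration (a sketch that follows directly from
eqs.~\ref{eq:2} and \ref{eq:1}, with no further assumptions), note that the
linear potential vanishes for $x=0$. The partition sum therefore has only two
terms, and the model reduces to a logistic function of the neighborhood
counts:
\[
  Z(C,N) = 1 + e^{-(\alpha_0 + \alpha_1 C + \alpha_2 N)}, \qquad
  p(x=1|C,N) = \frac{1}{1 + e^{\alpha_0 + \alpha_1 C + \alpha_2 N}} \quad.
\]
In this sign convention, a negative $\alpha_2$ means that each additional
active neighbor raises the probability that the node itself is active.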

108:

109: The parameters $\alpha$ can be estimated from a set of training

110: samples $(x_i,C_i,N_i)$ by maximum-likelihood estimation. In this

111: approach, they are chosen to maximize the joint probability

112: \begin{equation}

113:   P = \prod_i p(x_i|C_i,N_i)

114: \end{equation}

115: of the training data, or equivalently, to minimize its negative logarithm

116: \begin{equation}

117:   -\log P = \sum_i \left[ \ln Z(C_i,N_i) + U(x_i;C_i,N_i) \right] \quad.

118: \end{equation}

119: Taking the partial derivative w.r.t.~a parameter $\alpha$ gives the equation

120: \begin{equation}

121:   -\frac{\partial \log P}{\partial \alpha}

122:   = \sum_i \left\{ - \frac{1}{Z(C_i,N_i)} \sum_{y=0,1}

123:     \frac{\partial U(y,C_i,N_i)}{\partial\alpha}

124:     e^{-U(y,C_i,N_i)}

125:     + \frac{\partial U(x_i,C_i,N_i)}{\partial\alpha} \right\} \quad.

126: \end{equation}

127: Up to its sign, the first term in the bracket is the expectation value of $\partial

128: U/\partial\alpha$ in the neighborhood $(C_i,N_i)$ under the

129: probability distribution parametrized by $(\alpha,\ldots)$:

130: \begin{equation}

131:   \left\langle \frac{\partial U(y,C_i,N_i)}{\partial\alpha}

132:   \right\rangle_{N_i,C_i;\alpha,\ldots} =

133:   \frac{1}{Z(C_i,N_i)} \sum_{y=0,1}

134:     \frac{\partial U(y,C_i,N_i)}{\partial\alpha} \,

135:     e^{-U(y,C_i,N_i)}

136: \end{equation}

137: At the extremum, the derivative vanishes and we have the simple relation

138: \begin{equation}

139:     \sum_i \left\langle \frac{\partial U(y,C_i,N_i)}{\partial\alpha}

140:   \right\rangle

141:     = \sum_i \frac{\partial U(x_i,C_i,N_i)}{\partial\alpha} \quad.

142: \end{equation}

143: Thus, in the maximum-likelihood model, the parameters are adjusted so

144: that the average expectation values of the derivatives of the

145: potential are equal to the averages observed in the training data.

146: Using eq.~\ref{eq:1}, this gives the set of three equations

147: \begin{eqnarray}

148:   \sum_i \left\{ \begin{array}{l} 1 \\ C_i \\ N_i \end{array} \right\} \,

149:    \langle x \rangle

150:  &=& \sum_i \left\{ \begin{array}{l} 1 \\ C_i \\ N_i \end{array} \right\} \, x_i

151: \end{eqnarray}

152: where the expectation value of $x$ in the environment $(C_i,N_i)$ and

153: in the model parametrized by $\alpha$ is given by

154: \begin{equation}

155:   \langle x \rangle =

156:   \langle x \rangle_{\alpha_0,\alpha_1,\alpha_2;C_i,N_i}

157:   = \frac{e^{-(\alpha_0+\alpha_1 C_i+\alpha_2 N_i)}}{1+e^{-(\alpha_0+\alpha_1

158:      C_i+\alpha_2 N_i)}} \quad.

159: \end{equation}

160: Only in the simplest case, $\alpha_1 = \alpha_2 = 0$, can this equation

161: be solved analytically, leading to the relation

162: \begin{equation} \label{eq:3}

163:   \alpha_0 = -\ln \frac{\bar x}{1-\bar x}, \qquad\mbox{with}\qquad

164:   \bar x = \frac{1}{n} \sum_{i=1}^{n} x_i \quad.

165: \end{equation}

166: In the general case, we solve these equations numerically using a

167: conjugate-gradient method by explicitly minimizing the negative

168: log-likelihood $-\log P$.
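
To make the numerical step concrete, the gradient needed by a
conjugate-gradient routine can be written in closed form for the linear
potential. The following is a sketch derived from the expressions above; the
shorthand $f_0(i)=1$, $f_1(i)=C_i$, $f_2(i)=N_i$ is introduced here only for
illustration, and $\langle x \rangle_i$ denotes the model expectation of $x$
in the environment $(C_i,N_i)$:
\[
  -\frac{\partial \log P}{\partial \alpha_k}
  = \sum_i f_k(i) \left( x_i - \langle x \rangle_i \right),
  \qquad k = 0,1,2 \quad,
\]
\[
  -\frac{\partial^2 \log P}{\partial \alpha_k \, \partial \alpha_l}
  = \sum_i f_k(i) \, f_l(i) \, \langle x \rangle_i
    \left( 1 - \langle x \rangle_i \right) \quad.
\]
The matrix of second derivatives is positive semi-definite, so the negative
log-likelihood is convex in $\alpha$ and any stationary point found by the
minimizer is a global minimum.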

169:

170: \subsection{Network approach}

171:

172: An alternative approach to prediction starts out from considering a

173: given network and the protein function assignments as a whole and

174: assigning a score based on how well the network and the function

175: assignments agree. In the approach of \cite{Deng}, each link

176: contributes to this score with a gain $G_0$ or $G_1$, resp., if both nodes at the ends of the

177: link have the same function $0$ or $1$, and a penalty $P$ if they have different

178: function assignments. Assuming again that the log-probabilities are

179: proportional to the scores, this induces a probability

180: distribution over all joint function assignments ${\bf x}$ given by

181: \begin{equation}

182:   p({\bf x}) = \frac{1}{Z} e^{-U({\bf x})} \quad, \qquad

183:   Z = \sum_{\bf x} e^{-U({\bf x})}

184: \end{equation}

185: where now the normalization factor is calculated by summing over all

186: possible joint function assignments of the nodes.

187:

188: The scoring function $U({\bf x})$ can be expressed as

189: \begin{eqnarray}

190:   U({\bf x}) &=& -\frac{G_1}{2} \sum_{i,j:(i,j)\in E}   x_i x_j

191:     - \frac{G_0}{2} \sum_{i,j:(i,j)\in E}   (1-x_i)\, (1- x_j)

192:     \\ \nnm

193:     &&+ \frac{P}{2} \sum_{i,j:(i,j)\in E} \left( (1-x_i) \, x_j + x_i \, (1-x_j) \right)

194:          + \eta_0 \sum_i x_i

195:     \\ \nnm

196:    &=& \eta_0 \sum_i  x_i

197:                      + \eta_1 \sum_i C_i x_i

198:                      + \frac{\eta_2}{2} \sum_{i,j: (i,j)\in E}  x_i x_j + \const

199: \end{eqnarray}

200: with the parameters

201: \begin{equation}

202:   \eta_2 = - G_1 - G_0 - 2P \qquad\mbox{and}\qquad

203:   \eta_1 = G_0 + P \quad.

204: \end{equation}

205: In terms of statistical mechanics, this describes a ferromagnetic

206: system where the inverse temperature is determined by $\eta_2$ and an

207: external field by $\eta_0$ and $\eta_1$.
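
The reduction of the scoring function to three parameters can be checked
term by term. The sketch below assumes that the sums over $(i,j)\in E$ run
over ordered pairs, so that every undirected link is counted twice (which is
what the prefactors $1/2$ suggest), and writes $|E|$ for the number of
undirected links, a symbol introduced here for illustration. With
\[
  \frac{1}{2} \sum_{i,j:(i,j)\in E} (x_i + x_j) = \sum_i C_i x_i
  \qquad\mbox{and}\qquad
  \frac{1}{2} \sum_{i,j:(i,j)\in E} 1 = |E| \quad,
\]
the two terms that are not already of the final form become
\[
  -\frac{G_0}{2} \sum_{i,j:(i,j)\in E} (1-x_i)(1-x_j)
  = -G_0 |E| + G_0 \sum_i C_i x_i
    - \frac{G_0}{2} \sum_{i,j:(i,j)\in E} x_i x_j \quad,
\]
\[
  \frac{P}{2} \sum_{i,j:(i,j)\in E} \left( (1-x_i)\, x_j + x_i \,(1-x_j) \right)
  = P \sum_i C_i x_i - P \sum_{i,j:(i,j)\in E} x_i x_j \quad.
\]
Collecting the coefficients of $\sum_i C_i x_i$ and of
$\sum_{i,j:(i,j)\in E} x_i x_j$ reproduces $\eta_1$ and $\eta_2$. The
remaining term $-G_0 |E|$ does not depend on ${\bf x}$ and cancels against
the normalization $Z$.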

208:

209: Again, maximum-likelihood parameter estimation is performed by finding

210: a set of parameters $\alpha = (\eta_0,\eta_1,\eta_2)$ such that the

211: probability of the $N$ sample configurations ${\bf x}^{(n)}$ is maximized (here $N$ counts samples, not active neighbors):

212: \begin{equation}

213:   \alpha = \mathop{\rm argmax}_\alpha \sum_{n=1}^N \ln p({\bf x}^{(n)};\alpha)

214:   = \mathop{\rm argmin}_\alpha \left( \sum_{n=1}^N U({\bf x}^{(n)};\alpha) + N \ln Z(\alpha) \right)

215: \end{equation}

216: The logarithm of the partition sum appearing in the second term can

217: be related to the entropy by

218: \begin{eqnarray}

219:   S &=& - \sum_x p(x) \, \ln p(x)

220:     = \sum_{x} p(x) \, U(x) + \ln Z

221:   \\ \Rightarrow\qquad

222:     -\ln Z &=& \langle U \rangle- S = F

223: \end{eqnarray}

224: The quantity $\langle U \rangle - S$ is the thermodynamical free

225: energy. Maximum-likelihood parameter estimation therefore corresponds

226: to choosing the parameters such that the energy of the given

227: configuration is minimized while the free energy of the system as a

228: whole is maximized:

229: \begin{equation} \label{eq:13}

230:   \mathop{\rm argmin}_\alpha \left( U(X;\alpha) - F(\alpha) \right)

231:   = \mathop{\rm argmin}_\alpha \left( U(X;\alpha) - \langle U \rangle(\alpha) +

232:   S(\alpha) \right)

233:   \quad.

234: \end{equation}

235: Unfortunately, this requires the calculation of both the internal

236: energy, $\langle U \rangle(\alpha)$, and the entropy, $S(\alpha)$, of

237: the system and thus more or less a complete solution of the system.

238:

239: This can be avoided by employing the \emph{mean field} approximation, in

240: which the probability distribution $p(x)$ is replaced by a trial

241: distribution $p_{\rm trial}(x)$ as a product of single-variable

242: distributions

243: \begin{equation}

244:   p_{\rm trial}(x) = p_1(x_1) \ldots p_n(x_n)

245: \end{equation}

246: which can be completely parametrized by the expectation values $\bar

247: x_i$ using

248: \begin{equation}

249:   p_i(x_i) = x_i \bar x_i + (1-x_i) (1-\bar x_i)

250:            = \left\{ \begin{array}{ll} 1-\bar x_i & \mbox{if $x_i=0$} \\

251:                                    \bar x_i & \mbox{if $x_i=1$} \end{array}\right.

252: \end{equation}

253: Optimum values for the parameters $\bar x_i$ can then be estimated by

254: minimizing the Kullback-Leibler divergence between $p_{\rm trial}(x)$ and the true distribution

255: $p(x)$.

256:

257: Interestingly, this approximation removes the distinguishing feature

258: of the network approach, namely that the neighborhood structure (in

259: the sense of neighbors of neighbors) is taken into account.  The

260: resulting equations are very similar to the machine-learning equations

261: in which neighbors are treated as unrelated.
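
For concreteness, the standard mean-field calculation for the potential
above can be sketched as follows. This is the textbook result under the
assumptions stated here, not necessarily the exact form used in \cite{Deng}:
the divergence is taken as $\sum_x p_{\rm trial}(x) \ln [ p_{\rm trial}(x) /
p(x) ]$, the link sum runs over ordered pairs, and $j \sim i$ denotes the
neighbors of node $i$ (notation introduced here). Up to the constant
$\ln Z$, the quantity to be minimized is
\[
  \sum_i (\eta_0 + \eta_1 C_i) \, \bar x_i
  + \frac{\eta_2}{2} \sum_{i,j:(i,j)\in E} \bar x_i \bar x_j
  + \sum_i \left[ \bar x_i \ln \bar x_i
                + (1-\bar x_i) \ln (1-\bar x_i) \right] \quad,
\]
and setting its derivative w.r.t.~$\bar x_i$ to zero gives the
self-consistency condition
\[
  \bar x_i = \frac{1}{1 + \exp \left( \eta_0 + \eta_1 C_i
      + \eta_2 \sum_{j \sim i} \bar x_j \right)} \quad.
\]
This has the same logistic form as the single-node model with a linear
potential, with the expected number of active neighbors
$\sum_{j\sim i} \bar x_j$ taking the place of $N_i$.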

262:

263: \subsection{Binomial-neighborhood approach}

264: \label{sec:bin}

265:

266: The binomial-neighborhood approach \cite{Letovsky} is a simpler approach in which the

267: probability distribution $p(x|C,N)$ is chosen in such a way that it

268: can be directly derived from observed frequencies without the

269: minimization process typical for maximum-likelihood approaches. It is

270: based on the assumption that the distribution of active neighbors

271: $N_i$ of a

272: node $i$ follows a binomial distribution whose single probability $p$

273: depends on whether the node $i$ is active or not:

274: \begin{equation}

275:   p(N_i|C_i,x_i=1) = \left(\begin{array}{c} C_i \\ N_i

276:       \end{array} \right) \,

277:     p_1^{N_i} (1-p_1)^{C_i - N_i} \quad,

278: \end{equation}

279: and correspondingly for $x_i=0$ using a single probability $p_0$. This

280: is the assumption of \emph{local enrichment}, i.e.~that the

281: probability $p_1$ to find an active node around another active node is

282: larger than the probability $p_0$ to find an active node around an

283: inactive node. Using Bayes' theorem, we can use this to calculate the

284: probability distribution of $x_i$:

285: \begin{equation}

286:   p(x_i|C_i,N_i) = \frac{ p(N_i|C_i,x_i) \, p(x_i|C_i) }{

287:                           p(N_i|C_i)}

288: \end{equation}

289: where $p(x_i=1|C_i) = \bar x$ is the overall probability of observing an

290: active node, and

291: \begin{equation}

292:   p(N_i|C_i) = \bar x p(N_i|C_i,x_i=1) + (1-\bar x) p(N_i|C_i,x_i=0) \quad.

293: \end{equation}

294: The resulting probability distribution can be written as

295: \begin{equation}

296:   p(x_i=1|C_i,N_i) = \frac{\lambda}{1+\lambda} \qquad\mbox{and}\qquad

297:   p(x_i=0|C_i,N_i) = \frac{1}{1+\lambda}

298: \end{equation}

299: with

300: \begin{equation}

301:   \lambda = \frac{\bar x}{1-\bar x} \, \frac{p_1^{N_i} \, (1-p_1)^{C_i - N_i}}{

302:                       p_0^{N_i} \, (1-p_0)^{C_i - N_i}}  \quad.

303: \end{equation}

304: This can be easily rewritten in the same form as (\ref{eq:2})

305: \begin{equation}

306:   p(x_i|C_i,N_i) = \frac{1}{Z} \exp \left[-\left(  -\ln \frac{\bar x}{1-\bar x}

307:    - \ln \frac{p_1}{p_0}  N_i

308:    + \ln \frac{1-p_0}{1-p_1} \, (C_i-N_i) \right)  \,x_i\right]

309: \end{equation}

310: The first term in the potential has the same form as (\ref{eq:3}) and

311: adjusts the overall number of positive sites; the two other terms

312: constitute a bonus for having positive neighbors (proportional to $N_i$)

313: and a penalty for having negative neighbors (proportional to $C_i -

314: N_i$).

315:

316: This approach evidently gives a conditional probability distribution

317: $p(x_i|C_i,N_i)$ of the same form as the one in the machine-learning

318: approach. However, the coefficients in the potential can be directly

319: calculated from the observed frequencies $\bar x$, $p_0$, and

320: $p_1$. This is only possible because we made here the assumption that

321: the probability distribution $p(N_i|C_i,x_i)$ is binomial. The

322: machine-learning approach is more flexible in that it does not have to

323: make this assumption and yields a true maximum-likelihood estimate

324: even for distributions that deviate greatly from binomial form. In

325: particular, the binomial distribution implies that the neighbors of a

326: node are statistically independent, an assumption that may be violated in a

327: densely connected network, where we would expect clusters to form.
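
The correspondence with the linear potential can be made explicit. As a
sketch, comparing the binomial potential term by term with
$U = (\alpha_0 + \alpha_1 C_i + \alpha_2 N_i)\, x_i$ gives
\[
  \alpha_0 = -\ln \frac{\bar x}{1-\bar x}, \qquad
  \alpha_1 = \ln \frac{1-p_0}{1-p_1}, \qquad
  \alpha_2 = -\ln \frac{p_1 \, (1-p_0)}{p_0 \, (1-p_1)} \quad.
\]
For illustration only, take the hypothetical values $\bar x = 0.1$,
$p_0 = 0.1$, and $p_1 = 0.4$ (chosen for convenience, not taken from the
data). Then $\alpha_0 = \ln 9 \approx 2.20$, $\alpha_1 = \ln 1.5 \approx
0.41$, and $\alpha_2 = -\ln 6 \approx -1.79$. The node is predicted active
when the potential is negative, so the 50\% decision boundary is the
straight line
\[
  N_i = \frac{\alpha_0 + \alpha_1 C_i}{-\alpha_2}
      \approx 1.23 + 0.23 \, C_i
\]
in the $(C_i, N_i)$ plane.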

328:

329: \section{Results}

330:

331: To compare the different prediction methods, we chose the MIPS

332: protein-protein interaction database for \emph{Saccharomyces

333:   cerevisiae} \cite{MIPS,Uetz} and the GO-SLIM database of protein function

334: assignments from the Gene Ontology Consortium \cite{GO}. The latter is a

335: slimmed-down subset of the full gene ontology assignments comprising

336: 32 different processes, 21 functions, and 22 cell compartments. We

337: focused here on the process assignments as these were expected to

338: correspond most closely to the interaction network.

339:

340: We compared four methods:

341: \begin{enumerate}

342: \item the binomial neighborhood enrichment from sec.~\ref{sec:bin},

343: \item the machine-learning maximum-likelihood method from

344:   sec.~\ref{sec:ml} using a linear potential (\ref{eq:1}),

345: \item the same method with an extended non-linear potential, and

346: \item a standard support vector machine \cite{libsvm}.

347: \end{enumerate}

348:

349: For the probabilistic methods, we first looked at the single-function

350: prediction problem in which the system is presented with a binary

351: assignment expressing which proteins are known to have a given

352: function, and then makes a prediction for an unknown protein based on

353: the number of neighbors that have this function.

354:

355: \begin{figure}[htb]

356:   \begin{center}

357:     \includegraphics[angle=270,width=\hsize]{glyphs-5.eps}

358:     \caption{Glyph plot summarizing the probability distribution

359:       for a single-function prediction problem.

360:       Each box represents a possible situation of a single node,

361:       characterized by the total number of neighbors on the $x$-axis,

362:       and the number of neighbors having the function of interest on

363:       the $y$-axis. The numbers indicate the total incidence of the

364:       situation, while the shading expresses how frequently the

365:       central node had the function of interest in that situation.

366:       The lines are the decision boundaries for the binomial method

367:       and the linear and polynomial machine-learning methods. The

368:       shading is the prediction region from the SVM.

369:     }

370:     \label{fig:1}

371:   \end{center}

372: \end{figure}

373:

374: In this case, the local environment of a node can be described by two

375: numbers: $n$, the number of neighbors ($C$ in the Methods section), and $j$, the number of

376: neighbors that have the function assignment under consideration ($N$ in the Methods section). The

377: content of the training data set can be characterized by a glyph plot

378: such as in fig.~\ref{fig:1}.

379:

380: After learning the training data, the probabilistic method has

381: inferred a probability distribution that yields, for each pair

382: $(n,j)$, a probability $p(x_i=1|n,j)$ which is then utilized for

383: predictions. The 50\%-level of this probability, which determines the

384: prediction in a binary system, is indicated in fig.~\ref{fig:1} by

385: green lines.

386:

387: The three probabilistic predictors in fig.~\ref{fig:1} yield similar

388: results that differ rarely by more than one box. The main difference

389: is that the binomial predictor is restricted to a straight line, while

390: the linear and non-linear maximum-likelihood predictors can accommodate

391: a little turn. Linear and non-linear predictors differ only minimally.

392:

393: \begin{figure}[htb]

394:   \begin{center}

395:     \includegraphics[width=\hsize]{single-spec-2a.eps}

396:     \caption{Sensitivity-specificity curve for the three probabilistic

397:     prediction methods for a single-function prediction.}

398:     \label{fig:2}

399:   \end{center}

400: \end{figure}

401:

402: Finally the prediction from a support vector machine that was trained

403: on the same single-function data set is indicated by a shaded area

404: marking all those $(n,j)$ for which the SVM returned a positive

405: prediction. The border of this area very closely follows the linear

406: and non-linear M.L.~predictors.

407:

408: Fig.~\ref{fig:2} shows a sensitivity-specificity curve using five-fold

409: cross validation for single-function prediction using the

410: probabilistic predictors. Again, all three curves follow each other

411: quite closely, with a slight edge for the nonlinear M.L.~predictor.
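
The text does not spell out how the two axes of this curve are defined. As
a sketch, assuming the standard definitions, with $TP$, $FP$, $TN$, and $FN$
denoting the numbers of true positives, false positives, true negatives, and
false negatives at a given probability threshold (symbols introduced here
for illustration):
\[
  \mbox{sensitivity} = \frac{TP}{TP + FN}, \qquad
  \mbox{specificity} = \frac{TN}{TN + FP} \quad.
\]
Sweeping the threshold applied to $p(x_i=1|n,j)$ from $0$ to $1$ then traces
out one curve per predictor.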

412:

413: The preceding discussion applied to the problem of single function

414: prediction. To perform full prediction, we generated each of the three

415: predictors separately for each function and chose, for each protein

416: with an unknown function, the prediction with the largest probability.

417: For simplicity, this approach does not take into account possible

418: correlations between different protein functions. However, such

419: correlations were taken into account for the support vector machine,

420: which generated a full set of cross-predictors (predicting function

421: $i$ with neighbors of type $j$).

422:

423: \begin{figure}[htb]

424:   \begin{center}

425:     \includegraphics[width=\hsize]{full-spec-2.eps}

426:     \caption{Accuracy of multiple-function prediction as a function of

427:       the number of predictions made using the three probabilistic prediction methods.}

428:     \label{fig:3}

429:   \end{center}

430: \end{figure}

431:

432: In the probabilistic case, each predictor does not only provide us

433: with a yes-no decision, but also with a probability for the

434: prediction. We can use the information to restrict the predictions to

435: highly probable ones.  Fig.~\ref{fig:3} shows the accuracy of the

436: prediction as a function of how many predictions are made with

437: different cut-offs in the predicted probability. Again, all three

438: curves closely follow each other, with perhaps a small but insignificant

439: edge for the linear M.L.~predictor. The predictions from all predictors

440: including the SVM were similar, and combining them would not have

441: improved predictive accuracy.
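
The selection rule described above can be written compactly. The notation
is introduced here only for illustration: $f$ labels a function class,
$N_i^{(f)}$ is the number of neighbors of node $i$ that have function $f$,
$p_f$ is the single-function predictor trained for $f$, and $\theta$ is the
probability cut-off:
\[
  \hat f_i = \mathop{\rm argmax}_f \; p_f(x_i = 1 | C_i, N_i^{(f)}) \quad,
\]
with a prediction reported for node $i$ only if
$\max_f p_f(x_i = 1 | C_i, N_i^{(f)}) \ge \theta$. Raising $\theta$ reduces
the number of predictions made and keeps only the more confident ones.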

442:

443: \begin{table}[htb]

444:   \centering

445:   \begin{tabular}{|l|l|l|}

446:     \hline

447:     Method & \# successes & Accuracy \\

448:     \hline

449:     binomial classifier & 623 & 31\% \\

450:     linear M.L.\ classifier & 655 & 33\% \\

451:     nonlinear M.L.\ classifier & 640 & 31.7\% \\

452:     linear SVM classifier & 601 & 29.8\% \\

453:     \hline

454:     randomized network & 101 & 11.4\% \\

455:     \hline

456:     \hline

457:     binomial classifier, process &  & 32.5\%\\

458:     randomized network & & 8.7\% \\

459:     \hline

460:   \end{tabular}

461:   \caption{Prediction accuracy in five-fold cross validation for the

462:     yeast data set.}

463:   \label{tab:1}

464: \end{table}

465:

466: Finally, the success rates for all predictors are shown in table

467: \ref{tab:1} using five-fold cross-validation on a data set of 2014

468: unique function assignments for the yeast proteome. It turns out that

469: all four methods perform similarly, with success rates between 30 and

470: 33\%. This compares to the null-hypothesis of prediction in a

471: randomized network, in which we would have a success rate of 11\% for

472: these data. The protein-protein interaction data therefore roughly

473: triple the prediction success over a random network. However, all

474: methods, from the simple, counting-based binomial classifier to the

475: full support vector machine, perform similarly.

476:

477: We also extended our methods to take larger neighborhoods (second and

478: higher-order neighbors) into account, but failed to substantially

479: improve predictive power.

480:

481: Finally, we also performed protein function prediction on a recently

482: published protein-interaction network for {\em Drosophila

483:   melanogaster} \cite{Droso}, with similar results.

484:

485: \section{Discussion}

486: %\vspace*{-12pt}

487:

488: We compared different probabilistic approaches to predicting protein

489: functions in protein interaction networks. Under closer analysis, the

490: different Markov Random Field methods in the literature can be related

491: to a basic machine-learning approach with maximum-likelihood parameter

492: estimation. Using real data, they exhibit similar performance, with

493: simple methods performing as well as more complex ones. This might

494: indicate limits on the functional information contained in

495: protein-protein interaction networks.

496:

497: A standard support vector machine gave similar results, though it was

498: equipped with more information, namely the frequencies of all function

499: classes in the neighborhood. The additional information neither improved nor

500: harmed predictive performance.

501:

502: %\vspace*{-12pt}

503: \begin{thebibliography}{9999}

504: %\vspace*{-12pt}

505:

506: \bibitem{Letovsky}

507:   S. Letovsky, S. Kasif, Bioinformatics {\bf 19}, Suppl. 1, i197 (2003).

508: \bibitem{Deng}

509:   M. Deng, T. Chen, F. Sun, in: Proceedings, RECOMB '03,

510:   7th international conference on

511:   Research in Computational Molecular Biology,

512:   p.~95, ACM Press, New York, NY (2003).

513: \bibitem{Droso}

514:   L. Giot \emph{et~al.}, Science {\bf 302}, 1727 (2003).

515: \bibitem{Uetz}

516:   P. Uetz \emph{et~al.}, Nature {\bf 403}, 623 (2000).

517: \bibitem{MIPS}

518:   H. W. Mewes \emph{et~al.}, Nucleic Acids Research {\bf 32}, D41

519:   (2004).

520: \bibitem{GO}

521:   The Gene Ontology Consortium,

522:   Nucleic Acids Res {\bf 32}, D258 (2004).

523: \bibitem{libsvm}

524:   C.-C. Chang, C.-J. Lin, LIBSVM : a library for support vector

525:   machines, 2001.

526:   Software available at {\bf http://www.csie.ntu.edu.tw/\~{}cjlin/libsvm}

527: \end{thebibliography}

528:

529: \if0

530: Nature. 2000 Feb 10;403(6770):623-7.

531: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae.

532:

533: Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon

534: D, Narayan V, Srinivasan M, Pochart P, Qureshi-Emili A, Li Y, Godwin

535: B, Conover D, Kalbfleisch T, Vijayadamodar G, Yang M, Johnston M,

536: Fields S, Rothberg JM.

537:

538: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=10688190&dopt=Abstract

539: \fi

540:

541: \end{document}

542:

543: %%% Local Variables:

544: %%% mode: latex

545: %%% TeX-master: t

546: %%% End:

547: