
1: \documentclass{bioinfo}

2: \copyrightyear{2007}

3: \pubyear{2007}

4:

5: \usepackage{mathbbol,amssymb,latexsym,amsfonts,amsmath,amsthm}

6: \usepackage{graphicx}

7: \usepackage{float}

8: \usepackage{textcomp}

9:

10: \newcommand{\ie}{\textit{i.e.}}

11: \newcommand{\G}{\mathcal{G}}

12: \newcommand{\C}{\mathcal{C}}

13: \newcommand{\D}{\mathcal{D}}

14: \newcommand{\E}{\mathcal{E}}

15: \DeclareMathOperator*{\argmax}{arg\, max}

16:

17: \begin{document}

18: \firstpage{1}

19:

20: \title{Analysis of a Gibbs sampler method for model based clustering

21:   of gene expression data}

22:

23: \author[A. Joshi \textit{et~al}]{Anagha Joshi\,$^{\rm a,b}$, Yves Van

24:   de Peer\,$^{\rm a, b}$\footnote{Corresponding author,

25:     E-mail:yves.vandepeer@psb.ugent.be}, Tom Michoel\,$^{\rm a, b}$}

26:

27: \address{$^{\rm a}$Department of Plant Systems Biology, VIB,

28:   Technologiepark 927, 9052 Gent, Belgium, $^{\rm b}$Department of

29:   Molecular Genetics, UGent, Technologiepark 927, 9052 Gent, Belgium}

30: \maketitle

31:

32: \begin{abstract}

33:

34:   \section{Motivation:} Over the last decade, a large variety of

35:   clustering algorithms have been developed to detect coregulatory

36:   relationships among genes from microarray gene expression data.

37:   Model based clustering approaches have emerged as statistically well

38:   grounded methods, but the properties of these algorithms when

39:   applied to large-scale data sets are not always well understood.  An

40:   in-depth analysis can reveal important insights about the

41:   performance of the algorithm, the expected quality of the output

42:   clusters, and the possibilities for extracting more relevant

43:   information out of a particular data set.

44:

45:   \section{Results:} We have extended an existing algorithm for model

46:   based clustering of genes to simultaneously cluster genes and

47:   conditions, and used three large compendia of gene expression data

48:   for \emph{S.~cerevisiae} to analyze its properties. The algorithm

49:   uses a Bayesian approach and a Gibbs sampling procedure to

50:   iteratively update the cluster assignment of each gene and

51:   condition. For large-scale data sets, the posterior distribution is

52:   strongly peaked on a limited number of equiprobable clusterings.  A

53:   GO annotation analysis shows that these local maxima are all

54:   biologically equally significant, and that simultaneously clustering

55:   genes and conditions performs better than only clustering genes and

56:   assuming independent conditions.  A collection of distinct

57:   equivalent clusterings can be summarized as a weighted graph on the

58:   set of genes, from which we extract fuzzy, overlapping clusters

59:   using a graph spectral method.  The cores of these fuzzy clusters

60:   contain tight sets of strongly coexpressed genes, while the overlaps

61:   exhibit relations between genes showing only partial coexpression.

62:

63:   \section{Availability:} \textsf{GaneSh}, a Java package for

64:   coclustering, is available under the terms of the GNU General Public

65:   License from our website at

66:   http://bioinformatics.psb.ugent.be/software.

67:

68:   \section{Contact:} yves.vandepeer@psb.ugent.be

69:

70:   \section{Supplementary information:} available on our website at\\

71:   http://bioinformatics.psb.ugent.be/supplementary\_data/anjos/gibbs

72: \end{abstract}

73:

74: \section{Introduction}

75:

76: Since the seminal paper by \citet{pmid9843981}, now almost a decade

77: ago, clustering has formed the basis for extracting comprehensible

78: information out of large-scale gene expression data sets. Clusters of

79: coexpressed genes tend to be enriched for specific functional

80: categories \citep{pmid9843981}, share \textit{cis}-regulatory

81: sequences in their promoters \citep{pmid10391217}, or form the

82: building blocks for reconstructing transcription regulatory networks

83: \citep{segal2003}.

84:

85: A variety of heuristic clustering methods have been used, such as

86: hierarchical clustering \citep{pmid9843981}, $k$-means

87: \citep{pmid10391217}, or self-organizing maps \citep{pmid10077610}.

88: Although these methods have had an enormous impact, their statistical

89: properties are generally not well understood and important parameters

90: such as the number of clusters are not determined automatically.

91: Therefore, there has been a shift in attention towards model based

92: clustering approaches in recent years

93: \citep{pmid11673243,fraley02,pmid12217911,pmid14871871,chinese,dahl2006}.

94: A model based approach assumes that the data is generated by a mixture

95: of probability distributions, one for each cluster, and takes

96: explicitly into account the noisiness of gene expression data. It

97: allows for a statistical assessment of the resulting clusters and

98: gives a formal estimate for the expected number of clusters.  To infer

99: model parameters and cluster assignments, standard statistical

100: techniques such as Expectation Maximization or Gibbs sampling are used

101: \citep{liu2002}.

102:

103: In this paper we use a novel model based clustering method which

104: builds upon the method recently introduced by \citet{chinese}. We

105: address two key questions that have remained largely unanswered for

106: model based clustering methods in general, namely convergence of the

107: Gibbs sampler for very large data sets, and non-heuristic

108: reconstruction of gene clusters from the posterior probability

109: distribution of the statistical model.

110:

111: In the model used by \cite{chinese}, it is assumed that the expression

112: levels of genes in one cluster are random samples drawn from a

113: Gaussian distribution and expression levels of different experimental

114: conditions are independent.  We have extended this model to allow

115: dependencies between different conditions in the same cluster.

116: \citet{pmid14871871} used a multivariate normal distribution to take

117: into account correlation among experimental conditions.  Our approach

118: consists of clustering the conditions within each gene cluster,

119: assuming that the expression levels of the genes in one gene cluster

120: for the conditions in one condition cluster are drawn from one

121: Gaussian distribution.  Hence our model is a model for

122: \emph{coclustering} or \emph{two-way clustering} of genes and

123: conditions. The same statistical model was also used in our recent

124: approach to reconstruct transcription regulatory networks

125: \citep{lemone}. The coclustering is carried out by a Gibbs sampler

126: which iteratively updates the assignment of each gene, and within each

127: gene cluster the assignment of each experimental condition, using the

128: full conditional distributions of the model.

129:

130: It is known that a Gibbs sampler may have poor mixing properties if

131: the distribution being approximated is multi-modal and it will then

132: have a slow convergence rate \citep{liu2002}.  Previous studies of

133: Gibbs samplers for model based clustering have not reported

134: convergence difficulties \citep{pmid12217911,pmid14871871,dahl2006}.

135: In those studies, only data sets with a relatively small number of

136: genes (up to a few hundred) \citep{pmid12217911,pmid14871871}, or a small

137: number of experimental conditions (less than $10$) \citep{dahl2006}

138: were considered, and special sampling techniques such as reverse

139: annealing \citep{pmid14871871} or merge-split proposals

140: \citep{dahl2006} were sufficient to generate a well mixing Gibbs

141: sampler.  We observe that for data sets of increasing size the

142: correlation between two Gibbs sampler runs as well as the number of

143: cluster solutions visited in one run after burn-in steadily decreases.

144: This means that for large-scale data sets, the posterior distribution

145: is very strongly peaked on multiple local modes. Since the peaks are

146: so strong, we approximate the posterior distribution by averaging over

147: multiple runs performed in parallel, each converging quickly to a

148: single mode. By computing the correlation between different averages

149: of the same number of runs we are able to show that the number of

150: distinct modes is relatively small and accurate approximations to the

151: posterior distribution can be obtained with as few as $10$ modes for

152: around $6000$ genes.

153:

154: To identify the final optimal clustering, the traditional approach is

155: to select out of all the clusterings visited by the Gibbs sampler the

156: one which maximizes the posterior distribution (maximum a posteriori

157: (MAP) clustering).  However, we show that for large data sets the

158: differences in likelihood between the different local maxima are

159: extremely small and statistically insignificant, such that the MAP

160: clustering is as good as taking any local maximum at random. A GO

161: \citep{ashb00} analysis of the different modes shows that also from

162: the biological point of view any difference between the local modes is

163: insignificant.  Taking into account the full posterior distribution is

164: more difficult since different clusterings may have a different number

165: of clusters and the labeling of clusters is not unique (the label

166: switching problem \citep{redner84}).  The common solution to this

167: problem is to consider pairwise probabilities for two genes being

168: clustered together or not \citep{pmid12217911,pmid14871871,dahl2006}.

169: A major question that has not yet received a final answer is how to

170: reconstruct gene clusters from these pairwise probabilities.

171: \cite{pmid12217911} and \cite{pmid14871871} use a heuristic

172: hierarchical clustering on the pairwise probability matrix to form a

173: final clustering estimate.  \cite{dahl2006} introduces a least-squares

174: method, which selects out of all clusterings visited by the Gibbs

175: sampler the one which minimizes a distance function to the pairwise

176: probability matrix. In both approaches, the probability matrix is

177: reduced to a single hard clustering. This necessarily removes

178: non-transitive relations between genes (such as a low probability for

179: a pair of genes to be clustered together even though they both have

180: relatively high probability to be clustered with the same third gene)

181: which may nevertheless be informative and biologically meaningful.

182:

183: We propose that the pairwise probability matrix reflects a \emph{soft}

184: or \emph{fuzzy clustering} of the data, \ie, genes can belong to

185: multiple clusters with a certain probability.  To extract these fuzzy

186: clusters from the pairwise probabilities we use a method from pattern

187: recognition theory \citep{graphspectral}. This method iteratively

188: computes the largest eigenvalue and corresponding eigenvector of the

189: probability matrix, constructs a fuzzy cluster with the eigenvector,

190: and updates the probability matrix by removing from it the weight of

191: the genes assigned to the last cluster.  By only keeping genes which

192: belong to one fuzzy cluster with very high probability we obtain tight

193: clusters which show higher functional coherence compared to standard

194: clusters. Keeping also genes which belong with lower but still

195: significant probability to multiple fuzzy clusters, we can tentatively

196: identify multifunctional genes or relations between genes showing only

197: partial coexpression. We show that our results are in good agreement

198: with previous fuzzy clustering approaches to gene expression data

199: \citep{gaschfuzzy}. We believe that our fuzzy clustering method to

200: summarize the posterior distribution will be of general interest for

201: all model based clustering approaches and solves the problems

202: associated with heuristic clusterings of the pairwise probability

203: matrix.

204:

205: All our analyses are performed on three large-scale public compendia

206: of gene expression data for \textit{S.~cerevisiae}

207: \citep{spellmandata,gaschdata,hughesdata}.

208:

209:

210: \begin{methods}

211: \section{Methods}

212:

213:

214: \subsection*{Mathematical model}

215:

216: For an expression matrix with $N$ genes and $M$ conditions, we define

217: a coclustering as a partition of the genes into $K$ gene clusters

218: $\G_k$, together with for each gene cluster, a partition of the set of

219: conditions into $L_k$ condition clusters $\E_{k,l}$.  We assume that

220: all data points in a cocluster $\{(i,m)\colon i\in\G_k, m\in

221: \E_{k,l}\}$ are random samples from the same normal distribution. This

222: model generalizes the model used by \cite{chinese}, where the

223: partition of conditions is always fixed at the trivial partition into

224: singleton sets.

225:

226: Given a set of means and precisions $(\mu_{kl},\tau_{kl})$, a

227: coclustering $\C$ defines a probability density on data matrices

228: $\D=(x_{im})$ by

229: \begin{align*}

230:   p\bigl(\D\mid\C,(\mu_{kl},\tau_{kl})\bigr) = \prod_{k=1}^K

231:   \prod_{l=1}^{L_k} \prod_{i\in\G_k}\prod_{m\in \E_{k,l}} p

232:   (x_{im}\mid \mu_{kl},\tau_{kl}).

233: \end{align*}

234: We use a uniform prior on the set of coclusterings with normal-gamma

235: conjugate priors for the parameters $\mu_{kl}$ and $\tau_{kl}$.  Using

236: Bayes' rule we find the probability of a coclustering $\C$ with

237: parameters $(\mu_{kl},\tau_{kl})$ given the data $\D$.  Then we take

238: the marginal probability over the parameters $(\mu_{kl},\tau_{kl})$ to

239: obtain the final probability of a coclustering $\C$ given the data

240: $\D$, up to a normalization constant:

241: \begin{equation}\label{eq:1}

242:   p(\C\mid\D) \propto \prod_{k=1}^K \prod_{l=1}^{L_k} \iint

243:   p(\mu,\tau) \prod_{i\in\G_k}\prod_{m\in \E_{k,l}} p (x_{im}\mid

244:   \mu,\tau)\; d\mu d\tau,

245: \end{equation}

246: where $p(\mu,\tau)=p(\mu\mid\tau)p(\tau)$ with

247: \begin{align*}

248:   p(\mu\mid\tau)=\bigl(\frac{\lambda_0\tau}{2\pi}\bigr)^{1/2}

249:   e^{-\frac{\lambda_0\tau}2 (\mu-\mu_0)^2},\quad

250:   p(\tau) = \frac{\beta_0^{\alpha_0}}{\Gamma(\alpha_0)}

251:   \tau^{\alpha_0-1} e^{-\beta_0\tau},

252: \end{align*}

253: $\alpha_0,\beta_0,\lambda_0 > 0$ and $-\infty<\mu_0<\infty$ being the

254: parameters of the normal-gamma prior distribution.  We use the values

255: $\alpha_0=\beta_0=\lambda_0= 0.1$ and $\mu_0=0.0$, resulting in a

256: non-informative prior. We have compared the normal-gamma prior with

257: other non-informative, conjugate priors, but found no difference in

258: results (see Supplementary Information).  The double integral in eq.

259: (\ref{eq:1}) can be solved exactly in terms of the sufficient

260: statistics $T^{(n)}_{kl} = \sum_{i \in \G_k,m\in\E_{k,l}} x_{im}^n$

261: ($n=0,1,2$) for each cocluster.  The log-likelihood or Bayesian score

262: decomposes as a sum of cocluster scores:

263: \begin{equation}\label{eq:7}

264:   S(\C) =\log p(\C\mid\D) = \sum_{k=1}^K \sum_{l=1}^{L_k} S_{kl},

265: \end{equation}

266: with

267: \begin{multline*}

268:   S_{kl} = -\tfrac12 T^{(0)}_{kl}\log(2\pi) + \tfrac12

269:   \log\bigl(\frac{\lambda_0}{\lambda_0 + T^{(0)}_{kl}}\bigr)

270:    - \log\Gamma(\alpha_0)\\ + \log\Gamma(\alpha_0

271:   + \tfrac12 T^{(0)}_{kl})

272:   + \alpha_0\log\beta_0 -(\alpha_0 + \tfrac12 T^{(0)}_{kl})\log\beta_1

273: \end{multline*}

274: and

275: \begin{equation*}

276:   \beta_1 = \beta_0 + \frac12\Bigl[ T^{(2)}_{kl} -

277:   \frac{(T^{(1)}_{kl})^2}{T^{(0)}_{kl}} \Bigr]

278:   + \frac{\lambda_0 \bigl( T^{(1)}_{kl} - \mu_0 T^{(0)}_{kl}

279:     \bigr)^2}{2(\lambda_0 + T^{(0)}_{kl})T^{(0)}_{kl}}.

280: \end{equation*}
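
The cocluster score above can be evaluated directly from the three sufficient statistics. The following is a minimal Python sketch, not the GaneSh implementation: the function name \texttt{cocluster\_score} is hypothetical, the hyperparameters are the values quoted in the text, and returning $0$ for an empty cocluster is an assumption made here for convenience.

```python
import math

# Prior hyperparameters as quoted in the text (alpha_0 = beta_0 = lambda_0 = 0.1, mu_0 = 0).
ALPHA0, BETA0, LAMBDA0, MU0 = 0.1, 0.1, 0.1, 0.0


def cocluster_score(values):
    """Score S_kl of one cocluster, given the list of its expression values x_im."""
    t0 = len(values)                     # T^(0): number of data points
    t1 = sum(values)                     # T^(1): sum of the values
    t2 = sum(x * x for x in values)      # T^(2): sum of the squared values
    if t0 == 0:
        return 0.0                       # assumption: an empty cocluster contributes nothing
    beta1 = (BETA0
             + 0.5 * (t2 - t1 * t1 / t0)
             + LAMBDA0 * (t1 - MU0 * t0) ** 2 / (2.0 * (LAMBDA0 + t0) * t0))
    return (-0.5 * t0 * math.log(2.0 * math.pi)
            + 0.5 * math.log(LAMBDA0 / (LAMBDA0 + t0))
            - math.lgamma(ALPHA0)
            + math.lgamma(ALPHA0 + 0.5 * t0)
            + ALPHA0 * math.log(BETA0)
            - (ALPHA0 + 0.5 * t0) * math.log(beta1))
```

With this sketch, scoring the values $(0,0,0)$ and $(10,10,10)$ as two separate coclusters gives a higher total than scoring all six values as one cocluster, which is the behaviour the model relies on to separate distinct expression levels.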

281:

282:

283: \subsection*{Gibbs sampler algorithm}

284:

285: We use a Gibbs sampler to sample coclusterings from the posterior

286: distribution (\ref{eq:1}). The algorithm iteratively updates the

287: assignment of genes to gene clusters, and for each gene cluster, the

288: assignment of conditions to condition clusters as follows:

289:

290: \begin{enumerate}

291: \item Initialization: randomly assign the $N$ genes to a random number

292:   $K_0$ of gene clusters, and for each gene cluster, randomly assign the $M$

293:   conditions to a random number $L_{k,0}$ of condition clusters.

294: \item For $N$ cycles, remove a random gene $i$ from its current

295:   cluster.  For each gene cluster $k$, calculate the Bayesian score

296:   $S(\C_{i\to k})$, where $\C_{i\to k}$ denotes the coclustering

297:   obtained from $\C$ by assigning gene $i$ to cluster $k$, keeping all

298:   other assignments of genes and conditions equal, as well as the

299:   score $S(\C_{i\to 0})$ for the gene to be alone in its own

300:   cluster.  Assign gene $i$ to one of the possible $K+1$ gene

301:   clusters, where $K$ is the current number of gene clusters,

302:   according to the probabilities $Q_k \propto e^{S(\C_{i\to k})}$,

303:   normalized such that $\sum_{k} Q_k=1$.

304: \item For each gene cluster $k$, for $M$ cycles, remove a random

305:   condition $m$ from its current cluster. For each condition cluster

306:   $l$, calculate the Bayesian score $S(\C_{k,m\to l})$. Assign

307:   condition $m$ to one of the possible $L_k+1$ clusters, where $L_k$

308:   is the current number of condition clusters for gene cluster $k$,

309:   according to the probabilities $Q_l \propto e^{S(\C_{k,m\to l})}$,

310:   normalized such that $\sum_{l} Q_l=1$.

311: \item Iterate steps 2 and 3 until convergence. One iteration is defined

312:   as executing steps 2 and 3 consecutively once, and hence consists of

313:   $N+K\times M$ sampling steps (with $K$ the number of gene clusters

314:   after step 2 of that iteration).

315: \end{enumerate}
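
The reassignment in steps 2 and 3 amounts to drawing an index from probabilities proportional to $e^{S}$. A minimal Python sketch of that single draw is given below; the function names are hypothetical, and subtracting the maximum score before exponentiating is a standard numerical safeguard assumed here, not something stated in the text.

```python
import math
import random


def assignment_probabilities(scores):
    """Convert scores S(C_{i->k}) into probabilities Q_k proportional to exp(S)."""
    top = max(scores)                              # shift by the maximum to avoid overflow
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]            # normalized so that sum_k Q_k = 1


def sample_assignment(scores, rng=random):
    """Draw one cluster index according to the probabilities Q_k."""
    probs = assignment_probabilities(scores)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Because the scores are sums of log-probabilities over many data points, score differences between candidate clusters can be large, in which case one $Q_k$ is numerically indistinguishable from $1$ and the draw becomes effectively deterministic.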

316:

317: This coclustering algorithm simulates a Markov chain which satisfies

318: detailed balance with respect to the posterior distribution

319: (\ref{eq:1}), \ie, the posterior is its stationary distribution: after a

320: sufficient number of iterations, the probability to visit a particular

321: coclustering $\C$ approaches $p(\C\mid\D)$. The expectation value of any real function $f$ with

322: respect to the posterior distribution can be approximated by averaging

323: over the iterations of a sufficiently long Gibbs sampler run:

324: \begin{equation}\label{eq:2}

325:   E(f) = \sum_\C f(\C) p(\C\mid\D) \approx \frac1T \sum_{t=T_0+1}^{T_0+T}

326:   f(\C_t)

327: \end{equation}

328: where $\C_t$ is the coclustering visited at iteration $t$ and $T_0$ is

329: a possible burn-in period.  We say that the Gibbs sampler has

330: converged if two runs starting from different random initializations

331: return the same averages (\ref{eq:2}) for a suitable set of test

332: functions $f$. More precisely, if $\{f_n\}$ is a set of test

333: functions, define $a_n=E_1(f_n)$ as the average of $f_n$ in the first

334: Gibbs sampler run, and $b_n=E_2(f_n)$ as the average of $f_n$ in the

335: second Gibbs sampler run. We define a correlation measure $\rho$

336: ($0\leq\rho\leq1$) between two runs as

337: \begin{equation}\label{eq:5}

338:   \rho = \frac{|\sum_n a_n b_n|}{\sqrt{(\sum_n a_n^2) (\sum_n b_n^2)}}.

339: \end{equation}

340: Full convergence is reached if $\rho=1$.
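
As a sketch of how the correlation measure (\ref{eq:5}) can be computed, the Python function below takes the two vectors of run averages $a_n$ and $b_n$ as plain lists. The function name is hypothetical, and the sketch assumes that neither vector is identically zero.

```python
import math


def correlation(a, b):
    """Correlation measure rho between two vectors of run averages a_n and b_n."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return abs(dot) / (norm_a * norm_b)   # assumes both norms are non-zero
```

Since $\rho$ is a normalized inner product, it is unchanged when either vector is rescaled by a positive constant.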

341:

342: \subsection*{Fuzzy clustering}

343:

344: To keep track of the gene clusters, independent of the (varying)

345: number of clusters or their labeling, we consider functions

346: \begin{equation}\label{eq:3}

347:   f_{ij}(\C) =

348:   \begin{cases}

349:     1 & \text{if gene $i$ and $j$ belong to the same gene cluster in $\C$}\\

350:     0 & \text{otherwise}

351:   \end{cases}

352: \end{equation}

353: In general, the posterior distribution (\ref{eq:1}) is not

354: concentrated on a single coclustering and the matrix $F=(E(f_{ij}))$

355: of expectation values (see eq. (\ref{eq:2})) consists of probabilities

356: between $0$ and $1$. To quantify this fuzziness, we use an entropy

357: measure

358: \begin{equation}\label{eq:4}

359:   H_{\text{fuzzy}} = \frac1{N^2\ln 2}\sum_{ij }h(F_{ij}),

360: \end{equation}

361: where $N$ is the dimension of the square matrix $F$ and

362: \begin{equation*}

363:   h(q)=-q\ln(q) - (1-q)\ln(1-q) \text{ for } 0\leq q\leq 1.

364: \end{equation*}

365: For a hard clustering ($F_{ij}=0$ or $1$ for all $i,j$),

366: $H_{\text{fuzzy}}=0$, and for a maximally fuzzy clustering

367: ($F_{ij}=0.5$ for all $i,j$), $H_{\text{fuzzy}}=1$. In reality, the

368: matrix $F$ is very sparse (most gene pairs will never be clustered

369: together), so $H_{\text{fuzzy}}$ remains small even for genuinely fuzzy

370: clusterings.
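
The matrix $F$ and the entropy measure (\ref{eq:4}) can be sketched as follows in Python. The sketch assumes that each sampled clustering is given as a list of cluster labels, one per gene; all function names are hypothetical, and the convention $0\ln 0=0$ is made explicit.

```python
import math


def coclustering_matrix(labelings):
    """Average the indicators f_ij over a list of hard clusterings (one label per gene)."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(labelings) for c in row] for row in counts]


def binary_entropy(q):
    """h(q) in nats, using the convention 0 * ln(0) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log(q) - (1.0 - q) * math.log(1.0 - q)


def fuzzy_entropy(F):
    """H_fuzzy: mean binary entropy of the entries of F, in units of bits."""
    n = len(F)
    return sum(binary_entropy(q) for row in F for q in row) / (n * n * math.log(2.0))
```

Comparing labels only for equality makes the result independent of the number of clusters and of their labeling, which is the reason for working with $F$ in the first place.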

371:

372: We assume that a fuzzy gene-gene matrix $F$ is produced by a fuzzy

373: clustering of the genes, \ie, we assume that each gene $i$ has a

374: probability $p_{ik}$ to belong to each cluster $k$, such that $\sum_k

375: p_{ik}=1$. To extract these probabilities from $F$ we use a graph

376: spectral method \citep{graphspectral}, originally developed for

377: pattern recognition and image analysis, modified here to enforce the

378: normalization conditions on $p_{ik}$. A fuzzy cluster is represented

379: by a column vector $w=(w_1, \dots, w_N)^T$, with $w_i$ the weight of

380: gene $i$ in this cluster, normalized such that $\|w\|^2=w^Tw=\sum_i

381: w_i^2=1$.  The cohesiveness of the cluster with respect to the

382: gene-gene matrix $F$ is defined as $w^TFw = \sum_{ij}w_i F_{ij} w_j$.

383: By the Rayleigh-Ritz theorem,

384: \begin{align*}

385:   \max_{w\neq0} \frac{w^T F w}{w^Tw} = v_1^T F v_1 = \lambda_1,

386: \end{align*}

387: where $\lambda_1$ is the largest eigenvalue of $F$ and $v_1$ the

388: corresponding (normalized) eigenvector. Hence the maximally cohesive

389: cluster in $F$ is given by the eigenvector of the largest eigenvalue.

390: By the Perron-Frobenius theorem, if $F$ is irreducible this eigenvector is

391: unique and all its entries are nonnegative. We can then define the membership

392: probabilities to cluster $1$ by $p_{i1} =

393: \frac{v_{1,i}}{\max_j(v_{1,j})}$. Hence the gene with the highest

394: weight in $v_1$ is considered the prototypical gene for this cluster,

395: and it will not belong to any other cluster. The probability $p_{i1}$

396: measures to what extent other genes are coexpressed with this

397: prototypical gene.  To find the next most cohesive cluster, we remove

398: from $F$ the information already contained in the first cluster by

399: setting

400: \begin{align*}

401:   F^{(2)}_{ij}=\sqrt{1-p_{i1}} F_{ij} \sqrt{1-p_{j1}},

402: \end{align*}

403: and compute the largest eigenvalue and corresponding (normalized)

404: eigenvector $v_2$ for this matrix. The prototypical gene for this

405: cluster may already have some probability assigned to the previous

406: cluster, so we define the membership probabilities to the second

407: cluster by

408: \begin{align*}

409:   p_{i2} = \min\Bigl( \frac{v_{2,i}}{\max_j(v_{2,j})}

410:   (1-p_{i_{\text{max}}1}), 1-p_{i1}\Bigr).

411: \end{align*}

412: Here $i_{\text{max}}=\argmax_j(v_{2,j})$ is the prototypical gene for

413: the second cluster, and we take the `$\min$' to ensure that $\sum_k

414: p_{ik}$ will never exceed $1$.

415:

416: This procedure of reducing $F$ and computing the largest eigenvalue

417: and corresponding eigenvector to define the next cluster membership

418: probabilities is iterated until one of the following stopping criteria

419: is met:

420: \begin{enumerate}

421: \item All entries in the reduced matrix $F^{(k)}$ reach $0$, \ie, for

422:   all genes, $\sum_{k'<k} p_{ik'}=1$, and we have completely

423:   determined all fuzzy clusters and their membership probabilities.

424: \item The largest eigenvalue of the reduced matrix $F^{(k)}$ has multiplicity

425:   $>1$. In this case the eigenvector is no longer unique and need no

426:   longer have nonnegative entries, so we cannot make new cluster

427:   membership probabilities out of it. This may happen if the

428:   (weighted) graph defined by connecting gene pairs with non-zero

429:   entries in $F^{(k)}$ is no longer strongly connected

430:   (Perron-Frobenius theorem).

431: \end{enumerate}

432:

433: To compute one or more of the largest eigenvalues and eigenvectors for

434: large sparse matrices such as $F$ and its reductions $F^{(k)}$ we use

435: efficient sparse matrix routines, such as for instance implemented in

436: the Matlab$^{\text{\textregistered}}$ function \texttt{eigs}.
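
The iterative extraction can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure used for the results: a plain power iteration replaces the sparse eigensolver, which is adequate only for small matrices $F$ whose largest eigenvalue is also largest in absolute value; the rule for clusters beyond the second (reducing $F$ by the remaining probability mass $1-\sum_{k'<k}p_{ik'}$) is an extrapolation from the two steps written out above; and the second stopping criterion (a degenerate largest eigenvalue) is not checked. All function names are hypothetical.

```python
import math


def leading_eigenvector(F, iterations=1000):
    """Power iteration for the leading eigenvector of a nonnegative symmetric matrix."""
    n = len(F)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(F[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return None                      # the reduced matrix is identically zero
        v = [x / norm for x in w]
    return v


def fuzzy_clusters(F, max_clusters=10, tol=1e-9):
    """Return a list of membership vectors, one per fuzzy cluster (entry i is p_ik)."""
    n = len(F)
    assigned = [0.0] * n                     # probability already given to earlier clusters
    memberships = []
    for _ in range(max_clusters):
        rest = [max(0.0, 1.0 - a) for a in assigned]
        reduced = [[math.sqrt(rest[i]) * F[i][j] * math.sqrt(rest[j])
                    for j in range(n)] for i in range(n)]
        v = leading_eigenvector(reduced)
        if v is None:
            break                            # first stopping criterion: nothing left in F
        i_max = max(range(n), key=lambda i: v[i])   # prototypical gene of this cluster
        scale = rest[i_max] / v[i_max]
        p = [min(v[i] * scale, rest[i]) for i in range(n)]
        if max(p) < tol:
            break                            # only rounding noise remains
        memberships.append(p)
        assigned = [a + q for a, q in zip(assigned, p)]
    return memberships
```

For a block-diagonal $F$ describing two disjoint hard clusters, the sketch returns exactly those two clusters with membership probabilities equal to $1$ up to rounding; for a matrix with an intermediate gene, the `min' keeps that gene's total membership at or below $1$.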

437:

438: \subsection*{Data sets}

439:

440: We use three large compendia of gene expression data for budding

441: yeast:

442: \begin{enumerate}

443: \item \citet{gaschdata} data set: expression in $173$ stress related

444:   conditions.

445: \item \citet{hughesdata} data set: compendium of expression profiles

446:   corresponding to $300$ diverse mutations and chemical treatments.

447: \item \citet{spellmandata} data set: $77$ conditions for alpha factor

448:   arrest, elutriation, and arrest of a cdc15 temperature-sensitive

449:   mutant.

450: \end{enumerate}

451: We select the genes present in all three data sets ($6052$ genes) and,

452: to be as unbiased as possible, no further postprocessing is done.  We

453: use SynTReN \citep{syntren} to generate simulated data sets with

454: varying number of conditions for a synthetic transcription regulatory

455: network with $1000$ genes (see also Supplementary Information).

456:

457:

458: \subsection*{Functional coherence}

459:

460: To estimate the overall biological relevance of the clusters we use a

461: method which calculates the mutual information between clusters and GO

462: attributes \citep{clusterjudge}.  For each GOslim attribute, we create

463: a cluster-attribute contingency table where rows are clusters and

464: columns are attribute status (\emph{`Yes'} if the gene possesses the

465: attribute, \emph{`No'} if it is not known whether the gene possesses

466: the attribute).  The total mutual information is defined as the sum of

467: mutual informations between clusters and individual GO attributes:

468: \begin{equation}\label{eq:6}

469:   MI= \sum_A \bigl[H(\C)+H(A)-H(\C,A)\bigr]

470: \end{equation}

471: where $\C$ is a clustering of the genes, $A$ is a GO attribute and $H$

472: is Shannon's entropy, $H=-\sum_i p_i\log(p_i)$, and the $p_i$ are

473: probabilities obtained from the contingency tables.
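
A minimal Python sketch of the mutual information computation is given below. It assumes that each contingency table is stored as a list of rows \texttt{[n\_yes, n\_no]}, one row per cluster, and that the probabilities $p_i$ are the corresponding relative frequencies; the function names and this data layout are illustrative choices, not taken from \citet{clusterjudge}.

```python
import math


def entropy(counts):
    """Shannon entropy (in nats) of the distribution given by a list of counts."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)


def mutual_information(table):
    """H(C) + H(A) - H(C, A) for one cluster-attribute contingency table."""
    h_clusters = entropy([sum(row) for row in table])         # row sums: cluster sizes
    h_attribute = entropy([sum(col) for col in zip(*table)])  # column sums: Yes / No
    h_joint = entropy([c for row in table for c in row])      # all cells of the table
    return h_clusters + h_attribute - h_joint


def total_mutual_information(tables):
    """Sum of the mutual informations over all GO attributes."""
    return sum(mutual_information(t) for t in tables)
```

An attribute that is present in all genes of one cluster and absent from all genes of another contributes the maximal amount, whereas an attribute distributed identically over all clusters contributes zero.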

474:

475: \end{methods}

476:

477: \section{Results and discussion}

478:

479: \subsection*{Convergence of the Gibbs sampler algorithm}

480:

481: We study convergence using the test functions $f_{ij}$ which indicate

482: if gene $i$ and $j$ are clustered together or not (see eq.

483: (\ref{eq:3}) in the Methods) and compute the correlation measure

484: $\rho$ between different runs for this set of functions (see eq.

485: (\ref{eq:5}) in the Methods).  In addition to the correlation

486: measure, we also compute the entropy measure $H_{\text{fuzzy}}$

487: (see eq. (\ref{eq:4}) in the Methods). This parameter summarizes the

488: `shape' of the posterior distribution: a value of $0$ corresponds to

489: hard clustering which implies that the distribution is completely

490: supported on a single solution, the more positive $H_{\text{fuzzy}}$

491: is, the more the distribution is supported on multiple solutions.

492:

493: In the analysis below we use subsets from the \citeauthor{gaschdata}

494: data set with a varying number of genes and conditions and perform

495: multiple Gibbs sampler runs with a large number of iterations.  One

496: iteration involves a reassignment of all genes and all conditions in

497: all clusters, and hence involves $N + M\times K$ sampling steps in the

498: Gibbs sampler, where $N$ is the number of genes, $M$ the number of

499: conditions, and $K$ the number of clusters at that iteration

500: (typically $K\sim\sqrt{N}$).
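
As an illustrative order of magnitude, assuming $K\approx\sqrt{N}$ holds (it is only the typical value quoted above), a data set with $N=1000$ genes and $M=173$ conditions has $K\approx 32$, so that one iteration involves roughly
\begin{equation*}
  N + M\times K \approx 1000 + 173\times 32 = 6536
\end{equation*}
sampling steps.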

501:

502: \begin{figure}[h]

503:   \centering

504:   \includegraphics[width=\linewidth]{Fig1-GeneExptConvergence.eps}

505:   \caption{Trace plot of the correlation measure $\rho$ between two

506:     different Gibbs sampler runs as a function of the number of

507:     iterations, for a small data set ($100$ genes, $10$ conditions,

508:     top curve) and a large data set ($1000$ genes, $173$ conditions,

509:     bottom curve).  Both data sets are subsets of the

510:     \citeauthor{gaschdata} data set.}

511:   \label{convergence}

512: \end{figure}

513:

514:

515: First we consider a very small data set ($100$ genes, $10$

516: conditions). We start two Gibbs sampler runs in parallel and compute

517: the correlation measure $\rho$ at each iteration, see Figure

518: \ref{convergence}. In this case, $\rho$ approaches its maximum value

519: $\rho=1$ in less than $5000$ iterations and the Gibbs sampler

520: generates a well mixing chain which can easily explore the whole

521: space. Non-zero values of the entropy measure $H_{\text{fuzzy}}$

522: ($0.105\pm0.003$) indicate that the posterior distribution is

523: supported on multiple clusterings of the genes.

524:

525: Next we run the Gibbs sampler algorithm on a data set with $1000$

526: genes and all 173 conditions.  Unlike in the previous situation we

527: observe that the correlation between two Gibbs sampler runs saturates

528: well below $1$ (see Figure \ref{convergence}). Hence the Gibbs sampler

529: does not converge to the posterior distribution in one run.  We can

530: gain further understanding of the lack of convergence by looking in

531: more detail at a single Gibbs sampler run.  It turns out that the

532: correlation measure between two successive iterations reaches $1$ very

533: rapidly and remains unchanged afterwards (see Supplementary Figure

534: $2$).  Since each iteration involves a large number of sampling steps

535: (\ie, a large number of possible configuration changes), this implies

536: that the Gibbs sampler very rapidly finds a local maximum of the

537: posterior distribution from which it can no longer escape.  We

538: conclude that the posterior distribution is supported on multiple

539: local maxima which overlap only partially, and with valleys in between

540: that cannot be crossed by the Gibbs sampler.  These local maxima all

541: have approximately the same log-likelihood (see for instance the small

542: variance in Figure \ref{Spellman_conv} below) and are therefore all

543: equally meaningful.  The probability ratio between peaks and valleys

544: is so large (exponential in the size of the data set) that an accurate

545: approximation to the posterior distribution is given by averaging over

546: the local maxima only. Those can be uncovered by performing multiple

547: independent runs, each converging very quickly on one of the maxima,

548: and there is no need for special techniques to also sample in between

549: local maxima.  The number of local maxima (Gibbs sampler runs)

550: necessary for a good approximation can be estimated as follows. We

551: perform $150$ independent Gibbs sampler runs and compute for each the

552: pairwise gene-gene clustering probability matrix $F$ (see Methods).

553: For each $k=1,\dots,50$, we take two non-overlapping sets of $k$

554: solutions and compute the average of their pairwise probability

555: matrices $F$.  Then, we compute the correlation measure $\rho$ between

556: those two averages.  This is repeated several times, depending on the

557: number of non-overlapping sets that can be chosen from the pool of

558: $150$ solutions.  If for a given $k$ the correlation is always $1$,

559: then there are at most $k$ local maxima.  Figure \ref{merge} shows

560: that as $k$ increases, the correlation quickly reaches close to this

561: perfect value $1$. This implies that the number of local maxima is not

562: too large and a good approximation to the posterior distribution can

563: be obtained in this case already with $10$ to $20$ solutions.

564: Supplementary Figure $1$ shows an example of hard clusters formed as a

565: result of a single run and fuzzy clusters formed by merging the result

566: of $10$ independent runs.
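The estimation procedure above can be sketched as follows. This is a minimal illustration, not the implementation used for the paper: the function names are hypothetical, each run is represented by a single hard clustering instead of the matrix $F$ from the Methods, and the Pearson correlation of the off-diagonal entries is used as a stand-in for the correlation measure $\rho$.

```python
import numpy as np

def coclustering_matrix(labels):
    """Pairwise matrix of one hard clustering: entry (i, j) is 1 if
    genes i and j share a cluster, 0 otherwise."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def offdiag_correlation(A, B):
    """Pearson correlation of the entries with i < j.
    Assumption: a stand-in for the measure rho defined in the Methods."""
    iu = np.triu_indices_from(A, k=1)
    return float(np.corrcoef(A[iu], B[iu])[0, 1])

def split_correlations(matrices, k, rng):
    """Correlations between averages of non-overlapping sets of k matrices."""
    order = rng.permutation(len(matrices))
    # Cut the shuffled pool into disjoint groups of exactly k runs.
    groups = [order[s:s + k] for s in range(0, len(order) - k + 1, k)]
    means = [np.mean([matrices[i] for i in g], axis=0) for g in groups]
    # Compare the groups two by two, so each run is used at most once.
    return [offdiag_correlation(means[a], means[a + 1])
            for a in range(0, len(means) - 1, 2)]

# Toy pool: 40 noisy copies of a 3-cluster partition of 30 genes.
rng = np.random.default_rng(0)
truth = np.repeat(np.arange(3), 10)
runs = []
for _ in range(40):
    labels = truth.copy()
    flip = rng.random(labels.size) < 0.2          # genes to reassign
    labels[flip] = rng.integers(0, 3, size=flip.sum())
    runs.append(coclustering_matrix(labels))

rho_1 = split_correlations(runs, 1, rng)    # single runs against each other
rho_10 = split_correlations(runs, 10, rng)  # averages of 10 runs
print(np.mean(rho_1), np.mean(rho_10))
```

In this toy setting the averages over $10$ runs agree much better with each other than single runs do, which is the behavior that the procedure above looks for as $k$ increases.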

567:

568: \begin{figure}[h]

569: \centering

570: \includegraphics[width=\linewidth]{Fig2-merge.eps}

571: \caption{Correlation measure $\rho$ between different averages of

572:   the same number of local maxima for a data set of 1000 genes and 173

573:   conditions (subset of the \citeauthor{gaschdata} data set).}

574: \label{merge}

575: \end{figure}

576:

577: In Figure \ref{corr_entropy}, we keep the same $1000$ genes and select

578: an increasing number of conditions. As the data set grows, the

579: entropy measure $H_{\text{fuzzy}}$ decreases, meaning the clusters

580: become increasingly hard. Simultaneously, the correlation measure

581: $\rho$ decreases from about $0.85$ to $0.55$ (see Supplementary Figure

582: $3$).  We conclude that the depth of the valleys between different

583: local maxima of the posterior distribution increases with the size of

584: the data set, and it becomes increasingly difficult for the Gibbs

585: sampler to escape from these maxima and visit the whole space in one

586: run.

587:

588: \begin{figure}[h]

589:   \centering

590:   \includegraphics[width=\linewidth]{Fig3-entropy.eps}

591:   \caption{Entropy measure $H_{\text{fuzzy}}$ for data sets with 1000

592:     genes and varying number of conditions (subsets of the

593:     \citeauthor{gaschdata} data set).}

594:   \label{corr_entropy}

595: \end{figure}

596:

597:

598: \subsection*{Analysis of whole genome data sets}

599:

600:

601: If we run the Gibbs sampler algorithm on the three whole genome yeast

602: data sets, we are in the situation where the algorithm very rapidly

603: gets stuck in a local maximum. In Figure \ref{Spellman_conv} we plot

604: the average Bayesian log-likelihood score (see eq. (\ref{eq:7}) in the

605: Methods) for $10$ different Gibbs sampler runs for the

606: \citeauthor{spellmandata} data set. The rapid convergence of the

607: log-likelihood shows that the Gibbs sampler reaches the local maxima

608: very quickly and the low variance shows that the different local

609: maxima are all equally likely.  The average over $10$ runs of the GO

610: mutual information score (see eq.  (\ref{eq:6}) in the Methods) shows

611: the same rapid convergence and small variance (see Supplementary

612: Figure $6$), implying that the different maxima are biologically

613: equally meaningful according to this score. The correlation between

614: different averages of $10$ Gibbs sampler runs reaches $0.85$, a value

615: we consider high enough for a good approximation of the posterior

616: distribution.  The other two data sets show precisely the same

617: behavior (see Supplementary Figures $4$ and $5$).

618:

619:

620: \begin{figure}[h]

621:   \centering

622:   \includegraphics[width=\linewidth]{Fig4-Spellman_score.eps}

623:   \caption{Trace plot of the average log-likelihood score and standard

624:     deviation for $10$ Gibbs sampler runs for the

625:     \citeauthor{spellmandata} data set.}

626:   \label{Spellman_conv}

627: \end{figure}

628:

629:

630:

631: \subsection*{Two-way clustering \textit{versus} one-way clustering}

632:

633: Our coclustering algorithm extends the CRC algorithm of \cite{chinese}

634: by also clustering the conditions for each cluster of genes

635: (\emph{`two-way clustering'}), instead of assuming they are always

636: independent (\emph{`one-way clustering'}). We compare the clustering

637: of genes for the three yeast data sets using both methods, by

638: computing the average number of clusters inferred ($K$), the average

639: log-likelihood score and the average GO mutual information score for

640: $10$ independent runs of each algorithm.  The results are tabulated in

641: Tables \ref{oneway} and \ref{twoway}.  For all three data sets, both

642: the log-likelihood score and the GO mutual information score are

643: higher (better) for our method. The increase in GO mutual information

644: score is especially pronounced for the \citeauthor{hughesdata}

645: data set.  This data set has very few overexpressed or repressed

646: values and if each condition is considered independent, there are very

647: few distinct profiles, which results in the formation of very few

648: clusters ($\sim 15$ for $6052$ genes). Also clustering the conditions

649: gives more meaningful results since differentially expressed

650: conditions form separate clusters from one large background cluster of

651: non-differentially expressed conditions.

652:

653: \begin{table}[t]

654:   \processtable{One-way clustering, averages for $10$ different

655:     Gibbs sampler runs.\label{oneway}}

656:   {\begin{tabular}{lccc}\toprule

657:       Data set & Avg. $K$ & Avg. log-likelihood score & Avg. MI\\\midrule

658:       \citeauthor{gaschdata} & $52.9 (2.6)$ & $-6.101 (0.014) \times 10^{5}$

659:       & $1.771 (0.031)$\\

660:       \citeauthor{hughesdata} & $14.9 (0.5)$ & $2.530 (0.002) \times 10^6$

661:       & $0.588 (0.044)$\\

662:       \citeauthor{spellmandata} & $49.7 (2.2)$ & $-7.183 (0.037) \times 10^{4}$

663:       & $1.491 (0.032)$\\\botrule

664: \end{tabular}}{}

665: \end{table}

666:

667: \begin{table}[t]

668:   \processtable{Two-way clustering, averages for $10$ different

669:     Gibbs sampler runs.\label{twoway}}

670:   {\begin{tabular}{lccc}\toprule

671:       Data set & Avg. $K$ & Avg. log-likelihood score & Avg. MI\\\midrule

672:       \citeauthor{gaschdata} & $84.5(2.5)$ & $-5.586(0.012)\times 10^{5}$

673:       & $1.912(0.033)$\\

674:       \citeauthor{hughesdata} & $85.5(2.7)$ & $2.798(0.004)\times 10^6$

675:       & $1.511(0.045)$\\

676:       \citeauthor{spellmandata} & $65.4(4.2)$ & $-5.112(0.011)\times 10^{4}$

677:       & $1.612(0.032)$\\\botrule

678: \end{tabular}}{}

679: \end{table}

680:

681: For simulated data sets, clusters are defined as sets of genes sharing

682: the same regulators in the synthetic regulatory network, and the true

683: number of clusters is known.  Here we consider a gene network whose

684: topology is subsampled from an \emph{E.~coli} transcriptional network

685: \citep{syntren} with $1000$ genes, of which $105$ are transcription

686: factors, and $286$ clusters.  For two-way clustering, as we increase

687: the number of conditions in the simulated data set, more clusters are

688: formed and the number of clusters saturates close to the true number

689: (see Figure \ref{clusterOnewayTwoway}). For one-way clustering,

690: addition of conditions does not affect the inferred number of clusters,

691: which is an order of magnitude smaller than the true number (see

692: Figure \ref{clusterOnewayTwoway}). For two-way clustering, due to the

693: clustering of conditions, the number of model parameters is reduced,

694: and greater statistical accuracy can be achieved, even when the number

695: of genes in a cluster becomes small.

696:

697: The correlation measure $\rho$ between true clusters and inferred

698: clusters is also higher for two-way clustering than for one-way

699: clustering (Supplementary Figure $8$).

700:

701: Unlike for simulated data sets, the inferred number of clusters does

702: not depend much upon the number of conditions for real biological data

703: sets (Supplementary Figure $7$), \ie, even if more conditions are

704: added, the algorithm does not generate more clusters. This is because

705: in simulated data, every addition of a condition adds new information,

706: but for real data sets that might not be the case. To recover the

707: true clusters from the expression data, we need not only more

708: conditions, but conditions that each contribute information

709: different from the information already available from the previous

710: conditions. This might be a reason why the algorithm clusters $6052$

711: genes into only $\sim 80$ clusters (see Table \ref{twoway}).

712:

713: \begin{figure}[h]

714:   \centering

715:   \includegraphics[width=\linewidth]{Fig5-OnevsTwo.eps}

716:   \caption{Number of gene clusters for a simulated data set with

717:     $1000$ genes and a varying number of conditions, for two-way

718:     clustering (top data points ($\times$)) and one-way clustering

719:     (bottom data points ($+$)).}

720:   \label{clusterOnewayTwoway}

721: \end{figure}

722:

723: \subsection*{Fuzzy clusters}

724:

725: Our algorithm returns a summary of the posterior distribution in the

726: form of a gene-gene matrix whose entries are the probabilities that a

727: pair of genes is clustered together.  To convert these pairwise

728: probabilities back to clusters we use a graph spectral method as

729: explained in the Methods. The method produces fuzzy overlapping

730: clusters where each gene $i$ belongs to each fuzzy cluster $k$ with a

731: probability $p_{ik}$, such that $\sum_k p_{ik}=1$.  The size of a

732: fuzzy cluster $k$ is defined as $\sum_i p_{ik}$. The algorithm

733: iteratively produces new fuzzy clusters until all the information in

734: the pairwise matrix is converted into clusters ($1^{\text{st}}$

735: stopping criterion, see Methods), or until the mathematical conditions

736: underlying the algorithm cease to hold ($2^{\text{nd}}$ stopping

737: criterion, see Methods). We applied the algorithm to pairwise

738: probability matrices for each of the three data sets, obtained by

739: averaging over $10$ different Gibbs sampler runs.  For the

740: \citeauthor{gaschdata} and \citeauthor{hughesdata} data sets, full

741: fuzzy clustering is achieved with $500$ fuzzy clusters (all $6052$

742: genes have total assignment probability $\sum_k p_{ik}>0.98$).  For

743: the \citeauthor{spellmandata} data set the second stopping

744: criterion is met after producing $321$ fuzzy clusters.

745:

746: In general, we observe that the algorithm first produces one very

747: large fuzzy cluster corresponding to an average expression profile

748: that almost all genes can relate to. This cluster is of no interest

749: for further analysis.  Then it produces a number of fuzzy clusters of

750: varying size which show interesting coexpression profiles and are

751: useful for further analysis. For the three data sets considered here,

752: this number is around $100$, consistent with the average number of

753: clusters in different Gibbs sampler runs (see Table \ref{twoway}). The

754: remaining fuzzy clusters are typically very small and consist mostly

755: of noise. Like the very first cluster, they are of no interest for

756: further analysis.

757:

758: Since every gene belongs to every cluster, we use a probability cutoff

759: to remove from each cluster the genes which belong to it with a very

760: small probability. The smaller the cutoff, the more genes belong to a

761: cluster, which results in fuzzier clusters, and \textit{vice

762:   versa}.  Table \ref{cutoff} shows the total number of genes assigned

763: to at least one fuzzy cluster with different cutoff values and in

764: brackets the number of genes assigned to at least two fuzzy clusters.
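The counts of the kind reported in Table \ref{cutoff} can be obtained from a membership matrix as in the following minimal sketch. The function name, the toy matrix and the use of $\geq$ (and not $>$) for the cutoff are assumptions made for illustration; the text does not specify them.

```python
import numpy as np

def cutoff_summary(P, cutoff):
    """Summarize a fuzzy clustering at a membership probability cutoff.

    P[i, k] is the probability p_ik that gene i belongs to fuzzy cluster k.
    Assumption: gene i is kept in cluster k when p_ik >= cutoff.
    """
    P = np.asarray(P, dtype=float)
    per_gene = (P >= cutoff).sum(axis=1)   # clusters retained for each gene
    return {
        "assigned": int((per_gene >= 1).sum()),  # genes in at least one cluster
        "multiple": int((per_gene >= 2).sum()),  # genes in at least two clusters
        "sizes": P.sum(axis=0),                  # fuzzy cluster sizes sum_i p_ik
    }

# Toy membership matrix: 4 genes, 3 fuzzy clusters, rows sum to 1.
P = [[0.85, 0.12, 0.03],
     [0.52, 0.43, 0.05],
     [0.32, 0.28, 0.40],
     [0.05, 0.15, 0.80]]

for c in (0.1, 0.3, 0.5):
    s = cutoff_summary(P, c)
    print(c, s["assigned"], s["multiple"])
```

Because the memberships of a gene sum to at most $1$, no gene can exceed a cutoff above $0.5$ in two clusters at once, so the number of genes in multiple clusters is necessarily $0$ there.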

765:

766: The goal of merging different Gibbs sampler solutions and forming

767: fuzzy clusters is to extract additional information out of a data set

768: that is not captured by a single hard clustering solution. This can be

769: achieved in two ways. First, by obtaining tight clusters of few but

770: highly coexpressed genes with a high probability cutoff. Second, by

771: characterizing genes which belong to multiple clusters with a

772: significant probability.

773:

774: \begin{table}[!t]

775:   \processtable{Number of genes clustered and number of genes belonging to

776:     multiple clusters with different membership probability cutoff values.\label{cutoff}}

777:   {\begin{tabular}{lccc}\toprule

778:       Data set & $0.1$ &  $0.3$  & $0.5$\\ \midrule

779:       \citeauthor{gaschdata} & $6045$ $(4356)$  &  $4062$ $(344)$  &  $1781$ $(0)$\\

780:       \citeauthor{hughesdata} & $6052$ $(4554)$  & $3959$ $(34)$  &  $2254$ $(0)$\\

781:       \citeauthor{spellmandata} & $6052$ $(5187)$  & $3158$ $(139)$  & $1255$ $(0)$\\\botrule

782: \end{tabular}}{}

783: \end{table}

784:

785:

786: For all three data sets, at a probability cutoff of $0.5$, we get a

787: subset of genes which belong to only one cluster with high

788: probability. Table \ref{cutoff} shows that each data set retains at

789: least $20\%$ of its genes. These are sets of strongly coexpressed

790: genes which cluster together in almost every hard cluster solution.

791: Ribosomal genes show such a strong coexpression pattern in all

792: three data sets, where most genes belong to this cluster with a

793: probability close to $1$ (see Figure \ref{hughes_ribosome}). At least

794: $75\%$ of all the genes in cluster $2$ (\citeauthor{gaschdata} data),

795: cluster $3$ (\citeauthor{hughesdata} data) and cluster $2$

796: (\citeauthor{spellmandata} data) are located in the ribosome.

797:

798: \begin{figure}[h]

799: \centering

800: \includegraphics[width=\linewidth]{Fig6-hughes_cluster3part.eps}

801: \caption{Ribosomal genes form a tight cluster in the

802:   \citeauthor{hughesdata} data set. (Due to space constraints only the

803:   first few genes are shown; for the complete figure, see the

804:   Supplementary Information.)}

805: \label{hughes_ribosome}

806: \end{figure}

807:

808: Local but very strong coexpression patterns can also be detected by

809: our method. Cluster $15$ of the \citeauthor{gaschdata} data set

810: consists of only $4$ genes clustered together with probability $1$

811: (see Figure \ref{gasch_galactose}). These four genes, GAL1, GAL2,

812: GAL7, and GAL10, encode components of the galactose catabolic pathway and

813: respond to different carbon sources during steady state. They are

814: strongly upregulated when galactose is used as a carbon source

815: ($2^{\text{nd}}$ experiment cluster in Figure \ref{gasch_galactose})

816: and strongly downregulated with any other sugar as a carbon source

817: ($1^{\text{st}}$ experiment cluster in Figure \ref{gasch_galactose}).

818: In every

819: hard cluster solution, these $4$ genes are clustered together along

820: with other genes.  By merging these hard cluster solutions to form

821: fuzzy clusters, we get a tight but more meaningful cluster with only

822: $4$ genes.

823:

824:

825: \begin{figure}[h]

826: \centering

827: \includegraphics[width=\linewidth]{Fig7-gasch_cluster15.eps}

828: \caption{Four genes GAL1, GAL2, GAL7 and GAL10 form a tight cluster

829:   showing conditional coexpression in the \citeauthor{gaschdata} data set.}

830: \label{gasch_galactose}

831: \end{figure}

832:

833: Table \ref{cutoff} shows that many genes belong to two or more

834: clusters with a significant probability.  For the

835: \citeauthor{gaschdata} data set, our observations are similar to those of

836: \cite{gaschfuzzy}. Cluster $27$ contains genes localized in the endoplasmic

837: reticulum (ER) and induced under dithiothreitol (DTT) stress, such as

838: FKB2, JEM1, ERD2, ERP1, ERP2, RET2, RET3, SEC13, SEC21, SEC24 and

839: others.  Cluster $34$ contains genes repressed under nitrogen stress and

840: stationary state.  Twenty percent of the genes in cluster $27$ also belong

841: to cluster $34$ with a significant membership.  These include genes

842: encoding ER vesicle coat proteins such as RET2, RET3, SEC13 and

843: others, which are induced under DTT stress as well as repressed under

844: nitrogen stress and stationary state.  Also RIO1, an essential serine

845: kinase, belongs to two clusters with a significant probability.  It

846: clusters with genes involved in ribosomal biogenesis and assembly

847: (\citeauthor{gaschdata} data cluster $3$) as well as with genes

848: functioning as generators of precursor metabolites and energy

849: (\citeauthor{gaschdata} data cluster $7$). We make similar

850: observations for the \citeauthor{hughesdata} and

851: \citeauthor{spellmandata} data sets. Genes CLN1, CLN2 and other DNA

852: synthesis genes like CLB6, which are known to be regulated by SBF

853: during G1 phase \citep{cellcycle}, belong to cluster $19$

854: (\citeauthor{spellmandata} data).  They also belong with significant

855: probability to cluster $4$ (\citeauthor{spellmandata} data). More than

856: one third of the genes in cluster $4$ are predicted to be cell cycle

857: regulated genes.

858:

859: \section*{Conclusion}

860:

861: We have developed an algorithm to simultaneously cluster genes and

862: conditions and sample such coclusterings from a Bayesian probabilistic

863: model.  For large data sets, the model is supported on multiple

864: equivalent local maxima. The average of these local maxima can be

865: represented by a matrix of pairwise gene-gene clustering probabilities

866: and we have introduced a new method for extracting fuzzy, overlapping

867: clusters from this matrix. This method is able to extract information

868: out of the data set that is not available from a single, hard

869: clustering.

870:

871:

872: \section*{Funding}

873:

874: Early Stage Marie Curie Fellowship to A.J.; Postdoctoral Fellowship of

875: the Research Foundation Flanders (Belgium) to T.M.

876:

877: \section*{Acknowledgement}

878: We thank Steven Maere and Vanessa Vermeirssen for helpful discussions.

879:

880:

881: % \bibliographystyle{natbib}

882: % \bibliography{gibbs_sampler_analysis}

883:

884: \begin{thebibliography}{}

885:

886: \bibitem[Ashburner {\em et~al.}(2000)Ashburner, Ball, Blake, Botstein, Butler,

887:   Cherry, Davis, Dolinski, Dwight, Eppig, Harris, Hill, Issel-Tarver,

888:   Kasarskis, Lewis, Matese, Richardson, Ringwald, Rubin, and Sherlock]{ashb00}

889: Ashburner, M., Ball, C.~A., Blake, J.~A., Botstein, D., Butler, H., Cherry,

890:   J.~M., Davis, A.~P., Dolinski, K., Dwight, S.~S., Eppig, J.~T., Harris,

891:   M.~A., Hill, D.~P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese,

892:   J.~C., Richardson, J.~E., Ringwald, M., Rubin, G.~M., and Sherlock, G.

893:   (2000).

894: \newblock {{G}ene ontology: tool for the unification of biology. {T}he {G}ene

895:   {O}ntology {C}onsortium}.

896: \newblock {\em Nat Genet\/}, {\bf 25}, 25--29.

897:

898: \bibitem[Dahl(2006)Dahl]{dahl2006}

899: Dahl, D.~B. (2006).

900: \newblock Model-based clustering for expression data via a {D}irichlet process

901:   mixture model.

902: \newblock In K.-A. Do, P.~M\"uller, and M.~Vannucci, editors, {\em {B}ayesian

903:   inference for gene expression and proteomics\/}, pages 201--218. Cambridge

904:   University Press.

905:

906: \bibitem[Eisen {\em et~al.}(1998)Eisen, Spellman, Brown, and

907:   Botstein]{pmid9843981}

908: Eisen, M.~B., Spellman, P.~T., Brown, P.~O., and Botstein, D. (1998).

909: \newblock {{C}luster analysis and display of genome-wide expression patterns}.

910: \newblock {\em Proc Natl Acad Sci U S A\/}, {\bf 95}(25), 14863--14868.

911:

912: \bibitem[Fraley and Raftery(2002)Fraley and Raftery]{fraley02}

913: Fraley, C. and Raftery, A.~E. (2002).

914: \newblock Model-based clustering, discriminant analysis, and density

915:   estimation.

916: \newblock {\em J Amer Statistical Assoc\/}, {\bf 97}, 611--631.

917:

918: \bibitem[Gasch and Eisen(2002)Gasch and Eisen]{gaschfuzzy}

919: Gasch, A.~P. and Eisen, M.~B. (2002).

920: \newblock {{E}xploring the conditional coregulation of yeast gene expression

921:   through fuzzy k-means clustering}.

922: \newblock {\em Genome Biol\/}, {\bf 3}(11), RESEARCH0059.

923:

924: \bibitem[Gasch {\em et~al.}(2000)Gasch, Spellman, Kao, Carmel-Harel, Eisen,

925:   Storz, Botstein, and Brown]{gaschdata}

926: Gasch, A.~P., Spellman, P.~T., Kao, C.~M., Carmel-Harel, O., Eisen, M.~B.,

927:   Storz, G., Botstein, D., and Brown, P.~O. (2000).

928: \newblock {{G}enomic expression programs in the response of yeast cells to

929:   environmental changes}.

930: \newblock {\em Mol Biol Cell\/}, {\bf 11}(12), 4241--4257.

931:

932: \bibitem[Gibbons and Roth(2002)Gibbons and Roth]{clusterjudge}

933: Gibbons, F.~D. and Roth, F.~P. (2002).

934: \newblock {{J}udging the quality of gene expression-based clustering methods

935:   using gene annotation}.

936: \newblock {\em Genome Res\/}, {\bf 12}(10), 1574--1581.

937:

938: \bibitem[Hughes {\em et~al.}(2000)Hughes, Marton, Jones, Roberts, Stoughton,

939:   Armour, Bennett, Coffey, Dai, He, Kidd, King, Meyer, Slade, Lum, Stepaniants,

940:   Shoemaker, Gachotte, Chakraburtty, Simon, Bard, and Friend]{hughesdata}

941: Hughes, T.~R., Marton, M.~J., Jones, A.~R., Roberts, C.~J., Stoughton, R.,

942:   Armour, C.~D., Bennett, H.~A., Coffey, E., Dai, H., He, Y.~D., Kidd, M.~J.,

943:   King, A.~M., Meyer, M.~R., Slade, D., Lum, P.~Y., Stepaniants, S.~B.,

944:   Shoemaker, D.~D., Gachotte, D., Chakraburtty, K., Simon, J., Bard, M., and

945:   Friend, S.~H. (2000).

946: \newblock {{F}unctional discovery via a compendium of expression profiles}.

947: \newblock {\em Cell\/}, {\bf 102}(1), 109--126.

948:

949: \bibitem[Inoue and Urahama(1999)Inoue and Urahama]{graphspectral}

950: Inoue, K. and Urahama, K. (1999).

951: \newblock Sequential fuzzy cluster extraction by a graph spectral method.

952: \newblock {\em Pattern Recogn. Lett.}, {\bf 20}(7), 699--705.

953:

954: \bibitem[Koch {\em et~al.}(1996)Koch, Schleiffer, Ammerer, and

955:   Nasmyth]{cellcycle}

956: Koch, C., Schleiffer, A., Ammerer, G., and Nasmyth, K. (1996).

957: \newblock {{S}witching transcription on and off during the yeast cell cycle:

958:   {C}ln/{C}dc28 kinases activate bound transcription factor {S}{B}{F}

959:   ({S}wi4/{S}wi6) at start, whereas {C}lb/{C}dc28 kinases displace it from the

960:   promoter in {G}2}.

961: \newblock {\em Genes Dev\/}, {\bf 10}(2), 129--141.

962:

963: \bibitem[Liu(2002)Liu]{liu2002}

964: Liu, J.~S. (2002).

965: \newblock {\em {M}onte {C}arlo strategies in scientific computing\/}.

966: \newblock Springer.

967:

968: \bibitem[Medvedovic and Sivaganesan(2002)Medvedovic and

969:   Sivaganesan]{pmid12217911}

970: Medvedovic, M. and Sivaganesan, S. (2002).

971: \newblock {{B}ayesian infinite mixture model based clustering of gene

972:   expression profiles}.

973: \newblock {\em Bioinformatics\/}, {\bf 18}(9), 1194--1206.

974:

975: \bibitem[Medvedovic {\em et~al.}(2004)Medvedovic, Yeung, and

976:   Bumgarner]{pmid14871871}

977: Medvedovic, M., Yeung, K.~Y., and Bumgarner, R.~E. (2004).

978: \newblock {{B}ayesian mixture model based clustering of replicated microarray

979:   data}.

980: \newblock {\em Bioinformatics\/}, {\bf 20}(8), 1222--1232.

981:

982: \bibitem[Michoel {\em et~al.}(2007)Michoel, Maere, Bonnet, Joshi, Saeys,

983:   Van~den Bulcke, Van~Leemput, van Remortel, Kuiper, Marchal, and Van~de

984:   Peer]{lemone}

985: Michoel, T., Maere, S., Bonnet, E., Joshi, A., Saeys, Y., Van~den Bulcke, T.,

986:   Van~Leemput, K., van Remortel, P., Kuiper, M., Marchal, K., and Van~de Peer,

987:   Y. (2007).

988: \newblock {{V}alidating module network learning algorithms using simulated

989:   data}.

990: \newblock {\em BMC Bioinformatics\/}, {\bf 8 Suppl 2}, S5.

991:

992: \bibitem[Qin(2006)Qin]{chinese}

993: Qin, Z.~S. (2006).

994: \newblock {{C}lustering microarray gene expression data using weighted

995:   {C}hinese restaurant process}.

996: \newblock {\em Bioinformatics\/}, {\bf 22}(16), 1988--1997.

997:

998: \bibitem[Redner and Walker(1984)Redner and Walker]{redner84}

999: Redner, R.~A. and Walker, H.~F. (1984).

1000: \newblock Mixture densities, maximum likelihood, and the {EM} algorithm.

1001: \newblock {\em SIAM Review\/}, {\bf 26}(2), 195--239.

1002:

1003: \bibitem[Segal {\em et~al.}(2003)Segal, Shapira, Regev, Pe'er, Botstein,

1004:   Koller, and Friedman]{segal2003}

1005: Segal, E., Shapira, M., Regev, A., Pe'er, D., Botstein, D., Koller, D., and

1006:   Friedman, N. (2003).

1007: \newblock Module networks: identifying regulatory modules and their

1008:   condition-specific regulators from gene expression data.

1009: \newblock {\em Nat Genet\/}, {\bf 34}, 166--176.

1010:

1011: \bibitem[Spellman {\em et~al.}(1998)Spellman, Sherlock, Zhang, Iyer, Anders,

1012:   Eisen, Brown, Botstein, and Futcher]{spellmandata}

1013: Spellman, P.~T., Sherlock, G., Zhang, M.~Q., Iyer, V.~R., Anders, K., Eisen,

1014:   M.~B., Brown, P.~O., Botstein, D., and Futcher, B. (1998).

1015: \newblock {{C}omprehensive identification of cell cycle-regulated genes of the

1016:   yeast {S}accharomyces cerevisiae by microarray hybridization}.

1017: \newblock {\em Mol Biol Cell\/}, {\bf 9}(12), 3273--3297.

1018:

1019: \bibitem[Tamayo {\em et~al.}(1999)Tamayo, Slonim, Mesirov, Zhu, Kitareewan,

1020:   Dmitrovsky, Lander, and Golub]{pmid10077610}

1021: Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S., Dmitrovsky, E.,

1022:   Lander, E.~S., and Golub, T.~R. (1999).

1023: \newblock {{I}nterpreting patterns of gene expression with self-organizing

1024:   maps: methods and application to hematopoietic differentiation}.

1025: \newblock {\em Proc Natl Acad Sci U S A\/}, {\bf 96}(6), 2907--2912.

1026:

1027: \bibitem[Tavazoie {\em et~al.}(1999)Tavazoie, Hughes, Campbell, Cho, and

1028:   Church]{pmid10391217}

1029: Tavazoie, S., Hughes, J.~D., Campbell, M.~J., Cho, R.~J., and Church, G.~M.

1030:   (1999).

1031: \newblock {{S}ystematic determination of genetic network architecture}.

1032: \newblock {\em Nat Genet\/}, {\bf 22}(3), 281--285.

1033:

1034: \bibitem[Van~den Bulcke {\em et~al.}(2006)Van~den Bulcke, Van~Leemput, Naudts,

1035:   van Remortel, Ma, Verschoren, De~Moor, and Marchal]{syntren}

1036: Van~den Bulcke, T., Van~Leemput, K., Naudts, B., van Remortel, P., Ma, H.,

1037:   Verschoren, A., De~Moor, B., and Marchal, K. (2006).

1038: \newblock {{S}yn{T}{R}e{N}: a generator of synthetic gene expression data for

1039:   design and analysis of structure learning algorithms}.

1040: \newblock {\em BMC Bioinformatics\/}, {\bf 7}, 43.

1041:

1042: \bibitem[Yeung {\em et~al.}(2001)Yeung, Fraley, Murua, Raftery, and

1043:   Ruzzo]{pmid11673243}

1044: Yeung, K.~Y., Fraley, C., Murua, A., Raftery, A.~E., and Ruzzo, W.~L. (2001).

1045: \newblock {{M}odel-based clustering and data transformations for gene

1046:   expression data}.

1047: \newblock {\em Bioinformatics\/}, {\bf 17}(10), 977--987.

1048:

1049: \end{thebibliography}

1050:

1051:

1052: \end{document}

1053:

1054: