0801.2033/gibbs_sampler_analysis.tex
1: \documentclass{bioinfo}
2: \copyrightyear{2007}
3: \pubyear{2007}
4: 
5: \usepackage{mathbbol,amssymb,latexsym,amsfonts,amsmath,amsthm}
6: \usepackage{graphicx}
7: \usepackage{float}
8: \usepackage{textcomp}
9: 
10: \newcommand{\ie}{\textit{i.e.}}
11: \newcommand{\G}{\mathcal{G}}
12: \newcommand{\C}{\mathcal{C}}
13: \newcommand{\D}{\mathcal{D}}
14: \newcommand{\E}{\mathcal{E}}
15: \DeclareMathOperator*{\argmax}{arg\, max}
16: 
17: \begin{document}
18: \firstpage{1}
19: 
20: \title{Analysis of a Gibbs sampler method for model based clustering
21:   of gene expression data}
22: 
23: \author[A. Joshi \textit{et~al}]{Anagha Joshi\,$^{\rm a,b}$, Yves Van
24:   de Peer\,$^{\rm a, b}$\footnote{Corresponding author,
25:     E-mail:yves.vandepeer@psb.ugent.be}, Tom Michoel\,$^{\rm a, b}$}
26: 
27: \address{$^{\rm a}$Department of Plant Systems Biology, VIB,
28:   Technologiepark 927, 9052 Gent, Belgium, $^{\rm b}$Department of
29:   Molecular Genetics, UGent, Technologiepark 927, 9052 Gent, Belgium}
30: \maketitle
31: 
32: \begin{abstract}
33: 
34:   \section{Motivation:} Over the last decade, a large variety of
35:   clustering algorithms have been developed to detect coregulatory
36:   relationships among genes from microarray gene expression data.
37:   Model based clustering approaches have emerged as statistically well
38:   grounded methods, but the properties of these algorithms when
39:   applied to large-scale data sets are not always well understood.  An
40:   in-depth analysis can reveal important insights about the
41:   performance of the algorithm, the expected quality of the output
42:   clusters, and the possibilities for extracting more relevant
43:   information out of a particular data set.
44:              
45:   \section{Results:} We have extended an existing algorithm for model
46:   based clustering of genes to simultaneously cluster genes and
47:   conditions, and used three large compendia of gene expression data
48:   for \emph{S.~cerevisiae} to analyze its properties. The algorithm
49:   uses a Bayesian approach and a Gibbs sampling procedure to
50:   iteratively update the cluster assignment of each gene and
51:   condition. For large-scale data sets, the posterior distribution is
52:   strongly peaked on a limited number of equiprobable clusterings.  A
53:   GO annotation analysis shows that these local maxima are all
54:   biologically equally significant, and that simultaneously clustering
55:   genes and conditions performs better than only clustering genes and
56:   assuming independent conditions.  A collection of distinct
57:   equivalent clusterings can be summarized as a weighted graph on the
58:   set of genes, from which we extract fuzzy, overlapping clusters
59:   using a graph spectral method.  The cores of these fuzzy clusters
60:   contain tight sets of strongly coexpressed genes, while the overlaps
61:   exhibit relations between genes showing only partial coexpression.
62: 
63:   \section{Availability:} \textsf{GaneSh}, a Java package for
64:   coclustering, is available under the terms of the GNU General Public
65:   License from our website at
66:   http://bioinformatics.psb.ugent.be/software.
67: 
68:   \section{Contact:} yves.vandepeer@psb.ugent.be
69: 
70:   \section{Supplementary information:} available on our website at\\
71:   http://bioinformatics.psb.ugent.be/supplementary\_data/anjos/gibbs
72: \end{abstract}
73: 
74: \section{Introduction}
75: 
76: Since the seminal paper by \citet{pmid9843981}, now almost a decade
77: ago, clustering has formed the basis for extracting comprehensible
78: information out of large-scale gene expression data sets. Clusters of
79: coexpressed genes tend to be enriched for specific functional
80: categories \citep{pmid9843981}, share \textit{cis}-regulatory
81: sequences in their promoters \citep{pmid10391217}, or form the
82: building blocks for reconstructing transcription regulatory networks
83: \citep{segal2003}.
84: 
85: A variety of heuristic clustering methods have been used, such as
86: hierarchical clustering \citep{pmid9843981}, $k$-means
87: \citep{pmid10391217}, or self-organizing maps \citep{pmid10077610}.
88: Although these methods have had an enormous impact, their statistical
89: properties are generally not well understood and important parameters
90: such as the number of clusters are not determined automatically.
91: Therefore, there has been a shift in attention towards model based
92: clustering approaches in recent years
93: \citep{pmid11673243,fraley02,pmid12217911,pmid14871871,chinese,dahl2006}.
94: A model based approach assumes that the data is generated by a mixture
95: of probability distributions, one for each cluster, and takes
96: explicitly into account the noisiness of gene expression data. It
97: allows for a statistical assessment of the resulting clusters and
98: gives a formal estimate for the expected number of clusters.  To infer
99: model parameters and cluster assignments, standard statistical
100: techniques such as Expectation Maximization or Gibbs sampling are used
101: \citep{liu2002}.
102: 
103: In this paper we use a novel model based clustering method which
104: builds upon the method recently introduced by \citet{chinese}. We
105: address two key questions that have remained largely unanswered for
106: model based clustering methods in general, namely convergence of the
107: Gibbs sampler for very large data sets, and non-heuristic
108: reconstruction of gene clusters from the posterior probability
109: distribution of the statistical model.
110: 
111: In the model used by \cite{chinese}, it is assumed that the expression
112: levels of genes in one cluster are random samples drawn from a
113: Gaussian distribution and expression levels of different experimental
114: conditions are independent.  We have extended this model to allow
115: dependencies between different conditions in the same cluster.
116: \citet{pmid14871871} used a multivariate normal distribution to take
117: into account correlation among experimental conditions.  Our approach
118: consists of clustering the conditions within each gene cluster,
119: assuming that the expression levels of the genes in one gene cluster
120: for the conditions in one condition cluster are drawn from one
121: Gaussian distribution.  Hence our model is a model for
122: \emph{coclustering} or \emph{two-way clustering} of genes and
123: conditions. The same statistical model was also used in our recent
124: approach to reconstruct transcription regulatory networks
125: \citep{lemone}. The coclustering is carried out by a Gibbs sampler
126: which iteratively updates the assignment of each gene, and within each
127: gene cluster the assignment of each experimental condition, using the
128: full conditional distributions of the model.
129: 
130: It is known that a Gibbs sampler may have poor mixing properties if
131: the distribution being approximated is multi-modal and it will then
132: have a slow convergence rate \citep{liu2002}.  Previous studies of
133: Gibbs samplers for model based clustering have not reported
134: convergence difficulties \citep{pmid12217911,pmid14871871,dahl2006}.
135: In those studies, only data sets with a relatively small number of
136: genes (up to a few hundred) \citep{pmid12217911,pmid14871871}, or a small
137: number of experimental conditions (less than $10$) \citep{dahl2006}
138: were considered, and special sampling techniques such as reverse
139: annealing \citep{pmid14871871} or merge-split proposals
140: \citep{dahl2006} were sufficient to generate a well mixing Gibbs
141: sampler.  We observe that for data sets of increasing size the
142: correlation between two Gibbs sampler runs as well as the number of
143: cluster solutions visited in one run after burn-in steadily decreases.
144: This means that for large-scale data sets, the posterior distribution
145: is very strongly peaked on multiple local modes. Since the peaks are
146: so strong, we approximate the posterior distribution by averaging over
147: multiple runs performed in parallel, each converging quickly to a
148: single mode. By computing the correlation between different averages
149: of the same number of runs we are able to show that the number of
150: distinct modes is relatively small and accurate approximations to the
151: posterior distribution can be obtained with as few as $10$ modes for
152: around $6000$ genes.
153: 
154: To identify the final optimal clustering, the traditional approach is
155: to select out of all the clusterings visited by the Gibbs sampler the
156: one which maximizes the posterior distribution (maximum a posteriori
157: (MAP) clustering).  However, we show that for large data sets the
158: differences in likelihood between the different local maxima are
159: extremely small and statistically insignificant, such that the MAP
160: clustering is as good as taking any local maximum at random. A GO
161: \citep{ashb00} analysis of the different modes shows that also from
162: the biological point of view any difference between the local modes is
163: insignificant.  Taking into account the full posterior distribution is
164: more difficult since different clusterings may have a different number
165: of clusters and the labeling of clusters is not unique (the label
166: switching problem \citep{redner84}).  The common solution to this
167: problem is to consider pairwise probabilities for two genes being
168: clustered together or not \citep{pmid12217911,pmid14871871,dahl2006}.
169: A major question that has not yet received a final answer is how to
170: reconstruct gene clusters from these pairwise probabilities.
171: \cite{pmid12217911} and \cite{pmid14871871} use a heuristic
172: hierarchical clustering on the pairwise probability matrix to form a
173: final clustering estimate.  \cite{dahl2006} introduces a least-squares
174: method, which selects out of all clusterings visited by the Gibbs
175: sampler the one which minimizes a distance function to the pairwise
176: probability matrix. In both approaches, the probability matrix is
177: reduced to a single hard clustering. This necessarily removes
178: non-transitive relations between genes (such as a low probability for
179: a pair of genes to be clustered together even though they both have
180: relatively high probability to be clustered with the same third gene)
181: which may nevertheless be informative and biologically meaningful.
182: 
183: We propose that the pairwise probability matrix reflects a \emph{soft}
184: or \emph{fuzzy clustering} of the data, \ie, genes can belong to
185: multiple clusters with a certain probability.  To extract these fuzzy
186: clusters from the pairwise probabilities we use a method from pattern
187: recognition theory \citep{graphspectral}. This method iteratively
188: computes the largest eigenvalue and corresponding eigenvector of the
189: probability matrix, constructs a fuzzy cluster with the eigenvector,
190: and updates the probability matrix by removing from it the weight of
191: the genes assigned to the last cluster.  By only keeping genes which
192: belong to one fuzzy cluster with very high probability we obtain tight
193: clusters which show higher functional coherence compared to standard
194: clusters. Keeping also genes which belong with lower but still
195: significant probability to multiple fuzzy clusters, we can tentatively
196: identify multifunctional genes or relations between genes showing only
197: partial coexpression. We show that our results are in good agreement
198: with previous fuzzy clustering approaches to gene expression data
199: \citep{gaschfuzzy}. We believe that our fuzzy clustering method to
200: summarize the posterior distribution will be of general interest for
201: all model based clustering approaches and solves the problems
202: associated with heuristic clusterings of the pairwise probability
203: matrix.
204: 
205: All our analyses are performed on three large-scale public compendia
206: of gene expression data for \textit{S.~cerevisiae}
207: \citep{spellmandata,gaschdata,hughesdata}.
208: 
209: 
210: \begin{methods}
211: \section{Methods}
212: 
213: 
214: \subsection*{Mathematical model}
215: 
216: For an expression matrix with $N$ genes and $M$ conditions, we define
217: a coclustering as a partition of the genes into $K$ gene clusters
218: $\G_k$, together with, for each gene cluster, a partition of the set of
219: conditions into $L_k$ condition clusters $\E_{k,l}$.  We assume that
220: all data points in a cocluster $\{(i,m)\colon i\in\G_k, m\in
221: \E_{k,l}\}$ are random samples from the same normal distribution. This
222: model generalizes the model used by \cite{chinese}, where the
223: partition of conditions is always fixed at the trivial partition into
224: singleton sets.
225: 
226: Given a set of means and precisions $(\mu_{kl},\tau_{kl})$, a
227: coclustering $\C$ defines a probability density on data matrices
228: $\D=(x_{im})$ by
229: \begin{align*}
230:   p\bigl(\D\mid\C,(\mu_{kl},\tau_{kl})\bigr) = \prod_{k=1}^K
231:   \prod_{l=1}^{L_k} \prod_{i\in\G_k}\prod_{m\in \E_{k,l}} p
232:   (x_{im}\mid \mu_{kl},\tau_{kl}).
233: \end{align*}
234: We use a uniform prior on the set of coclusterings with normal-gamma
235: conjugate priors for the parameters $\mu_{kl}$ and $\tau_{kl}$.  Using
236: Bayes' rule we find the probability of a coclustering $\C$ with
237: parameters $(\mu_{kl},\tau_{kl})$ given the data $\D$.  Then we take
238: the marginal probability over the parameters $(\mu_{kl},\tau_{kl})$ to
239: obtain the final probability of a coclustering $\C$ given the data
240: $\D$, up to a normalization constant:
241: \begin{equation}\label{eq:1}
242:   p(\C\mid\D) \propto \prod_{k=1}^K \prod_{l=1}^{L_k} \iint 
243:   p(\mu,\tau) \prod_{i\in\G_k}\prod_{m\in \E_{k,l}} p (x_{im}\mid
244:   \mu,\tau)\; d\mu d\tau,
245: \end{equation}
246: where $p(\mu,\tau)=p(\mu\mid\tau)p(\tau)$ with
247: \begin{align*}
248:   p(\mu\mid\tau)=\bigl(\frac{\lambda_0\tau}{2\pi}\bigr)^{1/2}
249:   e^{-\frac{\lambda_0\tau}2 (\mu-\mu_0)^2},\quad
250:   p(\tau) = \frac{\beta_0^{\alpha_0}}{\Gamma(\alpha_0)}
251:   \tau^{\alpha_0-1} e^{-\beta_0\tau},
252: \end{align*}
253: $\alpha_0,\beta_0,\lambda_0 > 0$ and $-\infty<\mu_0<\infty$ being the
254: parameters of the normal-gamma prior distribution.  We use the values
255: $\alpha_0=\beta_0=\lambda_0= 0.1$ and $\mu_0=0.0$, resulting in a
256: non-informative prior. We have compared the normal-gamma prior with
257: other non-informative, conjugate priors, but found no difference in
258: results (see Supplementary Information).  The double integral in eq.
259: (\ref{eq:1}) can be solved exactly in terms of the sufficient
260: statistics $T^{(n)}_{kl} = \sum_{i \in \G_k,m\in\E_{k,l}} x_{im}^n$
261: ($n=0,1,2$) for each cocluster.  The log-likelihood or Bayesian score
262: decomposes as a sum of cocluster scores:
263: \begin{equation}\label{eq:7}
264:   S(\C) =\log p(\C\mid\D) = \sum_{k=1}^K \sum_{l=1}^{L_k} S_{kl},
265: \end{equation}
266: with
267: \begin{multline*}
268:   S_{kl} = -\tfrac12 T^{(0)}_{kl}\log(2\pi) + \tfrac12
269:   \log\bigl(\frac{\lambda_0}{\lambda_0 + T^{(0)}_{kl}}\bigr) 
270:    - \log\Gamma(\alpha_0)\\ + \log\Gamma(\alpha_0
271:   + \tfrac12 T^{(0)}_{kl})
272:   + \alpha_0\log\beta_0 -(\alpha_0 + \tfrac12 T^{(0)}_{kl})\log\beta_1
273: \end{multline*}
274: and
275: \begin{equation*}
276:   \beta_1 = \beta_0 + \frac12\Bigl[ T^{(2)}_{kl} -
277:   \frac{(T^{(1)}_{kl})^2}{T^{(0)}_{kl}} \Bigr]
278:   + \frac{\lambda_0 \bigl( T^{(1)}_{kl} - \mu_0 T^{(0)}_{kl}
279:     \bigr)^2}{2(\lambda_0 + T^{(0)}_{kl})T^{(0)}_{kl}}.
280: \end{equation*}
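
The closed form for $S_{kl}$ can be evaluated directly from the three sufficient statistics. The following Python sketch is one way to do so; the function and argument names are illustrative and are not taken from the \textsf{GaneSh} package, and returning $0$ for an empty cocluster is a convention assumed here, since the formula itself is undefined for $T^{(0)}_{kl}=0$.

```python
import math


def sufficient_stats(values):
    """Return (T0, T1, T2) = (count, sum of x, sum of x^2) for one cocluster."""
    vals = list(values)
    return len(vals), sum(vals), sum(x * x for x in vals)


def cocluster_score(t0, t1, t2, alpha0=0.1, beta0=0.1, lambda0=0.1, mu0=0.0):
    """Log marginal likelihood S_kl of one cocluster under the
    normal-gamma prior, written in terms of the sufficient statistics."""
    if t0 == 0:
        return 0.0  # assumed convention: an empty cocluster contributes nothing
    # Updated rate parameter beta_1 of the gamma distribution.
    beta1 = (beta0
             + 0.5 * (t2 - t1 * t1 / t0)
             + lambda0 * (t1 - mu0 * t0) ** 2 / (2.0 * (lambda0 + t0) * t0))
    return (-0.5 * t0 * math.log(2.0 * math.pi)
            + 0.5 * math.log(lambda0 / (lambda0 + t0))
            - math.lgamma(alpha0)
            + math.lgamma(alpha0 + 0.5 * t0)
            + alpha0 * math.log(beta0)
            - (alpha0 + 0.5 * t0) * math.log(beta1))


def clustering_score(coclusters):
    """Bayesian score S(C): sum of the scores of all coclusters, each
    given as an iterable of its expression values."""
    return sum(cocluster_score(*sufficient_stats(c)) for c in coclusters)
```

Because the score depends on the data only through $T^{(0)}_{kl}$, $T^{(1)}_{kl}$ and $T^{(2)}_{kl}$, moving a gene or condition between clusters only requires adding or subtracting its contribution to these three sums.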
281: 
282: 
283: \subsection*{Gibbs sampler algorithm}
284: 
285: We use a Gibbs sampler to sample coclusterings from the posterior
286: distribution (\ref{eq:1}). The algorithm iteratively updates the
287: assignment of genes to gene clusters, and for each gene cluster, the
288: assignment of conditions to condition clusters as follows:
289: 
290: \begin{enumerate}
291: \item Initialization: randomly assign $N$ genes to a random $K_0$
292:   number of gene clusters, and for each cluster, randomly assign $M$
293:   conditions to a random $L_{k,0}$ number of condition clusters.
294: \item For $N$ cycles, remove a random gene $i$ from its current
295:   cluster.  For each gene cluster $k$, calculate the Bayesian score
296:   $S(\C_{i\to k})$, where $\C_{i\to k}$ denotes the coclustering
297:   obtained from $\C$ by assigning gene $i$ to cluster $k$, keeping all
298:   other assignments of genes and conditions equal, as well as the
299:   score $S(\C_{i\to 0})$ for the gene to be alone in its own
300:   cluster.  Assign gene $i$ to one of the possible $K+1$ gene
301:   clusters, where $K$ is the current number of gene clusters,
302:   according to the probabilities $Q_k \propto e^{S(\C_{i\to k})}$,
303:   normalized such that $\sum_{k} Q_k=1$.
304: \item For each gene cluster $k$, for $M$ cycles, remove a random
305:   condition $m$ from its current cluster. For each condition cluster
306:   $l$, calculate the Bayesian score $S(\C_{k,m\to l})$. Assign
307:   condition $m$ to one of the possible $L_k+1$ clusters, where $L_k$
308:   is the current number of condition clusters for gene cluster $k$,
309:   according to the probabilities $Q_l \propto e^{S(\C_{k,m\to l})}$,
310:   normalized such that $\sum_{l} Q_l=1$.
311: \item Iterate steps 2 and 3 until convergence. One iteration is defined
312:   as executing steps 2 and 3 consecutively once, and hence consists of
313:   $N+K\times M$ sampling steps (with $K$ the number of gene clusters
314:   after step 2 of that iteration).
315: \end{enumerate}
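
The reassignment in steps 2 and 3 amounts to drawing an index with probability $Q_k \propto e^{S(\C_{i\to k})}$. Since the scores are large negative log-probabilities, exponentiating them directly would underflow. The Python sketch below subtracts the maximum score before exponentiating, which leaves the normalized $Q_k$ unchanged. The function names are hypothetical, and this shows only the sampling step, not a full implementation of the algorithm.

```python
import math
import random


def assignment_probabilities(scores):
    """Convert Bayesian scores S(C_{i->k}) into normalized probabilities
    Q_k proportional to exp(S), shifting by the maximum for stability."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_assignment(scores, rng=random):
    """Draw a cluster index k with probability Q_k.

    `scores` holds one entry per candidate cluster, including the
    candidate 'new singleton cluster'."""
    q = assignment_probabilities(scores)
    u = rng.random()
    cumulative = 0.0
    for k, qk in enumerate(q):
        cumulative += qk
        if u < cumulative:
            return k
    return len(q) - 1  # guard against floating-point rounding
```

With score differences of even a few tens of units, one $Q_k$ is numerically indistinguishable from $1$, which illustrates why the sampler can become trapped in a single mode for large data sets.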
316: 
317: This coclustering algorithm simulates a Markov chain which satisfies
318: detailed balance with respect to the posterior distribution
319: (\ref{eq:1}), \ie, after a sufficient number of iterations, the
320: probability to visit a particular coclustering $\C$ is given exactly
321: by $p(\C\mid\D)$. The expectation value of any real function $f$ with
322: respect to the posterior distribution can be approximated by averaging
323: over the iterations of a sufficiently long Gibbs sampler run:
324: \begin{equation}\label{eq:2}
325:   E(f) = \sum_\C f(\C) p(\C\mid\D) \approx \frac1T \sum_{t=T_0+1}^{T_0+T}
326:   f(\C_t)
327: \end{equation}
328: where $\C_t$ is the coclustering visited at iteration $t$ and $T_0$ is
329: a possible burn-in period.  We say that the Gibbs sampler has
330: converged if two runs starting from different random initializations
331: return the same averages (\ref{eq:2}) for a suitable set of test
332: functions $f$. More precisely, if $\{f_n\}$ is a set of test
333: functions, define $a_n=E_1(f_n)$ the average of $f_n$ in the first
334: Gibbs sampler run, and $b_n=E_2(f_n)$ the average of $f_n$ in the
335: second Gibbs sampler run. We define a correlation measure $\rho$
336: ($0\leq\rho\leq1$) between two runs as
337: \begin{equation}\label{eq:5}
338:   \rho = \frac{|\sum_n a_n b_n|}{\sqrt{(\sum_n a_n^2) (\sum_n b_n^2)}}.
339: \end{equation}
340: Full convergence is reached if $\rho=1$.
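
As a minimal sketch, the correlation measure (\ref{eq:5}) can be computed as follows in Python, where \texttt{a} and \texttt{b} are the lists of averages $a_n$ and $b_n$ from the two runs. The function name is illustrative, and raising an error for an all-zero input is a choice made here rather than part of the definition.

```python
import math


def correlation_measure(a, b):
    """Correlation rho between two vectors of run averages:
    |sum_n a_n b_n| / sqrt((sum_n a_n^2) (sum_n b_n^2))."""
    if len(a) != len(b):
        raise ValueError("both runs must use the same test functions")
    numerator = abs(sum(x * y for x, y in zip(a, b)))
    denominator = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if denominator == 0.0:
        raise ValueError("rho is undefined when one vector is all zeros")
    return numerator / denominator
```

By the Cauchy-Schwarz inequality the value lies between $0$ and $1$, and it equals $1$ only if the two vectors of averages are proportional.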
341: 
342: \subsection*{Fuzzy clustering}
343: 
344: To keep track of the gene clusters, independent of the (varying)
345: number of clusters or their labeling, we consider functions
346: \begin{equation}\label{eq:3}
347:   f_{ij}(\C) =
348:   \begin{cases}
349:     1 & \text{if gene $i$ and $j$ belong to the same gene cluster in $\C$}\\
350:     0 & \text{otherwise}
351:   \end{cases}
352: \end{equation}
353: In general, the posterior distribution (\ref{eq:1}) is not
354: concentrated on a single coclustering and the matrix $F=(E(f_{ij}))$
355: of expectation values (see eq. (\ref{eq:2})) consists of probabilities
356: between $0$ and $1$. To quantify this fuzziness, we use an entropy
357: measure
358: \begin{equation}\label{eq:4}
359:   H_{\text{fuzzy}} = \frac1{N^2\ln 2}\sum_{ij }h(F_{ij}),
360: \end{equation}
361: where $N$ is the dimension of the square matrix $F$ and
362: \begin{equation*}
363:   h(q)=-q\ln(q) - (1-q)\ln(1-q) \text{ for } 0\leq q\leq 1.
364: \end{equation*}
365: For a hard clustering ($F_{ij}=0$ or $1$ for all $i,j$),
366: $H_{\text{fuzzy}}=0$, and for a maximally fuzzy clustering
367: ($F_{ij}=0.5$ for all $i,j$), $H_{\text{fuzzy}}=1$. In reality, the
368: matrix $F$ is very sparse (most gene pairs will never be clustered
369: together), so $H_{\text{fuzzy}}$ remains small even for real fuzzy
370: clusterings.
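
The following Python sketch shows how $F$ and $H_{\text{fuzzy}}$ could be computed from a list of sampled gene clusterings, each given as a list of cluster labels per gene. The representation of a clustering as a label list and all function names are assumptions made for illustration. The dense $N\times N$ loop is meant for small examples only, since a real data set would call for a sparse representation.

```python
import math


def coclustering_matrix(labelings):
    """Average of the indicator functions f_ij over sampled clusterings:
    F[i][j] is the fraction of samples in which genes i and j share a cluster."""
    n = len(labelings[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    t = len(labelings)
    return [[c / t for c in row] for row in counts]


def binary_entropy(q):
    """h(q) = -q ln q - (1-q) ln(1-q), with h(0) = h(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log(q) - (1.0 - q) * math.log(1.0 - q)


def fuzzy_entropy(F):
    """Entropy measure H_fuzzy, normalized to lie between 0 and 1."""
    n = len(F)
    total = sum(binary_entropy(q) for row in F for q in row)
    return total / (n * n * math.log(2.0))
```

Note that relabeling the clusters within any sample leaves $F$ unchanged, which is why this representation avoids the label switching problem.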
371: 
372: We assume that a fuzzy gene-gene matrix $F$ is produced by a fuzzy
373: clustering of the genes, \ie, we assume that each gene $i$ has a
374: probability $p_{ik}$ to belong to each cluster $k$, such that $\sum_k
375: p_{ik}=1$. To extract these probabilities from $F$ we use a graph
376: spectral method \citep{graphspectral}, originally developed for
377: pattern recognition and image analysis, modified here to enforce the
378: normalization conditions on $p_{ik}$. A fuzzy cluster is represented
379: by a column vector $w=(w_1, \dots, w_N)^T$, with $w_i$ the weight of
380: gene $i$ in this cluster, normalized such that $\|w\|^2=w^Tw=\sum_i
381: w_i^2=1$.  The cohesiveness of the cluster with respect to the
382: gene-gene matrix $F$ is defined as $w^TFw = \sum_{ij}w_i F_{ij} w_j$.
383: By the Rayleigh-Ritz theorem,
384: \begin{align*}
385:   \max_{w\neq0} \frac{w^T F w}{w^Tw} = v_1^T F v_1 = \lambda_1,
386: \end{align*}
387: where $\lambda_1$ is the largest eigenvalue of $F$ and $v_1$ the
388: corresponding (normalized) eigenvector. Hence the maximally cohesive
389: cluster in $F$ is given by the eigenvector of the largest eigenvalue.
390: By the Perron-Frobenius theorem, this eigenvector is unique and all
391: its entries are nonnegative. We can then define the membership
392: probabilities to cluster $1$ by $p_{i1} =
393: \frac{v_{1,i}}{\max_j(v_{1,j})}$. Hence the gene with the highest
394: weight in $v_1$ is considered the prototypical gene for this cluster,
395: and it will not belong to any other cluster. The probability $p_{i1}$
396: measures to what extent other genes are coexpressed with this
397: prototypical gene.  To find the next most cohesive cluster, we remove
398: from $F$ the information already contained in the first cluster by
399: setting
400: \begin{align*}
401:   F^{(2)}_{ij}=\sqrt{1-p_{i1}} F_{ij} \sqrt{1-p_{j1}},
402: \end{align*}
403: and compute the largest eigenvalue and corresponding (normalized)
404: eigenvector $v_2$ for this matrix. The prototypical gene for this
405: cluster may already have some probability assigned to the previous
406: cluster, so we define the membership probabilities to the second
407: cluster by
408: \begin{align*}
409:   p_{i2} = \min\Bigl( \frac{v_{2,i}}{\max_j(v_{2,j})}
410:   (1-p_{i_{\text{max}}1}), 1-p_{i1}\Bigr).
411: \end{align*}
412: Here $i_{\text{max}}=\argmax_j(v_{2,j})$ is the prototypical gene for
413: the second cluster, and we take the `$\min$' to ensure that $\sum_k
414: p_{ik}$ will never exceed $1$.  
415: 
416: This procedure of reducing $F$ and computing the largest eigenvalue
417: and corresponding eigenvector to define the next cluster membership
418: probabilities is iterated until one of the following stopping criteria
419: is met:
420: \begin{enumerate}
421: \item All entries in the reduced matrix $F^{(k)}$ reach $0$, \ie, for
422:   all genes, $\sum_{k'<k} p_{ik'}=1$, and we have completely
423:   determined all fuzzy clusters and their membership probabilities.
424: \item The largest eigenvalue of the reduced matrix $F^{(k)}$ has multiplicity
425:   $>1$. In this case the eigenvector is no longer unique and need no
426:   longer have nonnegative entries, so we cannot make new cluster
427:   membership probabilities out of it. This may happen if the
428:   (weighted) graph defined by connecting gene pairs with non-zero
429:   entries in $F^{(k)}$ is no longer strongly connected
430:   (Perron-Frobenius theorem).
431: \end{enumerate}
432: 
433: To compute one or more of the largest eigenvalues and eigenvectors for
434: large sparse matrices such as $F$ and its reductions $F^{(k)}$ we use
435: efficient sparse matrix routines, such as those implemented in
436: the Matlab$^{\text{\textregistered}}$ function \texttt{eigs}.
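
The iterative extraction of fuzzy clusters can be sketched in Python as shown below. This is an illustration under several assumptions, not the implementation used for the results. A plain power iteration on a dense matrix stands in for the sparse eigenvalue routines. The update for clusters beyond the second is assumed to follow the same pattern as the formulas for $p_{i1}$ and $p_{i2}$, with $1-p_{i1}$ replaced by the remaining probability mass $1-\sum_{k'<k}p_{ik'}$. Only the first stopping criterion is checked, so the multiplicity of the largest eigenvalue is not tested. All names are hypothetical.

```python
import math


def leading_eigenpair(A, iters=1000, tol=1e-12):
    """Power iteration for the largest eigenvalue and eigenvector of a
    symmetric matrix with nonnegative entries (list of lists)."""
    n = len(A)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        w = [x / norm for x in w]
        delta = max(abs(x - y) for x, y in zip(w, v))
        v, lam = w, norm
        if delta < tol:
            break
    return lam, v


def fuzzy_clusters(F, max_clusters=50, eps=1e-9):
    """Return a list of clusters, each a list of membership
    probabilities p_ik over all genes i."""
    n = len(F)
    remaining = [1.0] * n  # 1 - sum of memberships assigned so far
    memberships = []
    for _ in range(max_clusters):
        if max(remaining) < eps:
            break  # every gene is fully assigned
        s = [math.sqrt(r) for r in remaining]
        Fk = [[s[i] * F[i][j] * s[j] for j in range(n)] for i in range(n)]
        lam, v = leading_eigenpair(Fk)
        if lam < eps:
            break  # reduced matrix is numerically zero
        vmax = max(v)
        imax = v.index(vmax)  # prototypical gene of this cluster
        p = [min(v[i] / vmax * remaining[imax], remaining[i]) for i in range(n)]
        memberships.append(p)
        remaining = [max(r - q, 0.0) for r, q in zip(remaining, p)]
    return memberships
```

For a matrix $F$ that is block diagonal with all-ones blocks of different sizes, this sketch recovers the blocks as hard clusters, largest block first.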
437: 
438: \subsection*{Data sets}
439: 
440: We use three large compendia of gene expression data for budding
441: yeast:
442: \begin{enumerate}
443: \item \citet{gaschdata} data set: expression in $173$ stress related
444:   conditions.
445: \item \citet{hughesdata} data set: compendium of expression profiles
446:   corresponding to $300$ diverse mutations and chemical treatments.
447: \item \citet{spellmandata} data set: $77$ conditions for alpha factor
448:   arrest, elutriation, and arrest of a cdc15 temperature-sensitive
449:   mutant.
450: \end{enumerate}
451: We select the genes present in all three data sets ($6052$ genes) and,
452: to be as unbiased as possible, no further postprocessing is done.  We
453: use SynTReN \citep{syntren} to generate simulated data sets with
454: varying number of conditions for a synthetic transcription regulatory
455: network with $1000$ genes (see also Supplementary Information).
456: 
457: 
458: \subsection*{Functional coherence}
459: 
460: To estimate the overall biological relevance of the clusters we use a
461: method which calculates the mutual information between clusters and GO
462: attributes \citep{clusterjudge}.  For each GOslim attribute, we create
463: a cluster-attribute contingency table where rows are clusters and
464: columns are attribute status (\emph{`Yes'} if the gene possesses the
465: attribute, \emph{`No'} if it is not known whether the gene possesses
466: the attribute).  The total mutual information is defined as the sum of
467: mutual informations between clusters and individual GO attributes:
468: \begin{equation}\label{eq:6}
469:   MI= \sum_A \bigl[ H(\C)+H(A)-H(\C,A) \bigr]
470: \end{equation}
471: where $\C$ is a clustering of the genes, $A$ is a GO attribute and $H$
472: is Shannon's entropy, $H=-\sum_i p_i\log(p_i)$, and the $p_i$ are
473: probabilities obtained from the contingency tables.
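
A minimal Python sketch of the score (\ref{eq:6}) is given below. It assumes a hard clustering given as one cluster label per gene and each GO attribute given as a list of booleans per gene. Natural logarithms are used, which is a choice made here since the base of the logarithm only rescales the score. The function names are illustrative.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy -sum_i p_i log(p_i) with p_i = count_i / total."""
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)


def total_mutual_information(clusters, attributes):
    """Sum over attributes A of H(C) + H(A) - H(C, A).

    clusters:   list with the cluster label of each gene
    attributes: dict mapping an attribute name to a list of booleans,
                True if the gene is annotated with that attribute
    """
    n = len(clusters)
    h_c = entropy(Counter(clusters).values(), n)
    mi = 0.0
    for has_attribute in attributes.values():
        h_a = entropy(Counter(has_attribute).values(), n)
        # Joint entropy from the cluster-attribute contingency table.
        h_ca = entropy(Counter(zip(clusters, has_attribute)).values(), n)
        mi += h_c + h_a - h_ca
    return mi
```

An attribute that is distributed independently of the clusters contributes $0$, while an attribute that is fully determined by cluster membership contributes its full entropy $H(A)$.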
474: 
475: \end{methods}
476: 
477: \section{Results and discussion}
478: 
479: \subsection*{Convergence of the Gibbs sampler algorithm}
480: 
481: We study convergence using the test functions $f_{ij}$ which indicate
482: whether genes $i$ and $j$ are clustered together or not (see eq.
483: (\ref{eq:3}) in the Methods) and compute the correlation measure
484: $\rho$ between different runs for this set of functions (see eq.
485: (\ref{eq:5}) in the Methods).  In addition to the correlation
486: measure, we also compute the entropy measure $H_{\text{fuzzy}}$
487: (see eq. (\ref{eq:4}) in the Methods). This parameter summarizes the
488: `shape' of the posterior distribution: a value of $0$ corresponds to a
489: hard clustering, which implies that the distribution is completely
490: supported on a single solution; the larger $H_{\text{fuzzy}}$
491: is, the more the distribution is spread over multiple solutions.
492: 
493: In the analysis below we use subsets from the \citeauthor{gaschdata}
494: data set with a varying number of genes and conditions and perform
495: multiple Gibbs sampler runs with a large number of iterations.  One
496: iteration involves a reassignment of all genes and all conditions in
497: all clusters, and hence involves $N + M\times K$ sampling steps in the
498: Gibbs sampler, where $N$ is the number of genes, $M$ the number of
499: conditions, and $K$ the number of clusters at that iteration
500: (typically $K\sim\sqrt{N}$).
501: 
502: \begin{figure}[h]
503:   \centering
504:   \includegraphics[width=\linewidth]{Fig1-GeneExptConvergence.eps}
505:   \caption{Trace plot of the correlation measure $\rho$ between two
506:     different Gibbs sampler runs as a function of the number of
507:     iterations, for a small data set ($100$ genes, $10$ conditions,
508:     top curve) and a large data set ($1000$ genes, $173$ conditions,
509:     bottom curve).  Both data sets are subsets of the
510:     \citeauthor{gaschdata} data set.}
511:   \label{convergence}
512: \end{figure}
513: 
514: 
515: First we consider a very small data set ($100$ genes, $10$
516: conditions). We start two Gibbs sampler runs in parallel and compute
517: the correlation measure $\rho$ at each iteration, see Figure
518: \ref{convergence}. In this case, $\rho$ approaches its maximum value
519: $\rho=1$ in less than $5000$ iterations and the Gibbs sampler
520: generates a well mixing chain which can easily explore the whole
521: space. Non-zero values of the entropy measure $H_{\text{fuzzy}}$
522: ($0.105\pm0.003$) indicate that the posterior distribution is
523: supported on multiple clusterings of the genes.
524: 
525: Next we run the Gibbs sampler algorithm on a data set with $1000$
526: genes and all $173$ conditions.  Unlike in the previous situation we
527: observe that the correlation between two Gibbs sampler runs saturates
528: well below $1$ (see Figure \ref{convergence}). Hence the Gibbs sampler
529: does not converge to the posterior distribution in one run.  We can
530: gain further understanding of the lack of convergence by looking in
531: more detail at a single Gibbs sampler run.  It turns out that the
532: correlation measure between two successive iterations reaches $1$ very
533: rapidly and remains unchanged afterwards (see Supplementary Figure
534: $2$).  Since each iteration involves a large number of sampling steps
535: (\ie, a large number of possible configuration changes), this implies
536: that the Gibbs sampler very rapidly finds a local maximum of the
537: posterior distribution from which it can no longer escape.  We
538: conclude that the posterior distribution is supported on multiple
539: local maxima which overlap only partially, and with valleys in between
540: that cannot be crossed by the Gibbs sampler.  These local maxima all
541: have approximately the same log-likelihood (see for instance the small
542: variance in Figure \ref{Spellman_conv} below) and are therefore all
543: equally meaningful.  The probability ratio between peaks and valleys
544: is so large (exponential in the size of the data set) that an accurate
545: approximation to the posterior distribution is given by averaging over
546: the local maxima only. Those can be uncovered by performing multiple
547: independent runs, each converging very quickly on one of the maxima,
548: and there is no need for special techniques to also sample in between
549: local maxima.  The number of local maxima (Gibbs sampler runs)
550: necessary for a good approximation can be estimated as follows. We
551: perform $150$ independent Gibbs sampler runs and compute for each the
552: pairwise gene-gene clustering probability matrix $F$ (see Methods).
553: For each $k=1,\dots,50$, we take two non-overlapping sets of $k$
554: solutions and compute the average of their pairwise probability
555: matrices $F$.  Then, we compute the correlation measure $\rho$ between
556: those two averages.  This is repeated several times, depending on the
557: number of non-overlapping sets that can be chosen from the pool of
558: $150$ solutions.  If for a given $k$ the correlation is always $1$,
559: then there are at most $k$ local maxima.  Figure \ref{merge} shows
560: that as $k$ increases, the correlation quickly reaches close to this
561: perfect value $1$. This implies that the number of local maxima is not
562: too large and a good approximation to the posterior distribution can
563: be obtained in this case already with $10$ to $20$ solutions.
564: Supplementary Figure $1$ shows an example of hard clusters formed as a
565: result of a single run and fuzzy clusters formed by merging the result
566: of $10$ independent runs.
567: 
568: \begin{figure}[h]
569: \centering
570: \includegraphics[width=\linewidth]{Fig2-merge.eps}
571: \caption{Correlation measure $\rho$ between different averages of
572:   the same number of local maxima for a data set of 1000 genes and 173
573:   conditions (subset of the \citeauthor{gaschdata} data set).}
574: \label{merge}
575: \end{figure}
576: 
577: In Figure \ref{corr_entropy}, we keep the same $1000$ genes and select
an increasing number of conditions. As the data set grows, the
entropy measure $H_{\text{fuzzy}}$ decreases, meaning the clusters
become increasingly hard. Simultaneously, the correlation measure
$\rho$ decreases from about $0.85$ to $0.55$ (see Supplementary Figure
$3$).  We conclude that the depth of the valleys between different
local maxima of the posterior distribution increases with the size of
the data set, and it becomes increasingly difficult for the Gibbs
sampler to escape from these maxima and visit the whole space in one
run.
587: 
588: \begin{figure}[h]
589:   \centering
590:   \includegraphics[width=\linewidth]{Fig3-entropy.eps}
591:   \caption{Entropy measure $H_{\text{fuzzy}}$ for data sets with 1000
592:     genes and varying number of conditions (subsets of the
593:     \citeauthor{gaschdata} data set).}
594:   \label{corr_entropy}
595: \end{figure}
596: 
597: 
598: \subsection*{Analysis of whole genome data sets}
599: 
600: 
When we run the Gibbs sampler algorithm on the three whole genome yeast
data sets, the algorithm very rapidly gets stuck in a local maximum.
In Figure \ref{Spellman_conv} we plot
604: the average Bayesian log-likelihood score (see eq. (\ref{eq:7}) in the
605: Methods) for $10$ different Gibbs sampler runs for the
606: \citeauthor{spellmandata} data set. The rapid convergence of the
607: log-likelihood shows that the Gibbs sampler reaches the local maxima
608: very quickly and the low variance shows that the different local
609: maxima are all equally likely.  The average over $10$ runs of the GO
610: mutual information score (see eq.  (\ref{eq:6}) in the Methods) shows
611: the same rapid convergence and small variance (see Supplementary
612: Figure $6$), implying that the different maxima are biologically
613: equally meaningful according to this score. The correlation between
614: different averages of $10$ Gibbs sampler runs reaches $0.85$, a value
615: we consider high enough for a good approximation of the posterior
616: distribution.  The other two data sets show precisely the same
617: behavior (see Supplementary Figures $4$ and $5$).
618: 
619: 
620: \begin{figure}[h]
621:   \centering
622:   \includegraphics[width=\linewidth]{Fig4-Spellman_score.eps}
623:   \caption{Trace plot of the average log-likelihood score and standard
624:     deviation for $10$ Gibbs sampler runs for the
625:     \citeauthor{spellmandata} data set.}
626:   \label{Spellman_conv}
627: \end{figure}
628: 
629: 
630: 
631: \subsection*{Two-way clustering \textit{versus} one-way clustering}
632: 
633: Our coclustering algorithm extends the CRC algorithm of \cite{chinese}
634: by also clustering the conditions for each cluster of genes
635: (\emph{`two-way clustering'}), instead of assuming they are always
636: independent (\emph{`one-way clustering'}). We compare the clustering
637: of genes for the three yeast data sets using both methods, by
638: computing the average number of clusters inferred ($K$), the average
639: log-likelihood score and the average GO mutual information score for
640: $10$ independent runs of each algorithm.  The results are tabulated in
Tables \ref{oneway} and \ref{twoway}.  For all three data sets, both
the log-likelihood score and the GO mutual information score are
higher (better) for our method. The increase in GO mutual information
score is especially large for the \citeauthor{hughesdata}
data set.  This data set has very few overexpressed or repressed
values, and if each condition is considered independent, there are very
few distinct profiles, which results in the formation of very few
clusters ($\sim 15$ for $6052$ genes). Clustering the conditions as well
gives more meaningful results, since differentially expressed
conditions form separate clusters from one large background cluster of
non-differentially expressed conditions.
652: 
653: \begin{table}[t]
654:   \processtable{One-way clustering, averages for $10$ different 
655:     Gibbs sampler runs.\label{oneway}}
656:   {\begin{tabular}{lccc}\toprule
657:       Data set & Avg. $K$ & Avg. log-likelihood score & Avg. MI\\\midrule
658:       \citeauthor{gaschdata} & $52.9 (2.6)$ & $-6.101 (0.014) \times 10^{5}$ 
659:       & $1.771 (0.031)$\\
660:       \citeauthor {hughesdata} & $14.9 (0.5)$ & $2.530 (0.002) \times 10^6$ 
661:       & $0.588 (0.044)$\\
662:       \citeauthor{spellmandata} & $49.7 (2.2)$ & $-7.183 (0.037) \times 10^{4}$ 
663:       & $1.491 (0.032)$\\\botrule
664: \end{tabular}}{}
665: \end{table}
666: 
667: \begin{table}[t]
668:   \processtable{Two-way clustering, averages for $10$ different 
669:     Gibbs sampler runs.\label{twoway}}
670:   {\begin{tabular}{lccc}\toprule
671:       Data set & Avg. $K$ & Avg. log-likelihood score & Avg. MI\\\midrule
672:       \citeauthor{gaschdata} & $84.5(2.5)$ & $-5.586(0.012)\times 10^{5}$ 
673:       & $1.912(0.033)$\\
674:       \citeauthor {hughesdata} & $85.5(2.7)$ & $2.798(0.004)\times 10^6$ 
675:       & $1.511(0.045)$\\
676:       \citeauthor{spellmandata} & $65.4(4.2)$ & $-5.112(0.011)\times 10^{4}$ 
677:       & $1.612(0.032)$\\\botrule
678: \end{tabular}}{}
679: \end{table}
680: 
681: For simulated data sets, clusters are defined as sets of genes sharing
682: the same regulators in the synthetic regulatory network, and the true
683: number of clusters is known.  Here we consider a gene network whose
684: topology is subsampled from an \emph{E.~coli} transcriptional network
\citep{syntren} with $1000$ genes, of which $105$ are transcription
factors, and $286$ clusters.  For two-way clustering, as we increase
the number of conditions in the simulated data set, more clusters are
formed and the number of clusters saturates close to the true number
(see Figure \ref{clusterOnewayTwoway}). For one-way clustering, the
addition of conditions does not affect the inferred number of clusters,
which is an order of magnitude smaller than the true number (see
Figure \ref{clusterOnewayTwoway}). For two-way clustering, due to the
693: clustering of conditions, the number of model parameters is reduced,
694: and greater statistical accuracy can be achieved, even when the number
695: of genes in a cluster becomes small.  
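
To illustrate this reduction, assume, as a simplification that may
differ from the exact model in the Methods, that each block of
expression values is described by one mean and one variance.  With $M$
conditions, a gene cluster $k$ containing $n_k$ genes and $L_k\le M$
condition clusters (all three symbols are our notation), the
parameter counts per gene cluster and the average number of
observations per block compare as
\[
  \underbrace{2M}_{\text{one-way}} \;\ge\; \underbrace{2L_k}_{\text{two-way}},
  \qquad
  \underbrace{\frac{n_k M}{L_k}}_{\text{two-way}} \;\ge\;
  \underbrace{n_k}_{\text{one-way}} .
\]
Under this assumption, a small gene cluster still contributes many
observations to each block when its conditions are grouped, which is
one way to read the gain in statistical accuracy described above.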
696: 
The correlation measure $\rho$ between true clusters and inferred
clusters is also higher for two-way clustering than for one-way
clustering (Supplementary Figure $8$).
700: 
In contrast to simulated data sets, the inferred number of clusters for
real biological data sets does not depend much on the number of
conditions (Supplementary Figure $7$), \ie, even if more conditions are
added, the algorithm does not generate more clusters. This is because
in simulated data, every added condition contributes new information,
whereas for real data sets that might not be the case. To recover the
true clusters from the expression data, we need not only more
conditions, but also that each new condition contributes information
different from the information already available from the previous
conditions. This might be a reason why the algorithm groups $6052$
genes into only $\sim 80$ clusters (see Table \ref{twoway}).
712: 
713: \begin{figure}[h]
714:   \centering
715:   \includegraphics[width=\linewidth]{Fig5-OnevsTwo.eps}
  \caption{Number of gene clusters for a simulated data set with
    $1000$ genes and a varying number of conditions, for two-way
    clustering (top data points, $\times$) and one-way clustering
    (bottom data points, $+$).}
720:   \label{clusterOnewayTwoway}
721: \end{figure}
722: 
723: \subsection*{Fuzzy clusters}
724: 
725: Our algorithm returns a summary of the posterior distribution in the
726: form of a gene-gene matrix whose entries are the probabilities that a
727: pair of genes is clustered together.  To convert these pairwise
728: probabilities back to clusters we use a graph spectral method as
729: explained in the Methods. The method produces fuzzy overlapping
730: clusters where each gene $i$ belongs to each fuzzy cluster $k$ with a
731: probability $p_{ik}$, such that $\sum_k p_{ik}=1$.  The size of a
732: fuzzy cluster $k$ is defined as $\sum_i p_{ik}$. The algorithm
733: iteratively produces new fuzzy clusters until all the information in
734: the pairwise matrix is converted into clusters ($1^{\text{st}}$
stopping criterion, see Methods), or until the mathematical conditions
underlying the algorithm cease to hold ($2^{\text{nd}}$ stopping
criterion, see Methods). We applied the algorithm to pairwise
738: probability matrices for each of the three data sets, obtained by
739: averaging over $10$ different Gibbs sampler runs.  For the
740: \citeauthor{gaschdata} and \citeauthor{hughesdata} data sets, full
741: fuzzy clustering is achieved with $500$ fuzzy clusters (all $6052$
742: genes have total assignment probability $\sum_k p_{ik}>0.98$).  For
the \citeauthor{spellmandata} data set the second stopping
criterion is met after producing $321$ fuzzy clusters.
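
As a small worked illustration with hypothetical numbers (not taken
from any of the three data sets), consider three genes and two fuzzy
clusters with membership matrix
\[
  P = (p_{ik}) =
  \begin{pmatrix}
    0.9 & 0.1\\
    0.6 & 0.4\\
    0.2 & 0.8
  \end{pmatrix}.
\]
Each row sums to $1$, and the cluster sizes are the column sums,
$\sum_i p_{i1} = 0.9+0.6+0.2 = 1.7$ and
$\sum_i p_{i2} = 0.1+0.4+0.8 = 1.3$.  When all rows sum to $1$, the
cluster sizes add up to the number of genes, here $1.7+1.3=3$.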
745: 
746: In general, we observe that the algorithm first produces one very
747: large fuzzy cluster corresponding to an average expression profile
748: that almost all genes can relate to. This cluster is of no interest
749: for further analysis.  Then it produces a number of fuzzy clusters of
750: varying size which show interesting coexpression profiles and are
751: useful for further analysis. For the three data sets considered here,
752: this number is around $100$, consistent with the average number of
753: clusters in different Gibbs sampler runs (see Table \ref{twoway}). The
754: remaining fuzzy clusters are typically very small and consist mostly
755: of noise. Like the very first cluster, they are of no interest for
756: further analysis.
757: 
Since every gene belongs to every cluster with some probability, we use
a probability cutoff to remove from each cluster the genes which belong
to it with a very small probability. The smaller the cutoff, the more
genes belong to a cluster, which results in more fuzzy clusters, and
\textit{vice versa}.  Table \ref{cutoff} shows the total number of genes
assigned to at least one fuzzy cluster for different cutoff values and,
in brackets, the number of genes assigned to at least two fuzzy clusters.
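
The effect of the cutoff can be made explicit under one assumption of
ours, namely that gene $i$ is assigned to cluster $k$ whenever
$p_{ik}>c$ with a strict inequality.  Writing $m_i(c)$ for the number
of clusters to which gene $i$ is assigned (our notation), the
normalization $\sum_k p_{ik}=1$ gives, for every gene with
$m_i(c)\ge 1$,
\[
  c\, m_i(c) \;<\; \sum_{k\,:\,p_{ik}>c} p_{ik} \;\le\; \sum_k p_{ik} \;=\; 1
  \quad\Longrightarrow\quad
  m_i(c) \;<\; \frac{1}{c}.
\]
Hence a gene can pass a cutoff of $0.5$ in at most one cluster, a
cutoff of $0.3$ in at most three clusters and a cutoff of $0.1$ in at
most nine clusters.  This is consistent with the zeros in brackets in
the $0.5$ column of Table \ref{cutoff}.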
765: 
766: The goal of merging different Gibbs sampler solutions and forming
767: fuzzy clusters is to extract additional information out of a data set
768: that is not captured by a single hard clustering solution. This can be
769: achieved in two ways. First, by obtaining tight clusters of few but
770: highly coexpressed genes with a high probability cutoff. Second, by
771: characterizing genes which belong to multiple clusters with a
772: significant probability.
773: 
774: \begin{table}[!t]
775:   \processtable{Number of genes clustered and number of genes belonging to 
776:     multiple clusters with different membership probability cutoff values.\label{cutoff}}
777:   {\begin{tabular}{lccc}\toprule
778:       Data set & $0.1$ &  $0.3$  & $0.5$\\ \midrule
779:       \citeauthor{gaschdata} & $6045$ $(4356)$  &  $4062$ $(344)$  &  $1781$ $(0)$\\
780:       \citeauthor{hughesdata} & $6052$ $(4554)$  & $3959$ $(34)$  &  $2254$ $(0)$\\
781:       \citeauthor{spellmandata} & $6052$ $(5187)$  & $3158$ $(139)$  & $1255$ $(0)$\\\botrule
782: \end{tabular}}{}
783: \end{table}
784: 
785: 
786: For all three data sets, at a probability cutoff of $0.5$, we get a
787: subset of genes which belong to only one cluster with high
788: probability. Table \ref{cutoff} shows that each data set retains at
789: least $20\%$ of its genes. These are sets of strongly coexpressed
genes which cluster together in almost every hard cluster solution.
Ribosomal genes show such a strong coexpression pattern in all
three data sets, where most genes belong to this cluster with a
probability close to $1$ (see Figure \ref{hughes_ribosome}). At least
$75\%$ of all the genes in cluster $2$ (\citeauthor{gaschdata} data),
cluster $3$ (\citeauthor{hughesdata} data) and cluster $2$
(\citeauthor{spellmandata} data) are located in the ribosome.
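
The $20\%$ figure can be checked directly from the $0.5$ column of
Table \ref{cutoff}, assuming that each data set contains $6052$ genes
(the count quoted above for the \citeauthor{gaschdata} and
\citeauthor{hughesdata} data sets; for the \citeauthor{spellmandata}
data set this is our assumption, based on the $0.1$ column):
\[
  \frac{1781}{6052}\approx 0.294, \qquad
  \frac{2254}{6052}\approx 0.372, \qquad
  \frac{1255}{6052}\approx 0.207 .
\]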
797: 
798: \begin{figure}[h]
799: \centering
800: \includegraphics[width=\linewidth]{Fig6-hughes_cluster3part.eps}
801: \caption{Ribosomal genes form a tight cluster in the
802:   \citeauthor{hughesdata} data set. (Due to space constraints only the
803:   first few genes are shown; for the complete figure, see the
804:   Supplementary Information.)}
805: \label{hughes_ribosome}
806: \end{figure}
807: 
Local but very strong coexpression patterns can also be detected by
our method. Cluster $15$ of the \citeauthor{gaschdata} data set
consists of only $4$ genes clustered together with probability $1$
(see Figure \ref{gasch_galactose}). These four genes, GAL1, GAL2,
GAL7, and GAL10, encode enzymes in the galactose catabolic pathway and
respond to different carbon sources during steady state. They are
strongly upregulated when galactose is used as a carbon source
($2^{\text{nd}}$ experiment cluster in Figure \ref{gasch_galactose})
and strongly downregulated with any other sugar as a carbon source
($1^{\text{st}}$ experiment cluster in Figure \ref{gasch_galactose}).
In every hard cluster solution, these $4$ genes are clustered together
along with other genes.  By merging these hard cluster solutions to form
fuzzy clusters, we get a tighter and more meaningful cluster containing
only these $4$ genes.
823: 
824: 
825: \begin{figure}[h]
826: \centering
827: \includegraphics[width=\linewidth]{Fig7-gasch_cluster15.eps}
828: \caption{Four genes GAL1, GAL2, GAL7 and GAL10 form a tight cluster
829:   showing conditional coexpression in the \citeauthor{gaschdata} data set.}
830: \label{gasch_galactose}
831: \end{figure}
832: 
833: Table \ref{cutoff} shows that many genes belong to two or more
clusters with a significant probability.  For the
\citeauthor{gaschdata} data set, our observations are similar to those
of \cite{gaschfuzzy}. Cluster $27$ contains genes localized in the
endoplasmic reticulum (ER) and induced under dithiothreitol (DTT)
stress, such as FKB2, JEM1, ERD2, ERP1, ERP2, RET2, RET3, SEC13, SEC21,
SEC24 and others.  Cluster $34$ contains genes repressed under nitrogen
stress and stationary state.  Twenty percent of the genes in cluster
$27$ also belong to cluster $34$ with a significant membership.  These
include genes encoding ER vesicle coat proteins, such as RET2, RET3,
SEC13 and others, which are induced under DTT stress as well as
repressed under nitrogen stress and stationary state.  Also RIO1, an
essential serine
845: kinase, belongs to two clusters with a significant probability.  It
846: clusters with genes involved in ribosomal biogenesis and assembly
847: (\citeauthor{gaschdata} data cluster $3$) as well as with genes
848: functioning as generators of precursor metabolites and energy
849: (\citeauthor{gaschdata} data cluster $7$). We find similar
850: observations for the \citeauthor{hughesdata} and
\citeauthor{spellmandata} data sets. Genes CLN1, CLN2 and other DNA
synthesis genes like CLB6, which are known to be regulated by SBF
during late G1 phase \citep{cellcycle}, belong to cluster $19$
854: (\citeauthor{spellmandata} data).  They also belong with significant
855: probability to cluster $4$ (\citeauthor{spellmandata} data). More than
856: one third of the genes in cluster $4$ are predicted to be cell cycle
857: regulated genes.
858: 
859: \section*{Conclusion}
860: 
861: We have developed an algorithm to simultaneously cluster genes and
862: conditions and sample such coclusterings from a Bayesian probabilistic
863: model.  For large data sets, the model is supported on multiple
864: equivalent local maxima. The average of these local maxima can be
865: represented by a matrix of pairwise gene-gene clustering probabilities
866: and we have introduced a new method for extracting fuzzy, overlapping
867: clusters from this matrix. This method is able to extract information
868: out of the data set that is not available from a single, hard
869: clustering.
870: 
871: 
872: \section*{Funding}
873: 
874: Early Stage Marie Curie Fellowship to A.J.; Postdoctoral Fellowship of
875: the Research Foundation Flanders (Belgium) to T.M.
876: 
877: \section*{Acknowledgement}
878: We thank Steven Maere and Vanessa Vermeirssen for helpful discussions.
879: 
880: 
881: % \bibliographystyle{natbib} 
882: % \bibliography{gibbs_sampler_analysis}   
883: 
884: \begin{thebibliography}{}
885: 
886: \bibitem[Ashburner {\em et~al.}(2000)Ashburner, Ball, Blake, Botstein, Butler,
887:   Cherry, Davis, Dolinski, Dwight, Eppig, Harris, Hill, Issel-Tarver,
888:   Kasarskis, Lewis, Matese, Richardson, Ringwald, Rubin, and Sherlock]{ashb00}
889: Ashburner, M., Ball, C.~A., Blake, J.~A., Botstein, D., Butler, H., Cherry,
890:   J.~M., Davis, A.~P., Dolinski, K., Dwight, S.~S., Eppig, J.~T., Harris,
891:   M.~A., Hill, D.~P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese,
892:   J.~C., Richardson, J.~E., Ringwald, M., Rubin, G.~M., and Sherlock, G.
893:   (2000).
894: \newblock {{G}ene ontology: tool for the unification of biology. {T}he {G}ene
895:   {O}ntology {C}onsortium}.
896: \newblock {\em Nat Genet\/}, {\bf 25}, 25--29.
897: 
898: \bibitem[Dahl(2006)Dahl]{dahl2006}
899: Dahl, D.~B. (2006).
900: \newblock Model-based clustering for expression data via a {D}irichlet process
901:   mixture model.
902: \newblock In K.-A. Do, P.~M\"uller, and M.~Vannucci, editors, {\em {B}ayesian
903:   inference for gene expression and proteomics\/}, pages 201--218. Cambridge
904:   University Press.
905: 
906: \bibitem[Eisen {\em et~al.}(1998)Eisen, Spellman, Brown, and
907:   Botstein]{pmid9843981}
908: Eisen, M.~B., Spellman, P.~T., Brown, P.~O., and Botstein, D. (1998).
909: \newblock {{C}luster analysis and display of genome-wide expression patterns}.
910: \newblock {\em Proc Natl Acad Sci U S A\/}, {\bf 95}(25), 14863--14868.
911: 
912: \bibitem[Fraley and Raftery(2002)Fraley and Raftery]{fraley02}
913: Fraley, C. and Raftery, A.~E. (2002).
914: \newblock Model-based clustering, discriminant analysis, and density
915:   estimation.
916: \newblock {\em J Amer Statistical Assoc\/}, {\bf 97}, 611--631.
917: 
918: \bibitem[Gasch and Eisen(2002)Gasch and Eisen]{gaschfuzzy}
919: Gasch, A.~P. and Eisen, M.~B. (2002).
920: \newblock {{E}xploring the conditional coregulation of yeast gene expression
921:   through fuzzy k-means clustering}.
922: \newblock {\em Genome Biol\/}, {\bf 3}(11), RESEARCH0059.
923: 
924: \bibitem[Gasch {\em et~al.}(2000)Gasch, Spellman, Kao, Carmel-Harel, Eisen,
925:   Storz, Botstein, and Brown]{gaschdata}
926: Gasch, A.~P., Spellman, P.~T., Kao, C.~M., Carmel-Harel, O., Eisen, M.~B.,
927:   Storz, G., Botstein, D., and Brown, P.~O. (2000).
928: \newblock {{G}enomic expression programs in the response of yeast cells to
929:   environmental changes}.
930: \newblock {\em Mol Biol Cell\/}, {\bf 11}(12), 4241--4257.
931: 
932: \bibitem[Gibbons and Roth(2002)Gibbons and Roth]{clusterjudge}
933: Gibbons, F.~D. and Roth, F.~P. (2002).
934: \newblock {{J}udging the quality of gene expression-based clustering methods
935:   using gene annotation}.
936: \newblock {\em Genome Res\/}, {\bf 12}(10), 1574--1581.
937: 
938: \bibitem[Hughes {\em et~al.}(2000)Hughes, Marton, Jones, Roberts, Stoughton,
939:   Armour, Bennett, Coffey, Dai, He, Kidd, King, Meyer, Slade, Lum, Stepaniants,
940:   Shoemaker, Gachotte, Chakraburtty, Simon, Bard, and Friend]{hughesdata}
941: Hughes, T.~R., Marton, M.~J., Jones, A.~R., Roberts, C.~J., Stoughton, R.,
942:   Armour, C.~D., Bennett, H.~A., Coffey, E., Dai, H., He, Y.~D., Kidd, M.~J.,
943:   King, A.~M., Meyer, M.~R., Slade, D., Lum, P.~Y., Stepaniants, S.~B.,
944:   Shoemaker, D.~D., Gachotte, D., Chakraburtty, K., Simon, J., Bard, M., and
945:   Friend, S.~H. (2000).
946: \newblock {{F}unctional discovery via a compendium of expression profiles}.
947: \newblock {\em Cell\/}, {\bf 102}(1), 109--126.
948: 
949: \bibitem[Inoue and Urahama(1999)Inoue and Urahama]{graphspectral}
950: Inoue, K. and Urahama, K. (1999).
951: \newblock Sequential fuzzy cluster extraction by a graph spectral method.
952: \newblock {\em Pattern Recogn. Lett.}, {\bf 20}(7), 699--705.
953: 
954: \bibitem[Koch {\em et~al.}(1996)Koch, Schleiffer, Ammerer, and
955:   Nasmyth]{cellcycle}
956: Koch, C., Schleiffer, A., Ammerer, G., and Nasmyth, K. (1996).
957: \newblock {{S}witching transcription on and off during the yeast cell cycle:
958:   {C}ln/{C}dc28 kinases activate bound transcription factor {S}{B}{F}
959:   ({S}wi4/{S}wi6) at start, whereas {C}lb/{C}dc28 kinases displace it from the
960:   promoter in {G}2}.
961: \newblock {\em Genes Dev\/}, {\bf 10}(2), 129--141.
962: 
963: \bibitem[Liu(2002)Liu]{liu2002}
964: Liu, J.~S. (2002).
965: \newblock {\em {M}onte {C}arlo strategies in scientific computing\/}.
966: \newblock Springer.
967: 
968: \bibitem[Medvedovic and Sivaganesan(2002)Medvedovic and
969:   Sivaganesan]{pmid12217911}
970: Medvedovic, M. and Sivaganesan, S. (2002).
971: \newblock {{B}ayesian infinite mixture model based clustering of gene
972:   expression profiles}.
973: \newblock {\em Bioinformatics\/}, {\bf 18}(9), 1194--1206.
974: 
975: \bibitem[Medvedovic {\em et~al.}(2004)Medvedovic, Yeung, and
976:   Bumgarner]{pmid14871871}
977: Medvedovic, M., Yeung, K.~Y., and Bumgarner, R.~E. (2004).
978: \newblock {{B}ayesian mixture model based clustering of replicated microarray
979:   data}.
980: \newblock {\em Bioinformatics\/}, {\bf 20}(8), 1222--1232.
981: 
982: \bibitem[Michoel {\em et~al.}(2007)Michoel, Maere, Bonnet, Joshi, Saeys,
983:   Van~den Bulcke, Van~Leemput, van Remortel, Kuiper, Marchal, and Van~de
984:   Peer]{lemone}
985: Michoel, T., Maere, S., Bonnet, E., Joshi, A., Saeys, Y., Van~den Bulcke, T.,
986:   Van~Leemput, K., van Remortel, P., Kuiper, M., Marchal, K., and Van~de Peer,
987:   Y. (2007).
988: \newblock {{V}alidating module network learning algorithms using simulated
989:   data}.
990: \newblock {\em BMC Bioinformatics\/}, {\bf 8 Suppl 2}, S5.
991: 
992: \bibitem[Qin(2006)Qin]{chinese}
993: Qin, Z.~S. (2006).
994: \newblock {{C}lustering microarray gene expression data using weighted
995:   {C}hinese restaurant process}.
996: \newblock {\em Bioinformatics\/}, {\bf 22}(16), 1988--1997.
997: 
998: \bibitem[Redner and Walker(1984)Redner and Walker]{redner84}
999: Redner, R.~A. and Walker, H.~F. (1984).
1000: \newblock Mixture densities, maximum likelihood, and the {EM} algorithm.
1001: \newblock {\em SIAM Review\/}, {\bf 26}(2), 195--239.
1002: 
1003: \bibitem[Segal {\em et~al.}(2003)Segal, Shapira, Regev, Pe'er, Botstein,
1004:   Koller, and Friedman]{segal2003}
1005: Segal, E., Shapira, M., Regev, A., Pe'er, D., Botstein, D., Koller, D., and
1006:   Friedman, N. (2003).
1007: \newblock Module networks: identifying regulatory modules and their
1008:   condition-specific regulators from gene expression data.
\newblock {\em Nat Genet\/}, {\bf 34}, 166--176.
1010: 
1011: \bibitem[Spellman {\em et~al.}(1998)Spellman, Sherlock, Zhang, Iyer, Anders,
1012:   Eisen, Brown, Botstein, and Futcher]{spellmandata}
1013: Spellman, P.~T., Sherlock, G., Zhang, M.~Q., Iyer, V.~R., Anders, K., Eisen,
1014:   M.~B., Brown, P.~O., Botstein, D., and Futcher, B. (1998).
1015: \newblock {{C}omprehensive identification of cell cycle-regulated genes of the
1016:   yeast {S}accharomyces cerevisiae by microarray hybridization}.
1017: \newblock {\em Mol Biol Cell\/}, {\bf 9}(12), 3273--3297.
1018: 
1019: \bibitem[Tamayo {\em et~al.}(1999)Tamayo, Slonim, Mesirov, Zhu, Kitareewan,
1020:   Dmitrovsky, Lander, and Golub]{pmid10077610}
1021: Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S., Dmitrovsky, E.,
1022:   Lander, E.~S., and Golub, T.~R. (1999).
1023: \newblock {{I}nterpreting patterns of gene expression with self-organizing
1024:   maps: methods and application to hematopoietic differentiation}.
1025: \newblock {\em Proc Natl Acad Sci U S A\/}, {\bf 96}(6), 2907--2912.
1026: 
1027: \bibitem[Tavazoie {\em et~al.}(1999)Tavazoie, Hughes, Campbell, Cho, and
1028:   Church]{pmid10391217}
1029: Tavazoie, S., Hughes, J.~D., Campbell, M.~J., Cho, R.~J., and Church, G.~M.
1030:   (1999).
1031: \newblock {{S}ystematic determination of genetic network architecture}.
1032: \newblock {\em Nat Genet\/}, {\bf 22}(3), 281--285.
1033: 
1034: \bibitem[Van~den Bulcke {\em et~al.}(2006)Van~den Bulcke, Van~Leemput, Naudts,
1035:   van Remortel, Ma, Verschoren, De~Moor, and Marchal]{syntren}
1036: Van~den Bulcke, T., Van~Leemput, K., Naudts, B., van Remortel, P., Ma, H.,
1037:   Verschoren, A., De~Moor, B., and Marchal, K. (2006).
1038: \newblock {{S}yn{T}{R}e{N}: a generator of synthetic gene expression data for
1039:   design and analysis of structure learning algorithms}.
1040: \newblock {\em BMC Bioinformatics\/}, {\bf 7}, 43.
1041: 
1042: \bibitem[Yeung {\em et~al.}(2001)Yeung, Fraley, Murua, Raftery, and
1043:   Ruzzo]{pmid11673243}
1044: Yeung, K.~Y., Fraley, C., Murua, A., Raftery, A.~E., and Ruzzo, W.~L. (2001).
1045: \newblock {{M}odel-based clustering and data transformations for gene
1046:   expression data}.
1047: \newblock {\em Bioinformatics\/}, {\bf 17}(10), 977--987.
1048: 
1049: \end{thebibliography}
1050: 
1051: 
1052: \end{document}
1053: 
1054: