\begin{abstract}
We analyze the convergence rate of a simplified version of a popular
Gibbs sampling method used for statistical discovery of gene
regulatory binding motifs in DNA sequences. This sampler satisfies
uniform ergodicity, a very strong form of ergodicity. However, we show
that, due to multimodality of the posterior distribution, the rate of
convergence often decreases exponentially as a function of the length
of the DNA sequence. Specifically, we show that this occurs whenever
there is more than one true repeating pattern in the data. In practice,
biological data typically contain multiple such patterns, and the
goal is to detect the most well-conserved and frequently occurring
of these. Our findings match empirical results, in which the
motif-discovery Gibbs sampler has exhibited such poor convergence that
it is used only for finding modes of the posterior distribution
(candidate motifs) rather than for obtaining samples from that
distribution. Our results are among the first meaningful bounds on the
convergence rate of a Markov chain method for sampling from a
multimodal posterior distribution, as a function of statistical
quantities such as the number of observations.
\end{abstract}