\begin{abstract}
We analyze the convergence rate of a simplified version of a popular
Gibbs sampling method used for statistical discovery of gene
regulatory binding motifs in DNA sequences. This sampler satisfies a
very strong form of ergodicity (uniform). However, we show that, due
to multimodality of the posterior distribution, the rate of
convergence often decreases exponentially as a function of the length
of the DNA sequence. Specifically, we show that this occurs whenever
there is more than one true repeating pattern in the data. In practice
there are typically multiple such patterns in biological data, the
goal being to detect the most well-conserved and frequently-occurring
of these. Our findings match empirical results, in which the
motif-discovery Gibbs sampler has exhibited such poor convergence that
it is used only for finding modes of the posterior distribution
(candidate motifs) rather than for obtaining samples from that
distribution. Ours are some of the first meaningful bounds on the
convergence rate of a Markov chain method for sampling from a
multimodal posterior distribution, as a function of statistical
quantities like the number of observations.
\end{abstract}