\begin{abstract}

In large-scale genomic applications, vast numbers of molecular features are
scanned to find a small number of candidates that are linked to a particular
disease or phenotype. This is a variable selection problem in the
``large $p$, small $n$'' paradigm, where many more variables than samples are
available. In addition, the markers or genes often exhibit a complex dependence
structure owing to their joint involvement in biological processes and
pathways.

Bayesian variable selection methods that introduce sparsity through
additional priors on the model size are well suited to this problem. However,
the model space is very large ($2^p$ possible models), and standard Markov
chain Monte Carlo (MCMC) algorithms, such as a Gibbs sampler that sweeps over
all $p$ variables in each iteration, are often computationally infeasible. We
propose to use the dependence structure in the data to decide which variables
should always be updated together and which are nearly conditionally
independent and hence need not be considered together.

Here, we focus on binary classification applications. We follow the
latent-variable implementations of the Bayesian probit regression model by
\citet{albert93} and of the Bayesian logistic regression model by
\citet{holmes06}, both of which lead to marginal Gaussian distributions. We
investigate several MCMC samplers that exploit the dependence structure in
different ways. The mixing and convergence performance of the resulting Markov
chains is evaluated and compared with that of standard samplers in two
simulation studies and in an application to a real gene expression data set.

\end{abstract}
