\begin{abstract}
Single-cell RNA sequencing (scRNA-seq) is a powerful technology that allows researchers to study gene expression patterns at the single-cell level. However, analysing scRNA-seq data is challenging because of issues and biases in data collection. In this work, we construct an integrated Bayesian model that simultaneously addresses normalization, imputation and batch effects, while nonparametrically clustering cells into groups across multiple datasets.
%Specifically, the Hierarchical Dirichlet process (HDP) is used to discover clusters of cells with similar mean-expression and dispersion patterns that may be unique or shared across datasets.
%In addition, the mean-variance relationship is directly accounted for through an informative regression model, which provides robust estimates, particularly for sparse data and/or small clusters.
A Gibbs sampler based on a finite-dimensional approximation of the hierarchical Dirichlet process (HDP) is developed for posterior inference.
%On simulated datasets, we show that our model is robust in terms of the ability to capture the clustering structure and the true relationship between the mean-expression and dispersion parameters. Our work is motivated by experimental data collected to study prenatal development of cells under conditions when the transcription factor, PAX6, is knocked out in mutant mice. In this case, our model is used to identify clusters of cells which behave differently under the experimental conditions.
\end{abstract}