\begin{abstract}
Compositional data sets are ubiquitous in the sciences, arising in
fields such as geology, ecology, and microbiology.
In microbiome research, compositional data primarily
arise from high-throughput sequence-based profiling
experiments. These data describe microbial compositions in
their natural habitats and are often paired with covariate
measurements
that characterize physicochemical habitat properties or
the physiology of the host. Inferring parsimonious statistical
associations between microbial compositions and habitat- or
host-specific covariate data is an important step in exploratory
data analysis. A standard statistical model linking compositional
covariates to continuous outcomes is the linear log-contrast model.
This model describes the response as a linear combination of
log-ratios of the original compositions and has been extended
to the high-dimensional setting via regularization. In
this contribution, we propose a general convex optimization model
for linear log-contrast regression that includes many previous
proposals as special cases. We introduce a proximal algorithm that
solves the resulting constrained optimization problem exactly
with rigorous convergence guarantees.
We illustrate the versatility of our approach by
investigating the performance of several model instances on
soil and gut microbiome data analysis tasks.

\keywords{compositional data \and
convex optimization \and log-contrast model
\and microbiome
\and perspective function \and proximal algorithm}
% \PACS{PACS code1 \and PACS code2 \and more}
% \subclass{MSC code1 \and MSC code2 \and more}
\end{abstract}