\begin{abstract}
Regularized regression models, such as the lasso and its variants,
%are standard tools in applied machine learning and statistics. These methods
are well studied and,
under appropriate conditions,
offer fast and statistically interpretable results.
However, large datasets in many applications are
heterogeneous, in the sense that they harbor distributional differences between latent groups. In such cases,
the assumption that the conditional distribution of the response $Y$ given the features $X$
is the same for all samples may not hold (even approximately).
Furthermore, in scientific applications, the covariance structure of the features
may contain important signals, and
learning it is also affected by latent group structure.
%The two issues -- heterogeneity in feature distributions and in regression models -- are linked, since both aspects may provide signals relevant to understanding the latent structure.
We propose a class of regularized mixture models for
paired data of the form $(X,Y)$ that
couple
the distribution of $X$ (modeled using sparse graphical models)
with the conditional distribution of $Y \mid X$ (modeled using sparse regression).
Both the regression and graphical models are specific to the latent groups, and model parameters are estimated jointly (hence we
call the approach ``regularized joint mixtures'').
This allows
signals in either or both of the feature distribution and the regression model to inform learning of the latent structure, and
% This joint strategy deals with suspected distributional shifts and
provides automatic control of confounding by such structure. Estimation is handled via an expectation-maximization algorithm, whose convergence is established theoretically. We illustrate the key ideas via empirical examples.
\end{abstract}