\begin{abstract}
A central goal of machine learning is to learn robust representations that capture the causal relationship between input features and output labels.
% While machine learning models are able to learn complex prediction rules by minimizing the training error, they also
However, minimizing empirical risk over finite or biased datasets often causes models to latch onto \emph{spurious correlations} between training inputs and outputs that are not fundamental to the problem at hand.
% Models that fit these correlations often fail on inputs where the spurious correlation does not hold.
In this paper, we define and analyze robust and spurious representations using the information-theoretic concept of \emph{minimal sufficient statistics}.
We prove that even when the bias lies only in the input distribution (i.e.,~\emph{covariate shift}), models can still pick up spurious features from their training data.
Group distributionally robust optimization (group DRO) is an effective tool for alleviating covariate shift: it minimizes the \emph{worst-case} training loss over a set of pre-defined groups.
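Concretely, in notation introduced here only for illustration (training pairs $(x_i, y_i)$ with group labels $g_i \in \mathcal{G}$, $n_g$ training examples in group $g$, a model $f_\theta$, and a per-example loss $\ell$), the standard group DRO objective can be written as
\[
  \min_{\theta} \; \max_{g \in \mathcal{G}} \; \frac{1}{n_g} \sum_{i \,:\, g_i = g} \ell\big(f_\theta(x_i), y_i\big),
\]
so the model is optimized against whichever pre-defined group currently incurs the highest average training loss.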

Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations that occur in the data.
% under ``imperfect'' partitions where groups are not created from exact spurious factors.
To address this, we further propose to minimize the worst-case loss over a more flexible set of distributions defined on the \emph{joint distribution} of groups and instances, instead of treating each group as a whole during optimization.
Through extensive experiments on one image task and two language tasks, we show that our model is significantly more robust than comparable baselines under various group partitions.
Our code is available at \url{https://github.com/violet-zct/group-conditional-DRO}.
\end{abstract}