\begin{abstract}
  We consider the variable selection problem, which seeks to identify
  important variables influencing a response $Y$ out of many candidate
  features $X_1, \ldots, X_p$. We wish to do so while offering
  finite-sample guarantees about the fraction of false
  positives---selected variables $X_j$ that in fact have no effect on
  $Y$ after the other features are known.  When the number of features
  $p$ is large (perhaps even larger than the sample size $n$), and we
  have no prior knowledge regarding the type of dependence between $Y$
  and $X$, the model-X knockoffs framework nonetheless allows us to
  select a model with a guaranteed bound on the false discovery rate,
  as long as the distribution of the feature vector
  $X=(X_1,\dots,X_p)$ is exactly known. This model selection procedure
  operates by constructing ``knockoff copies'' of each of the $p$
  features, which are then used as a control group to ensure that the
  model selection algorithm is not choosing too many irrelevant
  features.  In this work, we study the practical setting where the
  distribution of $X$ can only be estimated, rather than known
  exactly, and the knockoff copies of the $X_j$'s are therefore
  constructed somewhat incorrectly.  Our results, which are free of
  any modeling assumption whatsoever, show that the resulting model
  selection procedure incurs an inflation of the false discovery rate
  that is proportional to our errors in estimating the distribution of
  each feature $X_j$ conditional on the remaining features
  $\{X_k:k\neq j\}$.  The model-X knockoffs framework is therefore
  robust to errors in the underlying assumptions on the distribution of
  $X$, making it an effective method for many practical applications,
  such as genome-wide association studies, where the underlying
  distribution of the features $X_1,\dots,X_p$ is estimated accurately
  but not known exactly.
\end{abstract}