\begin{abstract}

  We consider the variable selection problem, which seeks to identify
  important variables influencing a response $Y$ out of many candidate
  features $X_1, \ldots, X_p$. We wish to do so while offering
  finite-sample guarantees about the fraction of false positives, that
  is, selected variables $X_j$ that in fact have no effect on $Y$ once
  the other features are known.  When the number of features $p$ is
  large (perhaps even larger than the sample size $n$), and we have no
  prior knowledge regarding the type of dependence between $Y$ and
  $X$, the model-X knockoffs framework nonetheless allows us to select
  a model with a guaranteed bound on the false discovery rate, as long
  as the distribution of the feature vector $X=(X_1,\dots,X_p)$ is
  exactly known. This model selection procedure operates by
  constructing ``knockoff copies'' of each of the $p$ features, which
  are then used as a control group to ensure that the model selection
  algorithm is not choosing too many irrelevant features.  In this
  work, we study the practical setting where the distribution of $X$
  can only be estimated, rather than known exactly, and the knockoff
  copies of the $X_j$'s are therefore constructed somewhat
  incorrectly.  Our results, which are free of any modeling
  assumptions, show that the resulting model selection procedure
  incurs an inflation of the false discovery rate that is proportional
  to our errors in estimating the distribution of each feature $X_j$
  conditional on the remaining features $\{X_k:k\neq j\}$.  The
  model-X knockoffs framework is therefore robust to errors in the
  underlying assumptions on the distribution of $X$, making it an
  effective method for many practical applications, such as
  genome-wide association studies, where the underlying distribution
  of the features $X_1,\dots,X_p$ is estimated accurately but not
  known exactly.
\end{abstract}