\begin{abstract}%
We introduce a new algorithm that is robust to data
with missing features, a situation that arises in many practical
applications. Imputation methods are commonly used to fill in missing
values; however, they generally ignore the learning algorithm that is
used with the imputed dataset. We present a joint optimization that
selects both an imputation function and a predictor simultaneously.
Despite the non-convexity of the initial formulation, we derive an
effective convex relaxation over a strictly larger hypothesis class. We
prove Rademacher complexity bounds for the larger class, which
guarantee convergence to the best in class given sufficient training
data. The algorithm is tested on several UCI datasets, where it shows
superior performance over baselines.
\end{abstract}
