\begin{abstract}
We study optimization for data-driven decision-making when we have observations of the uncertain parameters of the optimization model together with concurrent observations of covariates.
Given a new covariate observation, the goal is to choose a decision that minimizes the expected cost conditioned on this observation.
We investigate three data-driven frameworks that integrate a machine learning prediction model within a stochastic programming sample average approximation (SAA) to approximate the solution of this problem.
Two of these SAA frameworks are new; they generate scenarios from the out-of-sample residuals of leave-one-out prediction models.
The frameworks we investigate are flexible and accommodate parametric, nonparametric, and semiparametric regression techniques.
We derive conditions on the data generation process, the prediction model, and the stochastic program under which solutions of these data-driven SAAs are consistent and asymptotically optimal, and we also derive convergence rates and finite-sample guarantees.
Computational experiments validate our theoretical results, demonstrate the potential advantages of our data-driven formulations over existing approaches (even when the prediction model is misspecified), and illustrate the benefits of our new data-driven formulations in the \mbox{limited-data regime}. \\[0.1in]
\keywords{Data-driven stochastic programming, covariates, regression, sample average approximation, {jackknife}, large deviations}
\end{abstract}
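As an illustration only, the following Python sketch shows one way residual-based scenario generation could feed an SAA. It assumes a scalar uncertain parameter, an ordinary least-squares prediction model, and a newsvendor cost as the stochastic program. All of these, and the names \texttt{fit\_ols}, \texttt{predict}, \texttt{residual\_scenarios}, and \texttt{newsvendor\_saa}, are hypothetical choices made here. The abstract does not specify the third (non-leave-one-out) framework, so the \texttt{"er"} variant (full-data fit plus in-sample residuals) is an assumption, as are the exact forms of the two leave-one-out variants \texttt{"j"} and \texttt{"jplus"}.

```python
import numpy as np


def fit_ols(X, Y):
    """Hypothetical regression model: least squares with an intercept."""
    A = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef


def predict(coef, X):
    """Predict Y for covariate rows X of shape (m, d)."""
    return coef[0] + X @ coef[1:]


def residual_scenarios(X, Y, x_new, method="er"):
    """Build n scenarios of Y at covariate x_new (shape (d,)).

    Assumed variants (labels are hypothetical):
      "er":    full-data prediction + in-sample residuals
      "j":     full-data prediction + leave-one-out residuals
      "jplus": leave-one-out prediction + matching leave-one-out residual
    """
    if method not in ("er", "j", "jplus"):
        raise ValueError(f"unknown method: {method}")
    n = X.shape[0]
    x_new = np.asarray(x_new, dtype=float).reshape(1, -1)
    coef = fit_ols(X, Y)
    point = predict(coef, x_new)[0]
    if method == "er":
        return point + (Y - predict(coef, X))
    scenarios = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                 # drop observation i
        coef_i = fit_ols(X[keep], Y[keep])       # leave-one-out model
        loo_resid = Y[i] - predict(coef_i, X[i:i + 1])[0]  # out-of-sample residual
        base = point if method == "j" else predict(coef_i, x_new)[0]
        scenarios[i] = base + loo_resid
    return scenarios


def newsvendor_saa(scenarios, price=5.0, cost=3.0):
    """SAA order quantity: smallest scenario whose empirical CDF reaches (price - cost) / price."""
    s = np.sort(scenarios)
    k = int(np.ceil(len(s) * (price - cost) / price)) - 1
    return s[max(k, 0)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(30, 2))
    Y = 10 + X @ np.array([3.0, -2.0]) + rng.normal(0, 1, size=30)
    x_new = np.array([0.5, 0.5])
    for m in ("er", "j", "jplus"):
        print(m, newsvendor_saa(residual_scenarios(X, Y, x_new, m)))
```

Each variant returns $n$ scenarios, so the downstream SAA is the same in all three cases and only the residuals differ. The leave-one-out variants require $n$ additional model fits.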