\begin{abstract}
We study the use of gradient descent with backtracking line search (GD-BLS) to solve the noisy optimization problem $\theta_\star:=\argmin_{\theta\in\R^d} \E[f(\theta,Z)]$, where the function $F(\theta):=\E[f(\theta,Z)]$ is assumed to be strictly convex. Assuming further that $\E[\|\nabla_\theta f(\theta_\star,Z)\|^2]<\infty$ and that $F$ is locally $L$-smooth, we first prove that sample average approximation based on GD-BLS estimates $\theta_\star$ with an error of size $\bigO_\P(B^{-1/4})$, where $B$ is the available computational budget. We then show that this rate can be improved by stopping the optimization process early, once the gradient of the objective function is sufficiently close to zero, and using the residual computational budget to optimize, again with GD-BLS, a finer approximation of $F$. By applying this strategy iteratively $J$ times, we establish that $\theta_\star$ can be estimated with an error of size $\bigO_\P(B^{-\frac{1}{2}(1-\delta^{J})})$, where $\delta\in(1/2,1)$ is a user-specified parameter. More generally, we show that if $\E[\|\nabla_\theta f(\theta_\star,Z)\|^{1+\alpha}]<\infty$ for some known $\alpha\in (0,1]$, then this approach, which can be seen as a retrospective approximation algorithm with a fixed computational budget, learns $\theta_\star$ with an error of size $\bigO_\P(B^{-\frac{\alpha}{1+\alpha}(1-\delta^{J})})$, where $\delta\in(2\alpha/(1+3\alpha),1)$ is a tuning parameter. Apart from knowing $\alpha$, achieving these convergence rates does not require tuning the algorithms' parameters to the specific functions $F$ and $f$ at hand. Finally, we exhibit a simple noisy optimization problem for which stochastic gradient descent is not guaranteed to converge while the algorithms discussed in this work are.


%\textit{Keywords:} Noisy optimization, stochastic gradient, sample average approximation, retrospective approximation. 
\end{abstract}
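The abstract describes the algorithms only in words. The Python sketch below is a minimal, hedged illustration of the general idea (GD-BLS on a sample average approximation, early stopping on the gradient norm, then a warm restart on a larger sample), not the authors' implementation. Every concrete choice in it is hypothetical: the toy strictly convex loss $f(\theta,z)=\sum_k\sqrt{1+(\theta_k-z_k)^2}$, the sample sizes $n_j=n_0\,g^{j}$, the stopping tolerance $n_j^{-1/2}$, the Armijo constants, and a cost model that charges one unit of budget per per-sample evaluation of $(f,\nabla_\theta f)$. None of these is the schedule analyzed in the paper, and the tuning parameter $\delta$ does not appear.

```python
import math
import random


class Budget:
    """Hypothetical cost model: one unit per per-sample evaluation of (f, grad f)."""

    def __init__(self, total):
        self.remaining = total

    def charge(self, cost):
        if cost > self.remaining:
            raise RuntimeError("computational budget exhausted")
        self.remaining -= cost


def saa_eval(theta, samples, budget):
    """Value and gradient of F_n(theta) = (1/n) sum_i f(theta, z_i) for the
    toy loss f(theta, z) = sum_k sqrt(1 + (theta_k - z_k)^2) (hypothetical)."""
    budget.charge(len(samples))
    n, d = len(samples), len(theta)
    value, grad = 0.0, [0.0] * d
    for z in samples:
        for k in range(d):
            r = math.sqrt(1.0 + (theta[k] - z[k]) ** 2)
            value += r
            grad[k] += (theta[k] - z[k]) / r
    return value / n, [g / n for g in grad]


def gd_bls(theta, samples, budget, tol, t0=1.0, c=0.5, shrink=0.5):
    """Gradient descent with Armijo backtracking on F_n. Stops once
    ||grad F_n|| <= tol, or when the budget cannot pay for another evaluation."""
    n = len(samples)
    if budget.remaining < n:
        return theta
    value, grad = saa_eval(theta, samples, budget)
    while True:
        gnorm2 = sum(g * g for g in grad)
        if math.sqrt(gnorm2) <= tol:
            return theta  # early stop: gradient is small enough for this SAA
        t = t0
        while True:
            if budget.remaining < n:
                return theta  # out of budget: keep the last accepted iterate
            trial = [x - t * g for x, g in zip(theta, grad)]
            new_value, new_grad = saa_eval(trial, samples, budget)
            if new_value <= value - c * t * gnorm2:  # Armijo sufficient decrease
                break
            t *= shrink  # backtrack
        theta, value, grad = trial, new_value, new_grad


def retrospective_gd_bls(sample_z, theta0, total_budget, J=3, n0=100, growth=4):
    """Stage j = 0, ..., J-1 draws a fresh SAA of hypothetical size
    n0 * growth**j, warm-starts GD-BLS from the previous estimate and stops
    once ||grad|| <= n_j**-0.5; the last stage spends the remaining budget."""
    budget = Budget(total_budget)
    theta = list(theta0)
    for j in range(J):
        n = n0 * growth ** j
        if n > budget.remaining:
            break  # cannot afford a single evaluation of the finer SAA
        samples = [sample_z() for _ in range(n)]
        tol = 0.0 if j == J - 1 else n ** -0.5
        theta = gd_bls(theta, samples, budget, tol)
    return theta


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [1.0, -2.0]
    est = retrospective_gd_bls(lambda: [rng.gauss(m, 1.0) for m in mu],
                               [0.0, 0.0], 200_000)
    print(est)  # should be close to mu for this symmetric toy problem
```

For coordinate-wise symmetric $Z$ the minimizer of this toy $F$ is $\theta_\star=\E[Z]$, so the output can be checked against the mean used to simulate the data. Warm-starting each stage from the previous estimate lets the cheap, coarse stages do most of the travel away from $\theta_0$, leaving the largest sample for the final refinement.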