\begin{abstract}
Large-scale machine learning problems make the cost of hyperparameter tuning
ever more prohibitive. This creates a need for algorithms that can tune
themselves on the fly. We formalize the notion of \emph{``tuning-free''}
algorithms that can match the performance of optimally tuned optimization
algorithms up to polylogarithmic factors given only loose hints on the relevant
problem parameters. We consider in particular algorithms that can match
optimally tuned Stochastic Gradient Descent (SGD). When the domain of
optimization is bounded, we show that tuning-free matching of SGD is possible
and is achieved by several existing algorithms. We prove that for the task of
minimizing a convex function that is smooth or Lipschitz over an unbounded
domain, tuning-free optimization is impossible. We discuss conditions under
which tuning-free optimization is possible even over unbounded domains. In
particular, we show that the recently proposed DoG and DoWG algorithms are
tuning-free when the noise distribution is sufficiently well-behaved. For the
task of finding a stationary point of a smooth and potentially nonconvex
function, we give a variant of SGD that matches the best-known high-probability
convergence rate for tuned SGD at only an additional polylogarithmic cost.
However, we also give an impossibility result showing that no algorithm can
hope to match the optimal expected convergence rate for tuned SGD with high
probability.
\end{abstract}