\begin{abstract}
Large-scale machine learning problems make the cost of hyperparameter tuning
ever more prohibitive. This creates a need for algorithms that can tune
themselves on the fly. We formalize the notion of \emph{``tuning-free''}
algorithms that can match the performance of optimally tuned optimization
algorithms up to polylogarithmic factors given only loose hints on the relevant
problem parameters. We consider in particular algorithms that can match
optimally tuned Stochastic Gradient Descent (SGD). When the domain of
optimization is bounded, we show that tuning-free matching of SGD is possible
and is achieved by several existing algorithms. We prove that for the task of
minimizing a convex and smooth or Lipschitz function over an unbounded domain,
tuning-free optimization is impossible. We discuss conditions under which
tuning-free optimization is possible even over unbounded domains. In particular,
we show that the recently proposed DoG and DoWG algorithms are tuning-free when
the noise distribution is sufficiently well-behaved. For the task of finding a
stationary point of a smooth and potentially nonconvex function, we give a
variant of SGD that matches the best-known high-probability convergence rate for
tuned SGD at only an additional polylogarithmic cost. However, we also give an
impossibility result showing that no algorithm can hope to match the optimal
expected convergence rate for tuned SGD with high probability.
\end{abstract}