
1: \begin{abstract}%

2: %Stochastic gradient-based optimization is a crucial technology in machine learning to optimize neural networks.

3: Stochastic gradient-based optimization is crucial for training neural networks.

4: While popular approaches heuristically adapt the step size and direction by rescaling gradients, a more principled approach to improve optimizers requires second-order information.

5: Such methods precondition the gradient using the objective's Hessian.

6: Yet, computing the Hessian is usually expensive, and effectively using second-order information in the stochastic gradient setting is non-trivial.

7: We propose using Information-Theoretic Trust Region Optimization (\ittr) for improved updates with uncertain second-order information.

8: By modeling the network parameters as a Gaussian distribution and using a Kullback-Leibler divergence-based trust region, our approach takes bounded steps that account for the objective's curvature and the uncertainty in the parameters.

9: Before each update, it solves the trust region problem for an optimal step size, resulting in a more stable and faster optimization process.

10: We approximate the diagonal elements of the Hessian from stochastic gradients using a simple recursive least squares approach, constructing a model of the expected Hessian over time using only first-order information.

11: We show that \ittr{} combines the fast convergence of adaptive moment-based optimization with the generalization capabilities of SGD.

12: \end{abstract}

13: