\begin{abstract}
Recently, a majorization method for optimizing partition functions of
log-linear models was proposed alongside a novel quadratic
variational upper bound. In the batch setting, it outperformed
state-of-the-art first- and second-order optimization methods on
various learning tasks. We propose a stochastic version of this
bound majorization method as well as a low-rank modification for
high-dimensional datasets. The resulting stochastic second-order
method outperforms stochastic gradient descent (across its variants
and a range of tunings) in terms of both the number of iterations and
the computation time until convergence, while finding a better-quality
parameter setting. The proposed method bridges first- and
second-order stochastic optimization methods: it maintains a
computational complexity that is linear in the data dimension
while exploiting second-order information about the pseudo-global
curvature of the objective function (as opposed to the local
curvature captured by the Hessian).
\end{abstract}
