\begin{abstract} Motivated by applications arising from large-scale
optimization and machine learning, we consider stochastic quasi-Newton
(SQN) methods for solving unconstrained convex optimization problems.
Much of the convergence analysis of SQN methods, in both full and
limited-memory regimes, requires the objective function to be strongly
convex. However, this assumption is fairly restrictive and does not
hold in many applications. To the best of our knowledge, no rate
statements currently exist for SQN methods in the absence of such an
assumption. Moreover, among the existing first-order methods for
stochastic optimization problems with merely convex objectives, those
equipped with provable convergence rates employ averaging, and
averaging is detrimental to inducing sparsity.
Motivated by these gaps, we consider optimization problems with
non-strongly convex objectives with Lipschitz but possibly unbounded
gradients. The main contributions of the paper are as follows: (i) To
address large-scale stochastic optimization problems, we develop an
iteratively regularized stochastic limited-memory BFGS (IRS-LBFGS)
algorithm, where the stepsize, regularization parameter, and the
Hessian inverse \fy{approximation} are updated iteratively. We
establish convergence of the iterates (with no averaging) to an optimal
solution of the original problem both in an almost-sure sense and in a
mean sense. The convergence rate is derived in terms of the objective
function values and is shown to be
$\mathcal{O}\left(1/k^{\left({1}/{3}-\e\right)}\right)$, where $\e$ is
an arbitrarily small positive scalar; (ii) In deterministic regimes, we
show that \fys{the algorithm} attains a rate of
\fys{$\mathcal{O}({1}/{k^{1-\e}})$. We present numerical experiments}
performed on a large-scale text classification problem \fy{and compare
IRS-LBFGS with standard SQN methods as well as first-order methods such
as SAGA and IAG}.
\end{abstract}