\begin{abstract}
Motivated by applications arising from large-scale optimization and machine learning, we consider stochastic quasi-Newton (SQN) methods for solving unconstrained convex optimization problems.
Much of the convergence analysis of SQN methods, in both full and limited-memory regimes, requires the objective function to be strongly convex. However, this assumption is fairly restrictive and does not hold in many applications. To the best of our knowledge, no rate statements currently exist for SQN methods in the absence of such an assumption. Moreover, among the existing first-order methods for stochastic optimization problems with merely convex objectives, those equipped with provable convergence rates employ averaging of the iterates. However, such averaging is detrimental to inducing sparsity in the iterates.
Motivated by these gaps, we consider optimization problems with non-strongly convex objectives whose gradients are Lipschitz continuous but possibly unbounded. The main contributions of the paper are as follows: (i) to address large-scale stochastic optimization problems, we develop an iteratively regularized stochastic limited-memory BFGS (IRS-LBFGS) algorithm, in which the stepsize, the regularization parameter, and the inverse Hessian \fy{approximation} are updated iteratively. We establish convergence of the iterates (with no averaging) to an optimal solution of the original problem, both in an almost-sure sense and in a mean sense. The convergence rate is derived in terms of the objective function values and is shown to be $\mathcal{O}\left(1/k^{\left({1}/{3}-\e\right)}\right)$, where $\e$ is an arbitrarily small positive scalar; (ii) in deterministic regimes, we show that \fys{the algorithm} achieves a rate of \fys{$\mathcal{O}({1}/{k^{1-\e}})$. We present numerical experiments} performed on a large-scale text classification problem \fy{and compare IRS-LBFGS with standard SQN methods as well as with first-order methods such as SAGA and IAG}.
\end{abstract}