\begin{abstract}
  Convex objectives arising from linear supervised learning problems, such as
  penalized generalized linear models, can be written as finite sums of convex functions.
  For such problems, many stochastic first-order solvers based on variance
  reduction are available; they combine computational efficiency with sound theoretical
  guarantees (linear convergence rates) \cite{johnson2013accelerating},
  \cite{schmidt2013minimizing}, \cite{shalev2013stochastic}, \cite{defazio2014saga}.
  Such rates are obtained under both gradient-Lipschitz and strong convexity
  assumptions.
  Motivated by learning problems that do not meet the gradient-Lipschitz assumption, such as
  linear Poisson regression, we work under another smoothness assumption and
  obtain a linear convergence rate for a shifted version of Stochastic Dual Coordinate Ascent
  (SDCA) \cite{shalev2013stochastic} that improves on the current state of the art.
  We consider a solver that works on the Fenchel-dual problem because
  such objectives include many linear constraints, which are easier to handle in the
  dual.
  Our approach and theoretical findings are validated on several datasets, for Poisson regression
  and for another objective given by the negative log-likelihood of the Hawkes process, a
  family of models that has proved extremely useful for modeling information
  propagation in social networks and for causal
  inference \cite{de2016learning}, \cite{farajtabar2015coevolve}.
\end{abstract}