\begin{abstract}
Convex objectives arising in linear supervised learning problems, such as
penalized generalized linear models, can be formulated as finite sums of convex functions.
For such problems, many stochastic first-order solvers based on the idea of variance
reduction are available, and they combine computational efficiency with sound theoretical
guarantees (linear convergence rates)
\cite{johnson2013accelerating,schmidt2013minimizing,shalev2013stochastic,defazio2014saga}.
Such rates are obtained under both gradient-Lipschitz and strong convexity
assumptions.
Motivated by learning problems that do not meet the gradient-Lipschitz assumption, such as
linear Poisson regression, we work under another smoothness assumption and
obtain a linear convergence rate for a shifted version of Stochastic Dual Coordinate Ascent
(SDCA) \cite{shalev2013stochastic} that improves on the current state of the art.
Our motivation for considering a solver working on the Fenchel-dual problem comes from the
fact that such objectives include many linear constraints that are easier to deal with in the
dual.
Our approach and theoretical findings are validated on several datasets, for Poisson regression
and for another objective coming from the negative log-likelihood of the Hawkes process, a
family of models that proves extremely useful for modeling information
propagation in social networks and for causality
inference \cite{de2016learning,farajtabar2015coevolve}.
\end{abstract}