\begin{abstract}
In this work, we generalize and unify two recent and quite different works, those of Sohl-Dickstein et al.~\cite{sohl2014fast} and Lee et al.~\cite{lee2012proximal}, by proposing the \textbf{prox}imal s\textbf{to}chastic \textbf{N}ewton-type gradient (PROXTONE) method for optimizing the sum of two convex functions: one is the average of a very large number of smooth convex functions, and the other is a non-smooth convex function. While a set of recently proposed proximal stochastic gradient methods, including MISO, Prox-SDCA, Prox-SVRG, and SAG, converge at linear rates, PROXTONE incorporates second-order information to obtain a stronger convergence result: it achieves a linear convergence rate not only in the value of the objective function but also in the \emph{solution}. The proof is simple and intuitive, and the results and techniques can serve as a starting point for research on proximal stochastic methods that employ second-order information.
%We establish the $q$-superlinear convergence rate for PROXTONE.
%Such problems often arise in machine learning, known as regularized empirical risk minimization.
%This paper will focus on the convergence analysis, and applications will be put in another paper.
%We first establish $O(1/\sqrt{T})$ regret bounds for batch Douglas-Rachford splitting method.
\end{abstract}