b455ca134dcb71c5.tex
1: \begin{abstract}
2: While classical forms of stochastic gradient descent algorithm treat the different coordinates in the same way, a framework allowing for adaptive (\textit{non uniform}) coordinate sampling is developed to leverage structure in data. In a non-convex setting and including zeroth order gradient estimate, almost sure convergence as well as non-asymptotic bounds are established. Within the proposed framework, we develop an algorithm, MUSKETEER, based on a reinforcement strategy: after collecting information on the noisy gradients, it samples the most promising coordinate (\textit{all for one}); then it moves along the one direction yielding an important decrease of the objective (\textit{one for all}). Numerical experiments on both synthetic and real data examples confirm the effectiveness of MUSKETEER in large scale problems.
3: \end{abstract}
4: