\begin{abstract}
Interest in stochastic zeroth-order (SZO) methods has recently been revived in black-box optimization scenarios such as adversarial black-box attacks on deep neural networks. SZO methods only require the ability to evaluate the objective function at random input points; their weakness, however, is that their convergence speed depends on the dimensionality of the function to be evaluated. We present a sparse SZO optimization method that reduces this factor to the expected dimensionality of the random perturbation during learning. We give a proof that justifies this reduction for sparse SZO optimization of non-convex functions. Furthermore, we present experimental results for neural networks on MNIST and CIFAR that show empirical sparsity of true gradients and, for sparse SZO compared to dense SZO, faster convergence in training loss and test accuracy as well as a smaller distance between the gradient approximation and the true gradient. %We find similar performance for inducing sparsity by magnitude masking or random masking, but find improved convergence for a variant that applies weight freezing instead of pruning to non-perturbed weights.

\keywords{Nonconvex Optimization \and Gradient-free Optimization \and Zeroth-order Optimization.}
\end{abstract}