1: \begin{abstract}
2: Federated learning heavily relies on distributed gradient descent techniques. In the situation where gradient information is not available, the gradients need to be estimated from zeroth-order information, which typically involves computing finite-differences along isotropic random directions.
3: This method suffers from high estimation errors, as the geometric features of the objective landscape may be overlooked during the isotropic sampling.
4: In this work, we propose a non-isotropic sampling method to improve the gradient estimation procedure.
5: Gradients in our method are estimated in a subspace spanned by historical trajectories of solutions, aiming to encourage the exploration of promising regions and hence improve the convergence.
6: The proposed method uses a covariance matrix for sampling which is a convex combination of two parts. The first part is a thin projection matrix containing the basis of the subspace which is designed to improve the exploitation ability. The second part is the historical trajectories.
7: We implement this method in zeroth-order federated settings, and show that the convergence rate aligns with existing ones while introducing no significant overheads in communication or local computation.
8: The effectiveness of our proposal is verified on several numerical experiments in comparison to several commonly-used zeroth-order federated optimization algorithms.
9: \end{abstract}
10: