1: \begin{abstract}
2: In this paper, we propose a first-order distributed optimization algorithm that is provably robust to Byzantine failures, that is, arbitrary and potentially adversarial behavior, in a setting where every participating agent is prone to failure. We model each agent's state over time as a two-state Markov chain that indicates Byzantine or trustworthy behavior at each time instant. We place no restriction on the maximum number of Byzantine agents at any given time. Our method rests on three layers of defense: 1) temporal gradient averaging, 2) robust aggregation, and 3) gradient normalization.
3: % We first employ a robust mean estimator on each agent's past gradient data over a finite window to compute a robustified gradient. Next, we estimate the aggregate gradient by utilizing the robust mean estimator among all the agents' robustified gradients. Lastly, we normalize the aggregate gradient so that we only use the directional information to prevent large updates in case corrupt gradients get past the first two layers of defense. 
4: We study two settings for stochastic optimization, namely Sample Average Approximation and Stochastic Approximation, and prove that for strongly convex and smooth non-convex cost functions, our algorithm achieves order-optimal statistical error and convergence rates. 
5: \end{abstract}
6: