\begin{abstract}
Federated Learning (FL) is a distributed learning paradigm that scales on-device learning collaboratively and privately.
Standard FL algorithms such as \fedavg are primarily geared towards \emph{smooth unconstrained} settings.
In this paper, we study the \emph{Federated Composite Optimization} (FCO) problem, in which the loss function contains a non-smooth regularizer.
Such problems arise naturally in FL applications that involve sparsity, low-rank structure, monotonicity, or more general constraints.
We first show that straightforward extensions of primal algorithms such as \fedavg are not well suited for FCO, since they suffer from the ``curse of primal averaging,'' which results in poor convergence.
As a solution, we propose a new primal-dual algorithm, \emph{Federated Dual Averaging} (\feddualavg), which, by employing a novel server dual averaging procedure,
circumvents the curse of primal averaging.
Our theoretical analysis and experiments demonstrate that \feddualavg outperforms the baselines.
\end{abstract}
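The ``curse of primal averaging'' can be made concrete with a small worked example. This is a simplified two-client illustration under stated assumptions, not the analysis or the exact update rule of \feddualavg: we assume, for illustration only, an $\ell_1$ regularizer $\psi(w) = \lambda\|w\|_1$ and a squared Euclidean distance-generating function, so that mapping a dual state $z$ back to the primal space reduces to coordinate-wise soft-thresholding $\mathcal{S}_\tau(z) = \mathrm{sign}(z)\max(|z| - \tau, 0)$ with threshold $\tau$. The client dual states $z^1 = (3, 2)$ and $z^2 = (3, -1)$ and the threshold $\tau = 1$ are hypothetical numbers chosen for the example.
\begin{align*}
\text{primal averaging:}\quad \tfrac{1}{2}\bigl(\mathcal{S}_\tau(z^1) + \mathcal{S}_\tau(z^2)\bigr) &= \tfrac{1}{2}\bigl((2, 1) + (2, 0)\bigr) = (2, 0.5),\\
\text{dual averaging:}\quad \mathcal{S}_\tau\bigl(\tfrac{1}{2}(z^1 + z^2)\bigr) &= \mathcal{S}_\tau\bigl((3, 0.5)\bigr) = (2, 0).
\end{align*}
Averaging the thresholded client iterates makes the second coordinate nonzero even though only one client activated it, so the support of the average grows to the union of the client supports. Averaging the dual states first and thresholding afterwards keeps that coordinate at zero, which is the intuition behind performing the server aggregation on the dual side.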