1: \begin{abstract}
2: This study investigates clustered federated learning (FL), one of the formulations of FL with non-i.i.d. data, where the devices are partitioned into clusters and each cluster optimally fits its data with a localized model. We propose a novel clustered FL framework, which applies a nonconvex penalty to pairwise differences of parameters. This framework can automatically identify clusters without \emph{a priori} knowledge of the number of clusters and the set of devices in each cluster. To implement the proposed framework, we develop a novel clustered FL method called {FPFC}. Advancing from the standard ADMM, our method is implemented in parallel, updates only a subset of devices at each communication round, and allows each participating device to perform a variable amount of work. This greatly reduces the communication cost while simultaneously preserving privacy, making it practical for FL. We also propose a new warmup strategy for hyperparameter tuning under FL settings and consider the asynchronous variant of {FPFC} ({asyncFPFC}). Theoretically, we provide convergence guarantees of {FPFC} for general nonconvex losses and establish the statistical convergence rate under a linear model with squared loss. Our extensive experiments demonstrate the advantages of {FPFC} over existing methods.
3: \end{abstract}
4: