
\begin{abstract}

\Ac{FL} is a machine learning approach where nodes collaboratively train a global model.

As more nodes participate in a round of \ac{FL}, however, the effectiveness of each node's individual model update diminishes.

In this study, we increase the effectiveness of client updates by dividing the network into smaller partitions, or \emph{cohorts}.

We introduce Cohort-Parallel Federated Learning (\sys): a novel learning approach where each cohort independently trains a global model using \ac{FL} until convergence, and the models produced by the cohorts are then unified using one-shot \ac{KD} and a cross-domain, unlabeled dataset.

The insight behind \sys is that smaller, isolated networks converge more quickly than a single network in which all nodes participate.

Through extensive experiments involving realistic traces and non-IID data distributions on the \cifar and \femnist image classification tasks, we investigate the trade-offs between the number of cohorts, model accuracy, training time, and compute and communication resources.

Compared to traditional \ac{FL}, \sys with four cohorts, a non-IID data distribution, and \cifar yields a 1.9$\times$ reduction in training time and a 1.3$\times$ reduction in resource usage, with a minimal drop in test accuracy.

\end{abstract}
