1: \begin{abstract}
2: Heterogeneity across clients in federated learning (FL) often hinders both optimization convergence and generalization performance when clients' knowledge is aggregated in the gradient space.
3: For example, clients may differ in terms of data distribution, network latency, input/output space, and/or model architecture, which can easily lead to the misalignment of their local gradients.
4: To improve tolerance to heterogeneity, we propose a novel federated prototype learning (\algfont{FedProto}) framework in which the clients and server exchange abstract class prototypes instead of gradients.
5: \algfont{FedProto} aggregates the local prototypes collected from different clients, and then sends the global prototypes back to all clients to regularize the training of local models.
6: The training on each client aims to minimize the classification error on the local data while keeping the resulting local prototypes sufficiently close to the corresponding global ones.
7: Moreover, we provide a theoretical analysis of the convergence rate of \algfont{FedProto} under non-convex objectives.
8: In experiments, we propose a benchmark setting tailored to heterogeneous FL, in which \algfont{FedProto} outperforms several recent FL approaches on multiple datasets.
9: \end{abstract}
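
% Illustrative sketch only: the notation below is assumed for exposition and is not fixed by the abstract.
The procedure summarized in the abstract can be sketched as follows, under assumed notation.
Let $f_i$ denote the embedding part of the model on client $i$ with parameters $\omega_i$, let $D_{i,j}$ be the local examples of class $j$ on client $i$, and let $\mathcal{N}_j$ be the set of clients that hold class $j$.
One natural choice is to take a local prototype as the mean embedding of a class and a global prototype as an average of the local ones (a uniform average is assumed here; a sample-size-weighted average is an equally plausible choice):
\[
  C_i^{(j)} = \frac{1}{|D_{i,j}|} \sum_{(x,y) \in D_{i,j}} f_i(x),
  \qquad
  \bar{C}^{(j)} = \frac{1}{|\mathcal{N}_j|} \sum_{i \in \mathcal{N}_j} C_i^{(j)}.
\]
Local training on client $i$ would then combine a classification loss $\mathcal{L}_S$ on the local data $D_i$ with a penalty that keeps local prototypes close to the global ones, where the distance $d(\cdot,\cdot)$ and the weight $\lambda \ge 0$ are hypothetical placeholders:
\[
  \min_{\omega_i} \; \mathcal{L}_S(\omega_i; D_i) + \lambda \sum_{j} d\bigl(C_i^{(j)}, \bar{C}^{(j)}\bigr).
\]
Setting $\lambda = 0$ recovers purely local training, while larger $\lambda$ pulls the clients' class representations toward a shared set of prototypes without requiring that their gradients or architectures be aligned.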
10: