\begin{abstract}
Learning in multi-agent systems is highly challenging due to the inherent complexity introduced by agents' interactions. We tackle systems with a large population of interacting agents (e.g., swarms) via \emph{Mean-Field Control (MFC)}. MFC considers an asymptotically infinite population of identical agents that aim to collaboratively maximize the collective reward.
Specifically, we consider the case of \emph{unknown} system dynamics, where the goal is to simultaneously optimize the rewards and learn from experience. We propose an efficient \emph{model-based} reinforcement learning algorithm, \mfhucrl, that runs in episodes and \emph{provably} solves this problem.
\mfhucrl uses \emph{upper-confidence} bounds to balance exploration and exploitation during policy learning. Our main theoretical contributions are the first general regret bounds for model-based RL for MFC, obtained via a novel mean-field type analysis. \mfhucrl can be instantiated with different models, such as neural networks or Gaussian Processes, and effectively combined with neural-network policy learning. We empirically demonstrate the convergence of \mfhucrl on the swarm motion problem of controlling an infinite population of agents that seek to maximize a location-dependent reward while avoiding congested areas.
\looseness=-1
\end{abstract}