\begin{abstract}
\noindent In reinforcement learning (RL), the consideration of multivariate reward signals has led to fundamental advancements in multi-objective decision-making, transfer learning, and representation learning.
This work introduces the first oracle-free and computationally tractable algorithms for provably convergent multivariate \emph{distributional} dynamic programming and temporal difference learning.
Our convergence rates match the familiar rates in the scalar reward setting and additionally provide new insights into the fidelity of approximate return distribution representations as a function of the reward dimension.
Surprisingly, when the reward dimension is larger than $1$, we show that the standard analysis of categorical TD learning fails, a failure that we resolve with a novel projection onto the space of mass-$1$ signed measures.
Finally, with the aid of our technical results and simulations, we identify tradeoffs between distribution representations that influence the performance of multivariate distributional RL in practice.
\end{abstract}
