
\begin{abstract}

	We describe a new approach for managing aleatoric uncertainty in reinforcement learning (\textsc{rl}). Instead of selecting actions according to a single statistic, we propose a distributional method based on the second-order stochastic dominance (\textsc{ssd}) relation. This relation compares the inherent dispersion of the random returns induced by actions, producing a comprehensive evaluation of the environment's uncertainty. The necessary conditions for \textsc{ssd} require estimators to predict accurate second moments. To accommodate this, we map the distributional \textsc{rl} problem to a Wasserstein gradient flow, treating the distributional Bellman residual as a potential energy functional. We propose a particle-based algorithm for which we prove optimality and convergence. Our experiments characterize the algorithm's performance and demonstrate that an \textsc{ssd} policy balances uncertainty and performance better than policies based on other risk measures.

\end{abstract}
