
\begin{abstract}

    To mitigate the limitation that the classical reinforcement learning (RL) framework heavily relies on identical training and test environments, Distributionally Robust RL (DRRL) has been proposed to enhance performance across a range of environments, possibly including unknown test environments.

    As the price of this robustness, DRRL involves optimizing over a set of distributions, which is inherently more challenging than optimizing over a fixed distribution in the non-robust case.

    Existing DRRL algorithms are either model-based or fail to learn from a single sample trajectory.

    In this paper, we design the first fully model-free DRRL algorithm, called \emph{distributionally robust Q-learning with single trajectory (DRQ)}.

    We carefully design a multi-timescale framework that fully utilizes each incrementally arriving sample and directly learns the optimal distributionally robust policy without modeling the environment, so that the algorithm can be trained along a single trajectory in a model-free fashion.

    Despite the algorithm's complexity, we provide asymptotic convergence guarantees by generalizing classical stochastic approximation tools.

    Comprehensive experimental results demonstrate that our proposed algorithm achieves greater robustness and lower sample complexity than non-robust methods and other robust RL algorithms.

\end{abstract}
