\begin{abstract}
Multi-objective reinforcement learning (MORL) is an extension of ordinary,
single-objective reinforcement learning (RL)
that is applicable to many real-world tasks where multiple objectives exist without known relative costs.
% Existing MORL methods suffer from poor scaling with the number of objectives.
We study the problem of single-policy MORL, which learns an optimal policy given a preference over the objectives.
Existing methods require strong assumptions, such as exact knowledge of the multi-objective Markov decision process,
and are analyzed only in the limit of infinite data and time.
We propose a new algorithm called \emph{model-based envelop value iteration (EVI)},
which generalizes the enveloped multi-objective $Q$-learning algorithm of \citet{yang2019generalized}.
Our method learns a near-optimal value function with polynomial sample complexity and a linear convergence rate.
To the best of our knowledge, this is the first finite-sample analysis of MORL algorithms.
\end{abstract}
