3a38d68b0297438a.tex
1: \begin{abstract}
2: Multi-objective reinforcement learning (MORL) is an extension of ordinary,
3: single-objective reinforcement learning (RL)
4: that is applicable to many real-world tasks where multiple objectives exist without known relative costs.
5: % Existing MORL methods scale poorly with the number of objectives.
6: We study single-policy MORL, in which the goal is to learn an optimal policy for a given preference over the objectives.
7: Existing methods require strong assumptions such as exact knowledge of the multi-objective Markov decision process,
8: and are analyzed in the limit of infinite data and time.
9: We propose a new algorithm called \emph{model-based envelope value iteration (EVI)},
10: which generalizes the envelope multi-objective $Q$-learning algorithm of \citet{yang2019generalized}.
11: Our method learns a near-optimal value function with polynomial sample complexity and a linear convergence rate.
12: To the best of our knowledge, this is the first finite-sample analysis of MORL algorithms. 
13: \end{abstract}
14: