
\begin{abstract}


We study max-plus-linear approximators of the Q-function for offline reinforcement learning in discounted Markov decision processes.

In particular, we use these approximators to construct novel fitted Q-iteration (FQI) algorithms with provable convergence.

Exploiting the compatibility of the Bellman operator with max-plus operations, we show that the max-plus-linear regression within each iteration of the proposed FQI algorithm reduces to simple max-plus matrix-vector multiplications.

We also consider a variational implementation of the proposed algorithm, which leads to a per-iteration complexity that is independent of the number of samples.


\end{abstract}
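
As a brief illustration of the max-plus matrix-vector multiplication mentioned in the abstract, the following sketch uses standard max-plus notation rather than the paper's own formulation; the symbols $A$, $x$, $m$, $n$, and $\otimes$ are assumptions introduced here only for illustration. For an $m \times n$ real matrix $A$ and a real vector $x$ of length $n$, the max-plus product replaces the sums and products of ordinary linear algebra by maxima and sums:
\[
  (A \otimes x)_i = \max_{1 \le j \le n} \bigl( A_{ij} + x_j \bigr), \qquad i = 1, \dots, m.
\]
Evaluated directly, this product takes $O(mn)$ additions and comparisons, the same order as an ordinary matrix-vector product.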
