\begin{abstract}
    We consider the problem of
    continuous-time policy evaluation.
    This consists of learning, from observations,
    the value function associated
    with uncontrolled
    continuous-time stochastic dynamics and a reward function.
    We propose two original variants of the
    well-known TD(0) method
    using vanishing time steps.
    One is model-free and
    the other is model-based.
    For both methods,
    we prove theoretical convergence rates
    that we subsequently verify through numerical simulations.
    Alternatively,
    these methods can be interpreted
    as novel reinforcement learning approaches
    for approximating solutions of
    linear PDEs (partial differential equations)
    or linear BSDEs (backward stochastic differential equations).
\end{abstract}
