\begin{abstract}
We consider the problem of
continuous-time policy evaluation.
This consists in learning, from observations,
the value function associated
with uncontrolled
continuous-time stochastic dynamics and a reward function.
We propose two original variants of the
well-known TD(0) method
using vanishing time steps.
One is model-free and
the other is model-based.
For both methods,
we prove theoretical convergence rates
that we subsequently verify through numerical simulations.
Alternatively,
these methods can be interpreted
as novel reinforcement learning approaches
for approximating solutions of
linear PDEs (partial differential equations)
or linear BSDEs (backward stochastic differential equations).
\end{abstract}