\begin{abstract}
    We consider the problem of
    continuous-time policy evaluation.
    The goal is to learn, from observations,
    the value function associated
    with uncontrolled
    continuous-time stochastic dynamics and a reward function.
    We propose two original variants of the
    well-known TD(0) method
    using vanishing time steps.
    One is model-free and
    the other is model-based.
    For both methods,
    we prove theoretical convergence rates
    that we subsequently verify through numerical simulations.
    Alternatively,
    these methods can be interpreted
    as novel reinforcement learning approaches
    for approximating solutions of
    linear PDEs (partial differential equations)
    or linear BSDEs (backward stochastic differential equations).
\end{abstract}
