\begin{abstract}
We prove the convergence of the law of grid-valued random walks, which can be seen as time-space Markov chains, to the law of a general diffusion process.
This includes processes with sticky features, reflecting or absorbing boundaries, and skew behavior.
We prove that, for any $p$-Wasserstein distance, the convergence occurs at any rate strictly less than $(1/4) \wedge (1/p)$ in terms of the maximum cell size of the grid.
We also show that any rate strictly less than $(1/2) \wedge (2/p)$ can be achieved if the grid is adapted to the speed measure of the diffusion, which is optimal for $p \le 4$.
This result allows us to set up asymptotically optimal approximation schemes for general diffusion processes.
Finally, we present numerical experiments on diffusions that exhibit various features.
\end{abstract}
