abstract:9ebc0fef26fb5391.tex

1: \begin{abstract}

2: One of the fundamental assumptions in stochastic control of continuous-time processes is that the dynamics of the underlying (diffusion) process are known.

3: In practice, however, this assumption is usually not fulfilled.

4: On the other hand, over the last decades, a rich theory for nonparametric estimation of the drift (and volatility) of continuous-time processes has been developed.

5: The aim of this paper is to bring together techniques from stochastic control and methods from statistics for stochastic processes in order to learn the dynamics of the underlying process and, at the same time, control it in a reasonable way.

6: More precisely, we study a long-term average impulse control problem, a stochastic version of the classical Faustmann timber harvesting problem.

7: One problem that immediately arises is an exploration-exploitation dilemma, as is well known from machine learning.

8: We propose a way to deal with this issue by combining exploration and exploitation periods in a suitable way.

9: Our main finding is that this construction can be based on the rates of convergence of estimators for the invariant density.

10: Using this, we show that the average cumulated regret is of uniform order $O(T^{-1/3})$.

11: \end{abstract}