\begin{abstract}
One of the fundamental assumptions in stochastic control of continuous-time processes is that the dynamics of the underlying (diffusion) process are known.
This assumption is, however, rarely fulfilled in practice.
On the other hand, over the last decades, a rich theory for nonparametric estimation of the drift (and volatility) of continuous-time processes has been developed.
The aim of this paper is to bring together techniques from stochastic control and methods from statistics for stochastic processes in order to learn the dynamics of the underlying process and, at the same time, control it in a reasonable way.
More precisely, we study a long-term average impulse control problem, which is a stochastic version of the classical Faustmann timber harvesting problem.
One problem that immediately arises is an exploration-exploitation dilemma, as is well known from problems in machine learning.
We propose a way to deal with this issue by combining exploration and exploitation periods in a suitable way.
Our main finding is that this construction can be based on the rates of convergence of estimators for the invariant density.
Using this, we obtain that the average cumulated regret is of uniform order $O(T^{-1/3})$.
\end{abstract}