
\begin{abstract}
We study the exploratory Hamilton--Jacobi--Bellman (HJB) equation arising from the entropy-regularized exploratory control problem, which was formulated by Wang, Zariphopoulou and Zhou (J. Mach. Learn. Res., 21, 2020) in the context of reinforcement learning in continuous time and space.
We establish the well-posedness and regularity of the viscosity solution to the equation, as well as the convergence of the exploratory control problem to the classical stochastic control problem when the level of exploration decays to zero.
We then apply the general results to the exploratory temperature control problem, which was introduced by Gao, Xu and Zhou (arXiv:2005.04057, 2020) to design an endogenous temperature schedule for simulated annealing (SA) in the context of non-convex optimization.
We derive an explicit rate of convergence for this problem as exploration diminishes to zero, and find that the steady state of the optimally controlled process exists but is neither a Dirac mass on the global optimum nor a Gibbs measure.
\end{abstract}
