8785e1f63421368a.tex
1: \begin{abstract}
2: We study the exploratory Hamilton--Jacobi--Bellman (HJB) equation arising from the entropy-regularized exploratory control problem, which was formulated by Wang, Zariphopoulou and Zhou (J. Mach. Learn. Res., 21, 2020) in the context of reinforcement learning in continuous time and space.
3: We establish the well-posedness and regularity of the viscosity solution to 
4: the equation, as well as the convergence of the exploratory control problem to the classical stochastic control problem when the level of exploration decays to zero.
5: We then apply the general results to the exploratory temperature control problem, which was introduced by Gao, Xu and Zhou (arXiv:2005.04057, 2020) to design an endogenous temperature schedule for simulated annealing (SA) in the context of non-convex optimization.
6: We derive an explicit rate of convergence for this problem as the level of exploration decays to zero, and find that the optimally controlled process admits a steady state, which, however, is
7: neither a Dirac mass on the global optimum nor a Gibbs measure.
8: \end{abstract}
9: