\begin{abstract}
We analyse the regret arising from learning the price sensitivity parameter $\kappa$ of liquidity takers in the ergodic version of the Avellaneda--Stoikov market making model.
We show that a learning algorithm based on a regularised maximum-likelihood estimator for $\kappa$ achieves an expected regret upper bound of order $\ln^2 T$.
To obtain this result, we need two key ingredients.
The first consists of tight upper bounds on the derivative of the ergodic constant in the Hamilton--Jacobi--Bellman (HJB) equation with respect to $\kappa$.
The second is the learning rate of the maximum-likelihood estimator, which is obtained from concentration inequalities for Bernoulli signals.
A numerical experiment confirms the convergence and robustness of the proposed algorithm.
\end{abstract}