
\begin{abstract}

We consider a model in which a trader aims to maximize expected risk-adjusted profit while trading a single security.  In our model, each price change is a linear combination of observed factors, impact resulting from the trader's current and prior activity, and unpredictable random effects.  The trader must learn coefficients of a price impact model while trading.  We propose a new method for simultaneous execution and learning -- the confidence-triggered regularized adaptive certainty equivalent (CTRACE) policy -- and establish a poly-logarithmic finite-time expected regret bound.  This bound implies that CTRACE is \emph{efficient} in the sense that the $(\epsilon,\delta)$-convergence time is bounded by a polynomial function of $1/\epsilon$ and $\log(1/\delta)$ with high probability.  In addition, we demonstrate via Monte Carlo simulation that CTRACE outperforms the certainty equivalent policy and a recently proposed reinforcement learning algorithm that is designed to explore efficiently in linear-quadratic control problems.

\end{abstract}
