\begin{abstract}
In \ac{OLTR}, the aim is to find an optimal ranking model by interacting with users.
Because the system learns from user behavior, it must interact with users while simultaneously learning from those interactions. Unlike research in other \ac{LTR} settings, existing work in this field has been limited to linear models. This is due to the speed-quality tradeoff that arises when selecting a model: complex models are more expressive and can find the best rankings, but they need more user interactions to do so, a requirement that risks frustrating users during training. Conversely, simpler models can be optimized with fewer interactions and thus provide a better user experience, but they converge towards suboptimal rankings. This tradeoff creates a deadlock: a novel model cannot improve either the user experience or the final convergence point without sacrificing the other.

Our contribution is twofold. First, we introduce a fast \ac{OLTR} model called Sim-MGD that addresses the speed aspect of the speed-quality tradeoff. Sim-MGD ranks documents based on their similarities to a set of reference documents. It converges rapidly and hence gives a better user experience, but it does not converge towards the optimal rankings.
Second, we contribute \ac{C-MGD} for \acs{OLTR}, which directly addresses the speed-quality tradeoff by using a cascade that combines the best of both worlds: fast learning and high-quality final convergence. \ac{C-MGD} can provide the better user experience of Sim-MGD while reaching the same final convergence as the state-of-the-art \acs{MGD} model. This opens the door for future work to design new models for \acs{OLTR} without having to deal with the speed-quality tradeoff.
\end{abstract}