1: \begin{abstract}
2: %Methods in \ac{OLTR} have so far only allowed for the optimization of linear models. We introduce a novel method with a differentiable loss, that allows for any differentiable model to be optimized.
3: 
4: \acf{OLTR} methods optimize rankers based on user interactions.
5: % efficiently and reliably.
6: State-of-the-art \acs{OLTR} methods are built specifically for linear models. 
7: These methods do not extend well to non-linear models such as neural networks.
8: We introduce an entirely novel approach to \acs{OLTR} that constructs a weighted differentiable pairwise loss after each interaction: \acf{\OurMethod}.
9: \ac{\OurMethod} breaks away from the traditional approach that relies on interleaving or multileaving and extensive sampling of models to estimate gradients.
10: Instead, its gradient is based on inferring preferences between document pairs from user clicks and can optimize any differentiable model.
11: We prove that the gradient of \acs{\OurMethod} is unbiased w.r.t.\ user preferences between document pairs.
12: Our experiments on the largest publicly available \ac{LTR} datasets show considerable and significant improvements under all levels of interaction noise.
13: \acs{\OurMethod} outperforms existing \ac{OLTR} methods in terms of both learning speed and final convergence.
14: Furthermore, unlike previous \ac{OLTR} methods, \ac{\OurMethod} also allows for non-linear models to be optimized effectively.
15: Our results show that using a neural network leads to even better performance at convergence than a linear model.
16: In summary, \acs{\OurMethod} is an efficient and unbiased \ac{OLTR} approach that provides a better user experience than previously possible.
17: \end{abstract}
18: