abstract:ed2a062c2372ced7.tex

\begin{abstract}
\Ac{CLTR} relies on exposure-based \ac{IPS}, a \acs{LTR}-specific adaptation of \ac{IPS} to correct for position bias.
While \ac{IPS} can provide unbiased and consistent estimates, it often suffers from high variance.
Especially when little click data is available, this variance can cause \ac{CLTR} to learn sub-optimal ranking behavior.
Consequently, existing \ac{CLTR} methods carry significant risks, as naively deploying their models can result in very negative user experiences.

We introduce a novel risk-aware \ac{CLTR} method with theoretical guarantees for safe deployment.
We apply a novel exposure-based concept of risk regularization to \ac{IPS} estimation for \acs{LTR}.
Our risk regularization penalizes the mismatch between the ranking behavior of a learned model and that of a given safe model.
Thereby, it ensures that learned ranking models stay close to a trusted model when there is high uncertainty in \ac{IPS} estimation,
which greatly reduces the risks during deployment.
Our experimental results demonstrate that the proposed method is effective at avoiding initial periods of bad performance when little data is available, while also maintaining high performance at convergence.
For the \ac{CLTR} field, our novel exposure-based risk minimization method enables practitioners to adopt \acs{CLTR} methods in a safer manner that mitigates many of the risks attached to previous methods.
\end{abstract}