\begin{abstract}
Recent work has proposed stochastic Plackett-Luce (PL) ranking models as a robust choice for optimizing relevance and fairness metrics.
Unlike their deterministic counterparts, which require heuristic optimization algorithms, PL models are fully differentiable.
In theory, they can be used to optimize ranking metrics via stochastic gradient descent.
In practice, however, computing the gradient exactly is infeasible because it requires iterating over all possible permutations of items.
Consequently, actual applications rely on approximating the gradient via sampling techniques.

In this paper, we introduce a novel algorithm, PL-Rank, that estimates the gradient of a PL ranking model w.r.t.\ both relevance and fairness metrics.
Unlike existing approaches that are based on policy gradients, PL-Rank makes use of the specific structure of PL models and ranking metrics.
Our experimental analysis shows that PL-Rank is more sample-efficient and computationally less costly than existing policy gradient methods, resulting in faster convergence at higher performance.
PL-Rank further enables the industry to apply PL models for more relevant and fairer real-world ranking systems.
\end{abstract}
