\begin{abstract}
Large language models (LLMs) exhibit positional bias in how they use context, which especially affects listwise ranking.
To address this, we propose \textit{permutation self-consistency}, a form of self-consistency over the ranking list outputs of black-box LLMs.
Our key idea is to marginalize out different list orders in the prompt to produce an order-independent ranking with less positional bias.
First, given some input prompt, we repeatedly shuffle the list in the prompt and pass it through the LLM while holding the instructions fixed.
Next, we aggregate the resulting sample of rankings by computing the central ranking closest in distance to all of them, marginalizing out prompt order biases in the process.
Theoretically, we prove the robustness of our method, showing convergence to the true ranking under random perturbations.
Empirically, on five datasets in sorting and passage reranking, our approach improves scores from conventional inference by up to 34--52\% for Mistral, 7--18\% for GPT-3.5, and 8--16\% for LLaMA v2 (70B).
Our code is at \url{https://github.com/castorini/perm-sc}.
\end{abstract}
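
The aggregation step described in the abstract can be sketched formally as follows.
This is an illustrative formalization under stated assumptions, not necessarily the exact definition used in the paper: the symbols $n$, $m$, $\hat{\sigma}_k$, $d$, and $\bar{\sigma}$ are introduced here, and we assume the distance is the Kendall tau distance, which the abstract does not name.
Let $n$ items be ranked, and let $\hat{\sigma}_1, \dots, \hat{\sigma}_m$ be the rankings the LLM returns for $m$ shuffled prompts, where $\hat{\sigma}_k(i)$ is the position assigned to item $i$.
The Kendall tau distance counts the item pairs on which two rankings disagree:
\[
  d(\sigma, \tau) \;=\; \sum_{1 \le i < j \le n} \mathbf{1}\bigl[\, (\sigma(i) - \sigma(j))\,(\tau(i) - \tau(j)) < 0 \,\bigr],
\]
and the central ranking is the one minimizing the total distance to the sample:
\[
  \bar{\sigma} \;=\; \mathop{\mathrm{arg\,min}}_{\sigma} \; \sum_{k=1}^{m} d(\hat{\sigma}_k, \sigma).
\]
As a small worked example, suppose three shuffled prompts over items $A$, $B$, $C$ yield the rankings $ABC$, $ACB$, and $BAC$.
The candidate $ABC$ has total distance $0 + 1 + 1 = 2$, while $ACB$ and $BAC$ each have total distance $3$, and the remaining three permutations have totals of $6$, $6$, and $7$.
Under the assumed distance, the central ranking is therefore $ABC$.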