abstract:296908c6a4863c47.tex

1: \begin{abstract}

2: Large language models (LLMs) exhibit positional bias in how they use context, which especially affects listwise ranking.

3: To address this, we propose \textit{permutation self-consistency}, a form of self-consistency over the ranking list outputs of black-box LLMs.

4: Our key idea is to marginalize out different list orders in the prompt to produce an order-independent ranking with less positional bias.

5: First, given an input prompt, we repeatedly shuffle the list in the prompt and pass each shuffled prompt through the LLM while holding the instructions fixed.

6: Next, we aggregate the resulting sample of rankings by computing the central ranking closest in distance to all of them, marginalizing out prompt order biases in the process.

7: Theoretically, we prove the robustness of our method, showing convergence to the true ranking under random perturbations.

8: Empirically, on five datasets in sorting and passage reranking, our approach improves scores from conventional inference by up to 34--52\% for Mistral, 7--18\% for GPT-3.5, and 8--16\% for LLaMA v2 (70B).

9: Our code is at \url{https://github.com/castorini/perm-sc}.

10: \end{abstract}
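
% Illustrative sketch of the aggregation step described in the abstract.
% The notation ($m$, $\sigma_i$, $d$) and the choice of the Kendall tau distance
% are assumptions made for illustration; the abstract does not specify them.
As a minimal sketch of the aggregation step, suppose the $m$ shuffled prompts yield rankings $\sigma_1, \dots, \sigma_m$ over the same $n$ items, each mapped back to the original item identities, and let $d$ be a distance between rankings. The central ranking can then be written as
\[
  \hat{\sigma} \;=\; \arg\min_{\sigma} \sum_{i=1}^{m} d(\sigma, \sigma_i).
\]
One plausible choice for $d$, assumed here rather than taken from the abstract, is the Kendall tau distance, which counts the item pairs that two rankings order differently:
\[
  d(\sigma, \sigma') \;=\; \bigl|\{(a, b) : a < b,\ (\sigma(a) - \sigma(b))(\sigma'(a) - \sigma'(b)) < 0\}\bigr|,
\]
where $\sigma(a)$ denotes the position of item $a$ in ranking $\sigma$. For example, with $n = 3$, the rankings that place items in the orders $(1, 2, 3)$ and $(1, 3, 2)$ disagree only on the pair $(2, 3)$, so their distance is $1$.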

11: