Large language models (LLMs) exhibit positional bias in how they use context, which especially complicates listwise ranking. To address this, we propose permutation self-consistency, a form of self-consistency over ranking list outputs of black-box LLMs. Our key idea is to marginalize out different list orders in the prompt to produce an order-independent ranking with less positional bias. First, given some input prompt, we repeatedly shuffle the list in the prompt and pass it through the LLM while holding the instructions the same. Next, we aggregate the resulting sample of rankings by computing the central ranking closest in distance to all of them, marginalizing out prompt order biases in the process. Theoretically, we prove the robustness of our method, showing convergence to the true ranking in the presence of random perturbations. Empirically, on five list-ranking datasets in sorting and passage reranking, our approach improves scores from conventional inference by up to 7-18% for GPT-3.5 and 8-16% for LLaMA v2 (70B), surpassing the previous state of the art in passage reranking. Our code is at https://github.com/castorini/perm-sc.
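The shuffle-then-aggregate procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that) and makes several assumptions: the black-box LLM is replaced by a hypothetical `rank_fn` callable, distance between rankings is taken to be the Kendall tau distance, and the central ranking is found by brute force over all permutations, which is only feasible for very short lists. The names `kendall_tau_distance`, `central_ranking`, `permutation_self_consistency`, and `biased_sort` are illustrative.

```python
import itertools
import random
from typing import Callable, List, Sequence


def kendall_tau_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the item pairs that rankings a and b place in opposite order."""
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(
        1
        for i, j in itertools.combinations(range(len(a)), 2)
        if pos_b[a[i]] > pos_b[a[j]]
    )


def central_ranking(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Return the ranking with the smallest total distance to all samples.

    Assumption: exhaustive search over permutations, so the cost is
    factorial in the list length. Ties go to the first candidate found.
    """
    candidates = itertools.permutations(sorted(rankings[0]))
    best = min(
        candidates,
        key=lambda c: sum(kendall_tau_distance(c, r) for r in rankings),
    )
    return list(best)


def permutation_self_consistency(
    items: Sequence[str],
    rank_fn: Callable[[List[str]], List[str]],
    num_shuffles: int = 20,
    seed: int = 0,
) -> List[str]:
    """Shuffle the input list, rank each shuffle, and aggregate the outputs.

    rank_fn is a hypothetical stand-in for one LLM call that receives the
    list in prompt order and returns it in ranked order.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(num_shuffles):
        shuffled = list(items)
        rng.shuffle(shuffled)  # only the list order changes between calls
        samples.append(rank_fn(shuffled))
    return central_ranking(samples)


def biased_sort(prompt_items: List[str]) -> List[str]:
    """Toy ranker with positional bias: it always ranks whichever item
    appears first in the prompt at the top, then sorts the rest."""
    return [prompt_items[0]] + sorted(prompt_items[1:])


if __name__ == "__main__":
    items = ["d", "b", "a", "c"]
    print("single biased call:", biased_sort(items))
    print("aggregated ranking:", permutation_self_consistency(items, biased_sort))
```

In this toy setup, a single call always promotes the first prompt item, whereas aggregating over shuffles lets the pairwise majority outweigh that bias. For realistic list lengths, the exhaustive search would need to be replaced by an exact or approximate rank aggregation solver.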