We study inference for the ranking of large language models (LLMs). Alignment is a central challenge in mitigating hallucinations when deploying LLMs, and ranking LLMs has proven an effective tool for improving alignment via the best-of-$N$ policy. In this paper, we propose a new inferential framework for testing hypotheses about, and constructing confidence intervals for, the ranking of language models. We adopt the widely used Bradley-Terry-Luce (BTL) model, in which each item is assigned a positive preference score that determines the outcomes of its pairwise comparisons, and we further extend it to the contextual setting, where each model's score varies with the prompt. We establish the convergence rate of our estimator. By extending the existing Gaussian multiplier bootstrap theory to accommodate the supremum of non-identically distributed empirical processes, we construct confidence intervals for the ranking and propose a valid testing procedure. We also introduce the confidence diagram as a characterization of the global ranking. We conduct numerical experiments to assess the performance of our method.
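As a concrete illustration of the comparison model (the notation below is ours, not taken from the abstract), the BTL model assigns each item $i$ a positive preference score $w_i = e^{\theta_i}$, and the outcome of a pairwise comparison follows

```latex
% BTL pairwise comparison probability: item $i$ is preferred over item $j$
\mathbb{P}(i \succ j) \;=\; \frac{w_i}{w_i + w_j} \;=\; \frac{e^{\theta_i}}{e^{\theta_i} + e^{\theta_j}}.
% Contextual extension: the score varies with the prompt $x$,
% so the comparison probability becomes prompt-dependent:
\mathbb{P}(i \succ j \mid x) \;=\; \frac{e^{\theta_i(x)}}{e^{\theta_i(x)} + e^{\theta_j(x)}}.
```

The ranking of the models is then determined by ordering the scores $w_1, \dots, w_n$ (or, in the contextual setting, the functions $\theta_i(x)$), and inference on the ranking reduces to inference on these scores.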