Language models (LMs) are known to be prone to response biases, which manifest as option-preference biases in fixed-response questions. It is therefore imperative to develop low-cost, effective response bias correction methods to improve LM performance and enable more accurate evaluation of model abilities. Here, we propose a simple response bias correction strategy ($\texttt{RBCorr}$) and test it on 12 open-weight language models using yes-no, entailment, and multiple-choice questions. We show that response bias is prevalent in LMs before correction, and that $\texttt{RBCorr}$ effectively eliminates this bias and improves model performance. We also explore how bias behavior generalizes across models, datasets, and prompt formats, showing that LogProbs-based correction depends strongly on all three factors. Overall, $\texttt{RBCorr}$ is an easy-to-use method that can boost the performance of smaller LMs and ensure that performance on closed-response benchmarks aligns more closely with models' true capabilities.
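The abstract does not spell out $\texttt{RBCorr}$'s mechanism. As a purely illustrative sketch, and not necessarily the paper's method, the snippet below shows one common form of LogProbs-based option-bias correction in the spirit of contextual calibration: the model's standing preference for each answer option is estimated from a content-free prompt and subtracted from the option log-probabilities scored on the actual question. The function name `corrected_choice` and all numeric values here are hypothetical.

```python
import math

def corrected_choice(option_logprobs, prior_logprobs):
    """Pick an answer option after removing a measured option-preference prior.

    option_logprobs: log P(option | question) for each fixed option
        (e.g. "yes"/"no"), scored by the LM on the actual question.
    prior_logprobs: log P(option | content-free input) for the same options,
        estimated by scoring the LM on a neutral prompt such as "N/A".

    Subtracting the log-prior from each log-likelihood cancels the model's
    standing preference for particular option strings.
    """
    corrected = {opt: lp - prior_logprobs[opt] for opt, lp in option_logprobs.items()}
    return max(corrected, key=corrected.get)

# Toy example (hypothetical numbers): the raw scores favor "yes", but the
# content-free prior shows the model prefers "yes" regardless of content;
# after correction, "no" wins.
raw = {"yes": math.log(0.60), "no": math.log(0.40)}
prior = {"yes": math.log(0.75), "no": math.log(0.25)}
print(corrected_choice(raw, prior))  # -> "no"
```

Because this calibration only rescales the option scores, it leaves unbiased models essentially unchanged while correcting models with a strong option prior, which is consistent with the abstract's claim that correction mainly boosts smaller LMs.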