Multiple choice questions (MCQs) are a common and important task format in research on large language models (LLMs). This work shows that LLMs are vulnerable to changes in option position in MCQs because of an inherent "selection bias": they prefer to select specific option IDs as answers (such as "Option A"). Through extensive empirical analyses of 20 LLMs on three benchmarks, we find that this behavioral bias stems primarily from the LLMs' token bias, whereby the model a priori assigns more probability mass to specific option ID tokens (e.g., A/B/C/D) when predicting answers from the option IDs. To mitigate selection bias, we propose PriDe, a label-free, inference-time debiasing method that separates the model's prior bias for option IDs from the overall prediction distribution. PriDe first estimates the prior by permuting option contents on a small number of test samples, and then applies the estimated prior to debias the remaining samples. We demonstrate that PriDe achieves better debiasing effectiveness and computational efficiency than strong baselines. Furthermore, the prior estimated by PriDe is interpretable and generalizes well across domains, which highlights its practical potential in broader scenarios.
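The two-stage procedure described above (estimate a prior over option IDs from permuted samples, then divide it out of later predictions) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the sketch makes several assumptions: the observed distribution is taken to factor as prior times content preference, the permutations are cyclic shifts, and all probabilities are strictly positive. The function names `estimate_prior`, `debias`, and `simulate_observed` are hypothetical and used only for illustration. The toy model at the end stands in for real LLM outputs.

```python
import numpy as np


def estimate_prior(perm_probs):
    """Estimate a prior over option IDs (illustrative sketch, not the paper's code).

    perm_probs has shape (n_samples, n_perms, n_options): for each estimation
    sample and each permutation of its option contents, the observed
    probability of every option ID. Probabilities are assumed to be > 0.
    """
    perm_probs = np.asarray(perm_probs, dtype=float)
    # Averaging log-probabilities over permutations cancels the content term
    # (under the assumed factorization), leaving the ID prior up to a constant.
    mean_log = np.log(perm_probs).mean(axis=1)
    mean_log -= mean_log.max(axis=1, keepdims=True)  # numerical stability
    per_sample = np.exp(mean_log)
    per_sample /= per_sample.sum(axis=1, keepdims=True)
    # Average the per-sample priors into one global prior.
    prior = per_sample.mean(axis=0)
    return prior / prior.sum()


def debias(observed, prior):
    """Divide the estimated prior out of an observed distribution and renormalize."""
    scores = np.asarray(observed, dtype=float) / np.asarray(prior, dtype=float)
    return scores / scores.sum(axis=-1, keepdims=True)


def simulate_observed(prior, content, shift):
    """Toy stand-in for an LLM: option ID i holds content (i + shift) % n."""
    n = len(prior)
    idx = (np.arange(n) + shift) % n
    p = prior * content[idx]
    return p / p.sum()


# Synthetic check: a model biased toward the first option ID.
true_prior = np.array([0.4, 0.3, 0.2, 0.1])
content = np.array([0.1, 0.6, 0.2, 0.1])  # content-level preference
perm_probs = np.array(
    [[simulate_observed(true_prior, content, k) for k in range(4)]]
)
prior_hat = estimate_prior(perm_probs)
debiased = debias(simulate_observed(true_prior, content, 0), prior_hat)
```

In this toy setting the cyclic shifts place every content under every option ID exactly once, so the content term averages to a constant and the estimate recovers the simulated prior. With a real model, the factorization holds only approximately and the estimate would be correspondingly noisy.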