Large language models (LLMs) often show unwarranted preference for certain choice options when responding to multiple-choice questions, posing significant reliability concerns in LLM-automated systems. To mitigate this selection bias problem, previous solutions utilized debiasing methods to adjust the model's input and/or output. Our work, in contrast, investigates the model's internal representation of the selection bias. Specifically, we introduce a novel debiasing approach, Bias Node Pruning (BNP), which eliminates the linear layer parameters that contribute to the bias. Furthermore, we present Auxiliary Option Injection (AOI), a simple yet effective input modification technique for debiasing, which is compatible even with black-box LLMs. To provide a more systematic evaluation of selection bias, we review existing metrics and introduce Choice Kullback-Leibler Divergence (CKLD), which addresses the insensitivity of the commonly used metrics to label imbalance. Experiments show that our methods are robust and adaptable across various datasets when applied to three LLMs.
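The CKLD metric described above compares the model's choice distribution against the label distribution, so it stays sensitive when the answer labels are imbalanced. The exact formulation belongs to the paper; the following is only a minimal sketch under the assumption that CKLD is the KL divergence from the predicted-choice frequencies to the ground-truth label frequencies, with a small smoothing constant (`eps`) added here to avoid division by zero:

```python
import math
from collections import Counter

def ckld(labels, predictions, options=("A", "B", "C", "D"), eps=1e-9):
    """Sketch of a Choice Kullback-Leibler Divergence (CKLD) style metric.

    Computes KL(P || Q) where P is the empirical distribution of
    ground-truth labels over the answer options and Q is the model's
    predicted-choice distribution. This formulation is an assumption
    for illustration, not the paper's definitive definition.
    """
    label_counts = Counter(labels)
    pred_counts = Counter(predictions)
    n_l, n_p = len(labels), len(predictions)
    k = len(options)
    # Smoothed empirical probabilities for each option
    p = {o: (label_counts[o] + eps) / (n_l + eps * k) for o in options}
    q = {o: (pred_counts[o] + eps) / (n_p + eps * k) for o in options}
    return sum(p[o] * math.log(p[o] / q[o]) for o in options)
```

Under this sketch, a model that always answers "A" on a balanced label set receives a large CKLD, while a model whose choice frequencies match the label frequencies scores near zero, which is the label-imbalance sensitivity the metric is meant to provide.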