Current evaluation of large language models (LLMs) overwhelmingly prioritizes accuracy; however, in real-world and safety-critical applications, the ability to abstain when uncertain is equally vital for trustworthy deployment. We introduce MedAbstain, a unified benchmark and evaluation protocol for abstention in medical multiple-choice question answering (MCQA) -- a discrete-choice setting that generalizes to agentic action selection -- integrating conformal prediction, adversarial question perturbations, and explicit abstention options. Our systematic evaluation of both open- and closed-source LLMs reveals that even state-of-the-art, high-accuracy models often fail to abstain when uncertain. Notably, providing explicit abstention options consistently increases model uncertainty and promotes safer abstention, far more than input perturbations do, while scaling model size or applying advanced prompting yields little improvement. These findings highlight the central role of abstention mechanisms for trustworthy LLM deployment and offer practical guidance for improving safety in high-stakes applications.
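To make the conformal-prediction component concrete, the following is a minimal sketch of how split conformal prediction can drive abstention in a discrete-choice setting like MCQA. It is an illustration under standard assumptions (softmax probabilities over answer options, a held-out calibration set, and the common 1 - p(true option) nonconformity score), not the benchmark's exact protocol; the function name `conformal_abstain` is hypothetical.

```python
import numpy as np

def conformal_abstain(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Sketch of split conformal prediction for MCQA abstention.

    cal_probs:  (n, k) model probabilities over k answer options,
                computed on a held-out calibration set.
    cal_labels: (n,) indices of the correct options.
    test_probs: (m, k) probabilities on the test questions.
    alpha:      target miscoverage rate (0.1 gives 90% coverage).
    """
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability of the true option.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Conformal quantile with the finite-sample (n + 1) correction,
    # clipped to a valid quantile level for small n.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    # Prediction set: every option whose nonconformity is within q.
    pred_sets = test_probs >= 1.0 - q
    # Abstain whenever the set is not a single option: an empty or
    # multi-option set signals the model is too uncertain to commit.
    abstain = pred_sets.sum(axis=1) != 1
    return pred_sets, abstain
```

Under this scheme a model commits to an answer only when its conformal prediction set is a singleton; larger (or empty) sets are read as uncertainty and mapped to abstention, which is one natural way to operationalize "abstain when uncertain" over a fixed set of choices or actions.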