Benchmarking and Mitigating Sycophancy in Medical Vision Language Models

Visual language models (VLMs) have the potential to transform medical workflows. However, the deployment is limited by sycophancy. Despite this serious threat to patient safety, a systematic benchmark remains lacking. This paper addresses this gap by introducing a Medical benchmark that applies multiple templates to VLMs in a hierarchical medical visual question answering task. We find that current VLMs are highly susceptible to visual cues, with failure rates showing a correlation to model size or overall accuracy. we discover that perceived authority and user mimicry are powerful triggers, suggesting a bias mechanism independent of visual data. To overcome this, we propose a Visual Information Purification for Evidence based Responses (VIPER) strategy that proactively filters out non-evidence-based social cues, thereby reinforcing evidence based reasoning. VIPER reduces sycophancy while maintaining interpretability and consistently outperforms baseline methods, laying the necessary foundation for the robust and secure integration of VLMs.

翻译：视觉语言模型（VLMs）有望变革医疗工作流程。然而，其部署受到谄媚行为的限制。尽管这对患者安全构成严重威胁，但目前仍缺乏系统性的基准测试。本文通过引入一个医学基准测试来填补这一空白，该基准测试在分层医学视觉问答任务中对VLMs应用多种模板。我们发现，当前VLMs极易受视觉线索影响，其失败率与模型大小或整体准确性存在相关性。我们进一步发现，感知权威性与用户模仿是强有力的触发因素，表明存在一种独立于视觉数据的偏见机制。为克服这一问题，我们提出了一种基于视觉信息纯化的证据驱动响应（VIPER）策略，该策略主动过滤非基于证据的社会线索，从而强化证据驱动的推理。VIPER在减少谄媚行为的同时保持可解释性，并持续优于基线方法，为VLMs的稳健与安全整合奠定了必要基础。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

[ICML 2026] 看见的还是思考的？用奖励机制区分“看错”与“想错”：视觉语言模型奖励感知

专知会员服务

10+阅读 · 5月15日

基于大语言模型的医疗推理研究：综述与 MR-Bench 基准测试

专知会员服务

16+阅读 · 4月13日

在无标注条件下适配视觉—语言模型：全面综述

专知会员服务

13+阅读 · 2025年8月9日

【ICML2025】层级对齐：在视觉语言模型中检验图像编码器层的安全对齐

专知会员服务

7+阅读 · 2025年5月2日