Visual language models (VLMs) have the potential to transform medical workflows. However, their deployment is limited by sycophancy. Despite this serious threat to patient safety, a systematic benchmark is still lacking. This paper addresses the gap by introducing a medical benchmark that applies multiple templates to VLMs in a hierarchical medical visual question answering task. We find that current VLMs are highly susceptible to visual cues, with failure rates showing a correlation to model size or overall accuracy. We also discover that perceived authority and user mimicry are powerful triggers, suggesting a bias mechanism independent of visual data. To overcome this, we propose a Visual Information Purification for Evidence based Responses (VIPER) strategy that proactively filters out non-evidence-based social cues, thereby reinforcing evidence-based reasoning. VIPER reduces sycophancy while maintaining interpretability and consistently outperforms baseline methods, laying the necessary foundation for the robust and secure integration of VLMs.