Vision-language models (VLMs) achieve strong performance on many benchmarks, yet a basic reliability question remains underexplored: when visual evidence conflicts with commonsense, do models follow what is shown or what commonsense suggests? A characteristic failure in this setting is that the model overrides the visual evidence and outputs the commonsense alternative. We term this phenomenon \textbf{commonsense-driven hallucination} (CDH). To evaluate it, we introduce \textbf{CDH-Bench}, a benchmark designed to create explicit \textbf{visual evidence--commonsense conflicts}. CDH-Bench covers three dimensions: \textit{counting anomalies}, \textit{relational anomalies}, and \textit{attribute anomalies}. We evaluate frontier VLMs under two formats, \textit{binary question answering (QA)} and \textit{multiple-choice QA}, and report metrics including \textit{Counterfactual Accuracy} (CF-Acc), \textit{Commonsense Accuracy} (CS-Acc), \textit{Counterfactual Accuracy Drop} (CFAD), \textit{Commonsense Collapse Rate} (CCR), and \textit{Relative Prior Dependency} (RPD). Results show that even strong models remain vulnerable to prior-driven normalization, that is, answering with the commonsense alternative rather than with what the image shows. CDH-Bench thus provides a controlled diagnostic of visual fidelity under such conflicts.