Vision-Language Models (VLMs) are essential for embodied AI and safety-critical applications such as robotics and autonomous systems. However, existing benchmarks focus primarily on static or curated visual inputs and neglect the challenges posed by adversarial conditions, value misalignment, and error propagation in continuous deployment: they either overlook the impact of real-world perturbations or fail to account for the cumulative effect of inconsistent reasoning over time. To address these gaps, we introduce the Degraded Image Quality Leading to Hallucinations (DIQ-H) benchmark, the first to evaluate VLMs under adversarial visual conditions in continuous sequences. DIQ-H simulates real-world stressors, including motion blur, sensor noise, and compression artifacts, and measures how these corruptions lead to persistent errors and misaligned outputs over time. The benchmark explicitly models error propagation and long-term value consistency. To improve scalability and reduce the cost of safety-critical evaluation, we propose the Value-Guided Iterative Refinement (VGIR) framework, which automates the generation of high-quality, ethically aligned ground-truth annotations. VGIR uses lightweight VLMs to detect and refine value misalignment, improving accuracy from 72.2% to 83.3%, a 15.3% relative improvement. Together, the DIQ-H benchmark and the VGIR framework provide a robust platform for embodied AI safety assessment, revealing vulnerabilities in error recovery, ethical consistency, and temporal value alignment.
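As a quick check on the reported gain, the relative improvement can be recomputed from the two accuracy figures quoted in the abstract. This is only an illustrative calculation; the helper name `relative_improvement` is ours and is not part of the benchmark or the refinement framework.

```python
def relative_improvement(before: float, after: float) -> float:
    """Return the relative gain of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Accuracy figures quoted in the abstract (in percent).
accuracy_before = 72.2
accuracy_after = 83.3

absolute_gain = accuracy_after - accuracy_before  # percentage points
relative_gain = relative_improvement(accuracy_before, accuracy_after)

print(f"absolute gain: {absolute_gain:.1f} points")
print(f"relative gain: {relative_gain:.2f}%")
```

From the rounded accuracies this works out to an absolute gain of 11.1 percentage points and a relative gain of about 15.37%, in line with the roughly 15.3% quoted above.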
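To make the idea of degraded continuous sequences concrete, the sketch below applies the three stressors named in the abstract (motion blur, sensor noise, compression artifacts) to a short sequence of frames, with severity growing over time. The abstract does not specify how DIQ-H generates its corruptions, so everything here is an assumption made for illustration: the horizontal box blur, the Gaussian noise model, the linear severity schedule, all parameter values, and especially the coarse intensity quantization, which is only a crude stand-in for real compression artifacts. The mean absolute pixel error is used solely to show that the degradation grows; it is not a hallucination metric.

```python
import random
from typing import List

Frame = List[List[float]]  # grayscale image with values in [0, 1]


def clip(value: float) -> float:
    """Clamp a pixel value to the valid [0, 1] range."""
    return min(1.0, max(0.0, value))


def motion_blur(frame: Frame, length: int) -> Frame:
    """Horizontal box blur: average each pixel with up to `length - 1` left neighbours."""
    blurred = []
    for row in frame:
        new_row = []
        for x in range(len(row)):
            window = row[max(0, x - length + 1): x + 1]
            new_row.append(sum(window) / len(window))
        blurred.append(new_row)
    return blurred


def sensor_noise(frame: Frame, sigma: float, rng: random.Random) -> Frame:
    """Add zero-mean Gaussian noise and clamp the result to [0, 1]."""
    return [[clip(p + rng.gauss(0.0, sigma)) for p in row] for row in frame]


def quantize(frame: Frame, levels: int) -> Frame:
    """Coarse intensity quantization (assumed stand-in for compression, not real JPEG)."""
    steps = levels - 1
    return [[round(p * steps) / steps for p in row] for row in frame]


def degrade_sequence(frames: List[Frame], seed: int = 0) -> List[Frame]:
    """Corrupt each frame, with severity rising linearly from 0 (first) to 1 (last)."""
    rng = random.Random(seed)
    last_index = max(1, len(frames) - 1)
    degraded = []
    for t, frame in enumerate(frames):
        severity = t / last_index
        out = motion_blur(frame, 1 + int(4 * severity))
        out = sensor_noise(out, 0.1 * severity, rng)
        out = quantize(out, 256 - int(240 * severity))
        degraded.append(out)
    return degraded


def mean_abs_error(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two frames of equal size."""
    diffs = [abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# Five identical 8x8 checkerboard frames stand in for a clean video sequence.
clean = [[[float((x + y) % 2) for x in range(8)] for y in range(8)] for _ in range(5)]
degraded = degrade_sequence(clean)
errors = [mean_abs_error(c, d) for c, d in zip(clean, degraded)]
print([round(e, 3) for e in errors])
```

In an actual evaluation, each degraded frame would be passed to the VLM in order and its answers compared against those for the clean frames, so that errors persisting after the corruption is removed can be separated from errors that occur only while it is present.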