ORCA: An Agentic Reasoning Framework for Hallucination and Adversarial Robustness in Vision-Language Models

Large Vision-Language Models (LVLMs) exhibit strong multimodal capabilities but remain vulnerable to hallucinations arising from intrinsic errors and to adversarial attacks arising from external exploitation, which limits their reliability in real-world applications. We present ORCA, an agentic reasoning framework that improves the factual accuracy and adversarial robustness of pretrained LVLMs through structured inference-time reasoning with a suite of small vision models (fewer than 3B parameters). ORCA operates via an Observe-Reason-Critique-Act loop: it queries multiple visual tools with evidential questions, checks their answers for cross-model inconsistencies, and refines its predictions iteratively, all without access to model internals and without retraining. ORCA also stores intermediate reasoning traces, which supports auditable decision-making. Although designed primarily to mitigate object-level hallucinations, ORCA also exhibits emergent adversarial robustness without requiring adversarial training or defense mechanisms. We evaluate ORCA in three settings: (1) clean images on hallucination benchmarks, (2) adversarially perturbed images without defense, and (3) adversarially perturbed images with defense applied. On the POPE hallucination benchmark, ORCA improves standalone LVLM performance by +3.64% to +40.67% across different subsets. Under adversarial perturbations on POPE, ORCA achieves an average accuracy gain of +20.11% across LVLMs. When combined with defense techniques on adversarially perturbed AMBER images, ORCA further improves standalone LVLM performance, with gains ranging from +1.20% to +48.00% across metrics. These results suggest that ORCA offers a promising path toward more reliable and robust multimodal systems.
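The Observe-Reason-Critique-Act loop described above can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name `orca_style_loop`, the `(image, question) -> "yes"/"no"` model interface, the `Trace` container, and the strict-majority revision rule are all assumptions introduced here for illustration; the abstract does not specify how tool answers are aggregated.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical interface: a model takes (image, question) and returns "yes" or "no".
Model = Callable[[Any, str], str]


@dataclass
class Trace:
    """Intermediate reasoning steps, kept so a decision can be audited later."""
    steps: List[Dict[str, Any]] = field(default_factory=list)


def orca_style_loop(
    image: Any,
    question: str,
    lvlm: Model,
    tools: Dict[str, Model],
    max_rounds: int = 2,
) -> Tuple[str, Trace]:
    trace = Trace()
    # Observe: take the LVLM's own answer as the starting prediction.
    answer = lvlm(image, question)
    for round_idx in range(max_rounds):
        # Reason: put the evidential question to every small vision tool.
        evidence = {name: tool(image, question) for name, tool in tools.items()}
        # Critique: list the tools whose answer contradicts the current prediction.
        conflicts = sorted(n for n, reply in evidence.items() if reply != answer)
        trace.steps.append(
            {"round": round_idx, "answer": answer,
             "evidence": evidence, "conflicts": conflicts}
        )
        # Act: revise only if a strict majority of tools disagree (assumed rule).
        if len(conflicts) * 2 <= len(tools):
            break
        answer = Counter(evidence.values()).most_common(1)[0][0]
    return answer, trace
```

In this sketch the LVLM and the tools are treated as black-box callables, which mirrors the abstract's claim that no model internals or retraining are needed. The returned `Trace` records each round's prediction, tool answers, and conflicts, standing in for the stored reasoning traces.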
