Large Vision-Language Models (LVLMs) exhibit strong multimodal capabilities but remain vulnerable to hallucinations arising from intrinsic errors and to adversarial attacks from external exploitation, which limits their reliability in real-world applications. We present ORCA, an agentic reasoning framework that improves the factual accuracy and adversarial robustness of pretrained LVLMs through structured inference-time reasoning with a suite of small vision models (fewer than 3B parameters). ORCA operates via an Observe-Reason-Critique-Act loop: it queries multiple visual tools with evidential questions, checks for cross-model inconsistencies, and refines its predictions iteratively, without access to model internals and without retraining. ORCA also stores intermediate reasoning traces, which supports auditable decision-making. Although designed primarily to mitigate object-level hallucinations, ORCA also exhibits emergent adversarial robustness without requiring adversarial training or defense mechanisms. We evaluate ORCA in three settings: (1) clean images on hallucination benchmarks, (2) adversarially perturbed images without defense, and (3) adversarially perturbed images with defense applied. On the POPE hallucination benchmark, ORCA improves standalone LVLM performance by +3.64% to +40.67% across different subsets. Under adversarial perturbations on POPE, ORCA achieves an average accuracy gain of +20.11% across LVLMs. When combined with defense techniques on adversarially perturbed AMBER images, ORCA further improves standalone LVLM performance, with gains ranging from +1.20% to +48.00% across metrics. These results suggest that ORCA offers a promising path toward building more reliable and robust multimodal systems.
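To make the Observe-Reason-Critique-Act loop concrete, the following is a minimal Python sketch of one way such a loop could be organized. It is not ORCA's implementation: the names `Tool`, `Trace`, and `refine_prediction`, the yes/no answer format, the unanimity rule used in the critique step, and the `max_rounds` cutoff are all assumptions introduced here for illustration. The visual tools are treated as black-box callables, in line with the abstract's statement that no access to model internals is needed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical interface: a visual tool answers a yes/no evidential question.
Tool = Callable[[str], bool]


@dataclass
class Trace:
    """Stores intermediate steps so that a decision can be audited later."""
    steps: List[dict] = field(default_factory=list)


def refine_prediction(
    lvlm_answer: bool,
    tools: Dict[str, Tool],
    questions: List[str],
    trace: Trace,
    max_rounds: int = 3,
) -> bool:
    """Illustrative loop (assumed design, not the authors' algorithm)."""
    if not tools:
        return lvlm_answer  # nothing to cross-check against

    prediction = lvlm_answer
    for question in questions[:max_rounds]:
        # Observe: query every tool with the same evidential question.
        votes = {name: tool(question) for name, tool in tools.items()}

        # Reason: summarize the evidence as a majority vote.
        yes_votes = sum(votes.values())
        majority = yes_votes * 2 > len(votes)

        # Critique: flag cross-model inconsistency (assumed rule: unanimity).
        consistent = yes_votes in (0, len(votes))

        trace.steps.append(
            {"question": question, "votes": votes, "consistent": consistent}
        )

        # Act: accept consistent evidence, otherwise try the next question.
        if consistent:
            prediction = majority
            break
    return prediction
```

In this sketch the LVLM's original answer is kept whenever the tools never agree, which is only one possible fallback policy. A call such as `refine_prediction(True, {"detector": lambda q: False}, ["Is there a dog?"], Trace())` would override the LVLM's "yes" with the tool's "no" and record the step in the trace.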