This paper presents a pilot study that introduces multi-agent debate into multimodal reasoning. The study addresses two key challenges: the trivialization of opinions caused by excessive summarization, and the diversion of focus caused by distractor concepts introduced from images. Both challenges stem from the inductive (bottom-up) nature of existing debating schemes. To address them, we propose a deductive (top-down) debating approach called Blueprint Debate on Graphs (BDoG). In BDoG, debates are confined to a blueprint graph to prevent opinion trivialization through world-level summarization. Moreover, by storing evidence in branches within the graph, BDoG mitigates distractions caused by frequent but irrelevant concepts. Extensive experiments validate BDoG, which achieves state-of-the-art results on Science QA and MMBench with significant improvements over previous methods.
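To make the top-down idea concrete, the following is a minimal, hypothetical sketch of a blueprint-style structure: the question is the root, the blueprint branches (here, candidate claims) are fixed before debating, and each piece of evidence must be filed under an existing branch or is rejected. The class names `Branch` and `BlueprintGraph`, the methods, and the example data are illustrative assumptions, not the BDoG implementation; the sketch only shows how confining evidence to branches keeps unattached concepts out of the shared state.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    """Hypothetical blueprint branch: one claim plus the evidence filed under it."""
    claim: str
    evidence: list[str] = field(default_factory=list)


@dataclass
class BlueprintGraph:
    """Hypothetical top-down structure: the question is the root and the
    branches are laid out before any evidence is added (illustrative only)."""
    question: str
    branches: dict[str, Branch] = field(default_factory=dict)

    def add_branch(self, key: str, claim: str) -> None:
        self.branches[key] = Branch(claim)

    def add_evidence(self, key: str, fact: str) -> bool:
        # Evidence is accepted only under an existing branch; a concept that
        # does not attach to the blueprint is rejected rather than merged in.
        branch = self.branches.get(key)
        if branch is None:
            return False
        if fact not in branch.evidence:  # skip verbatim repeats
            branch.evidence.append(fact)
        return True

    def render(self) -> str:
        # Serialize branch by branch so each claim keeps its own evidence
        # instead of being collapsed into one flat summary.
        lines = [f"Q: {self.question}"]
        for key, branch in self.branches.items():
            lines.append(f"  [{key}] {branch.claim}")
            lines.extend(f"    - {fact}" for fact in branch.evidence)
        return "\n".join(lines)


# Usage with made-up data: evidence for an unknown branch "C" is rejected.
graph = BlueprintGraph("Which animal is shown?")
graph.add_branch("A", "The animal is a cat")
graph.add_branch("B", "The animal is a fox")
graph.add_evidence("A", "pointed ears")
graph.add_evidence("B", "bushy tail")
graph.add_evidence("C", "a sofa in the background")
print(graph.render())
```

In this toy version the branches are plain strings chosen by hand; how BDoG actually constructs the blueprint and lets agents debate over it is described in the paper, not here.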