Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training

Multimodal reasoning is a challenging task that requires models to reason across multiple modalities to answer questions. Existing approaches have made progress by incorporating language and visual modalities into a two-stage reasoning framework, separating rationale generation from answer inference. However, these approaches often fall short due to the inadequate quality of the generated rationales. In this work, we delve into the importance of rationales in model reasoning. We observe that when rationales are completely accurate, the model's accuracy significantly improves, highlighting the need for high-quality rationale generation. Motivated by this, we propose MC-CoT, a self-consistency training strategy that generates multiple rationales and answers, subsequently selecting the most accurate through a voting process. This approach not only enhances the quality of generated rationales but also leads to more accurate and robust answers. Through extensive experiments, we demonstrate that our approach significantly improves model performance across various benchmarks. Remarkably, we show that even smaller base models, when equipped with our proposed approach, can achieve results comparable to those of larger models, illustrating the potential of our approach in harnessing the power of rationales for improved multimodal reasoning. The code is available at https://github.com/chengtan9907/mc-cot.
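The voting step described above can be illustrated with a minimal sketch. It is not the MC-CoT implementation: it assumes voting is a plain majority vote over sampled answers, and it does not reproduce the training procedure or any voting over the rationales themselves. The names `vote_answer`, `self_consistent_answer`, and `sample_fn` are hypothetical; `sample_fn` stands in for a model call that returns one (rationale, answer) pair per invocation.

```python
from collections import Counter
from typing import Callable, List, Tuple


def vote_answer(answers: List[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one answer to vote on")
    # Counter.most_common orders equal counts by first occurrence (Python 3.7+).
    return Counter(answers).most_common(1)[0][0]


def self_consistent_answer(
    sample_fn: Callable[[str], Tuple[str, str]],
    question: str,
    n_samples: int = 5,
) -> Tuple[str, List[str]]:
    """Sample several (rationale, answer) pairs and majority-vote the answer.

    sample_fn is a hypothetical stand-in for a stochastic model call.
    Returns the winning answer and the rationales that led to it.
    """
    samples = [sample_fn(question) for _ in range(n_samples)]
    winner = vote_answer([answer for _, answer in samples])
    supporting = [rationale for rationale, answer in samples if answer == winner]
    return winner, supporting


if __name__ == "__main__":
    # Canned outputs replace a real model so the example is deterministic.
    canned = iter([
        ("The magnet poles are opposite, so they attract.", "A"),
        ("Like poles face each other, so they repel.", "B"),
        ("North faces south, so they attract.", "A"),
    ])
    answer, rationales = self_consistent_answer(
        lambda q: next(canned), "Will the magnets attract?", n_samples=3
    )
    print(answer, len(rationales))
```

Keeping only the rationales that agree with the winning answer is one simple design choice made for this sketch; the abstract does not specify how rationales are selected.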

