The recent success of reinforcement learning (RL) in large reasoning models has driven growing adoption of RL for post-training Multimodal Large Language Models (MLLMs) to enhance their visual reasoning capabilities. Although many studies report improved performance, it remains unclear whether RL training truly enables models to learn from visual information. In this work, we propose the Hallucination-as-Cue Framework, an analytical framework for investigating the effects of RL-based post-training on multimodal reasoning models from the perspective of model hallucination. Specifically, we introduce hallucination-inductive, modality-specific corruptions that remove or replace the information required to derive correct answers, thereby forcing the model to reason by hallucination. By applying these corruptions during both training and evaluation, our framework offers a way to diagnose RL training dynamics and to understand the intrinsic properties of datasets. Through extensive experiments and analyses across multiple multimodal reasoning benchmarks, we show that model hallucination plays a more significant role in RL training than previously recognized. For instance, we find that RL post-training under purely hallucination-inductive settings can still significantly improve models' reasoning performance, and in some cases even outperforms standard training. These findings challenge prevailing assumptions about MLLM reasoning training and motivate the development of more modality-aware RL-based training designs.
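As a rough illustration of what "modality-specific corruptions that remove or replace essential information" could look like in practice, the following minimal sketch applies three such corruptions to a single training sample. The abstract does not specify the exact operations, so everything here is an assumption made for illustration: the `Sample` container and the helpers `remove_image`, `replace_image`, and `remove_question` are hypothetical names, and images are represented only by an identifier string. In each case the answer label is left unchanged, so the corrupted input no longer contains what is needed to derive it.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Sample:
    """Hypothetical container for one multimodal reasoning example."""
    image: Optional[str]  # image identifier or path; None means no image
    question: str
    answer: str


def remove_image(sample: Sample) -> Sample:
    """Visual corruption (assumed form): drop the image entirely."""
    return replace(sample, image=None)


def replace_image(sample: Sample, pool: List[Sample], rng: random.Random) -> Sample:
    """Visual corruption (assumed form): swap in an unrelated image from the pool."""
    candidates = [s.image for s in pool
                  if s.image is not None and s.image != sample.image]
    if not candidates:
        # No other image is available, so fall back to removal.
        return remove_image(sample)
    return replace(sample, image=rng.choice(candidates))


def remove_question(sample: Sample, placeholder: str = "") -> Sample:
    """Textual corruption (assumed form): blank out the question text."""
    return replace(sample, question=placeholder)


if __name__ == "__main__":
    pool = [
        Sample("triangle.png", "What is the area of the triangle?", "12"),
        Sample("chart.png", "Which bar is tallest?", "B"),
    ]
    rng = random.Random(0)
    original = pool[0]
    for corrupted in (
        remove_image(original),
        replace_image(original, pool, rng),
        remove_question(original),
    ):
        # The answer label is unchanged; only the input evidence is corrupted.
        print(corrupted)
```

Under these assumptions, the same corruption functions could be applied to a dataset before training and again before evaluation, which is the pairing the framework relies on.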