In this work, we aim to develop data synthesis techniques that autonomously produce multimodal training data for improving how multimodal large language models (MLLMs) solve complex real-world tasks. To this end, we propose Collective Adversarial Data Synthesis (CADS), a novel and general approach for synthesizing high-quality, diverse, and challenging multimodal data for MLLMs. The core idea of CADS is to leverage collective intelligence to ensure high-quality and diverse generation, while exploiting adversarial learning to synthesize challenging samples that effectively drive model improvement. Specifically, CADS alternates between two cyclic phases: Collective Adversarial Data Generation (CAD-Generate) and Collective Adversarial Data Judgment (CAD-Judge). CAD-Generate leverages collective knowledge to jointly generate new and diverse multimodal data, while CAD-Judge collaboratively assesses the quality of the synthesized data. In addition, CADS introduces an Adversarial Context Optimization mechanism that optimizes the generation context to encourage challenging, high-value data. With CADS, we construct MMSynthetic-20K and train our model, R1-SyntheticVL, which demonstrates superior performance on various benchmarks.
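The cycle described above can be pictured with a minimal sketch. The abstract does not specify how the phases are implemented, so everything here is an illustrative assumption rather than the authors' method: the names `Sample` and `cads_loop`, the mean-score acceptance rule, the `quality_threshold` parameter, and the context update that simply appends questions the target model fails on are all hypothetical. Generators, judges, and the solver are passed in as plain callables standing in for models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sample:
    """A synthesized item; real data would also carry an image (assumption)."""
    question: str
    answer: str


Generator = Callable[[str], Sample]  # generation context -> candidate sample
Judge = Callable[[Sample], float]    # sample -> quality score in [0, 1]
Solver = Callable[[Sample], bool]    # True if the target model answers correctly


def cads_loop(
    generators: List[Generator],
    judges: List[Judge],
    solver: Solver,
    context: str,
    rounds: int,
    quality_threshold: float = 0.5,  # hypothetical acceptance cutoff
) -> Tuple[List[Sample], str]:
    """Alternate a generate phase and a judge phase, then update the context."""
    dataset: List[Sample] = []
    for _ in range(rounds):
        # Generate phase: every generator proposes one candidate.
        candidates = [generate(context) for generate in generators]

        # Judge phase: keep candidates whose mean judge score passes the cutoff.
        accepted = [
            s for s in candidates
            if sum(judge(s) for judge in judges) / len(judges) >= quality_threshold
        ]
        dataset.extend(accepted)

        # Context update (placeholder for Adversarial Context Optimization):
        # steer later rounds toward samples the target model currently fails.
        hard = [s for s in accepted if not solver(s)]
        if hard:
            context += "\nHard example: " + hard[-1].question
    return dataset, context
```

In this sketch, pooling several generators stands in for diversity, averaging several judges stands in for collective quality control, and the solver's failures supply the adversarial signal. A real system would replace each callable with model calls and would likely use a richer context update than string concatenation.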