Agents of Diffusion: Enhancing Diffusion Language Models with Multi-Agent Reinforcement Learning for Structured Data Generation (Extended Version)

Generating high-quality structured data, such as JSON records, remains a fundamental challenge for large language models (LLMs), particularly when semantic richness must coexist with strict schema adherence. While autoregressive LLMs offer strong structural consistency, they often struggle with semantic variation and output diversity. In contrast, diffusion language models (DLMs) introduce powerful mechanisms for semantic richness and bidirectional decoding, yet lack the inductive biases needed for reliable structure preservation. We present Agents of Diffusion (AoD), a novel framework that unifies the generative flexibility of DLMs with the reasoning capabilities of autoregressive models through language-mediated reinforcement learning. AoD frames structured text generation as a multi-agent alignment process, where a prompt optimization agent collaborates with a judge agent to iteratively guide a DLM using natural language feedback. This approach enables controllable, schema-consistent generation without modifying model parameters or relying on handcrafted constraints. AoD advances the state of controllable generation by demonstrating that diffusion models, when supervised by cooperative agents, can achieve both high semantic novelty and structural fidelity. Across multiple structured data benchmarks, AoD consistently outperforms diffusion and autoregressive baselines, establishing a new path forward for structure-aware, diversity-enhanced text synthesis.
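The feedback loop described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the AoD implementation: `toy_generator` is a deterministic stand-in for a frozen DLM, `judge` checks a hypothetical two-field schema and returns natural-language feedback, and `optimize_prompt` simply appends that feedback to the prompt. No policy is learned here; the sketch only shows the control flow in which the generator's parameters stay untouched and only the prompt changes.

```python
import json
from typing import Callable, List, Tuple

# Hypothetical toy schema, chosen only for this illustration.
REQUIRED_KEYS = ("name", "age")


def toy_generator(prompt: str) -> str:
    """Stand-in for a frozen DLM: its output changes only with the prompt."""
    record = {"name": "Ada"}
    if "age" in prompt:
        record["age"] = 36
    text = json.dumps(record)
    # Without an explicit JSON instruction, the stub wraps its output in prose.
    if "valid JSON" not in prompt:
        text = "Here is the record: " + text
    return text


def judge(output: str) -> Tuple[float, str]:
    """Return a score in [0, 1] and natural-language feedback."""
    try:
        record = json.loads(output)
    except json.JSONDecodeError:
        return 0.0, "Return only valid JSON with no surrounding text."
    if not isinstance(record, dict):
        return 0.0, "Return a single JSON object."
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        score = 1 - len(missing) / len(REQUIRED_KEYS)
        return score, "Include the missing field(s): " + ", ".join(missing) + "."
    return 1.0, "The record satisfies the schema."


def optimize_prompt(prompt: str, feedback: str) -> str:
    """Simplest possible prompt update: append the judge's feedback."""
    return prompt + " " + feedback


def refine(
    prompt: str,
    generate: Callable[[str], str],
    max_rounds: int = 5,
) -> Tuple[str, List[float]]:
    """Iterate generate -> judge -> prompt update until the judge is satisfied."""
    scores: List[float] = []
    output = ""
    for _ in range(max_rounds):
        output = generate(prompt)
        score, feedback = judge(output)
        scores.append(score)
        if score == 1.0:
            break
        prompt = optimize_prompt(prompt, feedback)
    return output, scores


if __name__ == "__main__":
    final_output, score_history = refine("Generate a person record.", toy_generator)
    print(score_history)
    print(final_output)
```

In this toy setup the first round fails JSON parsing, the second is valid JSON but lacks a field, and the third satisfies the schema. A real system would replace the stub generator with a DLM call and the rule-based judge and prompt updater with model-based agents.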
