DLLM Agent: See Farther, Run Faster

Huiling Zhen, Weizhe Lin, Renxi Liu, Kai Han, Yiming Li, Yuchuan Tian, Hanting Chen, Xiaoguang Li, Xiaosong Li, Chen Chen, Xianzhi Yu, Mingxuan Yuan, Youliang Yan, Peifeng Qin, Jun Wang, Yu Wang, Dacheng Tao, Yunhe Wang

Diffusion large language models (DLLMs) have emerged as an alternative to autoregressive (AR) decoding with appealing efficiency and modeling properties, yet their implications for agentic multi-step decision making remain underexplored. We ask a concrete question: when the generation paradigm is changed but the agent framework and supervision are held fixed, do diffusion backbones induce systematically different planning and tool-use behaviors, and do these differences translate into end-to-end efficiency gains? We study this in a controlled setting by instantiating DLLM and AR backbones within the same agent workflow (DeepDiver) and performing matched agent-oriented fine-tuning on the same trajectory data, yielding diffusion-backed DLLM Agents and directly comparable AR agents. Across benchmarks and case studies, we find that, at comparable accuracy, DLLM Agents are on average over 30% faster end to end than AR agents, with some cases exceeding 8x speedup. Conditioned on correct task completion, DLLM Agents also require fewer interaction rounds and tool invocations, consistent with higher planner hit rates that converge earlier to a correct action path with less backtracking. We further identify two practical considerations for deploying diffusion backbones in tool-using agents. First, naive DLLM policies are more prone to structured tool-call failures, necessitating stronger tool-call-specific training to emit valid schemas and arguments. Second, for multi-turn inputs interleaving context and action spans, diffusion-style span corruption requires aligned attention masking to avoid spurious context-action information flow; without such alignment, performance degrades. Finally, we analyze attention dynamics across workflow stages and observe paradigm-specific coordination patterns, suggesting stronger global planning signals in diffusion-backed agents.
