The evolution of Large Language Models (LLMs) from passive text processors to autonomous agents has established planning as a core component of modern intelligence. However, achieving generalized planning remains elusive, hindered not only by the scarcity of high-quality interaction data but also by inherent conflicts across heterogeneous planning tasks. These challenges yield models that excel at isolated tasks yet struggle to generalize, while existing multi-task training attempts suffer from gradient interference. In this paper, we present \textbf{MagicAgent}, a series of foundation models specifically designed for generalized agent planning. We introduce a lightweight and scalable synthetic data framework that generates high-quality trajectories across diverse planning tasks, including hierarchical task decomposition, tool-augmented planning, multi-constraint scheduling, procedural logic orchestration, and long-horizon tool execution. To mitigate training conflicts, we propose a two-stage training paradigm comprising supervised fine-tuning followed by multi-objective reinforcement learning over both static datasets and dynamic environments. Empirical results demonstrate that MagicAgent-32B and MagicAgent-30B-A3B deliver superior performance, achieving accuracies of $75.1\%$ on Worfbench, $55.9\%$ on NaturalPlan, $57.5\%$ on $\tau^2$-Bench, $86.9\%$ on BFCL-v3, and $81.2\%$ on ACEBench, as well as strong results on our in-house MagicEval benchmarks. These results substantially outperform existing sub-100B models and even surpass leading closed-source models.