The evolution of Large Language Models (LLMs) from passive text processors to autonomous agents has established planning as a core component of modern intelligence. However, achieving generalized planning remains elusive, hindered not only by the scarcity of high-quality interaction data but also by inherent conflicts across heterogeneous planning tasks. These challenges yield models that excel at isolated tasks yet struggle to generalize, while existing multi-task training attempts suffer from gradient interference. In this paper, we present \textbf{MagicAgent}, a series of foundation models designed specifically for generalized agent planning. We introduce a lightweight, scalable synthetic-data framework that generates high-quality trajectories across diverse planning tasks, including hierarchical task decomposition, tool-augmented planning, multi-constraint scheduling, procedural logic orchestration, and long-horizon tool execution. To mitigate training conflicts, we propose a two-stage training paradigm comprising supervised fine-tuning followed by multi-objective reinforcement learning over both static datasets and dynamic environments. Empirical results show that MagicAgent-32B and MagicAgent-30B-A3B achieve superior performance across diverse open-source benchmarks (\emph{e.g.}, $75.1\%$ on Worfbench and $86.9\%$ on BFCL-v3), as well as strong results on our in-house MagicEval benchmarks, substantially outperforming existing sub-100B models and surpassing leading ultra-scale models, including GPT-5.2, Kimi-K2, and GLM-4.7.