The evolution of Large Language Models (LLMs) from passive text processors to autonomous agents has established planning as a core component of modern intelligence. However, achieving generalized planning remains elusive, hindered not only by the scarcity of high-quality interaction data but also by inherent conflicts across heterogeneous planning tasks. These challenges yield models that excel at isolated tasks yet struggle to generalize, while existing multi-task training attempts suffer from gradient interference. In this paper, we present \textbf{MagicAgent}, a series of foundation models designed specifically for generalized agent planning. We introduce a lightweight, scalable synthetic-data framework that generates high-quality trajectories across diverse planning tasks, including hierarchical task decomposition, tool-augmented planning, multi-constraint scheduling, procedural logic orchestration, and long-horizon tool execution. To mitigate training conflicts, we propose a two-stage training paradigm comprising supervised fine-tuning followed by multi-objective reinforcement learning over both static datasets and dynamic environments. Empirical results show that MagicAgent-32B and MagicAgent-30B-A3B achieve superior performance across diverse open-source benchmarks (\emph{e.g.}, $75.1\%$ on Worfbench and $86.9\%$ on BFCL-v3), as well as strong results on our in-house MagicEval benchmarks, substantially outperforming existing sub-100B models and surpassing leading ultra-scale models, including GPT-5.2, Kimi-K2, and GLM-4.7.