A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning

Qianben Chen, Jingyi Cao, Jiayu Zhang, Tianrui Qin, Xiaowan Li, King Zhu, Dingfeng Shi, He Zhu, Minghao Liu, Xiaobo Liang, Xin Gui, Ge Zhang, Jian Yang, Yuchen Eleanor Jiang, Wangchunshu Zhou

From arXiv; 12 pages, 6 figures.

Large language models split into two families: reasoning-centric LLMs, which strengthen internal chain-of-thought reasoning but cannot invoke external tools, and agentic LLMs, which learn to interact with environments and leverage tools but often lag in deep reasoning. This divide arises from fundamentally different training objectives, leading to mismatched strengths and to inefficiency on simple queries, where both families tend to overthink or over-call tools. In this work, we present the Adaptive Agent Foundation Model (A$^2$FM), a unified framework that follows a route-then-align principle: the model first learns task-aware routing and then aligns mode-specific trajectories under a shared backbone. To close the efficiency gap, we introduce a third mode, instant, that handles simple queries directly, preventing unnecessary reasoning or tool calls while complementing the agentic and reasoning modes. To jointly improve accuracy and efficiency, we propose Adaptive Policy Optimization (APO), which enforces adaptive sampling across modes and applies a cost-regularized reward. At the 32B scale, A$^2$FM achieves 13.4% on BrowseComp, 70.4% on AIME25, and 16.7% on HLE, setting a new state of the art among comparable models and performing competitively with frontier LLMs across agentic, reasoning, and general benchmarks. Notably, adaptive execution achieves a cost of pass of only $0.00487 per correct answer, cutting cost by 45.2% relative to the reasoning mode and 33.5% relative to the agentic mode, and thus delivers substantially higher cost efficiency while maintaining comparable accuracy.
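Cost of pass, as quoted above, is the inference spend per correct answer. The following is a minimal sketch, assuming it is computed as total cost over all queries divided by the number answered correctly, and that the 45.2% and 33.5% figures are relative reductions against each baseline mode's cost of pass. The function names and the toy numbers are hypothetical, not the paper's code or data.

```python
from typing import Sequence


def cost_of_pass(costs: Sequence[float], correct: Sequence[bool]) -> float:
    """Total spend across all queries divided by the number answered correctly.

    Assumption: failed attempts still count toward the total spend.
    """
    if len(costs) != len(correct):
        raise ValueError("costs and correct must have the same length")
    n_correct = sum(correct)
    if n_correct == 0:
        return float("inf")  # no correct answers: cost per success is unbounded
    return sum(costs) / n_correct


def relative_reduction(new: float, baseline: float) -> float:
    """Fractional saving of `new` versus `baseline`, e.g. 0.452 for a 45.2% cut."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    # Hypothetical toy run: four queries per mode, three answered correctly in each.
    adaptive = cost_of_pass([0.002, 0.004, 0.003, 0.001], [True, True, False, True])
    reasoning = cost_of_pass([0.006, 0.008, 0.007, 0.005], [True, True, True, False])
    print(f"adaptive:  ${adaptive:.5f} per correct answer")
    print(f"reasoning: ${reasoning:.5f} per correct answer")
    print(f"reduction: {relative_reduction(adaptive, reasoning):.1%}")
```

If the percentages are read this way, $0.00487 would correspond to baselines of roughly $0.0089 (reasoning) and $0.0073 (agentic). These are back-calculated under the stated assumption, not figures reported in the abstract.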

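The abstract names two ingredients of APO, adaptive sampling across modes and a cost-regularized reward, without giving their formulas. The sketch below is a hypothetical illustration only. It assumes the reward is correctness minus a penalty proportional to a normalized cost, and that "adaptive sampling" means reserving a minimum number of rollouts for every mode before splitting the rest of the budget by the router's preference. The three mode names come from the abstract; `lambda_cost`, `min_per_mode`, `Rollout`, and the helper functions are assumptions, not the paper's definitions.

```python
from dataclasses import dataclass
from typing import Dict

# The three execution modes named in the abstract.
MODES = ("instant", "reasoning", "agentic")


@dataclass
class Rollout:
    mode: str
    correct: bool
    cost: float  # hypothetical normalized cost in [0, 1] (e.g. tokens plus tool calls)


def cost_regularized_reward(r: Rollout, lambda_cost: float = 0.2) -> float:
    """Hypothetical reward: 1 for a correct answer minus a cost penalty, 0 otherwise.

    Penalizing only correct rollouts keeps a cheap wrong answer from outscoring an
    expensive right one; this is an illustrative choice, not the paper's formula.
    """
    if not r.correct:
        return 0.0
    return 1.0 - lambda_cost * r.cost


def allocate_rollouts(router_probs: Dict[str, float], budget: int,
                      min_per_mode: int = 1) -> Dict[str, int]:
    """Hypothetical adaptive sampling across modes.

    Every mode gets at least `min_per_mode` rollouts so the policy keeps seeing all
    three behaviors; the rest of the budget follows the router's mode probabilities.
    """
    if budget < min_per_mode * len(MODES):
        raise ValueError("budget too small to give every mode its minimum")
    weights = {m: max(router_probs.get(m, 0.0), 0.0) for m in MODES}
    if sum(weights.values()) == 0.0:
        weights = {m: 1.0 for m in MODES}  # no preference given: fall back to uniform
    total = sum(weights.values())

    counts = {m: min_per_mode for m in MODES}
    remaining = budget - min_per_mode * len(MODES)
    shares = {m: remaining * weights[m] / total for m in MODES}
    for m in MODES:
        counts[m] += int(shares[m])
    # Largest-remainder rounding: give leftover rollouts to the biggest fractional shares.
    leftover = budget - sum(counts.values())
    by_fraction = sorted(MODES, key=lambda m: shares[m] - int(shares[m]), reverse=True)
    for m in by_fraction[:leftover]:
        counts[m] += 1
    return counts


if __name__ == "__main__":
    probs = {"instant": 0.5, "reasoning": 0.3, "agentic": 0.2}  # toy router output
    print(allocate_rollouts(probs, budget=8))
    for r in (Rollout("instant", True, 0.05), Rollout("agentic", True, 0.9),
              Rollout("reasoning", False, 0.6)):
        print(r.mode, round(cost_regularized_reward(r), 3))
```

The per-mode floor is the design choice that matters in this sketch. Without it, a router that already favors one mode would stop producing training signal for the others. The cost term then lets a correct instant answer earn more reward than an equally correct but expensive agentic one.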
