Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning

Filippos Christianos, Georgios Papoudakis, Matthieu Zimmer, Thomas Coste, Zhihao Wu, Jingxuan Chen, Khyati Khandelwal, James Doran, Xidong Feng, Jiacheng Liu, Zheng Xiong, Yicheng Luo, Jianye Hao, Kun Shao, Haitham Bou-Ammar, Jun Wang

From arXiv; paper and appendix, 27 pages

A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL). However, constructing a standalone RL policy that maps perception directly to action encounters severe problems, chief among them a lack of generality across multiple tasks and the need for a large amount of training data. The leading cause is that such a policy cannot effectively integrate prior information into the perception-action cycle. Large language models (LLMs) have emerged as a fundamental way to incorporate cross-domain knowledge into AI agents, but they lack crucial learning and adaptation toward specific decision problems. This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies. Our methodology is motivated by the modularity found in the human brain. The framework constructs intrinsic and extrinsic functions to encode prior understanding of reasoning structures. It also provides the adaptive ability to learn models inside every module or function, consistent with the modular structure of cognitive processes. We describe the framework in depth and compare it with other AI pipelines and existing frameworks. The paper explores practical applications, covering experiments that show the effectiveness of our method. Our results indicate that AI agents perform and adapt far better when organised reasoning and prior knowledge are embedded. This opens the door to more resilient and general AI agent systems.
