To be helpful assistants, AI agents must be aware of their own capabilities and limitations. This includes knowing when to answer from parametric knowledge versus using tools, when to trust tool outputs, and when to abstain or hedge. Such capabilities are hard to teach through supervised fine-tuning, because doing so requires constructing examples that reflect the agent's specific capabilities. We therefore propose a radically new approach to teaching agents what they know: \emph{collaborative self-play}. We construct multi-agent collaborations in which the group is rewarded for collectively arriving at correct answers. The desired meta-knowledge emerges from the incentives built into the structure of the interaction. We focus on small societies of agents that have access to heterogeneous tools (corpus-specific retrieval), and therefore must collaborate to maximize their success while minimizing their effort. Experiments show that group-level rewards for multi-agent communities can induce policies that \emph{transfer} to improve tool use and selective prediction in settings where individual agents are deployed in isolation.