We introduce AGILE (AGent that Interacts and Learns from Environments), a novel framework for LLM agents designed to perform complex conversational tasks with users by leveraging LLMs, memory, tools, and interactions with experts. Beyond conversation, the agent's abilities include reflection, tool use, and consultation with experts. We formulate the construction of such an LLM agent as a reinforcement learning problem in which the LLM serves as the policy model. We fine-tune the LLM using labeled action data and the PPO algorithm. We focus on question answering and release ProductQA, a dataset for agents comprising challenging questions in online shopping. Our extensive experiments on ProductQA and MedMCQA show that AGILE agents based on 13B and 7B LLMs trained with PPO can outperform GPT-4 agents. Our ablation study shows that memory, tools, consultation, reflection, and reinforcement learning are each indispensable to the agent's strong performance.