Language agents have shown some ability to interact with an external environment, such as the ScienceWorld virtual world, to perform complex tasks (e.g., growing a plant) without the startup costs of reinforcement learning. However, despite their zero-shot capabilities, these agents to date do not continually improve over time beyond refining their performance on a single, specific task. Here we present CLIN, the first language-based agent to achieve this: it continually improves over multiple trials, including when both the environment and the task are varied, and it does so without requiring parameter updates. Our approach is to use a persistent, dynamic, textual memory centered on causal abstractions (rather than general "helpful hints"), which is updated after every trial so that the agent gradually accumulates knowledge useful for new trials. In the ScienceWorld benchmark, CLIN continually improves on repeated trials of the same task and environment, outperforming state-of-the-art reflective language agents such as Reflexion by 23 absolute points. CLIN can also transfer its learning to new environments (or new tasks), improving its zero-shot performance by 4 points (13 for new tasks), and can improve further there through continual memory updates, gaining an additional 17 points (7 for new tasks). This suggests a new architecture for agents built on frozen models that can nonetheless continually and rapidly improve over time.
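The abstract does not spell out how the memory is updated. As a rough illustration of the core idea (a persistent textual memory of causal statements, rewritten after each trial while the underlying model stays frozen), the toy sketch below may help. Everything in it is hypothetical: the `CausalMemory` class, the `run_trials` loop, the toy plant-growing environment and its reward, and the "probe one untried action per trial" exploration rule. The statement templates ("X may be necessary to Y", "X does not contribute to Y") are illustrative only, and the hand-coded update rule stands in for the frozen language model that would write the memory in an agent like CLIN.

```python
from dataclasses import dataclass, field


@dataclass
class CausalMemory:
    """Hypothetical persistent textual memory of causal abstractions."""
    goal: str
    # action -> causal statement about that action (the memory's text)
    entries: dict[str, str] = field(default_factory=dict)

    def necessary(self) -> list[str]:
        # Actions the memory currently believes are needed for the goal.
        return [a for a, s in self.entries.items() if "may be necessary" in s]

    def tried(self, action: str) -> bool:
        return action in self.entries

    def update(self, action: str, helped: bool) -> None:
        # Rule-based stand-in for a frozen LM that abstracts a trial into
        # causal statements; no model parameters are changed anywhere.
        if helped:
            self.entries[action] = f"{action} may be necessary to {self.goal}"
        else:
            self.entries[action] = f"{action} does not contribute to {self.goal}"

    def as_text(self) -> str:
        # The memory is plain text, so it could be placed in a prompt.
        return "\n".join(self.entries.values())


def score(actions: list[str], required: set[str]) -> float:
    # Toy environment reward: fraction of required actions performed.
    return len(required & set(actions)) / len(required)


def run_trials(candidates: list[str], required: set[str], goal: str, n_trials: int):
    memory = CausalMemory(goal)  # persists across all trials
    history: list[float] = []
    best = 0.0
    for _ in range(n_trials):
        untried = [a for a in candidates if not memory.tried(a)]
        probe = untried[:1]  # explore at most one new action per trial
        actions = memory.necessary() + probe
        reward = score(actions, required)
        if probe:
            # Credit the new action only if it raised the best score so far.
            memory.update(probe[0], helped=reward > best)
        best = max(best, reward)
        history.append(reward)
    return history, memory


history, memory = run_trials(
    candidates=["look around", "water seed", "open door", "move pot to sunlight"],
    required={"water seed", "move pot to sunlight"},
    goal="grow a plant",
    n_trials=5,
)
print(history)  # [0.0, 0.5, 0.5, 1.0, 1.0]
print(memory.as_text())
```

In this toy run the score rises over repeated trials only because the textual memory changes between them, which mirrors (in a very simplified form) the claim that improvement can come from memory updates rather than parameter updates. Because the memory is text keyed to the goal rather than to a specific environment layout, the same statements could in principle be reused when the environment varies.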