Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control

Building agents with large language models (LLMs) for computer control is a burgeoning research area, where the agent receives computer states and performs actions to complete complex tasks. Previous computer agents have demonstrated the benefits of in-context learning (ICL); however, their performance is hindered by several issues. First, the limited context length of LLMs and complex computer states restrict the number of exemplars, as a single webpage can consume the entire context. Second, the exemplars in current methods, such as high-level plans and multi-choice questions, cannot represent complete trajectories, leading to suboptimal performance in long-horizon tasks. Third, existing computer agents rely on task-specific exemplars and overlook the similarity among tasks, resulting in poor generalization to novel tasks. To address these challenges, we introduce Synapse, a computer agent featuring three key components: i) state abstraction, which filters out task-irrelevant information from raw states, allowing more exemplars within the limited context, ii) trajectory-as-exemplar prompting, which prompts the LLM with complete trajectories of the abstracted states and actions to improve multi-step decision-making, and iii) exemplar memory, which stores the embeddings of exemplars and retrieves them via similarity search for generalization to novel tasks. We evaluate Synapse on MiniWoB++, a standard task suite, and Mind2Web, a real-world website benchmark. In MiniWoB++, Synapse achieves a 99.2% average success rate (a 10% relative improvement) across 64 tasks using demonstrations from only 48 tasks. Notably, Synapse is the first ICL method to solve the book-flight task in MiniWoB++. Synapse also exhibits a 56% relative improvement in average step success rate over the previous state-of-the-art prompting scheme in Mind2Web.
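
The abstract describes exemplar memory and trajectory-as-exemplar prompting only at a high level. The following is a minimal sketch of how the two pieces could fit together. It rests on several assumptions that are not taken from the paper. Each exemplar is keyed by its natural-language task description. A toy bag-of-words vector stands in for a learned text-embedding model. Retrieval is a brute-force cosine-similarity search rather than a vector index. The prompt format (`Task:` / `State:` / `Action:` lines) and the names `Exemplar`, `ExemplarMemory`, `embed`, `cosine`, and `build_prompt` are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Exemplar:
    """One stored demonstration: a task description plus its complete trajectory."""
    task: str
    # Full trajectory as (abstracted_state, action) pairs, not just a high-level plan.
    trajectory: list[tuple[str, str]]


def embed(text: str) -> Counter:
    """Toy embedding (hypothetical stand-in for a learned model): bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class ExemplarMemory:
    """Stores (embedding, exemplar) pairs and retrieves the most similar exemplars."""
    entries: list[tuple[Counter, Exemplar]] = field(default_factory=list)

    def add(self, ex: Exemplar) -> None:
        # Assumption: the retrieval key is the embedding of the task description.
        self.entries.append((embed(ex.task), ex))

    def retrieve(self, query: str, k: int = 1) -> list[Exemplar]:
        # Brute-force similarity search over all stored exemplars.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [ex for _, ex in ranked[:k]]


def build_prompt(exemplars: list[Exemplar], task: str, state: str) -> str:
    """Trajectory-as-exemplar prompt: each retrieved exemplar contributes its full
    state/action sequence, followed by the new task and its current state."""
    lines = []
    for ex in exemplars:
        lines.append(f"Task: {ex.task}")
        for s, a in ex.trajectory:
            lines.append(f"State: {s}")
            lines.append(f"Action: {a}")
    lines += [f"Task: {task}", f"State: {state}", "Action:"]
    return "\n".join(lines)


if __name__ == "__main__":
    memory = ExemplarMemory()
    memory.add(Exemplar("click the submit button", [("button id=1 Submit", "click(1)")]))
    memory.add(Exemplar("enter the username and log in", [("input id=2", "type(2, 'bob')")]))
    query = "click on the OK button"
    print(build_prompt(memory.retrieve(query, k=1), query, "button id=5 OK"))
```

Because the exemplars are retrieved by similarity rather than chosen per task, a demonstration recorded for one task (here, "click the submit button") can be reused for a related but unseen task ("click on the OK button"). This is the generalization property the abstract attributes to exemplar memory. In practice, the toy `embed` would be replaced by a real embedding model and a vector index.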

翻译：基于大语言模型构建计算机控制代理是一个新兴研究领域，其中代理接收计算机状态并执行动作以完成复杂任务。以往的计算机代理已展示出上下文学习的优势，但其性能受若干问题制约：首先，大语言模型有限的上下文长度和复杂的计算机状态限制了示例数量，单个网页可能消耗整个上下文空间；其次，当前方法中的示例（如高层计划与多项选择题）无法表征完整轨迹，导致长程任务表现欠佳；第三，现有计算机代理依赖任务特定示例而忽略任务间相似性，造成对新颖任务的泛化能力薄弱。为解决这些挑战，我们提出Synapse——一种包含三个关键组件的计算机代理：i)状态抽象，从原始状态中过滤任务无关信息，使得有限上下文可容纳更多示例；ii)轨迹作为示例提示，将抽象状态与动作的完整轨迹输入大语言模型以改进多步决策；iii)示例记忆，存储示例嵌入并通过相似性搜索检索以实现对新颖任务的泛化。我们在标准任务套件MiniWoB++及真实网站基准Mind2Web上评估Synapse。在MiniWoB++中，Synapse利用仅48个任务的示范，在64个任务上实现99.2%的平均成功率（相对提升10%）。值得注意的是，Synapse是首个解决MiniWoB++中机票预订任务的上下文学习方法。在Mind2Web中，Synapse相较于先前最优提示方案的平均步骤成功率相对提升56%。