Large language models (LLMs) have been successfully adapted for interactive decision-making tasks such as web navigation. Although they achieve decent performance, previous methods implicitly assume a forward-only execution mode for the model: they provide only oracle trajectories as in-context examples to teach the model how to reason in the interactive environment. Consequently, the model cannot handle more challenging scenarios that the in-context examples do not cover, such as recovering from its own mistakes, which leads to sub-optimal performance. To address this issue, we propose modeling the interactive task as state-space exploration, in which the LLM agent transitions among a pre-defined set of states by performing actions to complete the task. This formulation enables flexible backtracking, allowing the model to recover easily from errors. We evaluate our proposed LLM Agent with State-Space ExploRation (LASER) on the WebShop task. Experimental results show that LASER significantly outperforms previous methods and closes the gap with human performance on the web navigation task.
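The idea of an agent moving among a pre-defined set of states, where backtracking is simply another permitted action, can be illustrated with a toy state machine. The following is a minimal Python sketch under stated assumptions. The state names (`search`, `results`, `item`, `stopping`), the action names, and the helpers `allowed_actions`, `step`, and `run` are all hypothetical. The LLM that would choose each action is replaced by a fixed list of actions.

```python
from typing import Dict, List

# Hypothetical state set and action names for a shopping-style site; these are
# illustrative assumptions, not the paper's exact definitions.
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "search":   {"search": "results"},
    "results":  {"select_item": "item", "next_page": "results", "back": "search"},
    "item":     {"choose_option": "item", "back": "results", "buy": "stopping"},
    "stopping": {},  # terminal state: the task is finished
}


def allowed_actions(state: str) -> List[str]:
    """Actions permitted in `state` (what an LLM agent would be offered)."""
    return sorted(TRANSITIONS[state])


def step(state: str, action: str) -> str:
    """Apply one action and return the next state; reject undefined actions."""
    if action not in TRANSITIONS[state]:
        raise ValueError(f"action {action!r} is not permitted in state {state!r}")
    return TRANSITIONS[state][action]


def run(actions: List[str], start: str = "search") -> List[str]:
    """Replay a sequence of actions from `start` and return the visited states."""
    states = [start]
    for action in actions:
        states.append(step(states[-1], action))
    return states
```

For example, `run(["search", "select_item", "back", "select_item", "buy"])` visits `["search", "results", "item", "results", "item", "stopping"]`. The `back` action lets the agent undo a wrong item selection instead of being locked into a forward-only trajectory. Because each state declares its own permitted actions, the agent only has to choose among a small, state-specific action set at every step.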