Recent advances in Large Language Model (LLM)-based frameworks have extended their capabilities to complex real-world applications such as interactive web navigation. These systems, driven by user commands, operate web browsers to complete tasks through multi-turn dialogues, which offers new opportunities but also poses significant challenges. Although benchmarks for conversational web navigation have been introduced, a detailed understanding of the key contextual components that influence agent performance remains elusive. This study aims to fill this gap by analyzing the contextual elements crucial to the functioning of web navigation agents. We investigate the optimization of context management, focusing on the influence of interaction history and web page representation. We show that effective context management improves agent performance in out-of-distribution scenarios, including unseen websites, categories, and geographic locations. These findings provide insights into the design and optimization of LLM-based agents, enabling more accurate and effective web navigation in real-world applications.