Large Language Models (LLMs) have emerged as promising agents for web navigation tasks, interpreting objectives and interacting with web pages. However, the efficiency of spliced prompts for such tasks remains underexplored. We introduce AllTogether, a standardized prompt template that enhances task context representation, thereby improving LLMs' performance in HTML-based web navigation. We evaluate the efficacy of this approach through prompt learning and instruction finetuning based on open-source Llama-2 and API-accessible GPT models. Our results reveal that larger models such as GPT-4 outperform smaller models in web navigation tasks. Additionally, we find that the lengths of the HTML snippet and the history trajectory significantly influence performance, and that prior step-by-step instructions prove less effective than real-time environmental feedback. Overall, we believe our work provides valuable insights for future research in LLM-driven web agents.