Large language models (LLMs) have demonstrated remarkable capabilities in performing a range of instruction-following tasks in few-shot and zero-shot settings. However, teaching LLMs to perform tasks on the web presents two fundamental challenges: a combinatorially large space of open-world tasks, and variation across web interfaces. We tackle these challenges by leveraging LLMs to decompose web tasks into a collection of sub-tasks, each of which can be solved by a low-level, closed-loop policy. These policies constitute a shared grammar across tasks, i.e., new web tasks can be expressed as compositions of these policies. We propose a novel framework, Hierarchical Policies for Web Actions using LLMs (HeaP), that learns a set of hierarchical LLM prompts from demonstrations for planning high-level tasks and executing them via a sequence of low-level policies. We evaluate HeaP against a range of baselines on a suite of web tasks, including MiniWoB++, WebArena, a mock airline CRM, and live website interactions, and show that it outperforms prior work while using orders of magnitude less data.
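To make the decomposition concrete, the following is a minimal sketch of the two-level structure described above: a high-level planner that turns a task into a sequence of sub-tasks, and low-level closed-loop policies that observe and act until each sub-task is satisfied. It is an illustration under stated assumptions, not HeaP's implementation: `ToyPage`, `plan`, `run`, and the policy names `FILL_TEXT` and `CLICK_SUBMIT` are all hypothetical, and the planner here is a hand-written stub standing in for the LLM prompt that would play that role.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyPage:
    """Hypothetical stand-in for a web page: form fields plus a submitted flag."""
    fields: Dict[str, str] = field(default_factory=dict)
    submitted: bool = False


# A sub-task is a (policy name, arguments) pair. The format is an assumption.
Subtask = Tuple[str, Dict[str, str]]


def fill_text(page: ToyPage, args: Dict[str, str], max_steps: int = 5) -> bool:
    """Closed-loop policy: observe the field, act if it is wrong, then re-check."""
    for _ in range(max_steps):
        if page.fields.get(args["field"]) == args["value"]:  # observe
            return True
        page.fields[args["field"]] = args["value"]  # act
    return False


def click_submit(page: ToyPage, args: Dict[str, str], max_steps: int = 5) -> bool:
    """Closed-loop policy: click until the page reports that it was submitted."""
    for _ in range(max_steps):
        if page.submitted:  # observe
            return True
        page.submitted = True  # act
    return False


# The shared "grammar": a fixed set of policies that tasks are composed from.
POLICIES: Dict[str, Callable[..., bool]] = {
    "FILL_TEXT": fill_text,
    "CLICK_SUBMIT": click_submit,
}


def plan(task: Dict[str, str]) -> List[Subtask]:
    """Stub planner; in an LLM-based system a prompted model would produce this."""
    steps: List[Subtask] = [
        ("FILL_TEXT", {"field": name, "value": value}) for name, value in task.items()
    ]
    steps.append(("CLICK_SUBMIT", {}))
    return steps


def run(task: Dict[str, str], page: ToyPage) -> bool:
    """Execute the planned sub-tasks in order, stopping at the first failure."""
    for name, args in plan(task):
        if not POLICIES[name](page, args):
            return False
    return True
```

A new task, such as a form with different fields, reuses the same two policies and changes only the plan, which is the sense in which the policies act as a shared grammar.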