Large language models (LLMs) struggle to process complicated observations in interactive decision-making tasks. To alleviate this issue, we propose a simple hierarchical prompting approach. Unlike previous prompting approaches, which always place the full observation (e.g., a web page) in the prompt, we first use a dedicated SUMMARIZER prompt to construct an action-aware observation that is more condensed and relevant. The ACTOR prompt then predicts the next action based on the summarized observation. While our method is broadly applicable, we demonstrate its efficacy in the complex domain of web navigation, where a full observation often contains redundant and irrelevant information. Our approach outperforms the previous state-of-the-art prompting mechanism by 6.2% in task success rate, demonstrating its potential for interactive decision-making tasks with long observation traces.
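The two-stage flow described above can be illustrated with a minimal sketch. It assumes the language model is exposed as a plain function from a prompt string to a completion string. The names `LLM`, `SUMMARIZER_TEMPLATE`, `ACTOR_TEMPLATE`, and `hierarchical_step` are hypothetical, and the template wording is illustrative only, since the abstract does not give the actual prompts.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]

# Illustrative templates (assumptions, not the prompts used in the paper).
SUMMARIZER_TEMPLATE = (
    "Instruction: {instruction}\n"
    "Previous actions: {history}\n"
    "Full observation:\n{observation}\n"
    "Keep only the parts relevant to deciding what to do next.\n"
    "Condensed observation:"
)
ACTOR_TEMPLATE = (
    "Instruction: {instruction}\n"
    "Previous actions: {history}\n"
    "Condensed observation:\n{summary}\n"
    "Next action:"
)


def hierarchical_step(
    llm: LLM, instruction: str, observation: str, history: List[str]
) -> Tuple[str, str]:
    """Run one decision step: summarize the observation, then pick an action."""
    history_text = "; ".join(history) if history else "none"

    # Stage 1 (SUMMARIZER): the only call that sees the full observation.
    summary = llm(
        SUMMARIZER_TEMPLATE.format(
            instruction=instruction,
            history=history_text,
            observation=observation,
        )
    ).strip()

    # Stage 2 (ACTOR): predicts the next action from the condensed observation.
    action = llm(
        ACTOR_TEMPLATE.format(
            instruction=instruction,
            history=history_text,
            summary=summary,
        )
    ).strip()
    return summary, action
```

In this sketch the full observation appears only in the first call, so the second prompt stays short no matter how large the page is. Passing the same instruction and action history to both stages is one plausible way to make the summary action-aware; the original method may condition the SUMMARIZER differently.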