Large language models (LLMs) have shown increasing capacity for planning and executing high-level goals in live computer environments (e.g., MiniWoB++). To perform a task, recent work often requires a model to learn from trace examples of the task via either supervised learning or few-/many-shot prompting. Without these trace examples, how an agent can autonomously learn and improve its control of a computer remains an open challenge, which limits an agent's ability to perform new tasks. We approach this problem with a zero-shot agent that requires no expert traces. Our agent plans executable actions in a partially observed environment and iteratively advances a task by identifying and learning from its mistakes via self-reflection and structured thought management. On the easy tasks of MiniWoB++, we show that our zero-shot agent often outperforms recent state-of-the-art (SoTA) methods, with more efficient reasoning. On more complex tasks, our reflective agent performs on par with the prior best models, even though previous works had the advantage of access to expert traces or additional screen information.