Agents capable of carrying out general tasks on a computer can improve efficiency and productivity by automating repetitive tasks and assisting in complex problem-solving. Ideally, such agents should be able to solve new computer tasks presented to them through natural language commands. However, previous approaches to this problem require large amounts of expert demonstrations and task-specific reward functions, both of which are impractical for new tasks. In this work, we show that a pre-trained large language model (LLM) agent can execute computer tasks guided by natural language using a simple prompting scheme where the agent Recursively Criticizes and Improves its output (RCI). The RCI approach significantly outperforms existing LLM methods for automating computer tasks and surpasses supervised learning (SL) and reinforcement learning (RL) approaches on the MiniWoB++ benchmark. We compare multiple LLMs and find that RCI with the InstructGPT-3+RLHF LLM is state-of-the-art on MiniWoB++, using only a handful of demonstrations per task rather than tens of thousands, and without a task-specific reward function. Furthermore, we demonstrate RCI prompting's effectiveness in enhancing LLMs' reasoning abilities on a suite of natural language reasoning tasks, outperforming chain of thought (CoT) prompting. We find that RCI combined with CoT performs better than either separately. Our code can be found here: https://github.com/posgnu/rci-agent.
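The RCI scheme described above can be sketched as a short loop: draft an output, ask the model to criticize it, then ask for an improved output given that critique. This is a minimal illustration and not the code in the linked repository. The names `rci`, `toy_llm`, `CRITIQUE_PROMPT`, and `IMPROVE_PROMPT`, the prompt wording, and the `rounds` parameter are all assumptions introduced here, and `llm` stands for any hypothetical text-in, text-out model call.

```python
from typing import Callable

# Hypothetical prompt templates; the exact wording is an assumption.
CRITIQUE_PROMPT = "Review your previous answer and find problems with it."
IMPROVE_PROMPT = "Based on the problems you found, improve your answer."


def rci(llm: Callable[[str], str], task: str, rounds: int = 1) -> str:
    """Draft an answer, then repeatedly criticize and improve it."""
    answer = llm(task)  # initial draft
    for _ in range(rounds):
        # Ask the model to criticize its own previous output.
        context = f"{task}\n{answer}\n{CRITIQUE_PROMPT}"
        critique = llm(context)
        # Ask for an improved output conditioned on that critique.
        answer = llm(f"{context}\n{critique}\n{IMPROVE_PROMPT}")
    return answer


def toy_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline."""
    if prompt.endswith(IMPROVE_PROMPT):
        return "4"
    if prompt.endswith(CRITIQUE_PROMPT):
        return "The sum is off by one."
    return "5"


if __name__ == "__main__":
    print(rci(toy_llm, "What is 2 + 2?"))
```

With `rounds=0` the function returns the unrevised draft, which makes it easy to compare outputs with and without the critique step. In a computer-task setting the `task` string would presumably also carry the current screen state and the few demonstrations mentioned above, but that wiring is left out of this sketch.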