Agents capable of carrying out general tasks on a computer can improve efficiency and productivity by automating repetitive tasks and assisting in complex problem-solving. Ideally, such agents should be able to solve new computer tasks presented to them through natural language commands. However, previous approaches to this problem require large amounts of expert demonstrations and task-specific reward functions, both of which are impractical for new tasks. In this work, we show that a pre-trained large language model (LLM) agent can execute computer tasks guided by natural language using a simple prompting scheme where the agent Recursively Criticizes and Improves its output (RCI). The RCI approach significantly outperforms existing LLM methods for automating computer tasks and surpasses supervised learning (SL) and reinforcement learning (RL) approaches on the MiniWoB++ benchmark. We compare multiple LLMs and find that RCI with the InstructGPT-3+RLHF LLM is state-of-the-art on MiniWoB++, using only a handful of demonstrations per task rather than tens of thousands, and without a task-specific reward function. Furthermore, we demonstrate RCI prompting's effectiveness in enhancing LLMs' reasoning abilities on a suite of natural language reasoning tasks, outperforming chain of thought (CoT) prompting with external feedback. We find that RCI combined with CoT performs better than either separately. Our code is available at https://github.com/posgnu/rci-agent.
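The recursive criticize-and-improve idea can be illustrated with a minimal sketch, assuming the LLM is any function that maps a prompt string to a completion string. The prompt wording, the fixed `rounds` stopping rule, and the names `rci` and `toy_llm` are hypothetical illustrations of the general loop. They are not claimed to be the exact prompts, stopping criterion, or task-grounding steps used in the released code. The toy model stands in for a real LLM so that the loop can run without an API.

```python
from typing import Callable, List

# Hypothetical interface: an LLM is any function from prompt text to completion text.
LLM = Callable[[str], str]

# Illustrative critique and improvement instructions (assumed wording).
CRITIQUE_PROMPT = "Review your previous answer and find problems with your answer."
IMPROVE_PROMPT = "Based on the problems you found, improve your answer."


def rci(task: str, llm: LLM, rounds: int = 2) -> str:
    """Sketch of a Recursively Criticize and Improve (RCI) loop.

    The model produces an initial answer, then `rounds` times it critiques its
    latest answer and writes a revised answer conditioned on that critique.
    The full transcript is passed back to the model at every step.
    """
    transcript: List[str] = [f"Task: {task}"]
    answer = llm("\n".join(transcript))
    transcript.append(f"Answer: {answer}")
    for _ in range(rounds):
        # Criticize: ask the model to find problems with its latest answer.
        transcript.append(CRITIQUE_PROMPT)
        critique = llm("\n".join(transcript))
        transcript.append(f"Critique: {critique}")
        # Improve: ask for a revised answer in light of the critique.
        transcript.append(IMPROVE_PROMPT)
        answer = llm("\n".join(transcript))
        transcript.append(f"Answer: {answer}")
    return answer


def toy_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model, used only to exercise the loop."""
    if prompt.splitlines()[-1] == CRITIQUE_PROMPT:
        return "The answer may be incomplete."
    # Label each draft by how many answers already appear in the transcript.
    return f"draft {prompt.count('Answer:')}"


if __name__ == "__main__":
    print(rci("Click the submit button.", toy_llm, rounds=2))
```

With two rounds, the toy model is called five times (one initial answer plus a critique and an improvement per round), and the returned answer is the last revision, `draft 2`. Passing the whole transcript back at each step lets the critique see the answer it targets, and lets the improvement step see both.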