Recent advancements in decision-making large language model (LLM) agents have demonstrated impressive performance across various benchmarks. However, these state-of-the-art approaches typically require internal model fine-tuning, external model fine-tuning, or policy optimization over a defined state space. Implementing these methods can be difficult because high-quality training data is scarce or because a well-defined state space is lacking. Moreover, these agents lack certain qualities inherent to human decision-making, specifically the ability to learn from mistakes. Self-reflection allows humans to solve novel problems efficiently through trial and error. Building on recent research, we propose Reflexion, an approach that endows an agent with dynamic memory and self-reflection capabilities to enhance its existing reasoning trace and task-specific action choice abilities. To achieve full automation, we introduce a straightforward yet effective heuristic that enables the agent to pinpoint hallucination instances, avoid repetition in action sequences, and, in some environments, construct an internal memory map of the given environment. To assess our approach, we evaluate the agent's ability to complete decision-making tasks in AlfWorld environments and knowledge-intensive, search-based question-and-answer tasks in HotPotQA environments. We observe success rates of 97% and 51%, respectively, and provide a discussion on the emergent property of self-reflection.
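As an illustration of the kind of loop the abstract describes, the following is a minimal sketch, not the authors' implementation. It assumes that a stalled or hallucinated trajectory can be flagged when the same action and observation pair repeats several times in a row, or when a trial runs too long. The names `should_reflect`, `ReflectiveAgent`, `max_repeats`, `max_steps`, and `memory_size` are hypothetical, as are their default values, and the `act` and `reflect` callables stand in for LLM calls. The internal memory map mentioned in the abstract is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Step = Tuple[str, str]  # (action, observation)


def should_reflect(history: List[Step], max_repeats: int = 3,
                   max_steps: int = 30) -> bool:
    """Assumed heuristic: flag a trial that is too long or stuck in a loop.

    The thresholds are illustrative defaults, not values from the source.
    """
    if len(history) >= max_steps:
        return True
    if len(history) < max_repeats:
        return False
    tail = history[-max_repeats:]
    # Stuck if the last `max_repeats` (action, observation) pairs are identical.
    return all(step == tail[0] for step in tail)


@dataclass
class ReflectiveAgent:
    # Hypothetical stand-ins for LLM calls.
    act: Callable[[str, List[str]], str]   # (observation, memory) -> action
    reflect: Callable[[List[Step]], str]   # failed trajectory -> lesson text
    memory: List[str] = field(default_factory=list)
    memory_size: int = 3                   # keep only the most recent lessons

    def run_trial(self, env_step: Callable[[str], str], first_obs: str,
                  is_done: Callable[[str], bool]) -> bool:
        """Run one trial; on a flagged failure, store a reflection and stop."""
        history: List[Step] = []
        obs = first_obs
        while True:
            action = self.act(obs, self.memory)
            obs = env_step(action)
            history.append((action, obs))
            if is_done(obs):
                return True
            if should_reflect(history):
                # Store the lesson so the next trial can condition on it.
                self.memory.append(self.reflect(history))
                self.memory = self.memory[-self.memory_size:]
                return False
```

A caller would invoke `run_trial` repeatedly on the same task until it returns `True` or a trial budget is exhausted. Bounding the memory to a few recent reflections is a design choice made here to keep the prompt short, and is likewise an assumption.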