March 20, 2023

Reflexion: an autonomous agent with dynamic memory and self-reflection


Noah Shinn, Beck Labash, Ashwin Gopinath

Recent advancements in decision-making large language model (LLM) agents have demonstrated impressive performance across various benchmarks. However, these state-of-the-art approaches typically necessitate internal model fine-tuning, external model fine-tuning, or policy optimization over a defined state space. Implementing these methods can prove challenging due to the scarcity of high-quality training data or the lack of well-defined state space. Moreover, these agents do not possess certain qualities inherent to human decision-making processes, specifically the ability to learn from mistakes. Self-reflection allows humans to efficiently solve novel problems through a process of trial and error. Building on recent research, we propose Reflexion, an approach that endows an agent with dynamic memory and self-reflection capabilities to enhance its existing reasoning trace and task-specific action choice abilities. To achieve full automation, we introduce a straightforward yet effective heuristic that enables the agent to pinpoint hallucination instances, avoid repetition in action sequences, and, in some environments, construct an internal memory map of the given environment. To assess our approach, we evaluate the agent's ability to complete decision-making tasks in AlfWorld environments and knowledge-intensive, search-based question-and-answer tasks in HotPotQA environments. We observe success rates of 97% and 51%, respectively, and provide a discussion on the emergent property of self-reflection.
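The abstract describes a trial-and-error loop in which a heuristic decides when a trial has gone wrong and a self-reflection is stored in memory for the next attempt. The following is a minimal sketch of that idea, not the authors' implementation. Every name here (`should_reflect`, `ReflexionLoop`, the toy policy and environment) is hypothetical, and the thresholds of 3 repeated steps and 30 total steps are illustrative assumptions rather than values stated in the abstract. In a real agent, `act` and `reflect` would presumably be LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, str]  # (action, observation)


def should_reflect(history: List[Step], max_repeats: int = 3, max_steps: int = 30) -> bool:
    """Hypothetical heuristic: flag a trial that is stuck in a loop or too long."""
    if len(history) >= max_steps:
        return True
    if len(history) >= max_repeats:
        tail = history[-max_repeats:]
        # The same action producing the same observation several times in a row
        # is treated here as a sign of a repeated or hallucinated action.
        return all(step == tail[0] for step in tail)
    return False


@dataclass
class ReflexionLoop:
    act: Callable[[str, List[str], List[Step]], str]  # (task, memory, history) -> action
    step: Callable[[str], Tuple[str, bool]]           # action -> (observation, done)
    reflect: Callable[[str, List[Step]], str]         # (task, failed history) -> reflection
    memory: List[str] = field(default_factory=list)
    memory_size: int = 3                              # assumed cap on stored reflections

    def run_trial(self, task: str) -> Tuple[bool, List[Step]]:
        history: List[Step] = []
        while not should_reflect(history):
            action = self.act(task, self.memory, history)
            observation, done = self.step(action)
            history.append((action, observation))
            if done:
                return True, history
        return False, history

    def solve(self, task: str, max_trials: int = 5) -> Optional[int]:
        """Return the number of trials needed, or None if every trial failed."""
        for trial in range(1, max_trials + 1):
            success, history = self.run_trial(task)
            if success:
                return trial
            # Store a verbal reflection on the failure and keep only the newest few.
            self.memory.append(self.reflect(task, history))
            self.memory = self.memory[-self.memory_size:]
        return None


# Toy stand-ins for the policy, the environment, and the reflection step.
def toy_act(task: str, memory: List[str], history: List[Step]) -> str:
    return "open cabinet" if memory else "open drawer"


def toy_step(action: str) -> Tuple[str, bool]:
    if action == "open cabinet":
        return "You find the key.", True
    return "Nothing happens.", False


def toy_reflect(task: str, history: List[Step]) -> str:
    return "Opening the drawer did nothing; try the cabinet instead."


agent = ReflexionLoop(act=toy_act, step=toy_step, reflect=toy_reflect)
trials_needed = agent.solve("find the key")
print(trials_needed, agent.memory)
```

In this toy run the first trial repeats the same failing action until the heuristic cuts it off, and the stored reflection changes the action chosen in the second trial. No model weights are updated, which matches the abstract's contrast with fine-tuning and policy optimization.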

