Causal Confusion and Reward Misidentification in Preference-Based Reward Learning


March 18, 2023


Jeremy Tien, Jerry Zhi-Yang He, Zackory Erickson, Anca D. Dragan, Daniel S. Brown

From arXiv. In the Proceedings of the Eleventh International Conference on Learning Representations (ICLR 2023). https://iclr.cc/virtual/2023/poster/10822

Learning policies via preference-based reward learning is an increasingly popular method for customizing agent behavior, but it has been shown anecdotally to be prone to spurious correlations and reward hacking behaviors. While much prior work focuses on causal confusion in reinforcement learning and behavioral cloning, we present a systematic study of causal confusion and reward misidentification when learning from preferences. In particular, we perform a series of sensitivity and ablation analyses on several benchmark domains where rewards learned from preferences achieve minimal test error but fail to generalize to out-of-distribution states, resulting in poor policy performance when optimized. We find that the presence of non-causal distractor features, noise in the stated preferences, and partial state observability can all exacerbate reward misidentification. We also identify a set of methods with which to interpret misidentified learned rewards. In general, we observe that optimizing misidentified rewards drives the policy off the reward's training distribution, resulting in high predicted (learned) rewards but low true rewards. These findings illuminate the susceptibility of preference learning to reward misidentification and causal confusion: failure to consider even one of many factors can result in unexpected, undesirable behavior.
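The central failure mode described above (low test error on preference data, yet a learned reward that favors the wrong states once optimized) can be reproduced in a deliberately tiny, hypothetical setting. The sketch below assumes a linear reward over two hand-made features, a causal one and a non-causal distractor, and fits it with the Bradley-Terry (logistic) pairwise-preference loss that is commonly used in preference-based reward learning. The feature setup, function names, data, and candidate states are illustrative assumptions, not the paper's benchmarks or code. Because the distractor is perfectly correlated with the causal feature in every training pair, the preferences cannot separate the two, so the learner splits credit between them.

```python
import math
import random

# Hypothetical toy setup (not the paper's benchmarks): each trajectory is
# summarized by two features, (causal, distractor). The true reward depends
# only on the causal feature.
TRUE_WEIGHTS = (1.0, 0.0)


def reward(weights, features):
    """Linear reward: dot product of a weight vector and a feature vector."""
    return sum(w * f for w, f in zip(weights, features))


def sample_training_trajectory(rng):
    """In training data the distractor is perfectly confounded with the causal feature."""
    c = rng.random()
    return (c, c)


def make_preferences(rng, n_pairs):
    """Return (a, b, label) triples; label is 1.0 if a has the higher true return."""
    data = []
    for _ in range(n_pairs):
        a = sample_training_trajectory(rng)
        b = sample_training_trajectory(rng)
        label = 1.0 if reward(TRUE_WEIGHTS, a) > reward(TRUE_WEIGHTS, b) else 0.0
        data.append((a, b, label))
    return data


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fit_bradley_terry(data, lr=0.5, steps=500, l2=1e-3):
    """Full-batch gradient descent on the Bradley-Terry preference loss.

    Models P(a preferred over b) = sigmoid(r(a) - r(b)) with r linear in the
    features, plus an L2 penalty l2 * ||w||^2.
    """
    w = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for a, b, label in data:
            diff = [fa - fb for fa, fb in zip(a, b)]
            p = sigmoid(reward(w, diff))
            for i in range(2):
                grad[i] += (p - label) * diff[i] / len(data)
        for i in range(2):
            w[i] -= lr * (grad[i] + 2.0 * l2 * w[i])
    return w


def pairwise_accuracy(weights, data):
    """Fraction of pairs the learned reward orders the same way as the label."""
    correct = 0
    for a, b, label in data:
        predicted = 1.0 if reward(weights, a) > reward(weights, b) else 0.0
        correct += predicted == label
    return correct / len(data)


if __name__ == "__main__":
    rng = random.Random(0)
    train = make_preferences(rng, 200)
    held_out = make_preferences(rng, 200)
    w = fit_bradley_terry(train)
    print("learned weights (causal, distractor):", w)
    print("in-distribution pairwise accuracy:", pairwise_accuracy(w, held_out))

    # Off-distribution the distractor can vary on its own (hypothetical states).
    # A policy that greedily maximizes the learned reward picks the "hacked" state.
    candidates = [(1.0, 0.0), (0.0, 3.0)]
    chosen = max(candidates, key=lambda s: reward(w, s))
    print("chosen:", chosen, "learned:", reward(w, chosen),
          "true:", reward(TRUE_WEIGHTS, chosen))
```

With perfect confounding, every preference difference has identical components, so gradient descent from zero keeps the two learned weights exactly equal. Held-out pairs from the training distribution are still ranked correctly, but once the distractor is decoupled from the causal feature, the learned reward prefers the state with a large distractor and zero true reward. Natural variations to try are adding independent noise to the distractor or flipping a fraction of the labels to mimic noisy stated preferences; the perfectly confounded case is used here only because it makes the outcome exact.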

