Using Machine Teaching to Investigate Human Assumptions when Teaching Reinforcement Learners - 专知论文

会员服务 ·

0

学习器 · Learning · 优化器 · Guidance · 回合 ·

2023 年 6 月 7 日

Using Machine Teaching to Investigate Human Assumptions when Teaching Reinforcement Learners

翻译：使用机器教学探究人类在教授强化学习智能体时的假设

Yun-Shiuan Chuang,Xuezhou Zhang,Yuzhe Ma,Mark K. Ho,Joseph L. Austerweil,Xiaojin Zhu

from arxiv, 21 pages, 4 figures

Successful teaching requires an assumption of how the learner learns - how the learner uses experiences from the world to update their internal states. We investigate what expectations people have about a learner when they teach them in an online manner using rewards and punishment. We focus on a common reinforcement learning method, Q-learning, and examine what assumptions people have using a behavioral experiment. To do so, we first establish a normative standard, by formulating the problem as a machine teaching optimization problem. To solve the machine teaching optimization problem, we use a deep learning approximation method which simulates learners in the environment and learns to predict how feedback affects the learner's internal states. What do people assume about a learner's learning and discount rates when they teach them an idealized exploration-exploitation task? In a behavioral experiment, we find that people can teach the task to Q-learners in a relatively efficient and effective manner when the learner uses a small value for its discounting rate and a large value for its learning rate. However, they still are suboptimal. We also find that providing people with real-time updates of how possible feedback would affect the Q-learner's internal states weakly helps them teach. Our results reveal how people teach using evaluative feedback and provide guidance for how engineers should design machine agents in a manner that is intuitive for people.

翻译：成功的教学需要对学习者如何学习——即学习者如何利用来自世界的经验更新其内部状态——作出假设。我们研究人们以在线方式使用奖励和惩罚进行教学时，对学习者抱有怎样的期望。我们聚焦于一种常见的强化学习方法——Q学习，并通过行为实验考察人们的假设。为此，我们首先将问题形式化为机器教学优化问题，以建立规范性标准。为解决该机器教学优化问题，我们采用一种深度学习近似方法，该方法在环境中模拟学习者，并学习预测反馈如何影响学习者的内部状态。当人们教授一个理想化的探索-利用任务时，他们对学习者的学习率和贴现率有何假设？在一项行为实验中，我们发现，当学习者使用较小的贴现率和较大的学习率时，人们能够以相对高效且有效的方式将任务教授给Q学习智能体。然而，他们的教学仍非最优。我们还发现，向人们提供关于可能的反馈如何影响Q学习智能体内部状态的实时更新，对其教学帮助甚微。我们的结果揭示了人们如何使用评价性反馈进行教学，并为工程师设计符合人类直觉的机器智能体提供了指导。

0

相关内容

学习器

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

55+阅读 · 2020年9月7日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

61+阅读 · 2019年10月17日

开源书：PyTorch深度学习起步

开源书：PyTorch深度学习起步

专知会员服务

51+阅读 · 2019年10月11日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

80+阅读 · 2019年10月10日

TensorFlow 2.0 学习资源汇总

TensorFlow 2.0 学习资源汇总

专知会员服务

67+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

新型双组份Camassa-Holm方程的等谱问题及适定性研究

国家自然科学基金

0+阅读 · 2015年12月31日

RMND5A基因在遗传性泛发性色素异常症中的致病机理研究

国家自然科学基金

0+阅读 · 2015年12月31日

套子代数的Hochschild上同调及套的分类

国家自然科学基金

3+阅读 · 2014年12月31日

云南地区环境与工程因素诱发滑坡的力学机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

表面等离子体共振的椭偏测量与超薄介质层表征

国家自然科学基金

0+阅读 · 2013年12月31日

Witten Laplacian的特征值及与其相关的Ricci Soliton研究

国家自然科学基金

0+阅读 · 2012年12月31日

《计算机研究与发展》学术期刊

国家自然科学基金

1+阅读 · 2011年12月31日

Davey-Stewartson 型方程组的适定性和爆破性研究

国家自然科学基金

1+阅读 · 2011年12月31日

类群问题和数论运用

国家自然科学基金

1+阅读 · 2011年12月31日

Experimental Study on Reinforcement Learning-based Control of an Acrobot

Arxiv

0+阅读 · 2023年7月27日

Evaluation of Safety Constraints in Autonomous Navigation with Deep Reinforcement Learning

Arxiv

0+阅读 · 2023年7月27日

Learning Task Automata for Reinforcement Learning using Hidden Markov Models

Arxiv

0+阅读 · 2023年7月26日

LiDAR-based drone navigation with reinforcement learning

Arxiv

0+阅读 · 2023年7月26日

Sim-to-Real Model-Based and Model-Free Deep Reinforcement Learning for Tactile Pushing

Arxiv

0+阅读 · 2023年7月26日

AI and Education: An Investigation into the Use of ChatGPT for Systems Thinking

Arxiv

0+阅读 · 2023年7月26日

Comprehensive Survey of Ternary Full Adders: Statistics, Corrections, and Assessments

Arxiv

0+阅读 · 2023年7月26日

Controlling the Latent Space of GANs through Reinforcement Learning: A Case Study on Task-based Image-to-Image Translation

Arxiv

0+阅读 · 2023年7月26日

Transformers are Meta-Reinforcement Learners

Arxiv

15+阅读 · 2022年6月14日

Recent Advances in Reinforcement Learning in Finance

Arxiv

11+阅读 · 2021年12月8日

VIP会员

文章信息

相关主题

最新内容

《无人机对海面作战影响评估》

《无人机对海面作战影响评估》

专知会员服务

1+阅读 · 今天15:30

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

专知会员服务

2+阅读 · 今天15:27

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

专知会员服务

0+阅读 · 今天15:00

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

专知会员服务

0+阅读 · 今天14:55

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

专知会员服务

1+阅读 · 今天8:28

印度精确打击与指挥架构的断层

印度精确打击与指挥架构的断层

专知会员服务

4+阅读 · 7月20日

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

专知会员服务

6+阅读 · 7月20日

美空军AI完成F-16战斗机自主空战历史性试飞

美空军AI完成F-16战斗机自主空战历史性试飞

专知会员服务

6+阅读 · 7月20日

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

专知会员服务

6+阅读 · 7月20日

《美国陆军：通过弹性分布式模型库实现自适应AI优势》

《美国陆军：通过弹性分布式模型库实现自适应AI优势》

专知会员服务

4+阅读 · 7月20日

博士论文 | 理解与改进大语言模型推理：从反转诅咒到连续思维链

博士论文 | 理解与改进大语言模型推理：从反转诅咒到连续思维链

专知会员服务

7+阅读 · 7月20日

综述 | 终身视觉表征：持续自监督学习CSSL系统综述

综述 | 终身视觉表征：持续自监督学习CSSL系统综述

专知会员服务

7+阅读 · 7月20日

深入Project Maven：为何人工智能在战场上依然失灵

深入Project Maven：为何人工智能在战场上依然失灵

专知会员服务

14+阅读 · 7月19日

锻造未来士兵：外骨骼、基因工程与赛博格

锻造未来士兵：外骨骼、基因工程与赛博格

专知会员服务

7+阅读 · 7月19日

《无人机系统（UAS）通信网状网络试验性部署》50页报告

《无人机系统（UAS）通信网状网络试验性部署》50页报告

专知会员服务

10+阅读 · 7月19日

相关VIP内容

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

55+阅读 · 2020年9月7日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

61+阅读 · 2019年10月17日

开源书：PyTorch深度学习起步

开源书：PyTorch深度学习起步

专知会员服务

51+阅读 · 2019年10月11日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

80+阅读 · 2019年10月10日

TensorFlow 2.0 学习资源汇总

TensorFlow 2.0 学习资源汇总

专知会员服务

67+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

《无人机对海面作战影响评估》

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

相关资讯

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

相关论文

Experimental Study on Reinforcement Learning-based Control of an Acrobot

Arxiv

0+阅读 · 2023年7月27日

Evaluation of Safety Constraints in Autonomous Navigation with Deep Reinforcement Learning

Arxiv

0+阅读 · 2023年7月27日

Learning Task Automata for Reinforcement Learning using Hidden Markov Models

Arxiv

0+阅读 · 2023年7月26日

LiDAR-based drone navigation with reinforcement learning

Arxiv

0+阅读 · 2023年7月26日

Sim-to-Real Model-Based and Model-Free Deep Reinforcement Learning for Tactile Pushing

Arxiv

0+阅读 · 2023年7月26日

AI and Education: An Investigation into the Use of ChatGPT for Systems Thinking

Arxiv

0+阅读 · 2023年7月26日

Comprehensive Survey of Ternary Full Adders: Statistics, Corrections, and Assessments

Arxiv

0+阅读 · 2023年7月26日

Controlling the Latent Space of GANs through Reinforcement Learning: A Case Study on Task-based Image-to-Image Translation

Arxiv

0+阅读 · 2023年7月26日

Transformers are Meta-Reinforcement Learners

Arxiv

15+阅读 · 2022年6月14日

Recent Advances in Reinforcement Learning in Finance

Arxiv

11+阅读 · 2021年12月8日

相关基金

新型双组份Camassa-Holm方程的等谱问题及适定性研究

国家自然科学基金

0+阅读 · 2015年12月31日

RMND5A基因在遗传性泛发性色素异常症中的致病机理研究

国家自然科学基金

0+阅读 · 2015年12月31日

套子代数的Hochschild上同调及套的分类

国家自然科学基金

3+阅读 · 2014年12月31日

云南地区环境与工程因素诱发滑坡的力学机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

表面等离子体共振的椭偏测量与超薄介质层表征

国家自然科学基金

0+阅读 · 2013年12月31日

Witten Laplacian的特征值及与其相关的Ricci Soliton研究

国家自然科学基金

0+阅读 · 2012年12月31日

《计算机研究与发展》学术期刊

国家自然科学基金

1+阅读 · 2011年12月31日

Davey-Stewartson 型方程组的适定性和爆破性研究

国家自然科学基金

1+阅读 · 2011年12月31日

类群问题和数论运用

国家自然科学基金

1+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员