Human-AI Collaboration with Bandit Feedback - 专知论文

会员服务 ·

0

赌博机/老虎机 · Performance · Performer · 情景 · TEAM ·

2021 年 5 月 22 日

Human-AI Collaboration with Bandit Feedback

翻译：与强盗反馈协作

Ruijiang Gao,Maytal Saar-Tsechansky,Maria De-Arteaga,Ligong Han,Min Kyung Lee,Matthew Lease

from arxiv, Accepted at IJCAI 2021

Human-machine complementarity is important when neither the algorithm nor the human yield dominant performance across all instances in a given domain. Most research on algorithmic decision-making solely centers on the algorithm's performance, while recent work that explores human-machine collaboration has framed the decision-making problems as classification tasks. In this paper, we first propose and then develop a solution for a novel human-machine collaboration problem in a bandit feedback setting. Our solution aims to exploit the human-machine complementarity to maximize decision rewards. We then extend our approach to settings with multiple human decision makers. We demonstrate the effectiveness of our proposed methods using both synthetic and real human responses, and find that our methods outperform both the algorithm and the human when they each make decisions on their own. We also show how personalized routing in the presence of multiple human decision-makers can further improve the human-machine team performance.

翻译：当算法和人类在特定领域的所有情况下都无法产生主导性业绩时,人类机器的互补性是重要的。关于算法和人类产生的主要性能,大多数关于算法决策的研究都仅仅以算法的性能为中心,而最近探索人体机器合作的工作则将决策问题定为分类任务。在本文中,我们首先提出,然后为在强盗反馈环境中出现新的人体机器合作问题制定解决办法。我们的解决方案旨在利用人体机器的互补性,以尽量扩大决策奖赏。我们然后将我们的方法扩大到多重人类决策者的场合。我们用合成和真实的人类反应来展示我们拟议方法的有效性,我们发现我们的方法在每个人自己做决定时都超越了算法和人。我们还展示了在多个人类决策者在场的情况下个人化模式如何进一步提高人类机器团队的性能。

0

相关内容

赌博机/老虎机

赌博机/老虎机

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

对话推荐系统综述论文，35页pdf，A Survey on Conversational Recommender Systems

对话推荐系统综述论文，35页pdf，A Survey on Conversational Recommender Systems

专知会员服务

117+阅读 · 2020年4月3日

【反馈循环自编码器】FEEDBACK RECURRENT AUTOENCODER

【反馈循环自编码器】FEEDBACK RECURRENT AUTOENCODER

专知会员服务

23+阅读 · 2020年1月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

【RecSys 2019报告】基于对话的推荐（Context Adaptation with Session‐based Recommenders）

【RecSys 2019报告】基于对话的推荐（Context Adaptation with Session‐based Recommenders）

专知会员服务

33+阅读 · 2019年9月20日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

LibRec 精选：位置感知的长序列会话推荐

LibRec 精选：位置感知的长序列会话推荐

LibRec智能推荐

3+阅读 · 2019年5月17日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

Centralized Model and Exploration Policy for Multi-Agent RL

Arxiv

0+阅读 · 2021年7月14日

Real Negatives Matter: Continuous Training with Real Negatives for Delayed Feedback Modeling

Arxiv

8+阅读 · 2021年4月29日

Model-based Adversarial Meta-Reinforcement Learning

Arxiv

5+阅读 · 2020年6月16日

Image Retrieval with Mixed Initiative and Multimodal Feedback

Arxiv

5+阅读 · 2018年5月8日

Dialog-based Interactive Image Retrieval

Arxiv

5+阅读 · 2018年5月1日

Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems

Arxiv

6+阅读 · 2018年4月18日

Can Neural Machine Translation be Improved with User Feedback?

Arxiv

3+阅读 · 2018年4月16日

Human Interaction with Recommendation Systems

Arxiv

6+阅读 · 2018年3月28日

Collaborative Filtering with Topic and Social Latent Factors Incorporating Implicit Feedback

Arxiv

7+阅读 · 2018年3月26日

Modeling Others using Oneself in Multi-Agent Reinforcement Learning

Arxiv

4+阅读 · 2018年3月22日

VIP会员

文章信息

相关主题

赌博机/老虎机

最新内容

综述 | 从问答到任务完成：Agent系统与Harness设计

综述 | 从问答到任务完成：Agent系统与Harness设计

专知会员服务

1+阅读 · 今天16:54

Agentic RL：框架、实践与长程智能体训练

Agentic RL：框架、实践与长程智能体训练

专知会员服务

1+阅读 · 今天16:52

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

专知会员服务

6+阅读 · 今天8:00

重新思考无人机时代的生存能力

重新思考无人机时代的生存能力

专知会员服务

5+阅读 · 今天7:44

装甲突击旅：现代战争思考、战斗与组织

装甲突击旅：现代战争思考、战斗与组织

专知会员服务

4+阅读 · 今天7:28

在人工智能加速决策环境中拓展OODA循环

在人工智能加速决策环境中拓展OODA循环

专知会员服务

4+阅读 · 今天7:18

《廉价自杀式无人机战争的军事战略影响：乌克兰与伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰与伊朗案例研究》

专知会员服务

5+阅读 · 今天7:07

军事欺骗：供作战战术指挥官使用的工具

军事欺骗：供作战战术指挥官使用的工具

专知会员服务

4+阅读 · 今天7:03

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

专知会员服务

4+阅读 · 6月23日

综述 | 世界动作模型：少做梦，多行动

综述 | 世界动作模型：少做梦，多行动

专知会员服务

6+阅读 · 6月23日

美以伊冲突：无人机与人工智能的运用

美以伊冲突：无人机与人工智能的运用

专知会员服务

10+阅读 · 6月23日

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

专知会员服务

4+阅读 · 6月23日

《特种部队在透明战场中的生存力》最新报告

《特种部队在透明战场中的生存力》最新报告

专知会员服务

5+阅读 · 6月23日

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

专知会员服务

8+阅读 · 6月23日

《人工智能生成的零日漏洞：对未来作战的影响》

《人工智能生成的零日漏洞：对未来作战的影响》

专知会员服务

7+阅读 · 6月23日

相关VIP内容

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

对话推荐系统综述论文，35页pdf，A Survey on Conversational Recommender Systems

对话推荐系统综述论文，35页pdf，A Survey on Conversational Recommender Systems

专知会员服务

117+阅读 · 2020年4月3日

【反馈循环自编码器】FEEDBACK RECURRENT AUTOENCODER

【反馈循环自编码器】FEEDBACK RECURRENT AUTOENCODER

专知会员服务

23+阅读 · 2020年1月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

【RecSys 2019报告】基于对话的推荐（Context Adaptation with Session‐based Recommenders）

【RecSys 2019报告】基于对话的推荐（Context Adaptation with Session‐based Recommenders）

专知会员服务

33+阅读 · 2019年9月20日

热门VIP内容

开通专知VIP会员享更多权益服务

Agentic RL：框架、实践与长程智能体训练

重新思考无人机时代的生存能力

综述 | 从问答到任务完成：Agent系统与Harness设计

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

LibRec 精选：位置感知的长序列会话推荐

LibRec 精选：位置感知的长序列会话推荐

LibRec智能推荐

3+阅读 · 2019年5月17日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

相关论文

Centralized Model and Exploration Policy for Multi-Agent RL

Arxiv

0+阅读 · 2021年7月14日

Real Negatives Matter: Continuous Training with Real Negatives for Delayed Feedback Modeling

Arxiv

8+阅读 · 2021年4月29日

Model-based Adversarial Meta-Reinforcement Learning

Arxiv

5+阅读 · 2020年6月16日

Image Retrieval with Mixed Initiative and Multimodal Feedback

Arxiv

5+阅读 · 2018年5月8日

Dialog-based Interactive Image Retrieval

Arxiv

5+阅读 · 2018年5月1日

Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems

Arxiv

6+阅读 · 2018年4月18日

Can Neural Machine Translation be Improved with User Feedback?

Arxiv

3+阅读 · 2018年4月16日

Human Interaction with Recommendation Systems

Arxiv

6+阅读 · 2018年3月28日

Collaborative Filtering with Topic and Social Latent Factors Incorporating Implicit Feedback

Arxiv

7+阅读 · 2018年3月26日

Modeling Others using Oneself in Multi-Agent Reinforcement Learning

Arxiv

4+阅读 · 2018年3月22日

微信扫码咨询专知VIP会员