Repeated Principal-Agent Games with Unobserved Agent Rewards and Perfect-Knowledge Agents - Zhuanzhi Papers


Game theory · Construction · Sustainable transportation · Design · Factor

April 14, 2023



Ilgin Dogan, Zuo-Jun Max Shen, Anil Aswani

From arXiv; 50 pages, 4 figures

Motivated by real-world applications in domains such as healthcare and sustainable transportation, this paper studies repeated principal-agent games within a multi-armed bandit (MAB) framework in which the principal offers a separate incentive for each bandit arm, the agent picks the arm that maximizes its own expected reward plus incentive, and the principal observes which arm is chosen and receives its own reward (different from the agent's) for that arm. Designing policies for the principal is challenging because the principal cannot observe the reward the agent receives for its chosen actions, and therefore cannot learn the agent's expected rewards with existing estimation techniques. As a result, policy design for this and similar scenarios remains largely unexplored. We construct a policy that achieves low regret (i.e., square-root regret up to a logarithmic factor) for the case where the agent has perfect knowledge of its own expected reward for each bandit arm. We design the policy by first constructing an estimator of the agent's expected reward for each arm. Because the estimator uses as data only the sequence of offered incentives and the arms subsequently chosen, the principal's estimation problem can be viewed as an analogue of online inverse optimization in MABs. We then construct a policy and prove that it achieves low regret by deriving finite-sample concentration bounds for the estimator. We conclude with numerical simulations demonstrating the applicability of our policy to a real-life setting from collaborative transportation planning.
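To illustrate why the offered incentives and the chosen arms carry information about the agent's hidden rewards, here is a minimal sketch that simulates a perfect-knowledge agent and keeps interval bounds on the gaps between its expected rewards. It is not the authors' estimator. It is a hypothetical construction (the names `agent_choice` and `GapBoundEstimator`, the uniform exploratory incentives, and the reward values are all assumptions) built on one fact implied by the model: if the agent picks arm a under incentives π, then μ_a + π_a ≥ μ_j + π_j for every arm j. Such choices only reveal differences between the μ values, so the sketch estimates gaps relative to arm 0.

```python
import math
import random


def agent_choice(mu, incentives):
    """Perfect-knowledge agent: the arm maximizing expected reward plus incentive.
    max() returns the first maximizer, so ties go to the lowest index (an assumed rule)."""
    scores = [m + p for m, p in zip(mu, incentives)]
    return max(range(len(scores)), key=lambda i: scores[i])


class GapBoundEstimator:
    """Hypothetical estimator of the agent's hidden expected rewards mu.

    Choosing arm a under incentives pi implies, for every arm j,
        mu[a] + pi[a] >= mu[j] + pi[j],  i.e.  mu[a] - mu[j] >= pi[j] - pi[a].
    Only differences of mu are identifiable, so bounds are kept on the gaps.
    """

    def __init__(self, n_arms):
        self.n = n_arms
        # lower[i][j] is the tightest lower bound seen so far on mu[i] - mu[j]
        self.lower = [[0.0 if i == j else -math.inf for j in range(n_arms)]
                      for i in range(n_arms)]

    def update(self, incentives, chosen):
        """Record one round: the offered incentives and the arm the agent chose."""
        for j in range(self.n):
            bound = incentives[j] - incentives[chosen]
            self.lower[chosen][j] = max(self.lower[chosen][j], bound)

    def gap_interval(self, i, j):
        """Interval guaranteed to contain mu[i] - mu[j], from direct pairwise comparisons."""
        return self.lower[i][j], -self.lower[j][i]

    def estimate(self, ref=0):
        """Midpoint estimates of mu[i] - mu[ref]; nan while an interval is unbounded."""
        out = []
        for i in range(self.n):
            lo, hi = self.gap_interval(i, ref)
            finite = math.isfinite(lo) and math.isfinite(hi)
            out.append((lo + hi) / 2 if finite else math.nan)
        return out


random.seed(0)
mu = [0.5, 0.2, 0.8]  # hypothetical true expected rewards, hidden from the principal
est = GapBoundEstimator(len(mu))
for _ in range(2000):
    pi = [random.uniform(0.0, 1.0) for _ in mu]  # assumed exploratory incentives
    est.update(pi, agent_choice(mu, pi))
print(est.estimate())  # compare with the true gaps mu[i] - mu[0]
```

Each round can only tighten the bounds, and the true gap always stays inside its interval; the midpoint is just one simple point estimate. The random uniform incentives are an assumed exploration scheme rather than the paper's policy, and the sketch makes no claim about regret.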

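Once the principal has estimates of the agent's reward gaps, one natural way to use them is to pay an incentive only on a target arm, just large enough to make that arm the agent's best response, and to pick the target that maximizes the principal's estimated reward minus that payment. This decision rule is assumed here for illustration and is not taken from the paper; the names `best_response`, `cheapest_incentive`, `choose_target`, `theta_hat` (the principal's estimated mean rewards) and `margin` are hypothetical.

```python
import math


def best_response(mu, incentives):
    """Perfect-knowledge agent: the arm maximizing expected reward plus incentive
    (ties go to the lowest index, an assumed rule)."""
    scores = [m + p for m, p in zip(mu, incentives)]
    return max(range(len(scores)), key=lambda i: scores[i])


def cheapest_incentive(mu_hat, target, margin=0.0):
    """Pay only the target arm, enough that mu_hat[target] + pi[target]
    exceeds every other mu_hat[j] by at least `margin` (margin >= 0).
    Uses only differences of mu_hat, so a constant offset does not matter."""
    need = max(m - mu_hat[target] for m in mu_hat) + margin
    pi = [0.0] * len(mu_hat)
    pi[target] = need
    return pi


def choose_target(theta_hat, mu_hat, margin=0.0):
    """Pick the arm with the largest estimated net payoff theta_hat[k] - pi[k]."""
    best_k, best_value, best_pi = None, -math.inf, None
    for k in range(len(mu_hat)):
        pi = cheapest_incentive(mu_hat, k, margin)
        value = theta_hat[k] - pi[k]
        if value > best_value:
            best_k, best_value, best_pi = k, value, pi
    return best_k, best_pi


mu_hat = [0.0, -0.3, 0.3]    # hypothetical gap estimates mu[i] - mu[0]
theta_hat = [0.6, 1.0, 0.1]  # hypothetical principal mean rewards per arm
k, pi = choose_target(theta_hat, mu_hat, margin=0.05)
print(k, pi)
```

The positive `margin` guards against estimation error and against the agent's unknown tie-breaking rule. A policy with a regret guarantee has to account for estimation error more carefully (the abstract mentions finite-sample concentration bounds for this purpose); this sketch omits that.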

