Differentially Private Episodic Reinforcement Learning with Heavy-tailed Rewards - 专知论文

会员服务 ·

0

矩 · Learning · MoDELS · 强化学习 · 值迭代 ·

2023 年 6 月 5 日

Differentially Private Episodic Reinforcement Learning with Heavy-tailed Rewards

翻译：差分私有重尾奖励的情节强化学习

Yulian Wu,Xingyu Zhou,Sayak Ray Chowdhury,Di Wang

from arxiv, ICML 2023

In this paper, we study the problem of (finite horizon tabular) Markov decision processes (MDPs) with heavy-tailed rewards under the constraint of differential privacy (DP). Compared with the previous studies for private reinforcement learning that typically assume rewards are sampled from some bounded or sub-Gaussian distributions to ensure DP, we consider the setting where reward distributions have only finite $(1+v)$-th moments with some $v \in (0,1]$. By resorting to robust mean estimators for rewards, we first propose two frameworks for heavy-tailed MDPs, i.e., one is for value iteration and another is for policy optimization. Under each framework, we consider both joint differential privacy (JDP) and local differential privacy (LDP) models. Based on our frameworks, we provide regret upper bounds for both JDP and LDP cases and show that the moment of distribution and privacy budget both have significant impacts on regrets. Finally, we establish a lower bound of regret minimization for heavy-tailed MDPs in JDP model by reducing it to the instance-independent lower bound of heavy-tailed multi-armed bandits in DP model. We also show the lower bound for the problem in LDP by adopting some private minimax methods. Our results reveal that there are fundamental differences between the problem of private RL with sub-Gaussian and that with heavy-tailed rewards.

翻译：本文研究了在差分隐私约束下具有重尾奖励的（有限时域表格型）马尔可夫决策过程问题。与以往通常假设奖励来自有界或次高斯分布以确保差分隐私的私有强化学习研究不同，我们考虑奖励分布仅存在有限$(1+v)$阶矩（其中$v \in (0,1]$）的场景。通过借助奖励的鲁棒均值估计器，我们首先提出了两种适用于重尾MDP的框架：一种基于值迭代，另一种基于策略优化。在每个框架下，我们同时考虑了联合差分隐私和本地差分隐私模型。基于所提框架，我们给出了JDP和LDP情况下的遗憾上界，并证明了分布矩和隐私预算均对遗憾值有显著影响。最后，我们通过将问题归结为DP模型下重尾多臂强盗问题的实例无关下界，建立了JDP模型中重尾MDP的遗憾最小化下界，并采用私有极小极大方法给出了该问题在LDP下的下界。研究结果表明，具有次高斯奖励的私有强化学习与具有重尾奖励的私有强化学习之间存在本质差异。

0

相关内容

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

128+阅读 · 2022年4月21日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

129+阅读 · 2020年11月20日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

80+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

量化金融强化学习论文集合

量化金融强化学习论文集合

专知

14+阅读 · 2019年12月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

专知

25+阅读 · 2018年4月29日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

几类含∞-Laplace算子的特征值问题的研究

国家自然科学基金

1+阅读 · 2015年12月31日

场论中偏微分方程的涡旋解

国家自然科学基金

0+阅读 · 2014年12月31日

氰戊菊酯对睾酮的昼夜毒性中钟基因Bmal1的作用及调控机制

国家自然科学基金

0+阅读 · 2014年12月31日

光束在PT对称光子晶格中的线性动力学特性

国家自然科学基金

0+阅读 · 2013年12月31日

深紫外AlGaN光学各向异性与自旋轨道调制研究

国家自然科学基金

0+阅读 · 2013年12月31日

具有状态约束的Navier-Stokes方程的最优控制问题

国家自然科学基金

0+阅读 · 2013年12月31日

非线性算子零点的分裂算法：收敛性分析与应用

国家自然科学基金

0+阅读 · 2013年12月31日

红闪（sprite）对地闪的非线性响应与干涉效应研究

国家自然科学基金

0+阅读 · 2012年12月31日

北极冰间水道反演和敏感性试验

国家自然科学基金

0+阅读 · 2012年12月31日

穿孔金属/反铁磁多层膜的远红外(THz)光学性质

国家自然科学基金

0+阅读 · 2012年12月31日

Rate-optimal Bayesian Simple Regret in Best Arm Identification

Arxiv

0+阅读 · 2023年7月26日

Settling the Sample Complexity of Online Reinforcement Learning

Arxiv

0+阅读 · 2023年7月25日

Submodular Reinforcement Learning

Arxiv

0+阅读 · 2023年7月25日

Differentially Private Distributed Estimation and Learning

Arxiv

0+阅读 · 2023年7月25日

A Differentially Private Weighted Empirical Risk Minimization Procedure and its Application to Outcome Weighted Learning

Arxiv

0+阅读 · 2023年7月24日

A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning

Arxiv

1+阅读 · 2023年7月24日

Provable Reset-free Reinforcement Learning by No-Regret Reduction

Arxiv

0+阅读 · 2023年7月22日

The importance of feature preprocessing for differentially private linear optimization

Arxiv

0+阅读 · 2023年7月19日

A Survey of Meta-Reinforcement Learning

Arxiv

12+阅读 · 2023年1月19日

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Arxiv

20+阅读 · 2020年3月10日

VIP会员

文章信息

相关主题

最新内容

《无人机对海面作战影响评估》

《无人机对海面作战影响评估》

专知会员服务

10+阅读 · 7月21日

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

专知会员服务

9+阅读 · 7月21日

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

专知会员服务

3+阅读 · 7月21日

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

专知会员服务

5+阅读 · 7月21日

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

专知会员服务

7+阅读 · 7月21日

印度精确打击与指挥架构的断层

印度精确打击与指挥架构的断层

专知会员服务

5+阅读 · 7月20日

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

专知会员服务

7+阅读 · 7月20日

美空军AI完成F-16战斗机自主空战历史性试飞

美空军AI完成F-16战斗机自主空战历史性试飞

专知会员服务

6+阅读 · 7月20日

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

专知会员服务

8+阅读 · 7月20日

《美国陆军：通过弹性分布式模型库实现自适应AI优势》

《美国陆军：通过弹性分布式模型库实现自适应AI优势》

专知会员服务

7+阅读 · 7月20日

博士论文 | 理解与改进大语言模型推理：从反转诅咒到连续思维链

博士论文 | 理解与改进大语言模型推理：从反转诅咒到连续思维链

专知会员服务

9+阅读 · 7月20日

综述 | 终身视觉表征：持续自监督学习CSSL系统综述

综述 | 终身视觉表征：持续自监督学习CSSL系统综述

专知会员服务

9+阅读 · 7月20日

深入Project Maven：为何人工智能在战场上依然失灵

深入Project Maven：为何人工智能在战场上依然失灵

专知会员服务

15+阅读 · 7月19日

锻造未来士兵：外骨骼、基因工程与赛博格

锻造未来士兵：外骨骼、基因工程与赛博格

专知会员服务

8+阅读 · 7月19日

《无人机系统（UAS）通信网状网络试验性部署》50页报告

《无人机系统（UAS）通信网状网络试验性部署》50页报告

专知会员服务

10+阅读 · 7月19日

相关VIP内容

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

128+阅读 · 2022年4月21日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

129+阅读 · 2020年11月20日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

80+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

《无人机对海面作战影响评估》

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

相关资讯

量化金融强化学习论文集合

量化金融强化学习论文集合

专知

14+阅读 · 2019年12月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

专知

25+阅读 · 2018年4月29日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Rate-optimal Bayesian Simple Regret in Best Arm Identification

Arxiv

0+阅读 · 2023年7月26日

Settling the Sample Complexity of Online Reinforcement Learning

Arxiv

0+阅读 · 2023年7月25日

Submodular Reinforcement Learning

Arxiv

0+阅读 · 2023年7月25日

Differentially Private Distributed Estimation and Learning

Arxiv

0+阅读 · 2023年7月25日

A Differentially Private Weighted Empirical Risk Minimization Procedure and its Application to Outcome Weighted Learning

Arxiv

0+阅读 · 2023年7月24日

A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning

Arxiv

1+阅读 · 2023年7月24日

Provable Reset-free Reinforcement Learning by No-Regret Reduction

Arxiv

0+阅读 · 2023年7月22日

The importance of feature preprocessing for differentially private linear optimization

Arxiv

0+阅读 · 2023年7月19日

A Survey of Meta-Reinforcement Learning

Arxiv

12+阅读 · 2023年1月19日

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Arxiv

20+阅读 · 2020年3月10日

相关基金

几类含∞-Laplace算子的特征值问题的研究

国家自然科学基金

1+阅读 · 2015年12月31日

场论中偏微分方程的涡旋解

国家自然科学基金

0+阅读 · 2014年12月31日

氰戊菊酯对睾酮的昼夜毒性中钟基因Bmal1的作用及调控机制

国家自然科学基金

0+阅读 · 2014年12月31日

光束在PT对称光子晶格中的线性动力学特性

国家自然科学基金

0+阅读 · 2013年12月31日

深紫外AlGaN光学各向异性与自旋轨道调制研究

国家自然科学基金

0+阅读 · 2013年12月31日

具有状态约束的Navier-Stokes方程的最优控制问题

国家自然科学基金

0+阅读 · 2013年12月31日

非线性算子零点的分裂算法：收敛性分析与应用

国家自然科学基金

0+阅读 · 2013年12月31日

红闪（sprite）对地闪的非线性响应与干涉效应研究

国家自然科学基金

0+阅读 · 2012年12月31日

北极冰间水道反演和敏感性试验

国家自然科学基金

0+阅读 · 2012年12月31日

穿孔金属/反铁磁多层膜的远红外(THz)光学性质

国家自然科学基金

0+阅读 · 2012年12月31日

微信扫码咨询专知VIP会员