Post-Episodic Reinforcement Learning Inference - 专知论文

会员服务 ·

0

估计/估计量 · 规范化的 · 推断 · Learning · Weight ·

2023 年 2 月 17 日

Post-Episodic Reinforcement Learning Inference

翻译：后经验性强化学习推断

Vasilis Syrgkanis,Ruohan Zhan

We consider estimation and inference with data collected from episodic reinforcement learning (RL) algorithms; i.e. adaptive experimentation algorithms that at each period (aka episode) interact multiple times in a sequential manner with a single treated unit. Our goal is to be able to evaluate counterfactual adaptive policies after data collection and to estimate structural parameters such as dynamic treatment effects, which can be used for credit assignment (e.g. what was the effect of the first period action on the final outcome). Such parameters of interest can be framed as solutions to moment equations, but not minimizers of a population loss function, leading to Z-estimation approaches in the case of static data. However, such estimators fail to be asymptotically normal in the case of adaptive data collection. We propose a re-weighted Z-estimation approach with carefully designed adaptive weights to stabilize the episode-varying estimation variance, which results from the nonstationary policy that typical episodic RL algorithms invoke. We identify proper weighting schemes to restore the consistency and asymptotic normality of the re-weighted Z-estimators for target parameters, which allows for hypothesis testing and constructing reliable confidence regions for target parameters of interest. Primary applications include dynamic treatment effect estimation and dynamic off-policy evaluation.

翻译：我们考虑从经验性强化学习算法收集的数据中进行估计与推断的问题；即自适应性实验算法，该算法在每个周期（即剧集）中与单个处理单元以序列方式多次交互。我们的目标是在数据收集后能够评估反事实自适应策略，并估计动态处理效应等结构参数，这些参数可用于信用分配（例如，第一周期行动对最终结果的影响）。此类感兴趣的参数可以表述为矩方程的解，但并非总体损失函数的最小化者，这导致了在静态数据情况下的Z估计方法。然而，在自适应数据收集的情况下，此类估计量无法渐近正态。我们提出了一种重新加权的Z估计方法，该方法使用精心设计的自适应权重来稳定由典型经验性RL算法引发的非平稳策略导致的剧集变化估计方差。我们确定了适当的加权方案，以恢复目标参数的重加权Z估计量的一致性和渐近正态性，从而允许对感兴趣的目标参数进行假设检验并构建可靠的置信区域。主要应用包括动态处理效应估计和动态离线策略评估。

0

相关内容

估计/估计量

估计/估计量

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

专知会员服务

24+阅读 · 2022年3月19日

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

专知会员服务

17+阅读 · 2020年7月14日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

86+阅读 · 2020年2月18日

【AAAI2020教程】强化学习中的Exploration-Exploitation in Reinforcement Learning

专知会员服务

102+阅读 · 2020年2月8日

【强化学习资源集合】Awesome Reinforcement Learning

【强化学习资源集合】Awesome Reinforcement Learning

专知会员服务

99+阅读 · 2019年12月23日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

61+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

专知

19+阅读 · 2018年6月26日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

恐惧与负性情绪中多巴胺神经元功能的改变与回路机制

国家自然科学基金

0+阅读 · 2015年12月31日

果蝇突触后谷氨酸受体不同亚型的稳态调控机制

国家自然科学基金

0+阅读 · 2013年12月31日

miR-150通过VEGF和Tie-2双重调控脑梗死后血管新生和血管渗漏

国家自然科学基金

0+阅读 · 2013年12月31日

RegIII信号通路与SOCS3甲基化协同调控胰腺炎症恶性转化的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Cdk5介导的Drp1磷酸化对线粒体的调控及其在阿尔茨海默病中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

阿尔茨海默病p75NTR对Tau蛋白过度磷酸化的调控作用和机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

视黄酸诱导神经干细胞分化过程中蛋白磷酸化的动态研究

国家自然科学基金

0+阅读 · 2012年12月31日

β淀粉样蛋白对神经突触传递和可塑性的影晌

国家自然科学基金

0+阅读 · 2011年12月31日

间隙连接蛋白Cx43磷酸化状态对白血病骨髓基质细胞GJIC功能的影响及作用机制研究

国家自然科学基金

0+阅读 · 2010年12月31日

新的肿瘤抑制分子－钙周期素结合蛋白（CacyBP/SIP）对细胞周期G2/M期的调控作用

国家自然科学基金

0+阅读 · 2008年12月31日

Optimism and Delays in Episodic Reinforcement Learning

Arxiv

0+阅读 · 2023年4月6日

AR3n: A Reinforcement Learning-based Assist-As-Needed Controller for Robotic Rehabilitation

Arxiv

0+阅读 · 2023年4月6日

A Policy-Guided Imitation Approach for Offline Reinforcement Learning

Arxiv

0+阅读 · 2023年4月5日

Risk-Aware Distributed Multi-Agent Reinforcement Learning

Arxiv

0+阅读 · 2023年4月4日

Off-Policy Action Anticipation in Multi-Agent Reinforcement Learning

Arxiv

0+阅读 · 2023年4月4日

Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning

Arxiv

0+阅读 · 2023年4月3日

A Survey on Causal Reinforcement Learning

Arxiv

29+阅读 · 2023年2月10日

Pretraining in Deep Reinforcement Learning: A Survey

Arxiv

21+阅读 · 2022年11月8日

Transfer Learning in Deep Reinforcement Learning: A Survey

Transfer Learning in Deep Reinforcement Learning: A Survey

Arxiv

23+阅读 · 2020年9月16日

A Multi-Objective Deep Reinforcement Learning Framework

A Multi-Objective Deep Reinforcement Learning Framework

Arxiv

17+阅读 · 2018年6月27日

VIP会员

文章信息

相关主题

估计/估计量

最新内容

印度精确打击与指挥架构的断层

印度精确打击与指挥架构的断层

专知会员服务

0+阅读 · 今天14:41

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

专知会员服务

2+阅读 · 今天14:37

美空军AI完成F-16战斗机自主空战历史性试飞

美空军AI完成F-16战斗机自主空战历史性试飞

专知会员服务

2+阅读 · 今天14:13

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

专知会员服务

2+阅读 · 今天14:11

《美国陆军：通过弹性分布式模型库实现自适应AI优势》

《美国陆军：通过弹性分布式模型库实现自适应AI优势》

专知会员服务

2+阅读 · 今天14:05

博士论文 | 理解与改进大语言模型推理：从反转诅咒到连续思维链

博士论文 | 理解与改进大语言模型推理：从反转诅咒到连续思维链

专知会员服务

2+阅读 · 今天13:23

综述 | 终身视觉表征：持续自监督学习CSSL系统综述

综述 | 终身视觉表征：持续自监督学习CSSL系统综述

专知会员服务

1+阅读 · 今天13:11

深入Project Maven：为何人工智能在战场上依然失灵

深入Project Maven：为何人工智能在战场上依然失灵

专知会员服务

14+阅读 · 7月19日

锻造未来士兵：外骨骼、基因工程与赛博格

锻造未来士兵：外骨骼、基因工程与赛博格

专知会员服务

7+阅读 · 7月19日

《无人机系统（UAS）通信网状网络试验性部署》50页报告

《无人机系统（UAS）通信网状网络试验性部署》50页报告

专知会员服务

7+阅读 · 7月19日

《无人机蜂群通信技术研究》50页

《无人机蜂群通信技术研究》50页

专知会员服务

8+阅读 · 7月19日

《基于智能体建模与仿真的无人机蜂群模型目标定位涌现行为比较分析》360页

《基于智能体建模与仿真的无人机蜂群模型目标定位涌现行为比较分析》360页

专知会员服务

11+阅读 · 7月18日

欧洲智能弹药战略创新管理：迈向制导弹药、巡飞系统与自主无人机蜂群的技术主权研究路线图

欧洲智能弹药战略创新管理：迈向制导弹药、巡飞系统与自主无人机蜂群的技术主权研究路线图

专知会员服务

8+阅读 · 7月18日

从领域适配到部署与可解释：Berkeley博士论文解析大语言模型真实落地

从领域适配到部署与可解释：Berkeley博士论文解析大语言模型真实落地

专知会员服务

13+阅读 · 7月18日

综述 | 长程智能体研究全景：基础、演化、框架、优化与前沿

综述 | 长程智能体研究全景：基础、演化、框架、优化与前沿

专知会员服务

9+阅读 · 7月18日

相关VIP内容

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

专知会员服务

24+阅读 · 2022年3月19日

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

专知会员服务

17+阅读 · 2020年7月14日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

86+阅读 · 2020年2月18日

【AAAI2020教程】强化学习中的Exploration-Exploitation in Reinforcement Learning

专知会员服务

102+阅读 · 2020年2月8日

【强化学习资源集合】Awesome Reinforcement Learning

【强化学习资源集合】Awesome Reinforcement Learning

专知会员服务

99+阅读 · 2019年12月23日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

61+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

印度精确打击与指挥架构的断层

美空军AI完成F-16战斗机自主空战历史性试飞

相关资讯

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

专知

19+阅读 · 2018年6月26日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Optimism and Delays in Episodic Reinforcement Learning

Arxiv

0+阅读 · 2023年4月6日

AR3n: A Reinforcement Learning-based Assist-As-Needed Controller for Robotic Rehabilitation

Arxiv

0+阅读 · 2023年4月6日

A Policy-Guided Imitation Approach for Offline Reinforcement Learning

Arxiv

0+阅读 · 2023年4月5日

Risk-Aware Distributed Multi-Agent Reinforcement Learning

Arxiv

0+阅读 · 2023年4月4日

Off-Policy Action Anticipation in Multi-Agent Reinforcement Learning

Arxiv

0+阅读 · 2023年4月4日

Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning

Arxiv

0+阅读 · 2023年4月3日

A Survey on Causal Reinforcement Learning

Arxiv

29+阅读 · 2023年2月10日

Pretraining in Deep Reinforcement Learning: A Survey

Arxiv

21+阅读 · 2022年11月8日

Transfer Learning in Deep Reinforcement Learning: A Survey

Transfer Learning in Deep Reinforcement Learning: A Survey

Arxiv

23+阅读 · 2020年9月16日

A Multi-Objective Deep Reinforcement Learning Framework

A Multi-Objective Deep Reinforcement Learning Framework

Arxiv

17+阅读 · 2018年6月27日

相关基金

恐惧与负性情绪中多巴胺神经元功能的改变与回路机制

国家自然科学基金

0+阅读 · 2015年12月31日

果蝇突触后谷氨酸受体不同亚型的稳态调控机制

国家自然科学基金

0+阅读 · 2013年12月31日

miR-150通过VEGF和Tie-2双重调控脑梗死后血管新生和血管渗漏

国家自然科学基金

0+阅读 · 2013年12月31日

RegIII信号通路与SOCS3甲基化协同调控胰腺炎症恶性转化的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Cdk5介导的Drp1磷酸化对线粒体的调控及其在阿尔茨海默病中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

阿尔茨海默病p75NTR对Tau蛋白过度磷酸化的调控作用和机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

视黄酸诱导神经干细胞分化过程中蛋白磷酸化的动态研究

国家自然科学基金

0+阅读 · 2012年12月31日

β淀粉样蛋白对神经突触传递和可塑性的影晌

国家自然科学基金

0+阅读 · 2011年12月31日

间隙连接蛋白Cx43磷酸化状态对白血病骨髓基质细胞GJIC功能的影响及作用机制研究

国家自然科学基金

0+阅读 · 2010年12月31日

新的肿瘤抑制分子－钙周期素结合蛋白（CacyBP/SIP）对细胞周期G2/M期的调控作用

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员