Nonuniqueness and Convergence to Equivalent Solutions in Observer-based Inverse Reinforcement Learning - 专知论文

会员服务 ·

0

逆强化学习 · Learning · 强化学习 · 在线 · 代价函数 ·

2024 年 5 月 30 日

Nonuniqueness and Convergence to Equivalent Solutions in Observer-based Inverse Reinforcement Learning

翻译：基于观测器的逆强化学习中的非唯一性与等效解收敛性

Jared Town,Zachary Morrison,Rushikesh Kamalapurkar

A key challenge in solving the deterministic inverse reinforcement learning (IRL) problem online and in real-time is the existence of multiple solutions. Nonuniqueness necessitates the study of the notion of equivalent solutions, i.e., solutions that result in a different cost functional but same feedback matrix, and convergence to such solutions. While offline algorithms that result in convergence to equivalent solutions have been developed in the literature, online, real-time techniques that address nonuniqueness are not available. In this paper, a regularized history stack observer that converges to approximately equivalent solutions of the IRL problem is developed. Novel data-richness conditions are developed to facilitate the analysis and simulation results are provided to demonstrate the effectiveness of the developed technique.

翻译：确定性逆强化学习（IRL）问题在线实时求解的一个关键挑战在于解的多重性。非唯一性要求我们研究等效解的概念，即那些导致不同代价泛函但产生相同反馈矩阵的解，并研究向此类解的收敛性。尽管文献中已发展出能够收敛至等效解的离线算法，但目前尚缺乏能够处理非唯一性的在线实时方法。本文提出一种正则化历史堆栈观测器，能够收敛至IRL问题的近似等效解。研究提出了新的数据充分性条件以辅助分析，并通过仿真结果验证了所提方法的有效性。

0

相关内容

逆强化学习

逆强化学习

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

【WSDM2020】超越统计关系：将知识关系整合到多标签音乐风格分类的风格关联中（附pdf）

专知会员服务

18+阅读 · 2019年11月23日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

61+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

32+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

15+阅读 · 2018年5月29日

Focal Loss for Dense Object Detection

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

KingsGarden

13+阅读 · 2017年7月16日

From Softmax to Sparsemax-ICML16（1）

From Softmax to Sparsemax-ICML16（1）

KingsGarden

74+阅读 · 2016年11月26日

Hamilton-Jacibi方程的弱KAM理论

国家自然科学基金

2+阅读 · 2017年12月31日

城市“建成环境——空间行为”的多尺度影响关系与机理研究

国家自然科学基金

13+阅读 · 2017年12月31日

Musielak-Orlicz-Sobolev 空间中的迹嵌入及其应用

国家自然科学基金

2+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

Weyl半金属TaAs/NbAs在极端条件下的输运性质研究

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

海量Web用户生成内容物化关键技术

国家自然科学基金

2+阅读 · 2014年12月31日

概率抽样设计及其统计推断方法

国家自然科学基金

6+阅读 · 2014年12月31日

Continually Learn to Map Visual Concepts to Large Language Models in Resource-constrained Environments

Arxiv

0+阅读 · 2024年7月11日

Safe and Reliable Training of Learning-Based Aerospace Controllers

Arxiv

0+阅读 · 2024年7月9日

Explainable Hyperdimensional Computing for Balancing Privacy and Transparency in Additive Manufacturing Monitoring

Arxiv

0+阅读 · 2024年7月9日

Simple and Interpretable Probabilistic Classifiers for Knowledge Graphs

Arxiv

0+阅读 · 2024年7月9日

Explainable AI for Enhancing Efficiency of DL-based Channel Estimation

Arxiv

0+阅读 · 2024年7月9日

Using Grammar Masking to Ensure Syntactic Validity in LLM-based Modeling Tasks

Arxiv

0+阅读 · 2024年7月9日

Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning

Arxiv

11+阅读 · 2023年3月10日

Decomposed Mutual Information Estimation for Contrastive Representation Learning

Arxiv

11+阅读 · 2021年6月25日

Advances and Open Problems in Federated Learning

Advances and Open Problems in Federated Learning

Arxiv

18+阅读 · 2019年12月10日

Attention-based Ensemble for Deep Metric Learning

Arxiv

17+阅读 · 2018年4月2日

VIP会员

文章信息

相关主题

逆强化学习

最新内容

《越野作战环境下路径规划的多准则整数规划模型》

《越野作战环境下路径规划的多准则整数规划模型》

专知会员服务

4+阅读 · 今天8:06

人工智能大语言模型引擎如何重塑全球冲突信息环境最新50页

人工智能大语言模型引擎如何重塑全球冲突信息环境最新50页

专知会员服务

3+阅读 · 今天8:00

《防空系统对自主武器系统辩论中“有意义的人类控制”的启示》70页报告

《防空系统对自主武器系统辩论中“有意义的人类控制”的启示》70页报告

专知会员服务

3+阅读 · 今天7:53

“对标ChatGPT”：乌军研发Marichka AI系统用于战场筹划

“对标ChatGPT”：乌军研发Marichka AI系统用于战场筹划

专知会员服务

6+阅读 · 今天7:49

《同步多无人机系统中的故障与通信》

《同步多无人机系统中的故障与通信》

专知会员服务

2+阅读 · 今天6:23

论文解读 | 医学图像修复中的扩散模型：挑战、分类与未来方向

论文解读 | 医学图像修复中的扩散模型：挑战、分类与未来方向

专知会员服务

2+阅读 · 7月28日

博士论文 | 从算法到基础模型：强化学习的统一视角

博士论文 | 从算法到基础模型：强化学习的统一视角

专知会员服务

7+阅读 · 7月28日

面向国防作战的最佳自主与蜂群无人机技术

面向国防作战的最佳自主与蜂群无人机技术

专知会员服务

7+阅读 · 7月28日

《异构人类团队的协作决策过程混合建模研究》

《异构人类团队的协作决策过程混合建模研究》

专知会员服务

8+阅读 · 7月28日

《C5ISR系统中的注意力动态与自适应决策支持研究：视觉与多模态注意力引导对任务绩效影响的递归量化分析》最新36页报告

《C5ISR系统中的注意力动态与自适应决策支持研究：视觉与多模态注意力引导对任务绩效影响的递归量化分析》最新36页报告

专知会员服务

8+阅读 · 7月28日

《设计思维中的人机协作：生成式人工智能对共情访谈影响的探究》140页

《设计思维中的人机协作：生成式人工智能对共情访谈影响的探究》140页

专知会员服务

9+阅读 · 7月28日

博士论文 | 面向大模型推理的内存高效算法

博士论文 | 面向大模型推理的内存高效算法

专知会员服务

5+阅读 · 7月27日

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

专知会员服务

10+阅读 · 7月27日

《无人系统互操作性导论——无人系统联合架构（JAUS）》

《无人系统互操作性导论——无人系统联合架构（JAUS）》

专知会员服务

14+阅读 · 7月27日

美空军新型反无人机部队初探

美空军新型反无人机部队初探

专知会员服务

9+阅读 · 7月27日

相关VIP内容

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

【WSDM2020】超越统计关系：将知识关系整合到多标签音乐风格分类的风格关联中（附pdf）

专知会员服务

18+阅读 · 2019年11月23日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

61+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

32+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

人工智能大语言模型引擎如何重塑全球冲突信息环境最新50页

“对标ChatGPT”：乌军研发Marichka AI系统用于战场筹划

《越野作战环境下路径规划的多准则整数规划模型》

《防空系统对自主武器系统辩论中“有意义的人类控制”的启示》70页报告

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

15+阅读 · 2018年5月29日

Focal Loss for Dense Object Detection

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

KingsGarden

13+阅读 · 2017年7月16日

From Softmax to Sparsemax-ICML16（1）

From Softmax to Sparsemax-ICML16（1）

KingsGarden

74+阅读 · 2016年11月26日

相关论文

Continually Learn to Map Visual Concepts to Large Language Models in Resource-constrained Environments

Arxiv

0+阅读 · 2024年7月11日

Safe and Reliable Training of Learning-Based Aerospace Controllers

Arxiv

0+阅读 · 2024年7月9日

Explainable Hyperdimensional Computing for Balancing Privacy and Transparency in Additive Manufacturing Monitoring

Arxiv

0+阅读 · 2024年7月9日

Simple and Interpretable Probabilistic Classifiers for Knowledge Graphs

Arxiv

0+阅读 · 2024年7月9日

Explainable AI for Enhancing Efficiency of DL-based Channel Estimation

Arxiv

0+阅读 · 2024年7月9日

Using Grammar Masking to Ensure Syntactic Validity in LLM-based Modeling Tasks

Arxiv

0+阅读 · 2024年7月9日

Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning

Arxiv

11+阅读 · 2023年3月10日

Decomposed Mutual Information Estimation for Contrastive Representation Learning

Arxiv

11+阅读 · 2021年6月25日

Advances and Open Problems in Federated Learning

Advances and Open Problems in Federated Learning

Arxiv

18+阅读 · 2019年12月10日

Attention-based Ensemble for Deep Metric Learning

Arxiv

17+阅读 · 2018年4月2日

相关基金

Hamilton-Jacibi方程的弱KAM理论

国家自然科学基金

2+阅读 · 2017年12月31日

城市“建成环境——空间行为”的多尺度影响关系与机理研究

国家自然科学基金

13+阅读 · 2017年12月31日

Musielak-Orlicz-Sobolev 空间中的迹嵌入及其应用

国家自然科学基金

2+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

Weyl半金属TaAs/NbAs在极端条件下的输运性质研究

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

海量Web用户生成内容物化关键技术

国家自然科学基金

2+阅读 · 2014年12月31日

概率抽样设计及其统计推断方法

国家自然科学基金

6+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员