Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems


Control problems · Adaptive control · Equivalence · Perturbation

March 24, 2023


Akshay Mete, Rahul Singh, P. R. Kumar

From arXiv; published at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022): https://openreview.net/forum?id=7pNV4PCjbQy

We consider the problem of controlling an unknown stochastic linear system with quadratic costs, called the adaptive LQ control problem. We re-examine an approach called "Reward Biased Maximum Likelihood Estimate" (RBMLE), proposed more than forty years ago, which predates both the "Upper Confidence Bound" (UCB) method and the definition of "regret" for bandit problems. RBMLE simply adds to the parameter-estimation criterion a term that favors parameters with larger rewards. We show how the RBMLE and UCB methods can be reconciled, and thereby propose an Augmented RBMLE-UCB algorithm that combines the penalty of the RBMLE method with the constraints of the UCB method, uniting the two approaches to optimism in the face of uncertainty. We establish theoretically that this method retains $\tilde{\mathcal{O}}(\sqrt{T})$ regret, the best known so far. We further compare the empirical performance of the proposed Augmented RBMLE-UCB and the standard RBMLE (without the augmentation) with UCB, Thompson Sampling, Input Perturbation, Randomized Certainty Equivalence and StabL on many real-world examples, including flight control of a Boeing 747 and an unmanned aerial vehicle. Extensive simulation studies show that the Augmented RBMLE consistently outperforms UCB, Thompson Sampling and StabL by a large margin, while it is marginally better than Input Perturbation and moderately better than Randomized Certainty Equivalence.
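To make the idea of "likelihood plus a reward bias, restricted to a confidence set" concrete, here is a minimal sketch for a scalar system $x_{t+1} = a x_t + b u_t + w_t$ with cost $x_t^2 + u_t^2$ and Gaussian noise of known variance. It is not the authors' algorithm or tuning: the finite grid of candidate $(a, b)$, the bias schedule $\alpha(t) = \sqrt{t+1}$, the fixed confidence radius `beta`, and all function names (`solve_scalar_dare`, `make_grid`, `select_theta`, `run`) are hypothetical choices for illustration. Lower optimal average cost $J^*(\theta) = \sigma^2 p(\theta)$ plays the role of "larger reward".

```python
import numpy as np


def solve_scalar_dare(a, b, q=1.0, r=1.0, iters=10_000, tol=1e-10):
    """Value-iterate the scalar Riccati equation p = q + a^2 p - (a b p)^2 / (r + b^2 p).

    Returns (p, k) so that u = -k x is optimal for x' = a x + b u + w.
    Returns (inf, 0.0) if the iteration does not converge (e.g. b = 0, |a| >= 1).
    """
    p = q
    for _ in range(iters):
        abp = a * b * p
        p_next = q + a * a * p - abp * abp / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next, a * b * p_next / (r + b * b * p_next)
        p = p_next
    return np.inf, 0.0


def make_grid(a_vals, b_vals, q=1.0, r=1.0, sigma=1.0):
    """Hypothetical finite parameter set: every (a, b) pair with its J* and optimal gain."""
    A, B = np.meshgrid(a_vals, b_vals, indexing="ij")
    thetas = np.column_stack([A.ravel(), B.ravel()])  # each row is (a, b)
    pk = [solve_scalar_dare(a, b, q, r) for a, b in thetas]
    J = sigma**2 * np.array([p for p, _ in pk])  # optimal average cost J*(theta)
    K = np.array([k for _, k in pk])
    return thetas, J, K


def select_theta(stats, thetas, J, t, sigma=1.0, beta=10.0):
    """Sketch of a reward-biased choice: minimise NLL + alpha(t) * J* over a confidence set."""
    V, s, c = stats  # V = sum z z^T, s = sum z x_next, c = sum x_next^2, z = (x, u)
    # Sum of squared residuals for every candidate theta (rows of thetas) at once.
    sse = c - 2.0 * thetas @ s + np.einsum("ni,ij,nj->n", thetas, V, thetas)
    # UCB-style constraint; the grid minimum stands in for the least-squares fit.
    in_set = sse - sse.min() <= beta
    alpha = np.sqrt(t + 1.0)  # hypothetical reward-bias schedule
    objective = sse / (2.0 * sigma**2) + alpha * J  # lower J* = larger reward
    objective[~in_set] = np.inf
    return int(np.argmin(objective))


def run(T=500, a_true=1.2, b_true=0.8, sigma=1.0, seed=0):
    """Simulate certainty-equivalent control with the biased estimate; return (avg cost, last gain)."""
    rng = np.random.default_rng(seed)
    thetas, J, K = make_grid(np.linspace(0.8, 1.6, 17), np.linspace(0.4, 1.2, 17), sigma=sigma)
    V, s, c = np.zeros((2, 2)), np.zeros(2), 0.0
    x, total_cost, k = 0.0, 0.0, 0.0
    for t in range(T):
        k = K[select_theta((V, s, c), thetas, J, t, sigma)]
        u = -k * x
        total_cost += x * x + u * u  # q = r = 1
        x_next = a_true * x + b_true * u + sigma * rng.standard_normal()
        z = np.array([x, u])
        V += np.outer(z, z)  # accumulate least-squares sufficient statistics
        s += z * x_next
        c += x_next * x_next
        x = x_next
    return total_cost / T, k
```

With this particular grid, every candidate gain happens to stabilize the assumed true system $(a, b) = (1.2, 0.8)$, so the run stays bounded. Comparing `run()[0]` with `solve_scalar_dare(1.2, 0.8)[0]` (the optimal average cost for $\sigma = 1$) gives a rough empirical view of the excess cost, whose cumulative version over $T$ steps is the regret the abstract refers to.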

