This paper explores the impact of dynamic entropy tuning in Reinforcement Learning (RL) algorithms that train a stochastic policy, and compares their performance against algorithms that train a deterministic policy. Stochastic policies optimize a probability distribution over actions to maximize reward, while deterministic policies select a single action per state. The study examines training a stochastic policy with both static and dynamic entropy and then executing deterministic actions to control the quadcopter, and compares this against training a deterministic policy and executing its deterministic actions. For this research, the Soft Actor-Critic (SAC) algorithm was chosen as the stochastic algorithm, and the Twin Delayed Deep Deterministic Policy Gradient (TD3) was chosen as the deterministic algorithm. The training and simulation results show that dynamic entropy tuning benefits quadcopter control by preventing catastrophic forgetting and improving exploration efficiency.
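For context, dynamic entropy tuning in SAC is commonly realized through automatic temperature adjustment, in which the temperature coefficient alpha is learned alongside the policy rather than held fixed. A sketch of the standard objective, where pi denotes the current stochastic policy and \bar{\mathcal{H}} the chosen target entropy (typically set to the negative of the action dimension), is

J(\alpha) = \mathbb{E}_{a_t \sim \pi}\left[ -\alpha \log \pi(a_t \mid s_t) - \alpha \bar{\mathcal{H}} \right],

so that alpha grows when the policy's entropy falls below the target (encouraging exploration) and shrinks when it exceeds the target (favoring exploitation).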