Deep reinforcement learning (DRL) often struggles with complex robotic manipulation tasks because of low sample efficiency and biased value estimation. Model-based reinforcement learning (MBRL) improves sample efficiency by exploiting a model of the environment dynamics, and prior work has integrated Model Predictive Control (MPC) to make policies more robust through online trajectory optimization. However, existing MBRL approaches still suffer from high model bias, depend on task-specific cost functions, and incur significant computational overhead. To address these challenges, we propose Q-guided Stein Variational Model Predictive Actor-Critic (Q-STAC), a unified framework that bridges Bayesian MPC and Soft Actor-Critic (SAC). Q-STAC uses Stein Variational Gradient Descent (SVGD) to iteratively optimize action sequences sampled from a learned prior distribution, with the optimization guided by Q-values, which removes the need for manual cost-function engineering. Because it performs only short-horizon model-predictive rollouts, Q-STAC reduces cumulative prediction errors, improves training stability, and lowers computational cost. Experiments on simulated particle navigation, diverse robotic manipulation tasks, and a real-world fruit-picking scenario show that Q-STAC consistently achieves better sample efficiency, stability, and overall performance than both model-free and model-based baselines.
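
To make the planning idea more concrete, the following is a minimal sketch of SVGD-based action-sequence optimization that scores each candidate with a short-horizon model rollout plus a terminal critic value. It is not the authors' implementation. Everything concrete here is a hypothetical illustration: the 1-D point-mass `dynamics`, the `reward`, the `q_terminal` function standing in for a learned Q-critic, the Gaussian prior standing in for the learned prior, and the choice of a target density proportional to prior(a) * exp(J(a) / alpha). Gradients use finite differences, whereas a real system would differentiate through a learned model.

```python
import numpy as np

# Hypothetical toy problem: a 1-D point mass that should reach GOAL.
GOAL, HORIZON, GAMMA = 1.0, 5, 0.99

def dynamics(s, a):
    return s + a  # hypothetical stand-in for a learned dynamics model f(s, a)

def reward(s_next, a):
    return -(s_next - GOAL) ** 2 - 0.01 * a ** 2

def q_terminal(s):
    return -5.0 * (s - GOAL) ** 2  # hypothetical stand-in for a learned critic at the horizon

def rollout_return(actions, s0):
    """Short-horizon model rollout: discounted rewards plus a discounted terminal critic value."""
    s, ret = s0, 0.0
    for t, a in enumerate(actions):
        s = dynamics(s, a)
        ret += GAMMA ** t * reward(s, a)
    return ret + GAMMA ** len(actions) * q_terminal(s)

def log_target(actions, s0, prior_mu, prior_std, alpha):
    """Assumed target: log Gaussian prior (up to a constant) plus return scaled by 1/alpha."""
    log_prior = -0.5 * np.sum(((actions - prior_mu) / prior_std) ** 2)
    return log_prior + rollout_return(actions, s0) / alpha

def num_grad(f, x, eps=1e-4):
    """Central finite differences (a learned model would normally use autodiff instead)."""
    g = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = eps
        g[k] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def svgd_step(X, grads, step):
    """One SVGD update with an RBF kernel and the median bandwidth heuristic."""
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]  # diff[i, j] = x_i - x_j
    sq = np.sum(diff ** 2, axis=-1)
    h = np.median(sq) / np.log(n + 1) + 1e-8
    K = np.exp(-sq / h)  # symmetric kernel matrix, K[i, j] = k(x_i, x_j)
    attract = K @ grads  # sum_j k(x_j, x_i) * grad log p(x_j)
    repulse = (2.0 / h) * np.sum(K[:, :, None] * diff, axis=1)  # sum_j grad_{x_j} k(x_j, x_i)
    return X + step * (attract + repulse) / n

def plan(s0, n_particles=16, iters=200, step=0.01,
         prior_mu=0.0, prior_std=0.5, alpha=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Initial particles (action sequences) are drawn from the prior.
    X = prior_mu + prior_std * rng.standard_normal((n_particles, HORIZON))
    init_returns = np.array([rollout_return(x, s0) for x in X])

    def f(a):
        return log_target(a, s0, prior_mu, prior_std, alpha)

    for _ in range(iters):
        grads = np.stack([num_grad(f, x) for x in X])
        X = svgd_step(X, grads, step)
    returns = np.array([rollout_return(x, s0) for x in X])
    best = X[np.argmax(returns)]
    return best[0], init_returns, returns  # receding horizon: execute only the first action

first_action, init_returns, final_returns = plan(s0=0.0)
print(f"first action: {first_action:.3f}")
print(f"mean return: {init_returns.mean():.2f} -> {final_returns.mean():.2f}")
```

`plan(s0=0.0)` returns the first action of the highest-return particle. A receding-horizon controller would execute that action and replan at the next step. The kernel's repulsive term keeps the particles spread out, so the planner maintains several candidate action sequences instead of collapsing onto one.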



