This paper addresses the prediction stability, prediction accuracy, and control capability of current probabilistic model-based reinforcement learning (MBRL) built on neural networks. A novel approach, dropout-based probabilistic ensembles with trajectory sampling (DPETS), is proposed, in which the system uncertainty is stably predicted by combining Monte-Carlo dropout and trajectory sampling in one framework. Its loss function is designed to correct the fitting error of neural networks for more accurate prediction of probabilistic models. The state propagation in its policy is extended to filter the aleatoric uncertainty for superior control capability. Evaluated on several MuJoCo benchmark control tasks under additional disturbances and on one practical robot arm manipulation task, DPETS outperforms related MBRL approaches in both average return and convergence speed, and it surpasses well-known model-free baselines with significantly higher sample efficiency. The open-source code of DPETS is available at https://github.com/mrjun123/DPETS.
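To make the idea of combining Monte-Carlo dropout with trajectory sampling more concrete, the following is a minimal sketch and not the DPETS implementation from the repository above. It assumes a randomly initialised one-hidden-layer network as a stand-in for a trained dynamics model, and it assumes that each particle keeps a single dropout mask over the whole planning horizon. The names `make_model`, `sample_mask`, `predict_delta`, and `rollout` are hypothetical. The sketch only shows how dropout masks can induce a spread over predicted trajectories; it omits the loss-function correction and the aleatoric-uncertainty filtering described in the abstract.

```python
import math
import random


def make_model(state_dim, action_dim, hidden, rng):
    """Random one-hidden-layer network; a stand-in for a trained dynamics model."""
    in_dim = state_dim + action_dim
    w1 = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 0.5) for _ in range(hidden)] for _ in range(state_dim)]
    return w1, w2


def sample_mask(hidden, keep_prob, rng):
    """Inverted-dropout mask: kept units are scaled by 1 / keep_prob."""
    return [1.0 / keep_prob if rng.random() < keep_prob else 0.0
            for _ in range(hidden)]


def predict_delta(model, mask, state, action):
    """Predict the state change for one (state, action) pair under one mask."""
    w1, w2 = model
    x = list(state) + list(action)
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) * m
         for row, m in zip(w1, mask)]
    return [sum(w * hi for w, hi in zip(row, h)) for row in w2]


def rollout(model, state0, actions, n_particles, keep_prob, rng):
    """Propagate particles through the model along an action sequence.

    Assumption of this sketch: every particle samples one dropout mask and
    keeps it for the whole horizon, so the particle spread reflects the
    model uncertainty induced by dropout.
    """
    hidden = len(model[0])
    dim = len(state0)
    masks = [sample_mask(hidden, keep_prob, rng) for _ in range(n_particles)]
    particles = [list(state0) for _ in range(n_particles)]
    means, variances = [], []
    for action in actions:
        particles = [
            [s + d for s, d in zip(p, predict_delta(model, m, p, action))]
            for p, m in zip(particles, masks)
        ]
        mean = [sum(p[i] for p in particles) / n_particles for i in range(dim)]
        var = [sum((p[i] - mean[i]) ** 2 for p in particles) / n_particles
               for i in range(dim)]
        means.append(mean)
        variances.append(var)
    return means, variances


if __name__ == "__main__":
    rng = random.Random(0)
    model = make_model(state_dim=2, action_dim=1, hidden=16, rng=rng)
    plan = [[0.1], [-0.2], [0.3]]
    means, variances = rollout(model, [0.5, -0.5], plan, 20, 0.8, rng)
    for t, (m, v) in enumerate(zip(means, variances)):
        print(t, m, v)
```

In a planner, the per-step particle mean and variance would typically feed the expected-return estimate for a candidate action sequence. Setting `keep_prob` to 1.0 disables dropout, which collapses all particles onto a single deterministic trajectory.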