Model-based reinforcement learning (RL) is a widely accepted approach to reducing excessive sample demands. However, the predictions of learned dynamics models are often inaccurate, and the resulting model bias can lead to catastrophic decisions when the algorithm is not sufficiently robust. It is therefore desirable to improve the robustness of model-based RL algorithms while maintaining high sample efficiency. In this paper, we propose Model-Based Double-dropout Planning (MBDP) to balance robustness and efficiency. MBDP consists of two dropout mechanisms. Rollout-dropout improves robustness at a small cost in sample efficiency. Model-dropout compensates for the lost efficiency at a slight expense of robustness. By combining the two in a complementary way, MBDP provides a flexible control mechanism that meets different robustness and efficiency requirements by tuning the two corresponding dropout ratios. We demonstrate the effectiveness of MBDP both theoretically and experimentally.
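The abstract says only that each mechanism is controlled by its own dropout ratio. It does not say how rollouts or models are chosen for dropping. The sketch below is therefore a hypothetical illustration of how two ratio-controlled filters could be expressed, not the MBDP algorithm itself. It assumes that rollout-dropout discards the fraction `alpha` of imagined rollouts with the highest returns, which leaves the more pessimistic ones. It also assumes that model-dropout discards the fraction `beta` of ensemble members with the largest validation error. The names `drop_top_fraction`, `rollout_dropout`, `model_dropout`, `alpha`, and `beta` are all illustrative.

```python
import math
from typing import List, Sequence


def drop_top_fraction(scores: Sequence[float], ratio: float) -> List[int]:
    """Return the indices kept after discarding the floor(ratio * n) highest scores.

    Ties are broken by index, so the result is deterministic.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    n = len(scores)
    n_drop = math.floor(ratio * n)
    # Sort indices by ascending score; the last n_drop entries are the ones dropped.
    order = sorted(range(n), key=lambda i: (scores[i], i))
    return sorted(order[: n - n_drop])


def rollout_dropout(rollout_returns: Sequence[float], alpha: float) -> List[int]:
    # Hypothetical criterion (not stated in the abstract): drop the most
    # optimistic, highest-return rollouts so planning relies on pessimistic ones.
    return drop_top_fraction(rollout_returns, alpha)


def model_dropout(model_errors: Sequence[float], beta: float) -> List[int]:
    # Hypothetical criterion (not stated in the abstract): drop the ensemble
    # members with the largest validation error before generating rollouts.
    return drop_top_fraction(model_errors, beta)


if __name__ == "__main__":
    returns = [3.0, 10.0, -1.0, 7.5, 2.0]
    errors = [0.12, 0.40, 0.08, 0.15]
    print(rollout_dropout(returns, alpha=0.4))  # [0, 2, 4]
    print(model_dropout(errors, beta=0.25))     # [0, 2, 3]
```

Both filters share one helper because, in this sketch, each dropout ratio is just a knob on how much of a scored population is discarded. Setting a ratio to 0 disables that mechanism entirely.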