Model-based reinforcement learning (MBRL) achieves significantly higher sample efficiency in practice than model-free RL, but its performance is often limited by model prediction error. To reduce this error, standard MBRL approaches train a single well-designed network to fit the entire environment dynamics. This discards the rich information carried by the environment's multiple sub-dynamics, which can be modeled separately to yield a more accurate world model. In this paper, we propose Environment Dynamics Decomposition (ED2), a novel world model construction framework that models the environment in a decomposed manner. ED2 contains two key components: sub-dynamics discovery (SD2) and dynamics decomposition prediction (D2P). SD2 automatically discovers the sub-dynamics in an environment, and D2P then constructs the decomposed world model according to these sub-dynamics. ED2 can be easily combined with existing MBRL algorithms. Empirical results show that, when combined with state-of-the-art MBRL algorithms on various continuous control tasks, ED2 significantly reduces model error, increases sample efficiency, and achieves higher asymptotic performance. Our code is open source and available at https://github.com/ED2-source-code/ED2.
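To make the idea of a decomposed world model concrete, the following is a minimal structural sketch, not the ED2 implementation. It assumes, purely for illustration, that each sub-dynamics is tied to a disjoint group of action dimensions and that the predicted state changes of the sub-models add up; the abstract does not specify how sub-dynamics are defined or combined. The names `DecomposedModel`, `left_joint`, and `right_joint` are hypothetical, and the hand-written sub-models stand in for learned networks.

```python
from typing import Callable, List, Sequence

State = List[float]
Action = List[float]
# A sub-model maps (state, sub-action) to a predicted change in state.
SubModel = Callable[[State, Action], State]


class DecomposedModel:
    """Toy decomposed world model: one sub-model per group of action dimensions.

    Assumption (not stated in the abstract): sub-dynamics correspond to
    disjoint groups of action dimensions, and their predicted state
    changes are combined by summation.
    """

    def __init__(self, groups: Sequence[Sequence[int]],
                 sub_models: Sequence[SubModel]):
        if len(groups) != len(sub_models):
            raise ValueError("need exactly one sub-model per action group")
        self.groups = [list(g) for g in groups]
        self.sub_models = list(sub_models)

    def predict(self, state: State, action: Action) -> State:
        """Predict the next state by adding every sub-model's state change."""
        next_state = list(state)
        for group, model in zip(self.groups, self.sub_models):
            # Each sub-model only sees its own slice of the action.
            sub_action = [action[i] for i in group]
            delta = model(state, sub_action)
            next_state = [s + d for s, d in zip(next_state, delta)]
        return next_state


# Hypothetical hand-written sub-models standing in for learned networks.
def left_joint(state: State, sub_action: Action) -> State:
    return [1.0 * sub_action[0], 0.0]


def right_joint(state: State, sub_action: Action) -> State:
    return [0.0, 2.0 * sub_action[0]]


model = DecomposedModel(groups=[[0], [1]], sub_models=[left_joint, right_joint])
prediction = model.predict([0.0, 0.0], [0.5, -1.0])
print(prediction)  # [0.5, -2.0]
```

In this sketch the action grouping is supplied by hand. In a framework of the kind the abstract describes, a discovery step such as SD2 would produce the grouping, and each sub-model would be trained rather than hard-coded.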