Sparse rewards reduce the sample efficiency of deep reinforcement learning methods. A viable remedy is to learn via intrinsic motivation, which adds an intrinsic reward to the reward function to encourage the agent to explore the environment and expand the sample space. Although intrinsic motivation methods are widely used to improve data efficiency in reinforcement learning, they suffer from the so-called detachment problem. In this article, we discuss the limitations of the intrinsic curiosity module in sparse-reward multi-agent reinforcement learning and propose a method called I-Go-Explore, which combines the intrinsic curiosity module with the Go-Explore framework to alleviate the detachment problem.
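As a rough illustration of what "adding an intrinsic reward to the reward function" can look like, the sketch below computes a curiosity bonus as the scaled squared prediction error of a forward model in feature space, which is the form commonly associated with the intrinsic curiosity module, and adds it to the extrinsic reward. The abstract does not give the formula or any hyperparameters, so the function names, the weights `beta` and `eta`, and the plain-list feature vectors are all assumptions made for illustration, not the authors' implementation.

```python
def intrinsic_reward(predicted_features, actual_features, eta=1.0):
    """Curiosity bonus: scaled squared error of a forward model's prediction.

    `predicted_features` is the forward model's guess for the next-state
    features; `actual_features` is what was observed. `eta` is a
    hypothetical scaling factor.
    """
    if len(predicted_features) != len(actual_features):
        raise ValueError("feature vectors must have the same length")
    squared_error = sum(
        (p - a) ** 2 for p, a in zip(predicted_features, actual_features)
    )
    return 0.5 * eta * squared_error


def total_reward(extrinsic, predicted_features, actual_features,
                 beta=0.1, eta=1.0):
    """Extrinsic reward plus a weighted intrinsic bonus.

    `beta` is a hypothetical weight trading off exploration against
    the (possibly sparse) task reward.
    """
    bonus = intrinsic_reward(predicted_features, actual_features, eta)
    return extrinsic + beta * bonus
```

When the forward model predicts the next state well, the bonus shrinks toward zero, so under these assumptions the agent is rewarded mainly for visiting transitions it cannot yet predict, even when the extrinsic reward is zero.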