Reward shaping mitigates the sparse-reward problem in reinforcement learning by supplying immediate feedback through auxiliary, informative rewards. Building on this strategy, we propose a novel multi-task reinforcement learning framework that integrates a centralized reward agent (CRA) with multiple distributed policy agents. The CRA acts as a knowledge pool: it distills knowledge from the various tasks and distributes it to the individual policy agents to improve their learning efficiency. Specifically, the shaped rewards serve as a straightforward metric for encoding this knowledge. The framework not only enhances knowledge sharing across established tasks but also adapts to new tasks by transferring valuable reward signals. We validate the proposed method in both discrete and continuous domains, demonstrating its robustness in multi-task sparse-reward settings and its effective transferability to unseen tasks.
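To make the division of labor concrete, the following is a minimal toy sketch of one shared reward agent serving several tabular policy agents. It is not the proposed method: the chain environment, the credit-assignment rule inside `CentralRewardAgent.update`, the mixing weight `beta`, and every class, function, and parameter name are assumptions introduced purely for illustration.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # 0 = step left, 1 = step right


class CentralRewardAgent:
    """Toy stand-in for the CRA: one shaped-reward table shared by every task."""

    def __init__(self, lr=0.5, decay=0.9):
        self.lr = lr
        self.decay = decay
        self.table = defaultdict(float)  # (state, action) -> shaped reward

    def shaped_reward(self, state, action):
        return self.table.get((state, action), 0.0)

    def update(self, trajectory, episode_return):
        # Assumed rule: move each visited pair toward the episode return,
        # discounted by how far before the end of the episode it occurred.
        credit = episode_return
        for pair in reversed(trajectory):
            self.table[pair] += self.lr * (credit - self.table[pair])
            credit *= self.decay


class PolicyAgent:
    """Tabular Q-learning agent for a single task."""

    def __init__(self, cra, alpha=0.5, gamma=0.95, epsilon=0.2, beta=0.1):
        self.cra, self.alpha, self.gamma = cra, alpha, gamma
        self.epsilon, self.beta = epsilon, beta
        self.q = defaultdict(float)

    def act(self, state, rng):
        if rng.random() < self.epsilon:
            return rng.choice(ACTIONS)
        best = max(self.q[(state, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if self.q[(state, a)] == best])

    def learn(self, s, a, env_reward, s_next, done):
        # Sparse environment reward plus the shaped reward supplied by the CRA.
        r = env_reward + self.beta * self.cra.shaped_reward(s, a)
        bootstrap = 0.0 if done else max(self.q[(s_next, b)] for b in ACTIONS)
        self.q[(s, a)] += self.alpha * (r + self.gamma * bootstrap - self.q[(s, a)])


def run_episode(agent, goal, rng, max_steps=50):
    """Chain task: start at state 0; reward 1 only on reaching `goal`."""
    state, trajectory = 0, []
    for _ in range(max_steps):
        action = agent.act(state, rng)
        next_state = max(0, state + (1 if action == 1 else -1))
        done = next_state == goal
        agent.learn(state, action, 1.0 if done else 0.0, next_state, done)
        trajectory.append((state, action))
        state = next_state
        if done:
            # Only successful episodes feed knowledge back to the shared pool.
            agent.cra.update(trajectory, 1.0)
            return 1.0
    return 0.0


def train(goals=(4, 6), episodes=200, seed=0):
    rng = random.Random(seed)
    cra = CentralRewardAgent()
    agents = [PolicyAgent(cra) for _ in goals]  # one policy agent per task
    for _ in range(episodes):
        for agent, goal in zip(agents, goals):
            run_episode(agent, goal, rng)
    return cra, agents
```

In this sketch, transfer to an unseen task amounts to constructing a fresh `PolicyAgent(cra)` with the already-populated reward agent, so the new agent receives shaped rewards from its first step. Whether that helps depends on how much structure the new task shares with the old ones.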