This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL) that combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks. The proposed method helps deal with the non-Markovian nature of rewards in partially observable environments and improves the interpretability of the learnt policies required to complete the cooperative task. The RM associated with each sub-task is learnt in a decentralised manner and then used to guide the behaviour of each agent. Decomposing the task in this way reduces the complexity of the cooperative multi-agent problem, allowing for more effective learning. The results suggest that the approach is a promising direction for future research in MARL, especially in complex environments with large state spaces and multiple agents.
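To make the notion of a reward machine concrete, the following is a minimal sketch of an RM as a finite-state machine whose transitions fire on high-level events and emit rewards. It is illustrative only and is not the paper's implementation: the `RewardMachine` class, the state names `u0` to `u2`, and the events `button` and `goal` are all hypothetical, and the RM here is written by hand rather than learnt.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine whose transitions fire on events and emit rewards."""

    initial: str
    accepting: frozenset
    # Maps (state, event) to (next_state, reward).
    transitions: dict = field(default_factory=dict)

    def step(self, state, event):
        # An event with no matching transition leaves the state unchanged
        # and yields zero reward.
        return self.transitions.get((state, event), (state, 0.0))

    def run(self, events):
        # Replay a sequence of events from the initial state and
        # accumulate the emitted rewards.
        state, total = self.initial, 0.0
        for event in events:
            state, reward = self.step(state, event)
            total += reward
        return state, total


# Hypothetical sub-task for a single agent: press a button, then reach the goal.
# Reaching the goal before pressing the button earns nothing, so the reward
# depends on the event history rather than on the current observation alone.
agent_rm = RewardMachine(
    initial="u0",
    accepting=frozenset({"u2"}),
    transitions={
        ("u0", "button"): ("u1", 0.0),
        ("u1", "goal"): ("u2", 1.0),
    },
)

final_state, total_reward = agent_rm.run(["goal", "button", "goal"])
print(final_state, total_reward)  # u2 1.0
```

Because the RM state records which part of the sub-task has already been completed, an agent that conditions its policy on this state can handle a reward that is non-Markovian in its raw observations. In a decentralised setting, each agent would hold its own machine of this kind over the events relevant to its sub-task.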