In this paper, we study cooperative Multi-Agent Reinforcement Learning (MARL) problems in which Reward Machines (RMs) specify the reward functions, so that prior knowledge of the high-level events in a task can be leveraged to improve learning efficiency. Existing work that incorporates RMs into MARL for task decomposition and policy learning is limited to relatively simple domains or assumes independence among the agents. In contrast, we present Multi-Agent Reinforcement Learning with a Hierarchy of RMs (MAHRM), which can handle more complex scenarios in which events among agents can occur concurrently and the agents are highly interdependent. MAHRM exploits the relationships among high-level events to decompose a task into a hierarchy of simpler subtasks, each assigned to a small group of agents, so as to reduce the overall computational complexity. Experimental results in three cooperative MARL domains show that MAHRM outperforms other MARL methods that use the same prior knowledge of high-level events.
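To make the notion of a Reward Machine concrete, the following is a minimal sketch of a single, flat RM as a finite-state machine over high-level events. It is an illustration only and not the MAHRM algorithm or its hierarchy: the class `RewardMachine`, the helper `run`, the state names `u0`..`u2`, and the events `button_pressed` and `goal_reached` are all hypothetical and do not come from the paper. Each environment step yields a set of events, so several events may be reported at once; this sketch simply takes the first matching one in sorted order.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine that maps high-level events to rewards (illustrative)."""
    initial_state: str
    terminal_states: frozenset
    # (state, event) -> (next_state, reward); hypothetical layout
    transitions: dict = field(default_factory=dict)

    def step(self, state, events):
        """Advance on the set of events observed in one environment step."""
        for event in sorted(events):
            if (state, event) in self.transitions:
                return self.transitions[(state, event)]
        return state, 0.0  # no matching event: stay in place, no reward


# Hypothetical two-agent task: one agent presses a button,
# after which another agent can reach the goal.
rm = RewardMachine(
    initial_state="u0",
    terminal_states=frozenset({"u2"}),
    transitions={
        ("u0", "button_pressed"): ("u1", 0.0),
        ("u1", "goal_reached"): ("u2", 1.0),
    },
)


def run(machine, event_trace):
    """Replay a sequence of event sets and return (final_state, total_reward)."""
    state, total = machine.initial_state, 0.0
    for events in event_trace:
        state, reward = machine.step(state, events)
        total += reward
        if state in machine.terminal_states:
            break
    return state, total
```

In this sketch, reaching the goal before the button is pressed leaves the machine in `u0` with no reward, which is how an RM encodes the ordering of high-level events as prior knowledge.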