Reinforcement learning (RL) algorithms can find an optimal policy for a single agent to accomplish a particular task. However, many real-world problems require multiple agents to collaborate in order to achieve a common goal. For example, a robot executing a task in a warehouse may require the assistance of a drone to retrieve items from high shelves. In Decentralized Multi-Agent RL (DMARL), agents learn independently and then combine their policies at execution time, but often must satisfy constraints on compatibility of local policies to ensure that they can achieve the global task when combined. In this paper, we study how providing high-level symbolic knowledge to agents can help address unique challenges of this setting, such as privacy constraints, communication limitations, and performance concerns. In particular, we extend the formal tools used to check the compatibility of local policies with the team task, making decentralized training with theoretical guarantees usable in more scenarios. Furthermore, we empirically demonstrate that symbolic knowledge about the temporal evolution of events in the environment can significantly expedite the learning process in DMARL.