Steering cooperative multi-agent reinforcement learning (MARL) towards desired outcomes is challenging, particularly when global human guidance over the whole multi-agent system is impractical, as in large-scale MARL. Moreover, the design of external mechanisms for coordinating agents (e.g., intrinsic rewards and human feedback) relies mostly on empirical studies and lacks an easy-to-use research tool. In this work, we employ multi-agent influence diagrams (MAIDs) as a graphical framework to address these issues. First, we introduce the concept of MARL interaction paradigms, using MAIDs to analyze and visualize both unguided self-organization and global guidance mechanisms in MARL. Then, we design a new MARL interaction paradigm, referred to as the targeted intervention paradigm, which is applied to only a single targeted agent and thereby mitigates the problem of global guidance. In our implementation, we introduce a causal inference technique, referred to as Pre-Strategy Intervention (PSI), to realize the targeted intervention paradigm. Since MAIDs can be regarded as a special class of causal diagrams, a composite desired outcome that integrates the primary task goal and an additional desired outcome can be achieved by maximizing the corresponding causal effect through PSI. Moreover, the bundled relevance graph analysis of MAIDs provides a tool to identify whether an MARL learning paradigm is workable under the design of an MARL interaction paradigm. In experiments, we demonstrate the effectiveness of our proposed targeted intervention and verify the results of the relevance graph analysis.