One of the main challenges of multi-agent learning lies in establishing convergence: in general, a collection of individual, self-serving agents that learn concurrently is not guaranteed to converge in their joint policy. This is in stark contrast to most single-agent environments and poses a prohibitive barrier to deployment in practical applications, as it induces uncertainty in the long-term behavior of the system. In this work, we apply the concept of trapping regions, known from the qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning. We propose a binary partitioning algorithm for verifying that candidate sets form trapping regions in systems with known learning dynamics, and a heuristic sampling algorithm for scenarios where the learning dynamics are not known. We demonstrate the applications on a regularized version of the Dirac Generative Adversarial Network, a four-intersection traffic control scenario run in the state-of-the-art open-source microscopic traffic simulator SUMO, and a mathematical model of economic competition.
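To make the notion of a trapping region concrete, here is a minimal sketch of a bisection-style check on a two-dimensional toy system. It is not the algorithm proposed in this work. It assumes a disc-shaped candidate set centered at the origin, a known vector field, and a known Lipschitz constant for the boundary flux as a function of the angle. The names `boundary_flux`, `verify_disc`, `damped_rotation`, and `expanding` are hypothetical and chosen only for illustration. The check uses a standard sufficient condition: if the vector field points strictly inward everywhere on the boundary, trajectories cannot leave the set.

```python
import math


def boundary_flux(field, radius, angle):
    """Inner product of the vector field with the outward unit normal of the circle."""
    nx, ny = math.cos(angle), math.sin(angle)
    fx, fy = field(radius * nx, radius * ny)
    return fx * nx + fy * ny


def verify_disc(field, radius, lipschitz, max_depth=20):
    """Bisect the boundary angles until every arc is certified or a violation is found.

    `lipschitz` is an assumed upper bound on how fast the flux changes with the angle.
    Returns True only if the flux is certified strictly negative on the whole circle.
    """
    stack = [(0.0, 2.0 * math.pi, 0)]
    while stack:
        lo, hi, depth = stack.pop()
        mid = 0.5 * (lo + hi)
        flux = boundary_flux(field, radius, mid)
        if flux >= 0.0:
            return False  # the field does not point strictly inward at this point
        if flux + lipschitz * 0.5 * (hi - lo) < 0.0:
            continue  # the Lipschitz bound certifies the whole arc
        if depth >= max_depth:
            return False  # inconclusive at this resolution, reported as not verified
        stack.append((lo, mid, depth + 1))
        stack.append((mid, hi, depth + 1))
    return True


def damped_rotation(x, y):
    # Toy dynamics (an assumption): rotation plus damping, flux = -0.1*x^2 - 0.5*y^2 on the unit circle.
    return (-y - 0.1 * x, x - 0.5 * y)


def expanding(x, y):
    # Toy counterexample (an assumption): every trajectory moves away from the origin.
    return (x, y)


if __name__ == "__main__":
    # On the unit circle the flux of damped_rotation is -0.3 + 0.2*cos(2*angle),
    # so 0.4 is a valid Lipschitz constant with respect to the angle.
    print(verify_disc(damped_rotation, 1.0, 0.4))
    print(verify_disc(expanding, 1.0, 1.0))
```

When the dynamics are unknown, the same inward-pointing condition could instead be probed at sampled boundary points, at the cost of losing the certificate that the Lipschitz bound provides here.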