Interest in deep reinforcement learning (DRL) algorithm design has been growing, and reward design is a key component of DRL. Among the available techniques, formal methods integrated with DRL have attracted considerable attention because they are expressive and can state requirements on the agent's states and actions precisely. However, the literature on using Signal Temporal Logic (STL) to guide reward design in multi-agent reinforcement learning (MARL) remains limited. In this paper, we propose a novel STL-guided multi-agent reinforcement learning algorithm. The STL specifications include both task specifications, defined according to the objective of each agent, and safety specifications, and the robustness values of these specifications are used to generate rewards. We validate the advantages of our method through empirical studies. The experimental results show significant performance improvements over MARL without STL guidance, along with a marked increase in the overall safety rate of the multi-agent system.
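To make the idea of robustness-based rewards concrete, the following is a minimal sketch rather than the algorithm proposed in the paper. It assumes a hypothetical two-agent setting with one safety specification ("always keep at least `d_safe` away from the other agent") and one task specification ("eventually come within `eps` of the goal"). The function names, thresholds, and trajectory format are illustrative assumptions; only the standard quantitative semantics of STL (minimum for "always" and conjunction, maximum for "eventually") is taken as given.

```python
import math

def always(margins):
    """Robustness of G(phi): the worst-case predicate margin over the trace."""
    return min(margins)

def eventually(margins):
    """Robustness of F(phi): the best-case predicate margin over the trace."""
    return max(margins)

def stl_reward(agent_traj, other_traj, goal, d_safe=1.0, eps=0.5):
    """Hypothetical reward: robustness of G(safe) AND F(reach goal).

    agent_traj, other_traj: lists of (x, y) positions, one per time step.
    goal: (x, y) target of this agent. d_safe and eps are assumed thresholds.
    """
    # Safety predicate margin: positive when the agents are farther apart than d_safe.
    safety = [math.dist(p, q) - d_safe for p, q in zip(agent_traj, other_traj)]
    # Task predicate margin: positive when the agent is within eps of its goal.
    task = [eps - math.dist(p, goal) for p in agent_traj]
    # A conjunction of two formulas takes the minimum of their robustness values.
    return min(always(safety), eventually(task))

agent = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
far_neighbor = [(0.0, 3.0), (1.0, 3.0), (2.0, 3.0)]
close_neighbor = [(0.0, 3.0), (1.0, 0.5), (2.0, 3.0)]

safe_reward = stl_reward(agent, far_neighbor, goal=(2.0, 0.0))
unsafe_reward = stl_reward(agent, close_neighbor, goal=(2.0, 0.0))
```

A positive value means the trajectory satisfies both specifications, with the magnitude indicating the margin; a negative value means at least one specification is violated. In this sketch the second trajectory reaches the goal but passes too close to the neighbor, so its reward is negative.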