Multi-agent hierarchical reinforcement learning (MAHRL) has been studied as an effective means of solving intelligent decision-making problems in complex, large-scale environments. However, most current MAHRL algorithms follow the traditional reinforcement-learning practice of using a fixed reward function, which limits them to a single task. This study designs a multi-agent cooperative algorithm with logic reward shaping (LRS), which sets rewards in a more flexible way and thereby enables the effective completion of multiple tasks. LRS uses Linear Temporal Logic (LTL) to express the internal logical relations among the subtasks of a complex task. It then evaluates whether the subformulae of the LTL expression are satisfied according to a designed reward structure, which helps agents learn to complete tasks effectively by adhering to the LTL expressions and thus enhances the interpretability and credibility of their decisions. To strengthen coordination and cooperation among multiple agents, a value-iteration technique is designed to evaluate the actions taken by each agent; based on this evaluation, a reward function is shaped for coordination, enabling each agent to assess its own status and complete the remaining subtasks through experiential learning. Experiments were conducted on various types of tasks in a Minecraft-like environment. The results demonstrate that the proposed algorithm improves the performance of multiple agents learning to complete multiple tasks.
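To make the reward-shaping idea concrete, the following is a minimal illustrative sketch (not the paper's implementation) of rewarding an agent as it satisfies successive subformulae of a sequential LTL task such as F(wood ∧ F(plank)), i.e. "eventually collect wood, and afterwards eventually make a plank". The subtask names, stage encoding, and reward values here are hypothetical examples chosen for illustration.

```python
# Hypothetical ordered subgoals encoded by a sequential LTL formula.
SUBTASKS = ["wood", "plank"]

def shaped_reward(stage, events):
    """Advance through the LTL subformulae and return (new_stage, reward).

    `stage` is how many subformulae have been satisfied so far;
    `events` is the set of event labels the environment emitted this step.
    """
    if stage < len(SUBTASKS) and SUBTASKS[stage] in events:
        stage += 1
        # Final subformula satisfied -> full task reward; otherwise a
        # small shaping bonus for progressing through the expression.
        reward = 1.0 if stage == len(SUBTASKS) else 0.1
    else:
        reward = 0.0
    return stage, reward

# Walk one episode trace through the shaping function.
stage, total = 0, 0.0
for events in [set(), {"wood"}, set(), {"plank"}]:
    stage, r = shaped_reward(stage, events)
    total += r
print(stage, total)
```

In this sketch, tracking the current stage makes the reward signal depend on the history of satisfied subformulae rather than on the raw state alone, which is what lets a single agent (or each agent in a team) be rewarded for progress on the remaining subtasks.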