This dissertation explores the application of multi-agent reinforcement learning (MARL) for handling deadlocks in intralogistics systems that rely on autonomous mobile robots (AMRs). AMRs enhance operational flexibility but also increase the risk of deadlocks, which degrade system throughput and reliability. Existing approaches often neglect deadlock handling in the planning phase and rely on rigid control rules that cannot adapt to dynamic operational conditions. To address these shortcomings, this work develops a structured methodology for integrating MARL into logistics planning and operational control. It introduces reference models that explicitly consider deadlock-capable multi-agent pathfinding (MAPF) problems, enabling systematic evaluation of MARL strategies. Using grid-based environments and external simulation software, the study compares traditional deadlock handling strategies with MARL-based solutions, focusing on the PPO and IMPALA algorithms under different training and execution modes. Findings reveal that MARL-based strategies, particularly when combined with centralized training and decentralized execution (CTDE), outperform rule-based methods in complex, congested environments. In simpler environments or those with ample spatial freedom, rule-based methods remain competitive due to their lower computational demands. These results highlight that MARL provides a flexible and scalable solution for deadlock handling in dynamic intralogistics scenarios, but requires careful tailoring to the operational context.
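To make the core problem concrete, the following is a minimal sketch (not the dissertation's implementation) of how a circular-wait deadlock can be detected among grid-based AMRs: each agent that wants to enter a cell currently occupied by another agent is recorded in a wait-for graph, and a cycle in that graph indicates a deadlock. The agent names, positions, and target cells are hypothetical examples.

```python
def build_wait_for_graph(positions, next_cells):
    """Agent a waits for agent b if a's next cell is b's current cell."""
    occupied = {cell: agent for agent, cell in positions.items()}
    return {a: occupied[nc] for a, nc in next_cells.items()
            if nc in occupied and occupied[nc] != a}

def has_deadlock(wait_for):
    """A cycle in the wait-for graph means a circular-wait deadlock."""
    def dfs(agent, visited):
        if agent in visited:
            return True
        if agent not in wait_for:
            return False
        return dfs(wait_for[agent], visited | {agent})
    return any(dfs(a, set()) for a in wait_for)

# Two AMRs heading into each other's cells in a one-lane corridor:
positions = {"amr1": (0, 0), "amr2": (0, 1)}
next_cells = {"amr1": (0, 1), "amr2": (0, 0)}
print(has_deadlock(build_wait_for_graph(positions, next_cells)))  # True
```

Rule-based strategies typically act on such a detected cycle (e.g. by forcing one agent to yield), whereas the MARL-based strategies studied here aim to learn policies that avoid entering these configurations in the first place.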