ORION: Option-Regularized Deep Reinforcement Learning for Cooperative Multi-Agent Online Navigation

Existing methods for multi-agent navigation typically assume fully known environments and offer limited support for partially known scenarios such as warehouses or factory floors. There, agents may need to plan trajectories that balance their own path optimality against their ability to collect and share information about the environment that helps teammates reach their goals. To this end, we propose ORION, a novel deep reinforcement learning framework for cooperative multi-agent online navigation in partially known environments. Starting from an imperfect prior map, ORION trains agents to make decentralized decisions, coordinate to reach their individual targets, and actively reduce map uncertainty by sharing online observations in a closed perception-action loop. We first design a shared graph encoder that fuses the prior map with online perception into a unified representation, providing robust state embeddings under dynamic map discrepancies. At the core of ORION is an option-critic framework that learns to reason about a set of high-level cooperative modes that translate into sequences of low-level actions, allowing agents to switch adaptively between individual navigation and team-level exploration. We further introduce a dual-stage cooperation strategy that enables agents to assist teammates under map uncertainty, thereby reducing the overall makespan. Across extensive maze-like maps and large-scale warehouse environments, our simulation results show that ORION achieves high-quality, real-time decentralized cooperation across varying team sizes, outperforming state-of-the-art classical and learning-based baselines. Finally, we validate ORION on physical robot teams, demonstrating its robustness and practicality for real-world cooperative navigation.

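To make the idea of switching between high-level cooperative modes concrete, the following is a minimal sketch of call-and-return option execution as used in generic option-critic methods. It is not ORION's implementation: the option names, the `select_option` helper, the dictionary of option values, and the epsilon parameter are all assumptions introduced for illustration, since the abstract does not specify them.

```python
import random

# Hypothetical option names; the abstract only says agents switch between
# individual navigation and team-level exploration.
OPTIONS = ("navigate", "explore")


def select_option(q_values, current, terminate_prob, rng, epsilon=0.0):
    """Keep the current option unless it terminates, then pick a new one.

    q_values: dict mapping option name -> estimated value in the current state
              (assumed to come from a learned critic).
    current: the option being executed, or None at the start of an episode.
    terminate_prob: probability that the current option terminates here
                    (assumed to come from a learned termination function).
    rng: a random.Random instance, passed in for reproducibility.
    """
    # The current option continues with probability 1 - terminate_prob.
    if current is not None and rng.random() >= terminate_prob:
        return current
    # With probability epsilon, try a random option.
    if rng.random() < epsilon:
        return rng.choice(OPTIONS)
    # Otherwise choose greedily over the option values.
    return max(OPTIONS, key=lambda o: q_values[o])
```

In a full agent, the selected option would condition a low-level policy that outputs movement actions, and the values and termination probabilities would be produced by networks trained end to end; both are omitted here.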