We present a novel approach for efficient and reliable goal-directed, long-horizon navigation by a multi-robot team in a structured, unknown environment, based on predicting statistics of unknown space. Building on recent work in learning-augmented model-based planning under uncertainty, we introduce a high-level state and action abstraction that lets us approximate the challenging Dec-POMDP as a tractable stochastic MDP. Our Multi-Robot Learning over Subgoals Planner (MR-LSP) guides agents towards coordinated exploration of regions more likely to lead to the unseen goal. We demonstrate improvement in cost against other multi-robot strategies: in simulated office-like environments, our approach saves 13.29% (2 robots) and 4.6% (3 robots) in average cost versus standard non-learned optimistic planning and a learning-informed baseline.