We consider the distributed pose-graph optimization (PGO) problem, which is fundamental to accurate trajectory estimation in multi-robot simultaneous localization and mapping (SLAM). Conventional iterative approaches linearize a highly non-convex objective and repeatedly solve the resulting normal equations; they often converge to local minima and thus produce suboptimal estimates. We propose a scalable, outlier-robust distributed planar PGO framework based on Multi-Agent Reinforcement Learning (MARL). We cast distributed PGO as a partially observable Markov game defined on local pose graphs, where each action refines a single edge's pose estimate. A graph partitioner decomposes the global pose graph, and each robot runs a recurrent edge-conditioned Graph Neural Network (GNN) encoder with adaptive edge-gating to denoise noisy edges. Robots sequentially refine poses through a hybrid policy that combines prior action memory with graph embeddings. After local graph correction, a consensus scheme reconciles inter-robot disagreements to produce a globally consistent estimate. Extensive evaluations on a comprehensive suite of synthetic and real-world datasets demonstrate that our learned MARL-based actors reduce the global objective by an average of 37.5% more than the state-of-the-art distributed PGO framework, while improving inference efficiency by at least 6×. We also show that actor replication allows a single learned policy to scale to substantially larger robot teams without any retraining. Code is publicly available at https://github.com/herolab-uga/policies-over-poses.