See What the Robot Can't See: Learning Cooperative Perception for Visual Navigation

We consider the problem of navigating a mobile robot towards a target in an unknown environment that is endowed with visual sensors, where neither the robot nor the sensors have access to global positioning information and both rely only on first-person-view images. To overcome the need for positioning, we train the sensors to encode and communicate relevant viewpoint information to the mobile robot, whose objective is to use this information to navigate to the target along the shortest path. The challenge is to enable all the sensors, even those that cannot directly see the target, to predict the direction along the shortest path to the target. We address it by implementing a neighborhood-based feature aggregation module using a Graph Neural Network (GNN) architecture. In our experiments, we first demonstrate generalizability to previously unseen environments with various sensor layouts. Our results show that communication between the sensors and the robot yields up to a 2.0x improvement in SPL (Success weighted by Path Length) over a communication-free baseline. This is achieved without requiring a global map, positioning data, or pre-calibration of the sensor network. Second, we perform a zero-shot transfer of our model from simulation to the real world. Laboratory experiments demonstrate the feasibility of our approach in various cluttered environments. Finally, we showcase examples of successful navigation to the target while both the sensor network layout and the obstacles are dynamically reconfigured as the robot navigates. We provide a video demo, the dataset, trained models, and source code.
https://www.youtube.com/watch?v=kcmr6RUgucw
https://github.com/proroklab/sensor-guided-visual-nav
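
For readers unfamiliar with the SPL metric quoted above, the following is a minimal sketch of how it is commonly computed. It assumes the standard definition from the embodied-navigation literature: each episode contributes its success indicator times the ratio of the shortest-path length to the larger of the shortest-path length and the path actually travelled. The function name `spl` and its argument names are illustrative and are not taken from the released code.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    Assumed definition: SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i),
    where S_i is the success indicator, l_i the shortest-path length and
    p_i the length of the path the robot actually took in episode i.
    """
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if not (n == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if shortest <= 0:
            raise ValueError("shortest-path lengths must be positive")
        if success:
            # A successful episode scores 1.0 only if the robot took the
            # shortest path; longer paths are penalised proportionally.
            total += shortest / max(taken, shortest)
        # Failed episodes contribute 0 regardless of path length.
    return total / n


# Three hypothetical episodes: optimal success, success with a 2x detour, failure.
score = spl([True, True, False], [10.0, 10.0, 10.0], [10.0, 20.0, 15.0])
```

Under this definition, an "up to 2.0x improvement in SPL" means the ratio between the SPL of the communicating system and that of the communication-free baseline reaches 2.0 in the best case.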
