Robot navigation under visual corruption presents a formidable challenge. To address this, we propose a Test-time Adaptation (TTA) method, called TTA-Nav, for point-goal navigation under visual corruptions. Our "plug-and-play" method adds a top-down decoder to a pre-trained navigation model. First, the pre-trained navigation model receives a corrupted image and extracts features from it. Second, the top-down decoder reconstructs an image from the high-level features extracted by the pre-trained model. Third, this reconstruction of the corrupted image is fed back into the pre-trained model. Finally, the pre-trained model performs a second forward pass to output an action. Despite being trained solely on clean images, the top-down decoder can reconstruct cleaner images from corrupted ones without any gradient-based adaptation. The pre-trained navigation model equipped with our top-down decoder navigates significantly better under almost all visual corruptions in our benchmarks. On the most severe corruption, our method improves the success rate of point-goal navigation from the state-of-the-art result of 46% to 94%. These results suggest the method's potential for broader application in robotic visual navigation.
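The four inference steps described above can be sketched as a single function. This is a minimal illustration of the data flow only, not the authors' implementation: the names `tta_nav_step`, `encode`, `decode`, and `act` are hypothetical, and the toy stand-ins below replace the real pre-trained encoder, top-down decoder, and policy head, which are not specified here.

```python
from typing import Callable, List

Image = List[float]     # flattened pixels; stand-in for a real image tensor
Features = List[float]  # stand-in for high-level features


def tta_nav_step(
    encode: Callable[[Image], Features],
    decode: Callable[[Features], Image],
    act: Callable[[Features], int],
    corrupted_obs: Image,
) -> int:
    """One decision step: encode, reconstruct, re-encode, act.

    No gradients are computed and no weights are updated.
    """
    # Step 1: first forward pass extracts features from the corrupted image.
    features = encode(corrupted_obs)
    # Step 2: the top-down decoder reconstructs an image from those features.
    reconstruction = decode(features)
    # Step 3: the reconstruction is fed back into the same encoder.
    refined_features = encode(reconstruction)
    # Step 4: the second forward pass ends with the action output.
    return act(refined_features)


# Toy stand-ins (assumptions for illustration only).
call_log: List[str] = []


def toy_encode(image: Image) -> Features:
    call_log.append("encode")
    return [sum(image) / len(image)]  # mean brightness as the only feature


def toy_decode(features: Features) -> Image:
    call_log.append("decode")
    return [features[0]] * 4  # flat 4-pixel image at that brightness


def toy_act(features: Features) -> int:
    call_log.append("act")
    return 1 if features[0] > 0.5 else 0  # two hypothetical actions


action = tta_nav_step(toy_encode, toy_decode, toy_act, [1.0, 1.0, 0.0, 1.0])
```

The sketch calls the encoder twice per step and the decoder once, which is the extra inference cost implied by the description. Whether the real system shares encoder weights across both passes exactly as shown is an assumption.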