Modality-invariant Visual Odometry for Embodied Vision

Effectively localizing an agent in a realistic, noisy setting is crucial for many embodied vision tasks. Visual Odometry (VO) is a practical substitute for unreliable GPS and compass sensors, especially in indoor environments. While SLAM-based methods show solid performance without large data requirements, they are less flexible and less robust with respect to noise and changes in the sensor suite than learning-based approaches. Recent deep VO models, however, limit themselves to a fixed set of input modalities, e.g., RGB and depth, while training on millions of samples. When sensors fail, sensor suites change, or modalities are intentionally dropped because of limited resources, e.g., power consumption, these models fail catastrophically. Furthermore, training these models from scratch is even more expensive without simulator access or suitable existing models that can be fine-tuned. While such scenarios are mostly ignored in simulation, they commonly hinder a model's reusability in real-world applications. We propose a Transformer-based modality-invariant VO approach that can deal with diverse or changing sensor suites of navigation agents. Our model outperforms previous methods while training on only a fraction of the data. We hope this method opens the door to a broader range of real-world applications that can benefit from flexible and learned VO models.

