This paper presents a cross-modal learning framework that exploits complementary information from depth and grayscale images for robust navigation. We introduce a Cross-Modal Wasserstein Autoencoder that learns a shared latent representation by enforcing cross-modal consistency, so that the system can infer depth-relevant features from grayscale observations when depth measurements are corrupted. The learned representation is integrated with a reinforcement learning-based policy for collision-free navigation in unstructured environments where depth sensing degrades under adverse conditions such as poor lighting or reflective surfaces. Simulation and real-world experiments demonstrate that our approach maintains robust performance under significant depth degradation and transfers successfully to real environments.
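To make the idea of a shared latent space with cross-modal consistency concrete, the following is a minimal sketch of what such a training objective could look like. It is not the authors' implementation: the linear encoders and decoder, the RBF-kernel MMD penalty (one common way to realize the Wasserstein autoencoder regularizer), the latent alignment term, and all names (`rbf_mmd2`, `cross_modal_wae_loss`, `params`, `lam_mmd`, `lam_align`) are assumptions chosen for illustration.

```python
import numpy as np


def rbf_mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between sample sets x and y (RBF kernel)."""
    def kernel(a, b):
        # Pairwise squared Euclidean distances, clipped against round-off.
        sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
        return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()


def cross_modal_wae_loss(depth, gray, params, rng, lam_mmd=1.0, lam_align=1.0):
    """Hypothetical objective: reconstruction + latent alignment + MMD prior matching.

    depth, gray: arrays of shape (batch, features), flattened images.
    params: dict of weight matrices (assumed linear encoders/decoder for brevity).
    """
    # Modality-specific encoders map into one shared latent space.
    z_depth = depth @ params["enc_depth"]
    z_gray = gray @ params["enc_gray"]

    # Both latents are decoded to depth, so the grayscale branch must
    # carry depth-relevant features (the cross-modal part of the loss).
    rec_from_depth = z_depth @ params["dec_depth"]
    rec_from_gray = z_gray @ params["dec_depth"]
    recon = np.mean((rec_from_depth - depth) ** 2) + np.mean((rec_from_gray - depth) ** 2)

    # Consistency term: paired observations should land close together.
    align = np.mean((z_depth - z_gray) ** 2)

    # WAE-style regularizer: match encoded latents to a standard normal prior.
    prior = rng.standard_normal(z_depth.shape)
    mmd = rbf_mmd2(z_depth, prior) + rbf_mmd2(z_gray, prior)

    return recon + lam_align * align + lam_mmd * mmd
```

In this sketch, a policy would consume `z_gray` or `z_depth` interchangeably at test time, which is why the alignment term and the shared decoder matter: when depth is corrupted, the grayscale latent is meant to stand in for it. A real system would use convolutional networks and gradient-based training rather than fixed linear maps.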