Multi-modal sensor data improves visual place recognition (VPR) performance over single-modal counterparts. However, integrating additional sensors raises cost and may be infeasible for systems that demand lightweight operation, which limits the practical deployment of VPR. To address this issue, we resort to knowledge distillation, which lets single-modal students learn from cross-modal teachers without requiring additional sensors at inference. Despite the notable advances of current distillation approaches, feature relationships remain under-explored. To tackle cross-modal distillation in VPR, we present DistilVPR, a novel distillation pipeline for VPR. We propose leveraging feature relationships from multiple agents, including self-agents and cross-agents of the teacher and student neural networks. Furthermore, we integrate multiple manifolds with different space curvatures, in the form of Euclidean, spherical, and hyperbolic relationship modules, to explore feature relationships. This increases the diversity of feature relationships and thereby the overall representational capacity. Experiments demonstrate that the proposed pipeline achieves state-of-the-art performance compared to other distillation baselines. We also conduct ablation studies to show the effectiveness of the design. The code is released at: https://github.com/sijieaaa/DistilVPR
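To make the idea of relationship modules on different manifolds more concrete, the following is a minimal sketch in plain Python, not the released implementation. It computes pairwise relation matrices under Euclidean, spherical (angular), and hyperbolic (Poincaré ball) distances for self-agent (teacher-teacher, student-student) and cross-agent (student-teacher) feature pairs. All function names, the tanh-based mapping into the Poincaré ball, and the way the terms are combined into a loss are illustrative assumptions; the abstract does not specify them.

```python
import math

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def euclidean_rel(a, b):
    # Straight-line distance in flat (zero-curvature) space.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def spherical_rel(a, b):
    # Angular (geodesic) distance after projecting onto the unit sphere.
    dot = sum(x * y for x, y in zip(a, b))
    cos = dot / max(_norm(a) * _norm(b), 1e-12)
    return math.acos(max(-1.0, min(1.0, cos)))

def to_poincare(v, eps=1e-6):
    # Assumption: map a feature into the unit Poincare ball with the
    # exponential map at the origin (curvature -1), clipped for stability.
    n = _norm(v)
    if n == 0.0:
        return list(v)
    scale = min(math.tanh(n), 1.0 - eps) / n
    return [x * scale for x in v]

def hyperbolic_rel(a, b):
    # Geodesic distance in the Poincare ball model.
    pa, pb = to_poincare(a), to_poincare(b)
    diff2 = sum((x - y) ** 2 for x, y in zip(pa, pb))
    denom = (1.0 - sum(x * x for x in pa)) * (1.0 - sum(x * x for x in pb))
    return math.acosh(1.0 + 2.0 * diff2 / denom)

def relation_matrix(xs, ys, rel):
    # Pairwise relations between two sets of feature vectors.
    return [[rel(x, y) for y in ys] for x in xs]

def mse(m1, m2):
    total = sum((p - q) ** 2 for r1, r2 in zip(m1, m2) for p, q in zip(r1, r2))
    return total / (len(m1) * len(m1[0]))

def relation_distill_loss(student, teacher):
    # Hypothetical combination: pull the student's self-agent and
    # cross-agent relations toward the teacher's self-agent relations,
    # summed over the three geometries.
    loss = 0.0
    for rel in (euclidean_rel, spherical_rel, hyperbolic_rel):
        tt = relation_matrix(teacher, teacher, rel)  # teacher self-agent
        ss = relation_matrix(student, student, rel)  # student self-agent
        st = relation_matrix(student, teacher, rel)  # cross-agent
        loss += mse(ss, tt) + mse(st, tt)
    return loss

teacher = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
student = [[0.9, 0.1], [0.2, 0.7], [0.4, 0.6]]
print(relation_distill_loss(student, teacher))
```

In an actual training pipeline these relations would be computed as batched tensor operations inside an automatic-differentiation framework, so that the loss can be backpropagated into the student network. The list-based version here only illustrates the geometry.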