Re-identification (ReID) is a critical challenge in computer vision, predominantly studied in the context of pedestrians and vehicles. However, robust object-instance ReID, which has significant implications for tasks such as autonomous exploration, long-term perception, and scene understanding, remains underexplored. In this work, we address this gap by proposing a novel dual-path object-instance re-identification transformer architecture that integrates multimodal RGB and depth information. By leveraging depth data, we demonstrate improvements in ReID across scenes that are cluttered or have varying illumination conditions. Additionally, we develop a ReID-based localization framework that enables accurate camera localization and pose identification across different viewpoints. We validate our methods using two custom-built RGB-D datasets, as well as multiple sequences from the open-source TUM RGB-D dataset. Our approach demonstrates significant improvements in both object-instance ReID (mAP of 75.18) and localization accuracy (success rate of 83% on TUM RGB-D), highlighting the essential role of object ReID in advancing robotic perception. Our models, frameworks, and datasets are publicly available.