Integrating RGB and NIR stereo imaging provides complementary spectral information, potentially enhancing robotic 3D vision in challenging lighting conditions. However, existing datasets and imaging systems lack pixel-level alignment between RGB and NIR images, posing challenges for downstream vision tasks. In this paper, we introduce a robotic vision system equipped with pixel-aligned RGB-NIR stereo cameras and a LiDAR sensor mounted on a mobile robot. The system simultaneously captures pixel-aligned pairs of RGB stereo images, NIR stereo images, and temporally synchronized LiDAR points. Exploiting the robot's mobility, we present a dataset of continuous video frames captured under diverse lighting conditions. We then introduce two methods that leverage the pixel-aligned RGB-NIR images: an RGB-NIR image fusion method, which allows existing RGB-pretrained vision models to use RGB-NIR information directly without fine-tuning, and an RGB-NIR feature fusion method, which fine-tunes existing vision models to exploit RGB-NIR information more effectively. Experimental results demonstrate the effectiveness of pixel-aligned RGB-NIR images across diverse lighting conditions.
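To make the image-fusion idea concrete, the sketch below shows one simple way pixel-aligned NIR could be blended into an RGB image's luminance so that an RGB-pretrained model receives a standard 3-channel input. This is a minimal illustration under assumed conventions, not the paper's actual fusion method; the function name `fuse_rgb_nir` and the weighting scheme (NIR weighted more heavily where RGB luminance is low) are illustrative choices.

```python
import numpy as np

def fuse_rgb_nir(rgb, nir, alpha=0.5):
    """Blend a pixel-aligned NIR image into the luminance of an RGB image.

    rgb:   float array in [0, 1], shape (H, W, 3)
    nir:   float array in [0, 1], shape (H, W), pixel-aligned with rgb
    alpha: maximum NIR blending weight (illustrative parameter)

    Returns a 3-channel float image in [0, 1] that an RGB-pretrained
    vision model can consume without architectural changes.
    """
    # Approximate luminance using ITU-R BT.601 weights.
    y = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    # Weight NIR more strongly where the RGB luminance is low
    # (e.g., night scenes, where NIR carries most of the signal).
    w = alpha * (1.0 - y)
    y_fused = (1.0 - w) * y + w * nir
    # Transfer the per-pixel luminance ratio back onto the RGB channels,
    # preserving chrominance while injecting NIR detail.
    ratio = y_fused / np.clip(y, 1e-6, None)
    return np.clip(rgb * ratio[..., None], 0.0, 1.0)
```

Because the output remains a 3-channel image in the usual value range, it can be fed to an off-the-shelf RGB-pretrained network as-is, which is the appeal of the fine-tuning-free approach the abstract describes.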