We present AnyThermal, a thermal backbone that captures robust, task-agnostic thermal features suitable for a variety of tasks such as cross-modal place recognition, thermal segmentation, and monocular depth estimation from thermal images. Existing thermal backbones rely on task-specific training on small-scale data, which limits their utility to a single environment and task. Unlike prior methods, AnyThermal can be used across a wide range of environments (indoor, aerial, off-road, urban) and tasks, all without task-specific training. Our key insight is to distill the feature representations of visual foundation models such as DINOv2 into a thermal encoder using thermal data drawn from these multiple environments. To bridge the diversity gap of existing RGB-Thermal datasets, we introduce the TartanRGBT platform, the first open-source data collection platform with synchronized RGB-Thermal image acquisition. We use this platform to collect the TartanRGBT dataset, a diverse and balanced dataset spanning 4 environments. We demonstrate the efficacy of AnyThermal and TartanRGBT, achieving state-of-the-art results with improvements of up to 36% across diverse environments and downstream tasks on existing datasets.
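The distillation objective described above aligns the thermal encoder's features with those of a frozen visual foundation model. The abstract does not specify the loss; a minimal dependency-free sketch of one plausible choice, a mean cosine-distance loss between per-patch student (thermal) and teacher (e.g. DINOv2) feature vectors, is shown below. The function name and the per-patch formulation are illustrative assumptions, not the paper's stated method.

```python
import math

def cosine_distill_loss(student_feats, teacher_feats):
    """Mean (1 - cosine similarity) between corresponding feature vectors.

    student_feats, teacher_feats: lists of equal-length feature vectors,
    e.g. per-patch embeddings from the thermal encoder and a frozen
    RGB foundation-model teacher (hypothetical formulation).
    """
    assert len(student_feats) == len(teacher_feats)
    total = 0.0
    for s, t in zip(student_feats, teacher_feats):
        dot = sum(a * b for a, b in zip(s, t))
        norm_s = math.sqrt(sum(a * a for a in s))
        norm_t = math.sqrt(sum(b * b for b in t))
        # 1 - cos(s, t): zero when the student matches the teacher's direction
        total += 1.0 - dot / (norm_s * norm_t)
    return total / len(student_feats)
```

Minimizing this loss over paired RGB-thermal data pushes the thermal encoder toward the teacher's task-agnostic representation space without any downstream labels.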