Neural Radiance Fields (NeRFs) have quickly become the de facto standard for novel view synthesis when trained on a set of RGB images. In this paper, we conduct a comprehensive evaluation of neural scene representations, such as NeRFs, in the context of multi-modal learning. Specifically, we present four strategies for incorporating a second modality, other than RGB, into NeRFs: (1) training from scratch independently on both modalities; (2) pre-training on RGB and fine-tuning on the second modality; (3) adding a second branch; and (4) adding a separate component to predict (color) values of the additional modality. We chose thermal imaging as the second modality because it differs strongly from RGB in terms of radiosity, which makes it challenging to integrate into neural scene representations. To evaluate the proposed strategies, we captured a new, publicly available multi-view dataset, ThermalMix, consisting of six common objects and about 360 RGB and thermal images in total. We perform cross-modality calibration prior to data capture, which yields high-quality alignment between RGB and thermal images. Our findings reveal that adding a second branch to NeRF performs best for novel view synthesis on thermal images while also yielding compelling results on RGB. Finally, we show that our analysis generalizes to other modalities, including near-infrared images and depth maps. Project page: https://mert-o.github.io/ThermalNeRF/.
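To make strategy (3) more concrete, the following is a minimal sketch of what a second branch might look like. It is an illustration under stated assumptions, not the architecture evaluated in the paper: we assume a shared trunk and a shared density, with one output head for RGB and a second head for a single-channel thermal value. The names `TwoBranchField` and `render_ray`, the layer sizes, the random untrained weights, and the omission of positional encoding are all hypothetical simplifications.

```python
import math
import random


def linear(rng, n_in, n_out):
    """Random weights (n_out x n_in) and zero bias; untrained, for illustration."""
    scale = 1.0 / math.sqrt(n_in)
    w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def apply(layer, x):
    """Affine map y = W x + b on plain Python lists."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def softplus(a):
    # Numerically stable softplus; keeps the density non-negative.
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


class TwoBranchField:
    """Hypothetical field: shared trunk and density, separate RGB and thermal heads."""

    def __init__(self, hidden=16, seed=0):
        rng = random.Random(seed)
        self.trunk = linear(rng, 3, hidden)             # shared geometry features
        self.sigma_head = linear(rng, hidden, 1)        # shared density (assumption)
        self.rgb_head = linear(rng, hidden + 3, 3)      # first branch: RGB
        self.thermal_head = linear(rng, hidden + 3, 1)  # second branch: thermal

    def __call__(self, position, direction):
        h = relu(apply(self.trunk, position))
        sigma = softplus(apply(self.sigma_head, h)[0])
        feat = h + list(direction)  # view-dependent input for both heads
        rgb = [sigmoid(a) for a in apply(self.rgb_head, feat)]
        thermal = sigmoid(apply(self.thermal_head, feat)[0])
        return sigma, rgb, thermal


def render_ray(field, origin, direction, near=0.0, far=1.0, n_samples=8):
    """Standard volume-rendering quadrature, applied to both modalities at once."""
    delta = (far - near) / n_samples
    transmittance = 1.0
    rgb_out, thermal_out = [0.0, 0.0, 0.0], 0.0
    for i in range(n_samples):
        t = near + (i + 0.5) * delta
        p = [o + t * d for o, d in zip(origin, direction)]
        sigma, rgb, thermal = field(p, direction)
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha  # same weights for RGB and thermal
        rgb_out = [c + weight * s for c, s in zip(rgb_out, rgb)]
        thermal_out += weight * thermal
        transmittance *= 1.0 - alpha
    return rgb_out, thermal_out


if __name__ == "__main__":
    field = TwoBranchField()
    print(render_ray(field, [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]))
```

In this sketch both modalities are composited with the same per-sample weights, so they share one geometry; whether density should be shared or kept per modality is a design choice that the sketch fixes only for simplicity.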