Supercharging Thermal Gaussian Splatting with Depth Estimation

Efficient and robust 3D scene representation is crucial in autonomous driving, robotics, and related fields. While RGB images provide valuable content for 3D reconstruction, other modalities such as thermal or depth can provide additional information about the environment. Recently, novel view synthesis methods such as 3D Gaussian Splatting have begun using multiple modalities to further boost their performance. However, fusing multimodal data can slow down the process and introduce additional challenges. Therefore, our project aims to use a single modality, thermal infrared, minimizing reliance on visible light as much as possible. Such a single-modality approach can be expected to be faster because it does not need to process multimodal data. We propose a method, Thermal-to-Depth Gaussian Splatting (TDg), that uses only thermal images and depth estimation in its architecture to derive radiance fields. Our TDg method outperforms the MSMG (Multiple Single-Modal Gaussians) baseline in most cases on our test datasets, RGBT-Scenes and ThermalMix. On average, TDg improves on the MSMG baseline by 1.12% in learned perceptual image patch similarity (LPIPS, where lower is better), 0.034% in structural similarity index measure (SSIM), and 0.01% in peak signal-to-noise ratio (PSNR). It also reduces training time by 12 min 47 s (a 55% improvement). Overall, our method successfully derives thermal radiance fields, which have several potential applications, such as identifying heat sources in surveillance and search-and-rescue operations, and in industrial inspection, where temperature is widely used to monitor machines.
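The reported figures can be unpacked with a little arithmetic. The sketch below is illustrative only. It assumes that each percentage is a relative difference measured against the MSMG value, signed so that a positive number means "better" (for LPIPS this means TDg's value is lower). It also assumes that the 55% training-time figure is the saving expressed as a fraction of the MSMG training time. The helper names (`relative_improvement`, `implied_training_times`, `fmt_min_sec`) are hypothetical and are not part of TDg or MSMG.

```python
from dataclasses import dataclass


def relative_improvement(baseline: float, ours: float, higher_is_better: bool) -> float:
    """Percentage improvement of `ours` over `baseline` (assumed definition).

    The sign is chosen so that a positive result always means "better":
    PSNR and SSIM are higher-is-better, while LPIPS is lower-is-better.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    delta = ours - baseline if higher_is_better else baseline - ours
    return 100.0 * delta / abs(baseline)


@dataclass
class ImpliedTimes:
    baseline_s: float  # implied MSMG training time, in seconds
    ours_s: float      # implied TDg training time, in seconds


def implied_training_times(saved_s: float, reduction_fraction: float) -> ImpliedTimes:
    """Recover absolute training times from an absolute saving and the
    fraction of the baseline time that the saving represents (assumption)."""
    if not 0.0 < reduction_fraction < 1.0:
        raise ValueError("reduction_fraction must lie in (0, 1)")
    baseline_s = saved_s / reduction_fraction
    return ImpliedTimes(baseline_s=baseline_s, ours_s=baseline_s - saved_s)


def fmt_min_sec(seconds: float) -> str:
    """Format a duration in seconds as 'M min SS s', rounded to whole seconds."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs:02d} s"


if __name__ == "__main__":
    saved = 12 * 60 + 47  # 12 min 47 s saving, as stated in the abstract
    times = implied_training_times(saved, 0.55)
    print("implied MSMG time:", fmt_min_sec(times.baseline_s))  # 23 min 15 s
    print("implied TDg time: ", fmt_min_sec(times.ours_s))      # 10 min 28 s
```

Under these assumptions, the MSMG baseline would train in roughly 23 minutes and TDg in roughly 10.5 minutes. The 55% figure is itself rounded, so these back-computed times are approximate and are not measured values.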