Multi-modal scene reconstruction that integrates RGB and thermal infrared data is essential for robust environmental perception across diverse lighting and weather conditions. However, extending 3D Gaussian Splatting (3DGS) to multi-spectral scenarios remains challenging. Current approaches struggle to fully exploit the complementary information in multi-modal data: they either neglect cross-modal correlations or rely on shared representations that cannot adapt to the complex structural correlations and physical discrepancies between spectra. To address these limitations, we propose ThermoSplat, a novel framework that enables deep spectral-aware reconstruction through active feature modulation and adaptive geometry decoupling. First, we introduce a Cross-Modal FiLM Modulation mechanism that dynamically conditions shared latent features on thermal structural priors, guiding visible texture synthesis with reliable cross-modal geometric cues. Second, to accommodate modality-specific geometric inconsistencies, we propose a Modality-Adaptive Geometric Decoupling scheme that learns independent opacity offsets and runs a separate rasterization pass for the thermal branch. Additionally, a hybrid rendering pipeline integrates explicit Spherical Harmonics with implicit neural decoding, ensuring both semantic consistency and high-frequency detail preservation. Extensive experiments on the RGBT-Scenes dataset demonstrate that ThermoSplat achieves state-of-the-art rendering quality across both the visible and thermal spectra.
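The two mechanisms named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not give layer shapes, how the FiLM parameters are predicted, or the space in which opacity offsets are applied. The sketch assumes that a single affine layer maps thermal features to a per-channel scale (gamma) and shift (beta), and that the thermal opacity is obtained by adding a learned offset to the shared opacity logit before the sigmoid. All function and parameter names (`linear`, `film_modulate`, `thermal_opacity`, `w_gamma`, and so on) are hypothetical.

```python
import math
from typing import List, Sequence


def linear(x: Sequence[float],
           weight: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Affine map y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def film_modulate(shared: Sequence[float],
                  thermal: Sequence[float],
                  w_gamma: Sequence[Sequence[float]],
                  b_gamma: Sequence[float],
                  w_beta: Sequence[Sequence[float]],
                  b_beta: Sequence[float]) -> List[float]:
    """FiLM: scale and shift shared features channel by channel.

    Assumption: gamma and beta are each predicted from the thermal
    features by one affine layer; the paper may use a deeper network.
    """
    gamma = linear(thermal, w_gamma, b_gamma)  # per-channel scale
    beta = linear(thermal, w_beta, b_beta)     # per-channel shift
    return [g * s + b for g, s, b in zip(gamma, shared, beta)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def thermal_opacity(shared_logit: float, offset: float) -> float:
    """Opacity for the thermal branch of one Gaussian.

    Assumption: the learned offset is added in logit space, so the
    result stays in (0, 1) while the RGB opacity is left untouched.
    """
    return sigmoid(shared_logit + offset)


if __name__ == "__main__":
    # Toy example: 2 shared channels conditioned on 1 thermal channel.
    modulated = film_modulate(
        shared=[1.0, 2.0],
        thermal=[0.5],
        w_gamma=[[2.0], [0.0]], b_gamma=[0.0, 1.0],
        w_beta=[[0.0], [4.0]], b_beta=[1.0, 0.0],
    )
    print(modulated)
    print(thermal_opacity(0.0, -2.0))
```

With a zero offset the thermal branch reproduces the shared opacity; a negative offset makes a Gaussian more transparent in the thermal pass only, which is one way a surface visible in RGB could contribute less to the thermal rendering.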