Shadow removal networks typically treat image restoration as an unconstrained mapping and lack the physical interpretability needed to balance localized texture recovery against global illumination consistency. To address this, we propose CFSR, a multi-modal prior-driven framework that reframes shadow removal as a physics-constrained restoration process. CFSR integrates 3D geometric cues with semantics from large-scale foundation models to bridge the 2D-3D domain gap. Specifically, we first map observations into a custom HVI color space to suppress shadow-induced noise and fuse the RGB data with estimated depth priors. At the core of CFSR, our Geometric & Semantic Dual Explicit Guided Attention mechanism uses DINO features and 3D surface normals to directly modulate the attention affinity matrix, enforcing physical lighting constraints at the structural level. To recover severely degraded regions, we inject holistic priors via a frozen CLIP encoder. Finally, our Frequency Collaborative Reconstruction Module (FCRM) decouples the decoding process: conditioned on geometric priors, it harmonizes the reconstruction of sharp high-frequency occlusion boundaries with the restoration of low-frequency global illumination. Extensive experiments demonstrate that CFSR achieves state-of-the-art performance across multiple challenging benchmarks.
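To make the idea of directly modulating the attention affinity matrix concrete, the following is a minimal sketch and not the CFSR implementation. It assumes the modulation is an additive bias on the attention logits, computed from the cosine similarity of per-token surface normals and per-token semantic features (standing in for DINO features). The function name `guided_attention` and the weights `alpha` and `beta` are hypothetical, and the abstract does not specify the actual form of the modulation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def guided_attention(q, k, v, normals, semantics, alpha=1.0, beta=1.0):
    """Single-head attention with prior-biased logits (illustrative only).

    q, k, v:    lists of N token vectors (queries, keys, values).
    normals:    N surface-normal vectors, one per token (geometric prior).
    semantics:  N semantic feature vectors, one per token (assumed DINO-like).
    alpha/beta: hypothetical weights for the geometric and semantic biases.
    Returns (outputs, attention_weights).
    """
    d = len(q[0])
    outputs, weights = [], []
    for i in range(len(q)):
        logits = []
        for j in range(len(k)):
            # Standard scaled dot-product affinity.
            content = sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            # Assumed modulation: tokens with similar normals and similar
            # semantics receive a higher affinity before the softmax.
            bias = (alpha * cosine(normals[i], normals[j])
                    + beta * cosine(semantics[i], semantics[j]))
            logits.append(content + bias)
        w = softmax(logits)
        weights.append(w)
        outputs.append([sum(w[j] * v[j][c] for j in range(len(v)))
                        for c in range(len(v[0]))])
    return outputs, weights


# Toy example: tokens 0 and 1 share a normal and a semantic feature; token 2 differs.
q = k = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
v = [[1.0], [2.0], [3.0]]
normals = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
semantics = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
out, attn = guided_attention(q, k, v, normals, semantics)
```

In this toy setting the content term is identical for every pair, so any difference in the attention weights comes from the priors alone: token 0 attends more strongly to token 1, which lies on the same surface and shares its semantics, than to token 2.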