Cross-Modal Spherical Aggregation for Weakly Supervised Remote Sensing Shadow Removal

Remote sensing shadow removal aims to recover surface information obscured by shadows. The task is challenging because shadowed regions typically exhibit extremely low illumination intensity. Infrared images, in contrast, are robust to large illumination changes and provide visual cues complementary to those of visible images. Nevertheless, existing methods ignore the collaboration between these heterogeneous modalities, which degrades restoration quality. To fill this gap, we propose S2-ShadowNet, a weakly supervised shadow removal network with a spherical feature space that exploits the complementary strengths of the visible and infrared modalities. Specifically, we employ a modal translation (visible-to-infrared) model to learn the cross-domain mapping and thereby generate realistic infrared samples. A Swin Transformer then extracts strongly representative visible and infrared features. The extracted features are simultaneously mapped onto a smooth spherical manifold, which alleviates the domain shift through regularization. A similarity loss and an orthogonality loss are embedded in the spherical space; by constraining both representation content and orientation, they promote the separation of private visible/infrared features and the alignment of shared visible/infrared features. This design encourages implicit reciprocity between the modalities and offers a novel perspective on shadow removal. Notably, because ground truth is unavailable in practice, S2-ShadowNet is trained on shadow and shadow-free patches cropped from the shadow image itself, avoiding the rigid and strict acquisition of paired data. More importantly, we contribute a large-scale weakly supervised shadow removal benchmark comprising 4000 shadow images with corresponding shadow masks.
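The spherical mapping and the two losses described above can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulations, so everything here is an assumption: the spherical mapping is taken to be plain L2 normalization onto the unit hypersphere, the similarity loss is taken to be one minus the cosine similarity between paired shared features, and the orthogonality loss is taken to be the squared cosine between each sample's shared and private features. The function names `to_sphere`, `similarity_loss`, and `orthogonality_loss` are hypothetical and are not taken from S2-ShadowNet.

```python
import numpy as np


def to_sphere(features, eps=1e-12):
    """Project each row vector onto the unit hypersphere (L2 normalization).

    Assumption: the paper's spherical mapping may differ from this.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, eps)


def similarity_loss(shared_vis, shared_ir):
    """Mean (1 - cosine similarity) between paired shared features.

    The loss is 0 when every visible/infrared pair points the same way.
    """
    v, r = to_sphere(shared_vis), to_sphere(shared_ir)
    return float(np.mean(1.0 - np.sum(v * r, axis=1)))


def orthogonality_loss(shared, private):
    """Mean squared cosine between each sample's shared and private feature.

    The loss is 0 when shared and private features are orthogonal.
    """
    s, p = to_sphere(shared), to_sphere(private)
    return float(np.mean(np.sum(s * p, axis=1) ** 2))


if __name__ == "__main__":
    # Random stand-ins for encoder outputs: 4 samples, 8 feature dimensions.
    rng = np.random.default_rng(0)
    shared_vis = rng.normal(size=(4, 8))
    shared_ir = rng.normal(size=(4, 8))
    private_vis = rng.normal(size=(4, 8))

    print("similarity loss:", similarity_loss(shared_vis, shared_ir))
    print("orthogonality loss:", orthogonality_loss(shared_vis, private_vis))
```

Because both losses operate on normalized vectors, they depend only on feature orientation and not on magnitude, which is one plausible reading of the abstract's "constraints on orientation".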
