Removing soft and self shadows, which lack clear boundaries, from a single image remains challenging. Self shadows are shadows that an object casts on itself. Most existing methods rely on binary shadow masks and do not account for the ambiguous boundaries of soft and self shadows. In this paper, we present DeS3, a method that removes hard, soft, and self shadows based on self-tuned ViT feature similarity and color convergence. Our novel ViT similarity loss uses features extracted from a pre-trained Vision Transformer (ViT) to guide the reverse diffusion process towards recovering scene structures. We also introduce a color convergence loss that constrains surface colors during reverse inference to avoid color shifts. DeS3 is able to differentiate shadow regions from the underlying objects, as well as from the objects casting the shadows. This capability enables DeS3 to better recover the structures of objects even when they are partially occluded by shadows. Unlike existing methods, which rely on constraints imposed during the training phase, we incorporate the ViT similarity and color convergence losses during the sampling stage. This enables DeS3 to integrate its strong modeling capabilities with input-specific knowledge in a self-tuned manner. Our method outperforms state-of-the-art methods on the SRD, AISTD, LRSS, USR, and UIUC datasets, removing hard, soft, and self shadows robustly. Specifically, on the SRD dataset, our method reduces the whole-image RMSE by 20% relative to the previous state-of-the-art method.
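To make the idea of applying losses at the sampling stage rather than at training time more concrete, the following is a minimal toy sketch in NumPy. It is not the DeS3 implementation: the fixed linear map `feat` is a hypothetical stand-in for pre-trained ViT features, the mean-intensity term is an assumed stand-in for the color convergence loss, `ref` stands for whatever input-derived target the losses compare against, and `denoise_step` is a placeholder for the learned reverse diffusion step. All function names and parameters here are illustrative assumptions.

```python
import numpy as np


def total_loss(x, ref, feat, w_feat=1.0, w_color=1.0):
    """Weighted sum of a feature-similarity term and a color term (both assumed forms)."""
    feat_term = 0.5 * np.sum((feat @ (x - ref)) ** 2)    # distance in the stand-in feature space
    color_term = 0.5 * (x.mean() - ref.mean()) ** 2      # mismatch in mean intensity
    return w_feat * feat_term + w_color * color_term


def loss_gradient(x, ref, feat, w_feat=1.0, w_color=1.0):
    """Analytic gradient of total_loss with respect to x."""
    g_feat = feat.T @ (feat @ (x - ref))
    g_color = np.full_like(x, (x.mean() - ref.mean()) / x.size)
    return w_feat * g_feat + w_color * g_color


def guided_sampling(x_start, ref, feat, denoise_step, num_steps=50, guide_lr=0.05):
    """Alternate a (placeholder) reverse step with a gradient step on the losses."""
    x = x_start.copy()
    for t in range(num_steps, 0, -1):
        x = denoise_step(x, t)                            # hypothetical learned reverse step
        x = x - guide_lr * loss_gradient(x, ref, feat)    # input-specific guidance at sampling time
    return x


# Toy data: an 8-"pixel" signal and a 4-dimensional stand-in feature extractor.
rng = np.random.default_rng(0)
n_pixels, n_features = 8, 4
feat = rng.standard_normal((n_features, n_pixels)) / np.sqrt(n_pixels)
ref = rng.random(n_pixels)
x_start = rng.standard_normal(n_pixels)

# An identity "denoiser" isolates the effect of the guidance term.
x_final = guided_sampling(x_start, ref, feat, denoise_step=lambda x, t: x)
loss_before = total_loss(x_start, ref, feat)
loss_after = total_loss(x_final, ref, feat)
print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

Because the guidance is applied only inside the sampling loop, the model weights are never updated; the losses steer each individual sample using information from its own input, which is the sense in which such a scheme can be called self-tuned.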