Transformers have recently emerged as the de facto model for computer vision tasks and have also been successfully applied to shadow removal. However, existing transformer-based methods rely heavily on intricate modifications to the attention mechanisms within the transformer blocks while using a generic patch embedding. This often leads to complex architectural designs that require additional computational resources. In this work, we aim to explore the efficacy of incorporating shadow information within the early processing stage. Accordingly, we propose a transformer-based framework with a novel patch embedding tailored for shadow removal, dubbed ShadowMaskFormer. Specifically, we present a simple and effective mask-augmented patch embedding that integrates shadow information and encourages the model to focus on learning shadow regions. Extensive experiments on the ISTD, ISTD+, and SRD benchmark datasets demonstrate the efficacy of our method against state-of-the-art approaches while using fewer model parameters.
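To make the idea concrete, here is a minimal illustrative sketch of a mask-augmented patch embedding. It is a simplified assumption of the general technique, not the paper's exact formulation: the shadow mask is appended as an extra channel to each patch before the linear projection, so every token carries explicit shadow information from the very first stage. The function name and the channel-concatenation strategy are hypothetical.

```python
import numpy as np

def mask_augmented_patch_embed(image, mask, patch_size=4, embed_dim=32, seed=0):
    """Illustrative sketch (not the paper's exact design): split an image
    into patches, append the shadow mask as an extra channel, and linearly
    project each patch to an embedding vector.

    image: (H, W, C) float array in [0, 1]
    mask:  (H, W) binary array, 1 = shadow pixel
    Returns: (num_patches, embed_dim) array of patch tokens.
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    # Stack the mask as an additional channel so each patch carries
    # explicit shadow information into the embedding stage.
    x = np.concatenate([image, mask[..., None]], axis=-1)  # (H, W, C+1)
    ph, pw = h // patch_size, w // patch_size
    # Rearrange into flattened non-overlapping patches.
    patches = (x.reshape(ph, patch_size, pw, patch_size, c + 1)
                 .transpose(0, 2, 1, 3, 4)
                 .reshape(ph * pw, patch_size * patch_size * (c + 1)))
    # Stand-in for a learned linear projection (random weights for the sketch).
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((patches.shape[1], embed_dim)) * 0.02
    return patches @ proj

# Example: a 16x16 RGB image with a shadow covering the left half.
img = np.random.default_rng(1).random((16, 16, 3))
msk = np.zeros((16, 16))
msk[:, :8] = 1.0
tokens = mask_augmented_patch_embed(img, msk)
print(tokens.shape)  # (16, 32): 16 patch tokens of dimension 32
```

The downstream transformer blocks can then remain unmodified: because shadow information is injected at the embedding stage, no changes to the attention mechanism are required, which is the design point the abstract emphasizes.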