The Segment Anything Model (SAM) has achieved great success in natural image segmentation. Nevertheless, SAM tends to classify shadows as background, resulting in poor performance on the shadow detection task. In this paper, we propose a simple but effective approach for fine-tuning SAM to detect shadows. We also combine it with a long short-term attention mechanism to extend its capabilities to video shadow detection. Specifically, we first fine-tune SAM on shadow data combined with sparse prompts and apply the fine-tuned model to detect shadows in a specific frame (e.g., the first frame) of the video with a small amount of user assistance. Subsequently, using the detected frame as a reference, we employ a long short-term network to learn spatial correlations between distant frames and temporal consistency between contiguous frames, thereby propagating shadow information across frames. Extensive experimental results demonstrate that our method outperforms state-of-the-art techniques, with improvements of 17.2% and 3.3% in terms of MAE and IoU, respectively, validating its effectiveness.
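For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how MAE (mean absolute error, lower is better) and IoU (intersection over union, higher is better) are commonly computed for a predicted shadow map against a ground-truth mask. It is an illustration only, not the evaluation code behind the reported numbers: the function names, the 0.5 binarization threshold, and the convention of returning 1.0 when both masks are empty are assumptions made here.

```python
from typing import Sequence

# A mask is a list of rows; each pixel is a value in [0, 1].
Mask = Sequence[Sequence[float]]


def mae(pred: Mask, gt: Mask) -> float:
    """Mean absolute error between a predicted shadow map and the ground truth."""
    total, count = 0.0, 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    if count == 0:
        raise ValueError("masks must contain at least one pixel")
    return total / count


def iou(pred: Mask, gt: Mask, threshold: float = 0.5) -> float:
    """Intersection over union of the binarized shadow regions.

    The threshold and the empty-mask convention are assumptions of this sketch.
    """
    inter, union = 0, 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            p_on, g_on = p >= threshold, g >= threshold
            inter += int(p_on and g_on)
            union += int(p_on or g_on)
    return inter / union if union else 1.0


# Toy 2x3 example: soft prediction versus a binary ground-truth mask.
pred = [[0.9, 0.8, 0.1],
        [0.2, 0.7, 0.0]]
gt = [[1.0, 1.0, 0.0],
      [1.0, 0.0, 0.0]]

print(f"MAE = {mae(pred, gt):.4f}")
print(f"IoU = {iou(pred, gt):.4f}")
```

In the toy example, the prediction and the ground truth share two shadow pixels out of four in their union, so the IoU is 0.5. For video shadow detection, such per-frame scores are typically averaged over all frames, although the exact protocol is not stated in the abstract.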