Shadows, formed by the occlusion of light, play an essential role in visual perception and directly influence scene understanding, image quality, and visual realism. This paper presents a unified survey and benchmark of deep-learning-based shadow detection, removal, and generation across images and videos. We introduce consistent taxonomies for architectures, supervision strategies, and learning paradigms; review major datasets and evaluation protocols; and re-train representative methods under standardized settings to enable fair comparison. Our benchmark reveals several key findings, including inconsistencies in previously reported results, a strong dependence of performance on model design and input resolution, and limited cross-dataset generalization due to dataset bias. By synthesizing insights across the three tasks, we highlight the shared illumination cues and priors that connect detection, removal, and generation. We further outline future directions, including unified all-in-one frameworks, semantics- and geometry-aware reasoning, shadow-based authenticity analysis of AI-generated content (AIGC), and the integration of physics-guided priors into multimodal foundation models. Corrected datasets, trained models, and evaluation tools are released to support reproducible research.