Document shadows are a common problem when documents are captured with mobile devices, and they significantly reduce readability. Current methods face several challenges, including inaccurate detection of shadow masks and inaccurate estimation of illumination. In this paper, we propose ShaDocFormer, a Transformer-based architecture that integrates traditional methodologies and deep learning techniques to tackle document shadow removal. ShaDocFormer comprises two components: the Shadow-attentive Threshold Detector (STD) and the Cascaded Fusion Refiner (CFR). The STD module employs a traditional thresholding technique and leverages the attention mechanism of the Transformer to gather global information, enabling precise detection of shadow masks. The cascaded and aggregative structure of the CFR module supports a coarse-to-fine restoration process over the entire image. As a result, ShaDocFormer accurately detects and captures variations in both shadow and illumination, which enables effective shadow removal. Extensive experiments demonstrate that ShaDocFormer outperforms current state-of-the-art methods in both qualitative and quantitative evaluations.
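To make the "traditional thresholding" step of the STD module more concrete, the following is a minimal sketch of how a coarse shadow mask could be obtained from a grayscale document image. The abstract does not say which thresholding technique is used; Otsu's method is assumed here purely for illustration. The function names `otsu_threshold` and `coarse_shadow_mask` are hypothetical, and the Transformer attention that refines the mask is not shown.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold t that maximizes between-class variance.

    Pixels with value <= t form the dark (candidate shadow) class.
    Otsu's method is an assumption here, not the paper's confirmed choice.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels in the dark class so far
    sum_dark = 0  # sum of intensities in the dark class so far
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no dark pixels yet, variance undefined
        w_bright = total - w_dark
        if w_bright == 0:
            break  # all pixels are in the dark class, nothing left to split
        mean_dark = sum_dark / w_dark
        mean_bright = (sum_all - sum_dark) / w_bright
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w_dark * w_bright * (mean_dark - mean_bright) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def coarse_shadow_mask(gray):
    """Binary mask for a 2-D list of 8-bit intensities: 1 = candidate shadow."""
    flat = [p for row in gray for p in row]
    t = otsu_threshold(flat)
    return [[1 if p <= t else 0 for p in row] for row in gray]


# Toy image: bright paper on the left, shadowed paper on the right.
image = [
    [200, 200, 60, 60],
    [210, 205, 55, 65],
]
mask = coarse_shadow_mask(image)
```

A purely intensity-based mask like this one cannot tell dark ink from shadow, which is presumably why the STD module combines thresholding with global attention rather than relying on the threshold alone.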