Shadows significantly degrade the visual quality of scanned documents. However, existing traditional techniques and deep learning methods for shadow removal have several limitations: they either rely heavily on heuristics, resulting in suboptimal performance, or require large datasets to learn shadow-related features. In this study, we propose DocDeshadower, a multi-frequency Transformer-based model built on the Laplacian pyramid. DocDeshadower removes shadows at different frequencies in a coarse-to-fine manner. To achieve this, we decompose the shadow image into different frequency bands using a Laplacian pyramid. In addition, we introduce two novel components: the Attention-Aggregation Network and the Gated Multi-scale Fusion Transformer. The Attention-Aggregation Network removes shadows in the low-frequency part of the image, whereas the Gated Multi-scale Fusion Transformer refines the entire image at a global scale with its large receptive field. Extensive experiments demonstrate that DocDeshadower outperforms current state-of-the-art methods both qualitatively and quantitatively.
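The frequency decomposition mentioned above can be illustrated with a minimal NumPy sketch of a Laplacian pyramid. This is not the DocDeshadower implementation: the function names (`downsample`, `upsample`, `build_laplacian_pyramid`, `reconstruct`) are hypothetical, the input is assumed to be a single-channel image whose height and width are divisible by `2**levels`, and a 2x2 box average with nearest-neighbour upsampling stands in for the Gaussian filtering a practical pyramid would likely use.

```python
import numpy as np


def downsample(img):
    """Halve height and width by averaging each 2x2 block.

    Assumption: a box filter replaces the usual Gaussian blur + subsample.
    """
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(img):
    """Double height and width by nearest-neighbour repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)


def build_laplacian_pyramid(img, levels):
    """Return [high_0, ..., high_{levels-1}, low].

    high_i are band-pass residuals ordered from fine to coarse;
    low is the remaining low-frequency image.
    """
    bands = []
    current = img.astype(np.float64)
    for _ in range(levels):
        smaller = downsample(current)
        # Residual = detail lost when going down one level and back up.
        bands.append(current - upsample(smaller))
        current = smaller
    bands.append(current)
    return bands


def reconstruct(bands):
    """Invert the decomposition, working coarse-to-fine."""
    current = bands[-1]
    for high in reversed(bands[:-1]):
        current = upsample(current) + high
    return current


if __name__ == "__main__":
    image = np.random.default_rng(0).random((32, 32))
    pyramid = build_laplacian_pyramid(image, levels=3)
    print([band.shape for band in pyramid])
    print(np.allclose(reconstruct(pyramid), image))
```

In a coarse-to-fine scheme of the kind the abstract describes, a model would first process the last element of the list (the low-frequency image) and then move toward the finer residuals. Because each residual is defined against the upsampled coarser level, adding the bands back in reverse order recovers the input up to floating-point error.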