Watermark text spotting in document images can offer access to an often unexplored source of information, providing crucial evidence about a record's scope, audience and sometimes even authenticity. Stemming from the problem of text spotting, detecting and understanding watermarks in documents inherits the same hardships: in the wild, writing can come in various fonts, sizes and forms, making generic recognition a very difficult problem. To address the lack of resources in this field and propel further research, we propose a novel benchmark (K-Watermark) containing 65,447 data samples generated using Wrender, a rendering procedure for watermark text patterns. A validity study using human raters yields an authenticity score of 0.51 against pre-generated watermarked documents. To demonstrate the usefulness of the dataset and rendering technique, we developed an end-to-end solution (Wextract) that detects the bounding boxes of watermark text instances while predicting the depicted text. To deal with this specific task, we introduce a variance minimization loss and a hierarchical self-attention mechanism. To the best of our knowledge, we are the first to propose an evaluation benchmark and a complete solution for retrieving watermarks from documents, surpassing baselines by 5 AP points in detection and 4 points in character accuracy.