Watermark text spotting in document images can offer access to an often unexplored source of information, providing crucial evidence about a record's scope, audience and sometimes even authenticity. Because it stems from the problem of text spotting, detecting and understanding watermarks in documents inherits the same hardships: in the wild, writing can come in various fonts, sizes and forms, making generic recognition a very difficult problem. To address the lack of resources in this field and propel further research, we propose a novel benchmark (K-Watermark) containing 65,447 data samples generated using Wrender, a procedure for rendering watermark text patterns. A validity study using human raters yields an authenticity score of 0.51 against pre-generated watermarked documents. To demonstrate the usefulness of the dataset and rendering technique, we developed an end-to-end solution (Wextract) that detects the bounding box instances of watermark text and predicts the depicted text. To deal with this specific task, we introduce a variance minimization loss and a hierarchical self-attention mechanism. To the best of our knowledge, we are the first to propose an evaluation benchmark and a complete solution for retrieving watermarks from documents, surpassing baselines by 5 AP points in detection and 4 points in character accuracy.