Information extraction from visually rich documents is a challenging task that has attracted considerable attention in recent years, owing to its importance in several document-control-based applications and its broad commercial value. Most research on this topic to date follows a two-step pipeline: first, the text is read with an off-the-shelf Optical Character Recognition (OCR) engine; then, the fields of interest are extracted from the recognized text. The main drawback of these approaches is their dependence on an external OCR system, which can degrade both performance and computational speed. Recent OCR-free methods have been proposed to address these issues. Inspired by their promising results, we propose in this paper an OCR-free end-to-end information extraction model named DocParser. It differs from prior end-to-end approaches in its ability to better extract discriminative character features. DocParser achieves state-of-the-art results on various datasets while still being faster than previous works.