Visual information extraction (VIE), which aims to perform OCR and information extraction simultaneously within a unified framework, has drawn increasing attention because of its essential role in applications such as understanding receipts, goods, and traffic signs. However, existing VIE benchmark datasets consist mainly of document images and lack adequate diversity in layout structures, background disturbances, and entity categories, so they cannot fully reveal the challenges of real-world applications. In this paper, we propose a large-scale dataset of camera-captured images for VIE, which exhibits not only larger variance in layouts, backgrounds, and fonts but also many more entity types. In addition, we propose a novel end-to-end VIE framework that jointly learns the OCR and information extraction stages. Unlike previous end-to-end approaches that directly feed OCR features into an information extraction module, we use contrastive learning to narrow the semantic gap caused by the difference between the OCR and information extraction tasks. We evaluate existing end-to-end VIE methods on the proposed dataset and observe that their performance drops noticeably from SROIE (a widely used English dataset) to our dataset, owing to the larger variance in layouts and entities. These results indicate that our dataset is more practical for promoting advanced VIE algorithms. Furthermore, experiments demonstrate that the proposed VIE method consistently achieves clear performance gains on both the proposed dataset and SROIE.