Document pair extraction aims to identify key and value entities, together with the relationships between them, in visually-rich documents. Most existing methods split it into two separate tasks: semantic entity recognition (SER) and relation extraction (RE). However, simply running SER and RE in sequence can cause severe error propagation, and it fails to handle cases that arise in real scenarios, such as multi-line entities. To address these issues, this paper introduces a novel framework, PEneo (Pair Extraction new decoder option), which performs document pair extraction in a unified pipeline that incorporates three concurrent sub-tasks: line extraction, line grouping, and entity linking. This approach alleviates the error accumulation problem and can handle multi-line entities. Furthermore, to better evaluate model performance and to facilitate future research on pair extraction, we introduce RFUND, a re-annotated version of the commonly used FUNSD and XFUND datasets that makes them more accurate and more representative of realistic situations. Experiments on various benchmarks demonstrate PEneo's superiority over previous pipelines: it improves performance by a large margin (e.g., 19.89%-22.91% F1 score on RFUND-EN) when combined with various backbones such as LiLT and LayoutLMv3, showing its effectiveness and generality. The code and the new annotations will be made publicly available.
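To make the relationship between the three sub-tasks concrete, the following is a minimal sketch of how their outputs could be combined into key-value pairs. It is not the decoder described in the paper: the function name `decode_pairs`, the dictionary-based data layout, and the convention that an entity link connects the first line of a key to the first line of a value are all assumptions made for illustration.

```python
def decode_pairs(lines, next_line, entity_links):
    """Combine hypothetical sub-task outputs into (key, value) text pairs.

    lines:        dict line_id -> text            (assumed line extraction output)
    next_line:    dict line_id -> following line  (assumed line grouping output)
    entity_links: list of (key_head, value_head)  (assumed entity linking output)
    """
    # A line that follows another line cannot be the head of an entity.
    continuations = set(next_line.values())

    entities = {}
    for head in lines:
        if head in continuations:
            continue
        # Walk the grouping chain from the head line, guarding against cycles
        # and against links that point to lines that were not extracted.
        parts, cur, seen = [], head, set()
        while cur in lines and cur not in seen:
            seen.add(cur)
            parts.append(lines[cur])
            cur = next_line.get(cur)
        entities[head] = " ".join(parts)

    # Keep only links whose endpoints are both valid entity heads.
    return [
        (entities[k], entities[v])
        for k, v in entity_links
        if k in entities and v in entities
    ]


if __name__ == "__main__":
    example_lines = {0: "Date of", 1: "birth:", 2: "1 Jan", 3: "1990",
                     4: "Name:", 5: "Alice"}
    example_next = {0: 1, 2: 3}          # two multi-line entities
    example_links = [(0, 2), (4, 5)]     # key head -> value head
    for key, value in decode_pairs(example_lines, example_next, example_links):
        print(f"{key!r} -> {value!r}")
```

In this sketch a multi-line key such as "Date of" / "birth:" is merged before pairing, which is the case that a serial SER-then-RE pipeline is said to handle poorly.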