Document pair extraction aims to identify key and value entities, as well as the relationships between them, in visually-rich documents. Most existing methods divide it into two separate tasks: semantic entity recognition (SER) and relation extraction (RE). However, simply chaining SER and RE in series can lead to severe error propagation, and this design fails to handle cases such as multi-line entities that occur in real scenarios. To address these issues, this paper introduces a novel framework, PEneo (Pair Extraction new decoder option), which performs document pair extraction in a unified pipeline incorporating three concurrent sub-tasks: line extraction, line grouping, and entity linking. This approach alleviates the error accumulation problem and can handle multi-line entities. Furthermore, to better evaluate model performance and to facilitate future research on pair extraction, we introduce RFUND, a re-annotated version of the commonly used FUNSD and XFUND datasets whose annotations are more accurate and better reflect realistic situations. Experiments on various benchmarks demonstrate PEneo's superiority over previous pipelines: when combined with backbones such as LiLT and LayoutLMv3, it improves performance by a large margin (e.g., F1 gains of 19.89%-22.91% on RFUND-EN), showing its effectiveness and generality. Code and the new annotations are available at \href{https://github.com/ZeningLin/PEneo}{https://github.com/ZeningLin/PEneo}.
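To make the three sub-tasks more concrete, the following is a minimal sketch of how their outputs could be combined into key-value pairs. It assumes, hypothetically, that line extraction yields text lines indexed by an id, that line grouping predicts links between consecutive lines of the same entity, and that entity linking predicts links from a key's first line to its value's first line. The function names and data layout are illustrative only and do not reproduce PEneo's actual decoder.

```python
# Hypothetical post-processing sketch; the data layout is an assumption,
# not PEneo's real output format.

def merge_lines(lines, group_links):
    """Merge grouped lines into multi-line entities.

    lines: dict mapping line_id -> text (line extraction output).
    group_links: list of (prev_id, next_id) meaning next_id continues
        the same entity as prev_id (line grouping output).
    Returns a dict mapping each entity's first line_id -> full text.
    """
    next_of = dict(group_links)
    has_prev = {nxt for _, nxt in group_links}
    entities = {}
    for lid in lines:
        if lid in has_prev:
            continue  # not the first line of an entity
        parts, cur, seen = [], lid, set()
        while cur is not None and cur not in seen:  # guard against cycles
            seen.add(cur)
            parts.append(lines[cur])
            cur = next_of.get(cur)
        entities[lid] = " ".join(parts)
    return entities


def extract_pairs(lines, group_links, entity_links):
    """Turn key-head -> value-head links (entity linking output) into text pairs."""
    entities = merge_lines(lines, group_links)
    return [
        (entities[k], entities[v])
        for k, v in entity_links
        if k in entities and v in entities
    ]


if __name__ == "__main__":
    lines = {0: "Name:", 1: "John", 2: "Address:", 3: "12 Main St.", 4: "Springfield"}
    group_links = [(3, 4)]            # the address value spans two lines
    entity_links = [(0, 1), (2, 3)]   # key head -> value head
    print(extract_pairs(lines, group_links, entity_links))
```

Because the multi-line value is assembled by following line-level links rather than by a single entity tagger, an error in one sub-task does not force a wrong label on every downstream step, which loosely mirrors the motivation stated above.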