Visually Rich Form Understanding (VRFU) is a complex research problem because forms are highly structured yet vary widely in style and content. Current annotation schemes decompose form understanding and omit key hierarchical structure, which makes developing and evaluating end-to-end models difficult. In this paper, we propose a novel F1 metric for evaluating form parsers and describe TreeForm, a new content-agnostic, tree-based annotation scheme for VRFU. We provide methods to convert previous annotation schemes into TreeForm structures, and we evaluate TreeForm predictions using a modified version of the normalized tree-edit distance. We present initial baselines, averaged over the FUNSD and XFUND datasets, of 61.5 for our end-to-end performance metric and 26.4 for the TreeForm edit distance. We hope that TreeForm encourages deeper research into annotating, modeling, and evaluating the complexities of form-like documents.
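To make the idea of scoring a predicted form tree against a gold tree more concrete, the following is a minimal sketch of a normalized edit distance between ordered, labeled trees. It is not the modified metric proposed in the paper, whose details the abstract does not give. It uses a simple top-down (Selkow-style) distance with unit costs and, as an assumption, normalizes by the combined node count of the two trees. The `Node` class, the function names, and the labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical form-tree node: a label plus ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def size(node: Node) -> int:
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(size(child) for child in node.children)


def edit_distance(a: Node, b: Node) -> int:
    """Top-down (Selkow-style) edit distance between two ordered trees.

    Assumed unit costs: relabeling a node costs 1, and inserting or
    deleting a whole subtree costs its node count.
    """
    relabel = 0 if a.label == b.label else 1
    m, n = len(a.children), len(b.children)
    # dp[i][j]: cost of aligning the first i children of `a`
    # with the first j children of `b`.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + size(a.children[i - 1])
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + size(b.children[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(a.children[i - 1]),  # delete subtree
                dp[i][j - 1] + size(b.children[j - 1]),  # insert subtree
                dp[i - 1][j - 1]
                + edit_distance(a.children[i - 1], b.children[j - 1]),
            )
    return relabel + dp[m][n]


def normalized_distance(a: Node, b: Node) -> float:
    """Edit distance divided by the combined node count (an assumption)."""
    return edit_distance(a, b) / (size(a) + size(b))


# Hypothetical gold and predicted trees; the prediction misses one leaf.
gold = Node("form", [Node("header"), Node("question", [Node("answer")])])
pred = Node("form", [Node("header"), Node("question")])
print(edit_distance(gold, pred), normalized_distance(gold, pred))
```

Dividing by the combined node count keeps the score below 1 for this particular distance, because subtree-level insertions and deletions can cost more than the size of the larger tree. Other normalizations are possible, and the paper's own choice may differ.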