Compared with general document analysis tasks, understanding and retrieving the structure of form documents is challenging. Form documents typically have two types of authors: a form designer, who develops the form structure and keys, and a form user, who fills in form values based on the provided keys. Hence, if a form user gets confused, the form values may not align with the form designer's intention (structure and keys). In this paper, we introduce Form-NLU, the first dataset for form structure understanding and key-value information extraction, which supports interpreting the form designer's intent and the alignment of user-written values with it. It consists of 857 form images, 6k form keys and values, and 4k table keys and values. The dataset includes three form types (digital, printed, and handwritten), which cover diverse form appearances and layouts. We propose a robust positional and logical relation-based framework for form key-value information extraction. Using Form-NLU, we first examine strong object detection models for form layout understanding, then evaluate the key information extraction task on the dataset, providing fine-grained results for different types of forms and keys. Furthermore, we examine the framework with an off-the-shelf PDF layout extraction tool and demonstrate its feasibility in real-world cases.