Recent work on form understanding mostly employs multimodal transformers or large-scale pre-trained language models, which require large amounts of data for pre-training. In contrast, humans can usually identify key-value pairs in a form just by looking at its layout, even if they do not understand the language it is written in. No prior work has investigated how helpful layout information alone is for form understanding. We therefore propose LAGNN, a language-independent Graph Neural Network model that performs entity-relation graph parsing of scanned forms. Our model parses a form into a word-relation graph in order to identify entities and relations jointly and to reduce the time complexity of inference. Deterministic rules then transform this graph into a fully connected entity-relation graph. To facilitate transfer across languages, the model uses only the relative spacing between bounding boxes as layout information. To further improve the performance of LAGNN and to achieve isomorphism between the entity-relation graphs and the word-relation graphs, we use integer linear programming (ILP) based inference. Code is publicly available at https://github.com/Bhanu068/LAGNN
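To make the idea of relative spacing between bounding boxes concrete, the following is a minimal sketch of how pairwise layout features for word boxes might be computed. It is an illustration under stated assumptions, not the feature definition used by LAGNN: the names `Box`, `relative_spacing`, and `edge_features` are hypothetical, and normalizing gaps by page width and height is an assumption made here for scale invariance.

```python
from typing import Dict, List, NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned bounding box of one word (hypothetical helper type)."""
    x0: float  # left
    y0: float  # top
    x1: float  # right
    y1: float  # bottom


def _gap(a_lo: float, a_hi: float, b_lo: float, b_hi: float) -> float:
    """Signed gap between two intervals; 0.0 when they overlap."""
    if b_lo > a_hi:        # b starts after a ends
        return b_lo - a_hi
    if a_lo > b_hi:        # b ends before a starts
        return -(a_lo - b_hi)
    return 0.0


def relative_spacing(a: Box, b: Box, page_w: float, page_h: float) -> Tuple[float, float]:
    """Horizontal and vertical gaps from box a to box b.

    Gaps are divided by the page size (an assumption of this sketch).
    Positive values mean b lies to the right of / below a.
    """
    dx = _gap(a.x0, a.x1, b.x0, b.x1) / page_w
    dy = _gap(a.y0, a.y1, b.y0, b.y1) / page_h
    return dx, dy


def edge_features(boxes: List[Box], page_w: float, page_h: float) -> Dict[Tuple[int, int], Tuple[float, float]]:
    """Spacing features for every ordered pair of distinct boxes."""
    return {
        (i, j): relative_spacing(a, b, page_w, page_h)
        for i, a in enumerate(boxes)
        for j, b in enumerate(boxes)
        if i != j
    }


# A key box followed by a value box on the same text line.
key = Box(10, 10, 50, 20)
value = Box(70, 10, 150, 20)
features = edge_features([key, value], page_w=200, page_h=100)
```

Because these features depend only on box geometry and never on the recognized text, the same computation applies unchanged to forms in any language, which is the property the abstract relies on for cross-language transfer.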