This paper defines and explores the design space for information extraction (IE) from layout-rich documents using large language models (LLMs). The three core challenges of layout-aware IE with LLMs are 1) data structuring, 2) model engagement, and 3) output refinement. Our study delves into the sub-problems within these core challenges, such as input representation, chunking, prompting, and the selection of LLMs and multimodal models. It examines the outcomes of different design choices through a new layout-aware IE test suite, benchmarking against the state-of-the-art (SoA) model LayoutLMv3. The results show that the configuration from a one-factor-at-a-time (OFAT) trial achieves near-optimal results, with a 14.1-point F1-score gain over the baseline model, while full factorial exploration yields only a slightly higher 15.1-point gain at around 36x greater token usage. We demonstrate that well-configured general-purpose LLMs can match the performance of specialized models, providing a cost-effective alternative. Our test suite is freely available at https://github.com/gayecolakoglu/LayIE-LLM.
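The trade-off between OFAT and full factorial exploration comes down to how many configurations each strategy must evaluate. The following sketch contrasts the two enumeration strategies over a hypothetical design space; the factor names and options are illustrative, not the paper's exact configuration axes.

```python
from itertools import product

# Hypothetical design space: factor names and options are illustrative,
# not the paper's exact configuration axes.
design_space = {
    "input_representation": ["plain_text", "markdown", "spatial"],
    "chunking": ["none", "page", "sliding_window"],
    "prompting": ["zero_shot", "few_shot"],
    "model": ["llm_a", "llm_b"],
}

def full_factorial(space):
    """Enumerate every combination of factor options."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in product(*space.values())]

def ofat(space, baseline=None):
    """One-factor-at-a-time: vary each factor in turn while holding
    all other factors at their baseline (first-listed) option."""
    base = baseline or {k: opts[0] for k, opts in space.items()}
    configs = [dict(base)]
    for factor, options in space.items():
        for opt in options[1:]:
            cfg = dict(base)
            cfg[factor] = opt
            configs.append(cfg)
    return configs

print(len(full_factorial(design_space)))  # 3*3*2*2 = 36 evaluations
print(len(ofat(design_space)))            # 1 baseline + 6 variations = 7
```

Even on this toy space, OFAT needs 7 evaluations versus 36 for the full factorial, which illustrates why a well-chosen OFAT sweep can approach the full-factorial optimum at a fraction of the token cost.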