Declaration of Performance (DoP) documents, mandated by EU regulation, specify the characteristics of construction products, such as fire resistance and insulation. While this information is essential for quality control and for reducing carbon footprints, it is not easily machine-readable. Despite mandated content requirements, DoPs vary significantly in layout, schema, and format, and their multilingual nature complicates processing further. In this work, we propose DoP Key Information Extraction (KIE) and Question Answering (QA) as new NLP challenges. To address them, we design a domain-specific AgenticIE system based on a planner-executor-corresponder pattern. For evaluation, we introduce a high-density, expert-annotated dataset of complex, multi-page regulatory documents in English and German. Unlike standard IE datasets (e.g., FUNSD, CORD) with sparse annotations, our dataset contains over 15K annotated entities, averaging over 190 annotations per document. Our agentic system outperforms static and multimodal LLM baselines, achieving an Exact Match (EM) score of 0.396 across the KIE and QA tasks, compared with 0.342 for GPT-4o (a 16% relative improvement) and 0.314 for GPT-4o-V (a 26% relative improvement). Our experimental analysis validates the benefits of the agentic system and confirms the challenging nature of our new DoP dataset.