The preservation of cultural heritage is shifting towards data-driven predictive maintenance and "Digital Twin" construction. However, the mechanical constitutive models required for high-fidelity simulations remain scattered across decades of unstructured scientific literature, creating a "Data Silo" that hinders conservation engineering. To address this, we present an automated, two-stage agentic framework that uses large language models (LLMs) to extract mechanical constitutive equations, calibrated parameters, and metadata from PDF documents. The workflow employs a resource-efficient "Gatekeeper" agent for relevance filtering and a high-capability "Analyst" agent for fine-grained extraction; the Analyst uses a novel Context-Aware Symbolic Grounding mechanism to resolve ambiguities in mathematical notation. Applied to a corpus of over 2,000 research papers, the system isolated 113 core documents and constructed a structured database containing 185 constitutive model instances and over 450 calibrated parameters. Extraction precision reached 80.4\%, which supports a "Human-in-the-loop" workflow that reduces manual data curation time by approximately 90\%. We demonstrate the system's utility through a web-based Knowledge Retrieval Platform that enables rapid parameter discovery for computational modeling. This work transforms scattered literature into a queryable digital asset, laying the data foundation for the "Digital Material Twin" of built heritage.
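The two-stage workflow described in the abstract can be pictured with a minimal sketch. Everything named here is an assumption made for illustration: `ModelRecord`, `run_pipeline`, `keyword_gatekeeper`, and `stub_analyst` are hypothetical, and the two agents are replaced by plain Python stand-ins rather than LLM calls. The equation and parameter value are placeholders, not extracted data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ModelRecord:
    """One extracted constitutive model instance (hypothetical schema)."""
    source: str                                   # document the model came from
    equation: str                                 # constitutive equation as text
    parameters: Dict[str, float] = field(default_factory=dict)
    symbols: Dict[str, str] = field(default_factory=dict)  # symbol -> meaning


def run_pipeline(
    documents: Dict[str, str],
    gatekeeper: Callable[[str], bool],
    analyst: Callable[[str, str], List[ModelRecord]],
) -> List[ModelRecord]:
    """Stage 1 filters cheaply; stage 2 runs only on documents that pass."""
    records: List[ModelRecord] = []
    for name, text in documents.items():
        if not gatekeeper(text):          # inexpensive relevance check
            continue
        records.extend(analyst(name, text))  # costly fine-grained extraction
    return records


def keyword_gatekeeper(text: str) -> bool:
    """Stand-in for the Gatekeeper agent: a simple keyword test."""
    return "constitutive" in text.lower()


def stub_analyst(name: str, text: str) -> List[ModelRecord]:
    """Stand-in for the Analyst agent: returns a fixed placeholder record."""
    return [
        ModelRecord(
            source=name,
            equation="sigma = E * epsilon",
            parameters={"E": 1.0},                    # placeholder value
            symbols={"E": "Young's modulus", "sigma": "stress",
                     "epsilon": "strain"},            # symbol grounding
        )
    ]


if __name__ == "__main__":
    docs = {
        "a.pdf": "A constitutive law for historic brick masonry.",
        "b.pdf": "A history of cathedral restoration campaigns.",
    }
    for record in run_pipeline(docs, keyword_gatekeeper, stub_analyst):
        print(record.source, record.equation, record.parameters)
```

The point of the split is cost: the expensive extraction step runs only on documents that pass the inexpensive filter, and the `symbols` field marks where context-derived meanings for each symbol would be stored alongside the equation.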