Skill-Based Autonomous Agents for Material Creep Database Construction

The advancement of data-driven materials science is constrained by a fundamental bottleneck: most historical experimental data remains locked within the unstructured text and rasterized figures of legacy scientific literature. Manual curation of this knowledge is prohibitively labor-intensive and error-prone. To address this challenge, we introduce an autonomous, agent-based framework powered by Large Language Models (LLMs) that extracts high-fidelity datasets from scientific PDFs without human intervention. Using a modular "skill-based" architecture, the agent orchestrates complex cognitive tasks, including semantic filtering, multi-modal information extraction, and physics-informed validation. We demonstrate the efficacy of this framework by constructing a physically self-consistent database for material creep mechanics, a domain characterized by complex graphical trajectories and heterogeneous constitutive models. Applied to 243 publications, the pipeline achieved a verified extraction success rate exceeding 90% for graphical data digitization. Crucially, we introduce a cross-modal verification protocol and show that the agent can autonomously align visually extracted data points with textually extracted constitutive parameters ($R^2 > 0.99$), ensuring the physical self-consistency of the database. This work not only provides a critical resource for investigating time-dependent deformation across diverse material systems but also establishes a scalable paradigm for autonomous knowledge acquisition, paving the way for the next generation of self-driving laboratories.
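As an illustration of the cross-modal check described above, the following minimal sketch compares digitized curve points against a constitutive model evaluated with text-extracted parameters and accepts the pair when $R^2 > 0.99$. The time-hardening law $\varepsilon = A \sigma^n t^m$, the function names, and all numerical values are assumptions chosen for illustration; they are not taken from the framework itself.

```python
def creep_strain(t, stress, A, n, m):
    """Assumed time-hardening creep law: strain = A * stress**n * t**m."""
    return A * stress**n * t**m


def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def is_self_consistent(points, stress, params, threshold=0.99):
    """True if the digitized (time, strain) points agree with the model
    evaluated at the text-extracted parameters (R^2 above the threshold)."""
    strains = [strain for _, strain in points]
    predicted = [creep_strain(t, stress, **params) for t, _ in points]
    return r_squared(strains, predicted) > threshold


# Hypothetical digitized points (time in h, strain) from one creep curve.
points = [(1.0, 1.02e-4), (4.0, 1.97e-4), (9.0, 3.01e-4),
          (16.0, 4.03e-4), (25.0, 4.98e-4)]
# Hypothetical parameters extracted from the accompanying text.
params = {"A": 1e-10, "n": 3.0, "m": 0.5}

if __name__ == "__main__":
    print(is_self_consistent(points, stress=100.0, params=params))
```

A record that fails this check would indicate a mismatch between the figure and the reported parameters, for example a mis-digitized axis or a wrongly parsed exponent.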
