Synthesis procedures play a critical role in materials research because they directly affect material properties. As data-driven approaches increasingly accelerate materials discovery, there is growing interest in extracting synthesis procedures from the scientific literature as structured data. However, existing studies often either rely on rigid, domain-specific schemas with predefined fields for structuring synthesis procedures or assume that synthesis procedures are linear sequences of operations. Both choices limit their ability to capture the structural complexity of real-world procedures. To address these limitations, we adopt PROV-DM, an international standard for provenance information that supports flexible, graph-based modeling of procedures. We present MatPROV, a dataset of PROV-DM-compliant synthesis procedures extracted from the scientific literature using large language models. MatPROV represents the structural complexity of procedures and the causal relationships among materials, operations, and conditions as visually intuitive directed graphs. This representation makes synthesis knowledge machine-interpretable, opening opportunities for future research such as automated synthesis planning and optimization.
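To show why a graph captures more than a linear sequence of operations, here is a minimal sketch built on two core PROV-DM relations. Materials are modeled as entities and operations as activities. The `used(activity, entity)` relation links an operation to its inputs, and `wasGeneratedBy(entity, activity)` links a product to the operation that made it. This is not MatPROV's actual schema or serialization. The `ProvGraph` class, its method names, the material names, and the choice to store conditions such as temperature as activity attributes are all hypothetical. The example procedure has two dissolution branches that share a solvent and merge in a mixing step, which a single linear sequence cannot express.

```python
from dataclasses import dataclass, field


@dataclass
class ProvGraph:
    # Hypothetical minimal container: materials are PROV-DM entities, operations are
    # PROV-DM activities. Conditions are stored as activity attributes (an assumption).
    # Entity and activity IDs are assumed to be unique across both kinds of node.
    entities: dict = field(default_factory=dict)
    activities: dict = field(default_factory=dict)
    used: list = field(default_factory=list)       # (activity_id, entity_id)
    generated: list = field(default_factory=list)  # (entity_id, activity_id), i.e. wasGeneratedBy

    def add_entity(self, eid, **attrs):
        self.entities[eid] = attrs

    def add_activity(self, aid, inputs, outputs, **conditions):
        # Record one operation together with its used / wasGeneratedBy edges.
        self.activities[aid] = conditions
        for e in inputs:
            self.used.append((aid, e))
        for e in outputs:
            self.generated.append((e, aid))

    def upstream(self, eid):
        # Follow wasGeneratedBy and then used edges. In PROV-DM both relations point
        # from the later node to the earlier one, so this collects every operation
        # and material that the given entity causally depends on.
        seen, stack = set(), [eid]
        while stack:
            node = stack.pop()
            for e, a in self.generated:
                if e == node and a not in seen:
                    seen.add(a)
                    for act, inp in self.used:
                        if act == a and inp not in seen:
                            seen.add(inp)
                            stack.append(inp)
        return seen


# Hypothetical procedure: two precursors are dissolved separately in a shared solvent,
# then mixed and calcined.
g = ProvGraph()
for m in ["precursor_A", "precursor_B", "solvent",
          "solution_A", "solution_B", "mixture", "product"]:
    g.add_entity(m)
g.add_activity("dissolve_A", ["precursor_A", "solvent"], ["solution_A"])
g.add_activity("dissolve_B", ["precursor_B", "solvent"], ["solution_B"])
g.add_activity("mix", ["solution_A", "solution_B"], ["mixture"], duration="1 h")
g.add_activity("calcine", ["mixture"], ["product"], temperature="900 C", duration="12 h")

print(sorted(g.upstream("product")))
# ['calcine', 'dissolve_A', 'dissolve_B', 'mix', 'mixture', 'precursor_A',
#  'precursor_B', 'solution_A', 'solution_B', 'solvent']
```

Because each edge is an explicit relation, questions such as "which operations and conditions does this product depend on?" reduce to a graph traversal. A linear list would need to flatten the two parallel branches into one arbitrary order.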