Extracting structured information from unstructured text is crucial for modeling real-world processes, but traditional schema mining relies on semi-structured data, which limits its scalability. This paper introduces schema-miner, a novel tool that combines large language models (LLMs) with human feedback to automate and refine schema extraction. Through an iterative workflow, it organizes properties extracted from text, incorporates expert input, and integrates domain-specific ontologies for semantic depth. Applied to materials science, specifically atomic layer deposition, schema-miner demonstrates that expert-guided LLMs can generate semantically rich schemas suitable for diverse real-world applications.