Knowledge in materials science is dispersed across an extensive scientific literature, which makes it difficult to discover new materials efficiently and to integrate what is already known. Traditional methods, which often rely on costly and time-consuming experiments, further slow innovation. Integrating artificial intelligence with materials science offers a way to accelerate discovery, but it also demands precise annotation, reliable data extraction, and traceability of information. To address these issues, this article introduces the Materials Knowledge Graph (MKG), which uses natural language processing techniques combined with large language models to extract a decade's worth of high-quality research and organize it into structured triples. The resulting graph contains 162,605 nodes and 731,772 edges. MKG categorizes information under labels such as Name, Formula, and Application, structured around a carefully designed ontology, which improves data usability and integration. By applying network-based algorithms, MKG supports efficient link prediction and reduces reliance on traditional experimental methods. This structured approach streamlines materials research and lays the groundwork for more sophisticated scientific knowledge graphs.
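To make the ideas of structured triples and network-based link prediction concrete, the following is a minimal sketch, not the method used to build or query MKG. The abstract does not say which link prediction algorithm is applied, so the Jaccard neighbor-overlap score here is a stand-in. The triples, the relation names (`has_application`, `has_property`), and the function names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Toy (head, relation, tail) triples. These are invented for illustration
# and are NOT taken from MKG; the relation names are assumptions.
TRIPLES = [
    ("LiFePO4", "has_application", "battery cathode"),
    ("LiCoO2", "has_application", "battery cathode"),
    ("LiCoO2", "has_property", "high energy density"),
    ("LiFePO4", "has_property", "thermal stability"),
    ("LiMn2O4", "has_application", "battery cathode"),
    ("LiMn2O4", "has_property", "thermal stability"),
]


def build_adjacency(triples):
    """Build an undirected adjacency map, ignoring relation types."""
    adj = defaultdict(set)
    for head, _relation, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj


def jaccard(adj, u, v):
    """Jaccard coefficient of the neighbor sets of u and v."""
    union = adj[u] | adj[v]
    if not union:
        return 0.0
    return len(adj[u] & adj[v]) / len(union)


def rank_missing_links(adj):
    """Score every unlinked node pair; return those with nonzero score, best first."""
    nodes = sorted(adj)
    scored = []
    for u, v in combinations(nodes, 2):
        if v in adj[u]:
            continue  # already linked, nothing to predict
        score = jaccard(adj, u, v)
        if score > 0:
            scored.append((score, u, v))
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda item: (-item[0], item[1], item[2]))


if __name__ == "__main__":
    graph = build_adjacency(TRIPLES)
    for score, u, v in rank_missing_links(graph):
        print(f"{score:.2f}  {u} <-> {v}")
```

In this toy graph, the highest-ranked candidate pair is the two materials that share exactly the same neighbors. A real system would typically keep relation types and entity labels (for example Name, Formula, Application) rather than collapsing everything into an untyped graph as this sketch does.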