Knowledge Graphs (KGs) are essential to GraphRAG, a form of Retrieval-Augmented Generation (RAG) that excels in tasks requiring structured reasoning and semantic understanding. However, constructing KGs for GraphRAG remains a significant challenge because traditional methods are limited in accuracy and scalability. This paper introduces a novel approach that uses large language models (LLMs) such as GPT-4, LLaMA 2 (13B), and BERT to generate KGs directly from unstructured data, bypassing traditional pipelines. We evaluate the models' ability to generate high-quality KGs using Precision, Recall, F1-Score, Graph Edit Distance, and Semantic Similarity. Results show that GPT-4 achieves the best semantic fidelity and structural accuracy, LLaMA 2 excels at lightweight, domain-specific graphs, and BERT offers insight into the challenges of entity-relationship modeling. This study underscores the potential of LLMs to streamline KG creation and make GraphRAG more accessible for real-world applications, while laying a foundation for future work.
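To make the Precision, Recall, and F1-Score metrics concrete, the sketch below scores a generated KG against a reference KG at the triple level. It is a minimal illustration, not the paper's evaluation code: the exact-match criterion after lowercasing, the `Triple` type, and the `triple_prf` and `normalize` names are all assumptions introduced here, and the abstract does not say how matches between generated and reference triples are determined.

```python
from typing import Dict, Iterable, Tuple

# A KG edge as (subject, relation, object). Hypothetical representation.
Triple = Tuple[str, str, str]


def normalize(triple: Triple) -> Triple:
    """Lowercase and trim each part so trivial surface differences still match."""
    subject, relation, obj = triple
    return (subject.strip().lower(), relation.strip().lower(), obj.strip().lower())


def triple_prf(predicted: Iterable[Triple], gold: Iterable[Triple]) -> Dict[str, float]:
    """Triple-level precision, recall, and F1 under exact match (an assumption)."""
    pred_set = {normalize(t) for t in predicted}
    gold_set = {normalize(t) for t in gold}
    true_positives = len(pred_set & gold_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data, invented for illustration only.
gold_triples = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "field", "Physics"),
    ("Marie Curie", "won", "Nobel Prize"),
    ("Warsaw", "capital_of", "Poland"),
]
predicted_triples = [
    ("marie curie", "born_in", "warsaw"),    # matches after normalization
    ("Marie Curie", "won", "Nobel Prize"),   # matches
    ("Marie Curie", "born_in", "Paris"),     # incorrect triple
]

scores = triple_prf(predicted_triples, gold_triples)
```

Exact matching is the strictest choice and penalizes paraphrased relations; metrics such as Semantic Similarity and Graph Edit Distance, also named in the abstract, are the usual complement when softer or structural comparison is needed.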