Knowledge graphs (KGs) are vital for knowledge-intensive tasks and have shown promise in reducing hallucinations in large language models (LLMs). However, constructing high-quality KGs remains difficult: it requires accurate information extraction and structured representations that support interpretability and downstream utility. Existing LLM-based approaches often focus narrowly on entity and relation extraction, restricting coverage to sentence-level contexts or relying on predefined schemas. We propose a hierarchical extraction framework that organizes information at multiple levels, enabling the construction of semantically rich and well-structured KGs. Using state-of-the-art LLMs, we extract information, construct knowledge graphs, and evaluate them comprehensively from both structural and semantic perspectives. Our results highlight the strengths and shortcomings of current LLMs in KG construction and identify key challenges for future work. To advance research in this area, we also release a curated dataset of LLM-generated KGs derived from research papers on children's mental well-being. This resource aims to foster more transparent, reliable, and impactful applications in high-stakes domains such as healthcare.
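To make the idea of multi-level organization more concrete, the sketch below shows one possible way a hierarchically organized KG could be represented. This is a minimal illustration only, not the paper's actual framework or schema: the class names, the document/section/triple levels, and the example triple are all assumptions introduced here for explanation.

```python
from dataclasses import dataclass, field


@dataclass
class Triple:
    # A sentence-level fact: (subject, relation, object).
    subject: str
    relation: str
    obj: str


@dataclass
class SectionNode:
    # Section-level grouping of sentence-level triples.
    title: str
    triples: list[Triple] = field(default_factory=list)


@dataclass
class DocumentGraph:
    # Document-level node organizing extracted knowledge hierarchically:
    # document -> sections -> sentence-level triples.
    doc_id: str
    sections: list[SectionNode] = field(default_factory=list)

    def all_triples(self) -> list[Triple]:
        # Flatten the hierarchy into a plain triple list for downstream KG tools.
        return [t for s in self.sections for t in s.triples]


# Usage example (hypothetical content): one document, one section, one extracted triple.
kg = DocumentGraph(doc_id="paper-001")
kg.sections.append(
    SectionNode(
        title="Introduction",
        triples=[Triple("screen time", "associated_with", "sleep quality")],
    )
)
print(kg.all_triples())
```

Under this assumed layout, a flattened view such as `all_triples()` would let a hierarchical structure still feed standard triple-based KG tooling while preserving document- and section-level context for evaluation.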