Constructing domain-specific knowledge graphs from unstructured text remains challenging due to heterogeneous entity mentions, long-tail relation distributions, and the absence of standardized schemas. We present LEC-KG, a bidirectional collaborative framework that integrates the semantic understanding of Large Language Models (LLMs) with the structural reasoning of Knowledge Graph Embeddings (KGE). Our approach features three key components: (1) hierarchical coarse-to-fine relation extraction that mitigates long-tail bias, (2) evidence-guided Chain-of-Thought feedback that grounds structural suggestions in the source text, and (3) semantic initialization that enables structural validation of unseen entities. The two modules enhance each other iteratively: KGE provides structure-aware feedback to refine LLM extractions, while validated triples progressively improve the KGE representations. We evaluate LEC-KG on Chinese Sustainable Development Goal (SDG) reports and find substantial improvements over LLM baselines, particularly on low-frequency relations. Through iterative refinement, our framework reliably transforms unstructured policy text into validated knowledge graph triples.
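To make the KGE side of the loop concrete, here is a minimal sketch of structural validation with semantic initialization. It is not LEC-KG's implementation and rests on several assumptions: TransE (plausibility $-\lVert h + r - t \rVert_2$) stands in for the KGE model, a hashed character-bigram vector (`text_embed`) stands in for the semantic encoder, and candidates are accepted by a fixed score threshold. `TransEValidator`, `collaborate`, and the example triples are hypothetical names, and the LLM extraction and Chain-of-Thought feedback steps are not modeled.

```python
import hashlib
import math

DIM = 16  # hypothetical embedding size


def text_embed(name: str, dim: int = DIM) -> list[float]:
    """Stand-in semantic encoder (assumption, not LEC-KG's encoder):
    hash the character bigrams of a name into an L2-normalized vector."""
    vec = [0.0] * dim
    padded = f"#{name}#"
    for a, b in zip(padded, padded[1:]):
        bucket = hashlib.md5((a + b).encode("utf-8")).digest()[0] % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


class TransEValidator:
    """Scores (head, relation, tail) as -||h + r - t||_2 (TransE, an assumed KGE choice)."""

    def __init__(self, threshold: float, dim: int = DIM):
        self.dim = dim
        self.threshold = threshold
        self.entities: dict[str, list[float]] = {}
        self.relations: dict[str, list[float]] = {}

    def entity(self, name: str) -> list[float]:
        # Semantic initialization: an unseen entity gets a text-derived vector
        # instead of a random one, so it can be scored before any training.
        if name not in self.entities:
            self.entities[name] = text_embed(name, self.dim)
        return self.entities[name]

    def score(self, head: str, rel: str, tail: str) -> float:
        h, t = self.entity(head), self.entity(tail)
        r = self.relations.get(rel, [0.0] * self.dim)  # unseen relation: zero vector
        return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

    def validate(self, triples):
        """Split candidate triples into accepted / rejected by the score threshold."""
        accepted, rejected = [], []
        for triple in triples:
            if self.score(*triple) >= self.threshold:
                accepted.append(triple)
            else:
                rejected.append(triple)
        return accepted, rejected

    def fit_relations(self, triples) -> None:
        """Re-estimate each relation vector from validated triples.
        With entity vectors fixed, r = mean(t - h) minimizes sum ||h + r - t||^2."""
        sums: dict[str, list[float]] = {}
        counts: dict[str, int] = {}
        for head, rel, tail in triples:
            h, t = self.entity(head), self.entity(tail)
            acc = sums.setdefault(rel, [0.0] * self.dim)
            for i in range(self.dim):
                acc[i] += t[i] - h[i]
            counts[rel] = counts.get(rel, 0) + 1
        for rel, acc in sums.items():
            self.relations[rel] = [x / counts[rel] for x in acc]


def collaborate(seed, candidates, validator, rounds=3):
    """Hypothetical outer loop: validated triples refit the KGE,
    which then re-scores the still-pending candidates."""
    kg = list(seed)
    validator.fit_relations(kg)
    pending = list(candidates)
    for _ in range(rounds):
        accepted, pending = validator.validate(pending)
        if not accepted:
            break
        kg.extend(accepted)
        validator.fit_relations(kg)
        # In LEC-KG, rejected triples would go back to the LLM with
        # evidence-guided CoT feedback; that LLM call is not modeled here.
    return kg, pending


if __name__ == "__main__":
    validator = TransEValidator(threshold=-0.5)
    seed = [("SDG 7", "targets", "clean energy")]
    kg, leftover = collaborate(seed, [("SDG 6", "targets", "clean water")], validator)
    print("accepted:", kg)
    print("sent back for feedback:", leftover)
```

Refitting only the relation vectors in closed form keeps the sketch short; a practical KGE would train entity and relation embeddings jointly, typically with negative sampling, and the threshold would be tuned on held-out data.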