The scarcity of high-quality knowledge graphs (KGs) remains a critical bottleneck for downstream AI applications: existing extraction methods rely heavily on error-prone pattern matching or on resource-intensive large language models (LLMs). While recent tools use LLMs to generate KGs, their computational demands put them out of reach in low-resource environments. This paper introduces LightKGG, a novel framework that enables efficient KG extraction from text using small-scale language models (SLMs) through two key technical innovations: (1) context-integrated graph extraction, which stores contextual information alongside nodes and edges in a unified graph structure, preserving key information while reducing reliance on complex semantic processing; and (2) topology-enhanced relationship inference, which exploits the inherent topology of the extracted graph to infer relationships efficiently, making relationship discovery possible without the deep language-understanding capabilities of LLMs. By enabling accurate KG construction with minimal hardware requirements, this work bridges the gap between automated knowledge extraction and practical deployment while introducing rigorous methods for optimizing SLM efficiency on structured NLP tasks.
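The abstract gives no implementation details, but the first innovation can be pictured as a graph whose nodes and edges retain the source sentences they were extracted from, so later steps can consult that context directly instead of re-running semantic analysis. The Python sketch below is purely illustrative: the `ContextGraph`, `Node`, and `Edge` names and the sentence-level context granularity are assumptions, not the paper's API.

```python
from dataclasses import dataclass, field
from collections import defaultdict

@dataclass
class Node:
    name: str
    contexts: list = field(default_factory=list)  # sentences mentioning the entity

@dataclass
class Edge:
    head: str
    tail: str
    relation: str
    contexts: list = field(default_factory=list)  # sentences supporting the relation

class ContextGraph:
    """Hypothetical unified structure: context travels with nodes and edges."""

    def __init__(self):
        self.nodes = {}                 # name -> Node
        self.edges = defaultdict(list)  # (head, tail) -> [Edge, ...]

    def add_entity(self, name, sentence):
        node = self.nodes.setdefault(name, Node(name))
        node.contexts.append(sentence)

    def add_relation(self, head, tail, relation, sentence):
        # Register both endpoints with the same supporting sentence.
        self.add_entity(head, sentence)
        self.add_entity(tail, sentence)
        self.edges[(head, tail)].append(Edge(head, tail, relation, [sentence]))

# Example: a triple an SLM might emit for one sentence, stored with its context.
g = ContextGraph()
sent = "Marie Curie won the Nobel Prize in Physics in 1903."
g.add_relation("Marie Curie", "Nobel Prize in Physics", "won", sent)
print(g.nodes["Marie Curie"].contexts[0])
```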
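The second innovation, topology-enhanced relationship inference, can likewise be sketched with a standard graph heuristic. The snippet below scores unconnected entity pairs by their shared neighbors (an Adamic-Adar-style measure), so candidate links surface from structure alone, with no further language-model call. LightKGG's actual inference rule is not described in the abstract; this stands in only for the general idea of inferring relations from topology rather than language understanding.

```python
import math
from itertools import combinations

def neighbors(edges):
    """Build an undirected adjacency map from (head, tail) pairs."""
    adj = {}
    for h, t in edges:
        adj.setdefault(h, set()).add(t)
        adj.setdefault(t, set()).add(h)
    return adj

def infer_candidates(edges, top_k=5):
    """Rank unconnected entity pairs by shared-neighbor evidence."""
    adj = neighbors(edges)
    scores = {}
    for a, b in combinations(adj, 2):
        if b in adj[a]:
            continue  # already connected; nothing to infer
        shared = adj[a] & adj[b]
        if shared:
            # Rarer shared neighbors contribute more evidence (Adamic-Adar style).
            scores[(a, b)] = sum(1 / math.log(len(adj[n]) + 1) for n in shared)
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

# Toy extracted graph: two scientists linked through shared institutions/awards.
edges = [("Curie", "Nobel Prize"), ("Curie", "Sorbonne"),
         ("Langevin", "Sorbonne"), ("Langevin", "Nobel Prize")]
print(infer_candidates(edges))  # ("Curie", "Langevin") ranks among top candidates
```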