Current generative knowledge graph construction approaches usually fail to capture structural knowledge because they simply flatten natural language into serialized text or a specification language. However, large generative language models trained on structured data such as code have demonstrated impressive capability in understanding natural language for structural prediction and reasoning tasks. Motivated by this, we address generative knowledge graph construction with code language models: given a natural language input in code format, the goal is to generate triples, which can be framed as a code completion task. Specifically, we develop schema-aware prompts that effectively utilize the semantic structure within the knowledge graph. Because code inherently possesses structure, such as class and function definitions, it serves as a useful model of prior semantic structural knowledge. Furthermore, we employ a rationale-enhanced generation method to boost performance: rationales provide intermediate steps, thereby improving knowledge extraction. Experimental results indicate that the proposed approach outperforms baselines on benchmark datasets. Code and datasets are available at https://github.com/zjunlp/DeepKE/tree/main/example/llm.
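The code-completion framing described above can be illustrated with a minimal sketch. It is not the authors' prompt format: the class names (`Entity`, `Relation`, `Triple`, `Extract`), the helpers `build_prompt` and `parse_triples`, and the example completion are all hypothetical, chosen only to show how a schema written as class definitions can precede an unfinished statement that a code language model is asked to complete. No model is called here; the completion string stands in for model output.

```python
import ast

# Hypothetical schema, written as class definitions so that the prompt
# carries the structure of a triple (head, relation, tail) as code.
SCHEMA = '''class Entity:
    def __init__(self, name: str):
        self.name = name

class Relation:
    def __init__(self, name: str):
        self.name = name

class Triple:
    def __init__(self, head: Entity, relation: Relation, tail: Entity):
        self.head = head
        self.relation = relation
        self.tail = tail

class Extract:
    def __init__(self, triples: list):
        self.triples = triples
'''


def build_prompt(text: str) -> str:
    """Place the input sentence in a docstring and leave the triple list open."""
    return f'{SCHEMA}\n""" {text} """\nextract = Extract([\n'


def parse_triples(completion: str) -> list[tuple[str, str, str]]:
    """Read (head, relation, tail) strings out of a completed Extract([...]) call."""
    # Re-attach the opening of the call so the completion parses on its own.
    tree = ast.parse("Extract([\n" + completion)
    triples = []
    for node in ast.walk(tree):
        is_triple = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "Triple"
            and len(node.args) == 3
        )
        if not is_triple:
            continue
        parts = []
        for arg in node.args:
            # Each argument is expected to look like Entity("...") or Relation("...").
            if (
                isinstance(arg, ast.Call)
                and arg.args
                and isinstance(arg.args[0], ast.Constant)
                and isinstance(arg.args[0].value, str)
            ):
                parts.append(arg.args[0].value)
        if len(parts) == 3:
            triples.append((parts[0], parts[1], parts[2]))
    return triples


prompt = build_prompt("London is the capital of the United Kingdom.")

# Stand-in for what a code language model might return (an assumption).
completion = (
    '    Triple(Entity("London"), Relation("capital of"), Entity("United Kingdom")),\n'
    "])"
)
triples = parse_triples(completion)
```

Parsing the completion with `ast` rather than executing it means model output is only read as syntax, never run. A rationale-enhanced variant could, under the same assumptions, insert the intermediate reasoning as comments in the prompt before the open `Extract([` call.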