Autoregressive large language models (LLMs) pre-trained via next-token prediction are inherently proficient at generative tasks. However, their performance on knowledge-driven tasks such as factual knowledge querying remains unsatisfactory. Knowledge graphs (KGs), as high-quality structured knowledge bases, can provide reliable knowledge for LLMs, potentially compensating for these knowledge deficiencies. Aligning LLMs with the explicit, structured knowledge in KGs remains challenging: previous attempts either failed to effectively align knowledge representations or compromised the generative capabilities of LLMs, leading to suboptimal results. This paper proposes \textbf{KaLM}, a \textit{Knowledge-aligned Language Modeling} approach, which fine-tunes autoregressive LLMs to align with KG knowledge via the joint objective of explicit knowledge alignment and implicit knowledge alignment. The explicit knowledge alignment objective directly optimizes the knowledge representations of LLMs through dual-view knowledge graph contrastive learning. The implicit knowledge alignment objective incorporates textual patterns of knowledge into LLMs through triple completion language modeling. Notably, our method achieves significant performance gains on knowledge-driven tasks, specifically embedding-based knowledge graph completion and generation-based knowledge graph question answering.
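The joint objective above can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a common dual-view setup in which one view encodes the head entity plus relation as text and the other view encodes the tail-entity description, with in-batch negatives and a symmetric InfoNCE loss, combined with a standard triple-completion language-modeling loss. The function names, the temperature value, and the weighting factor `lam` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dual_view_contrastive_loss(head_rel_emb, tail_emb, temperature=0.05):
    """Explicit knowledge alignment: dual-view KG contrastive learning (sketch).

    head_rel_emb: (B, d) embeddings of the "head entity + relation" text view.
    tail_emb:     (B, d) embeddings of the tail-entity description view.
    Matching pairs lie on the diagonal; other in-batch pairs act as negatives.
    """
    h = F.normalize(head_rel_emb, dim=-1)
    t = F.normalize(tail_emb, dim=-1)
    logits = h @ t.T / temperature  # (B, B) scaled cosine similarities
    labels = torch.arange(h.size(0), device=h.device)
    # Symmetric InfoNCE over both matching directions.
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.T, labels)) / 2

def joint_objective(contrastive_loss, lm_loss, lam=1.0):
    """Combine explicit (contrastive) and implicit (triple-completion LM)
    alignment losses; lam is an assumed weighting hyperparameter."""
    return contrastive_loss + lam * lm_loss
```

In practice the two views would come from the same LLM encoder applied to different textual renderings of a triple, and `lm_loss` would be the usual next-token cross-entropy over the verbalized triple.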