Autoregressive large language models (LLMs) pre-trained via next-token prediction are inherently proficient at generative tasks, yet their performance on knowledge-driven tasks such as factual knowledge querying remains unsatisfactory. Knowledge graphs (KGs), as high-quality structured knowledge bases, can supply reliable knowledge to LLMs and thereby compensate for these deficiencies. Aligning LLMs with the explicit, structured knowledge in KGs remains challenging: previous attempts either failed to effectively align knowledge representations or compromised the generative capabilities of LLMs, yielding suboptimal results. This paper proposes \textbf{KaLM}, a \textit{Knowledge-aligned Language Modeling} approach that fine-tunes autoregressive LLMs to align with KG knowledge via the joint objective of explicit knowledge alignment and implicit knowledge alignment. The explicit knowledge alignment objective directly optimizes the knowledge representations of LLMs through dual-view knowledge graph contrastive learning. The implicit knowledge alignment objective incorporates textual patterns of knowledge into LLMs through triple completion language modeling. Notably, our method achieves significant performance gains on knowledge-driven tasks, specifically embedding-based knowledge graph completion and generation-based knowledge graph question answering.
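The explicit alignment objective above is an instance of contrastive representation learning over two views of each triple. A minimal NumPy sketch of an InfoNCE-style dual-view loss with in-batch negatives might look as follows; the function name, the particular two-view split (head+relation text vs. tail text), and the temperature value are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def dual_view_info_nce(view_a, view_b, temperature=0.05):
    """InfoNCE-style contrastive loss between two views of KG triples.

    view_a, view_b: (batch, dim) arrays, e.g. LLM embeddings of the
    "head entity + relation" description and the "tail entity"
    description of each triple. Row i of each view describes the same
    triple (the positive pair); the remaining rows in the batch act
    as in-batch negatives.
    """
    # L2-normalize so the dot product is cosine similarity.
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature              # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(log_probs))
    # Maximize the log-probability of the diagonal (matching) pairs.
    return -log_probs[idx, idx].mean()
```

Minimizing this loss pulls the two views of the same triple together in embedding space while pushing apart views of different triples, which is the intended effect of the dual-view contrastive objective.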