Integrating large language models (LLMs) with knowledge graphs derived from domain-specific data represents an important advancement towards more powerful and factual reasoning. As these models grow more capable, it is crucial to enable them to perform multi-step inferences over real-world knowledge graphs while minimizing hallucination. While LLMs excel at conversation and text generation, their ability to reason over domain-specialized graphs of interconnected entities remains limited. For example, can we query an LLM to identify the optimal contact in a professional network for a specific goal, based on relationships and attributes in a private database? The answer is no: such capabilities lie beyond current methods. However, this question underscores a critical technical gap that must be addressed. Many high-value applications in areas such as science, security, and e-commerce rely on proprietary knowledge graphs encoding unique structures, relationships, and logical constraints. We introduce a fine-tuning framework for developing Graph-aligned LAnguage Models (GLaM) that transforms a knowledge graph into an alternate text representation with labeled question-answer pairs. We demonstrate that grounding the models in specific graph-based knowledge expands their capacity for structure-based reasoning. Our methodology leverages the LLM's generative capabilities to create the dataset and offers an efficient alternative to retrieval-augmented generation (RAG) style methods.
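To make the core idea concrete, the sketch below shows one minimal way a knowledge graph could be flattened into labeled question-answer pairs for fine-tuning. The triple format, templates, and function names here are illustrative assumptions for exposition, not the paper's actual pipeline (which uses the LLM itself to generate the dataset):

```python
# Hypothetical sketch: convert (subject, relation, object) triples from a
# knowledge graph into simple QA pairs for supervised fine-tuning.
# The templates below are naive stand-ins; a real pipeline would use an
# LLM to generate varied, natural-language phrasings.
from typing import List, Tuple, Dict

Triple = Tuple[str, str, str]  # (subject, relation, object)

def triples_to_qa(triples: List[Triple]) -> List[Dict[str, str]]:
    """Turn each graph edge into one templated question-answer pair."""
    qa_pairs = []
    for subj, rel, obj in triples:
        question = f"What is the {rel.replace('_', ' ')} of {subj}?"
        qa_pairs.append({"question": question, "answer": obj})
    return qa_pairs

# Toy professional-network graph (illustrative data only)
graph = [
    ("Alice", "works_at", "Acme Corp"),
    ("Alice", "collaborates_with", "Bob"),
    ("Bob", "expert_in", "graph databases"),
]

for pair in triples_to_qa(graph):
    print(pair["question"], "->", pair["answer"])
```

Fine-tuning on pairs like these grounds the model in the graph's specific entities and relations, which is what lets it answer multi-hop questions without retrieving the graph at inference time.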