Iterative Zero-Shot LLM Prompting for Knowledge Graph Construction

In the current era of digitalization, capturing and effectively representing knowledge is crucial in most real-world scenarios. Knowledge graphs are a powerful tool for retrieving and organizing large amounts of information in an interconnected, interpretable structure. However, generating them remains challenging and often requires considerable human effort and domain expertise, which hampers scalability and flexibility across application fields. This paper proposes a knowledge graph generation approach that leverages recent generative large language models (LLMs), such as GPT-3.5, to address the main critical issues in knowledge graph building. The approach is implemented as a pipeline that applies novel iterative, zero-shot, and external-knowledge-agnostic strategies in the main stages of the generation process. We expect this multi-faceted approach to benefit the scientific community in several ways. In particular, the main contributions are: (i) an innovative strategy for iteratively prompting LLMs to extract the relevant components of the final graph; (ii) a zero-shot strategy for each prompt, meaning that no examples need to be provided to "guide" the prompt result; (iii) a scalable solution, as the adoption of LLMs avoids the need for any external resources or human expertise. To assess the effectiveness of the proposed approach, we performed experiments on a dataset covering a specific domain. We claim that our proposal is a suitable solution for scalable and versatile knowledge graph construction and may be applied to different and novel contexts.
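
To make the idea of iterative zero-shot prompting concrete, the following is a minimal Python sketch, not the pipeline evaluated in the paper. It assumes a two-step loop: one prompt asks for entities, then one follow-up prompt per entity asks for relations, and neither prompt contains worked examples. The prompt wording, the `subject | predicate | object` reply format, and the names `build_graph`, `parse_lines`, and `fake_llm` are all hypothetical; `llm` stands for any function that maps a prompt string to a reply string (for instance a wrapper around a GPT-3.5 call).

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical zero-shot prompts: instructions only, no worked examples.
ENTITY_PROMPT = (
    "List the entities mentioned in the text below, one per line.\n\n"
    "Text: {text}"
)
RELATION_PROMPT = (
    "Given the text and the entity '{entity}', list every relation involving it "
    "as 'subject | predicate | object', one per line.\n\n"
    "Text: {text}"
)


def parse_lines(reply: str) -> List[str]:
    """Split a model reply into non-empty, stripped lines."""
    return [line.strip() for line in reply.splitlines() if line.strip()]


def build_graph(text: str, llm: Callable[[str], str]) -> Set[Triple]:
    """Iteratively prompt `llm`: first for entities, then once per entity for triples."""
    entities = parse_lines(llm(ENTITY_PROMPT.format(text=text)))
    triples: Set[Triple] = set()
    for entity in entities:
        reply = llm(RELATION_PROMPT.format(entity=entity, text=text))
        for line in parse_lines(reply):
            parts = [part.strip() for part in line.split("|")]
            # Keep only well-formed 'subject | predicate | object' lines.
            if len(parts) == 3 and all(parts):
                triples.add((parts[0], parts[1], parts[2]))
    return triples


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline (assumption)."""
    if prompt.startswith("List the entities"):
        return "Marie Curie\nradium"
    if "'Marie Curie'" in prompt:
        return "Marie Curie | discovered | radium"
    return "Marie Curie | discovered | radium\nmalformed line"


if __name__ == "__main__":
    print(build_graph("Marie Curie discovered radium.", fake_llm))
```

Collecting triples in a set merges duplicates that arise when the same relation is returned for more than one entity, and malformed reply lines are skipped rather than repaired. Replacing `fake_llm` with a real model call is the only change needed to try the loop on actual text.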
