Exploring In-Context Learning Capabilities of Foundation Models for Generating Knowledge Graphs from Text

Knowledge graphs represent information about the real world as entities and their relations in a structured, semantically rich manner, and they enable a variety of downstream applications such as question answering, recommendation systems, semantic search, and advanced analytics. At present, however, building a knowledge graph involves considerable manual effort, which hinders adoption in some situations; automating this process would especially benefit small organizations. Automatically generating structured knowledge graphs from large volumes of natural language text remains a challenging task, and research on sub-tasks such as named entity extraction, relation extraction, entity and relation linking, and knowledge graph construction aims to improve the state of the art in automatic construction and completion of knowledge graphs from text. Recent foundation models, which have billions of parameters, are trained in a self-supervised manner on large volumes of data, and can be adapted to a variety of downstream tasks, have demonstrated high performance on a wide range of Natural Language Processing (NLP) tasks. In this context, one emerging paradigm is in-context learning, in which a language model is used as is, with a prompt that provides instructions and a few examples of the task, and its parameters are not changed through traditional approaches such as fine-tuning. As a result, no computing resources are needed for re-training or fine-tuning the model, and the engineering effort is minimal. It would therefore be beneficial to utilize such capabilities for generating knowledge graphs from text.
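
To make the in-context learning setup concrete, the following is a minimal sketch of how an instruction and a few demonstrations might be assembled into a prompt for triple extraction, and how a model's completion might be parsed back into triples. Everything in it is an illustrative assumption rather than the method of this work: the `(subject | predicate | object)` output format, the instruction wording, the demonstration sentences, and the helper names `build_prompt` and `parse_triples` are all hypothetical. The call to a language model is deliberately left out, since no specific model or API is named here.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]
Example = Tuple[str, List[Triple]]

# Hypothetical instruction and demonstrations (assumptions for illustration only).
INSTRUCTION = (
    "Extract knowledge graph triples from the text. "
    "Write one triple per line as (subject | predicate | object)."
)
EXAMPLES: List[Example] = [
    ("Marie Curie was born in Warsaw.",
     [("Marie Curie", "born in", "Warsaw")]),
    ("Berlin is the capital of Germany.",
     [("Berlin", "capital of", "Germany")]),
]


def format_triples(triples: List[Triple]) -> str:
    """Render triples in the assumed line-based output format."""
    return "\n".join(f"({s} | {p} | {o})" for s, p, o in triples)


def build_prompt(instruction: str, examples: List[Example], text: str) -> str:
    """Concatenate the instruction, the demonstrations, and the new input.

    The model parameters are never touched; the task is specified
    entirely through this prompt string.
    """
    parts = [instruction.strip(), ""]
    for ex_text, ex_triples in examples:
        parts += [f"Text: {ex_text}", "Triples:", format_triples(ex_triples), ""]
    parts += [f"Text: {text}", "Triples:"]
    return "\n".join(parts)


def parse_triples(completion: str) -> List[Triple]:
    """Parse a completion into triples, skipping lines that do not match the format."""
    triples: List[Triple] = []
    for line in completion.splitlines():
        line = line.strip()
        if not (line.startswith("(") and line.endswith(")")):
            continue
        fields = [f.strip() for f in line[1:-1].split("|")]
        if len(fields) == 3 and all(fields):
            triples.append((fields[0], fields[1], fields[2]))
    return triples
```

In use, the string returned by `build_prompt(INSTRUCTION, EXAMPLES, new_text)` would be sent to whichever language model is available, and the returned completion passed to `parse_triples`. Malformed lines are skipped rather than repaired, so the parser returns only triples that follow the assumed format.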
