GPT Struct Me: Probing GPT Models on Narrative Entity Extraction

Systems that can extract structured information from text are becoming increasingly important as the volume of text produced every day continues to grow. A system that can extract such information effectively and in an interoperable manner would be an asset in several domains, such as finance, healthcare, and law. Recent developments in natural language processing have led to powerful language models that can, to some degree, mimic human intelligence. This effectiveness raises a pertinent question: can these models be leveraged to extract structured information? In this work, we address this question by evaluating the capabilities of two state-of-the-art language models, GPT-3 and GPT-3.5 (the latter commonly known as ChatGPT), in the extraction of narrative entities, namely events, participants, and temporal expressions. The study is conducted on the Text2Story Lusa dataset, a collection of 119 Portuguese news articles whose annotation framework includes a set of entity structures along with several tags and attribute values. We first select the best prompt template through an ablation study, run on a subset of the dataset's documents, over prompt components that provide varying degrees of information. Subsequently, we use the best templates to evaluate the effectiveness of the models on the remaining documents. The results indicate that GPT models are competitive with out-of-the-box baseline systems, presenting an all-in-one alternative for practitioners with limited resources. By studying the strengths and limitations of these models in the context of information extraction, we offer insights that can guide future improvements and point to avenues worth exploring in this field.
