Entity Resolution (ER) is the problem of semi-automatically determining when two entity descriptions refer to the same underlying entity, with applications ranging from healthcare to e-commerce. Traditional ER solutions required considerable manual expertise, including domain-specific feature engineering, as well as identification and curation of training data. Recently released large language models (LLMs) provide an opportunity to make ER more seamless and domain-independent. However, it is also well known that LLMs can pose risks, and that the quality of their outputs can depend on how prompts are engineered. Unfortunately, a systematic experimental study of the effects of different prompting methods on unsupervised ER, using LLMs like ChatGPT, has been lacking thus far. This paper aims to address this gap by conducting such a study. We consider some relatively simple and cost-efficient prompt engineering methods and apply them to ER on two real-world datasets widely used in the community. We use an extensive set of experimental results to show that an LLM like GPT-3.5 is viable for high-performing unsupervised ER, and, interestingly, that more complicated and detailed (and hence, more expensive) prompting methods do not necessarily outperform simpler approaches. We provide brief qualitative and error analyses, including a study of the inter-consistency of different prompting methods to determine whether they yield stable outputs. Finally, we consider some limitations of LLMs when applied to ER.