Entity matching is the task of deciding whether two entity descriptions refer to the same real-world entity. State-of-the-art entity matching methods often rely on fine-tuning Transformer models such as BERT or RoBERTa. These models have two major drawbacks for entity matching: (i) they require significant amounts of fine-tuning data to reach good performance, and (ii) the fine-tuned models are not robust to out-of-distribution entities. In this paper, we investigate using ChatGPT for entity matching as a more robust, training-data-efficient alternative to traditional Transformer models. We perform experiments along three dimensions: (i) general prompt design, (ii) in-context learning, and (iii) provision of higher-level matching knowledge. We show that ChatGPT is competitive with a fine-tuned RoBERTa model, reaching a zero-shot performance of 82.35% F1 on a challenging matching task on which RoBERTa requires 2000 training examples to reach similar performance. Adding in-context demonstrations to the prompts further improves F1 by up to 7.85% when using similarity-based example selection. Always using the same set of 10 handpicked demonstrations leads to an improvement of 4.92% over the zero-shot performance. Finally, we show that ChatGPT can also be guided by adding higher-level matching knowledge in the form of rules to the prompts. Providing matching rules leads to performance gains similar to those from providing in-context demonstrations.
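
To make the three experimental dimensions concrete, the following is a minimal Python sketch of how a matching prompt could be assembled from optional rules, in-context demonstrations, and the pair to be judged. It is an illustration under stated assumptions, not the prompt design used in the paper: the prompt wording, the `Demo` class, and the functions `jaccard`, `select_demos`, and `build_prompt` are all hypothetical, and token-level Jaccard overlap merely stands in for whatever similarity measure drives the similarity-based example selection. No call to ChatGPT is made; the sketch only builds the prompt string.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    """A labeled pair of entity descriptions (hypothetical container)."""
    left: str
    right: str
    is_match: bool


QUESTION = "Do the two entity descriptions refer to the same real-world entity?"


def jaccard(a: str, b: str) -> float:
    """Jaccard overlap of lowercased whitespace tokens (assumed similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def select_demos(left: str, right: str, pool: list[Demo], k: int) -> list[Demo]:
    """Return the k demonstrations from the pool most similar to the query pair."""
    query = f"{left} {right}"
    ranked = sorted(
        pool,
        key=lambda d: jaccard(query, f"{d.left} {d.right}"),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(left: str, right: str, demos=(), rules=()) -> str:
    """Assemble a prompt: optional rules, optional demonstrations, then the query."""
    lines = []
    if rules:
        lines.append("Matching rules:")
        lines.extend(f"- {rule}" for rule in rules)
    for d in demos:
        lines.append(f"Entity 1: {d.left}")
        lines.append(f"Entity 2: {d.right}")
        lines.append(f"{QUESTION} Answer: {'Yes' if d.is_match else 'No'}")
    # The query pair comes last, with the answer left open for the model.
    lines.append(f"Entity 1: {left}")
    lines.append(f"Entity 2: {right}")
    lines.append(f"{QUESTION} Answer:")
    return "\n".join(lines)
```

Calling `build_prompt(left, right)` with no demonstrations and no rules corresponds to the zero-shot setting; passing the output of `select_demos` as `demos` corresponds to similarity-based in-context learning, passing a fixed list corresponds to handpicked demonstrations, and passing `rules` corresponds to supplying higher-level matching knowledge.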