Entity matching (EM) is a critical step in entity resolution (ER). Recently, entity matching based on large language models (LLMs) has shown great promise. However, current LLM-based entity matching approaches typically follow a binary matching paradigm that ignores the global consistency among record relationships. In this paper, we investigate methodologies for LLM-based entity matching that incorporate record interactions from different perspectives. Specifically, we comprehensively compare three representative strategies (matching, comparing, and selecting) and analyze their respective advantages and challenges in diverse scenarios. Based on our findings, we further design a compound entity matching framework (ComEM) that leverages the composition of multiple strategies and LLMs. ComEM combines the strengths of the individual strategies and achieves improvements in both effectiveness and efficiency. Experimental results on 8 ER datasets and 10 LLMs verify the superiority of incorporating record interactions through the selecting strategy, as well as the further cost-effectiveness brought by ComEM.
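To make the three strategies concrete, the following is a minimal sketch of one plausible reading of them; the abstract does not specify prompts or interfaces, so every name here is hypothetical. In particular, `jaccard` is a token-overlap stand-in for an LLM judgment, and `compound` only illustrates the general idea of composing a cheap ranking step with a selecting step, not the actual ComEM pipeline.

```python
from typing import List, Optional


def jaccard(a: str, b: str) -> float:
    """Hypothetical stand-in for an LLM call: token-overlap similarity in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def match(anchor: str, candidates: List[str], threshold: float = 0.5) -> List[int]:
    """Matching: an independent yes/no decision per pair.

    Because pairs are judged in isolation, several candidates may be accepted.
    """
    return [i for i, c in enumerate(candidates) if jaccard(anchor, c) >= threshold]


def compare(anchor: str, cand_a: str, cand_b: str) -> int:
    """Comparing: decide which of two candidates is closer to the anchor (0 or 1)."""
    return 0 if jaccard(anchor, cand_a) >= jaccard(anchor, cand_b) else 1


def select(anchor: str, candidates: List[str], threshold: float = 0.5) -> Optional[int]:
    """Selecting: view all candidates together and return at most one index."""
    if not candidates:
        return None
    best = max(range(len(candidates)), key=lambda i: jaccard(anchor, candidates[i]))
    return best if jaccard(anchor, candidates[best]) >= threshold else None


def compound(anchor: str, candidates: List[str], k: int = 2,
             threshold: float = 0.5) -> Optional[int]:
    """Assumed composition: rank cheaply, keep the top k, then select among them."""
    ranked = sorted(range(len(candidates)),
                    key=lambda i: jaccard(anchor, candidates[i]), reverse=True)
    shortlist = ranked[:k]
    choice = select(anchor, [candidates[i] for i in shortlist], threshold)
    return None if choice is None else shortlist[choice]


anchor = "apple iphone 13 128gb blue"
candidates = [
    "apple iphone 13 128gb blue smartphone",
    "apple iphone 13 256gb blue",
    "samsung galaxy s21 128gb",
]

matched = match(anchor, candidates)      # pairwise decisions, possibly several hits
selected = select(anchor, candidates)    # a single globally consistent choice
preferred = compare(anchor, candidates[0], candidates[1])
composed = compound(anchor, candidates)
```

On this toy input, the pairwise matcher accepts both iPhone records while the selector commits to one, which illustrates the kind of global inconsistency the binary paradigm can produce. In a real system, the scoring function would be replaced by LLM prompts, and the ranking and selecting steps could use different models to trade off cost and accuracy.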