Entity matching (EM) is a critical step in entity resolution. Recently, entity matching based on large language models (LLMs) has shown great promise. However, current LLM-based entity matching approaches typically follow a binary matching paradigm that ignores the global consistency between different records. In this paper, we investigate various methodologies for LLM-based entity matching that incorporate record interactions from different perspectives. Specifically, we comprehensively compare three representative strategies (matching, comparing, and selecting) and analyze their respective advantages and challenges in diverse scenarios. Based on our findings, we further design a compositional entity matching (ComEM) framework that leverages the composition of multiple strategies and LLMs. In this way, ComEM combines the advantages of the different strategies and improves both effectiveness and efficiency. Experimental results show that ComEM not only achieves significant performance gains on various datasets but also reduces the cost of LLM-based entity matching in real-world applications.
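To make the three strategies concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy token-overlap scorer (`toy_score`) in place of an LLM call, and all function names, thresholds, and the order in which `compose` chains the strategies are hypothetical choices made for illustration; the abstract only states that strategies and LLMs are composed.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for an LLM judgment: a similarity in [0, 1].
Scorer = Callable[[str, str], float]


def toy_score(a: str, b: str) -> float:
    """Jaccard overlap of lowercased whitespace tokens (assumption, not an LLM)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def match(anchor: str, candidates: List[str],
          score: Scorer = toy_score, threshold: float = 0.5) -> List[str]:
    """Matching: an independent yes/no decision for every (anchor, candidate) pair."""
    return [c for c in candidates if score(anchor, c) >= threshold]


def compare(anchor: str, candidates: List[str],
            score: Scorer = toy_score) -> List[str]:
    """Comparing: rank candidates by how many pairwise comparisons each one wins."""
    wins = [0] * len(candidates)
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            if score(anchor, candidates[i]) >= score(anchor, candidates[j]):
                wins[i] += 1
            else:
                wins[j] += 1
    order = sorted(range(len(candidates)), key=lambda k: -wins[k])
    return [candidates[k] for k in order]


def select(anchor: str, candidates: List[str],
           score: Scorer = toy_score, min_score: float = 0.5) -> Optional[str]:
    """Selecting: pick at most one candidate from the whole list, or None."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: score(anchor, c))
    return best if score(anchor, best) >= min_score else None


def compose(anchor: str, candidates: List[str], top_k: int = 2) -> Optional[str]:
    """One possible composition (assumed): loose filter, then rank, then select."""
    shortlist = match(anchor, candidates, threshold=0.2)
    ranked = compare(anchor, shortlist)[:top_k]
    return select(anchor, ranked)


anchor = "Apple iPhone 13 128GB Blue"
candidates = [
    "Apple iPhone 13 128GB Blue Smartphone",
    "Apple iPhone 12 64GB Black",
    "Samsung Galaxy S21 128GB",
    "Apple iPhone 13 256GB Blue",
]
independent = match(anchor, candidates)  # may accept several candidates
chosen = compose(anchor, candidates)     # returns at most one candidate
```

In this toy setup, independent matching accepts both the 128GB and the 256GB listings, because each pair is judged in isolation, whereas selecting over the candidate list returns a single record. In practice each scorer call would be an LLM prompt, and the cheaper steps could use a smaller model than the final selection.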