Entity matching (EM), the task of determining whether two descriptions refer to the same entity, is essential in data management. EM methods have evolved from rule-based to AI-driven approaches, yet current techniques based on large language models (LLMs) often fall short because they rely on static knowledge and rigid, predefined prompts. In this paper, we introduce Libem, a compound AI system designed to address these limitations with a flexible, tool-oriented approach. Libem performs entity matching through dynamic tool use, self-refinement, and optimization, which lets it adapt and refine its process based on the dataset and performance metrics. Unlike solo-AI EM systems, whose lack of modularity often hinders iterative design improvements and system optimization, Libem offers a composable and reusable toolchain. We hope this approach contributes to ongoing discussions and developments in AI-driven data management.
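To make the idea of a composable toolchain concrete, the following is a minimal sketch, not Libem's actual API. Every name in it (`Matcher`, `lowercase`, `collapse_whitespace`, `tune_threshold`) is hypothetical, and plain string similarity from Python's `difflib` stands in for the LLM call. It only illustrates two ideas from the abstract: tools that can be swapped in and out, and a setting that is tuned against labeled data rather than fixed in advance.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable, List, Sequence, Tuple

# Hypothetical names throughout; this is not Libem's actual API.
Tool = Callable[[str], str]


def lowercase(text: str) -> str:
    """Preprocessing tool: fold case."""
    return text.lower()


def collapse_whitespace(text: str) -> str:
    """Preprocessing tool: trim and collapse runs of whitespace."""
    return " ".join(text.split())


@dataclass
class Matcher:
    """A toy matcher assembled from swappable preprocessing tools."""
    tools: List[Tool] = field(default_factory=list)
    threshold: float = 0.8

    def prepare(self, text: str) -> str:
        # Apply each tool in order; tools can be added, removed, or reordered.
        for tool in self.tools:
            text = tool(text)
        return text

    def score(self, left: str, right: str) -> float:
        # Stand-in for a model call: plain string similarity in [0, 1].
        return SequenceMatcher(None, self.prepare(left), self.prepare(right)).ratio()

    def match(self, left: str, right: str) -> bool:
        return self.score(left, right) >= self.threshold


def tune_threshold(
    matcher: Matcher,
    labeled: Sequence[Tuple[str, str, bool]],
    candidates: Sequence[float],
) -> float:
    """Return the candidate threshold with the highest accuracy on labeled pairs."""
    def accuracy(threshold: float) -> float:
        hits = sum((matcher.score(l, r) >= threshold) == y for l, r, y in labeled)
        return hits / len(labeled)

    return max(candidates, key=accuracy)


matcher = Matcher(tools=[lowercase, collapse_whitespace])
labeled_pairs = [
    ("Apple iPhone 13", "apple  iphone 13", True),
    ("Apple iPhone 13", "Dell XPS 15 laptop", False),
]
# Crude analogue of metric-driven optimization: pick the setting that
# scores best on a small labeled sample.
matcher.threshold = tune_threshold(matcher, labeled_pairs, [0.5, 0.9])
```

In this sketch, adapting to a new dataset amounts to changing the `tools` list or re-running `tune_threshold` on a fresh labeled sample; the matching logic itself is left untouched.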