Entity resolution (ER) is a critical data management task that determines whether multiple records refer to the same real-world entity. Despite its importance in domains such as healthcare, finance, and machine learning, implementing effective ER systems remains challenging: the abundance of methodologies and tools creates a paradox of choice for practitioners. This paper proposes Resolvi, a reference architecture aimed at improving extensibility, interoperability, and scalability in ER systems. By analyzing existing ER frameworks and literature, we establish a structured approach to designing ER solutions that address common challenges. We also explore best practices for system implementation and deployment strategies that facilitate large-scale entity resolution. Through this work, we aim to provide a foundational blueprint that helps researchers and practitioners develop robust, scalable ER systems while reducing the complexity of architectural decisions.