Resolvi: A Reference Architecture for Extensible, Scalable and Interoperable Entity Resolution

Context: Entity resolution (ER) plays a pivotal role in data management by determining whether multiple records correspond to the same real-world entity. Despite its critical importance across domains such as healthcare, finance, and machine learning, and despite its long research history, designing and implementing ER systems remains challenging in practice because of the wide array of methodologies and tools available. This diversity creates a paradox of choice for practitioners, which is further compounded by the many ER variants (record linkage, entity alignment, merge/purge, and so on). Objective: This paper introduces Resolvi, a reference architecture that facilitates the design of ER systems. The goal is to make it easier to create extensible, interoperable, and scalable ER systems and to reduce the time spent on architectural decisions. Methods: Software design techniques such as the 4+1 view model and visual communication tools such as UML are used to present the reference architecture in a structured way. Source code analysis and a literature review are used to derive the main elements of the reference architecture. Results: This paper identifies generic requirements and architectural qualities of ER systems. It provides design guidelines, patterns, and recommendations for creating extensible, scalable, and interoperable ER systems. Furthermore, it highlights implementation best practices and deployment strategies based on insights from existing systems. Conclusion: The proposed reference architecture offers a foundational blueprint for researchers and practitioners developing extensible, interoperable, and scalable ER systems. Resolvi provides clear abstractions and design recommendations that simplify architectural decision-making, whether for designing new ER systems or improving existing designs.
