Entity resolution (ER) is the problem of identifying and linking database records that refer to the same real-world entity. Traditional ER methods rely on batch processing, which becomes impractical as data volumes grow because of its high computational cost and lack of real-time capability. In many applications, users need to resolve entities for only a small portion of their data, which makes processing the full dataset unnecessary; this scenario is known as "ER-on-demand". This paper proposes FastER, an efficient ER-on-demand framework for property graphs. Our approach uses graph differential dependencies (GDDs) as a knowledge encoding language to design effective filtering mechanisms that exploit both the structural and the attribute semantics of graphs. We then construct a blocking graph from the filtered subgraphs to reduce the number of candidate entity pairs that must be compared. In addition, FastER incorporates Progressive Profile Scheduling (PPS), which allows the system to produce results incrementally throughout the resolution process. Extensive evaluations on multiple benchmark datasets demonstrate that FastER significantly outperforms state-of-the-art ER methods in computational efficiency and real-time processing for on-demand tasks while ensuring reliability. FastER is publicly available at: https://anonymous.4open.science/r/On_Demand_Entity_Resolution-9DFB
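To make the filter, block, and progressively emit pipeline described above more concrete, the following is a minimal sketch on flat records. It is not FastER's implementation: the GDD-based filter is replaced by a plain user-supplied predicate, the blocking graph by a simple key-based grouping, and PPS by a smallest-block-first ordering. The function `resolve_on_demand`, its parameters, and the sample records are all hypothetical names introduced for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve_on_demand(records, wanted, block_key, threshold=0.85):
    """Yield (id_a, id_b, score) for likely duplicates among requested records.

    Assumed stand-ins, not the paper's method: `wanted` plays the role of the
    filter, `block_key` the role of blocking, and the block ordering the role
    of progressive scheduling.
    """
    # 1. Filtering: keep only the records the user asked about.
    selected = [r for r in records if wanted(r)]

    # 2. Blocking: group by a cheap key so only same-block records are compared.
    blocks = defaultdict(list)
    for r in selected:
        blocks[block_key(r)].append(r)

    # 3. Progressive emission: visit smaller blocks first and yield each match
    #    as soon as it is found, rather than after all comparisons finish.
    for key in sorted(blocks, key=lambda k: len(blocks[k])):
        for a, b in combinations(blocks[key], 2):
            score = similarity(a["name"], b["name"])
            if score >= threshold:
                yield a["id"], b["id"], score


# Hypothetical sample data: record 4 is outside the requested subset.
records = [
    {"id": 1, "name": "Jon Smith", "city": "Boston"},
    {"id": 2, "name": "John Smith", "city": "Boston"},
    {"id": 3, "name": "Jane Doe", "city": "Boston"},
    {"id": 4, "name": "John Smith", "city": "Denver"},
]

matches = resolve_on_demand(
    records,
    wanted=lambda r: r["city"] == "Boston",
    block_key=lambda r: r["name"][0].lower(),
)
```

Because `resolve_on_demand` is a generator, a caller can consume the first matches while later blocks are still unprocessed, which is the property the progressive scheduling in the abstract is aiming for. Records outside the requested subset are never compared at all.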