Entity resolution (ER) is the problem of identifying and linking database records that refer to the same real-world entity. Traditional ER methods rely on batch processing, which becomes impractical as data volumes grow because of its high computational cost and lack of real-time support. In many applications, users need to resolve entities for only a small portion of their data, so processing the full dataset is unnecessary; this scenario is known as "ER-on-demand". This paper proposes FastER, an efficient ER-on-demand framework for property graphs. Our approach uses graph differential dependencies (GDDs) as a knowledge encoding language to design effective filtering mechanisms that exploit both the structural and the attribute semantics of graphs. We then construct a blocking graph from the filtered subgraphs to reduce the number of candidate entity pairs that must be compared. FastER also incorporates Progressive Profile Scheduling (PPS), which lets the system produce results incrementally throughout the resolution process. Extensive evaluations on multiple benchmark datasets show that, on on-demand tasks, FastER significantly outperforms state-of-the-art ER methods in computational efficiency and real-time processing while preserving the reliability of its results. FastER is publicly available at: https://anonymous.4open.science/r/On_Demand_Entity_Resolution-9DFB
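To make the on-demand idea concrete, the following is a minimal, generic sketch and not FastER itself: it does not model GDDs, property-graph structure, or the authors' PPS algorithm. It assumes flat attribute records, uses simple token blocking as a stand-in for GDD-based filtering, keeps only blocking-graph edges incident to the queried records (weighted by the number of shared blocks), and emits candidate pairs best-first from a heap so matches arrive incrementally. All function names (`tokens`, `build_blocking_graph`, `progressive_pairs`, `resolve_on_demand`), the Jaccard matcher, and the threshold are hypothetical choices for illustration.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterator, Set, Tuple

Record = Dict[str, str]
Pair = Tuple[str, str]


def tokens(record: Record) -> Set[str]:
    # Hypothetical blocking key: lowercase word tokens of every attribute value.
    return {t for value in record.values() for t in value.lower().split()}


def build_blocking_graph(records: Dict[str, Record], queries: Set[str]) -> Dict[Pair, int]:
    # Inverted index: each token defines a block of record ids containing it.
    blocks: Dict[str, Set[str]] = defaultdict(set)
    for rid, rec in records.items():
        for tok in tokens(rec):
            blocks[tok].add(rid)
    # On-demand restriction: only keep edges touching a queried record.
    # Edge weight = number of blocks the two records share.
    edges: Dict[Pair, int] = defaultdict(int)
    for members in blocks.values():
        # A set of sorted pairs so each pair is counted once per block,
        # even when both endpoints are queried records.
        pairs = {tuple(sorted((q, o))) for q in members & queries for o in members if o != q}
        for pair in pairs:
            edges[pair] += 1
    return dict(edges)


def progressive_pairs(edges: Dict[Pair, int]) -> Iterator[Tuple[str, str, int]]:
    # Best-first schedule: pop the heaviest (most promising) edge first.
    heap = [(-w, a, b) for (a, b), w in edges.items()]
    heapq.heapify(heap)
    while heap:
        neg_w, a, b = heapq.heappop(heap)
        yield a, b, -neg_w


def jaccard(x: Set[str], y: Set[str]) -> float:
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def resolve_on_demand(records: Dict[str, Record], queries: Set[str],
                      threshold: float = 0.5) -> Iterator[Pair]:
    # Yields matches incrementally; a caller can stop consuming at any time.
    edges = build_blocking_graph(records, queries)
    for a, b, _ in progressive_pairs(edges):
        if jaccard(tokens(records[a]), tokens(records[b])) >= threshold:
            yield a, b


records: Dict[str, Record] = {
    "r1": {"name": "Alan Turing", "city": "London"},
    "r2": {"name": "A. Turing", "city": "London"},
    "r3": {"name": "Grace Hopper", "city": "New York"},
    "r4": {"name": "Alan Turing", "city": "Manchester"},
}

if __name__ == "__main__":
    for match in resolve_on_demand(records, {"r1"}):
        print(match)
```

Querying only `r1` means `r3` shares no block with it and is never compared, which is the kind of saving the on-demand setting aims for; the generator also lets a caller stop after the first few matches instead of waiting for a full batch run.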