From-scratch name disambiguation is an essential task for building a reliable foundation for academic platforms. It partitions documents authored by identically named individuals into groups, each representing a distinct real-life expert. Canonically, the process is split into two decoupled tasks: estimating pairwise similarities between documents locally, then grouping the documents into clusters globally. However, this decoupling often inhibits information exchange between the two intertwined tasks. We therefore present BOND, which bootstraps the local and global signals so that each promotes the other in an end-to-end regime. Specifically, BOND uses local pairwise similarities to drive global clustering, which in turn generates pseudo-clustering labels; these global signals then refine the local pairwise characterizations. Experimental results show that BOND outperforms other advanced baselines by a substantial margin. Moreover, an enhanced version, BOND+, which incorporates ensemble and post-match techniques, rivals the top methods in the WhoIsWho competition.
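The local-to-global loop described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the function names (`cosine_similarity`, `cluster_by_threshold`, `pseudo_pairwise_targets`, `bootstrap`), the threshold-based connected-components clustering, and the centroid-pull update are all assumptions chosen for brevity. In a real system, the centroid pull would presumably be replaced by training a similarity model against the pseudo-labels.

```python
import numpy as np

def cosine_similarity(x):
    """Local step: pairwise cosine similarity between document embeddings."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T

def cluster_by_threshold(sim, threshold):
    """Global step (stand-in): connected components of the graph whose
    edges link documents with similarity >= threshold."""
    n = sim.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.nonzero(sim[i] >= threshold)[0]:
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def pseudo_pairwise_targets(labels):
    """Turn cluster pseudo-labels into pairwise targets (1 = same author)."""
    return (labels[:, None] == labels[None, :]).astype(float)

def bootstrap(embeddings, threshold=0.8, rounds=3, step=0.5):
    """Alternate local similarity estimation and global clustering.
    The centroid pull is a hypothetical stand-in for a gradient update
    supervised by the pseudo-labels."""
    x = embeddings.astype(float).copy()
    for _ in range(rounds):
        sim = cosine_similarity(x)                     # local signal
        labels = cluster_by_threshold(sim, threshold)  # global signal
        for c in np.unique(labels):
            members = labels == c
            centroid = x[members].mean(axis=0)
            x[members] += step * (centroid - x[members])  # refine local view
    return labels, pseudo_pairwise_targets(labels)

# Four toy "documents": two near the x-axis, two near the y-axis.
docs = np.array([[1.0, 0.0], [0.9, 0.3], [0.0, 1.0], [0.2, 0.9]])
labels, targets = bootstrap(docs)
```

With these toy embeddings, the first two documents fall into one cluster and the last two into another. The returned `targets` matrix is the kind of pairwise pseudo-supervision that could feed back into the local similarity estimator.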