In the academic world, the number of scientists grows every year, and so does the number of authors who share the same name. Consequently, it is challenging to assign newly published papers to their respective authors, and Author Name Ambiguity (ANA) is considered a critical open problem in digital libraries. This paper proposes an Author Name Disambiguation (AND) approach that links author names to their real-world entities by leveraging their co-authors and domain of research. To this end, we use data collected from the DBLP repository, which contains more than 5 million bibliographic records authored by around 2.6 million co-authors. Our approach first groups authors who share the same last name and the same first-name initials. The author within each group is then identified by capturing the relation with their co-authors and area of research, the latter represented by the titles of the corresponding author's validated publications. For this purpose, we train a neural network model that learns from the representations of the co-authors and titles. We validated the effectiveness of our approach by conducting extensive experiments on a large dataset.
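The grouping step described above can be illustrated with a minimal sketch. It assumes that names are plain "First ... Last" strings, that the first token is the first name and the last token is the last name, and that only the initial of the first token is used. The helper names `blocking_key` and `group_by_blocking_key` and the sample names are hypothetical and are not taken from the paper or from DBLP.

```python
from collections import defaultdict


def blocking_key(full_name: str) -> str:
    """Build a key from the first-name initial and the last name.

    Assumption: the first whitespace-separated token is the first name
    and the last token is the last name.
    """
    parts = full_name.strip().split()
    if not parts:
        return ""
    first_initial = parts[0][0].lower()
    last_name = parts[-1].lower()
    return f"{first_initial} {last_name}"


def group_by_blocking_key(names):
    """Group names that share the same last name and first-name initial."""
    groups = defaultdict(list)
    for name in names:
        groups[blocking_key(name)].append(name)
    return dict(groups)


# Hypothetical sample names, for illustration only.
names = ["John Smith", "Jane Smith", "J. Smith", "Wei Wang", "Wen Wang", "Anna Lee"]
groups = group_by_blocking_key(names)
```

Each resulting group holds the candidate names that a downstream model, such as the neural network mentioned above, would then have to tell apart using co-authors and titles. Real bibliographic names (multiple given names, suffixes, name order conventions) would need more careful parsing than this sketch provides.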