Researchers analyze coauthorship networks, but author name ambiguity in the underlying data remains a significant challenge: it can change the number of vertices and thereby distort network properties. Many scholars disambiguate author names with simple heuristics based on forename initials. These techniques can merge distinct authors into one vertex or split one author into several, which skews our understanding of network properties and raises concerns about the reliability and validity of such methods. This study investigates how different levels of vertex merging and splitting errors induced by name ambiguity affect network measures. It uses three large coauthorship networks whose author names were disambiguated by a highly accurate algorithm. As a counterfactual scenario, two initial-based disambiguation methods widely used in coauthorship network research were applied to these datasets, and nine coauthorship network metrics were computed while the number of merged or split vertices was varied randomly. Results show that initial-based disambiguation underestimates certain network properties, making coauthorship networks appear smaller and more tightly connected than they genuinely are. Other metrics, in contrast, are inflated, making authors appear more collaborative and more embedded in less fragmented research communities than they really are. The study emphasizes that careful disambiguation of vertex names is essential for rigorous and valid coauthorship network analysis.
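The merging and splitting mechanism can be illustrated with a small, hypothetical sketch. The abstract does not name its two initial-based methods. This sketch assumes the common "first initial + surname" and "all initials + surname" variants. The author ids stand in for accurate disambiguation. The toy papers, the function names, and the four summary measures are illustrative assumptions and not the study's data or its nine metrics.

```python
from itertools import combinations

# Hypothetical toy data: each author occurrence is (true_author_id, "Surname, Forenames").
# The ids play the role of an accurate (ground-truth) disambiguation.
papers = [
    [("A1", "Kim, Jinseok"), ("A2", "Lee, Min")],
    [("A3", "Kim, Jung Hoon"), ("A4", "Park, Soo")],
    [("A2", "Lee, Min"), ("A5", "Choi, Ara")],
    [("A4", "Park, Soo Min"), ("A5", "Choi, Ara")],  # same person as A4, fuller forename
]


def ground_truth_key(author):
    """Use the true author id as the vertex label."""
    return author[0]


def first_initial_key(author):
    """Surname + first forename initial, e.g. 'Kim, Jung Hoon' -> 'kim j'."""
    surname, forenames = (part.strip() for part in author[1].split(",", 1))
    return f"{surname.lower()} {forenames[0].lower()}"


def all_initials_key(author):
    """Surname + every forename initial, e.g. 'Kim, Jung Hoon' -> 'kim jh'."""
    surname, forenames = (part.strip() for part in author[1].split(",", 1))
    tokens = forenames.replace("-", " ").split()
    return f"{surname.lower()} {''.join(t[0].lower() for t in tokens)}"


def build_network(papers, key):
    """Build an undirected coauthorship graph as {vertex: set(neighbors)}."""
    adj = {}
    for authors in papers:
        vertices = sorted({key(a) for a in authors})
        for v in vertices:
            adj.setdefault(v, set())  # keep every author as a vertex
        for u, v in combinations(vertices, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def summarize(adj):
    """Vertex count, edge count, mean degree, and largest connected component size."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    seen, largest = set(), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0  # iterative DFS over one component
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        largest = max(largest, size)
    return {"vertices": n, "edges": m,
            "mean_degree": 2 * m / n if n else 0.0,
            "largest_component": largest}


for label, key in [("ground truth", ground_truth_key),
                   ("first initial", first_initial_key),
                   ("all initials", all_initials_key)]:
    print(label, summarize(build_network(papers, key)))
```

In this toy case, the first-initial key merges "Kim, Jinseok" and "Kim, Jung Hoon". The vertex count drops from 5 to 4 and the mean degree rises from 1.6 to 2.0. This is the "smaller and more tightly connected" pattern described above. The all-initials key instead splits "Park, Soo" and "Park, Soo Min" into separate vertices, which yields 6 vertices and breaks the graph into two components.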