The potential impact of a paper is often quantified by the number of citations it will receive. However, most commonly used models may underestimate the influence of newly published papers over time and fail to encode the dynamics of the citation network in the graph. In this study, we construct hierarchical and heterogeneous graphs for target papers from an annual perspective. The constructed graphs record the annual dynamics of each target paper's scientific context. We then propose a novel graph neural network, the Hierarchical and Heterogeneous Contrastive Graph Learning Model (H2CGL), to incorporate the heterogeneity and dynamics of the citation network. H2CGL aggregates the heterogeneous information for each year separately, prioritizing highly cited papers and the relationships among references, citations, and the target paper. It then employs a weighted graph isomorphism network (GIN) to capture the dynamics between heterogeneous subgraphs across years. Moreover, it leverages contrastive learning to make the graph representations more sensitive to potential citations. In particular, co-cited or co-citing papers of the target paper with a large citation gap are taken as hard negative samples, while positive samples are generated by randomly dropping low-cited papers. Extensive experiments on two scholarly datasets demonstrate that the proposed H2CGL significantly outperforms a series of baseline approaches for both previously published and newly published papers. Additional analyses highlight the significance of the proposed modules. Our code and settings have been released on GitHub (https://github.com/ECNU-Text-Computing/H2CGL).
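The contrastive sampling idea described above can be illustrated with a minimal sketch. This is not the released H2CGL implementation: the function names, the `threshold`, `drop_prob`, and `min_gap` parameters, and the reading of "citation gap" as an absolute difference in citation counts are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (citing_paper, cited_paper)


def drop_low_cited(
    nodes: Set[str],
    edges: List[Edge],
    citations: Dict[str, int],
    target: str,
    threshold: int,    # hypothetical cutoff for "low-cited"
    drop_prob: float,  # hypothetical drop probability
    rng: random.Random,
) -> Tuple[Set[str], List[Edge]]:
    """Build a positive view by randomly dropping low-cited papers.

    The target paper and papers above the threshold are always kept.
    """
    kept = {
        n
        for n in nodes
        if n == target
        or citations.get(n, 0) > threshold
        or rng.random() >= drop_prob
    }
    # Keep only edges whose two endpoints both survive.
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, kept_edges


def hard_negatives(
    target: str,
    candidates: List[str],  # co-cited or co-citing papers of the target
    citations: Dict[str, int],
    min_gap: int,           # hypothetical minimum citation gap
) -> List[str]:
    """Select candidates whose citation count differs from the target's
    by at least min_gap (assumed definition of a large citation gap)."""
    base = citations[target]
    return [
        c
        for c in candidates
        if c != target and abs(citations[c] - base) >= min_gap
    ]


# Toy data: paper "t" is the target.
citations = {"t": 50, "a": 2, "b": 120, "c": 48, "d": 1}
nodes = set(citations)
edges = [("a", "t"), ("b", "t"), ("t", "c"), ("d", "t")]

view_nodes, view_edges = drop_low_cited(
    nodes, edges, citations, "t", threshold=5, drop_prob=0.5,
    rng=random.Random(0),
)
negatives = hard_negatives("t", ["a", "b", "c", "d"], citations, min_gap=40)
```

In this toy example, `negatives` contains the neighbors whose counts are far from the target's 50 citations, while `view_nodes` always retains the target and the two highly cited papers. In a full model, the augmented view and the selected negatives would feed a contrastive loss over graph representations, which this sketch does not cover.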