Link prediction in heterogeneous networks is crucial for understanding network structure and forecasting how it will evolve. Traditional methods often face significant obstacles, including over-smoothing (where excessive aggregation of node features erases critical structural details) and a dependency on human-defined meta-paths (which require extensive domain knowledge and can be inherently restrictive). These limitations hinder effective prediction and analysis of complex heterogeneous networks. To address them, we propose the Contrastive Heterogeneous grAph Transformer (CHAT). CHAT introduces a sampling-based graph transformer technique that selectively retains nodes of interest, obviating the need for predefined meta-paths. A connection-aware transformer encodes node sequences and their interconnections, guided by a dual-faceted loss function designed specifically for heterogeneous network link prediction. In addition, CHAT incorporates an ensemble link predictor that combines multiple samplings to improve prediction accuracy. We evaluate CHAT on three drug-target interaction (DTI) datasets. The empirical results show that CHAT outperforms both general-task approaches and models specialized in DTI prediction, supporting its efficacy for link prediction in heterogeneous networks.
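To illustrate the idea of an ensemble link predictor that combines multiple samplings, the following is a minimal sketch and not the CHAT implementation. It assumes a uniform neighbor sampler and, in place of the connection-aware transformer, a simple Jaccard overlap scorer. The names `sample_neighborhood`, `score_pair`, and `ensemble_link_score`, and the parameters `k` and `n_samplings`, are hypothetical and chosen only for this example.

```python
import random
from statistics import mean


def sample_neighborhood(adj, node, k, rng):
    """Uniformly sample up to k neighbors of `node` (hypothetical sampler)."""
    neighbors = sorted(adj.get(node, ()))
    if len(neighbors) <= k:
        return neighbors
    return rng.sample(neighbors, k)


def score_pair(ctx_u, ctx_v):
    """Placeholder scorer: Jaccard overlap of the two sampled contexts.

    Assumption: this stands in for a learned encoder; it is not the
    connection-aware transformer described in the abstract.
    """
    a, b = set(ctx_u), set(ctx_v)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ensemble_link_score(adj, u, v, k=3, n_samplings=8, seed=0):
    """Average the pair score over several independent samplings."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_samplings):
        ctx_u = sample_neighborhood(adj, u, k, rng)
        ctx_v = sample_neighborhood(adj, v, k, rng)
        scores.append(score_pair(ctx_u, ctx_v))
    return mean(scores)


# Toy drug-target graph (illustrative data only, not from any dataset).
adj = {
    "drug_a": {"target_1", "target_2"},
    "drug_b": {"target_1", "target_2"},
    "drug_c": {"target_3"},
}
same = ensemble_link_score(adj, "drug_a", "drug_b")
disjoint = ensemble_link_score(adj, "drug_a", "drug_c")
```

Averaging over samplings reduces the variance introduced by any single random sample, which is one plausible motivation for an ensemble predictor of this kind; the aggregation rule actually used by CHAT is not specified in the abstract.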