Graph Topology Information Enhanced Heterogeneous Graph Representation Learning

Real-world heterogeneous graphs are inherently noisy, and their raw structures are usually not optimal for downstream tasks, which often degrades the performance of graph representation learning (GRL) models on those tasks. Graph Structure Learning (GSL) methods have been proposed to learn graph structures while optimizing for downstream tasks. However, existing methods are predominantly designed for homogeneous graphs, and GSL for heterogeneous graphs remains largely unexplored. Two challenges arise in this context. First, the quality of the input graph structure has a more profound impact on GNN-based heterogeneous GRL models than on their homogeneous counterparts. Second, most existing homogeneous GRL models suffer from high memory consumption when applied directly to heterogeneous graphs. In this paper, we propose a novel Graph Topology learning Enhanced Heterogeneous Graph Representation Learning framework (ToGRL). ToGRL learns high-quality graph structures and representations for downstream tasks by incorporating task-relevant latent topology information. Specifically, a novel GSL module first extracts downstream task-related topology information from the raw graph structure and projects it into topology embeddings. These embeddings are then used to construct a new graph with smooth graph signals, that is, signals that vary little across the graph's edges. This two-stage approach to GSL separates the optimization of the adjacency matrix from node representation learning, which reduces memory consumption. A representation learning module then takes the new graph as input and learns embeddings for downstream tasks. ToGRL also leverages prompt tuning to better exploit the knowledge embedded in the learned representations, thereby improving adaptability to downstream tasks. Extensive experiments on five real-world datasets show that ToGRL outperforms state-of-the-art methods by a large margin.
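The abstract does not say how the new graph is built from the topology embeddings. The sketch below is only an illustration under stated assumptions. It swaps in a common GSL construction, a symmetric k-nearest-neighbor (kNN) graph over cosine similarity. It then measures signal smoothness with the Dirichlet energy tr(XᵀLX), where L = D − A is the graph Laplacian. A lower value means the signal changes less across edges. The function names `knn_graph` and `dirichlet_energy` are hypothetical and are not ToGRL's API. Because the adjacency is computed from fixed embeddings rather than trained as a dense parameter, the sketch also mirrors the two-stage separation described above.

```python
import numpy as np


def knn_graph(z: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: symmetric kNN adjacency from embeddings z (n x d).

    Assumption: cosine similarity with k neighbors per node. This is a generic
    GSL construction, not necessarily the one used by ToGRL.
    """
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    zn = z / np.clip(norms, 1e-12, None)       # row-normalize to unit length
    sim = zn @ zn.T                            # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)             # never pick a node as its own neighbor
    n = z.shape[0]
    nbrs = np.argsort(-sim, axis=1)[:, :k]     # indices of the k most similar nodes
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)              # symmetrize: keep an edge if either side chose it


def dirichlet_energy(adj: np.ndarray, x: np.ndarray) -> float:
    """tr(X^T L X) with L = D - A; equals the sum over edges of ||x_i - x_j||^2."""
    lap = np.diag(adj.sum(axis=1)) - adj
    return float(np.trace(x.T @ lap @ x))


# Toy example: two clusters of (assumed) topology embeddings.
z = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
x = np.array([[1.0], [1.0], [0.0], [0.0]])     # a node signal aligned with the clusters
learned = knn_graph(z, k=1)
noisy = np.array([[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
print(dirichlet_energy(learned, x), dirichlet_energy(noisy, x))
```

On this toy input, the kNN graph connects nodes within each cluster, so the signal is perfectly smooth on it (energy 0.0). The cross-cluster "noisy" graph yields an energy of 2.0. The dense n × n similarity matrix is only practical for small graphs. At scale, one would typically replace it with approximate nearest-neighbor search.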

翻译：现实世界中的异质图本质上存在噪声，且通常未处于适合下游任务的最优图结构状态，这往往会对下游任务中图表示学习（GRL）模型的性能产生不利影响。尽管已有图结构学习（GSL）方法被提出以同时学习图结构和下游任务，但现有方法主要针对同质图设计，而异质图的GSL研究仍鲜有探索。在此背景下出现两大挑战：首先，相比同质图，输入图结构质量对基于图神经网络的异质GRL模型影响更为显著；其次，现有大多数同质GRL模型直接应用于异质图时会遭遇内存消耗问题。本文提出了一种新颖的图拓扑学习增强的异质图表示学习框架（ToGRL）。ToGRL通过融合任务相关的潜在拓扑信息，为下游任务学习高质量的图结构与表示。具体而言，首先提出新型GSL模块，从原始图结构中提取下游任务相关的拓扑信息并将其投影为拓扑嵌入。这些嵌入被用于构建具有平滑图信号的新图。这种两阶段GSL方法将邻接矩阵优化与节点表示学习分离，从而降低内存消耗。随后，表示学习模块将新图作为输入，为下游任务学习嵌入表示。ToGRL还利用提示调优（prompt tuning）更好地挖掘学习表示中的知识，增强对下游任务的适应性。在五个真实数据集上的大量实验表明，我们的ToGRL以较大优势超越现有最优方法。