Text-attributed Graphs (TAGs) combine textual node attributes with graph structure to describe rich relational semantics. Recent efforts to integrate Graph Neural Networks (GNNs) and Large Language Models (LLMs) have shown promise for learning on TAGs, yet achieving well-aligned representations remains challenging. Prior studies largely rely on heuristics that perform coarse-grained matching. These heuristics impose too few constraints and ignore distributional alignment, leading to representation drift and limited generalization. Building on Energy-based Models (EBMs), we propose an Energy-based Representation Alignment (ERAlign) framework that projects GNN-encoded graph structure and LLM-derived text embeddings into a shared latent space to achieve distribution consistency. Concretely, layer-wise alignment is quantified by a distance metric and optimized via an EBM objective. By decreasing energy values, our framework yields well-aligned representations for downstream tasks. During training, we introduce Energy Discrepancy (ED) to avoid the high sampling costs associated with the intractable normalization constant. ED also carries theoretical guarantees of higher training efficiency and reduced energy landscape distortion. Empirical evaluations on eight TAG datasets demonstrate that ERAlign achieves state-of-the-art performance across varying levels of supervision and in cross-task transfer scenarios.
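The idea of treating a distance in a shared latent space as an energy, and aligning the two views by lowering it, can be illustrated with a toy sketch. This is not the ERAlign implementation: the abstract does not specify the distance metric, the form of the projections, or the ED objective, so the squared Euclidean distance, the linear projections, the random stand-in embeddings, and all names below (`project`, `energy`, `w_graph`, `w_text`) are assumptions made for illustration only. Layer-wise alignment and ED are omitted entirely.

```python
import numpy as np


def project(x, w):
    """Linear projection into the shared latent space (assumed form)."""
    return x @ w


def energy(z_graph, z_text):
    """Per-node energy, assumed here to be a squared Euclidean distance."""
    return np.sum((z_graph - z_text) ** 2, axis=1)


rng = np.random.default_rng(0)
n_nodes, d_graph, d_text, d_latent = 32, 16, 24, 8

# Random stand-ins for GNN node embeddings and LLM text embeddings.
h_graph = rng.normal(size=(n_nodes, d_graph))
h_text = rng.normal(size=(n_nodes, d_text))

# Hypothetical projection matrices; only the graph side is trained here,
# so the text projection acts as a fixed target.
w_graph = rng.normal(size=(d_graph, d_latent)) * 0.1
w_text = rng.normal(size=(d_text, d_latent)) * 0.1

z_text = project(h_text, w_text)
energy_before = energy(project(h_graph, w_graph), z_text).mean()

# Plain gradient descent on the mean energy with respect to w_graph.
learning_rate = 0.01
for _ in range(100):
    diff = project(h_graph, w_graph) - z_text
    grad_w_graph = 2.0 * h_graph.T @ diff / n_nodes
    w_graph -= learning_rate * grad_w_graph

energy_after = energy(project(h_graph, w_graph), z_text).mean()
print(f"mean energy before: {energy_before:.4f}, after: {energy_after:.4f}")
```

Holding the text projection fixed keeps this toy objective from collapsing both views to zero. A full EBM objective would additionally have to contrast the data against other configurations, which is where the normalization constant, and hence a technique such as ED, enters.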