E2E-GRec: An End-to-End Joint Training Framework for Graph Neural Networks and Recommender Systems

Graph Neural Networks (GNNs) have emerged as powerful tools for modeling graph-structured data and are widely used in recommender systems, for example to capture complex user-item and item-item relations. However, most industrial deployments adopt a two-stage pipeline: GNNs are first pre-trained offline to generate node embeddings, which are then used as static features for downstream recommender systems. This decoupled paradigm has two key limitations: (1) high computational overhead, since large-scale GNN inference must be executed repeatedly to refresh the embeddings; and (2) lack of joint optimization, since gradients from the recommender system cannot directly influence GNN training, which leaves the GNN embeddings suboptimally informative for the recommendation task. In this paper, we propose E2E-GRec, a novel end-to-end training framework that unifies GNN training with the recommender system. Our framework is characterized by three key components: (i) efficient subgraph sampling from a large-scale cross-domain heterogeneous graph to ensure training scalability and efficiency; (ii) a Graph Feature Auto-Encoder (GFAE) serving as an auxiliary self-supervised task that guides the GNN to learn structurally meaningful embeddings; and (iii) a two-level feature fusion mechanism combined with GradNorm-based dynamic loss balancing, which stabilizes graph-aware multi-task end-to-end training. Extensive offline evaluations and online A/B tests on large-scale production data (e.g., a 0.133% relative improvement in stay duration and a 0.3171% reduction in the average number of videos a user skips), together with theoretical analysis, demonstrate that E2E-GRec consistently surpasses traditional approaches, yielding significant gains across multiple recommendation metrics.
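
To make the dynamic loss balancing in component (iii) more concrete, the following is a minimal sketch of one GradNorm-style weight update in plain Python. It follows the generally published GradNorm procedure, not necessarily the exact variant used in E2E-GRec, which the abstract does not specify. The function name `gradnorm_step`, the two-task setup (a recommendation loss and a GFAE reconstruction loss), and all numeric values, including `alpha` and `lr`, are illustrative assumptions.

```python
def gradnorm_step(weights, grad_norms, losses, initial_losses, alpha=1.5, lr=0.025):
    """One GradNorm-style update of the per-task loss weights (illustrative sketch).

    weights        -- current loss weights w_i, one per task
    grad_norms     -- g_i, norm of the gradient of the unweighted loss L_i
                      with respect to the shared parameters
    losses         -- current task losses L_i(t)
    initial_losses -- task losses at the first step, L_i(0)
    alpha, lr      -- assumed hyperparameters, not taken from the paper
    """
    n = len(weights)

    # Weighted gradient norms G_i = w_i * g_i and their mean across tasks.
    weighted = [w * g for w, g in zip(weights, grad_norms)]
    mean_norm = sum(weighted) / n

    # Relative inverse training rate r_i: values above 1 mark a task that
    # has reduced its loss less than the average task.
    ratios = [cur / init for cur, init in zip(losses, initial_losses)]
    mean_ratio = sum(ratios) / n
    inv_rates = [r / mean_ratio for r in ratios]

    # Target gradient norm per task, treated as a constant when differentiating.
    targets = [mean_norm * r ** alpha for r in inv_rates]

    # Gradient of |w_i * g_i - target_i| with respect to w_i is sign(...) * g_i.
    updated = []
    for w, g, big_g, target in zip(weights, grad_norms, weighted, targets):
        sign = (big_g > target) - (big_g < target)
        updated.append(w - lr * sign * g)

    # Renormalize so the weights sum to the number of tasks.
    total = sum(updated)
    return [n * w / total for w in updated]


# Hypothetical values: task 0 = recommendation loss, task 1 = GFAE loss.
new_weights = gradnorm_step(
    weights=[1.0, 1.0],
    grad_norms=[2.0, 0.5],
    losses=[0.6, 0.9],
    initial_losses=[1.0, 1.0],
)
```

In this hypothetical call, the first task has the larger gradient norm and has already reduced its loss more, so its weight is lowered while the weight of the slower second task is raised. In a real training loop the gradient norms would come from the framework's automatic differentiation rather than being passed in by hand.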
