The heterogeneous information network (HIN), whose rich semantics are captured by meta-paths, has emerged as a potent tool for mitigating data sparsity in recommender systems. Existing HIN-based recommender systems assume centralized storage and model training. In practice, however, data is often distributed across parties because of privacy concerns, which breaks the meta-path semantics within HINs and causes centralized HIN-based recommendation to fail. In this paper, we assume that the HIN is partitioned into private HINs stored on the client side and shared HINs stored on the server. Under this setting, we propose a framework based on a federated heterogeneous graph neural network (FedHGNN), which enables collaborative training of a recommendation model on distributed HINs while protecting user privacy. Specifically, we first formalize a privacy definition for HIN-based federated recommendation (FedRec) in light of differential privacy, with the goal of protecting both the user-item interactions within private HINs and the users' high-order patterns derived from shared HINs. To recover the broken meta-path-based semantics while meeting the proposed privacy requirements, we design a semantic-preserving method for publishing user interactions, which locally perturbs each user's high-order patterns and the related user-item interactions before they are published. Subsequently, we introduce an HGNN model for recommendation, which performs node-level and semantic-level aggregation to capture the recovered semantics. Extensive experiments on four datasets demonstrate that our model outperforms existing methods by a substantial margin (up to 34% in HR@10 and 42% in NDCG@10) under a reasonable privacy budget.
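To make the idea of locally perturbing user-item interactions under a privacy budget concrete, the following is a minimal sketch of classic randomized response applied to a binary interaction vector. It is a generic local differential privacy mechanism chosen for illustration and is not the semantic-preserving publishing method proposed in this paper; the function names `keep_probability` and `randomized_response`, the toy interaction vector, and the per-bit budget `epsilon` are all assumptions introduced here.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability of reporting a bit truthfully under randomized response.

    Equals e^eps / (1 + e^eps), written in the numerically stable form
    1 / (1 + e^-eps). Each reported bit then satisfies eps-local DP.
    """
    return 1.0 / (1.0 + math.exp(-epsilon))


def randomized_response(bits, epsilon, rng=None):
    """Perturb a binary interaction vector on the client before publishing.

    Each bit is kept with probability keep_probability(epsilon) and
    flipped otherwise. This is an illustrative stand-in, not the
    perturbation scheme of the paper.
    """
    rng = rng or random.Random()
    p = keep_probability(epsilon)
    return [b if rng.random() < p else 1 - b for b in bits]


# Hypothetical user: 1 marks an item the user interacted with.
interactions = [1, 0, 0, 1, 0, 0, 0, 1]
published = randomized_response(interactions, epsilon=1.0, rng=random.Random(0))
```

A smaller `epsilon` moves the keep probability toward 0.5, so the published vector reveals less about the true interactions but is also less useful to the recommender. This is the privacy-utility trade-off that the phrase "reasonable privacy budget" refers to.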