Heterogeneous information networks (HINs), whose rich semantics are captured by meta-paths, have become a powerful tool for alleviating data sparsity in recommender systems. Existing HIN-based recommendation methods assume centralized data storage and train their models centrally. In practice, however, data is often stored in a distributed manner because of privacy concerns, which makes centralized HIN-based recommendation infeasible. In this paper, we assume that the HIN is partitioned into private HINs stored on the client side and shared HINs stored on the server. Under this setting, we propose a framework based on a federated heterogeneous graph neural network (FedHGNN), which collaboratively trains a recommendation model on distributed HINs without leaking user privacy. Specifically, we first formalize a privacy definition for HIN-based federated recommendation in the light of differential privacy; it aims to protect the user-item interactions in the private HINs as well as each user's high-order patterns derived from the shared HINs. To recover the meta-path-based semantics broken by distributed data storage while satisfying the proposed privacy definition, we design a semantic-preserving method for publishing user interactions, which locally perturbs a user's high-order patterns and the related user-item interactions before publishing them. We then propose an HGNN model for recommendation that performs node-level and semantic-level aggregation to capture the recovered semantics. Extensive experiments on three datasets show that our model outperforms existing methods by a large margin (up to 34% in HR@10 and 42% in NDCG@10) under an acceptable privacy budget.
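For readers unfamiliar with the reported metrics, the following is a minimal sketch of how HR@10 and NDCG@10 are commonly computed. It assumes a leave-one-out protocol with a single held-out item per user, which the abstract does not state; the function names and the toy data are hypothetical and are not taken from the FedHGNN implementation.

```python
import math
from typing import Dict, Sequence, Tuple


def hit_ratio_at_k(ranked_items: Sequence[int], target: int, k: int = 10) -> float:
    """Return 1.0 if the held-out item appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items: Sequence[int], target: int, k: int = 10) -> float:
    """NDCG@k with one relevant item: 1 / log2(rank + 2) for a 0-based rank."""
    top_k = list(ranked_items[:k])
    if target not in top_k:
        return 0.0
    rank = top_k.index(target)  # 0-based position of the held-out item
    # The ideal DCG is 1.0 (item at rank 0), so no extra normalization is needed.
    return 1.0 / math.log2(rank + 2)


def evaluate(
    rankings: Dict[str, Sequence[int]],
    held_out: Dict[str, int],
    k: int = 10,
) -> Tuple[float, float]:
    """Average HR@k and NDCG@k over all users (assumed leave-one-out setup)."""
    users = list(held_out)
    hr = sum(hit_ratio_at_k(rankings[u], held_out[u], k) for u in users) / len(users)
    ndcg = sum(ndcg_at_k(rankings[u], held_out[u], k) for u in users) / len(users)
    return hr, ndcg


if __name__ == "__main__":
    # Toy example: item ids ranked by a hypothetical model for two users.
    toy_rankings = {"u1": [1, 2, 3], "u2": [4, 5, 6]}
    toy_held_out = {"u1": 1, "u2": 6}
    print(evaluate(toy_rankings, toy_held_out, k=2))
```

Under this definition, HR@K only records whether the held-out item is retrieved, whereas NDCG@K also rewards placing it closer to the top of the list, which is why the two improvements reported above can differ.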