Federated recommender systems employ federated learning techniques to safeguard user privacy by transmitting model parameters, instead of raw user data, between user devices and the central server. Nevertheless, current federated recommender systems face challenges such as heterogeneity and personalization, model performance degradation, and communication bottlenecks. Previous studies have attempted to address these issues, but none has solved all of them simultaneously. In this paper, we propose a novel framework, named PerFedRec++, to enhance personalized federated recommendation with self-supervised pre-training. Specifically, we utilize the privacy-preserving mechanism of federated recommender systems to generate two augmented graph views, which serve as the contrastive task in self-supervised graph learning to pre-train the model. Pre-training enhances the performance of federated models by improving the uniformity of the learned representations. In addition, by providing a better initial state for federated training, pre-training makes the overall training converge faster, thus alleviating the heavy communication burden. We then construct a collaborative graph to learn client representations through a federated graph neural network. Based on these learned representations, we cluster users into groups and learn a cluster-level model for each group. Each user then obtains a personalized model by combining the global federated model, the cluster-level federated model, and its own fine-tuned local model. Experiments on three real-world datasets show that our proposed method achieves superior performance over existing methods.
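The final personalization step, in which each user combines the global, cluster-level, and local models, can be illustrated with a minimal sketch. The abstract does not state how the three models are combined, so the weighted average below is an assumption, as are the names `average` and `personalize`, the mixing weights, and the toy parameter vectors. Cluster assignments are given by hand here rather than derived from learned client representations.

```python
from typing import Dict, List

# A model is represented as a dict of named parameter vectors (toy stand-in).
Params = Dict[str, List[float]]


def average(models: List[Params]) -> Params:
    """Unweighted element-wise mean of several models (FedAvg-style aggregate)."""
    n = len(models)
    first = models[0]
    return {
        name: [sum(m[name][i] for m in models) / n for i in range(len(first[name]))]
        for name in first
    }


def personalize(
    global_m: Params,
    cluster_m: Params,
    local_m: Params,
    w_global: float = 0.5,   # hypothetical mixing weights, not from the paper
    w_cluster: float = 0.25,
    w_local: float = 0.25,
) -> Params:
    """Assumed combination rule: normalized weighted average of the three models."""
    total = w_global + w_cluster + w_local
    return {
        name: [
            (w_global * g + w_cluster * c + w_local * l) / total
            for g, c, l in zip(global_m[name], cluster_m[name], local_m[name])
        ]
        for name in global_m
    }


# Toy local models for three clients (illustrative values only).
local_models: Dict[str, Params] = {
    "a": {"w": [1.0, 2.0]},
    "b": {"w": [3.0, 4.0]},
    "c": {"w": [5.0, 6.0]},
}

# Hand-written cluster assignment; the paper clusters on learned representations.
clusters: Dict[int, List[str]] = {0: ["a", "b"], 1: ["c"]}

global_model = average(list(local_models.values()))
cluster_models = {
    cid: average([local_models[u] for u in members])
    for cid, members in clusters.items()
}

personalized_a = personalize(global_model, cluster_models[0], local_models["a"])
print(personalized_a)
```

With these toy values, client `a` receives a model that sits between the global average and its own local parameters, pulled toward its cluster. In practice the weights would be tuned or learned rather than fixed by hand.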