Feature-Indexed Federated Recommendation with Residual-Quantized Codebooks

Federated recommendation provides a privacy-preserving solution for training recommender systems without centralizing user interactions. However, existing methods follow an ID-indexed communication paradigm that transmits whole item embeddings between clients and the server, which has three major limitations: 1) it consumes uncontrollable communication resources, 2) the uploaded item information cannot generalize to related non-interacted items, and 3) it is sensitive to noisy client feedback. Solving these problems requires fundamentally changing the existing ID-indexed communication paradigm. We therefore propose a feature-indexed communication paradigm that transmits feature code embeddings as codebooks rather than raw item embeddings. Building on this paradigm, we present RQFedRec, which assigns each item a list of discrete code IDs via Residual Quantization (RQ)-Kmeans. Each client generates and trains code embeddings as codebooks based on the discrete code IDs provided by the server, and the server collects and aggregates these codebooks rather than item embeddings. This design makes communication controllable, since the codebooks cover all items, and it lets updates propagate across related items that share the same code ID. In addition, because each code embedding represents many items, it is more robust to a single noisy item. To jointly capture semantic and collaborative information, RQFedRec further adopts a collaborative-semantic dual-channel aggregation with a curriculum strategy that emphasizes semantic codes early in training and gradually increases the contribution of collaborative codes. Extensive experiments on real-world datasets demonstrate that RQFedRec consistently outperforms state-of-the-art federated recommendation baselines while significantly reducing communication overhead.
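To make the code-assignment step concrete, the following is a minimal sketch of residual quantization with K-means, not the authors' implementation. It assumes each item comes with a dense feature vector; the function names (`kmeans`, `rq_kmeans`), the number of levels, the codebook size, and the random demo data are all hypothetical choices for illustration. Each level clusters the residual left by the previous level, so every item ends up with one code ID per level.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's K-means. Returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every point to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    # Final assignment against the final centroids.
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return centroids, dists.argmin(axis=1)


def rq_kmeans(item_features, num_levels=3, codebook_size=8, seed=0):
    """Assign each item a list of code IDs, one per residual level.

    Returns:
      codebooks: array of shape (num_levels, codebook_size, dim)
      codes:     integer array of shape (num_items, num_levels)
    """
    residual = item_features.astype(np.float64).copy()
    codebooks, codes = [], []
    for level in range(num_levels):
        centroids, assign = kmeans(residual, codebook_size, seed=seed + level)
        codebooks.append(centroids)
        codes.append(assign)
        # The next level quantizes whatever this level failed to explain.
        residual = residual - centroids[assign]
    return np.stack(codebooks), np.stack(codes, axis=1)


# Hypothetical demo data: 200 items with 16-dimensional features.
rng = np.random.default_rng(0)
items = rng.normal(size=(200, 16))
codebooks, codes = rq_kmeans(items)

# An item is approximated by summing its selected code vector at each level.
recon = sum(codebooks[l][codes[:, l]] for l in range(codebooks.shape[0]))
first_level = codebooks[0][codes[:, 0]]
err_first = ((items - first_level) ** 2).sum()
err_full = ((items - recon) ** 2).sum()
```

In this sketch the codebooks hold `num_levels * codebook_size` vectors regardless of how many items exist, which illustrates why a codebook-sized payload is fixed in advance while an item-embedding payload grows with the catalog. Items that share a code ID at some level also share that code vector, so an update to it affects all of them.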
