Content-based Recommender Systems (CRSs) play a crucial role in shaping user experiences in e-commerce, online advertising, and personalized recommendations. However, because of the vast number of categorical features, the embedding tables used in CRS models pose a significant storage bottleneck for real-world deployment, especially on resource-constrained devices. Various embedding pruning methods have been proposed to address this problem, but most require expensive retraining for each target parameter budget. This computation cost is a major hurdle for applications with diverse storage requirements, such as federated learning and streaming settings. In this paper, we propose Shapley Value-guided Embedding Reduction (Shaver). Shaver views the problem from a cooperative game perspective and quantifies each embedding parameter's contribution with Shapley values to facilitate contribution-based parameter pruning. To address the inherently high computation cost of Shapley values, we propose an efficient and unbiased method to estimate the Shapley values of a CRS's embedding parameters. Moreover, in the pruning stage, we introduce a field-aware codebook to mitigate the information loss caused by the conventional practice of zeroing out pruned parameters. In extensive experiments on three real-world datasets, Shaver demonstrates competitive performance with lightweight recommendation models across various parameter budgets. The source code is available at https://anonymous.4open.science/r/shaver-E808
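To make the idea of contribution-based pruning concrete, the sketch below estimates Shapley values with generic permutation sampling and then reuses a single ranking to produce masks for several parameter budgets. It is an illustration under stated assumptions, not the estimator or the codebook proposed in the paper: the function names `estimate_shapley` and `prune_to_budget`, the toy additive value function, and the weights are all hypothetical, and in practice the value function would be derived from the recommender's performance with a given subset of embedding parameters kept.

```python
import random
from typing import Callable, FrozenSet, List


def estimate_shapley(
    n_players: int,
    value_fn: Callable[[FrozenSet[int]], float],
    n_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Permutation-sampling (Monte Carlo) estimate of Shapley values.

    Each sampled ordering adds players one at a time and credits every
    player with its marginal gain; averaging over orderings is unbiased.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_players
    players = list(range(n_players))
    for _ in range(n_permutations):
        rng.shuffle(players)
        coalition = set()
        prev = value_fn(frozenset(coalition))
        for p in players:
            coalition.add(p)
            cur = value_fn(frozenset(coalition))
            totals[p] += cur - prev  # marginal contribution of p
            prev = cur
    return [t / n_permutations for t in totals]


def prune_to_budget(scores: List[float], budget: int) -> List[int]:
    """Return a 0/1 keep-mask retaining the `budget` highest-scoring entries."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:budget])
    return [1 if i in keep else 0 for i in range(len(scores))]


# Hypothetical toy game: each "embedding parameter" adds a fixed utility,
# so its true Shapley value equals its weight.
weights = [0.5, 0.1, 0.3, 0.05]


def toy_value(coalition: FrozenSet[int]) -> float:
    return sum(weights[i] for i in coalition)


shapley = estimate_shapley(len(weights), toy_value)

# One set of scores serves every budget; no per-budget retraining is needed.
masks = {budget: prune_to_budget(shapley, budget) for budget in (1, 2, 3)}
```

Because the scores are computed once and only the cut-off changes, any number of budgets can be served from the same ranking, which is the property that motivates contribution-based pruning when storage requirements vary across devices.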