Learning Compact Compositional Embeddings via Regularized Pruning for Recommendation

Latent factor models are the dominant backbones of contemporary recommender systems (RSs) given their performance advantages, where each entity (commonly a user or an item) is represented by a unique vector embedding of fixed dimensionality (e.g., 128). Because e-commerce sites host a large number of users and items, the embedding table is arguably the least memory-efficient component of RSs. For a lightweight recommender that aims to scale efficiently with the growing number of users/items or to remain applicable in resource-constrained settings, existing solutions either reduce the number of embeddings needed via hashing, or sparsify the full embedding table to switch off selected embedding dimensions. However, as hash collisions arise or embeddings become overly sparse, especially when adapting to a tighter memory budget, those lightweight recommenders inevitably have to compromise their accuracy. To this end, we propose a novel compact embedding framework for RSs, namely Compositional Embedding with Regularized Pruning (CERP). Specifically, CERP represents each entity by combining a pair of embeddings from two independent, substantially smaller meta-embedding tables, which are then jointly pruned via a learnable element-wise threshold. In addition, we design a regularized pruning mechanism in CERP, such that the two sparsified meta-embedding tables are encouraged to encode mutually complementary information. Since CERP is compatible with arbitrary latent factor models, we pair it with two popular recommendation models for extensive experiments, where results on two real-world datasets under different memory budgets demonstrate its superiority over state-of-the-art baselines. The codebase of CERP is available at https://github.com/xurong-liang/CERP.
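
To make the idea of composing an entity embedding from two pruned meta-embedding tables concrete, the following is a minimal NumPy sketch, not the released CERP implementation. Several details are assumptions made for illustration only and are not stated in the abstract: the quotient/remainder indexing into the two tables, the element-wise sum as the combination operator, the sigmoid-bounded soft-thresholding rule, and all names (`soft_threshold`, `compose_embedding`, `table_p`, `table_q`, `raw_thr_p`, `raw_thr_q`). The regularizer that encourages the two tables to be complementary is omitted.

```python
import numpy as np


def soft_threshold(table, raw_threshold):
    """Shrink every entry toward zero by an element-wise threshold.

    Assumption: the threshold is sigmoid(raw_threshold), so it stays in (0, 1).
    Entries whose magnitude falls below the threshold become exactly zero.
    """
    threshold = 1.0 / (1.0 + np.exp(-raw_threshold))
    return np.sign(table) * np.maximum(np.abs(table) - threshold, 0.0)


def compose_embedding(entity_id, table_p, table_q, raw_thr_p, raw_thr_q):
    """Build one entity embedding from two pruned meta-embedding tables.

    Assumption: the row in table_p is the quotient and the row in table_q is
    the remainder of entity_id divided by the number of rows in table_q, and
    the two rows are combined by summation.
    """
    bucket = table_q.shape[0]
    row_p, row_q = entity_id // bucket, entity_id % bucket
    pruned_p = soft_threshold(table_p, raw_thr_p)
    pruned_q = soft_threshold(table_q, raw_thr_q)
    return pruned_p[row_p] + pruned_q[row_q]


# Hypothetical sizes: 10,000 entities with 128-dimensional embeddings.
rng = np.random.default_rng(0)
num_entities, dim = 10_000, 128
bucket = int(np.ceil(np.sqrt(num_entities)))      # rows in table_q
num_rows_p = int(np.ceil(num_entities / bucket))  # rows in table_p

table_p = rng.normal(scale=0.1, size=(num_rows_p, dim))
table_q = rng.normal(scale=0.1, size=(bucket, dim))

# In training these would be learnable parameters, one per table entry;
# here they are fixed so that sigmoid(-3) is roughly 0.047.
raw_thr_p = np.full_like(table_p, -3.0)
raw_thr_q = np.full_like(table_q, -3.0)

embedding = compose_embedding(1234, table_p, table_q, raw_thr_p, raw_thr_q)
total_rows = table_p.shape[0] + table_q.shape[0]  # 200 rows instead of 10,000
sparsity_p = float(np.mean(soft_threshold(table_p, raw_thr_p) == 0.0))
```

Under these assumptions, every entity maps to a distinct pair of rows, so no two entities share the same composed embedding by construction, while the stored parameters shrink from one row per entity to roughly two times the square root of the entity count, before any pruning is applied.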