Learning Compact Compositional Embeddings via Regularized Pruning for Recommendation

Latent factor models are the dominant backbones of contemporary recommender systems (RSs) owing to their performance advantages. They require a unique vector embedding of fixed dimensionality (e.g., 128) to represent each entity (commonly a user or item). Due to the large number of users and items on e-commerce sites, the embedding table is arguably the least memory-efficient component of RSs. To build lightweight recommenders that scale efficiently with the growing number of users/items or remain applicable in resource-constrained settings, existing solutions either reduce the number of embeddings needed via hashing, or sparsify the full embedding table to switch off selected embedding dimensions. However, as hash collisions arise or embeddings become overly sparse, especially when adapting to a tighter memory budget, those lightweight recommenders inevitably compromise their accuracy. To this end, we propose a novel compact embedding framework for RSs, namely Compositional Embedding with Regularized Pruning (CERP). Specifically, CERP represents each entity by combining a pair of embeddings from two independent, substantially smaller meta-embedding tables, which are then jointly pruned via a learnable element-wise threshold. In addition, we design a regularized pruning mechanism in CERP, such that the two sparsified meta-embedding tables are encouraged to encode mutually complementary information. Because CERP is agnostic to the underlying latent factor model, we pair it with two popular recommendation models for extensive experiments, where results on two real-world datasets under different memory budgets demonstrate its superiority over state-of-the-art baselines. The codebase of CERP is available at https://github.com/xurong-liang/CERP.
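
The composition and pruning steps described above can be illustrated with a minimal forward-pass sketch. Several details are assumptions made only for illustration and are not stated in the abstract: the two meta-embedding indices are taken as the quotient and remainder of the entity ID with a bucket size of ceil(sqrt(N)), the pair is combined by element-wise sum, and pruning is a soft threshold with fixed values standing in for the learnable ones. All names (`soft_threshold`, `compose`, `bucket`, and so on) are hypothetical, and the regularizer is omitted because its form is not given here.

```python
import math
import random


def make_table(rows, dim, rng):
    """Create a small meta-embedding table with random weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows)]


def soft_threshold(table, thresholds):
    """Element-wise pruning: shrink each weight toward zero by its own
    threshold and zero it out if its magnitude falls below the threshold."""
    return [
        [math.copysign(max(abs(w) - t, 0.0), w) for w, t in zip(row, t_row)]
        for row, t_row in zip(table, thresholds)
    ]


def compose(entity_id, table_q, table_r, bucket):
    """Build an entity embedding from one row of each meta-table.
    Assumption: indices come from divmod, combination is a sum."""
    q, r = divmod(entity_id, bucket)
    return [a + b for a, b in zip(table_q[q], table_r[r])]


num_entities = 10_000
dim = 8
bucket = math.isqrt(num_entities - 1) + 1  # ceil(sqrt(N))

rng = random.Random(0)
table_q = make_table(bucket, dim, rng)
table_r = make_table(bucket, dim, rng)

# Fixed thresholds stand in for the learnable element-wise thresholds.
thr_q = [[0.5] * dim for _ in range(bucket)]
thr_r = [[0.5] * dim for _ in range(bucket)]

sparse_q = soft_threshold(table_q, thr_q)
sparse_r = soft_threshold(table_r, thr_r)

embedding = compose(4242, sparse_q, sparse_r, bucket)

# Parameter counts before pruning: full table vs. two meta-tables.
full_params = num_entities * dim
meta_params = 2 * bucket * dim
zeroed = sum(w == 0.0 for row in sparse_q + sparse_r for w in row)
print(full_params, meta_params, zeroed)
```

Under these assumptions every entity ID maps to a distinct (quotient, remainder) pair, so no two entities share both meta-embeddings even though each table holds only about sqrt(N) rows. Pruning then removes further entries from the two small tables rather than from a full N-row table.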
