A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems

Since the creation of the Web, recommender systems (RSs) have been an indispensable mechanism for information filtering. State-of-the-art RSs primarily depend on categorical features, which are encoded as embedding vectors, resulting in excessively large embedding tables. To prevent over-parameterized embedding tables from harming scalability, both academia and industry have devoted increasing effort to compressing RS embeddings. However, despite the proliferation of lightweight embedding-based RSs (LERSs), their evaluation protocols vary widely, which makes it difficult to relate reported LERS performance to real-world usability. Moreover, although all LERSs share the goal of lightweight embeddings, each is typically evaluated on only one of the two main recommendation tasks: collaborative filtering and content-based recommendation. This lack of discussion on cross-task transferability hinders the development of unified, more scalable solutions. Motivated by these issues, this study investigates the performance, efficiency, and cross-task transferability of various LERSs via a thorough benchmarking process. Additionally, we propose an efficient embedding compression method based on magnitude pruning, an easy-to-deploy yet highly competitive baseline that outperforms various complex LERSs. Our study reveals the distinct performance of LERSs across the two tasks, shedding light on their effectiveness and generalizability. To support edge-based recommendation, we test all LERSs on a Raspberry Pi 4, which exposes their efficiency bottlenecks. Finally, we conclude with critical summaries of LERS performance, model selection suggestions, and underexplored challenges around LERSs for future research. To encourage future research, we publish the source code and artifacts at \href{https://github.com/chenxing1999/recsys-benchmark}{this link}.
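To make the magnitude-pruning idea concrete, the following is a minimal sketch rather than the implementation used in this study. It assumes global, unstructured pruning: entries of the embedding table with the smallest absolute values are set to zero until a target sparsity is reached. The function name `magnitude_prune`, the `sparsity` parameter, and the toy table are hypothetical and chosen only for illustration; plain Python lists are used to keep the example dependency-free.

```python
from typing import List


def magnitude_prune(table: List[List[float]], sparsity: float) -> List[List[float]]:
    """Return a copy of `table` with the smallest-magnitude entries zeroed.

    Assumption: global unstructured pruning, i.e. a single threshold is
    computed over the whole table rather than per row or per feature field.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")

    # Sort all absolute values to find the pruning threshold.
    magnitudes = sorted(abs(v) for row in table for v in row)
    n_prune = int(len(magnitudes) * sparsity)
    if n_prune == 0:
        return [row[:] for row in table]

    threshold = magnitudes[n_prune - 1]
    # Entries tied with the threshold are also zeroed, so the achieved
    # sparsity can slightly exceed the target when values repeat.
    return [[0.0 if abs(v) <= threshold else v for v in row] for row in table]


def count_nonzero(table: List[List[float]]) -> int:
    """Number of parameters that would still need to be stored."""
    return sum(1 for row in table for v in row if v != 0.0)


# Toy 2 x 3 embedding table (two IDs, embedding dimension 3).
toy_table = [[0.9, -0.05, 0.3], [0.01, -0.7, 0.2]]
pruned = magnitude_prune(toy_table, sparsity=0.5)
print(pruned)                 # [[0.9, 0.0, 0.3], [0.0, -0.7, 0.0]]
print(count_nonzero(pruned))  # 3
```

In practice the pruned table would be stored in a sparse format so that the zeroed entries actually save memory; that storage step is omitted here.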
