A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems

Since the creation of the Web, recommender systems (RSs) have been an indispensable mechanism for information filtering. State-of-the-art RSs primarily depend on categorical features, which are encoded as embedding vectors, resulting in excessively large embedding tables. To prevent over-parameterized embedding tables from harming scalability, both academia and industry have made increasing efforts to compress RS embeddings. However, despite the proliferation of lightweight embedding-based RSs (LERSs), evaluation protocols vary widely, which makes it difficult to relate LERS performance to real-world usability. Moreover, although all LERSs share the goal of lightweight embeddings, each is typically evaluated on only one of the two main recommendation tasks: collaborative filtering or content-based recommendation. This lack of discussion of cross-task transferability hinders the development of unified, more scalable solutions. Motivated by these issues, this study investigates the performance, efficiency, and cross-task transferability of various LERSs via a thorough benchmarking process. Additionally, we propose an efficient embedding compression method based on magnitude pruning, an easy-to-deploy yet highly competitive baseline that outperforms various complex LERSs. Our study reveals the distinct performance of LERSs across the two tasks, shedding light on their effectiveness and generalizability. To support edge-based recommendation, we also test all LERSs on a Raspberry Pi 4, which exposes the efficiency bottleneck. Finally, we conclude the paper with critical summaries of LERS performance, suggestions for model selection, and underexplored challenges around LERSs for future research. To encourage future research, we publish source code and artifacts at \href{https://github.com/chenxing1999/recsys-benchmark}{this link}.
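
For readers unfamiliar with magnitude pruning, the following is a minimal sketch of the general idea applied to an embedding table: entries with the smallest absolute values are set to zero so that the table can be stored sparsely. It is an illustration under stated assumptions, not the procedure used in this study. The function name `magnitude_prune`, the `sparsity` parameter, and the choice of a single global threshold over the whole table are all hypothetical; the abstract does not specify how or when pruning is applied relative to training.

```python
from typing import List


def magnitude_prune(table: List[List[float]], sparsity: float) -> List[List[float]]:
    """Return a copy of `table` with its smallest-magnitude entries zeroed.

    `sparsity` is the target fraction of entries to remove, in [0, 1).
    Assumption: one global threshold is used for the whole table.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")

    # Sort all magnitudes to find the cut-off value.
    magnitudes = sorted(abs(w) for row in table for w in row)
    n_prune = int(len(magnitudes) * sparsity)
    if n_prune == 0:
        return [row[:] for row in table]

    # Largest magnitude that is still pruned. Entries tied with this value
    # are pruned too, so the achieved sparsity can slightly exceed the target.
    threshold = magnitudes[n_prune - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in table]


# Toy embedding table: 2 rows (e.g. 2 items), 3 dimensions each.
table = [[0.9, -0.05, 0.3], [0.01, -0.7, 0.2]]
pruned = magnitude_prune(table, sparsity=0.5)
kept = sum(1 for row in pruned for w in row if w != 0.0)
print(pruned, kept)
```

In this toy case, half of the six entries are zeroed and the three largest-magnitude weights survive. Any memory saving in practice depends on storing the pruned table in a sparse format, which this sketch does not cover.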