Collaborative Filtering (CF) methods dominate real-world recommender systems given their ability to learn high-quality, sparse ID-embedding tables that effectively capture user preferences. These tables scale linearly with the number of users and items, and are trained to ensure high similarity between embeddings of interacted user-item pairs, while maintaining low similarity for non-interacted pairs. Despite their high performance, encouraging dispersion for non-interacted pairs necessitates expensive regularization (e.g., negative sampling), hurting runtime and scalability. Existing research tends to address these challenges by simplifying the learning process, either by reducing model complexity or sampling data, trading performance for runtime. In this work, we move beyond model-level modifications and study the properties of the embedding tables under different learning strategies. Through theoretical analysis, we find that the singular values of the embedding tables are intrinsically linked to different CF loss functions. These findings are empirically validated on real-world datasets, demonstrating the practical benefits of higher stable rank, a continuous version of matrix rank which encodes the distribution of singular values. Based on these insights, we propose an efficient warm-start strategy that regularizes the stable rank of the user and item embeddings. We show that stable rank regularization during early training phases can promote higher-quality embeddings, resulting in training speed improvements of up to 66%. Additionally, stable rank regularization can act as a proxy for negative sampling, allowing for performance gains of up to 21% over loss functions with small negative sampling ratios. Overall, our analysis unifies current CF methods under a new perspective, their optimization of stable rank, motivating a flexible regularization method.
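For concreteness, the stable rank referred to above is the squared Frobenius norm of a matrix divided by its squared spectral norm, i.e. the sum of its squared singular values over the largest squared singular value. A minimal NumPy sketch (the function name is our own, not from the paper):

```python
import numpy as np

def stable_rank(M: np.ndarray) -> float:
    """Stable rank of M: ||M||_F^2 / ||M||_2^2.

    Equivalently, the sum of squared singular values divided by the
    largest squared singular value. Always lies in [1, rank(M)].
    """
    s = np.linalg.svd(M, compute_uv=False)  # singular values, descending
    return float(np.sum(s ** 2) / (s[0] ** 2))

# The identity matrix has all singular values equal, so its stable
# rank matches its exact rank:
print(stable_rank(np.eye(4)))  # -> 4.0

# A rank-1 matrix has a single nonzero singular value:
u = np.ones((4, 1))
print(stable_rank(u @ u.T))  # -> 1.0
```

Unlike exact rank, this quantity varies smoothly with the singular values, which is what makes it usable as a differentiable regularization target for the embedding tables.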