Scalability plays a crucial role in productionizing modern recommender systems. Even lightweight architectures may suffer from high computational overhead due to intermediate calculations, limiting their practicality in real-world applications. Specifically, applying full Cross-Entropy (CE) loss often yields state-of-the-art performance in terms of recommendation quality, yet it incurs excessive GPU memory utilization when dealing with large item catalogs. This paper introduces a novel Scalable Cross-Entropy (SCE) loss function for the sequential learning setup. It approximates the CE loss for datasets with large catalogs, improving both time efficiency and memory usage without compromising recommendation quality. Unlike traditional negative sampling methods, our approach utilizes a selective, GPU-efficient computation strategy that focuses on the most informative elements of the catalog, particularly those most likely to be false positives. This is achieved by approximating the softmax distribution over a subset of the model outputs through maximum inner product search. Experimental results on multiple datasets demonstrate that SCE reduces peak memory usage by a factor of up to 100 compared to the alternatives while retaining or even exceeding their metric values. The proposed approach also opens new perspectives for large-scale developments in other domains, such as large language models.