MALLOC: Benchmarking the Memory-aware Long Sequence Compression for Large Sequential Recommendation

Scaling laws, which indicate that model performance improves as data and model capacity grow, have fueled a trend toward larger recommendation models in both industry and academia. However, large-scale recommenders also incur significantly higher computational costs, particularly given the long-sequence dependencies inherent in modeling user intent. Current approaches often pre-store the intermediate states of each user's past behavior, thereby avoiding the quadratic re-computation cost on subsequent requests. Despite their effectiveness, these methods typically treat memory merely as a medium for acceleration, without adequately considering the space overhead it introduces. This poses a critical challenge in real-world recommendation systems with billions of users, each of whom may generate thousands of interactions and thus require massive memory for state storage. Several memory management strategies have been studied for compression in LLMs, but most have not been evaluated on recommendation tasks. To bridge this gap, we introduce MALLOC, a comprehensive benchmark for memory-aware long sequence compression. MALLOC provides a thorough investigation and systematic classification of memory management techniques applicable to large sequential recommendation. These techniques are integrated into state-of-the-art recommenders, yielding a reproducible and accessible evaluation platform. Through extensive experiments covering accuracy, efficiency, and complexity, we demonstrate the holistic reliability of MALLOC in advancing large-scale recommendation. Code is available at https://anonymous.4open.science/r/MALLOC.
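
To make the space overhead concrete, the back-of-envelope sketch below estimates how much memory per-user state storage could require, and how much a naive truncation to the most recent interactions would save. Every number here (layer count, hidden size, precision, user count, window size) is an assumed value chosen for illustration, not a figure from MALLOC, and the function names are hypothetical. The stored state is assumed to be a Transformer-style key/value cache, and recent-window truncation is only a simple stand-in for the compression strategies the benchmark covers.

```python
def state_bytes_per_user(seq_len, num_layers, hidden_dim, bytes_per_value=2):
    """Approximate bytes needed to cache one user's intermediate states.

    Assumes a Transformer-style key/value cache: every layer stores one key
    vector and one value vector (hence the factor of 2) of size hidden_dim
    for each past interaction. bytes_per_value=2 corresponds to fp16.
    """
    return seq_len * num_layers * 2 * hidden_dim * bytes_per_value


def total_state_bytes(num_users, seq_len, num_layers, hidden_dim, bytes_per_value=2):
    """Total cache size when every user has a history of length seq_len."""
    per_user = state_bytes_per_user(seq_len, num_layers, hidden_dim, bytes_per_value)
    return num_users * per_user


# Illustrative (assumed) deployment: 1 billion users, 1,000 interactions each,
# an 8-layer model with hidden size 256, states stored in fp16.
NUM_USERS = 1_000_000_000
SEQ_LEN = 1_000
NUM_LAYERS = 8
HIDDEN_DIM = 256

full_per_user = state_bytes_per_user(SEQ_LEN, NUM_LAYERS, HIDDEN_DIM)
full_total = total_state_bytes(NUM_USERS, SEQ_LEN, NUM_LAYERS, HIDDEN_DIM)

# Naive compression: keep only the 128 most recent interactions per user.
WINDOW = 128
window_per_user = state_bytes_per_user(WINDOW, NUM_LAYERS, HIDDEN_DIM)
compression_ratio = full_per_user / window_per_user

print(f"full state per user:     {full_per_user / 1e6:.3f} MB")
print(f"full state, all users:   {full_total / 1e15:.3f} PB")
print(f"windowed state per user: {window_per_user / 1e6:.3f} MB")
print(f"compression ratio:       {compression_ratio:.4f}x")
```

Under these assumptions the uncompressed cache is about 8 MB per user and roughly 8 PB in total, which is why space cost cannot be treated as a free by-product of acceleration. The per-user cost grows linearly with sequence length, so any strategy that shortens or shrinks the stored sequence reduces memory proportionally; whether accuracy survives that reduction is the question a benchmark of this kind is meant to answer.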
