Beyond Dense Connectivity: Explicit Sparsity for Scalable Recommendation

Recent progress in scaling large models has motivated recommender systems to increase model depth and capacity to better leverage massive behavioral data. However, recommendation inputs are high-dimensional and extremely sparse, and simply scaling dense backbones (e.g., deep MLPs) often yields diminishing returns or even performance degradation. Our analysis of industrial CTR models reveals a phenomenon of implicit connection sparsity: most learned connection weights tend toward zero, while only a small fraction remain prominent. This indicates a structural mismatch between dense connectivity and sparse recommendation data. By compelling the model to process a vast number of low-utility connections instead of valid signals, the dense architecture itself becomes the primary bottleneck to effective pattern modeling. We propose SSR (Explicit Sparsity for Scalable Recommendation), a framework that incorporates sparsity explicitly into the architecture. SSR employs a multi-view "filter-then-fuse" mechanism, decomposing inputs into parallel views for dimension-level sparse filtering followed by dense fusion. Specifically, we realize the sparsity via two strategies: a Static Random Filter, which achieves efficient structural sparsity via fixed dimension subsets, and Iterative Competitive Sparse (ICS), a differentiable dynamic mechanism that employs bio-inspired competition to adaptively retain high-response dimensions. Experiments on three public datasets and a billion-scale industrial dataset from AliExpress (a global e-commerce platform) show that SSR outperforms state-of-the-art baselines under similar budgets. Crucially, SSR exhibits superior scalability, delivering continuous performance gains where dense models saturate.
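
To make the "filter-then-fuse" idea concrete, the following is a minimal NumPy sketch written under stated assumptions; it is not the SSR implementation. The function names (`make_static_index_sets`, `static_random_filter`, `competitive_filter`, `fuse`), the shapes, and all hyperparameters are hypothetical. In particular, `competitive_filter` is only an illustrative stand-in for a competition-style dynamic filter (repeated mean-based inhibition followed by ReLU); the abstract does not specify how ICS is computed, so this should not be read as the ICS algorithm.

```python
import numpy as np

def make_static_index_sets(dim, num_views, keep, rng):
    """Draw one fixed random subset of `keep` input dimensions per view (assumed design)."""
    return np.stack([rng.choice(dim, size=keep, replace=False) for _ in range(num_views)])

def static_random_filter(x, index_sets):
    """Gather fixed dimension subsets: (batch, dim) -> (batch, num_views, keep)."""
    return x[:, index_sets]

def competitive_filter(x, num_iters=3, strength=0.5):
    """Hypothetical stand-in for a dynamic competitive filter (NOT the paper's ICS).

    Each iteration subtracts a fraction of the mean response from every
    dimension and clips at zero, so weak dimensions are suppressed while
    the strongest ones survive.
    """
    h = np.maximum(x, 0.0)
    for _ in range(num_iters):
        inhibition = strength * h.mean(axis=-1, keepdims=True)
        h = np.maximum(h - inhibition, 0.0)
    return h

def fuse(views, w):
    """Dense fusion: flatten all views and apply one dense layer with ReLU."""
    flat = views.reshape(views.shape[0], -1)
    return np.maximum(flat @ w, 0.0)

rng = np.random.default_rng(0)
batch, dim, num_views, keep, out_dim = 4, 32, 3, 8, 16

x = rng.normal(size=(batch, dim))                  # toy dense input features
index_sets = make_static_index_sets(dim, num_views, keep, rng)
views = static_random_filter(x, index_sets)        # shape (4, 3, 8)

w = rng.normal(size=(num_views * keep, out_dim)) * 0.1
fused = fuse(views, w)                             # shape (4, 16)

sparse = competitive_filter(x)                     # same shape as x, with more zeros
```

In this sketch the static filter costs nothing at training time beyond a gather, since the index sets are sampled once and then frozen, whereas the competitive stand-in decides per example which dimensions to keep. How the actual framework combines or parameterizes these filters is not stated in the abstract.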
