Beyond Dense Connectivity: Explicit Sparsity for Scalable Recommendation

Recent progress in scaling large models has motivated efforts to increase the depth and capacity of recommender systems so that they can better exploit massive behavioral data. However, recommendation inputs are high-dimensional and extremely sparse, and simply scaling dense backbones (e.g., deep MLPs) often yields diminishing returns or even degrades performance. Our analysis of industrial CTR models reveals a phenomenon of implicit connection sparsity: most learned connection weights approach zero, and only a small fraction remain prominent. This points to a structural mismatch between dense connectivity and sparse recommendation data. Because a dense architecture forces the model to process a vast number of low-utility connections rather than informative signals, the architecture itself becomes the primary bottleneck to effective pattern modeling. We propose \textbf{SSR} (Explicit \textbf{S}parsity for \textbf{S}calable \textbf{R}ecommendation), a framework that builds sparsity explicitly into the architecture. SSR employs a multi-view "filter-then-fuse" mechanism: it decomposes the input into parallel views, applies dimension-level sparse filtering within each view, and then fuses the filtered views densely. We realize this sparsity through two strategies: a Static Random Filter, which achieves efficient structural sparsity by restricting each view to a fixed subset of dimensions, and Iterative Competitive Sparse (ICS), a differentiable dynamic mechanism that uses bio-inspired competition to adaptively retain high-response dimensions. Experiments on three public datasets and a billion-scale industrial dataset from AliExpress (a global e-commerce platform) show that SSR outperforms state-of-the-art baselines under comparable budgets. Crucially, SSR scales better: it keeps improving as it is scaled up, whereas dense models saturate.
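The abstract describes the filter-then-fuse pipeline only at a high level, so the following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. All names (`near_zero_fraction`, `static_random_masks`, `competitive_filter`, `filter_then_fuse`) are hypothetical. The threshold `eps` for "near zero", the uniform random sampling of each view's dimension subset, and the mean-based competition rule with `alpha` and `steps` are illustrative choices; in particular, `competitive_filter` is a stand-in for competition, not the paper's ICS. The abstract presents the two filters as separate strategies; for brevity this sketch optionally layers the competitive filter on top of each view's fixed subset.

```python
import random
from typing import List, Sequence

Vector = List[float]


def near_zero_fraction(weights: Sequence[Sequence[float]], eps: float = 1e-3) -> float:
    """Fraction of connection weights with |w| < eps: a simple probe for implicit sparsity.
    The cutoff eps is an assumption; the abstract does not define "near zero"."""
    flat = [w for row in weights for w in row]
    return sum(abs(w) < eps for w in flat) / len(flat)


def static_random_masks(dim: int, num_views: int, keep: int, seed: int = 0) -> List[List[int]]:
    """Static-filter sketch: each view keeps a fixed random subset of `keep` input
    dimensions, sampled once (uniformly, without replacement) and never updated.
    Different views may overlap."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(dim), keep)) for _ in range(num_views)]


def competitive_filter(x: Sequence[float], steps: int = 2, alpha: float = 1.0) -> Vector:
    """Hypothetical competition rule (NOT the paper's ICS): at each step a dimension stays
    active only if its response is at least alpha times the mean response of the currently
    active dimensions. Losers are zeroed and winners pass through unchanged, so the output
    is x times a 0/1 mask and is piecewise differentiable, like a ReLU."""
    active = [v > 0.0 for v in x]  # only positive responses can compete
    for _ in range(steps):
        survivors = [v for v, a in zip(x, active) if a]
        if not survivors:
            break
        threshold = alpha * sum(survivors) / len(survivors)
        active = [a and v >= threshold for v, a in zip(x, active)]
    return [v if a else 0.0 for v, a in zip(x, active)]


def filter_then_fuse(x: Sequence[float], masks: List[List[int]],
                     fuse_weights: Sequence[Sequence[float]], dynamic: bool = False) -> Vector:
    """Multi-view filter-then-fuse: gather each view's fixed dimension subset, optionally
    apply the competitive filter inside the view, concatenate the views, then apply a
    single dense (fully connected) linear fusion layer."""
    views = []
    for mask in masks:
        view = [x[i] for i in mask]          # dimension-level sparse filtering
        if dynamic:
            view = competitive_filter(view)  # keep only high-response dimensions
        views.append(view)
    concat = [v for view in views for v in view]
    # Dense fusion: every output unit connects to every filtered dimension of every view.
    return [sum(w * c for w, c in zip(row, concat)) for row in fuse_weights]


if __name__ == "__main__":
    x = [0.1, 0.5, 2.0, 3.0, -1.0, 0.8]
    masks = static_random_masks(dim=len(x), num_views=2, keep=3, seed=42)
    fuse = [[0.1] * 6, [0.2] * 6]  # 2 outputs, 2 views * 3 kept dims = 6 inputs
    print(masks)
    print(filter_then_fuse(x, masks, fuse))
    print(filter_then_fuse(x, masks, fuse, dynamic=True))
```

In this sketch only the connections into the fusion layer are restricted; the fusion layer itself stays dense, which mirrors the abstract's ordering of sparse filtering followed by dense fusion.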