Beyond Dense Connectivity: Explicit Sparsity for Scalable Recommendation

Recent progress in scaling large models has motivated recommender systems to increase model depth and capacity in order to better exploit massive behavioral data. However, recommendation inputs are high-dimensional and extremely sparse, and simply scaling dense backbones (e.g., deep MLPs) often yields diminishing returns or even degrades performance. Our analysis of industrial CTR models reveals a phenomenon of implicit connection sparsity: most learned connection weights tend toward zero, while only a small fraction remain prominent. This indicates a structural mismatch between dense connectivity and sparse recommendation data: by forcing the model to process a vast number of low-utility connections rather than useful signals, the dense architecture itself becomes the primary bottleneck to effective pattern modeling. We propose SSR (Explicit Sparsity for Scalable Recommendation), a framework that builds sparsity explicitly into the architecture. SSR employs a multi-view "filter-then-fuse" mechanism that decomposes inputs into parallel views, applies dimension-level sparse filtering to each view, and then fuses the results densely. We realize the sparsity through two strategies: the Static Random Filter, which achieves efficient structural sparsity via fixed dimension subsets, and Iterative Competitive Sparse (ICS), a differentiable dynamic mechanism that uses bio-inspired competition to adaptively retain high-response dimensions. Experiments on three public datasets and a billion-scale industrial dataset from AliExpress (a global e-commerce platform) show that SSR outperforms state-of-the-art baselines under similar budgets. Crucially, SSR also scales better, delivering continued performance gains where dense models saturate.
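To make the "filter-then-fuse" idea concrete, the following is a minimal sketch of the static variant only, written in plain Python. It is not the authors' implementation: the class name `FilterThenFuse`, the helper names, the per-view linear map with ReLU, the Gaussian initialization, and all sizes are assumptions chosen for illustration. The abstract specifies only that each view filters a fixed subset of dimensions and that the views are then fused densely. ICS is not sketched here because the abstract does not describe its update rule.

```python
import random


def make_static_views(input_dim, num_views, keep_per_view, seed=0):
    """Sample one fixed index subset per view (hypothetical helper).

    The subsets are drawn once from a seeded generator and never change,
    which is what makes the filter "static".
    """
    rng = random.Random(seed)
    return [sorted(rng.sample(range(input_dim), keep_per_view))
            for _ in range(num_views)]


def init_matrix(rows, cols, rng):
    """Small random weight matrix of shape (rows, cols); assumed init scheme."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(x, weight):
    """Dense map y = W x, where weight has shape (out_dim, in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def relu(x):
    return [max(0.0, v) for v in x]


class FilterThenFuse:
    """Illustrative multi-view block: sparse filtering per view, dense fusion."""

    def __init__(self, input_dim, num_views, keep_per_view, view_dim,
                 out_dim, seed=0):
        rng = random.Random(seed)
        self.views = make_static_views(input_dim, num_views,
                                       keep_per_view, seed)
        # Each view only sees keep_per_view inputs, not all input_dim.
        self.view_weights = [init_matrix(view_dim, keep_per_view, rng)
                             for _ in range(num_views)]
        # The fusion layer is dense over the concatenated view outputs.
        self.fusion_weight = init_matrix(out_dim, num_views * view_dim, rng)

    def forward(self, x):
        fused_input = []
        for indices, weight in zip(self.views, self.view_weights):
            filtered = [x[i] for i in indices]        # filter: keep a subset
            fused_input.extend(relu(linear(filtered, weight)))
        return linear(fused_input, self.fusion_weight)  # fuse: dense mixing


# Usage with made-up sizes: 16 input dimensions, 4 views of 4 dimensions each.
model = FilterThenFuse(input_dim=16, num_views=4, keep_per_view=4,
                       view_dim=3, out_dim=2)
output = model.forward([float(i) for i in range(16)])
```

With these illustrative sizes, the four view layers hold 4 × 3 × 4 = 48 weights in total, whereas a single dense layer mapping all 16 inputs to the same 12 hidden units would hold 192. The sparsity is fixed in the structure rather than learned.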
