Recent progress in scaling large models has motivated recommender systems to increase model depth and capacity to better leverage massive behavioral data. However, recommendation inputs are high-dimensional and extremely sparse, and simply scaling dense backbones (e.g., deep MLPs) often yields diminishing returns or even performance degradation. Our analysis of industrial CTR models reveals a phenomenon we call implicit connection sparsity: most learned connection weights tend towards zero, while only a small fraction remain prominent. This indicates a structural mismatch between dense connectivity and sparse recommendation data: by forcing the model to process a vast number of low-utility connections instead of valid signals, the dense architecture itself becomes the primary bottleneck to effective pattern modeling. We propose \textbf{SSR} (Explicit \textbf{S}parsity for \textbf{S}calable \textbf{R}ecommendation), a framework that incorporates sparsity explicitly into the architecture. SSR employs a multi-view "filter-then-fuse" mechanism that decomposes inputs into parallel views, applies dimension-level sparse filtering to each view, and then fuses the filtered views densely. Specifically, we realize sparsity via two strategies: a Static Random Filter, which achieves efficient structural sparsity via fixed dimension subsets, and Iterative Competitive Sparse (ICS), a differentiable dynamic mechanism that employs bio-inspired competition to adaptively retain high-response dimensions. Experiments on three public datasets and a billion-scale industrial dataset from AliExpress (a global e-commerce platform) show that SSR outperforms state-of-the-art baselines under similar budgets. Crucially, SSR exhibits superior scalability, delivering continuous performance gains where dense models saturate.
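To make the multi-view "filter-then-fuse" idea concrete, the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The function names (`make_index_sets`, `static_random_filter`, `topk_filter`, `fuse`), the shapes, and the single ReLU fusion layer are all hypothetical choices made for illustration. The static filter follows the abstract's description of fixed dimension subsets. The abstract does not specify how ICS works, so `topk_filter` is a plain, non-differentiable top-k stand-in for "retain high-response dimensions" and should not be read as ICS itself.

```python
import numpy as np


def make_index_sets(input_dim, num_views, keep_dim, seed=0):
    """Draw one fixed random subset of input dimensions per view (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return np.stack([
        np.sort(rng.choice(input_dim, size=keep_dim, replace=False))
        for _ in range(num_views)
    ])  # shape: (num_views, keep_dim)


def static_random_filter(x, index_sets):
    """Filter step: each view keeps only its own fixed dimension subset."""
    # x: (batch, input_dim) -> (batch, num_views, keep_dim)
    return x[:, index_sets]


def topk_filter(x, k):
    """Stand-in dynamic filter (NOT ICS): zero all but the k largest-magnitude
    entries along the last axis."""
    idx = np.argpartition(-np.abs(x), k - 1, axis=-1)[..., :k]
    mask = np.zeros(x.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return np.where(mask, x, 0.0)


def fuse(views, weight, bias):
    """Fuse step: concatenate all views and apply one dense layer with ReLU."""
    flat = views.reshape(views.shape[0], -1)
    return np.maximum(flat @ weight + bias, 0.0)


# Toy usage: 4 examples, 16 input dimensions, 3 views of 5 dimensions each.
rng = np.random.default_rng(1)
x = rng.normal(size=(4, 16))
index_sets = make_index_sets(input_dim=16, num_views=3, keep_dim=5)
views = static_random_filter(x, index_sets)       # (4, 3, 5)
weight = 0.1 * rng.normal(size=(3 * 5, 8))        # assumed fusion width of 8
fused = fuse(views, weight, np.zeros(8))          # (4, 8)
sparse_x = topk_filter(x, k=4)                    # 4 nonzeros per row
```

Because the index sets are drawn once and then frozen, the static variant costs only a gather at inference time; a dynamic filter instead selects dimensions per input, which is where a differentiable mechanism such as ICS would replace the hard top-k used here.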