Item indexing maps a large corpus of items into compact discrete representations and is critical for both discriminative and generative recommender systems. Existing Vector Quantization (VQ)-based approaches, however, struggle with the highly skewed and non-stationary item distributions common in streaming industrial recommenders, which leads to poor assignment accuracy, imbalanced cluster occupancy, and insufficient cluster separation. To address these challenges, we propose MERGE, a next-generation item indexing paradigm that adaptively constructs clusters from scratch, dynamically monitors cluster occupancy, and forms hierarchical index structures via fine-to-coarse merging. Extensive experiments show that MERGE significantly improves assignment accuracy, cluster uniformity, and cluster separation over existing indexing methods, and online A/B tests show substantial gains in key business metrics, highlighting its potential as a foundational indexing approach for large-scale recommendation.
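To make the three steps named in the abstract more concrete (constructing clusters from scratch, tracking cluster occupancy, and merging fine clusters into coarse ones), the following is a minimal illustrative sketch. It is not the MERGE algorithm itself, whose details the abstract does not give. The cosine-similarity threshold, the running-mean centroid update, the greedy pairwise merge, and the names `build_fine_clusters`, `merge_to_coarse`, and `sim_threshold` are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_fine_clusters(items, sim_threshold=0.9):
    """Stream over item embeddings, starting with no clusters.

    Assumption: an item joins its most similar centroid when the similarity
    reaches sim_threshold; otherwise it opens a new cluster. `counts` is the
    per-cluster occupancy, updated as items arrive.
    """
    centroids, counts, assignment = [], [], []
    for vec in items:
        best, best_sim = -1, -2.0
        for cid, c in enumerate(centroids):
            s = cosine(vec, c)
            if s > best_sim:
                best, best_sim = cid, s
        if best >= 0 and best_sim >= sim_threshold:
            n = counts[best]
            # Running mean keeps the centroid at the average of its members.
            centroids[best] = [(c * n + x) / (n + 1) for c, x in zip(centroids[best], vec)]
            counts[best] = n + 1
        else:
            centroids.append(list(vec))
            counts.append(1)
            best = len(centroids) - 1
        assignment.append(best)
    return centroids, counts, assignment


def merge_to_coarse(centroids, counts, n_coarse):
    """Greedily merge the most similar pair of clusters until n_coarse remain.

    Returns a dict mapping each fine cluster id to its coarse cluster id.
    Assumption: greedy pairwise merging stands in for the paper's procedure.
    """
    groups = [
        {"members": [i], "centroid": list(c), "count": counts[i]}
        for i, c in enumerate(centroids)
    ]
    while len(groups) > max(n_coarse, 1):
        best_pair, best_sim = None, -2.0
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                s = cosine(groups[i]["centroid"], groups[j]["centroid"])
                if s > best_sim:
                    best_pair, best_sim = (i, j), s
        i, j = best_pair
        a, b = groups[i], groups[j]
        total = a["count"] + b["count"]
        merged = {
            "members": a["members"] + b["members"],
            # Occupancy-weighted mean of the two centroids.
            "centroid": [
                (x * a["count"] + y * b["count"]) / total
                for x, y in zip(a["centroid"], b["centroid"])
            ],
            "count": total,
        }
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return {fine: coarse for coarse, g in enumerate(groups) for fine in g["members"]}
```

Under these assumptions, an item's two-level index would be the pair (coarse id, fine id), obtained by looking up its fine assignment in the mapping returned by `merge_to_coarse`. The quadratic pair search is only suitable for toy inputs; a production system would need an approximate nearest-neighbor search instead.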