Efficient Retrieval Scaling with Hierarchical Indexing for Large Scale Recommendation

Dongqi Fu, Kaushik Rangadurai, Haiyu Lu, Yunchen Pu, Siyang Yuan, Minhui Huang, Yiqun Liu, Golnaz Ghasemiesfeh, Xingfeng He, Fangzhou Xu, Andrew Cui, Vidhoon Viswanathan, Lin Yang, Liang Wang, Jiyan Yang, Chonglin Sun

from arXiv, 11 pages, 5 figures

Growth in data volume, computational resources, and model parameters during training has produced numerous large-scale industrial retrieval models for recommendation tasks. However, deploying these large-scale foundational retrieval models effectively and efficiently remains a critical challenge that has not been fully addressed. Common quick-win solutions for deploying such massive models include relying on offline computation (such as cached user dictionaries) or distilling large models into smaller ones. Yet both approaches fall short of fully leveraging the representational and inference capabilities of foundational models. In this paper, we explore whether it is possible to learn a hierarchical organization over the memory of foundational retrieval models. Such a hierarchical structure would enable more efficient search by reducing retrieval costs while preserving exactness. To achieve this, we propose jointly learning a hierarchical index using cross-attention and residual quantization for large-scale retrieval models. We also present its real-world deployment at Meta, where it supports daily advertisement recommendations for billions of Facebook and Instagram users. Interestingly, we discovered that the intermediate nodes in the learned index correspond to a small set of high-quality data. Fine-tuning the model on this set further improves inference performance and concretizes the concept of "test-time training" within the recommendation system domain. We demonstrate these findings on both internal and public datasets with strong baseline comparisons, and we hope they contribute to the community's efforts in developing the next generation of foundational retrieval models.
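
To illustrate how residual quantization can yield a hierarchy, here is a minimal sketch of the generic technique only. It is not the authors' jointly learned index: the cross-attention component and all training are omitted, and the codebooks are fixed by hand rather than learned. The function names, codebooks, and item vectors are hypothetical and chosen purely for illustration. Each item is assigned one code per level, coarse to fine, and items that share a code prefix fall under the same intermediate node.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Index of the codeword closest to x in squared Euclidean distance."""
    def sq_dist(c: Vector) -> float:
        return sum((ci - xi) ** 2 for ci, xi in zip(c, x))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def residual_quantize(
    x: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], Vector]:
    """Greedy residual quantization: pick one codeword per level.

    Each level quantizes what the previous levels left unexplained.
    Returns the code path (one index per level) and the final residual.
    """
    residual = list(x)
    path: List[int] = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        path.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return path, residual


def build_index(
    items: Dict[str, Vector], codebooks: Sequence[Sequence[Vector]]
) -> Dict[Tuple[int, ...], List[str]]:
    """Group item ids by code path; every path prefix is a node in the tree."""
    index: Dict[Tuple[int, ...], List[str]] = {}
    for item_id, vec in items.items():
        path, _ = residual_quantize(vec, codebooks)
        index.setdefault(tuple(path), []).append(item_id)
    return index


# Hypothetical toy data: two levels with two hand-picked codewords each.
codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],  # level 1: coarse partition
    [[0.1, 0.0], [0.0, 0.1]],  # level 2: refines the level-1 residual
]
items = {
    "a": [1.1, 0.0],
    "b": [1.0, 0.1],
    "c": [0.1, 1.0],
}
index = build_index(items, codebooks)
print(index)
```

In this toy setup, items "a" and "b" share the level-1 code 0 and differ only at level 2, while "c" falls under level-1 code 1. A search could therefore score the level-1 nodes first and descend only into the promising branch, instead of scanning every item.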
