With the explosive growth of users and items, Recommender Systems (RS) face unprecedented challenges in both retrieval efficiency and storage cost. Learning to Hash (L2H) techniques have emerged as a promising solution to both problems; their core idea is to encode high-dimensional data into compact hash codes. As a result, L2H for RS (HashRec for short) has recently received widespread attention as a way to support large-scale recommendation. In this survey, we present a comprehensive review of current HashRec algorithms. Specifically, we first introduce the two-tower models commonly used in the recall stage and identify two search strategies frequently employed in L2H. We then organize prior work into a two-tier taxonomy based on (i) the type of loss function and (ii) the optimization strategy. We also introduce commonly used evaluation metrics for measuring the performance of HashRec algorithms. Finally, we discuss the limitations of current research and outline future research directions. A summary of the HashRec methods reviewed in this survey is available at \href{https://github.com/Luo-Fangyuan/HashRec}{https://github.com/Luo-Fangyuan/HashRec}.
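To make the core idea concrete, the following is a minimal sketch of how real-valued embeddings might be turned into compact hash codes and searched. It is not taken from any specific HashRec method: the sign-based binarization, the Hamming-distance ranking, the function names (`binarize`, `hamming_distance`, `hamming_rank`), and the toy embeddings are all assumptions chosen for illustration. The abstract does not name the two search strategies, so Hamming-distance ranking is used here only as one plausible example.

```python
from typing import List, Sequence


def binarize(embedding: Sequence[float]) -> int:
    """Pack sign(embedding) into an integer: bit k is 1 when embedding[k] >= 0."""
    code = 0
    for k, value in enumerate(embedding):
        if value >= 0:
            code |= 1 << k
    return code


def hamming_distance(code_a: int, code_b: int) -> int:
    """Count the bit positions where two packed codes differ."""
    return bin(code_a ^ code_b).count("1")


def hamming_rank(user_code: int, item_codes: Sequence[int], top_k: int) -> List[int]:
    """Return indices of the top_k items closest to the user in Hamming distance."""
    order = sorted(
        range(len(item_codes)),
        key=lambda i: hamming_distance(user_code, item_codes[i]),
    )
    return order[:top_k]


# Toy, made-up embeddings standing in for the outputs of a user tower
# and an item tower (hypothetical values, 4 dimensions for readability).
user_embedding = [0.7, -0.2, 0.1, -0.9]
item_embeddings = [
    [0.5, -0.1, 0.3, -0.4],   # same sign pattern as the user
    [-0.6, 0.8, 0.2, -0.1],   # differs from the user in two positions
    [-0.3, 0.4, -0.5, 0.9],   # opposite sign pattern
]

user_code = binarize(user_embedding)
item_codes = [binarize(e) for e in item_embeddings]
ranking = hamming_rank(user_code, item_codes, top_k=2)
print(ranking)
```

Each code here occupies a single bit per dimension, and comparing two codes needs only an XOR and a bit count, which is where the storage and retrieval savings described above would come from. How the codes are actually learned (the loss function and optimization strategy) is exactly what distinguishes the methods covered by the survey and is not modeled in this sketch.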