Candidate retrieval is the first stage in recommendation systems, where a lightweight system is used to retrieve potentially relevant items for a given user. These candidate items are then ranked and pruned in later stages using a more complex ranking model. Because candidate retrieval sits at the top of the recommendation funnel, it is important to retrieve a high-recall candidate set to feed into downstream ranking models. A common approach is to leverage approximate nearest neighbor (ANN) search from a single dense query embedding; however, this approach can yield a low-diversity result set with many near duplicates. As users often have multiple interests, candidate retrieval should ideally return a diverse set of candidates reflective of the user's multiple interests. To this end, we introduce kNN-Embed, a general approach to improving diversity in dense ANN-based retrieval. kNN-Embed represents each user as a smoothed mixture over learned item clusters that represent distinct "interests" of the user. By querying each of a user's mixture components in proportion to its mixture weight, we retrieve a high-diversity set of candidates reflecting elements from each of a user's interests. We experimentally compare kNN-Embed to standard ANN candidate retrieval, and show significant improvements in overall recall and improved diversity across three datasets. Accompanying this work, we open-source a large Twitter follow-graph dataset (https://huggingface.co/datasets/Twitter/TwitterFollowGraph) to spur further research in graph-mining and representation learning for recommender systems.
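The idea of querying each mixture component in proportion to its weight can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`allocate_budget`, `retrieve_diverse`), the largest-remainder rounding of the budget, and the exact inner-product search standing in for an ANN index are all choices made for this example. It also assumes that one query vector per cluster and the user's mixture weights are already available; how the mixture is learned and smoothed is not shown.

```python
import numpy as np


def allocate_budget(weights, k):
    """Split k retrieval slots across clusters in proportion to mixture weights."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    raw = weights * k
    counts = np.floor(raw).astype(int)
    # Hand leftover slots to the clusters with the largest fractional
    # remainders (an assumed rounding rule, chosen for simplicity).
    leftover = max(k - int(counts.sum()), 0)
    order = np.argsort(-(raw - counts), kind="stable")
    counts[order[:leftover]] += 1
    return counts


def retrieve_diverse(cluster_queries, weights, item_vecs, item_cluster, k):
    """Return up to k item ids, querying every cluster with its own vector.

    cluster_queries: (C, d) one query vector per cluster (assumed given)
    weights:         (C,)   the user's mixture weights over clusters
    item_vecs:       (N, d) item embeddings
    item_cluster:    (N,)   cluster id assigned to each item
    """
    cluster_queries = np.asarray(cluster_queries, dtype=float)
    item_vecs = np.asarray(item_vecs, dtype=float)
    item_cluster = np.asarray(item_cluster)
    counts = allocate_budget(weights, k)
    results = []
    for c, n in enumerate(counts):
        ids = np.flatnonzero(item_cluster == c)
        if n == 0 or ids.size == 0:
            continue
        # Exact inner-product scoring stands in for an ANN lookup here.
        scores = item_vecs[ids] @ cluster_queries[c]
        top = ids[np.argsort(-scores, kind="stable")[:n]]
        results.extend(top.tolist())
    return results
```

With equal weights over two clusters and a budget of four, for instance, the sketch takes the two best-scoring items from each cluster instead of four near neighbors of a single query vector, which is the source of the added diversity.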