Recommendation systems are a core feature of social media companies, where they are used to recommend both organic and promoted content. Many modern recommendation systems are split into multiple stages (candidate generation and heavy ranking) to balance computational cost against recommendation quality. In this paper we focus on the candidate generation phase of a large-scale ads recommendation problem and present a machine-learning-first, heterogeneous re-architecture of this stage, which we term TwERC. We show that a system that combines a real-time light ranker with sourcing strategies capable of capturing additional information provides validated gains. We present two such strategies: the first uses a notion of similarity in the interaction graph, while the second caches previous scores from the ranking stage. The graph-based strategy achieves a 4.08% revenue gain and the rankscore-based strategy achieves a 1.38% gain. The two strategies have biases that complement both the light ranker and one another. Finally, we describe a set of metrics that we believe are valuable for understanding the complex product trade-offs inherent in industrial candidate generation systems.
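As a rough illustration of what combining a light ranker with additional candidate sources could look like, the following Python sketch merges three sources under per-source quotas. It is not the TwERC implementation: the function name `blend_candidates`, the quota parameters, the order in which sources are consulted, and the de-duplication rule are all assumptions made for this example, and the abstract does not specify how the sources are actually blended.

```python
from typing import Dict, List


def blend_candidates(
    light_ranker_scores: Dict[str, float],
    graph_candidates: List[str],
    cached_rank_scores: Dict[str, float],
    k: int,
    graph_quota: int,
    cache_quota: int,
) -> List[str]:
    """Merge three hypothetical candidate sources into at most k unique ad ids.

    All parameters are illustrative assumptions:
      light_ranker_scores: ad id -> score from a real-time light ranker
      graph_candidates:    ad ids from an interaction-graph similarity source,
                           already ordered by similarity
      cached_rank_scores:  ad id -> score cached from an earlier ranking pass
    """
    selected: List[str] = []

    def take(ordered_ids: List[str], limit: int) -> None:
        # Append up to `limit` new ids, never exceeding k in total.
        taken = 0
        for ad_id in ordered_ids:
            if taken >= limit or len(selected) >= k:
                break
            if ad_id not in selected:
                selected.append(ad_id)
                taken += 1

    # Assumed order: graph source first, then cached scores, then the
    # light ranker fills whatever slots remain.
    take(graph_candidates, graph_quota)
    by_cached = sorted(cached_rank_scores, key=cached_rank_scores.get, reverse=True)
    take(by_cached, cache_quota)
    by_light = sorted(light_ranker_scores, key=light_ranker_scores.get, reverse=True)
    take(by_light, k)
    return selected
```

Reserving a quota for each source is one simple way to let sources with different biases contribute candidates that the light ranker alone might miss; a production system could just as well use learned or dynamic allocation.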