Retrieval selects a small number of relevant candidates from a large corpus and underpins both information retrieval and recommendation applications. A key component of retrieval is modeling (user, item) similarity, which is commonly represented as the dot product of two learned embeddings. This formulation permits efficient inference through what is commonly known as Maximum Inner Product Search (MIPS). Despite their popularity, dot products cannot capture complex user-item interactions, which are multifaceted and likely high-rank. We therefore examine non-dot-product retrieval settings on accelerators and propose \textit{mixture of logits} (MoL), which models (user, item) similarity as an adaptive composition of elementary similarity functions. This formulation is expressive, can model high-rank (user, item) interactions, and further generalizes to the long tail. Combined with a hierarchical retrieval strategy, \textit{h-indexer}, MoL scales to a corpus of 100M items on a single GPU with latency comparable to MIPS baselines. On public datasets, our approach improves hit rate (HR) by up to 77.3\%. Experiments on a large recommendation surface at Meta showed strong metric gains and reduced popularity bias, validating the proposed approach's performance and improved generalization.
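To make "an adaptive composition of elementary similarity functions" and "hierarchical retrieval" concrete, the following is a minimal sketch rather than the implementation described here. It assumes that the elementary similarities are K dot products between paired user and item embeddings, and that the adaptive weights come from a softmax over a linear map of those logits; the abstract specifies neither. The second function only illustrates a generic coarse-filter-then-rerank pattern; the internals of h-indexer are not given in this text. The names `mol_score`, `two_stage_retrieve`, `gate_w`, and the toy data are hypothetical.

```python
import math


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(zs):
    """Numerically stable softmax over a list of floats."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def mol_score(user_embs, item_embs, gate_w):
    """Gated sum of K elementary dot-product logits.

    user_embs, item_embs: K embeddings each, one pair per mixture component.
    gate_w: hypothetical K x K gating matrix. Using a softmax over a linear
            map of the logits as the gate is an assumption of this sketch.
    """
    logits = [dot(u, x) for u, x in zip(user_embs, item_embs)]
    weights = softmax([dot(row, logits) for row in gate_w])
    # The weights depend on the (user, item) pair, so the score is not a
    # single fixed dot product of two vectors.
    return sum(w * l for w, l in zip(weights, logits))


def two_stage_retrieve(user_embs, items, gate_w, n_candidates, top_k):
    """Stage 1: one cheap dot product keeps n_candidates items.
    Stage 2: the more expensive mixture score reranks the survivors."""
    coarse = sorted(
        items,
        key=lambda iid: dot(user_embs[0], items[iid][0]),
        reverse=True,
    )[:n_candidates]
    reranked = sorted(
        coarse,
        key=lambda iid: mol_score(user_embs, items[iid], gate_w),
        reverse=True,
    )
    return reranked[:top_k]


# Toy data (hypothetical): K = 2 components, 2-dimensional embeddings.
user_embs = [[1.0, 0.0], [0.0, 1.0]]
items = {
    "a": [[1.0, 0.0], [0.0, 0.0]],
    "b": [[0.5, 0.0], [0.0, 2.0]],
    "c": [[-1.0, 0.0], [0.0, 3.0]],
}
gate_w = [[1.0, 0.0], [0.0, 1.0]]

narrow = two_stage_retrieve(user_embs, items, gate_w, n_candidates=2, top_k=1)
wide = two_stage_retrieve(user_embs, items, gate_w, n_candidates=3, top_k=1)
print(narrow, wide)
```

In this toy setup the two calls return different top items, because the first stage with `n_candidates=2` filters out an item that the mixture score would have ranked first. That is the general trade-off of any coarse-then-rerank scheme: the first-stage budget bounds the recall of the second stage.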