Generative Retrieval introduces a new approach to Information Retrieval by reframing it as a constrained generation task, leveraging recent advances in Autoregressive (AR) language models. However, AR-based Generative Retrieval methods suffer from high inference latency and cost compared to traditional dense retrieval techniques, which limits their practical applicability. This paper investigates fully Non-autoregressive (NAR) language models as a more efficient alternative for generative retrieval. While standard NAR models alleviate latency and cost concerns, their retrieval performance drops significantly compared to AR models because they cannot capture dependencies between target tokens. To address this, we question the conventional choice of limiting the target token space solely to words or sub-words. We propose PIXAR, a novel approach that expands the target vocabulary of NAR models to include multi-word entities and common phrases (up to 5 million tokens), thereby reducing token dependencies. PIXAR employs inference optimization strategies to maintain low inference latency despite the significantly larger vocabulary. Our results show that PIXAR achieves a relative improvement of 31.0% in MRR@10 on MS MARCO and 23.2% in Hits@5 on Natural Questions over standard NAR models with similar latency and cost. Furthermore, online A/B experiments on a large commercial search engine show that PIXAR increases ad clicks by 5.08% and revenue by 4.02%.
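The toy sketch below makes the dependency problem concrete. It is an illustration under stated assumptions, not PIXAR's implementation. A NAR decoder predicts every output position independently and in parallel, so a candidate's score is the product of its per-position probabilities. The vocabularies, the probabilities, and the helpers `nar_decode` and `valid_fraction` are all hypothetical. With a word-level vocabulary, independent decoding ranks incoherent combinations such as "new angeles" as highly as the valid targets. When each multi-word phrase is a single vocabulary entry, the dependency is contained inside one token, so every candidate is valid.

```python
from itertools import product
from math import prod

# Hypothetical toy example (not PIXAR's code): per-position marginals that a
# NAR decoder might emit. Each position is predicted independently.
word_marginals = [
    {"new": 0.5, "los": 0.5},       # position 1
    {"york": 0.5, "angeles": 0.5},  # position 2
]
valid_targets = {("new", "york"), ("los", "angeles")}

# Same targets, but each multi-word phrase is a single vocabulary entry,
# so only one position has to be predicted.
phrase_marginals = [{"new york": 0.5, "los angeles": 0.5}]
valid_phrase_targets = {("new york",), ("los angeles",)}


def nar_decode(marginals):
    """Score every token combination as a product of independent per-position probabilities."""
    candidates = []
    for picks in product(*(m.items() for m in marginals)):
        seq = tuple(tok for tok, _ in picks)
        score = prod(p for _, p in picks)
        candidates.append((seq, score))
    # Highest score first; Python's sort is stable, so ties keep generation order.
    return sorted(candidates, key=lambda c: c[1], reverse=True)


def valid_fraction(candidates, valid):
    """Fraction of decoded sequences that are actual targets."""
    return sum(seq in valid for seq, _ in candidates) / len(candidates)


word_level = nar_decode(word_marginals)
phrase_level = nar_decode(phrase_marginals)
print(valid_fraction(word_level, valid_targets))
print(valid_fraction(phrase_level, valid_phrase_targets))
```

Here the word-level decoder gives every one of its four combinations the same score of 0.25, so only half of its outputs are valid targets. The phrase-level decoder produces only valid targets. The toy leaves out everything that makes a multi-million-token vocabulary practical. The abstract attributes that to PIXAR's inference optimization strategies, which are not modeled here.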