Generative Retrieval (GR) is an emerging paradigm in information retrieval that leverages generative models to directly map queries to relevant document identifiers (DocIDs) without the need for traditional query processing or document reranking. This survey provides a comprehensive overview of GR, highlighting key developments, indexing and retrieval strategies, and open challenges. We discuss various document identifier strategies, including numerical and string-based identifiers, and explore different document representation methods. Our primary contribution lies in outlining future research directions that could profoundly impact the field: improving the quality of query generation, exploring learnable document identifiers, enhancing scalability, and integrating GR with multi-task learning frameworks. By examining state-of-the-art GR techniques and their applications, this survey aims to provide a foundational understanding of GR and to inspire further innovation in this transformative approach to information retrieval. We also make supplementary materials, such as our paper collection, publicly available at https://github.com/MiuLab/GenIR-Survey/.