Recent advances in large language models have enabled the development of viable generative information retrieval systems. A generative retrieval system returns a grounded generated text in response to an information need instead of the traditional document ranking. Quantifying the utility of these types of responses is essential for evaluating generative retrieval systems. As the established evaluation methodology for ranking-based ad hoc retrieval may seem unsuitable for generative retrieval, new approaches for reliable, repeatable, and reproducible experimentation are required. In this paper, we survey the relevant information retrieval and natural language processing literature, identify search tasks and system architectures in generative retrieval, develop a corresponding user model, and study its operationalization. This theoretical analysis provides a foundation and new insights for the evaluation of generative ad hoc retrieval systems.