The success of Retrieval-Augmented Generation (RAG) depends on the utility the LLM derives from the content used for grounding. Content utility has no definitive specification, and existing metrics either ignore model-specific capabilities or rely on costly annotations. In this paper, we propose Grounding Generation Utility (GroGU), a model-specific, reference-free metric that defines utility as a function of the downstream LLM's entropy-based generation confidence. Despite requiring no annotations, GroGU is largely faithful in distinguishing ground-truth documents while capturing nuances that LLM-agnostic metrics miss. We apply GroGU to train a query rewriter for RAG by identifying high-utility preference data for Direct Preference Optimization. Experiments show improvements of up to 18.2 points in Mean Reciprocal Rank and up to 9.4 points in answer accuracy.
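The abstract's exact formulation of GroGU is not given here, but the core idea of an entropy-based generation-confidence utility can be sketched as follows. Everything below (the function names `token_entropy` and `utility_score`, the averaging over decoding steps, and the sign convention) is a hypothetical illustration, not the paper's method: the sketch scores a candidate grounding document by the average negated Shannon entropy of the LLM's next-token distributions while it generates the answer conditioned on that document.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def utility_score(step_distributions):
    """Hypothetical utility proxy: mean per-token confidence, taken as
    the negated entropy of the LLM's next-token distribution at each
    decoding step while generating an answer grounded on a candidate
    document. Lower entropy (more peaked distributions) -> higher score."""
    entropies = [token_entropy(p) for p in step_distributions]
    return -sum(entropies) / len(entropies)

# Toy example: a useful grounding document should yield peaked
# next-token distributions; a distractor should yield flatter ones.
peaked = [[0.97, 0.01, 0.01, 0.01], [0.90, 0.05, 0.03, 0.02]]
flat = [[0.25, 0.25, 0.25, 0.25], [0.40, 0.30, 0.20, 0.10]]
assert utility_score(peaked) > utility_score(flat)
```

Under this reading, documents can be ranked by score with no reference answers or annotations, which is consistent with the reference-free claim; the scores could then be used to select chosen/rejected pairs for DPO-style preference data.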