Re-Ranking Through an Attribution Lens for Citation Quality in Legal QA

From arXiv, 11 pages, 4 tables, 1 figure. Published at ASAIL 2026 (8th Workshop on Automated Semantic Analysis of Information in Legal Text), co-located with ICAIL 2026, Singapore.

Retrieval-augmented generation systems for legal question answering typically retrieve passages by semantic similarity and pass them to a language model, which then generates cited answers. Prior work assumes that the highest-ranked passages are the ones the model is most likely to cite usefully. Perturbation-based attribution methods such as C-LIME have so far been used only for post-hoc explanation. However, on the AQuAECHR benchmark, semantic similarity does not correlate with passage attribution: within a retriever's candidate pool, similarity-based ranking performs worse than random selection at surfacing gold citation paragraphs. To address this limitation, a lightweight cross-encoder is trained on continuous perturbation-based attribution scores to re-rank passages before generation. The approach is evaluated on AQuAECHR with two language models and five-fold cross-validation. The re-ranker substantially improves citation faithfulness and alignment with gold expert answers. Notably, two re-rankers trained independently on different models agree with each other more closely than the raw attribution scores of those two models do. This finding indicates that the cross-encoder reduces model-specific noise and learns a shared relevance signal that partially transfers across models, although same-model re-ranking remains more effective. These results demonstrate that perturbation-based attribution provides a practical, model-agnostic training signal for citation-aware retrieval.
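
The abstract does not spell out how C-LIME computes passage attributions, so the sketch below is only a rough illustration of the general idea of perturbation-based attribution, not the paper's method. It fits a LIME-style weighted linear surrogate: it randomly drops retrieved passages, scores each perturbed context with a generator stand-in, and reads each passage's surrogate weight as a continuous attribution score. Scores of this kind are the sort of target a cross-encoder re-ranker could be trained to predict; the cross-encoder training itself is not shown. The names `toy_answer_score`, `HIDDEN_EFFECT`, `lime_attribution` and `rerank`, the exponential proximity kernel, and all parameter values are hypothetical assumptions. In practice the score function would query a language model, for example the log-probability of its cited answer.

```python
import math
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lime_attribution(score_fn, n_passages, n_samples=200, kernel_width=0.75, seed=0):
    """Fit score ~ b0 + sum_i w_i * z_i over random passage-inclusion masks z,
    weighting each sample by its proximity to the full context, and return w."""
    rng = random.Random(seed)
    dim = n_passages + 1  # intercept plus one weight per passage
    # Accumulate the weighted normal equations (X^T W X) w = X^T W y.
    A = [[0.0] * dim for _ in range(dim)]
    b = [0.0] * dim
    for _ in range(n_samples):
        mask = [rng.randint(0, 1) for _ in range(n_passages)]  # 1 = passage kept
        y = score_fn(mask)
        removed = 1.0 - sum(mask) / n_passages  # distance from the unperturbed context
        weight = math.exp(-(removed ** 2) / kernel_width ** 2)  # assumed kernel
        x = [1.0] + [float(m) for m in mask]
        for r in range(dim):
            b[r] += weight * x[r] * y
            for c in range(dim):
                A[r][c] += weight * x[r] * x[c]
    return solve(A, b)[1:]  # drop the intercept, keep per-passage attributions


def rerank(passages, attributions):
    """Order passages by descending attribution score."""
    order = sorted(range(len(passages)), key=lambda i: attributions[i], reverse=True)
    return [passages[i] for i in order]


# Hypothetical generator stand-in: the answer score rises linearly with each
# useful passage kept in the context (real models are not this simple).
HIDDEN_EFFECT = [0.1, 2.0, 0.0, 0.8]


def toy_answer_score(mask):
    return -5.0 + sum(e * m for e, m in zip(HIDDEN_EFFECT, mask))


passages = ["p0", "p1", "p2", "p3"]  # listed in retriever (similarity) order
attr = lime_attribution(toy_answer_score, len(passages))
print(rerank(passages, attr))
```

Because the toy score is exactly linear in the mask, the surrogate recovers `HIDDEN_EFFECT` and re-orders the similarity-ranked list to `p1, p3, p0, p2`. With a real model the surrogate is only a noisy approximation of each passage's contribution.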