Legal case retrieval, which aims to retrieve cases relevant to a given query case, supports judicial fairness and is attracting increasing attention. Unlike generic retrieval queries, legal case queries are typically long, and relevance is defined largely by legal-specific elements. As a result, legal case queries can be noisy and their salient content sparse, which makes it hard for retrieval models to pick out the right information in a query. While previous studies have focused on improving retrieval models and understanding relevance judgments, we focus on enhancing legal case retrieval by exploiting the salient content in legal case queries. We first manually annotate the salient content in queries and investigate how sparse and dense retrieval models attend to that content. We then experiment with various query content selection methods that use large language models (LLMs) to extract or summarize salient content and incorporate it into the retrieval models. Experimental results show that reformulating long queries with LLMs improves the performance of both sparse and dense models in legal case retrieval.
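To make the idea of query reformulation concrete, the following is a minimal sketch of how a shortened query can change a sparse ranking. It is not the method evaluated in this work: the BM25 scorer is a plain textbook implementation, the corpus and query are invented toy data, and `keep_salient_sentences` is a hypothetical keyword filter standing in for an LLM-based extractor or summarizer. The `reformulate` hook is likewise an assumption about how such a step might be plugged in.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


def keep_salient_sentences(query, salient_terms):
    """Hypothetical stand-in for an LLM extractor: keep only the
    sentences that mention at least one of the given terms."""
    sentences = [s.strip() for s in query.split(".") if s.strip()]
    kept = [s for s in sentences if salient_terms & set(tokenize(s))]
    return ". ".join(kept) if kept else query


def retrieve(query, docs, reformulate=None):
    """Return document indices ranked by BM25, best first.
    If `reformulate` is given, it is applied to the query first."""
    if reformulate is not None:
        query = reformulate(query)
    scores = bm25_scores(query, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy corpus (invented for illustration only).
docs = [
    "the defendant stole a vehicle and was charged with theft",
    "the court hearing was held on monday morning in the city",
    "a contract dispute over unpaid rent between landlord and tenant",
]
# A query whose first sentence is procedural noise.
long_query = (
    "The hearing was held on Monday morning in the city court. "
    "The defendant was charged with theft of a vehicle."
)

baseline = retrieve(long_query, docs)
reformulated = retrieve(
    long_query,
    docs,
    reformulate=lambda q: keep_salient_sentences(q, {"defendant", "theft"}),
)
print("full query ranking:        ", baseline)
print("reformulated query ranking:", reformulated)
```

In this toy setting the full query is dominated by the procedural sentence and ranks the hearing document first, while the reformulated query ranks the theft case first. A real pipeline would replace the keyword filter with an LLM call and could pass the same reformulated text to a dense encoder.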