Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness. However, existing SCR datasets judge the similarity between cases from the fact description section alone, ignoring other valuable sections (e.g., the court's opinion) that reveal the reasoning behind a decision. Furthermore, case similarity is typically measured solely by the textual semantics of the fact descriptions, which may fail to capture the full complexity of legal cases from the perspective of legal knowledge. In this work, we present MUSER, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal elements, with sentence-level legal element annotations. Specifically, we select three perspectives (legal fact, dispute focus, and statutory law) and build a comprehensive, structured label schema of legal elements for each of them, enabling accurate, knowledge-informed evaluation of case similarity. The dataset is constructed from Chinese civil cases and contains 100 query cases and 4,024 candidate cases. We implement several text classification algorithms for legal element prediction and various retrieval methods for similar case retrieval on MUSER. The experimental results indicate that incorporating legal elements can improve the performance of SCR models, but further work is required to address the remaining challenges posed by MUSER. The source code and dataset are released at https://github.com/THUlawtech/MUSER.
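To make the idea of multi-view similarity over legal elements concrete, the following is a minimal sketch and not the scoring procedure used by MUSER. It assumes each case is represented as one set of element labels per perspective, compares the sets with Jaccard overlap, and averages the three views with equal weights. The view keys, the element labels, and the function names are all hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical keys for the three perspectives named in the abstract.
VIEWS = ("legal_fact", "dispute_focus", "statutory_law")


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two label sets; defined here as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def multi_view_similarity(
    query: Dict[str, Set[str]],
    candidate: Dict[str, Set[str]],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted average of per-view Jaccard scores (equal weights are an assumption)."""
    if weights is None:
        weights = {view: 1.0 / len(VIEWS) for view in VIEWS}
    return sum(
        weights[view] * jaccard(query.get(view, set()), candidate.get(view, set()))
        for view in VIEWS
    )


# Illustrative element labels only; they are not taken from the MUSER schema.
query_case = {
    "legal_fact": {"loan_agreement", "overdue_repayment"},
    "dispute_focus": {"interest_rate"},
    "statutory_law": {"statute_a"},
}
candidate_case = {
    "legal_fact": {"loan_agreement"},
    "dispute_focus": {"interest_rate"},
    "statutory_law": {"statute_b"},
}

score = multi_view_similarity(query_case, candidate_case)
print(f"multi-view similarity: {score:.3f}")
```

Under this sketch, two cases that share a dispute focus but cite different statutes receive partial credit, which a single text-similarity score over the fact description would not expose.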