Statute retrieval aims to find the statutory articles relevant to a given query. It underpins a wide range of legal applications, such as legal advice, automated judicial decisions, and legal document drafting. Existing statute retrieval benchmarks focus on formal, professional queries drawn from sources like bar exams and legal case documents, and thereby neglect non-professional queries from the general public, which often lack precise legal terminology and references. To address this gap, we introduce the STAtute Retrieval Dataset (STARD), a Chinese dataset comprising 1,543 query cases collected from real-world legal consultations and 55,348 candidate statutory articles. Unlike existing statute retrieval datasets, STARD captures the complexity and diversity of real queries from the general public. Through a comprehensive evaluation of various retrieval baselines, we show that existing retrieval approaches all fall short on these real queries issued by non-professional users. Even the best method achieves a Recall@100 of only 0.907, indicating that further research in this area is needed. All code and datasets are available at: https://github.com/oneal2000/STARD/tree/main
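For readers unfamiliar with the metric, Recall@100 can be illustrated with a minimal sketch. It assumes one common definition: the fraction of a query's relevant statutory articles that appear among the top-k retrieved candidates, averaged over queries. The function names `recall_at_k` and `mean_recall_at_k` and the input layout are hypothetical, chosen for illustration only; the evaluation code in the STARD repository may differ.

```python
def recall_at_k(ranked_ids, relevant_ids, k=100):
    """Fraction of the relevant articles that appear in the top-k results.

    ranked_ids:   article IDs returned by a retriever, best first (assumed layout)
    relevant_ids: article IDs annotated as relevant for the query
    """
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("query has no relevant articles")
    top_k = set(ranked_ids[:k])
    return len(relevant & top_k) / len(relevant)


def mean_recall_at_k(runs, k=100):
    """Average Recall@k over (ranked_ids, relevant_ids) pairs, one per query."""
    scores = [recall_at_k(ranked, relevant, k) for ranked, relevant in runs]
    return sum(scores) / len(scores)
```

Under this definition, a mean Recall@100 of 0.907 would mean that, on average, roughly 9% of the relevant articles are still missing from the top 100 candidates.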