Legal article retrieval is critical for building traceable and reliable legal AI systems, in which every conclusion must be grounded in specific legal articles. However, existing open-domain retrieval methods rely heavily on surface-level lexical or semantic similarity, and therefore struggle to distinguish legally relevant articles from those that are textually similar but legally inapplicable or misaligned with the user's underlying intent. To bridge this gap, we propose \textsc{LexPath}, a domain-oriented framework comprising a multi-path retrieval module and an intent-aware reranking module. The retrieval module collects candidate articles through two complementary, legal-specific paths: an IRAC-guided sparse path that expands queries with legally informative keywords, and a structure-guided dense path trained with hard negatives derived from legal hierarchy and citation relations. The reranking module then refines the candidate ranking using an intent-consistency score between each query and candidate article. We evaluate \textsc{LexPath} on two publicly available benchmarks of general-public queries and on a self-constructed benchmark targeting domain-professional scenarios. Experimental results show that \textsc{LexPath} consistently outperforms lexical, dense, hybrid, and adaptive retrieval-augmented generation (RAG) baselines, and ablation studies verify the contribution of each component.
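The two-stage flow described above (merge candidates from a sparse and a dense path, then rerank with an intent-consistency score) can be illustrated with a minimal sketch. The abstract does not state how the two paths are merged or how the intent score enters the final ranking, so this sketch substitutes reciprocal rank fusion for the merge and a linear interpolation for the rerank; both are assumptions, not the method of \textsc{LexPath}. The function names, the weight `alpha`, the constant `k`, and the article identifiers are all hypothetical, and the intent scores are placeholder numbers standing in for whatever model produces them.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Merge several ranked candidate lists into one score per article.

    Assumption: reciprocal rank fusion stands in for the unspecified
    merging step. Each list contributes 1 / (k + rank) per article.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, article_id in enumerate(ranking, start=1):
            scores[article_id] = scores.get(article_id, 0.0) + 1.0 / (k + rank)
    return scores


def rerank_with_intent(
    fused: Dict[str, float],
    intent_scores: Dict[str, float],
    alpha: float = 0.5,
) -> List[str]:
    """Reorder candidates by mixing retrieval and intent-consistency scores.

    Assumption: a linear interpolation with weight alpha; intent scores
    are expected in [0, 1] and default to 0.0 when missing.
    """
    if not fused:
        return []
    top = max(fused.values())  # scale retrieval scores into (0, 1]
    combined = {
        article_id: (1.0 - alpha) * (score / top)
        + alpha * intent_scores.get(article_id, 0.0)
        for article_id, score in fused.items()
    }
    # Sort by descending combined score; break ties by identifier.
    return sorted(combined, key=lambda a: (-combined[a], a))


# Hypothetical candidates from the sparse and dense paths.
sparse_path = ["art_12", "art_7", "art_30"]
dense_path = ["art_7", "art_45", "art_12"]
# Placeholder intent-consistency scores for each candidate.
intent = {"art_7": 0.9, "art_12": 0.2, "art_45": 0.6, "art_30": 0.1}

fused_scores = reciprocal_rank_fusion([sparse_path, dense_path])
print(rerank_with_intent(fused_scores, intent))
```

In this toy input, an article retrieved by both paths starts with a higher fused score than one retrieved by a single path, and the intent term can then promote or demote it. Setting `alpha` to 0 reduces the rerank to the fused retrieval order.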