Finding truly analogous precedents requires capturing legal reasoning, not just surface word overlap. We present a two-stage, section-aware framework for legal case retrieval. Offline, a deterministic large language model (LLM) segments each raw judgment into four sections: facts, issues, decision, and reasoning. In Stage 1, parallel whole-document lexical (BM25) and semantic (dense approximate nearest neighbor, ANN) searches are combined via Reciprocal Rank Fusion (RRF) to form a high-recall candidate pool. In Stage 2, we perform fine-grained, like-for-like section comparisons (e.g., query reasoning vs. candidate reasoning). To address the scale mismatch between unbounded lexical (BM25) scores and bounded cosine similarities, we apply query-wise Z-score normalization before aggregating the per-section signals with learned section weights. For the top-ranked results, the system returns the relevant section text together with a concise, grounded rationale and party-stance labels. We evaluate on a jurisdiction-scale benchmark and demonstrate consistent gains over strong lexical and neural baselines while maintaining high candidate coverage.
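The two scoring steps described above (RRF fusion in Stage 1, query-wise Z-score normalization followed by weighted aggregation in Stage 2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function names (`rrf_fuse`, `zscore`, `aggregate`), the signal names, the RRF constant `k=60` (a common default for RRF), the use of the population standard deviation, and the fixed weights standing in for learned section weights are all assumptions made for the example.

```python
from statistics import mean, pstdev


def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion over several ranked lists of document ids.

    score(d) = sum over lists of 1 / (k + rank_d), with 1-based ranks.
    k=60 is a common default, assumed here rather than taken from the paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def zscore(scores):
    """Query-wise Z-score: standardize one signal over a single query's candidates."""
    if not scores:
        return {}
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)  # population std (an assumption)
    if sigma == 0.0:
        # A constant signal carries no ranking information for this query.
        return {d: 0.0 for d in scores}
    return {d: (s - mu) / sigma for d, s in scores.items()}


def aggregate(signals, weights):
    """Weighted sum of Z-normalized signals (e.g. one per section comparison).

    `signals` maps a hypothetical signal name to {doc_id: raw score};
    `weights` maps the same names to weights, standing in for learned ones.
    """
    normalized = {name: zscore(s) for name, s in signals.items()}
    candidates = set().union(*(s.keys() for s in signals.values()))
    # A candidate missing from a signal gets 0.0, i.e. that signal's mean.
    total = {
        d: sum(weights[name] * normalized[name].get(d, 0.0) for name in signals)
        for d in candidates
    }
    return sorted(total, key=lambda d: (-total[d], d))


if __name__ == "__main__":
    # Stage 1: fuse toy BM25 and dense-ANN rankings into a candidate pool.
    pool = rrf_fuse([["c3", "c1", "c7"], ["c1", "c9", "c3"]])
    print(pool)  # ['c1', 'c3', 'c9', 'c7']

    # Stage 2: hypothetical reasoning-vs-reasoning signals on very different scales.
    signals = {
        "reasoning_bm25": {"c1": 12.4, "c3": 30.1, "c9": 8.0, "c7": 9.5},
        "reasoning_cosine": {"c1": 0.91, "c3": 0.62, "c9": 0.55, "c7": 0.60},
    }
    weights = {"reasoning_bm25": 0.4, "reasoning_cosine": 0.6}
    print(aggregate(signals, weights))  # ['c1', 'c3', 'c7', 'c9']
```

In this toy example, summing the raw scores with the same weights would rank `c3` first purely on its large BM25 value, because cosine similarities never exceed 1.0. After query-wise standardization, both signals contribute on a comparable scale and `c1`, which is strong on the semantic signal, moves to the top.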