Legal citation in common-law systems depends not only on factual similarity but also on the legal principle for which a precedent is invoked. However, existing benchmarks for legal citation retrieval take case facts, citation context, or full judgments as input, in which the governing legal principle is often missing, or is expressed only implicitly and entangled with the broader context. As a result, models may retrieve precedents that are factually similar yet doctrinally irrelevant. This limitation is particularly consequential in Singapore, where the legal system has evolved independently: only domestic precedents are binding, while foreign authorities serve merely as persuasive references. We therefore propose a new retrieval paradigm, inspired by real-world legal reasoning workflows, that ranks cited cases using queries that combine case facts with explicit legal principles. To support this paradigm, we introduce SG-LegalCite, a dataset of 100,890 case-principle pairs extracted from 8,523 Singapore Supreme Court judgments spanning 2000 to 2025. Experiments across 11 baselines demonstrate the effectiveness of our principle-augmented retrieval paradigm, showing that explicit legal principles provide strong discriminative signals for legal citation retrieval.
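To make the contrast between facts-only and principle-augmented queries concrete, the following is a minimal sketch, not the paper's method. It assumes a simple bag-of-words cosine similarity as a stand-in for the baselines, which are not named here. The function names (`build_query`, `rank_candidates`) and the two candidate texts are hypothetical and invented for illustration; they are not drawn from SG-LegalCite.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_query(facts, principle=None):
    """Hypothetical query builder: facts alone, or facts plus an explicit principle."""
    return f"{facts} {principle}" if principle else facts


def rank_candidates(query, candidates):
    """Return (case_id, score) pairs sorted by descending similarity to the query."""
    query_vec = Counter(tokenize(query))
    scored = [
        (case_id, cosine(query_vec, Counter(tokenize(text))))
        for case_id, text in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented toy data (assumption): one factually similar case, one doctrinally relevant case.
candidates = {
    "case_a": "tenant dispute over unpaid rent and eviction from commercial premises",
    "case_b": (
        "doctrine of frustration discharges a contract when a supervening "
        "event radically changes the obligation"
    ),
}
facts = "tenant stopped paying rent for commercial premises after a government closure order"
principle = (
    "a contract is discharged by frustration when a supervening event "
    "radically changes the obligation"
)

facts_only_ranking = rank_candidates(build_query(facts), candidates)
augmented_ranking = rank_candidates(build_query(facts, principle), candidates)
```

In this toy setup the facts-only query ranks the factually similar `case_a` first, while adding the principle moves the doctrinally relevant `case_b` to the top. Concatenation is only one possible way to integrate the principle into the query; the actual integration strategy is an assumption here.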