Legal judgment prediction (LJP) aims to predict judicial outcomes from case facts and typically comprises three subtasks: law article prediction, charge prediction, and sentencing prediction. While recent methods perform well on the first two subtasks, legal sentencing prediction (LSP) remains difficult because it requires both fine-grained objective knowledge and flexible subjective reasoning. To address these challenges, we propose $MSR^2$, a framework that uses reinforcement learning to integrate multi-source retrieval with reasoning in large language models (LLMs). $MSR^2$ enables LLMs to perform multi-source retrieval based on reasoning needs and applies a process-level reward to guide intermediate subjective reasoning steps. Experiments on two real-world datasets show that $MSR^2$ improves both accuracy and interpretability in LSP, a promising step toward practical legal AI. Our code is available at https://anonymous.4open.science/r/MSR2-FC3B.