Automating the drafting of judgment documents is pivotal to judicial efficiency, yet it remains challenging because it requires both comprehensive retrieval of legal information and rigorous logical reasoning. Existing approaches, which typically rely on standard Retrieval-Augmented Generation (RAG) and Supervised Fine-Tuning (SFT), often suffer from insufficient evidence recall, hallucinated statutory references, and logically flawed legal reasoning. To address these shortcomings, we propose Judge-R1, a unified framework that improves LLM-based judgment document generation by jointly strengthening legal information collection and the generation process itself. First, we introduce Agentic Legal Information Collection, which employs a dynamic planning agent to retrieve precise statutes and precedents from multiple sources. Second, we implement Rubric-Guided Optimization, a reinforcement learning phase that uses Group Relative Policy Optimization (GRPO) with a comprehensive legal reward function to enforce adherence to judicial standards and reasoning logic. Extensive experiments on the JuDGE benchmark demonstrate that Judge-R1 significantly outperforms state-of-the-art baselines in both legal accuracy and generation quality.
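As a rough illustration of the GRPO step named in the abstract, the sketch below scores a group of candidate judgments with a weighted rubric and converts the rewards into group-relative advantages (each reward minus the group mean, divided by the group standard deviation). The rubric criteria, their weights, and the function names are hypothetical placeholders, since the abstract does not specify the actual legal reward function. The use of the population standard deviation is likewise an assumption.

```python
from statistics import mean, pstdev

# Hypothetical rubric weights; the real criteria are not given in the abstract.
RUBRIC_WEIGHTS = {"statute_accuracy": 0.4, "reasoning_logic": 0.4, "format": 0.2}


def rubric_reward(scores: dict[str, float]) -> float:
    """Combine per-criterion scores in [0, 1] into one scalar reward."""
    return sum(w * scores[name] for name, w in RUBRIC_WEIGHTS.items())


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one sampled group, as GRPO does.

    Each advantage is (reward - group mean) / (group std + eps), so no
    separate value network is needed to provide a baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three candidate judgments sampled for the same case (made-up scores).
group_scores = [
    {"statute_accuracy": 1.0, "reasoning_logic": 1.0, "format": 1.0},
    {"statute_accuracy": 0.0, "reasoning_logic": 0.0, "format": 0.0},
    {"statute_accuracy": 0.5, "reasoning_logic": 0.5, "format": 0.5},
]
rewards = [rubric_reward(s) for s in group_scores]
advantages = group_relative_advantages(rewards)
```

Candidates that score above the group average receive positive advantages and are reinforced, while below-average candidates are discouraged. When every candidate receives the same reward, all advantages are zero and the group contributes no gradient signal.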