Legal judgment generation is a critical task in legal intelligence. However, existing research has predominantly focused on first-instance trials, relying on static fact-to-verdict mappings while neglecting the dialectical nature of appellate (second-instance) review. To address this gap, we introduce AppellateGen, a benchmark for second-instance legal judgment generation comprising 7,351 case pairs. The task requires models to draft legally binding judgments by reasoning over the initial verdict and evidentiary updates, thereby modeling the causal dependency between trial stages. We further propose a judicial Standard Operating Procedure (SOP)-based Legal Multi-Agent System (SLMAS), which simulates judicial workflows by decomposing the generation process into discrete stages of issue identification, retrieval, and drafting. Experimental results indicate that while SLMAS improves logical consistency, the complexity of appellate reasoning remains a substantial challenge for current LLMs. The dataset and code are publicly available at: https://anonymous.4open.science/r/AppellateGen-5763.
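The three-stage decomposition named in the abstract (issue identification, retrieval, drafting) can be illustrated with a minimal sketch. This is not the SLMAS implementation: every name here (`CasePair`, `identify_issues`, `retrieve`, `draft`, `run_pipeline`) is hypothetical, the case-pair fields are assumed for illustration, and each stage is a deterministic placeholder standing in for what would presumably be an LLM agent in the actual system.

```python
from dataclasses import dataclass, field


@dataclass
class CasePair:
    """Hypothetical shape of one benchmark item (fields are assumptions)."""
    first_instance_verdict: str
    appeal_grounds: list[str]
    new_evidence: list[str] = field(default_factory=list)


def identify_issues(case: CasePair) -> list[str]:
    """Stage 1 placeholder: treat each non-empty appeal ground as one issue."""
    return [g.strip() for g in case.appeal_grounds if g.strip()]


def retrieve(issue: str, statutes: list[str], top_k: int = 1) -> list[str]:
    """Stage 2 placeholder: rank statutes by word overlap with the issue."""
    issue_words = set(issue.lower().split())
    ranked = sorted(
        statutes,
        key=lambda s: len(issue_words & set(s.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def draft(case: CasePair, issues: list[str], support: dict[str, list[str]]) -> str:
    """Stage 3 placeholder: assemble a skeleton from the earlier stages."""
    lines = [f"First-instance verdict under review: {case.first_instance_verdict}"]
    for evidence in case.new_evidence:
        lines.append(f"New evidence: {evidence}")
    for issue in issues:
        lines.append(f"Issue: {issue}")
        for authority in support[issue]:
            lines.append(f"  Authority: {authority}")
    return "\n".join(lines)


def run_pipeline(case: CasePair, statutes: list[str]) -> str:
    """Run the three stages in order, passing each output to the next."""
    issues = identify_issues(case)
    support = {issue: retrieve(issue, statutes) for issue in issues}
    return draft(case, issues, support)
```

The point of the sketch is the data flow: the draft is conditioned on the initial verdict and on any evidentiary updates, and each stage consumes only the output of the stage before it, which makes every intermediate result inspectable.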