AppellateGen: A Benchmark for Appellate Legal Judgment Generation

Legal judgment generation is a critical task in legal intelligence. However, existing research has focused predominantly on first-instance trials, relying on static fact-to-verdict mappings and neglecting the dialectical nature of appellate (second-instance) review. To address this gap, we introduce AppellateGen, a benchmark for second-instance legal judgment generation comprising 7,351 case pairs. The task requires models to draft legally binding judgments by reasoning over the first-instance verdict and subsequent evidentiary updates, thereby modeling the causal dependency between trial stages. We further propose a judicial Standard Operating Procedure (SOP)-based Legal Multi-Agent System (SLMAS) that simulates judicial workflows by decomposing generation into three discrete stages: issue identification, retrieval, and drafting. Experimental results indicate that although SLMAS improves logical consistency, the complexity of appellate reasoning remains a substantial challenge for current large language models (LLMs). The dataset and code are publicly available at: https://anonymous.4open.science/r/AppellateGen-5763.

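To make the staged decomposition described in the abstract concrete, the following is a minimal sketch of a three-stage pipeline (issue identification, retrieval, drafting) over a single case pair. It is an illustration only and not the SLMAS implementation: the `AppellateCase` fields, the `LLM` callable type, the prompts, the word-overlap retriever, and the assumption that retrieval targets a corpus of legal provisions are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


@dataclass
class AppellateCase:
    """Hypothetical record for one case pair; field names are assumptions."""
    first_instance_verdict: str
    evidentiary_updates: List[str]


def identify_issues(case: AppellateCase, llm: LLM) -> List[str]:
    """Stage 1: ask the model for the disputed issues, one per line."""
    prompt = (
        "List the disputed issues on appeal, one per line.\n"
        f"First-instance verdict: {case.first_instance_verdict}\n"
        f"Evidentiary updates: {'; '.join(case.evidentiary_updates)}"
    )
    # Drop blank lines and surrounding whitespace from the model output.
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]


def retrieve(issues: List[str], corpus: Dict[str, str], top_k: int = 2) -> List[str]:
    """Stage 2: toy retriever ranking corpus entries by word overlap with the issues."""
    query = set(" ".join(issues).lower().split())
    ranked = sorted(
        corpus.items(),
        key=lambda item: len(query & set(item[1].lower().split())),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


def draft(case: AppellateCase, issues: List[str], provisions: List[str], llm: LLM) -> str:
    """Stage 3: draft the judgment from the verdict, issues, and retrieved provisions."""
    prompt = (
        "Draft the second-instance judgment.\n"
        f"First-instance verdict: {case.first_instance_verdict}\n"
        f"Evidentiary updates: {'; '.join(case.evidentiary_updates)}\n"
        f"Issues: {'; '.join(issues)}\n"
        f"Relevant provisions: {', '.join(provisions)}"
    )
    return llm(prompt)


def generate_appellate_judgment(case: AppellateCase, llm: LLM, corpus: Dict[str, str]) -> str:
    """Run the three stages in sequence, passing each stage's output to the next."""
    issues = identify_issues(case, llm)
    provisions = retrieve(issues, corpus)
    return draft(case, issues, provisions, llm)
```

Keeping each stage as a separate function means the intermediate outputs (the issue list and the retrieved provisions) can be inspected or evaluated on their own, which is one plausible reason for a staged design; the actual agents and their interfaces in SLMAS may differ.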