Reasoning-focused large language models (LLMs) are rapidly evolving across various domains, yet their capabilities in handling complex legal problems remain underexplored. In this paper, we introduce Unilaw-R1, a large language model tailored for legal reasoning. With a lightweight 7-billion-parameter scale, Unilaw-R1 significantly reduces deployment cost while effectively tackling three core challenges in the legal domain: insufficient legal knowledge, unreliable reasoning logic, and weak business generalization. To address these issues, we first construct Unilaw-R1-Data, a high-quality dataset containing 17K distilled and screened chain-of-thought (CoT) samples. Building on this dataset, we adopt a two-stage training strategy combining Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), which significantly boosts performance on complex legal reasoning tasks and supports interpretable decision-making in legal AI applications. To assess legal reasoning ability, we also introduce Unilaw-R1-Eval, a dedicated benchmark designed to evaluate models on single-choice and multiple-choice legal tasks. Unilaw-R1 demonstrates strong results on authoritative benchmarks, outperforming all models of similar scale and achieving performance on par with the much larger DeepSeek-R1-Distill-Qwen-32B (54.9%). After domain-specific training, it also shows significant gains on LawBench and LexEval, exceeding Qwen-2.5-7B-Instruct (46.6%) by an average margin of 6.6%.