Unilaw-R1：一种基于强化学习与迭代推理的法律推理大语言模型 (Unilaw-R1: A Large Language Model for Legal Reasoning with Reinforcement Learning and Iterative Inference) - 专知论文

会员服务 ·

0

法律 · 法律推理 · 基准 · 语言模型 · 迭代推理 ·

2025 年 12 月 8 日

Unilaw-R1: A Large Language Model for Legal Reasoning with Reinforcement Learning and Iterative Inference

翻译：Unilaw-R1：一种基于强化学习与迭代推理的法律推理大语言模型

Hua Cai,Shuang Zhao,Liang Zhang,Xuli Shen,Qing Xu,Weilin Shen,Zihao Wen,Tianke Ban

Reasoning-focused large language models (LLMs) are rapidly evolving across various domains, yet their capabilities in handling complex legal problems remains underexplored. In this paper, we introduce Unilaw-R1, a large language model tailored for legal reasoning. With a lightweight 7-billion parameter scale, Unilaw-R1 significantly reduces deployment cost while effectively tackling three core challenges in the legal domain: insufficient legal knowledge, unreliable reasoning logic, and weak business generalization. To address these issues, we first construct Unilaw-R1-Data, a high-quality dataset containing 17K distilled and screened chain-of-thought (CoT) samples. Based on this, we adopt a two-stage training strategy combining Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), which significantly boosts the performance on complex legal reasoning tasks and supports interpretable decision-making in legal AI applications. To assess legal reasoning ability, we also introduce Unilaw-R1-Eval, a dedicated benchmark designed to evaluate models across single- and multi-choice legal tasks. Unilaw-R1 demonstrates strong results on authoritative benchmarks, outperforming all models of similar scale and achieving performance on par with the much larger DeepSeek-R1-Distill-Qwen-32B (54.9%). Following domain-specific training, it also showed significant gains on LawBench and LexEval, exceeding Qwen-2.5-7B-Instruct (46.6%) by an average margin of 6.6%.

翻译：面向推理的大语言模型（LLMs）在各领域快速发展，但其处理复杂法律问题的能力仍待深入探索。本文提出Unilaw-R1，一种专为法律推理定制的大语言模型。该模型采用轻量化的70亿参数规模，在显著降低部署成本的同时，有效应对法律领域三大核心挑战：法律知识不足、推理逻辑不可靠以及业务泛化能力弱。为解决这些问题，我们首先构建了Unilaw-R1-Data——一个包含1.7万条经蒸馏筛选的思维链（CoT）样本的高质量数据集。基于此，我们采用监督微调（SFT）与强化学习（RL）相结合的两阶段训练策略，显著提升了模型在复杂法律推理任务上的表现，并支持法律AI应用中的可解释决策。为评估法律推理能力，我们还提出了Unilaw-R1-Eval基准测试，专门用于评估模型在单选与多选法律任务上的表现。Unilaw-R1在权威基准测试中展现出强劲性能，超越所有同规模模型，并与参数量大得多的DeepSeek-R1-Distill-Qwen-32B（54.9%）达到相当水平。经过领域特定训练后，该模型在LawBench和LexEval基准上亦取得显著提升，平均超越Qwen-2.5-7B-Instruct（46.6%）6.6个百分点。

0

相关内容

法律是国家制定或认可的，由国家强制力保证实施的，以规定权利和义务为内容的具有普遍约束力的社会规范。

DeepSeek模型综述：V1 V2 V3 R1-Zero

DeepSeek模型综述：V1 V2 V3 R1-Zero

专知会员服务

116+阅读 · 2025年2月11日

LLM4SR：关于大规模语言模型在科学研究中的应用综述

LLM4SR：关于大规模语言模型在科学研究中的应用综述

专知会员服务

42+阅读 · 2025年1月9日

【TPAMI2022】关联关系驱动的多模态分类，AF: An Association-based Fusion Method for Multi-Modal Classification

【TPAMI2022】关联关系驱动的多模态分类，AF: An Association-based Fusion Method for Multi-Modal Classification

专知会员服务

27+阅读 · 2022年3月22日

复杂的序列数据分析：现有算法的系统文献综述，Complex Sequential Data Analysis: A Systematic Literature Review of Existing Algorithms

复杂的序列数据分析：现有算法的系统文献综述，Complex Sequential Data Analysis: A Systematic Literature Review of Existing Algorithms

专知会员服务

27+阅读 · 2020年7月24日

【KDD 2020 Oral】 TIMME: 应用于推特用户意识形态分类的多任务多关系嵌入模型

【KDD 2020 Oral】 TIMME: 应用于推特用户意识形态分类的多任务多关系嵌入模型

专知会员服务

24+阅读 · 2020年7月15日

KG-BERT：基于BERT的知识图谱补全，KG-BERT: BERT for Knowledge Graph Completion

KG-BERT：基于BERT的知识图谱补全，KG-BERT: BERT for Knowledge Graph Completion

专知会员服务

195+阅读 · 2020年5月31日

【Mila-Google】使用元学习动态调整源代码模型，On-the-Fly Adaptation of Source Code Models using Meta-Learning

【Mila-Google】使用元学习动态调整源代码模型，On-the-Fly Adaptation of Source Code Models using Meta-Learning

专知会员服务

21+阅读 · 2020年3月28日

自回归模型:PixelCNN

自回归模型:PixelCNN

专知会员服务

29+阅读 · 2020年3月21日

【深度图相似学习综述】Deep Graph Similarity Learning: A Survey，29页pdf，117条参考文献

【深度图相似学习综述】Deep Graph Similarity Learning: A Survey，29页pdf，117条参考文献

专知会员服务

98+阅读 · 2019年12月31日

【《Scikit-Learn、Keras与TensorFlow机器学习实用指南(第二版)》电子书与代码(Notebooks)】Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition

【《Scikit-Learn、Keras与TensorFlow机器学习实用指南(第二版)》电子书与代码(Notebooks)】Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition

专知会员服务

219+阅读 · 2019年12月18日

Distributional Soft Actor-Critic (DSAC)强化学习算法的设计与验证

Distributional Soft Actor-Critic (DSAC)强化学习算法的设计与验证

深度强化学习实验室

19+阅读 · 2020年8月11日

IBM-小样本学习（Few-shot Learning）State of the art 方法及论文讲解

IBM-小样本学习（Few-shot Learning）State of the art 方法及论文讲解

专知

105+阅读 · 2019年4月15日

ECCV2018教程146页《对抗机器学习》PPT教程（附PPT下载）

ECCV2018教程146页《对抗机器学习》PPT教程（附PPT下载）

专知

21+阅读 · 2018年9月7日

【推荐系统论文笔记】DKN: 基于深度知识感知的新闻推荐网络（WWW2018 ）

【推荐系统论文笔记】DKN: 基于深度知识感知的新闻推荐网络（WWW2018 ）

专知

18+阅读 · 2018年4月2日

机器翻译新时代：Facebook 开源无监督机器翻译模型和大规模训练语料

机器翻译新时代：Facebook 开源无监督机器翻译模型和大规模训练语料

机器学习研究会

12+阅读 · 2017年12月24日

斯坦福Jure Leskovec图表示学习：无监督和有监督方法（附PPT下载）

斯坦福Jure Leskovec图表示学习：无监督和有监督方法（附PPT下载）

专知

24+阅读 · 2017年12月17日

VAE、GAN、Info-GAN：全解深度学习三大生成模型

VAE、GAN、Info-GAN：全解深度学习三大生成模型

数据派THU

20+阅读 · 2017年9月23日

SSD: Single Shot MultiBox Detector 深度学习笔记之SSD物体检测模型

SSD: Single Shot MultiBox Detector 深度学习笔记之SSD物体检测模型

AI研习社

18+阅读 · 2017年8月31日

语义分割中的深度学习方法全解：从FCN、SegNet到DeepLab

语义分割中的深度学习方法全解：从FCN、SegNet到DeepLab

炼数成金订阅号

26+阅读 · 2017年7月10日

NLP自然语言处理（二）——基础文本分析

NLP自然语言处理（二）——基础文本分析

乐享数据DataScientists

12+阅读 · 2017年2月7日

基于深层特征学习的RGB-D人体行为识别方法

国家自然科学基金

4+阅读 · 2015年12月31日

求解一类公平疏散问题的高性能混合算法研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于形态和多词的有限语料蒙汉互译调序优化方法

国家自然科学基金

0+阅读 · 2015年12月31日

基于犹豫模糊语言信息的定性决策理论与方法

国家自然科学基金

2+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

46+阅读 · 2015年12月31日

波动率微笑：隐含信息与动态建模

国家自然科学基金

2+阅读 · 2014年12月31日

面向时空变化的GIS数据模型

国家自然科学基金

6+阅读 · 2014年12月31日

面向汉语文本理解的语义计算方法

国家自然科学基金

8+阅读 · 2014年12月31日

基于组合Hodge理论的图像视频质量评价方法

国家自然科学基金

0+阅读 · 2014年12月31日

基于贝叶斯推理的模糊逻辑强化学习模型研究

国家自然科学基金

18+阅读 · 2012年12月31日

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Arxiv

42+阅读 · 2023年4月19日

A Survey on Graph Diffusion Models: Generative AI in Science for Molecule, Protein and Material

Arxiv

87+阅读 · 2023年4月4日

A Survey of Large Language Models

A Survey of Large Language Models

Arxiv

499+阅读 · 2023年3月31日

Unleashing the Power of Edge-Cloud Generative AI in Mobile Networks: A Survey of AIGC Services

Arxiv

154+阅读 · 2023年3月29日

ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models

Arxiv

64+阅读 · 2023年3月29日

Nature Language Reasoning, A Survey

Arxiv

83+阅读 · 2023年3月26日

Knowledge Graphs: Opportunities and Challenges

Arxiv

181+阅读 · 2023年3月24日

Sparks of Artificial General Intelligence: Early experiments with GPT-4

Arxiv

51+阅读 · 2023年3月22日

A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?

Arxiv

88+阅读 · 2023年3月21日

Data-centric Artificial Intelligence: A Survey

Arxiv

27+阅读 · 2023年3月17日

VIP会员

文章信息

相关主题

相关VIP内容

DeepSeek模型综述：V1 V2 V3 R1-Zero

DeepSeek模型综述：V1 V2 V3 R1-Zero

专知会员服务

116+阅读 · 2025年2月11日

LLM4SR：关于大规模语言模型在科学研究中的应用综述

LLM4SR：关于大规模语言模型在科学研究中的应用综述

专知会员服务

42+阅读 · 2025年1月9日

【TPAMI2022】关联关系驱动的多模态分类，AF: An Association-based Fusion Method for Multi-Modal Classification

【TPAMI2022】关联关系驱动的多模态分类，AF: An Association-based Fusion Method for Multi-Modal Classification

专知会员服务

27+阅读 · 2022年3月22日

复杂的序列数据分析：现有算法的系统文献综述，Complex Sequential Data Analysis: A Systematic Literature Review of Existing Algorithms

复杂的序列数据分析：现有算法的系统文献综述，Complex Sequential Data Analysis: A Systematic Literature Review of Existing Algorithms

专知会员服务

27+阅读 · 2020年7月24日

【KDD 2020 Oral】 TIMME: 应用于推特用户意识形态分类的多任务多关系嵌入模型

【KDD 2020 Oral】 TIMME: 应用于推特用户意识形态分类的多任务多关系嵌入模型

专知会员服务

24+阅读 · 2020年7月15日

KG-BERT：基于BERT的知识图谱补全，KG-BERT: BERT for Knowledge Graph Completion

KG-BERT：基于BERT的知识图谱补全，KG-BERT: BERT for Knowledge Graph Completion

专知会员服务

195+阅读 · 2020年5月31日

【Mila-Google】使用元学习动态调整源代码模型，On-the-Fly Adaptation of Source Code Models using Meta-Learning

【Mila-Google】使用元学习动态调整源代码模型，On-the-Fly Adaptation of Source Code Models using Meta-Learning

专知会员服务

21+阅读 · 2020年3月28日

自回归模型:PixelCNN

自回归模型:PixelCNN

专知会员服务

29+阅读 · 2020年3月21日

【深度图相似学习综述】Deep Graph Similarity Learning: A Survey，29页pdf，117条参考文献

【深度图相似学习综述】Deep Graph Similarity Learning: A Survey，29页pdf，117条参考文献

专知会员服务

98+阅读 · 2019年12月31日

【《Scikit-Learn、Keras与TensorFlow机器学习实用指南(第二版)》电子书与代码(Notebooks)】Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition

【《Scikit-Learn、Keras与TensorFlow机器学习实用指南(第二版)》电子书与代码(Notebooks)】Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition

专知会员服务

219+阅读 · 2019年12月18日

热门VIP内容

开通专知VIP会员享更多权益服务

《无人机与战争：被忽视的环境影响及无人机保护潜力》

俄罗斯规划未来无人机驱动军队

《整合杀伤链：一个用于边缘目标验证与战术推理的零样本框架》最新资料

《人工智能、武器与影响力：前沿模型在模拟核危机中展现复杂推理》2026最新46页报告

相关资讯

Distributional Soft Actor-Critic (DSAC)强化学习算法的设计与验证

Distributional Soft Actor-Critic (DSAC)强化学习算法的设计与验证

深度强化学习实验室

19+阅读 · 2020年8月11日

IBM-小样本学习（Few-shot Learning）State of the art 方法及论文讲解

IBM-小样本学习（Few-shot Learning）State of the art 方法及论文讲解

专知

105+阅读 · 2019年4月15日

ECCV2018教程146页《对抗机器学习》PPT教程（附PPT下载）

ECCV2018教程146页《对抗机器学习》PPT教程（附PPT下载）

专知

21+阅读 · 2018年9月7日

【推荐系统论文笔记】DKN: 基于深度知识感知的新闻推荐网络（WWW2018 ）

【推荐系统论文笔记】DKN: 基于深度知识感知的新闻推荐网络（WWW2018 ）

专知

18+阅读 · 2018年4月2日

机器翻译新时代：Facebook 开源无监督机器翻译模型和大规模训练语料

机器翻译新时代：Facebook 开源无监督机器翻译模型和大规模训练语料

机器学习研究会

12+阅读 · 2017年12月24日

斯坦福Jure Leskovec图表示学习：无监督和有监督方法（附PPT下载）

斯坦福Jure Leskovec图表示学习：无监督和有监督方法（附PPT下载）

专知

24+阅读 · 2017年12月17日

VAE、GAN、Info-GAN：全解深度学习三大生成模型

VAE、GAN、Info-GAN：全解深度学习三大生成模型

数据派THU

20+阅读 · 2017年9月23日

SSD: Single Shot MultiBox Detector 深度学习笔记之SSD物体检测模型

SSD: Single Shot MultiBox Detector 深度学习笔记之SSD物体检测模型

AI研习社

18+阅读 · 2017年8月31日

语义分割中的深度学习方法全解：从FCN、SegNet到DeepLab

语义分割中的深度学习方法全解：从FCN、SegNet到DeepLab

炼数成金订阅号

26+阅读 · 2017年7月10日

NLP自然语言处理（二）——基础文本分析

NLP自然语言处理（二）——基础文本分析

乐享数据DataScientists

12+阅读 · 2017年2月7日

相关论文

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Arxiv

42+阅读 · 2023年4月19日

A Survey on Graph Diffusion Models: Generative AI in Science for Molecule, Protein and Material

Arxiv

87+阅读 · 2023年4月4日

A Survey of Large Language Models

A Survey of Large Language Models

Arxiv

499+阅读 · 2023年3月31日

Unleashing the Power of Edge-Cloud Generative AI in Mobile Networks: A Survey of AIGC Services

Arxiv

154+阅读 · 2023年3月29日

ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models

Arxiv

64+阅读 · 2023年3月29日

Nature Language Reasoning, A Survey

Arxiv

83+阅读 · 2023年3月26日

Knowledge Graphs: Opportunities and Challenges

Arxiv

181+阅读 · 2023年3月24日

Sparks of Artificial General Intelligence: Early experiments with GPT-4

Arxiv

51+阅读 · 2023年3月22日

A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?

Arxiv

88+阅读 · 2023年3月21日

Data-centric Artificial Intelligence: A Survey

Arxiv

27+阅读 · 2023年3月17日

相关基金

基于深层特征学习的RGB-D人体行为识别方法

国家自然科学基金

4+阅读 · 2015年12月31日

求解一类公平疏散问题的高性能混合算法研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于形态和多词的有限语料蒙汉互译调序优化方法

国家自然科学基金

0+阅读 · 2015年12月31日

基于犹豫模糊语言信息的定性决策理论与方法

国家自然科学基金

2+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

46+阅读 · 2015年12月31日

波动率微笑：隐含信息与动态建模

国家自然科学基金

2+阅读 · 2014年12月31日

面向时空变化的GIS数据模型

国家自然科学基金

6+阅读 · 2014年12月31日

面向汉语文本理解的语义计算方法

国家自然科学基金

8+阅读 · 2014年12月31日

基于组合Hodge理论的图像视频质量评价方法

国家自然科学基金

0+阅读 · 2014年12月31日

基于贝叶斯推理的模糊逻辑强化学习模型研究

国家自然科学基金

18+阅读 · 2012年12月31日

微信扫码咨询专知VIP会员