ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation

Legal claims refer to the plaintiff's demands in a case and are essential to guiding judicial reasoning and case resolution. While many works have focused on improving the efficiency of legal professionals, research on helping non-professionals (e.g., plaintiffs) remains unexplored. This paper explores the problem of generating legal claims from the facts of a given case. First, we construct ClaimGen-CN, the first dataset for the Chinese legal claim generation task, from various real-world legal disputes. Additionally, we design an evaluation metric tailored to assessing the generated claims, which covers two essential dimensions: factuality and clarity. Building on this, we conduct a comprehensive zero-shot evaluation of state-of-the-art general and legal-domain large language models. Our findings highlight the limitations of current models in factual precision and expressive clarity, pointing to the need for more targeted development in this domain. To encourage further exploration of this important task, we will make the dataset publicly available.

