In legal practice, judges apply the trichotomous dogmatics of criminal law, sequentially assessing the elements of the offense, unlawfulness, and culpability to determine whether an individual's conduct constitutes a crime. Although current legal large language models (LLMs) show promising accuracy in judgment prediction, they lack trichotomous reasoning capabilities due to the absence of an appropriate benchmark dataset, preventing them from predicting innocent outcomes. As a result, every input is automatically assigned a charge, limiting their practical utility in legal contexts. To bridge this gap, we introduce LJPIV, the first benchmark dataset for Legal Judgment Prediction with Innocent Verdicts. Adhering to the trichotomous dogmatics, we extend three widely-used legal datasets through LLM-based augmentation and manual verification. Our experiments with state-of-the-art legal LLMs and novel strategies that integrate trichotomous reasoning into zero-shot prompting and fine-tuning reveal: (1) current legal LLMs have significant room for improvement, with even the best models achieving an F1 score of less than 0.3 on LJPIV; and (2) our strategies notably enhance both in-domain and cross-domain judgment prediction accuracy, especially for cases resulting in an innocent verdict.