Recent work on texts written in professional legal-linguistic style (PLLS) has shown promising results on the charge prediction task. However, non-professional users increasingly demand such a prediction service. There is a clear domain discrepancy between PLLS texts and the non-PLLS texts written by laypersons, which degrades the performance of current state-of-the-art (SOTA) models on non-PLLS texts. A key challenge is the scarcity of non-PLLS data for most charge classes. This paper proposes a novel few-shot domain adaptation (FSDA) method named Disentangled Legal Content for Charge Prediction (DLCCP). Unlike existing FSDA work, which performs only instance-level alignment and ignores the negative impact of text style information in the latent features, DLCCP (1) disentangles content and style representations, using carefully designed optimization objectives for the content and style spaces, to better learn domain-invariant legal content, and (2) employs knowledge of the constitutive elements of charges to extract and align element-level and instance-level content representations simultaneously. We contribute the first publicly available non-PLLS dataset, named NCCP, for developing layperson-friendly charge prediction models. Experiments on NCCP show the superiority of our method over competitive baselines.