Automatic legal judgment prediction and its explanation are hampered by case documents that are long, often exceeding tens of thousands of words, and non-uniform in structure. Predicting judgments from such documents and extracting explanations for them is challenging, all the more so when the documents carry no structural annotation. We define this problem as "scarce annotated legal documents" and address the missing structural information and the long document lengths with a deep-learning-based classification framework for judgment prediction, which we call MESc ("Multi-stage Encoder-based Supervised with-clustering"). We explore the adaptability of multi-billion-parameter LLMs (GPT-Neo and GPT-J) to legal texts and their intra-domain (legal) transfer learning capacity. Alongside this, we compare their performance and adaptability with MESc and examine the impact of combining embeddings from their last layers. For such hierarchical models, we also propose an explanation extraction algorithm named ORSE (Occlusion sensitivity-based Relevant Sentence Extractor), which uses the input-occlusion sensitivity of the model to explain predictions with the most relevant sentences from the document. We test the effectiveness of these methods with extensive experiments and ablation studies on legal documents from India, the European Union, and the United States, using the ILDC dataset and a subset of the LexGLUE dataset. MESc achieves a minimum total performance gain of approximately 2 points over previous state-of-the-art methods, while ORSE applied on MESc achieves a total average gain of 50% over the baseline explainability scores.
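The idea of explaining a prediction through input-occlusion sensitivity can be illustrated with a minimal sketch. This is not the ORSE algorithm itself, whose details the abstract does not give; it is a generic occlusion ranking under stated assumptions. The names `occlusion_rank` and `toy_score` are hypothetical, occlusion is approximated by simply deleting a sentence, and `toy_score` is a keyword-based stand-in for a trained judgment classifier that returns a score for the predicted class.

```python
from typing import Callable, List, Tuple


def occlusion_rank(
    sentences: List[str],
    score_fn: Callable[[List[str]], float],
    top_k: int = 3,
) -> List[Tuple[int, str, float]]:
    """Rank sentences by how much the score drops when each one is removed.

    Returns up to top_k tuples of (sentence index, sentence, score drop),
    ordered from the largest drop to the smallest.
    """
    base = score_fn(sentences)
    drops = []
    for i in range(len(sentences)):
        # Occlude sentence i by leaving it out of the input (an assumption;
        # masking with a placeholder token is another common choice).
        occluded = sentences[:i] + sentences[i + 1:]
        drops.append((base - score_fn(occluded), i))
    # Largest drop first; ties are broken by original sentence order.
    drops.sort(key=lambda t: (-t[0], t[1]))
    return [(i, sentences[i], d) for d, i in drops[:top_k]]


def toy_score(sentences: List[str]) -> float:
    """Hypothetical stand-in for a classifier's predicted-class score."""
    cues = {"violated": 0.5, "unlawful": 0.3}
    text = " ".join(sentences).lower()
    return min(1.0, sum(w for cue, w in cues.items() if cue in text))


doc = [
    "The applicant filed a complaint.",
    "The court found the detention unlawful.",
    "Article 5 was violated.",
    "Costs were awarded.",
]
ranked = occlusion_rank(doc, toy_score, top_k=2)
for idx, sentence, drop in ranked:
    print(idx, round(drop, 2), sentence)
```

With a real model, `score_fn` would wrap the classifier's forward pass, so the cost grows with the number of sentences: one extra prediction per occluded sentence.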