Automatic legal judgment prediction and its explanation are hampered by case documents that are long, generally exceeding tens of thousands of words, and non-uniform in structure. Predicting judgments from such documents and extracting explanations for them is challenging, all the more so when the documents carry no structural annotation. We define this problem as "scarce annotated legal documents" and address the lack of structural information and the long document lengths with a deep-learning-based classification framework for judgment prediction, which we call MESc ("Multi-stage Encoder-based Supervised with-clustering"). We explore the adaptability of LLMs with multi-billion parameters (GPT-Neo and GPT-J) to legal texts and their intra-domain (legal) transfer learning capacity. Alongside this, we compare their performance and adaptability with MESc and examine the impact of combining embeddings from their last layers. For such hierarchical models, we also propose an explanation extraction algorithm named ORSE (Occlusion sensitivity-based Relevant Sentence Extractor), which uses the input-occlusion sensitivity of the model to explain predictions with the most relevant sentences from the document. We test the effectiveness of these methods with extensive experiments and ablation studies on legal documents from India, the European Union, and the United States, using the ILDC dataset and a subset of the LexGLUE dataset. MESc achieves a minimum total performance gain of approximately 2 points over previous state-of-the-art methods, while ORSE applied to MESc achieves a total average gain of 50% over the baseline explainability scores.
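The idea of input-occlusion sensitivity can be illustrated with a minimal sketch. This is not the ORSE algorithm itself, whose details are not given here; it only shows the generic principle of removing one sentence at a time and measuring how much the model's score drops. The function names `rank_sentences_by_occlusion` and `toy_predict_proba` are hypothetical, and the toy scorer is an assumed stand-in for a trained judgment classifier.

```python
from typing import Callable, List, Sequence, Tuple


def rank_sentences_by_occlusion(
    sentences: Sequence[str],
    predict_proba: Callable[[Sequence[str]], float],
    top_k: int = 3,
) -> List[Tuple[int, str, float]]:
    """Rank sentences by the score drop observed when each one is removed."""
    baseline = predict_proba(sentences)
    scored = []
    for i, sentence in enumerate(sentences):
        # Occlude sentence i by dropping it from the input.
        occluded = [s for j, s in enumerate(sentences) if j != i]
        drop = baseline - predict_proba(occluded)
        scored.append((i, sentence, drop))
    # Largest drop first; ties keep their original document order.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]


def toy_predict_proba(sentences: Sequence[str]) -> float:
    """Hypothetical stand-in for a classifier: fraction of cue words present."""
    cues = ("appeal", "allowed")
    text = " ".join(sentences).lower()
    return sum(cue in text for cue in cues) / len(cues)


doc = [
    "The appellant filed an appeal against the order.",
    "The hearing took place on a Monday.",
    "The petition is allowed.",
]
top = rank_sentences_by_occlusion(doc, toy_predict_proba, top_k=2)
for index, sentence, drop in top:
    print(index, round(drop, 2), sentence)
```

In this toy setup the two sentences carrying cue words are ranked highest, since removing either one lowers the score, while the neutral sentence has no effect. With a real model, `predict_proba` would wrap the classifier's predicted probability for the chosen class.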