Automatic legal judgment prediction and its explanation are hampered by long case documents, which in general exceed tens of thousands of words and have a non-uniform structure. Predicting judgments from such documents and extracting explanations for them is challenging, all the more so when the documents carry no structural annotation. We define this problem as "scarce annotated legal documents" and address both the lack of structural information and the long document lengths with a deep learning-based classification framework for judgment prediction, which we call MESc ("Multi-stage Encoder-based Supervised with-clustering"). Specifically, we divide a document into parts (chunks), extract their embeddings from the last four layers of a custom fine-tuned Large Language Model (LLM), and try to approximate the document's structure through unsupervised clustering. We then use this approximated structure in another set of transformer encoder layers to learn the inter-chunk representations. We explore the adaptability of multi-billion-parameter LLMs (GPT-Neo and GPT-J) to legal texts and their intra-domain (legal) transfer learning capacity. Alongside this, we compare their performance with that of MESc and examine the impact of combining embeddings from their last layers. For such hierarchical models, we also propose an explanation extraction algorithm named ORSE (Occlusion sensitivity-based Relevant Sentence Extractor).
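The abstract names ORSE but does not spell out its procedure, so the following is only a minimal sketch of the general occlusion-sensitivity idea it builds on: remove one sentence at a time, re-score the document, and rank sentences by how much the score drops. The names `occlusion_ranking` and `toy_score` are hypothetical, and `toy_score` is a keyword-counting stand-in for a trained judgment classifier, not part of MESc or ORSE.

```python
from typing import Callable, List, Tuple


def occlusion_ranking(
    sentences: List[str],
    score_fn: Callable[[List[str]], float],
) -> List[Tuple[int, float]]:
    """Rank sentences by the score drop observed when each one is occluded."""
    base = score_fn(sentences)
    drops = []
    for i in range(len(sentences)):
        # Occlude sentence i by leaving it out of the input.
        occluded = sentences[:i] + sentences[i + 1:]
        drops.append((i, base - score_fn(occluded)))
    # Largest drop first; ties keep their original document order.
    return sorted(drops, key=lambda item: item[1], reverse=True)


def toy_score(sentences: List[str]) -> float:
    """Hypothetical stand-in for a classifier's confidence in one label."""
    cues = ("dismissed", "rejected")
    hits = sum(any(cue in s.lower() for cue in cues) for s in sentences)
    return hits / (hits + 1)


doc = [
    "The appellant filed the petition in 2015.",
    "The lower court rejected the claim for lack of evidence.",
    "Costs were discussed at the hearing.",
    "The appeal is dismissed.",
]
ranking = occlusion_ranking(doc, toy_score)
top_two = [doc[i] for i, _ in ranking[:2]]
```

In a real setting `score_fn` would wrap the trained model's predicted probability for the output class, and occlusion could be applied at the chunk level before the sentence level to keep the number of forward passes manageable; both choices are assumptions here, not details taken from the abstract.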