This study investigates judgment prediction in a realistic scenario within the context of Indian judgments, utilizing a range of transformer-based models, including InLegalBERT, BERT, and XLNet, alongside LLMs such as Llama-2 and GPT-3.5 Turbo. In this realistic scenario, we simulate how judgments are predicted at the point when a case is presented for a decision in court, using only the information available at that time, such as the facts of the case, statutes, precedents, and arguments. This approach mimics real-world conditions, where decisions must be made without the benefit of hindsight, unlike the retrospective analyses often found in previous studies. For the transformer models, we experiment with hierarchical transformers and with summarization of the judgment facts to optimize the input for these models. Our experiments with LLMs reveal that GPT-3.5 Turbo excels in this realistic scenario, demonstrating robust performance in judgment prediction. Furthermore, incorporating additional legal information, such as statutes and precedents, significantly improves prediction performance. The LLMs also provide explanations for their predictions. To evaluate the quality of these predictions and explanations, we introduce two human evaluation metrics: Clarity and Linking. Our findings from both automatic and human evaluations indicate that, despite advancements in LLMs, they have yet to achieve expert-level performance in judgment prediction and explanation tasks.