In jurisdictions like India, where courts face an extensive backlog of cases, artificial intelligence offers transformative potential for legal judgment prediction. A critical subset of this backlog comprises appellate cases, in which higher courts review the rulings of lower courts and issue formal decisions. To help address this backlog, we present Vichara, a novel framework tailored to the Indian judicial system that predicts and explains appellate judgments. Vichara processes English-language appellate case proceeding documents and decomposes them into decision points: discrete legal determinations that each encapsulate the legal issue, deciding authority, outcome, reasoning, and temporal context. This structured representation isolates the core determinations and their context, enabling accurate predictions and interpretable explanations. Vichara's explanations follow a structured format inspired by the IRAC (Issue-Rule-Application-Conclusion) framework and adapted for Indian legal reasoning, which allows legal professionals to assess the soundness of predictions efficiently. We evaluate Vichara on two datasets, PredEx and the expert-annotated subset of the Indian Legal Documents Corpus (ILDC_expert), using four large language models: GPT-4o mini, Llama-3.1-8B, Mistral-7B, and Qwen2.5-7B. Vichara surpasses existing judgment prediction benchmarks on both datasets, with GPT-4o mini achieving the highest performance (F1: 81.5 on PredEx, 80.3 on ILDC_expert), followed by Llama-3.1-8B. Human evaluation of the generated explanations on three metrics (Clarity, Linking, and Usefulness) highlights GPT-4o mini's superior interpretability.
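As a rough illustration of the decision-point representation described above, the following minimal sketch models one decision point as a record with the five listed components. The class name, field names, helper function, and sample values are all hypothetical and chosen only for illustration; they are not taken from the Vichara implementation or from either dataset.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass(frozen=True)
class DecisionPoint:
    """One discrete legal determination extracted from a case document.

    Field names are illustrative assumptions; the abstract only lists the
    five kinds of information a decision point carries.
    """
    issue: str             # the legal question being decided
    authority: str         # the court or body that decided it
    outcome: str           # what was decided
    reasoning: str         # why it was decided that way
    temporal_context: str  # when, or at what stage, it was decided


def to_records(points: List[DecisionPoint]) -> List[dict]:
    """Convert decision points to plain dicts, e.g. for building a prompt."""
    return [asdict(p) for p in points]


# Invented sample data, used only to show the shape of the structure.
example_points = [
    DecisionPoint(
        issue="Whether the suit was filed within the limitation period",
        authority="Trial Court",
        outcome="Held to be time-barred",
        reasoning="The filing date fell outside the statutory period",
        temporal_context="Original proceedings",
    ),
    DecisionPoint(
        issue="Whether the trial court computed limitation correctly",
        authority="High Court",
        outcome="Trial court finding upheld",
        reasoning="No error found in the computation of the period",
        temporal_context="First appeal",
    ),
]

records = to_records(example_points)
```

Keeping each determination as a separate record is one plausible way to present the history of a case to a language model in order before asking for a prediction; how Vichara actually serializes decision points is not stated here.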