TathyaNyaya and FactLegalLlama: Advancing Factual Judgment Prediction and Explanation in the Indian Legal Context

In Fact-based Judgment Prediction and Explanation (FJPE), reliance on factual data is essential for developing robust and realistic AI-driven decision-making tools. This paper introduces TathyaNyaya, the largest annotated dataset for FJPE tailored to the Indian legal context, encompassing judgments from the Supreme Court of India and various High Courts. Named after the Hindi terms "Tathya" (fact) and "Nyaya" (justice), the dataset focuses on factual statements rather than complete legal texts, reflecting real-world judicial processes in which the facts drive outcomes. Complementing this dataset, we present FactLegalLlama, an instruction-tuned variant of the LLaMa-3-8B Large Language Model (LLM), optimized for generating high-quality explanations in FJPE tasks. Fine-tuned on the factual data in TathyaNyaya, FactLegalLlama pairs predictive accuracy with coherent, contextually relevant explanations, addressing the need for transparency and interpretability in AI-assisted legal systems. Our methodology combines transformer-based models for binary judgment prediction with FactLegalLlama for explanation generation, yielding a framework for advancing FJPE in the Indian legal domain. TathyaNyaya surpasses existing datasets in scale and diversity and establishes a benchmark for building explainable AI systems in legal analysis. The findings underscore the importance of factual precision and domain-specific tuning in improving predictive performance and interpretability, positioning TathyaNyaya and FactLegalLlama as foundational resources for AI-assisted legal decision-making.
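The two-stage methodology described above (a binary judgment predictor followed by an explanation generator) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `FJPEResult`, `build_explanation_prompt`, and `run_fjpe`, the prompt template, and the label encoding (1 = accepted, 0 = rejected) are all assumptions made for illustration, and the classifier and generator are toy stand-ins for a fine-tuned transformer and FactLegalLlama.

```python
from dataclasses import dataclass
from typing import Callable

# Every name in this sketch is hypothetical; the abstract specifies no API.


@dataclass
class FJPEResult:
    label: int        # assumed encoding: 1 = accepted, 0 = rejected
    explanation: str  # free-text rationale from the generator


def build_explanation_prompt(facts: str, label: int) -> str:
    """Format an instruction-style prompt (the template is an assumption)."""
    outcome = "accepted" if label == 1 else "rejected"
    return (
        "### Instruction:\n"
        "Explain the predicted outcome using only the case facts.\n"
        f"### Facts:\n{facts.strip()}\n"
        f"### Predicted outcome:\n{outcome}\n"
        "### Explanation:\n"
    )


def run_fjpe(
    facts: str,
    classify: Callable[[str], int],
    generate: Callable[[str], str],
) -> FJPEResult:
    """Stage 1: predict a binary label. Stage 2: generate an explanation."""
    label = classify(facts)
    if label not in (0, 1):
        raise ValueError(f"expected a binary label, got {label!r}")
    explanation = generate(build_explanation_prompt(facts, label))
    return FJPEResult(label=label, explanation=explanation)


# Toy stand-ins so the sketch runs without any model weights.
def toy_classifier(facts: str) -> int:
    # Placeholder rule; a real system would call a fine-tuned transformer.
    return 1 if "paid" in facts.lower() else 0


def toy_generator(prompt: str) -> str:
    # Placeholder text; a real system would call an instruction-tuned LLM.
    return "Placeholder explanation based on the supplied facts."


result = run_fjpe(
    "The appellant paid the full amount due.", toy_classifier, toy_generator
)
print(result.label, "|", result.explanation)
```

Passing the predictor and generator as plain callables keeps the two stages decoupled, so either stand-in could be swapped for a real model without changing `run_fjpe`.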
