Legal systems worldwide face exponential growth in the number of cases and documents. There is a pressing need for NLP and ML techniques that automatically process and understand legal documents in order to streamline the legal system. However, evaluating and comparing NLP models designed specifically for the legal domain is challenging. This paper addresses the challenge by proposing IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning. IL-TUR contains monolingual (English, Hindi) and multilingual (9 Indian languages) domain-specific tasks that address different aspects of the legal system from the point of view of understanding and reasoning over Indian legal documents. We present baseline models (including LLM-based ones) for each task, outlining the gap between model performance and the ground truth. To foster further research in the legal domain, we create a leaderboard (available at: https://exploration-lab.github.io/IL-TUR/) where the research community can upload and compare legal text understanding systems.