Large language models (LLMs) are entering legal workflows, yet we lack a jurisdiction-specific framework for assessing their baseline competence in these settings. We use India's public legal examinations as a transparent proxy. Our multi-year benchmark assembles objective screens from top national and state exams and evaluates open and frontier LLMs under real-world exam conditions. To probe beyond multiple-choice questions, we also include a lawyer-graded, paired-blinded study of long-form answers from the Supreme Court's Advocate-on-Record exam. This is, to our knowledge, the first exam-grounded, India-specific yardstick for LLM court-readiness released with datasets and protocols. We find that while frontier systems consistently clear historical cutoffs and often match or exceed recent top-scorer bands on objective exams, none surpasses the human topper on long-form reasoning. Grader notes converge on three reliability failure modes: procedural or format compliance, authority or citation discipline, and forum-appropriate voice and structure. These findings delineate where LLMs can assist (checks, cross-statute consistency, statute and precedent lookups) and where human leadership remains essential: forum-specific drafting and filing, procedural and relief strategy, reconciling authorities and exceptions, and ethical, accountable judgment.