We introduce IndiaFinBench, to our knowledge the first publicly available evaluation benchmark for assessing large language model (LLM) performance on Indian financial regulatory text. Existing financial NLP benchmarks draw exclusively from Western financial corpora (SEC filings, US earnings reports, and English-language financial news), leaving a significant gap in coverage of non-Western regulatory frameworks. IndiaFinBench addresses this gap with 406 expert-annotated question-answer pairs drawn from 192 documents sourced from the Securities and Exchange Board of India (SEBI) and the Reserve Bank of India (RBI), spanning four task types: regulatory interpretation (174 items), numerical reasoning (92 items), contradiction detection (62 items), and temporal reasoning (78 items). Annotation quality is validated through a model-based secondary pass (kappa=0.918 on contradiction detection) and a 60-item human inter-annotator agreement evaluation (kappa=0.611; 76.7% overall agreement). We evaluate twelve models under zero-shot conditions, with accuracy ranging from 70.4% (Gemma 4 E4B) to 89.7% (Gemini 2.5 Flash). All models substantially outperform a non-specialist human baseline of 60.0%. Numerical reasoning is the most discriminative task, with a 35.9 percentage-point spread across models. Bootstrap significance testing (10,000 resamples) reveals three statistically distinct performance tiers. The dataset, evaluation code, and all model outputs are available at https://github.com/rajveerpall/IndiaFinBench
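The abstract reports bootstrap significance testing with 10,000 resamples but does not say how the resampling is done. The sketch below shows one common way to run such a comparison: a paired, item-level bootstrap over per-item correctness that returns a percentile confidence interval for the accuracy difference between two models. The pairing, the 95% interval, and the name `paired_bootstrap_ci` are assumptions made for illustration, not details taken from the benchmark's evaluation code.

```python
import random


def paired_bootstrap_ci(correct_a, correct_b, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy(A) - accuracy(B) on the same items.

    correct_a and correct_b are equal-length sequences of 0/1 flags, one per
    benchmark item. Hypothetical helper; the paired design is an assumption.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same items")
    n = len(correct_a)
    if n == 0:
        raise ValueError("need at least one item")

    # Per-item difference keeps the pairing: +1 (only A right), -1 (only B right), 0 (tie).
    diffs = [a - b for a, b in zip(correct_a, correct_b)]
    observed = sum(diffs) / n

    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample items with replacement and recompute the mean difference.
        sample = rng.choices(diffs, k=n)
        means.append(sum(sample) / n)
    means.sort()

    # Percentile interval: cut alpha/2 of the resampled means from each tail.
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return observed, (lo, hi)
```

Under these assumptions, two models would fall into different tiers when the interval for their accuracy difference excludes zero. Pairing on items matters because every model answers the same 406 questions, so item difficulty is shared and should not be counted as independent noise.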