IndicDB -- Benchmarking Multilingual Text-to-SQL Capabilities in Indian Languages

Large Language Models (LLMs) have significantly advanced Text-to-SQL performance, but existing benchmarks focus predominantly on Western contexts and simplified schemas, leaving real-world, non-Western applications largely untested. We present IndicDB, a multilingual Text-to-SQL benchmark for evaluating cross-lingual semantic parsing across diverse Indic languages. The relational schemas are sourced from open-data platforms, including the National Data and Analytics Platform (NDAP) and the India Data Portal (IDP), so they reflect the complexity of real administrative data. IndicDB comprises 20 databases with 237 tables in total. To convert denormalized government data into rich relational structures, we employ an iterative three-agent framework (Architect, Auditor, Refiner) that enforces structural rigor and high relational density (11.85 tables per database on average; join depths up to six). Our task-generation pipeline is value-aware, difficulty-calibrated, and join-enforced, and it produces 15,617 tasks across English, Hindi, and five other Indic languages. We evaluate the cross-lingual semantic parsing performance of state-of-the-art models (DeepSeek v3.2, MiniMax 2.7, LLaMA 3.3, Qwen3) across these seven linguistic variants. Results show a 9.00% performance drop from English to Indic languages, revealing an "Indic Gap" driven by harder schema linking, increased structural ambiguity, and limited external knowledge. IndicDB thus serves as a rigorous benchmark for multilingual Text-to-SQL. Code and data: https://anonymous.4open.science/r/multilingualText2Sql-Indic--DDCC/
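The abstract does not say which metric underlies the reported performance drop, nor whether the 9.00% is an absolute or relative difference. As a minimal sketch, assuming an execution-match metric (a prediction counts as correct when it returns the same rows as the gold query) and an absolute gap in accuracy, a per-language comparison could look roughly like the following. The schema, the population figures, the queries, and the helper names (`run_query`, `execution_match`, `accuracy_by_language`) are all hypothetical illustrations and are not taken from IndicDB.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_match(conn, gold_sql, pred_sql):
    """True when the prediction returns the same rows as gold (order ignored)."""
    gold = run_query(conn, gold_sql)
    pred = run_query(conn, pred_sql)
    return gold is not None and gold == pred


def accuracy_by_language(conn, tasks):
    """Fraction of execution-matched predictions for each language code."""
    totals, hits = Counter(), Counter()
    for task in tasks:
        totals[task["lang"]] += 1
        hits[task["lang"]] += int(execution_match(conn, task["gold"], task["pred"]))
    return {lang: hits[lang] / totals[lang] for lang in totals}


# Toy two-table schema with made-up values (assumption, not IndicDB data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state (state_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE district (
    district_id INTEGER PRIMARY KEY,
    state_id INTEGER REFERENCES state(state_id),
    name TEXT,
    population INTEGER
);
INSERT INTO state VALUES (1, 'Kerala'), (2, 'Goa');
INSERT INTO district VALUES
    (1, 1, 'Idukki', 1100), (2, 1, 'Wayanad', 800), (3, 2, 'North Goa', 820);
""")

GOLD = (
    "SELECT s.name, SUM(d.population) FROM district d "
    "JOIN state s ON s.state_id = d.state_id GROUP BY s.name"
)
# Same result as GOLD, written with different aliases and join order.
EQUIVALENT = (
    "SELECT st.name, SUM(di.population) FROM state st "
    "JOIN district di ON di.state_id = st.state_id GROUP BY st.name"
)
# Schema-linking error: groups by district name and skips the join.
WRONG = "SELECT name, SUM(population) FROM district GROUP BY name"

tasks = [
    {"lang": "en", "gold": GOLD, "pred": EQUIVALENT},
    {"lang": "en", "gold": GOLD, "pred": GOLD},
    {"lang": "hi", "gold": GOLD, "pred": EQUIVALENT},
    {"lang": "hi", "gold": GOLD, "pred": WRONG},
]

acc = accuracy_by_language(conn, tasks)
indic_scores = [score for lang, score in acc.items() if lang != "en"]
# Absolute gap between English and the mean of the non-English variants.
gap = acc["en"] - sum(indic_scores) / len(indic_scores)
print(acc, gap)
```

Comparing results as multisets makes the check insensitive to row order, which is usually what is wanted unless the question itself asks for a sorted result. A prediction that fails to execute is simply counted as incorrect.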
