We address the task of routing natural language queries to the appropriate database in multi-database enterprise environments. We construct realistic benchmarks by extending existing NL-to-SQL datasets. Our study shows that routing becomes increasingly challenging as database repositories grow larger and overlap more in domain, and as queries become more ambiguous, which motivates more structured and robust reasoning-based solutions. By explicitly modelling schema coverage, structural connectivity, and fine-grained semantic alignment, our modular, reasoning-driven reranking strategy consistently outperforms embedding-only and direct LLM-prompting baselines across all metrics.