Large language models have significantly improved natural language interfaces to databases by translating user questions into executable queries. In particular, Text2Cypher focuses on generating Cypher queries for graph databases, enabling users to access graph data without query language expertise. Most existing Text2Cypher systems assume a single preselected graph database and generate queries over its known schema. However, real-world data is often distributed across multiple independent graph databases organized by domain or system boundaries, so the information needed to answer a question may span multiple sources. To address this limitation, we propose a shift from single-database query generation to multi-database query reasoning. Instead of assuming a fixed execution context, the system must reason about (i) which databases are relevant, (ii) how to decompose a question across them, and (iii) how to integrate the partial results. We formalize this setting through a three-phase roadmap: database routing, multi-database decomposition, and heterogeneous query reasoning across database types and query languages. This work provides a structured formulation of multi-database reasoning for Text2Cypher and identifies challenges in source selection, query decomposition, and result integration, with the aim of supporting more realistic and scalable natural language interfaces to graph databases.
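To make the three reasoning steps (i) to (iii) concrete, the following is a minimal illustrative sketch, not the method proposed in this work. All names (`GraphDB`, `route`, `decompose`, `integrate`) are hypothetical, routing is assumed to be simple keyword overlap with node labels, and decomposition is a placeholder for what would in practice be an LLM generating one Cypher query per selected database.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GraphDB:
    """Hypothetical description of one graph database: a name and its node labels."""
    name: str
    labels: set[str]


def route(question: str, databases: list[GraphDB]) -> list[GraphDB]:
    """Step (i): keep the databases whose node labels appear in the question.
    Assumption: plain keyword overlap stands in for a learned router."""
    words = {w.strip("?,.").lower() for w in question.split()}
    return [db for db in databases if words & {label.lower() for label in db.labels}]


def decompose(question: str, selected: list[GraphDB]) -> dict[str, str]:
    """Step (ii): produce one sub-question per selected database.
    Placeholder only: a real system would emit a Cypher query for each database."""
    return {db.name: f"[{db.name}] {question}" for db in selected}


def integrate(partials: dict[str, list[dict]], key: str) -> list[dict]:
    """Step (iii): merge partial result rows from all databases on a shared key."""
    merged: dict[object, dict] = {}
    for rows in partials.values():
        for row in rows:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Usage with two made-up databases and hand-written partial results.
dbs = [
    GraphDB("hr", {"Employee", "Department"}),
    GraphDB("sales", {"Customer", "Order"}),
]
question = "Which employee handled the largest order?"
selected = route(question, dbs)
sub_questions = decompose(question, selected)
combined = integrate(
    {"hr": [{"id": 1, "name": "Ann"}], "sales": [{"id": 1, "total": 250}]},
    key="id",
)
```

The sketch keeps the three steps as separate functions so that each can be replaced independently, for example swapping the keyword router for a schema-aware one without touching result integration.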