DynaQuery: A Self-Adapting Framework for Querying Structured and Multimodal Data

arXiv preprint. 15 pages, 2 figures, 10 tables. Source code and experimental artifacts are available at: https://github.com/aymanehassini/DynaQuery. The 'DynaQuery-Eval-5K' benchmark, introduced in this work, is also publicly available at: https://www.kaggle.com/datasets/aymanehassini/dynaquery-eval-5k-benchmark

The rise of Large Language Models (LLMs) has accelerated the long-standing goal of enabling natural language querying over complex, hybrid databases. Yet this ambition exposes a dual challenge: reasoning jointly over structured, multi-relational schemas and the semantic content of linked unstructured assets. To address this challenge, we present DynaQuery, a unified, self-adapting framework that serves as a practical blueprint for next-generation "Unbound Databases." At the heart of DynaQuery lies the Schema Introspection and Linking Engine (SILE), a novel systems primitive that elevates schema linking to a first-class query planning phase. We conduct a rigorous, multi-benchmark empirical evaluation of this structure-aware architecture against the prevalent unstructured Retrieval-Augmented Generation (RAG) paradigm. Our results demonstrate that the unstructured retrieval paradigm is architecturally susceptible to catastrophic contextual failures, such as SCHEMA_HALLUCINATION, leading to unreliable query generation. In contrast, our SILE-based design establishes a substantially more robust foundation, nearly eliminating this failure mode. Moreover, end-to-end validation on a complex, newly curated benchmark uncovers a key generalization principle: the transition from pure schema-awareness to holistic semantics-awareness. Taken together, our findings provide a validated architectural basis for developing natural language database interfaces that are robust, adaptable, and predictably consistent.
