While recent advances in large language models have significantly improved Text-to-SQL and table question answering systems, most existing approaches assume that all query-relevant information is explicitly represented in structured schemas. In practice, many enterprise databases contain hybrid schemas where structured attributes coexist with free-form textual fields, requiring systems to reason over both types of information. To address this challenge, we introduce OmniTQA, a cost-aware hybrid query processing framework that operates over both structured and semi-structured data. OmniTQA treats semantic reasoning as a first-class query operator, integrating LLM-based semantic operations with classical relational operators into an executable directed acyclic graph. To manage the high latency and cost of LLM inference, OmniTQA extends classical query optimization with data-aware planning, combining atomic query decomposition and operator reordering to minimize semantic workload. The framework also features a dual-engine execution architecture that dynamically routes tasks between a relational database and an LLM module, using operator-aware batching to scale efficiently. Extensive experiments across a diverse suite of structured and semi-structured table question answering benchmarks demonstrate that OmniTQA consistently outperforms existing symbolic, semantic, and hybrid baselines in both accuracy and cost efficiency. These gains are particularly pronounced for complex queries, large tables, and multi-relation schemas.
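To make the idea of operator reordering and batching concrete, the following is a minimal sketch and not OmniTQA's actual implementation. It assumes a plan made only of conjunctive filters (which commute, so reordering preserves the result), and every name in it (`Operator`, `reorder`, `execute`, the keyword-based `mentions_battery` stand-in for an LLM call) is hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass
class Operator:
    name: str
    kind: str  # "relational" or "semantic" (hypothetical labels)
    predicate: Callable[[Row], bool]


def reorder(plan: List[Operator]) -> List[Operator]:
    # Stable sort that moves cheap relational filters ahead of semantic ones.
    # Valid here only because all operators are conjunctive filters.
    return sorted(plan, key=lambda op: op.kind == "semantic")


def execute(rows: List[Row], plan: List[Operator],
            batch_size: int = 2) -> Tuple[List[Row], int, int]:
    """Run the plan; return (result, rows sent to the LLM, LLM batches)."""
    llm_rows = 0
    llm_batches = 0
    for op in plan:
        if op.kind == "semantic":
            kept: List[Row] = []
            # Group rows into batches, simulating one LLM request per batch.
            for i in range(0, len(rows), batch_size):
                batch = rows[i:i + batch_size]
                llm_batches += 1
                llm_rows += len(batch)
                kept.extend(r for r in batch if op.predicate(r))
            rows = kept
        else:
            rows = [r for r in rows if op.predicate(r)]
    return rows, llm_rows, llm_batches


def mentions_battery(row: Row) -> bool:
    # Assumption: a keyword check standing in for a real LLM judgment.
    return "battery" in str(row["review"]).lower()


table: List[Row] = [
    {"id": 1, "price": 50, "review": "Battery dies fast"},
    {"id": 2, "price": 150, "review": "Battery is weak"},
    {"id": 3, "price": 80, "review": "Great screen"},
    {"id": 4, "price": 200, "review": "Love it"},
]

naive_plan = [
    Operator("review mentions battery", "semantic", mentions_battery),
    Operator("price < 100", "relational", lambda r: r["price"] < 100),
]

naive_result = execute(table, naive_plan)
optimized_result = execute(table, reorder(naive_plan))

print("naive:    ", naive_result)
print("optimized:", optimized_result)
```

In this toy setting both plans return the same row, but the reordered plan sends two rows to the simulated LLM in one batch instead of four rows in two batches. A real planner would also need selectivity estimates and would have to handle operators that do not commute, such as joins and aggregations.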