In many industrial settings, users wish to ask questions in natural language, the answers to which require assembling information from diverse structured data sources. With the advent of Large Language Models (LLMs), applications can now translate natural language questions into a set of API calls or database calls, execute them, and combine the results into an appropriate natural language response. However, these applications remain impractical in realistic industrial settings because they do not cope with the data source heterogeneity that typifies such environments. In this work, we simulate the heterogeneity of real industry settings by introducing two extensions of the popular Spider benchmark dataset that require a combination of database and API calls. Then, we introduce a declarative approach to handling such data heterogeneity and demonstrate that it copes with data source heterogeneity significantly better than state-of-the-art LLM-based agentic or imperative code generation systems. Our augmented benchmarks are available to the research community.