Natural Language to SQL (NL2SQL) technology gives non-expert users who are unfamiliar with databases a way to use SQL for data analysis. Converting Natural Language to Business Intelligence (NL2BI) is a popular practical application of NL2SQL in production systems, and it introduces more challenges than NL2SQL does. In this paper, we propose ChatBI, a comprehensive and efficient technology for solving the NL2BI task. First, we analyze the interaction mode, an important aspect in which NL2SQL and NL2BI differ in use, and design a smaller and cheaper model to match this interaction mode. In BI scenarios, tables contain a huge number of columns, so existing NL2SQL methods that rely on Large Language Models (LLMs) for schema linking cannot proceed because of token limitations. The higher proportion of ambiguous columns in BI scenarios also makes schema linking difficult. ChatBI draws on existing view technology from the database community: it first decomposes the schema linking problem into a Single View Selection problem, then uses a smaller and cheaper machine learning model to select a single view with a significantly reduced number of columns. The columns of this view are then passed to the LLM as the columns required for schema linking. Finally, ChatBI proposes a phased process flow that differs from existing process flows and allows it to generate SQL containing complex semantics and comparison relations more accurately. We have deployed ChatBI on Baidu's data platform and integrated it into multiple product lines for large-scale evaluation on production tasks. The results highlight its practicality, versatility, and efficiency. In addition, on data tables and queries from our real BI scenarios, ChatBI achieved the best results among the current mainstream NL2SQL technologies we compared.
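The two-step idea described in the abstract (select a single view first, then pass only that view's columns to the LLM for schema linking) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `View` type, the view and column names, and in particular the token-overlap scorer, which is only a stand-in for the smaller machine learning model the abstract mentions and is not ChatBI's actual method.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """Hypothetical database view exposing a few columns of a wide BI table."""
    name: str
    columns: tuple[str, ...]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it on any non-alphanumeric character."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return set(cleaned.split())


def score_view(question: str, view: View) -> int:
    """Stand-in scorer (assumption): count question tokens that also occur
    in the view's column names. A trained model would replace this."""
    column_tokens = set().union(*(tokenize(col) for col in view.columns))
    return len(tokenize(question) & column_tokens)


def select_single_view(question: str, views: list[View]) -> View:
    """Single View Selection: keep the one view that best matches the question."""
    return max(views, key=lambda view: score_view(question, view))


def build_schema_prompt(question: str, view: View) -> str:
    """Build the LLM input from the selected view's columns only,
    so the prompt stays small even when the underlying tables are wide."""
    return (
        f"View: {view.name}\n"
        f"Columns: {', '.join(view.columns)}\n"
        f"Question: {question}"
    )


# Hypothetical views and question, for illustration only.
VIEWS = [
    View("traffic_view", ("date", "product_line", "daily_active_users", "page_views")),
    View("revenue_view", ("date", "product_line", "revenue", "ad_clicks")),
]
QUESTION = "What was the revenue per product line last week?"

selected = select_single_view(QUESTION, VIEWS)
prompt = build_schema_prompt(QUESTION, selected)
print(prompt)
```

The point of the sketch is the shape of the pipeline: the expensive LLM step only ever sees the columns of one view, while the cheap selection step is the only component that looks across all views.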