The emergence of natural language processing has revolutionized the way users interact with tabular data, enabling a shift from traditional query languages and manual plotting to more intuitive, language-based interfaces. The rise of large language models (LLMs) such as ChatGPT and its successors has further advanced this field, opening new avenues for natural language processing techniques. This survey presents a comprehensive overview of natural language interfaces for tabular data querying and visualization, which allow users to interact with data through natural language queries. We introduce the fundamental concepts and techniques underlying these interfaces, with particular emphasis on semantic parsing, the key technology that translates natural language into SQL queries or data visualization commands. We then examine recent advancements in the Text-to-SQL and Text-to-Vis problems from the perspectives of datasets, methodologies, metrics, and system designs. This includes an in-depth analysis of the influence of LLMs, highlighting their strengths, their limitations, and their potential for future improvement. Through this survey, we aim to provide a roadmap for researchers and practitioners interested in developing and applying natural language interfaces for data interaction in the era of large language models.