The emergence of natural language processing has changed how users interact with tabular data, enabling a shift from traditional query languages and manual plotting to more intuitive, language-based interfaces. The rise of large language models (LLMs) such as ChatGPT and its successors has further advanced this field, opening new avenues for natural language processing techniques. This survey presents a comprehensive overview of natural language interfaces for tabular data querying and visualization, which allow users to interact with data through natural language queries. We introduce the fundamental concepts and techniques underlying these interfaces, with particular emphasis on semantic parsing, the key technology for translating natural language into SQL queries or data visualization commands. We then examine recent advances in the Text-to-SQL and Text-to-Vis problems from the perspectives of datasets, methodologies, metrics, and system designs. This includes a detailed examination of the influence of LLMs, highlighting their strengths, limitations, and potential for future improvement. Through this survey, we aim to provide a roadmap for researchers and practitioners interested in developing and applying natural language interfaces for data interaction in the era of large language models.