Insights into Natural Language Database Query Errors: From Attention Misalignment to User Handling Strategies

Querying structured databases with natural language (NL2SQL) has remained a difficult problem for years. Recent advances in machine learning (ML), natural language processing (NLP), and large language models (LLMs) have led to significant improvements in performance, with the best model achieving about 85% accuracy on the Spider benchmark dataset. However, there is still no systematic understanding of the types and causes of errors in erroneous queries, or of the effectiveness of error-handling mechanisms. To bridge this gap, this work first built a taxonomy of errors made by four representative NL2SQL models, along with an in-depth analysis of those errors. Second, the causes of model errors were explored by analyzing how well model attention aligns with human attention on the natural language query. Last, a within-subjects user study with 26 participants was conducted to investigate the effectiveness of three interactive error-handling mechanisms in NL2SQL. The findings shed light on the design of model structures and of error discovery and repair strategies for future natural language data query interfaces.
