NoSQL databases have become increasingly popular due to their outstanding performance in handling large-scale, unstructured, and semi-structured data, which highlights the need for user-friendly interfaces that bridge the gap between non-technical users and complex database queries. In this paper, we introduce the Text-to-NoSQL task, which aims to convert natural language queries into NoSQL queries, thereby lowering the technical barrier for non-expert users. To promote research in this area, we developed a novel automated dataset construction process and released a large-scale, open-source dataset for this task, named TEND (short for Text-to-NoSQL Dataset). Additionally, we designed SMART, a multi-step framework for Text-to-NoSQL conversion that is assisted by an SLM (Small Language Model) and by RAG (Retrieval-Augmented Generation). To ensure a comprehensive evaluation, we also introduced a detailed set of metrics that assess a model's performance on both the generated query itself and its execution results. Our experimental results demonstrate the effectiveness of our approach and establish a benchmark for future research in this emerging field. We believe that our contributions will pave the way for more accessible and intuitive interactions with NoSQL databases.