$R^3$-NL2GQL: A Hybrid Models Approach for Accuracy Enhancing and Hallucinations Mitigation

While current NL2SQL approaches built on Foundation Models have achieved commendable results, applying them directly to Natural Language to Graph Query Language (NL2GQL) tasks is challenging because GQL and SQL expressions differ significantly and because there are numerous types of GQL. Our extensive experiments reveal that, in NL2GQL tasks, larger Foundation Models demonstrate superior cross-schema generalization abilities, while smaller Foundation Models struggle to improve their GQL generation capabilities through fine-tuning. However, after fine-tuning, smaller models exhibit better intent comprehension and higher grammatical accuracy. Diverging from rule-based and slot-filling techniques, we introduce $R^3$-NL2GQL, which employs both smaller and larger Foundation Models as reranker, rewriter, and refiner. The approach harnesses the comprehension ability of smaller models for information reranking and rewriting, and the exceptional generalization and generation capabilities of larger models to transform input natural language queries and code structure schema into any form of GQL. Recognizing the lack of established datasets in this nascent domain, we have created a bilingual dataset derived from graph database documentation and some open-source Knowledge Graphs (KGs). We tested our approach on this dataset, and the experimental results show that it delivers promising performance and robustness. Our code and dataset are available at https://github.com/zhiqix/NL2GQL
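To make the division of labour among the three roles concrete, here is a minimal sketch of how a reranker, a rewriter, and a refiner could be chained. The abstract does not specify any interface, so every name here (`Pipeline`, `top_k`, the `stub_*` functions) is hypothetical, and the stubs are plain-Python placeholders standing in for the smaller and larger models; they do not reproduce the authors' implementation and do not emit real GQL.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stage signatures (assumptions, not taken from the paper).
Reranker = Callable[[str, List[str]], List[str]]  # question, schema items -> items by relevance
Rewriter = Callable[[str, List[str]], str]        # question, kept items -> rewritten question
Refiner = Callable[[str, List[str]], str]         # rewritten question, kept items -> query text


@dataclass
class Pipeline:
    rerank: Reranker   # role assigned to a smaller model in the abstract
    rewrite: Rewriter  # role assigned to a smaller model in the abstract
    refine: Refiner    # role assigned to a larger model in the abstract
    top_k: int = 3     # assumed cut-off for how many schema items are kept

    def run(self, question: str, schema_items: List[str]) -> str:
        kept = self.rerank(question, schema_items)[: self.top_k]
        rewritten = self.rewrite(question, kept)
        return self.refine(rewritten, kept)


def stub_rerank(question: str, schema_items: List[str]) -> List[str]:
    """Placeholder: put items whose tag name occurs in the question first."""
    q = question.lower()
    return sorted(schema_items, key=lambda item: 0 if item.split("(")[0] in q else 1)


def stub_rewrite(question: str, kept: List[str]) -> str:
    """Placeholder: only normalises whitespace."""
    return question.strip()


def stub_refine(question: str, kept: List[str]) -> str:
    """Placeholder: returns a marker string instead of a real graph query."""
    return f"<GQL for {question!r} using {kept}>"


if __name__ == "__main__":
    pipe = Pipeline(stub_rerank, stub_rewrite, stub_refine, top_k=1)
    schema = ["movie(title)", "player(name, age)", "team(name)"]
    print(pipe.run("  Which player is oldest? ", schema))
```

Passing the stages in as callables keeps the data flow visible while leaving open which model backs each role.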

