Natural language question answering (QA) over structured data sources such as tables and knowledge graphs has been widely investigated, especially with Large Language Models (LLMs) in recent years. The main solutions include question-to-formal-query parsing and retrieval-based answer generation. However, current methods of the former kind often generalize poorly and fail to handle multiple types of sources, while the latter are limited in trustworthiness. In this paper, we propose TrustUQA, a trustworthy QA framework that can simultaneously support multiple types of structured data in a unified way. To this end, it adopts an LLM-friendly, unified knowledge representation method called Condition Graph (CG), and uses an LLM- and demonstration-based two-level method for CG querying. For enhancement, it is also equipped with dynamic demonstration retrieval. We have evaluated TrustUQA on 5 benchmarks covering 3 types of structured data. It outperforms 2 existing unified structured data QA methods. In comparison with baselines that are specific to one data type, it achieves state-of-the-art performance on 2 of the datasets. Furthermore, we have demonstrated the potential of our method for two more general QA tasks: QA over mixed structured data and QA across structured data. The code is available at https://github.com/zjukg/TrustUQA.
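As a rough illustration of what dynamic demonstration retrieval can look like, the sketch below selects the few stored demonstrations whose questions are most similar to the incoming question, so that they can be placed in the LLM prompt. It is a minimal sketch under stated assumptions and not the TrustUQA implementation: the bag-of-words cosine similarity, the names `retrieve_demonstrations` and `DEMO_POOL`, and the placeholder query strings are all hypothetical, and the abstract does not say which similarity measure or demonstration format the framework uses.

```python
import math
import re
from collections import Counter

# Hypothetical demonstration pool: each entry pairs a question with a
# placeholder standing in for its formal query (real query syntax not shown).
DEMO_POOL = [
    {"question": "Which city is the capital of France?", "query": "<placeholder query A>"},
    {"question": "How many goals did the team score in 2010?", "query": "<placeholder query B>"},
    {"question": "Who directed the film Inception?", "query": "<placeholder query C>"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_demonstrations(question, pool, k=2):
    """Return the k demonstrations whose questions are most similar to `question`.

    Assumption: similarity is bag-of-words cosine; an embedding model could be
    swapped in without changing the surrounding logic.
    """
    q_vec = Counter(tokenize(question))
    scored = [
        (cosine(q_vec, Counter(tokenize(demo["question"]))), index)
        for index, demo in enumerate(pool)
    ]
    # Sort by descending similarity; ties keep the original pool order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pool[index] for _, index in scored[:k]]
```

Calling `retrieve_demonstrations("What is the capital city of Germany?", DEMO_POOL, k=2)` ranks the capital-city demonstration first, since it shares the most tokens with the new question. The retrieved demonstrations would then be formatted into the prompt ahead of the question.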