Natural language question answering (QA) over structured data sources such as tables and knowledge graphs (KGs) has been widely investigated, including with Large Language Models (LLMs). The main solutions are parsing questions into formal queries and retrieval-based answer generation. However, methods of the former kind often generalize poorly and cannot handle multiple data sources simultaneously, while methods of the latter kind offer limited trustworthiness. In this paper, we propose UnifiedTQA, a trustworthy QA framework that supports multiple types of structured data simultaneously in a unified way. To this end, it adopts an LLM-friendly, unified knowledge representation called Condition Graph (CG), and queries CGs with a two-level method based on an LLM and demonstrations. It is further enhanced with dynamic demonstration retrieval. We evaluated UnifiedTQA on 5 benchmarks covering 3 types of structured data. It outperforms 2 existing unified structured-data QA methods, and compared with baselines designed for a specific data type, it achieves state-of-the-art performance on 2 of the benchmarks. Furthermore, we demonstrate the potential of our method for more general QA tasks, including QA over mixed structured data and QA across structured data.