Table reasoning has shown remarkable progress in a wide range of table-based tasks. These challenging tasks require reasoning over both free-form natural language (NL) questions and semi-structured tabular data. However, previous table reasoning solutions suffer from significant performance degradation on "huge" tables. In addition, most existing methods struggle to reason over complex questions because the essential information is either missing or scattered across different places. To alleviate these challenges, we propose a table provider, TAP4LLM, built on versatile sampling, augmentation, and packing methods to achieve effective semi-structured data reasoning using large language models (LLMs). TAP4LLM 1) decomposes raw tables into sub-tables with specific rows or columns based on rules or semantic similarity; 2) augments table information by extracting semantic and statistical metadata from raw tables and retrieving relevant knowledge from trustworthy knowledge sources (e.g., Wolfram Alpha, Wikipedia); and 3) packs the sampled tables with the augmented knowledge into sequence prompts for LLM reasoning while balancing the token allocation trade-off. We show that TAP4LLM allows different components to be used as plug-ins, enhancing LLMs' understanding of structured data in diverse tabular tasks.
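The three stages (sampling, augmentation, packing) can be illustrated with a minimal Python sketch. This is not TAP4LLM's implementation; it rests on several simplifying assumptions: Jaccard word overlap stands in for embedding-based semantic similarity, only numeric column statistics are extracted (no semantic metadata or external knowledge retrieval), whitespace splitting approximates an LLM tokenizer, and the names `sample_rows`, `numeric_stats`, `pack`, and the `table_share` parameter are hypothetical.

```python
import re
from dataclasses import dataclass
from statistics import mean

@dataclass
class Table:
    header: list[str]
    rows: list[list[str]]

def words(text: str) -> set[str]:
    # Lowercased word set; a crude stand-in for an embedding model (assumption).
    return set(re.findall(r"\w+", text.lower()))

def sample_rows(table: Table, question: str, k: int) -> Table:
    """Sampling: keep the k rows most similar to the question, in original order.

    Jaccard word overlap replaces semantic (embedding) similarity here.
    """
    q = words(question)

    def score(i: int) -> float:
        r = words(" ".join(table.rows[i]))
        union = q | r
        return len(q & r) / len(union) if union else 0.0

    top = sorted(range(len(table.rows)), key=score, reverse=True)[:k]
    return Table(table.header, [table.rows[i] for i in sorted(top)])

def numeric_stats(table: Table) -> list[str]:
    """Augmentation: min/max/mean for every fully numeric column of the raw table."""
    notes = []
    for j, name in enumerate(table.header):
        try:
            values = [float(row[j]) for row in table.rows]
        except ValueError:
            continue  # skip columns containing non-numeric cells
        if values:
            notes.append(
                f"{name}: min={min(values)}, max={max(values)}, mean={mean(values):.2f}"
            )
    return notes

def count_tokens(text: str) -> int:
    # Whitespace split approximates an LLM tokenizer (assumption).
    return len(text.split())

def pack(table: Table, notes: list[str], question: str,
         budget: int, table_share: float = 0.7) -> str:
    """Packing: spend up to table_share of the budget on rows, the rest on notes.

    The budget covers table and metadata lines only, not the fixed labels.
    Table tokens left unused carry over to the metadata section.
    """
    table_budget = int(budget * table_share)
    lines = [" | ".join(table.header)]  # the header is always kept
    used = count_tokens(lines[0])
    for row in table.rows:
        line = " | ".join(row)
        if used + count_tokens(line) > table_budget:
            break  # rows beyond the table budget are dropped
        lines.append(line)
        used += count_tokens(line)

    remaining = budget - used
    kept_notes = []
    for note in notes:
        cost = count_tokens(note)
        if cost > remaining:
            break  # notes are kept in priority order until the budget runs out
        kept_notes.append(note)
        remaining -= cost

    parts = ["Table:", *lines]
    if kept_notes:
        parts += ["Metadata:", *kept_notes]
    parts.append(f"Question: {question}")
    return "\n".join(parts)

raw = Table(
    ["city", "country", "population"],
    [["Paris", "France", "2100000"],
     ["Berlin", "Germany", "3600000"],
     ["Madrid", "Spain", "3300000"]],
)
question = "What is the population of Berlin in Germany?"
prompt = pack(sample_rows(raw, question, k=2), numeric_stats(raw), question, budget=30)
print(prompt)
```

In this sketch, raising `table_share` keeps more rows at the expense of metadata, and lowering it does the reverse; the actual token-allocation strategy used by TAP4LLM may differ.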