While extensive research has explored the use of large language models (LLMs) for table-based reasoning, most approaches struggle with scalability when applied to large tables. To maintain the superior comprehension abilities of LLMs in these scenarios, we introduce ALTER (Augmentation for Large-Table-Based Reasoning), a framework designed to harness the latent augmentation potential in both free-form natural language (NL) questions, via the query augmentor, and semi-structured tabular data, via the table augmentor. By utilizing only a small subset of relevant data from the table and supplementing it with pre-augmented schema, semantic, and literal information, ALTER achieves outstanding performance on table-based reasoning benchmarks. We also provide a detailed analysis of large-table scenarios, comparing different methods and various partitioning principles. In these scenarios, our method outperforms all other approaches and exhibits robustness and efficiency against perturbations.