Large Language Models (LLMs) have been shown to be capable of a wide range of tasks, yet their ability to interpret and reason over tabular data remains underexplored. In this context, this study investigates the problem from three core perspectives: the robustness of LLMs to structural perturbations in tables, a comparative analysis of textual and symbolic reasoning on tables, and the potential of boosting model performance by aggregating multiple reasoning pathways. We find that structural variations of tables presenting the same content lead to a notable performance decline, particularly in symbolic reasoning tasks. This motivates us to propose a method for table structure normalization. Moreover, textual reasoning slightly outperforms symbolic reasoning, and a detailed error analysis reveals that each exhibits different strengths depending on the specific task. Notably, aggregating textual and symbolic reasoning pathways, bolstered by a mix self-consistency mechanism, achieves state-of-the-art (SOTA) performance, with an accuracy of 73.6% on WIKITABLEQUESTIONS, a substantial advancement over existing table processing paradigms for LLMs.
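As a rough illustration of how answers from the two pathways might be aggregated, the following sketch pools sampled answers from textual and symbolic reasoning and takes a majority vote over the combined pool. This is a minimal sketch under stated assumptions, not the paper's implementation. The function names `normalize_answer` and `mix_self_consistency`, the lowercase/whitespace answer normalization, the treatment of failed symbolic runs as `None`, and the first-seen tie-breaking rule are all hypothetical choices.

```python
from collections import Counter
from typing import Iterable, Optional


def normalize_answer(ans: str) -> str:
    # Hypothetical normalization: lowercase and collapse whitespace so that
    # trivially different strings ("Paris" vs "paris ") vote together.
    return " ".join(ans.lower().split())


def mix_self_consistency(
    textual_answers: Iterable[Optional[str]],
    symbolic_answers: Iterable[Optional[str]],
) -> Optional[str]:
    """Majority vote over answers pooled from both reasoning pathways.

    Assumption: a failed sample (e.g. generated code that raised an error)
    is represented as None and simply skipped.
    """
    pooled = [
        normalize_answer(a)
        for a in list(textual_answers) + list(symbolic_answers)
        if a is not None
    ]
    if not pooled:
        return None
    # Counter.most_common orders equal counts by first occurrence, so ties
    # go to the answer that appeared earliest in the pooled list.
    return Counter(pooled).most_common(1)[0][0]


if __name__ == "__main__":
    textual = ["Paris", "paris", "Lyon"]
    symbolic = ["Lyon", None, "Paris "]
    print(mix_self_consistency(textual, symbolic))
```

Pooling both pathways into a single vote, rather than voting within each pathway separately, lets the two kinds of reasoning correct each other when their errors differ. This is one plausible reading of the complementary strengths the error analysis describes.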