Large Language Models (LLMs) excel in natural language understanding, but their capability for complex mathematical reasoning over a combination of structured tables and unstructured text remains uncertain. This study explores LLMs' mathematical reasoning on four financial tabular question-answering datasets: TATQA, FinQA, ConvFinQA, and Multihiertt. Through extensive experiments with various models and prompting techniques, we assess how LLMs adapt to complex tables and mathematical tasks. We focus on sensitivity to table complexity and on performance variation as the number of arithmetic reasoning steps increases. The results provide insights into LLMs' capabilities and limitations in handling complex mathematical scenarios over semi-structured tables. Finally, we introduce a novel prompting technique tailored to semi-structured documents that matches or outperforms other baselines while providing a nuanced understanding of LLMs' abilities for such a task.