Tables are a fundamental format for representing structured relational data. Although current language models (LMs) excel at many text-based tasks, they still struggle with table understanding because of the complex characteristics of tabular data, such as its structured nature. In this paper, we aim to improve the table understanding of LMs. We identify four key challenges: 1) difficulty in locating target data, 2) deficiency in table semantics, 3) numerical inaccuracies in textual reasoning, and 4) semantic inflexibility in symbolic reasoning. To address these challenges, we propose TableMaster, a recipe and comprehensive framework that integrates multiple solutions. TableMaster first extracts relevant table content and verbalizes it with enriched semantic context. We also introduce adaptive reasoning, a flexible approach that dynamically adjusts between textual and symbolic reasoning, tailoring the reasoning process to each query. Extensive analyses and experiments support our findings and demonstrate the effectiveness of TableMaster. On the WikiTQ dataset, TableMaster achieves an accuracy of 78.13% using GPT-4o-mini, surpassing existing baselines. We hope this work will serve as a practical step toward more robust and reliable table understanding.
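To make the idea of adaptive reasoning concrete, the following is a minimal Python sketch of routing a query to either textual or symbolic reasoning. It is an illustration under stated assumptions, not TableMaster's implementation: the abstract does not say how the choice is made, so the keyword heuristic, the cue list `NUMERIC_CUES`, and the helper names `choose_strategy`, `verbalize`, and `symbolic_sum` are all hypothetical.

```python
from typing import Dict, List

# A table is modeled as a list of rows, each mapping column name to cell text.
Table = List[Dict[str, str]]

# Hypothetical cue phrases suggesting that a query needs arithmetic.
# The abstract does not describe the actual decision rule.
NUMERIC_CUES = ("sum", "total", "average", "how many", "difference")


def choose_strategy(question: str) -> str:
    """Return 'symbolic' for arithmetic-looking queries, else 'textual'."""
    q = question.lower()
    return "symbolic" if any(cue in q for cue in NUMERIC_CUES) else "textual"


def verbalize(table: Table) -> str:
    """Render each row as a short sentence, a toy stand-in for verbalization."""
    sentences = []
    for row in table:
        parts = [f"{col} is {val}" for col, val in row.items()]
        sentences.append("; ".join(parts) + ".")
    return " ".join(sentences)


def symbolic_sum(table: Table, column: str) -> float:
    """Toy symbolic step: compute a column sum exactly with code."""
    return sum(float(row[column]) for row in table)


if __name__ == "__main__":
    table = [
        {"team": "A", "score": "3"},
        {"team": "B", "score": "4.5"},
    ]
    question = "What is the total score?"
    if choose_strategy(question) == "symbolic":
        print(symbolic_sum(table, "score"))
    else:
        print(verbalize(table))
```

In this sketch, arithmetic-style queries go to code so that numbers are computed exactly, while other queries fall back to a verbalized table that a language model could read as text. A real system would presumably let the model itself make this decision rather than rely on fixed keywords.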