In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities in parsing textual data and generating code. However, their performance on tasks involving tabular data, especially those requiring symbolic reasoning, suffers from the structural variance and inconsistent cell values commonly found in web tables. In this paper, we introduce NormTab, a novel framework that improves the symbolic reasoning performance of LLMs by normalizing web tables. We study table normalization as a stand-alone, one-time preprocessing step that uses LLMs to support symbolic reasoning over tabular data. Our experimental evaluation on challenging web table datasets, such as WikiTableQuestions and TabFact, demonstrates that NormTab significantly improves symbolic reasoning performance, showcasing the importance and effectiveness of web table normalization for LLM-based symbolic reasoning tasks.
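To make the idea of one-time cell-value normalization concrete, below is a minimal rule-based sketch of the kind of cleanup the abstract describes. This is an illustrative assumption, not NormTab's actual method: the paper delegates this step to an LLM, and the function name and rules here are hypothetical. The goal is the same, though: map noisy web-table strings to clean, typed values so that downstream symbolic tools (e.g., SQL queries) can operate on them.

```python
import re

def normalize_cell(cell: str):
    """Normalize one web-table cell value for symbolic reasoning.

    Hypothetical rule-based stand-in for LLM-based normalization:
    maps missing-value markers to None, strips footnote markers,
    and extracts leading numbers (dropping units and separators).
    """
    cell = cell.strip()
    # Treat common missing-value markers as None
    if cell.lower() in {"", "n/a", "na", "-", "--"}:
        return None
    # Drop footnote markers such as "[1]" or "*"
    cell = re.sub(r"\[\d+\]|\*", "", cell).strip()
    # Extract a leading number, tolerating thousands separators and units
    m = re.match(r"^-?[\d,]+(?:\.\d+)?", cell)
    if m:
        num = m.group(0).replace(",", "")
        return float(num) if "." in num else int(num)
    return cell  # leave non-numeric strings as-is

# A noisy web-table column before and after normalization
raw = ["1,234 km", "N/A", "987[1]", " 56.5 "]
clean = [normalize_cell(c) for c in raw]
print(clean)  # -> [1234, None, 987, 56.5]
```

Because normalization runs once per table rather than once per question, its cost is amortized across every symbolic query issued against that table.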