Table question answering (TableQA) is a fundamental task in natural language processing (NLP), and the strong reasoning capabilities of large language models (LLMs) have brought significant advances to it. However, as real-world applications involve increasingly complex questions and larger tables, they introduce substantial noise, which severely degrades reasoning performance. To address this challenge, we focus on improving two core capabilities: Relevance Filtering, which identifies and retains only the information that is truly relevant to reasoning, and Table Pruning, which reduces table size while preserving essential content. Building on these two capabilities, we propose EnoTab, a dual denoising framework for complex questions and large-scale tables. Specifically, we first perform Evidence-based Question Denoising: the question is decomposed into minimal semantic units, and units irrelevant to answer reasoning are filtered out according to consistency and usability criteria. We then propose Evidence Tree-guided Table Denoising, which constructs an explicit and transparent table pruning path that removes irrelevant data step by step. At each pruning step, we observe the intermediate state of the table and apply a post-order node rollback mechanism to handle abnormal table states, ultimately producing a highly reliable sub-table for final answer reasoning. Extensive experiments show that EnoTab achieves strong performance on TableQA tasks involving complex questions and large-scale tables, confirming its effectiveness.
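To make the pruning-with-rollback idea concrete, the following is a minimal sketch and not EnoTab's actual implementation. The abstract does not specify the tree structure, the pruning operations, or what counts as an abnormal table state, so everything here is an assumption made for illustration: the `EvidenceNode` class, the `is_abnormal` check (an empty table is treated as abnormal), the `prune_post_order` function, and the toy table. Each node holds one pruning operation; children are processed before their parent (post-order), and a step whose result is abnormal is rolled back to the previous table state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Table = List[Row]


@dataclass
class EvidenceNode:
    """Hypothetical tree node: one pruning operation plus the sub-steps it depends on."""
    name: str
    prune: Callable[[Table], Table]
    children: List["EvidenceNode"] = field(default_factory=list)


def is_abnormal(table: Table) -> bool:
    # Assumption: a table with no rows, or whose rows have no columns, is abnormal.
    return len(table) == 0 or all(len(row) == 0 for row in table)


def prune_post_order(node: EvidenceNode, table: Table, log: List[str]) -> Table:
    """Apply pruning steps in post-order, rolling back any step that breaks the table."""
    # Visit the children first so the parent operates on their pruned result.
    for child in node.children:
        table = prune_post_order(child, table, log)

    candidate = node.prune(table)
    if is_abnormal(candidate):
        # Rollback: discard this step and keep the table as it was before the node.
        log.append(f"rollback:{node.name}")
        return table

    log.append(f"apply:{node.name}")
    return candidate


# Toy data, invented purely for illustration.
table: Table = [
    {"city": "A", "year": 2020, "value": 10},
    {"city": "A", "year": 2021, "value": 12},
    {"city": "B", "year": 2021, "value": 7},
]

tree = EvidenceNode(
    name="keep_columns",
    prune=lambda t: [{"city": r["city"], "value": r["value"]} for r in t],
    children=[
        EvidenceNode("year_2021", lambda t: [r for r in t if r["year"] == 2021]),
        # No row matches this filter, so the step empties the table and is rolled back.
        EvidenceNode("city_C", lambda t: [r for r in t if r["city"] == "C"]),
    ],
)

log: List[str] = []
sub_table = prune_post_order(tree, table, log)
print(log)
print(sub_table)
```

In this sketch the faulty `city_C` filter is skipped rather than aborting the whole pruning path, so the final sub-table still contains the rows selected by the earlier, valid step. A real system would presumably use a richer notion of abnormality and might repair the node rather than simply skip it.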