Tabular question answering (TQA) is a challenging setting for neural systems because it requires joint reasoning over natural language and large amounts of semi-structured data. Humans use programmatic tools such as filters to transform data before processing it. Language models in TQA, by contrast, process tables directly, which leads to information loss as table size increases. In this paper, we propose ToolWriter, which generates query-specific programs and detects when to apply them. These programs transform tables so that they align with the TQA model's capabilities. Focusing ToolWriter on generating row-filtering tools improves the state of the art on WikiTableQuestions and WikiSQL, with the largest performance gains on long tables. By investigating headroom, our work highlights the broader potential of combining programmatic tools with neural components to manipulate large amounts of structured data.
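The following minimal Python sketch makes the row-filtering idea concrete. It is not ToolWriter itself. It assumes that a table is a list of row dictionaries and that a generated tool is a simple predicate over rows. The names `make_row_filter` and `maybe_apply`, the `max_rows` threshold, and the fallback rule (keep the full table when a filter removes every row) are hypothetical stand-ins for the program generation and applicability detection described above. The actual tool format and detection method are not specified here.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Table = List[Row]


def make_row_filter(column: str, keyword: str) -> Callable[[Row], bool]:
    """Hypothetical query-specific tool: keep rows whose `column` contains `keyword`."""
    def keep(row: Row) -> bool:
        return keyword.lower() in row.get(column, "").lower()
    return keep


def maybe_apply(table: Table, tool: Callable[[Row], bool], max_rows: int = 3) -> Table:
    """Hypothetical applicability check: filter only long tables, and only if rows survive."""
    if len(table) <= max_rows:
        return table  # short table: pass it to the QA model unchanged
    filtered = [row for row in table if tool(row)]
    # If the filter removed every row, it was probably wrong; fall back to the full table.
    return filtered if filtered else table


# Toy table, for illustration only.
table: Table = [
    {"Year": "2004", "City": "Athens", "Country": "Greece"},
    {"Year": "2008", "City": "Beijing", "Country": "China"},
    {"Year": "2012", "City": "London", "Country": "United Kingdom"},
    {"Year": "2016", "City": "Rio de Janeiro", "Country": "Brazil"},
]

# Query: "Which city hosted in 2008?" -> a tool a generator might emit for it.
tool = make_row_filter("Year", "2008")
print(maybe_apply(table, tool))
# → [{'Year': '2008', 'City': 'Beijing', 'Country': 'China'}]
```

In this toy example, the four-row table exceeds the assumed budget of three rows. The year filter is therefore applied, and only the matching row reaches the QA model. A filter that matches nothing would leave the table unchanged.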