Understanding tabular data has long been a central problem in natural language processing. The emergence of large language models (LLMs) such as ChatGPT has prompted a wave of work that applies these models to table-based question answering. This work investigates how to improve an LLM's grasp of both the structure and the content of a table, so that the model can answer questions about that table accurately. To this end, we design a dedicated module that serializes tables into a form suitable as LLM input, and we add a correction mechanism to the model that rectifies potential errors. Experimental results show that, although the proposed method trails the state of the art (SOTA) by approximately 11.7% on overall metrics, it outperforms the SOTA by about 1.2% on specific datasets. To our knowledge, this is the first work to apply LLMs to table-based question answering in this way, improving the model's comprehension of both table structure and content.
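The abstract does not describe the serialization format itself. Purely as an illustration of what serializing a table for an LLM can look like, the Python sketch below assumes a simple row-wise linearization: the column names are listed once, each row is then written as `column: value` pairs, and the result is wrapped in a question-answer prompt. The functions `serialize_table` and `build_prompt`, the text layout, and the prompt template are all hypothetical and are not the module proposed in this work.

```python
from typing import Any, List, Sequence


def serialize_table(
    header: Sequence[str],
    rows: Sequence[Sequence[Any]],
    caption: str = "",
) -> str:
    """Linearize a table into plain text, one line per row.

    Hypothetical scheme: each row is written as "column: value" pairs so the
    model sees every cell next to its column name.
    """
    lines: List[str] = []
    if caption:
        lines.append(f"Table: {caption}")
    # List the schema once so the model knows the column order up front.
    lines.append("Columns: " + " | ".join(header))
    for i, row in enumerate(rows, start=1):
        # Reject ragged rows instead of silently misaligning cells and headers.
        if len(row) != len(header):
            raise ValueError(f"row {i} has {len(row)} cells, expected {len(header)}")
        cells = "; ".join(f"{col}: {val}" for col, val in zip(header, row))
        lines.append(f"Row {i}: {cells}")
    return "\n".join(lines)


def build_prompt(
    header: Sequence[str],
    rows: Sequence[Sequence[Any]],
    question: str,
    caption: str = "",
) -> str:
    """Combine the serialized table and the question into one hypothetical prompt."""
    table_text = serialize_table(header, rows, caption)
    return f"{table_text}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    # Toy table for illustration only.
    header = ["Player", "Team", "Goals"]
    rows = [["Ana", "Red", 3], ["Ben", "Blue", 5]]
    print(build_prompt(header, rows, "Which player scored more goals?"))
```

Repeating the column name next to every cell keeps each value tied to its header in the flat text, but the prompt becomes longer. A Markdown-style grid is a more compact alternative. In either case, the resulting string would be passed to the model as its input.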