Understanding tabular data has long been a focus of research in natural language processing. The emergence of large language models (LLMs) such as ChatGPT has prompted a wave of work that applies these models to table-based question answering. Our work investigates methods that improve the ability of LLMs to interpret both the structure and the content of tables, so that they can answer questions about those tables accurately. To this end, we design a dedicated module that serializes tables into a form suitable as LLM input. We also introduce a correction mechanism within the model to rectify potential inaccuracies. Experimental results show that, although our method trails the state of the art (SOTA) by approximately 11.7% on overall metrics, it surpasses the SOTA by about 1.2% on specific datasets. To our knowledge, this research marks the first application of large language models to table-based question answering tasks, and it enhances the model's comprehension of both table structure and content.
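To make the idea of table serialization concrete, the following is a minimal sketch of how a table could be flattened into text and combined with a question to form an LLM prompt. It is an illustration only and not the module proposed in this work: the function names `serialize_table` and `build_prompt`, the pipe-delimited layout, and the prompt wording are all assumptions chosen for the example.

```python
from typing import Sequence


def serialize_table(header: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Flatten a table into pipe-delimited text, one line per row.

    Hypothetical helper: the pipe-delimited layout is an assumption,
    not the serialization format used by the authors.
    """
    lines = [" | ".join(str(h) for h in header)]
    for i, row in enumerate(rows, start=1):
        # Reject ragged rows so that cells stay aligned with their columns.
        if len(row) != len(header):
            raise ValueError(
                f"row {i} has {len(row)} cells, expected {len(header)}"
            )
        lines.append(" | ".join(str(cell) for cell in row))
    return "\n".join(lines)


def build_prompt(
    header: Sequence[str], rows: Sequence[Sequence[object]], question: str
) -> str:
    """Combine the serialized table and a question into one prompt string.

    Hypothetical helper: the prompt wording is an assumption.
    """
    table_text = serialize_table(header, rows)
    return f"Table:\n{table_text}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    header = ["City", "Population"]
    rows = [["Oslo", 709000], ["Bergen", 289000]]
    print(build_prompt(header, rows, "Which city has the larger population?"))
```

Keeping the header on the first line and one record per line preserves the row and column alignment in plain text, which is one simple way to expose table structure to a model that only accepts a token sequence.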