Web tables contain a large amount of valuable knowledge and have inspired tabular language models aimed at tackling table interpretation (TI) tasks. In this paper, we analyse a widely used benchmark dataset for evaluating TI tasks, focusing in particular on the entity linking task. Our analysis reveals that this dataset is overly simplified, which potentially reduces its effectiveness for thorough evaluation and means it fails to accurately represent tables as they appear in the real world. To overcome this drawback, we construct and annotate a new, more challenging dataset. In addition to the new dataset, we introduce a novel problem aimed at addressing the entity linking task: named entity recognition within cells. Finally, we propose a prompting framework for evaluating newly developed large language models (LLMs) on this novel TI task. We conduct experiments on prompting LLMs under various settings, using both random and similarity-based selection to choose the few-shot examples presented to the models. Our ablation study provides insights into the impact of these few-shot examples. Additionally, we perform a qualitative analysis to identify the challenges encountered by the models and to understand the limitations of the proposed dataset.
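As an illustration of the two example-selection settings mentioned above, the following is a minimal sketch rather than the framework used in our experiments. It assumes that each candidate example is a plain string (for instance, a serialized table cell) and that similarity is measured by token-level Jaccard overlap; the function names `select_random` and `select_similar` and the choice of Jaccard similarity are hypothetical and introduced only for this sketch.

```python
import random
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split on whitespace to get a token set."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two token sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_random(pool: List[str], k: int, seed: int = 0) -> List[str]:
    """Random selection: draw k examples from the pool without replacement."""
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    return rng.sample(pool, min(k, len(pool)))


def select_similar(pool: List[str], query: str, k: int) -> List[str]:
    """Similarity-based selection: keep the k examples closest to the query.

    Assumption: closeness is token-level Jaccard overlap, chosen here only
    because it needs no external model.
    """
    query_tokens = tokenize(query)
    ranked = sorted(
        pool,
        key=lambda example: jaccard(tokenize(example), query_tokens),
        reverse=True,
    )
    return ranked[:k]
```

In this sketch, the selected examples would then be placed in the prompt ahead of the query cell. Any other similarity measure, such as one based on embeddings, could be substituted in `select_similar` without changing the surrounding logic.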