As an important component of data exploration and integration, Column Type Annotation (CTA) aims to label the columns of a table with one or more semantic types. With the recent development of Large Language Models (LLMs), researchers have started to explore the possibility of using LLMs for CTA, leveraging their strong zero-shot capabilities. In this paper, we build on this promising work and improve LLM-based methods for CTA by showing how a Knowledge Graph (KG) can be used to augment the context information provided to the LLM. Our approach, called RACOON, combines the LLM's pre-trained parametric knowledge with non-parametric knowledge retrieved from the KG during generation to improve the LLM's performance on CTA. Our experiments show that RACOON achieves an improvement of up to 0.21 micro-F1 over vanilla LLM inference.
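To make the idea of KG-augmented context concrete, the sketch below shows one plausible way to assemble a CTA prompt: candidate semantic types retrieved from a (toy, in-memory) knowledge graph for sampled cell values are appended to the prompt alongside the column contents. All names, the toy KG, and the prompt layout are illustrative assumptions, not RACOON's actual pipeline.

```python
# Hypothetical sketch of KG-augmented Column Type Annotation (CTA).
# The dict-based "KG" and prompt format are illustrative assumptions only.

def kg_lookup(value, kg):
    """Return candidate semantic types for a cell value from a toy KG."""
    return kg.get(value, [])

def build_cta_prompt(column_values, kg, type_vocab):
    # Non-parametric context: KG-derived types for each sampled cell value.
    context_lines = []
    for v in column_values:
        types = kg_lookup(v, kg)
        if types:
            context_lines.append(f"{v}: {', '.join(types)}")
    # The LLM's parametric knowledge handles values the KG misses (e.g. "Oslo").
    prompt = (
        "Annotate the column with one semantic type from: "
        + ", ".join(type_vocab) + "\n"
        + "Column values: " + ", ".join(column_values) + "\n"
        + "KG context:\n" + "\n".join(context_lines) + "\n"
        + "Answer with the type only."
    )
    return prompt

toy_kg = {"Paris": ["City", "CapitalCity"], "Berlin": ["City"]}
prompt = build_cta_prompt(
    ["Paris", "Berlin", "Oslo"], toy_kg, ["City", "Country", "Person"]
)
print(prompt)
```

The resulting prompt would then be sent to the LLM in place of a values-only prompt, which is the "vanilla LLM inference" baseline the abstract compares against.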