We apply foundation models to data discovery and exploration tasks. Foundation models are large language models (LLMs) that show promising performance on a diverse range of tasks unrelated to their training. We show that these models are highly applicable to the data discovery and data exploration domain. When used carefully, they show superior capability on three representative tasks: table-class detection, column-type annotation, and join-column prediction. On all three tasks, we show that a foundation-model-based approach outperforms task-specific models and thus the state of the art. Further, our approach often surpasses human-expert performance. We investigate the fundamental characteristics of this approach, including its generalizability across several foundation models, the impact of non-determinism on the outputs, and the role of syntactic and semantic signals. Overall, these results suggest a future direction in which disparate data management tasks can be unified under foundation models.