We apply foundation models to data discovery and exploration tasks. Foundation models include large language models (LLMs), which show promising performance on a diverse range of tasks unrelated to their training. We show that these models are highly applicable to the data discovery and data exploration domain. When used carefully, they show superior capability on three representative tasks: table-class detection, column-type annotation, and join-column prediction. On all three tasks, a foundation-model-based approach outperforms task-specific models and thus the state of the art. Further, our approach often surpasses human-expert performance. We investigate fundamental characteristics of this approach, including its generalizability across several foundation models and the impact of non-determinism on its outputs. All in all, these results suggest a future direction in which disparate data management tasks can be unified under foundation models.