We explore the application of foundation models to data discovery and exploration tasks. Foundation models are large language models (LLMs) that show promising performance on a range of diverse tasks unrelated to their training. We show that these models are highly applicable to the data discovery and data exploration domain. When used carefully, they show superior capability on three representative tasks: table-class detection, column-type annotation, and join-column prediction. On all three tasks, a foundation-model-based approach outperforms task-specific models and therefore the state of the art. Further, our approach often surpasses human-expert performance on these tasks. This suggests a future direction in which disparate data management tasks can be unified under foundation models.