Large language models have recently advanced the state of the art on many natural language processing benchmarks. The newest generation of models can be applied to a variety of tasks with little to no specialized training. This technology creates various opportunities for applications in the context of data management. The tutorial will introduce participants to basic background on language models, discuss different methods of using them, and give an overview and short demonstration of available libraries and APIs. It will cover models that generate natural language as well as models, such as GPT-3 Codex, that complete program code or generate code from natural language instructions. Finally, the tutorial will discuss recent research in the database community that exploits language models in the context of traditional database systems or proposes novel system architectures based on them. The tutorial is targeted at database researchers, and no prior background on language models is required. Its goal is to introduce database researchers to the latest generation of language models and to their use cases in the domain of data management.