The Next Token Prediction (NTP) paradigm underpins modern large foundation models that are pre-trained on large, diverse datasets. These models generalize effectively and have proven highly successful in Natural Language Processing (NLP). Inspired by the generalization capabilities of Large Language Models (LLMs), we investigate whether the same NTP paradigm can also be applied to database management system (DBMS) design and optimization tasks. Adopting NTP directly for database optimization is non-trivial because of fundamental differences between the two domains. In this paper, we present Probe and Learn (PoLe), a framework for applying NTP to optimize database systems. PoLe leverages Decision Transformers and hardware-generated tokens to incorporate NTP into database systems. Preliminary results on a main-memory index scheduling task demonstrate that adopting NTP can improve both performance and generalizability.