Ever since Dennard scaling broke down in the early 2000s and CPU frequencies stalled, vendors have increased the core count of each CPU chip at the expense of introducing heterogeneity, ushering in the era of NUMA and chiplet processors. Since then, heterogeneity in the hardware design space has only grown, to the point that DBMS performance may vary by up to an order of magnitude on modern servers. Two important factors that affect performance are the location of the logical cores where DBMS queries execute and the location where the data resides. This paper introduces P-MOSS, a learned spatial scheduling framework that schedules query execution to specific logical cores and co-locates data on the corresponding NUMA node. For adaptability across hardware and workloads, P-MOSS leverages core principles from Large Language Models, such as next-token prediction, generative pre-training, and fine-tuning. In the spirit of hardware-software synergy, P-MOSS bases its scheduling decisions solely on low-level hardware statistics collected from the hardware Performance Monitoring Unit, with the aid of a Decision Transformer. Experimental evaluation is performed in the context of the B$^+$-Tree index. Performance results demonstrate that P-MOSS improves query throughput by up to $6\times$ over traditional schedules.
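To make the Decision Transformer framing concrete, the sketch below shows how a scheduling trajectory could be flattened into the (return-to-go, state, action) token sequence that a Decision Transformer is conventionally trained on with next-token prediction. It is an illustration under stated assumptions, not P-MOSS's actual encoding: the abstract does not specify the state, action, or reward, so the `Step` fields (`pmu_stats`, `core_id`, `reward`) and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    # All three fields are assumptions made for illustration only.
    pmu_stats: Tuple[float, ...]  # state: a vector of hardware counter readings
    core_id: int                  # action: the logical core chosen for the query
    reward: float                 # reward: e.g. throughput observed afterwards


def returns_to_go(rewards: List[float]) -> List[float]:
    """Return suffix sums: rtg[t] = rewards[t] + rewards[t+1] + ... ."""
    rtg: List[float] = []
    total = 0.0
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    rtg.reverse()
    return rtg


def to_token_sequence(trajectory: List[Step]) -> List[Tuple[str, object]]:
    """Interleave (return-to-go, state, action) triples, one triple per step."""
    rtg = returns_to_go([s.reward for s in trajectory])
    tokens: List[Tuple[str, object]] = []
    for g, step in zip(rtg, trajectory):
        tokens.append(("rtg", g))
        tokens.append(("state", step.pmu_stats))
        tokens.append(("action", step.core_id))
    return tokens
```

In this formulation, a model trained to predict each action token from the preceding tokens can be prompted at inference time with a high target return and the current hardware statistics, and the predicted action is read off as the core to schedule on.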