Gaussian-process-based Bayesian optimization (GP-BO) is a prevailing model-based framework for DBMS auto-tuning. However, recent work shows that GP-BO-based DBMS auto-tuners are significantly outperformed by auto-tuners based on SMAC, which uses random forest surrogate models; these results motivate us to rethink and investigate the limitations of GP-BO in auto-tuner design. We find that the fundamental assumptions of GP-BO are widely violated when modeling and optimizing DBMS performance, while tree-ensemble BOs (e.g., SMAC) avoid these pitfalls and deliver improved tuning efficiency and effectiveness. Moreover, we argue that existing tree-ensemble BOs restrict further advancement in DBMS auto-tuning. First, existing tree-ensemble BOs achieve distribution-free point estimates only; they still impose unrealistic distributional assumptions on uncertainty estimates, which compromises surrogate modeling and distorts the acquisition function. Second, recent advances in gradient boosting, which can further enhance surrogate modeling over vanilla GP and random forest counterparts, have rarely been applied to DBMS auto-tuners. To address these issues, we propose a novel model-based DBMS auto-tuner, Centrum. Centrum improves distribution-free point and interval estimation in surrogate modeling with a two-phase learning procedure of stochastic gradient boosting ensembles (SGBEs). Moreover, Centrum adopts a generalized, SGBE-estimated, locally adaptive conformal prediction to enable distribution-free uncertainty estimation and a distribution-free acquisition function. To our knowledge, Centrum is the first auto-tuner to realize distribution-freeness, enhancing BO's practicality in DBMS auto-tuning, and the first to seamlessly fuse gradient boosting ensembles and conformal inference in BO. Extensive physical and simulation experiments on two DBMSs and three workloads show that Centrum outperforms 21 SOTA methods.
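To make the idea of locally adaptive conformal prediction concrete, here is a minimal sketch of the generic split-conformal variant with normalized residuals. It is not Centrum's procedure, whose details the abstract does not give. The names `mean_model` and `spread_model` are hypothetical stand-ins for a fitted point predictor and a fitted, strictly positive estimate of local residual scale; in a tuner like the one described, both would presumably come from gradient boosting ensembles.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical type: any fitted model mapping a configuration to a number.
Predictor = Callable[[Sequence[float]], float]


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based order statistic
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def calibrate(
    mean_model: Predictor,
    spread_model: Predictor,
    calib_x: Sequence[Sequence[float]],
    calib_y: Sequence[float],
    alpha: float = 0.1,
) -> float:
    """Compute the conformal multiplier q on a held-out calibration set.

    Scores are absolute residuals divided by the local spread estimate,
    which is what makes the resulting intervals locally adaptive.
    Assumes spread_model(x) > 0 for every x.
    """
    scores = [
        abs(y - mean_model(x)) / spread_model(x)
        for x, y in zip(calib_x, calib_y)
    ]
    return conformal_quantile(scores, alpha)


def predict_interval(
    mean_model: Predictor,
    spread_model: Predictor,
    q: float,
    x: Sequence[float],
) -> Tuple[float, float]:
    """Interval mean(x) +/- q * spread(x); wider where spread is larger."""
    mu = mean_model(x)
    half_width = q * spread_model(x)
    return mu - half_width, mu + half_width


if __name__ == "__main__":
    # Toy stand-ins for fitted surrogate models (assumptions, not real fits).
    mean_model = lambda x: 2.0 * x[0]
    spread_model = lambda x: 1.0 + x[0]
    calib_x = [[0.0], [1.0], [2.0], [3.0]]
    calib_y = [0.5, 0.0, 10.0, 7.0]
    q = calibrate(mean_model, spread_model, calib_x, calib_y, alpha=0.5)
    print(predict_interval(mean_model, spread_model, q, [4.0]))
```

The interval makes no Gaussian assumption about the residuals: its coverage guarantee rests only on exchangeability between the calibration points and the new point. How that guarantee carries over to sequentially collected BO observations is a separate question that this sketch does not address.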