Centrum: Model-based Database Auto-tuning with Minimal Distributional Assumptions

Gaussian-Process-based Bayesian optimization (GP-BO) is a prevailing model-based framework for DBMS auto-tuning. However, recent work shows that GP-BO-based DBMS auto-tuners are significantly outperformed by auto-tuners based on SMAC, which uses random forest surrogate models; these results motivate us to rethink and investigate the limitations of GP-BO in auto-tuner design. We find that the fundamental assumptions of GP-BO are widely violated when modeling and optimizing DBMS performance, while tree-ensemble-BOs (e.g., SMAC) avoid these assumption pitfalls and deliver improved tuning efficiency and effectiveness. Moreover, we argue that existing tree-ensemble-BOs restrict further advancement in DBMS auto-tuning. First, existing tree-ensemble-BOs achieve distribution-free point estimates only; their uncertainty estimates still impose unrealistic distributional assumptions, which compromises surrogate modeling and distorts the acquisition function. Second, recent advances in gradient boosting, which can further improve surrogate modeling over vanilla GP and random forest counterparts, have rarely been applied to DBMS auto-tuners. To address these issues, we propose a novel model-based DBMS auto-tuner, Centrum. Centrum improves distribution-free point and interval estimation in surrogate modeling with a two-phase learning procedure for stochastic gradient boosting ensembles (SGBEs). Moreover, Centrum adopts a generalized, SGBE-estimated, locally-adaptive conformal prediction method to enable distribution-free uncertainty estimation and a distribution-free acquisition function. To our knowledge, Centrum is the first auto-tuner to realize distribution-freeness, enhancing BO's practicality in DBMS auto-tuning, and the first to seamlessly fuse gradient boosting ensembles and conformal inference in BO. Extensive physical and simulation experiments on two DBMSs and three workloads show that Centrum outperforms 21 state-of-the-art methods.
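To make the combination of boosted ensembles and conformal inference mentioned in the abstract more concrete, the following is a minimal, generic sketch of locally adaptive (normalized) split conformal prediction on top of a toy stochastic gradient boosting model. It is not Centrum's algorithm: the abstract does not specify the two-phase procedure or the generalized conformal method, so the boosted stumps, the synthetic "knob to latency" data, and all function names (`fit_stump`, `fit_sgb`, `conformal_quantile`, `interval`) are assumptions made purely for illustration.

```python
import math
import random

def fit_stump(xs, rs):
    """Least-squares regression stump on 1-D inputs -> (threshold, left, right)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    n, total = len(xs), sum(rs)
    best_gain, best = -1.0, (xs[order[0]], total / n, total / n)
    left = 0.0
    for k in range(1, n):
        left += rs[order[k - 1]]
        lo, hi = xs[order[k - 1]], xs[order[k]]
        if lo == hi:
            continue
        # Minimizing squared error is equivalent to maximizing this gain.
        gain = left * left / k + (total - left) ** 2 / (n - k)
        if gain > best_gain:
            best_gain = gain
            best = ((lo + hi) / 2, left / k, (total - left) / (n - k))
    return best

def fit_sgb(xs, ys, rounds=50, lr=0.1, subsample=0.5, rng=None):
    """Stochastic gradient boosting (squared loss): each stump fits the
    current residuals on a random subsample of the training data."""
    rng = rng or random.Random(0)
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    m = max(2, int(subsample * len(xs)))
    for _ in range(rounds):
        idx = rng.sample(range(len(xs)), m)
        t, l, r = fit_stump([xs[i] for i in idx], [ys[i] - preds[i] for i in idx])
        stumps.append((t, l, r))
        preds = [p + lr * (l if x <= t else r) for p, x in zip(preds, xs)]
    return base, lr, stumps

def predict(model, x):
    base, lr, stumps = model
    return base + lr * sum(l if x <= t else r for t, l, r in stumps)

def conformal_quantile(scores, alpha):
    """Finite-sample split-conformal quantile of the calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return float("inf") if k > n else sorted(scores)[k - 1]

rng = random.Random(42)

def sample(n):
    # Hypothetical heteroscedastic data: the noise level grows with x.
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [math.sin(6 * x) + rng.gauss(0, 0.05 + 0.5 * x) for x in xs]
    return xs, ys

x_tr, y_tr = sample(200)    # training split
x_cal, y_cal = sample(100)  # held-out calibration split

# Step 1: point estimator. Step 2: estimator of the local residual scale.
mean_model = fit_sgb(x_tr, y_tr, rng=rng)
abs_res = [abs(y - predict(mean_model, x)) for x, y in zip(x_tr, y_tr)]
scale_model = fit_sgb(x_tr, abs_res, rng=rng)

def scale(x):
    return max(predict(scale_model, x), 1e-3)  # keep the scale positive

# Calibrate normalized conformity scores; no distribution is assumed.
alpha = 0.1
scores = [abs(y - predict(mean_model, x)) / scale(x) for x, y in zip(x_cal, y_cal)]
q = conformal_quantile(scores, alpha)

def interval(x):
    mu, half = predict(mean_model, x), q * scale(x)
    return mu - half, mu, mu + half

cal_cov = sum(s <= q for s in scores) / len(scores)
```

Because the interval half-width is `q * scale(x)`, the band widens where the estimated residual scale is larger, which is what "locally adaptive" refers to here. In a BO loop, such an interval could stand in for the Gaussian posterior when scoring candidate configurations; how Centrum actually builds its acquisition function is not described in the abstract.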

