Managing the configuration of a database management system (DBMS) is challenging because hundreds of configuration knobs control every aspect of the system. These knobs are not standardized, independent, or universal, which makes optimal settings difficult to determine. OtterTune [1] addresses this problem with an automated approach that uses supervised and unsupervised machine learning methods to select impactful knobs, map unseen workloads, and recommend knob settings. It was evaluated on three DBMSs, and the results show that it recommends configurations as good as or better than those generated by existing tools or a human expert. In this work, we extend the OtterTune technique, which reuses training data gathered from previous sessions to tune new DBMS deployments, with the goal of improving latency prediction. Our approach expands on the methods proposed in the original paper: we use Gaussian mixture model (GMM) clustering to prune metrics, and we combine ensemble models, such as RandomForest, with non-linear models, such as neural networks, for prediction modeling.
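The two modeling steps named above can be illustrated with a minimal sketch. It assumes scikit-learn and synthetic data; the function names `prune_metrics` and `build_latency_model`, the choice of one representative metric per cluster, and the simple averaging of the two regressors are illustrative assumptions, not the configuration used in this work.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def prune_metrics(metric_matrix, n_clusters, seed=0):
    """Cluster metrics with a GMM and keep one representative per cluster.

    metric_matrix has shape (n_observations, n_metrics). Each metric (column)
    becomes one point whose features are its standardized observed values.
    """
    points = StandardScaler().fit_transform(metric_matrix).T
    gmm = GaussianMixture(
        n_components=n_clusters, covariance_type="diag", random_state=seed
    )
    labels = gmm.fit_predict(points)
    kept = []
    for k in np.unique(labels):
        members = np.flatnonzero(labels == k)
        # Assumption: the representative is the member nearest the cluster mean.
        dists = np.linalg.norm(points[members] - gmm.means_[k], axis=1)
        kept.append(int(members[np.argmin(dists)]))
    return sorted(kept)


def build_latency_model(seed=0):
    """Average a random forest and a small neural network (assumed combination)."""
    forest = RandomForestRegressor(n_estimators=100, random_state=seed)
    net = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=seed),
    )
    return VotingRegressor([("forest", forest), ("net", net)])


# Synthetic stand-in for collected runtime metrics: three underlying signals,
# each reported twice as a nearly redundant pair of metrics.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 3))
metrics = np.repeat(latent, 2, axis=1) + 0.01 * rng.normal(size=(200, 6))
latency = 5.0 + 2.0 * latent[:, 0] - latent[:, 1] + 0.1 * rng.normal(size=200)

kept = prune_metrics(metrics, n_clusters=3)
model = build_latency_model().fit(metrics[:, kept], latency)
predicted = model.predict(metrics[:5, kept])
print("kept metric columns:", kept)
print("predicted latency for first 5 observations:", predicted)
```

In this sketch the pruning step removes redundant metrics before the predictor is trained, so the random forest and the neural network see one column per cluster instead of every collected metric.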