We use machine learning to optimize the LSM-tree structure, aiming to reduce the cost of processing various read/write operations. We introduce a new approach, Camal, which has the following features: (1) ML-Aided: Camal is the first attempt to apply active learning to tune LSM-tree based key-value stores. The learning process is coupled with traditional cost models to improve training; (2) Decoupled Active Learning: backed by rigorous analysis, Camal adopts an active learning paradigm based on decoupled tuning of each parameter, which further accelerates the learning process; (3) Easy Extrapolation: Camal adopts an effective mechanism to incrementally update the model as the data size grows; (4) Dynamic Mode: Camal can tune the LSM-tree online under dynamically changing workloads; (5) Significant System Improvement: integrating Camal into a full system, RocksDB, improves system performance by 28% on average and by up to 8x compared to a state-of-the-art RocksDB design.