Vector data management systems (VDMSs) have become a cornerstone of large-scale information retrieval and of machine learning systems such as large language models. To make similarity search more efficient and flexible, a VDMS exposes many tunable index and system parameters for users to specify. However, the inherent characteristics of VDMSs pose several critical challenges for automatic performance tuning that existing auto-tuning methods do not address well. In this paper, we introduce VDTuner, a learning-based automatic performance tuning framework for VDMSs that leverages multi-objective Bayesian optimization. VDTuner overcomes these challenges by efficiently exploring a complex multi-dimensional parameter space without requiring any prior knowledge. Moreover, it achieves a good balance between search speed and recall rate, delivering an optimal configuration. Extensive evaluations demonstrate that VDTuner markedly improves VDMS performance (by 14.12% in search speed and 186.38% in recall rate) compared with the default setting, and is more efficient than state-of-the-art baselines (up to 3.57 times faster in terms of tuning time). In addition, VDTuner extends to specific user preferences and cost-aware optimization objectives. VDTuner is available online at https://github.com/tiannuo-yang/VDTuner.
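To make the trade-off between search speed and recall rate concrete, the following minimal sketch filters a set of evaluated configurations down to those that no other configuration beats on both objectives (the Pareto front). It is an illustration only and not VDTuner's algorithm: the `Trial` type, the helper functions, the configuration names, and all numbers are hypothetical.

```python
from typing import List, NamedTuple


class Trial(NamedTuple):
    """One evaluated configuration (all values are made up for illustration)."""
    name: str
    qps: float     # search speed in queries per second (higher is better)
    recall: float  # recall rate in [0, 1] (higher is better)


def dominates(a: Trial, b: Trial) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.qps >= b.qps and a.recall >= b.recall
    strictly_better = a.qps > b.qps or a.recall > b.recall
    return no_worse and strictly_better


def pareto_front(trials: List[Trial]) -> List[Trial]:
    """Keep only the trials that no other trial dominates."""
    return [t for t in trials if not any(dominates(o, t) for o in trials)]


# Hypothetical measurements for four configurations.
trials = [
    Trial("default", qps=1000.0, recall=0.30),
    Trial("config_a", qps=1200.0, recall=0.85),
    Trial("config_b", qps=900.0, recall=0.95),
    Trial("config_c", qps=800.0, recall=0.90),
]

front = pareto_front(trials)
print([t.name for t in front])  # ['config_a', 'config_b']
```

In this toy data, `config_a` and `config_b` remain because each is better than the other on one objective. A multi-objective tuner has to choose among such non-dominated points, for example according to a user's preference for speed or recall.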