Vector search plays a crucial role in many real-world applications. Beyond single-vector search, multi-vector search has become important in multi-modal and multi-feature scenarios. In a multi-vector database, each row is an item, each column represents a feature of the items, and each cell is a high-dimensional vector. The choice of indexes in such a database can significantly affect performance. Although index tuning for relational databases has been studied extensively, index tuning for multi-vector search remains poorly understood and challenging. In this paper, we define the multi-vector search index tuning problem and propose a framework to solve it. Specifically, given a multi-vector search workload, we develop algorithms that find indexes minimizing latency while meeting storage and recall constraints. Compared to the baseline, our approach achieves a 2.1X to 8.3X latency speedup.
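To make the tuning objective concrete, the following is a minimal sketch of the problem statement only, not of the algorithms proposed in the paper. It enumerates every subset of a small set of candidate indexes and keeps the subset with the lowest total estimated latency that fits a storage budget and meets a recall threshold for every query. The function name `tune_indexes`, the index names, and all storage, latency, and recall figures are hypothetical values chosen for illustration. Exhaustive enumeration is exponential in the number of candidates, so it is only practical for toy inputs.

```python
from itertools import combinations

# Hypothetical candidate indexes and their storage cost (arbitrary units).
storage = {"idx_image": 4, "idx_text": 3, "idx_image_text": 6}

# Hypothetical estimates: (query_id, index_name) -> (latency, recall).
# A missing pair means that index cannot answer that query.
estimates = {
    (0, "idx_image"): (2.0, 0.95),
    (0, "idx_image_text"): (5.0, 0.92),
    (1, "idx_image"): (9.0, 0.91),
    (1, "idx_text"): (8.0, 0.85),
    (1, "idx_image_text"): (3.0, 0.96),
}


def tune_indexes(storage, estimates, num_queries, storage_budget, min_recall):
    """Brute-force search over index subsets (illustrative assumption only).

    Returns (best_config, best_latency), or (None, inf) if no subset
    satisfies both the storage budget and the recall threshold.
    """
    names = sorted(storage)
    best_config, best_latency = None, float("inf")
    for size in range(1, len(names) + 1):
        for config in combinations(names, size):
            # Storage constraint: total size of the chosen indexes.
            if sum(storage[n] for n in config) > storage_budget:
                continue
            total, feasible = 0.0, True
            for q in range(num_queries):
                # Each query uses the fastest index that meets the recall bar.
                options = [
                    estimates[(q, n)][0]
                    for n in config
                    if (q, n) in estimates and estimates[(q, n)][1] >= min_recall
                ]
                if not options:
                    feasible = False
                    break
                total += min(options)
            if feasible and total < best_latency:
                best_config, best_latency = config, total
    return best_config, best_latency


best_config, best_latency = tune_indexes(
    storage, estimates, num_queries=2, storage_budget=10, min_recall=0.9
)
print(best_config, best_latency)
```

With these made-up numbers, tightening the storage budget from 10 to 9 forces the search to drop one index and accept a higher total latency, which is the trade-off that index tuning has to navigate.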