Vector search plays a crucial role in many real-world applications. Beyond single-vector search, multi-vector search has become important in today's multi-modal and multi-feature scenarios. In a multi-vector database, each row is an item, each column represents a feature of the items, and each cell is a high-dimensional vector. In such databases, the choice of indexes can have a significant impact on performance. Although index tuning for relational databases has been studied extensively, index tuning for multi-vector search remains poorly understood and challenging. In this paper, we define the multi-vector search index tuning problem and propose a framework to solve it. Specifically, given a multi-vector search workload, we develop algorithms that find indexes minimizing latency while satisfying storage and recall constraints. Compared with the baseline, our approach achieves a 2.1× to 8.3× latency speedup.
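To make the shape of the tuning problem concrete, the following is a minimal sketch under stated assumptions. It picks a set of indexes whose total storage fits a budget and minimizes total workload latency. Each query is served by its fastest chosen index that meets the recall target; if no chosen index qualifies, the query falls back to an exact scan. The `Index` type, the `tune` function, the index names, and all storage, latency, and recall numbers are hypothetical. The exhaustive search is a naive illustration of the constraints, not the algorithms proposed in this paper.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Index:
    name: str
    storage: float  # hypothetical storage cost (e.g. GB)


def tune(indexes, workload, storage_budget, recall_target, scan_latency):
    """Exhaustively choose the index set that minimizes total workload latency.

    workload: one dict per query, mapping index name -> (latency, recall)
    estimated for that query if served by that index (hypothetical cost model).
    A query with no chosen index meeting recall_target falls back to an
    exact scan, assumed to have recall 1.0 and latency scan_latency.
    """
    best_set, best_latency = (), float("inf")
    for k in range(len(indexes) + 1):
        for subset in combinations(indexes, k):
            # Storage constraint: skip index sets that exceed the budget.
            if sum(ix.storage for ix in subset) > storage_budget:
                continue
            chosen = {ix.name for ix in subset}
            total = 0.0
            for options in workload:
                # Recall constraint: only indexes meeting the target are usable.
                usable = [lat for name, (lat, rec) in options.items()
                          if name in chosen and rec >= recall_target]
                total += min(usable, default=scan_latency)
            if total < best_latency:
                best_set, best_latency = subset, total
    return best_set, best_latency


# Hypothetical workload: two queries over text and image feature columns.
indexes = [Index("hnsw_text", 4.0), Index("hnsw_image", 6.0), Index("ivf_joint", 3.0)]
workload = [
    {"hnsw_text": (2.0, 0.95), "ivf_joint": (3.0, 0.92)},
    {"hnsw_image": (2.5, 0.97), "ivf_joint": (3.5, 0.85)},
]
best, latency = tune(indexes, workload, storage_budget=10.0,
                     recall_target=0.9, scan_latency=20.0)
print([ix.name for ix in best], latency)  # ['hnsw_text', 'hnsw_image'] 4.5
```

Enumerating every subset grows exponentially with the number of candidate indexes. This is why a practical tuner needs a smarter search strategy. Tightening `storage_budget` to 8.0 in this toy example forces a single-index configuration, and total latency rises to 22.0.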