Applications increasingly leverage mixed-modality data and must jointly search over vector data, such as embedded images, text, and video, as well as structured data, such as attributes and keywords. Previously proposed methods for this hybrid search setting either suffer from poor performance or support a severely restricted set of search predicates (e.g., only small sets of equality predicates), making them impractical for many applications. To address this, we present ACORN, an approach for performant and predicate-agnostic hybrid search. ACORN builds on Hierarchical Navigable Small Worlds (HNSW), a state-of-the-art graph-based approximate nearest neighbor index, and can be implemented efficiently by extending existing HNSW libraries. ACORN introduces the idea of predicate subgraph traversal to emulate a theoretically ideal, but impractical, hybrid search strategy. ACORN's predicate-agnostic construction algorithm is designed to enable this effective search strategy while supporting a wide array of predicate sets and query semantics. We systematically evaluate ACORN both on prior benchmark datasets, which have simple, low-cardinality predicate sets, and on complex multi-modal datasets not supported by prior methods. We show that ACORN achieves state-of-the-art performance on all datasets, outperforming prior methods with 2-1,000x higher throughput at a fixed recall.
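To make the hybrid search task and the recall metric concrete, the following is a minimal sketch of an exact, brute-force baseline: filter the records by an arbitrary predicate, then rank the survivors by distance to the query. This is not ACORN and not taken from the paper; the function names, the dictionary-based attribute layout, and the toy data are all assumptions chosen for illustration. Its output is the ground truth against which the recall of an approximate index would be measured.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical record layout: one vector plus one attribute dict per item.
Vector = Sequence[float]
Attributes = Dict[str, object]


def exact_hybrid_search(
    query: Vector,
    vectors: List[Vector],
    attrs: List[Attributes],
    predicate: Callable[[Attributes], bool],
    k: int,
) -> List[int]:
    """Return indices of the k nearest vectors among items passing the predicate."""
    # Keep only the items whose structured attributes satisfy the predicate.
    candidates = [i for i, a in enumerate(attrs) if predicate(a)]
    # Rank the surviving items by Euclidean distance to the query vector.
    candidates.sort(key=lambda i: math.dist(query, vectors[i]))
    return candidates[:k]


def recall_at_k(approx: List[int], exact: List[int]) -> float:
    """Fraction of the exact top-k results that the approximate result recovered."""
    if not exact:
        return 1.0
    return len(set(approx) & set(exact)) / len(exact)


# Toy data (illustrative only).
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
attrs = [
    {"year": 2019, "tags": {"cat"}},
    {"year": 2021, "tags": {"dog"}},
    {"year": 2022, "tags": {"cat", "dog"}},
    {"year": 2023, "tags": {"cat"}},
]


def recent_cat(a: Attributes) -> bool:
    # An arbitrary predicate: a range condition combined with a keyword match.
    return a["year"] >= 2021 and "cat" in a["tags"]


ground_truth = exact_hybrid_search((0.1, 0.0), vectors, attrs, recent_cat, k=2)
```

Because the predicate is an arbitrary callable, this baseline is trivially predicate-agnostic, but it scans every record on every query, which is why an index-based approach is needed at scale.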