All-in-one Graph-based Indexing for Hybrid Search on GPUs

Hybrid search has emerged as a promising paradigm that combines lexical and semantic retrieval, improving accuracy in applications such as recommendation, information retrieval, and Retrieval-Augmented Generation. However, existing methods face a trilemma: they sacrifice flexibility for efficiency, suffer from accuracy degradation, or incur prohibitive storage overhead to support flexible combinations of retrieval paths. This paper introduces Allan-Poe, a novel all-in-one graph index accelerated by GPUs for efficient hybrid search. We first analyze the limitations of existing retrieval paradigms and extract key design principles for an effective hybrid index. Guided by these principles, we architect a unified graph-based index that flexibly integrates three retrieval paths (dense vector, sparse vector, and full-text) within a single, cohesive structure. To enable efficient construction, we design a GPU-accelerated pipeline featuring a warp-level hybrid distance kernel, RNG-IP joint pruning, and keyword-aware neighbor recycling. For query processing, we introduce a dynamic fusion framework that supports any combination of retrieval paths and weights without index reconstruction, and that flexibly leverages logical structures from the knowledge graph to resolve complex multi-hop queries. Extensive experiments on six real-world datasets demonstrate that Allan-Poe achieves superior end-to-end query accuracy and outperforms state-of-the-art methods by 1.5x to 186.4x in throughput, while significantly reducing storage overhead.

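To make the idea of query-time fusion over three retrieval paths more concrete, the following is a minimal CPU-side sketch, not the paper's GPU kernel. It assumes the hybrid score is a weighted sum of a dense inner product, a sparse inner product, and a keyword match score; the abstract does not state the actual scoring formula, so this form, along with the names `Item`, `Query`, `term_weights`, and `hybrid_score`, is hypothetical and chosen only for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Item:
    dense: list[float]              # dense embedding
    sparse: dict[int, float]        # sparse vector: dimension id -> weight
    term_weights: dict[str, float]  # assumed precomputed per-keyword lexical weights


@dataclass
class Query:
    dense: list[float]
    sparse: dict[int, float]
    terms: list[str]


def dense_ip(a: list[float], b: list[float]) -> float:
    """Inner product of two dense vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def sparse_ip(a: dict[int, float], b: dict[int, float]) -> float:
    """Inner product of two sparse vectors, iterating over the smaller one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b[d] for d, w in a.items() if d in b)


def keyword_score(terms: list[str], term_weights: dict[str, float]) -> float:
    """Sum of the item's lexical weights for the query terms it contains."""
    return sum(term_weights.get(t, 0.0) for t in terms)


def hybrid_score(query: Query, item: Item,
                 weights: tuple[float, float, float]) -> float:
    """Weighted fusion of the three paths (assumed form, see lead-in).

    A path whose weight is zero is skipped entirely, so any combination
    of paths can be selected per query without touching the stored data.
    """
    w_dense, w_sparse, w_text = weights
    score = 0.0
    if w_dense:
        score += w_dense * dense_ip(query.dense, item.dense)
    if w_sparse:
        score += w_sparse * sparse_ip(query.sparse, item.sparse)
    if w_text:
        score += w_text * keyword_score(query.terms, item.term_weights)
    return score


item = Item(dense=[1.0, 0.0],
            sparse={3: 2.0, 7: 1.0},
            term_weights={"gpu": 1.5, "graph": 0.5})
query = Query(dense=[0.5, 0.5], sparse={3: 1.0}, terms=["gpu", "index"])

dense_only = hybrid_score(query, item, (1.0, 0.0, 0.0))
all_paths = hybrid_score(query, item, (0.5, 0.25, 0.25))
```

Because the weights are plain arguments to the scoring function, switching from dense-only retrieval to a three-path mix changes only the call, which is the property the abstract describes as fusion without index reconstruction.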