QuIVer: Rethinking ANN Graph Topology via Training-Free Binary Quantization

Approximate nearest neighbor (ANN) graph indices such as HNSW and Vamana construct their edge topology in full-precision or high-fidelity quantized metric spaces, relegating binary quantization (BQ) to a post-hoc distance estimator during search. We challenge this paradigm by asking: Can binary quantization build the graph, instead of merely accelerating graph search? We present QuIVer (Quantized Index for Vector Retrieval), a training-free ANN graph index that performs edge selection, pruning, and graph navigation entirely within a 2-bit Sign-Magnitude BQ metric space. QuIVer combines three mutually reinforcing mechanisms: (i) a 2-bit Sign-Magnitude encoding that preserves both sign and magnitude strength at 1/12 the memory of float32 vectors; (ii) Vamana alpha-diversity pruning executed directly on BQ distances, producing long-range navigational edges robust to quantization noise; and (iii) symmetric BQ beam search using only XOR/AND/Popcount, with a final float32 reranking step confined to a small candidate set. On MiniLM-1M (384-d), Cohere-1M (768-d), and DBpedia-OpenAI-1M (1536-d), QuIVer achieves >=91% Recall@10 at 16-39K QPS with 70-140-second construction and <0.9 GB hot memory -- outperforming hnswlib by ~16x and USearch HNSW by ~5x in throughput at comparable recall. Controlled experiments on six additional datasets -- including multimodal CLIP embeddings (RedCaps-512), word vectors (GloVe-100), CV features (SIFT-128, GIST-960), uniform random vectors, and a low-rank synthetic dataset -- precisely delineate QuIVer's applicability boundary: high recall requires cosine-native distributions with low effective dimensionality, while Vamana's graph reachability holds universally. Notably, multimodal CLIP embeddings achieve 78% recall at ef=64, revealing a continuous gradient between single-modality SOTA and non-contrastive usability.
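To make the three mechanisms concrete, the following is a minimal pure-Python sketch, not QuIVer's actual implementation. The abstract does not specify the magnitude threshold or the exact distance formula, so both are assumptions here: a component counts as "strong" when its absolute value exceeds the vector's mean absolute value, and the distance is a weighted Hamming count over sign disagreements. The names `encode`, `bq_distance`, and `robust_prune` are hypothetical. The pruning rule is the standard Vamana alpha rule, applied to the quantized distance.

```python
from typing import List, Sequence, Tuple

Code = Tuple[int, int]  # (sign bit plane, magnitude bit plane), one bit per dimension


def encode(vec: Sequence[float]) -> Code:
    """Pack a vector into a sign plane and a 'strong magnitude' plane."""
    # Assumption: "strong" means |x| is above the vector's mean absolute value.
    threshold = sum(abs(x) for x in vec) / len(vec)
    sign = mag = 0
    for i, x in enumerate(vec):
        if x < 0:
            sign |= 1 << i
        if abs(x) > threshold:
            mag |= 1 << i
    return sign, mag


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def bq_distance(a: Code, b: Code) -> int:
    """Assumed metric: each sign disagreement costs 1, plus 1 per strong side."""
    diff = a[0] ^ b[0]  # dimensions where the signs disagree
    return popcount(diff) + popcount(diff & a[1]) + popcount(diff & b[1])


def robust_prune(p: int, candidates: List[int], codes: List[Code],
                 alpha: float, max_degree: int) -> List[int]:
    """Vamana-style alpha pruning of p's candidate neighbors on BQ distances."""
    # Sort by (distance to p, id) so ties are broken deterministically.
    pool = sorted((c for c in set(candidates) if c != p),
                  key=lambda c: (bq_distance(codes[p], codes[c]), c))
    kept: List[int] = []
    while pool and len(kept) < max_degree:
        best = pool.pop(0)
        kept.append(best)
        # Drop c when alpha * d(best, c) <= d(p, c): best already covers it.
        pool = [c for c in pool
                if alpha * bq_distance(codes[best], codes[c])
                > bq_distance(codes[p], codes[c])]
    return kept


codes = [encode(v) for v in ([1, 1, 1, 1], [1, 1, 1, -1],
                             [1, 1, -1, -1], [-1, 1, 1, 1])]
tight = robust_prune(0, [1, 2, 3], codes, alpha=1.0, max_degree=3)
loose = robust_prune(0, [1, 2, 3], codes, alpha=3.0, max_degree=3)
```

In this toy example, `alpha=1.0` prunes node 2 because node 1 already covers it, while `alpha=3.0` retains it. Larger `alpha` values keep more of the farther candidates, which is how Vamana produces long-range edges.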
