Approximate nearest neighbor (ANN) graph indices such as HNSW and Vamana construct their edge topology in full-precision or high-fidelity quantized metric spaces, relegating binary quantization (BQ) to a post-hoc distance estimator during search. We challenge this paradigm by asking: Can binary quantization build the graph, instead of merely accelerating graph search? We present QuIVer (Quantized Index for Vector Retrieval), a training-free ANN graph index that performs edge selection, pruning, and graph navigation entirely within a 2-bit Sign-Magnitude BQ metric space. QuIVer combines three mutually reinforcing mechanisms: (i) a 2-bit Sign-Magnitude encoding that preserves both sign and magnitude strength at 1/12 the memory of float32 vectors; (ii) Vamana alpha-diversity pruning executed directly on BQ distances, producing long-range navigational edges robust to quantization noise; and (iii) symmetric BQ beam search using only XOR/AND/Popcount, with a final float32 reranking step confined to a small candidate set. On MiniLM-1M (384-d), Cohere-1M (768-d), and DBpedia-OpenAI-1M (1536-d), QuIVer achieves >=91% Recall@10 at 16-39K QPS with 70-140-second construction and <0.9 GB hot memory -- outperforming hnswlib by ~16x and USearch HNSW by ~5x in throughput at comparable recall. Controlled experiments on six additional datasets -- including multimodal CLIP embeddings (RedCaps-512), word vectors (GloVe-100), CV features (SIFT-128, GIST-960), uniform random vectors, and a low-rank synthetic dataset -- precisely delineate QuIVer's applicability boundary: high recall requires cosine-native distributions with low effective dimensionality, while Vamana's graph reachability holds universally. Notably, multimodal CLIP embeddings achieve 78% recall at ef=64, revealing a continuous gradient between single-modality SOTA and non-contrastive usability.
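To make mechanisms (i) to (iii) concrete, the following is a minimal sketch and not QuIVer's actual implementation. The abstract does not specify the magnitude threshold or the distance weighting, so both are assumptions chosen for illustration: a component counts as "strong" when its absolute value exceeds the vector's mean absolute value, and a sign disagreement costs 1 plus 1 for each side that is strong. The names `encode`, `bq_distance`, and `alpha_prune` are hypothetical. The pruning loop follows the standard Vamana rule (drop a candidate `c` once `alpha * d(best, c) <= d(point, c)`), applied here to the quantized distance rather than to float32 distances.

```python
from statistics import fmean


def encode(vec):
    """Pack a float vector into two bit planes (sign, magnitude) stored as ints."""
    # Assumption: "strong" means |x| is above the vector's mean absolute value.
    threshold = fmean(abs(x) for x in vec)
    sign = mag = 0
    for i, x in enumerate(vec):
        if x < 0:
            sign |= 1 << i          # sign plane: bit set for negative components
        if abs(x) > threshold:
            mag |= 1 << i           # magnitude plane: bit set for strong components
    return sign, mag


def popcount(x):
    """Number of set bits in a non-negative int."""
    return bin(x).count("1")


def bq_distance(a, b):
    """Symmetric distance built only from XOR, AND, and popcount."""
    sign_a, mag_a = a
    sign_b, mag_b = b
    diff = sign_a ^ sign_b          # dimensions whose signs disagree
    # Assumed weighting: 1 per disagreement, plus 1 per side that is strong there.
    return popcount(diff) + popcount(diff & mag_a) + popcount(diff & mag_b)


def alpha_prune(point, candidates, codes, alpha=1.2, max_degree=32):
    """Vamana-style alpha pruning evaluated on quantized distances."""
    pool = sorted(
        (c for c in candidates if c != point),
        key=lambda c: bq_distance(codes[point], codes[c]),
    )
    kept = []
    while pool and len(kept) < max_degree:
        best = pool.pop(0)          # closest remaining candidate becomes an edge
        kept.append(best)
        # Keep only candidates that `best` does not already cover.
        pool = [
            c for c in pool
            if alpha * bq_distance(codes[best], codes[c])
            > bq_distance(codes[point], codes[c])
        ]
    return kept
```

In this sketch, a candidate that lies roughly "behind" an already selected neighbor is discarded, while a candidate in a different direction survives, which is how alpha pruning yields diverse, longer-range edges. The final float32 reranking over a small candidate set is omitted.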