Distance Comparison Operations (DCOs), which decide whether the distance between a data vector and a query is within a threshold, are a critical performance bottleneck in vector similarity search. Recent DCO methods that avoid full-dimensional distance computations promise significant speedups, but their readiness for production vector database systems remains an open question. To address this, we conduct a comprehensive benchmark of 8 DCO algorithms across 10 datasets (with up to 100M vectors and up to 12,288 dimensions) and diverse hardware configurations (CPUs with and without SIMD, and GPUs). Our study reveals that these methods are not silver bullets: their efficiency is highly sensitive to data dimensionality, degrades under out-of-distribution queries, and is unstable across hardware. Yet our evaluation also demonstrates often-overlooked merits: they can accelerate index construction and data updates. Despite these benefits, their performance is unstable and can fall below that of a full-dimensional scan, which leads us to conclude that recent algorithmic advancements in DCO are not yet ready for production deployment.
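To make the notion of a DCO concrete, the following is a minimal Python sketch, not one of the benchmarked algorithms. It contrasts a full-dimensional comparison with the classical partial-distance early exit, which stops accumulating squared differences once the running sum already exceeds the threshold. The function names (`dco_full`, `dco_early_exit`), the `block` parameter, and the choice of squared Euclidean distance are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def dco_full(x: Sequence[float], q: Sequence[float], tau: float) -> bool:
    """Baseline DCO: compute the full squared Euclidean distance, then compare."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(x, q))
    return dist_sq <= tau * tau


def dco_early_exit(
    x: Sequence[float], q: Sequence[float], tau: float, block: int = 4
) -> Tuple[bool, int]:
    """Partial-distance DCO (illustrative sketch).

    Accumulates squared differences block by block and stops as soon as the
    running sum exceeds tau^2. Because every term is non-negative, the partial
    sum is a lower bound on the full distance, so the early exit is exact.
    Returns (within_threshold, number_of_dimensions_scanned).
    """
    limit = tau * tau
    acc = 0.0
    scanned = 0
    for start in range(0, len(x), block):
        end = min(start + block, len(x))
        for a, b in zip(x[start:end], q[start:end]):
            acc += (a - b) ** 2
        scanned = end
        if acc > limit:
            return False, scanned  # pruned without touching remaining dimensions
    return True, scanned


if __name__ == "__main__":
    query = [0.0] * 16
    far = [3.0] + [0.0] * 15   # exceeds tau = 2 within the first block
    near = [0.5] * 16          # squared distance 4.0, exactly at the threshold
    print(dco_early_exit(far, query, tau=2.0))
    print(dco_early_exit(near, query, tau=2.0))
```

In this sketch the saving depends entirely on how early the partial sum crosses the threshold: a vector that passes the test is always scanned in full, and the per-block branch adds overhead that a tight SIMD loop over all dimensions does not pay. This is one simple way to see why such methods can be sensitive to dimensionality and hardware.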