Vector databases serve as the retrieval backbone of modern AI applications, yet their security remains largely unexplored. We propose the Black-Hole Attack, a poisoning attack that injects a small number of malicious vectors near the geometric center (centroid) of the stored vectors. Like a black hole, these injected vectors pull in queries: they appear in the top-k retrieval results of most queries. The attack exploits a phenomenon we term centrality-driven hubness: in high-dimensional embedding spaces, vectors near the centroid become nearest neighbors of a disproportionately large number of other vectors, yet in practice the centroid region itself is nearly empty. The attack shows that vectors in a vector database cannot be blindly trusted, because geometric defects in high-dimensional embeddings make retrieval inherently vulnerable. In our experiments, malicious vectors appear in up to 99.85% of top-10 results. We also evaluate existing hubness mitigation methods as potential defenses against the Black-Hole Attack and find that they either significantly reduce retrieval accuracy or provide only limited protection, indicating the need for more robust defenses.
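To make centrality-driven hubness concrete, here is a minimal toy sketch. It is not the authors' attack construction or experimental setup. It assumes isotropic Gaussian vectors, Euclidean (L2) distance and exact brute-force k-nearest-neighbor search; real embedding models, cosine similarity and approximate indexes may behave differently. All function names and parameter values are hypothetical. The sketch injects a single vector at the centroid of the stored vectors. It then measures how often that vector lands in a query's top-k, compares this with the average rate for an ordinary stored vector, and reports how far the nearest stored vector lies from the centroid (the "empty" region).

```python
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Coordinate-wise mean (geometric center) of the stored vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def top_k(query, vectors, k):
    """Indices of the k vectors closest to `query` (exact brute-force search)."""
    order = sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))
    return order[:k]


def black_hole_hit_rate(dim=64, n_stored=300, n_queries=100, k=10, seed=0):
    """Toy hubness demo (hypothetical setup: i.i.d. standard Gaussian data and queries).

    Returns (injected_rate, ordinary_rate, empty_radius):
      injected_rate: fraction of queries whose top-k contains the centroid vector
      ordinary_rate: average fraction of queries whose top-k contains a given original vector
      empty_radius:  distance from the centroid to the closest original stored vector
    """
    rng = random.Random(seed)

    def gauss_vec():
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]

    stored = [gauss_vec() for _ in range(n_stored)]
    center = centroid(stored)

    # Hypothetical "malicious" vector placed exactly at the centroid.
    poisoned = stored + [center]
    injected = len(poisoned) - 1

    injected_hits = 0
    ordinary_hits = 0
    for _ in range(n_queries):
        result = top_k(gauss_vec(), poisoned, k)
        if injected in result:
            injected_hits += 1
        ordinary_hits += sum(1 for i in result if i != injected)

    injected_rate = injected_hits / n_queries
    ordinary_rate = ordinary_hits / (n_queries * n_stored)
    # No original vector lies close to the centroid: the region is nearly empty.
    empty_radius = min(l2(center, v) for v in stored)
    return injected_rate, ordinary_rate, empty_radius


if __name__ == "__main__":
    print(black_hole_hit_rate())
```

Under these assumptions, the injected vector should appear in nearly every query's top-10. An ordinary stored vector appears, on average, in at most k / n_stored (about 3%) of them. The reason is that, in high dimensions, a random query is typically about sqrt(d) away from the centroid but about sqrt(2d) away from any other random point.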