As high-dimensional vector data increasingly exceeds the processing capabilities of traditional database management systems, vector databases (VDBs) have emerged, become tightly integrated with large language models, and are now widely used in modern artificial intelligence systems. However, existing research has focused primarily on underlying techniques such as approximate nearest neighbor search; few studies review VDBs systematically at the architectural level or analyze how these core techniques collectively support the overall capabilities of VDBs. This survey offers a comprehensive overview of the core designs and algorithms of VDBs, aiming to establish a holistic understanding of this rapidly evolving field. First, we systematically review the key technologies and design principles of VDBs along two core dimensions, storage and retrieval, and trace their technological evolution. Next, we compare several mainstream VDB architectures in depth, summarizing their strengths, limitations, and typical application scenarios. Finally, we explore emerging directions for integrating VDBs with large language models, including open research challenges and trends such as novel indexing strategies. This survey serves as a systematic reference for researchers and practitioners, helping readers quickly grasp the technological landscape and development trends of vector databases and promoting further innovation in both theory and application.