Vectors are universal mathematical objects that can represent text, images, speech, or a mix of these data modalities, regardless of whether the data is represented by hand-crafted features or learnt embeddings. Collect a large enough quantity of such vectors and the question of retrieval becomes urgently relevant: finding the vectors that are most similar to a query vector. This monograph is concerned with that question and covers fundamental concepts along with advanced data structures and algorithms for vector retrieval. In doing so, it recaps this fascinating topic and lowers the barriers to entry into this rich area of research.