Managing large-scale vector datasets with disk-resident graph approximate nearest neighbor search (ANNS) systems incurs substantial storage overhead due to the co-location of vector data and auxiliary index metadata, which prevents the storage layer from exploiting their distinct compressibility. We present COMPASS, a component-aware compressed storage framework for disk-resident graph vector search. Leveraging data-index decoupling as a foundation, COMPASS losslessly compresses each component according to its distinct compressibility characteristics, thereby significantly reducing storage space. It further adapts the search and update paths to preserve their performance under compressed storage layouts. Evaluation on real-world public and proprietary billion-scale datasets shows that COMPASS reduces storage space by up to 58.7%, while delivering improved or competitive search and update performance compared to state-of-the-art disk-resident graph ANNS systems.
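To make the component-aware idea concrete, the following is a minimal sketch and not COMPASS's actual implementation. It assumes synthetic data, float32 vectors, fixed-degree sorted neighbor lists, delta encoding for neighbor IDs, and zlib as a stand-in lossless codec; all function names are hypothetical. It packs the same graph nodes in a co-located layout (vector and neighbor IDs in one record) and in a decoupled layout (one blob per component), then compresses each and checks that the index component round-trips losslessly.

```python
import random
import struct
import zlib


def make_nodes(n, dim, degree, seed=0):
    """Build synthetic graph nodes as (vector, sorted neighbor IDs) pairs."""
    rng = random.Random(seed)
    nodes = []
    for _ in range(n):
        vec = [rng.random() for _ in range(dim)]
        nbrs = sorted(rng.sample(range(n), degree))
        nodes.append((vec, nbrs))
    return nodes


def pack_colocated(nodes):
    """Co-located layout: each record is a vector followed by its neighbor IDs."""
    out = bytearray()
    for vec, nbrs in nodes:
        out += struct.pack(f"<{len(vec)}f", *vec)
        out += struct.pack(f"<{len(nbrs)}I", *nbrs)
    return bytes(out)


def pack_decoupled(nodes):
    """Decoupled layout: a vector blob and a delta-encoded neighbor-list blob."""
    vec_blob = bytearray()
    idx_blob = bytearray()
    for vec, nbrs in nodes:
        vec_blob += struct.pack(f"<{len(vec)}f", *vec)
        # Store the first ID, then the gaps between consecutive sorted IDs.
        deltas = [nbrs[0]] + [b - a for a, b in zip(nbrs, nbrs[1:])]
        idx_blob += struct.pack(f"<{len(deltas)}I", *deltas)
    return bytes(vec_blob), bytes(idx_blob)


def unpack_neighbors(idx_blob, degree):
    """Invert the delta encoding to recover the original neighbor lists."""
    lists = []
    step = 4 * degree  # each ID is a 4-byte unsigned integer
    for off in range(0, len(idx_blob), step):
        deltas = struct.unpack(f"<{degree}I", idx_blob[off:off + step])
        ids, total = [], 0
        for d in deltas:
            total += d
            ids.append(total)
        lists.append(ids)
    return lists


if __name__ == "__main__":
    degree = 16
    nodes = make_nodes(n=1000, dim=32, degree=degree)

    colocated_z = zlib.compress(pack_colocated(nodes), 9)
    vec_blob, idx_blob = pack_decoupled(nodes)
    vec_z = zlib.compress(vec_blob, 9)
    idx_z = zlib.compress(idx_blob, 9)

    # Lossless check: the neighbor lists survive compression and delta coding.
    assert unpack_neighbors(zlib.decompress(idx_z), degree) == [nb for _, nb in nodes]

    print("co-located compressed bytes:", len(colocated_z))
    print("decoupled compressed bytes: ", len(vec_z) + len(idx_z))
```

Running the script prints both compressed sizes so the two layouts can be compared on the synthetic data. The numbers depend on the data distribution and the codec, so they say nothing about the savings reported for COMPASS on real datasets.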