We present HNTL (Hierarchical No-pointer Tangent-Local), the core vector indexing and candidate generation framework of the Aperon vector memory system. Proximity graphs such as HNSW pay a heavy pointer tax in memory overhead, and their traversal induces irregular memory accesses that stall CPU pipelines. HNTL addresses both problems by partitioning the high-dimensional space into local, coherent grains, representing each vector as low-dimensional coordinates in a local tangent space, and scanning those coordinates sequentially in a pointerless Block-SoA (Structure-of-Arrays) layout. On anisotropic manifold data (d=768, N=10,000), local PCA captures 96.3% of the variance, which lets HNTL reach a final (post-rerank) Recall@10 of 1.0000 with a candidate pool of only C=20 vectors. Hardware profiling with Apple kperf CPU Performance Monitoring Unit (PMU) counters shows that our NEON auto-vectorized C++ Block-SoA scan engine is 3.61x faster than standard pointer-chasing graph traversal (4.137 ns/vector vs. 14.951 ns/vector), driven by a 3.59x higher IPC (instructions per cycle) and near-zero L1/L2 data cache misses.
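To make the pointerless Block-SoA scan concrete, the following is a minimal C++ sketch of how one grain's tangent-space coordinates might be packed and scanned. It is an illustration under stated assumptions, not the Aperon implementation: the names `Grain`, `pack_grain`, `scan_grain`, and `kLanes`, the block width of 8, the `[block][dim][lane]` ordering, and the use of squared L2 distance in tangent coordinates are all hypothetical choices. The query is assumed to have already been projected into the grain's tangent space.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Assumed block width: 8 vectors are stored side by side per dimension.
constexpr std::size_t kLanes = 8;

// One grain: n vectors, each reduced to k tangent-space coordinates.
// No per-vector pointers; everything lives in one contiguous array.
struct Grain {
    std::size_t k = 0;          // tangent-space dimensionality
    std::size_t n = 0;          // number of vectors in the grain
    std::vector<float> coords;  // layout [block][dim][lane], zero-padded
};

// Pack row-major coordinates (n rows of k floats) into Block-SoA order.
Grain pack_grain(const std::vector<float>& rows, std::size_t n, std::size_t k) {
    assert(rows.size() == n * k);
    Grain g;
    g.k = k;
    g.n = n;
    const std::size_t blocks = (n + kLanes - 1) / kLanes;
    g.coords.assign(blocks * k * kLanes, 0.0f);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t d = 0; d < k; ++d)
            g.coords[((i / kLanes) * k + d) * kLanes + (i % kLanes)] = rows[i * k + d];
    return g;
}

// Scan the grain front to back and return the ids of the C vectors closest
// to q (squared L2 in tangent coordinates). These would form the candidate
// pool handed to a full-precision rerank stage.
std::vector<std::size_t> scan_grain(const Grain& g, const std::vector<float>& q,
                                    std::size_t C) {
    assert(q.size() == g.k);
    std::vector<std::pair<float, std::size_t>> scored;
    scored.reserve(g.n);
    const std::size_t blocks = (g.n + kLanes - 1) / kLanes;
    for (std::size_t b = 0; b < blocks; ++b) {
        float acc[kLanes] = {};  // one running distance per lane
        const float* base = g.coords.data() + b * g.k * kLanes;
        for (std::size_t d = 0; d < g.k; ++d) {
            const float* lane = base + d * kLanes;
            // Contiguous, branch-free inner loop: a candidate for
            // compiler auto-vectorization.
            for (std::size_t l = 0; l < kLanes; ++l) {
                const float diff = lane[l] - q[d];
                acc[l] += diff * diff;
            }
        }
        for (std::size_t l = 0; l < kLanes; ++l) {
            const std::size_t id = b * kLanes + l;
            if (id < g.n) scored.emplace_back(acc[l], id);  // skip padding lanes
        }
    }
    const std::size_t keep = std::min(C, scored.size());
    std::partial_sort(scored.begin(),
                      scored.begin() + static_cast<std::ptrdiff_t>(keep),
                      scored.end());
    std::vector<std::size_t> ids;
    ids.reserve(keep);
    for (std::size_t i = 0; i < keep; ++i) ids.push_back(scored[i].second);
    return ids;
}
```

Because every access in the inner loop walks `coords` in address order, the scan touches memory strictly sequentially, in contrast to following neighbor pointers in a graph. Whether a given compiler actually emits NEON instructions for this loop depends on its flags and version, so the generated assembly should be checked rather than assumed.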