We present mycelium-index, a streaming approximate nearest neighbor (ANN) index for high-dimensional vector spaces, inspired by the adaptive growth patterns of biological mycelium. The system continuously adapts its topology through three mechanisms: mycelial edge decay and reinforcement, a traffic-driven living hierarchy, and hybrid deletion that combines an O(1) bypass for cold nodes with O(k) beam-search repair for hub nodes. On SIFT-1M, mycelium-index achieves 0.927 ± 0.028 recall@5 under FreshDiskANN's 100%-turnover benchmark protocol, an interval that includes FreshDiskANN's ~0.95, while using 5.7x less RAM (88 MB vs. >500 MB) and delivering 4.7x higher QPS (2,795 vs. ~600). On the static index at ef=192, mycelium-index nearly matches the recall of HNSW with M=16 (0.962 vs. 0.965) while using 5.2x less RAM (163 MB vs. 854 MB). Performance optimizations, including NEON SIMD distance computation, Vec-backed node storage, and bitset visited tracking, yield a cumulative 2.7x QPS improvement. A systematic study of ten streaming repair mechanisms finds that geometric heuristics universally fail in high dimensions while topological mechanisms succeed, a principle we term the topological repair invariance of high-dimensional ANN graphs.
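For readers unfamiliar with the recall@5 figures quoted above, the metric can be sketched as follows. This is a minimal illustration assuming the usual definition (the fraction of the true k nearest neighbors that appear among the k returned results, averaged over queries); the function names `recall_at_k` and `mean_recall_at_k` are hypothetical and are not taken from the mycelium-index code or its benchmark harness.

```rust
use std::collections::HashSet;

/// Recall@k for a single query: the fraction of the true k nearest
/// neighbors that also appear among the first k returned ids.
/// Both slices are assumed to be ordered by increasing distance.
/// (Hypothetical helper, not part of mycelium-index.)
pub fn recall_at_k(truth: &[u32], found: &[u32], k: usize) -> f64 {
    if k == 0 {
        return 0.0;
    }
    // Sets make the overlap count robust to duplicate ids in `found`.
    let truth_set: HashSet<u32> = truth.iter().take(k).copied().collect();
    let found_set: HashSet<u32> = found.iter().take(k).copied().collect();
    let hits = truth_set.intersection(&found_set).count();
    hits as f64 / k as f64
}

/// Mean recall@k over a batch of queries (hypothetical helper).
/// `truth[i]` and `found[i]` belong to the same query.
pub fn mean_recall_at_k(truth: &[Vec<u32>], found: &[Vec<u32>], k: usize) -> f64 {
    assert_eq!(truth.len(), found.len(), "one result list per query");
    if truth.is_empty() {
        return 0.0;
    }
    let total: f64 = truth
        .iter()
        .zip(found)
        .map(|(t, f)| recall_at_k(t, f, k))
        .sum();
    total / truth.len() as f64
}
```

Under this definition, a query whose returned top five contains four of the five true nearest neighbors scores 0.8, and the reported figure is the mean of such scores over the query set.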