We study deterministic top-k retrieval under Longest Common Prefix (LCP) similarity for N sequences of length L. We prove a tight Omega(N) space lower bound in the cell-probe model and present a trie-based index using O(N*L) space with O(L+k) query time. We contrast this with pairwise materialization, which requires Theta(N^2) space and hits a practical out-of-memory (OOM) wall at scale, whereas our indexed approach stays linear in N. We then introduce Thermal-Aware Logic (TAL), which turns prefix structure into range-bounded scans. In hardware measurements on a 20M-item range-scan benchmark, TAL reduces energy per query by 308x (0.0145 J vs. 4.46 J) and cuts p95 latency by 329x (0.114 ms vs. 37.5 ms), while sustaining near-peak utilization (~99%) over long runs. The result is a deterministic retrieval primitive with receipts, for regimes where approximate methods are unacceptable.
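A minimal sketch of how a trie-based index of this kind can answer top-k LCP queries: descend along the query to the deepest matched node, harvest ids from that subtree (longest LCP), then back off one level at a time. The names `LCPIndex` and `top_k` are illustrative assumptions, not the paper's implementation, and the backoff harvest shown here reaches the stated O(L+k) bound only with the kind of per-subtree bookkeeping the paper's index presumably maintains.

```python
class TrieNode:
    __slots__ = ("children", "ids")

    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.ids = []       # ids of sequences that end exactly at this node


class LCPIndex:
    """Trie over N sequences of length <= L: O(N*L) nodes total."""

    def __init__(self, sequences):
        self.root = TrieNode()
        for sid, seq in enumerate(sequences):
            node = self.root
            for ch in seq:
                node = node.children.setdefault(ch, TrieNode())
            node.ids.append(sid)

    def top_k(self, query, k):
        # Descend along the query, recording the path of matched nodes.
        path = [self.root]
        node = self.root
        for ch in query:
            nxt = node.children.get(ch)
            if nxt is None:
                break
            node = nxt
            path.append(node)
        # Harvest ids from the deepest matched subtree first (longest LCP),
        # then back off one level, skipping the subtree already harvested.
        results, skip = [], None
        for node in reversed(path):
            stack = [node]
            while stack and len(results) < k:
                cur = stack.pop()
                if cur is skip:
                    continue
                results.extend(cur.ids[: k - len(results)])
                stack.extend(cur.children.values())
            if len(results) >= k:
                break
            skip = node
        return results[:k]
```

Determinism is the point: for a fixed trie, the same query always yields the same result set, with LCP length monotonically non-increasing across the backoff levels.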