Recent conversational memory systems invest heavily in LLM-based structuring at ingestion time and learned retrieval policies at query time. We show that neither is necessary. SmartSearch retrieves from raw, unstructured conversation history using a fully deterministic pipeline: NER-weighted substring matching for recall, rule-based entity discovery for multi-hop expansion, and a CrossEncoder+ColBERT rank-fusion stage -- the only learned component -- running on CPU in ~650ms. Oracle analysis on two benchmarks locates the bottleneck in context compilation rather than retrieval: retrieval recall reaches 98.6%, yet without intelligent ranking only 22.5% of gold evidence survives truncation to the token budget. With score-adaptive truncation and no per-dataset tuning, SmartSearch achieves 93.5% on LoCoMo and 88.4% on LongMemEval-S, exceeding all known memory systems evaluated under the same protocol on both benchmarks while using 8.5x fewer tokens than full-context baselines.
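The score-adaptive truncation idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the relative-score cutoff rule, and the whitespace token proxy are all assumptions made for the example. The point is that the cutoff adapts to the score distribution of the ranked hits rather than using a fixed count.

```python
def score_adaptive_truncate(passages, budget, rel_floor=0.35):
    """Hypothetical sketch of score-adaptive truncation.

    Pack ranked (text, score) passages into a token budget, stopping
    once a passage's score drops below rel_floor * top_score -- i.e.
    the cutoff adapts to how confident the ranker is about the best hit.
    Token cost is approximated by whitespace word count.
    """
    if not passages:
        return []
    ranked = sorted(passages, key=lambda p: p[1], reverse=True)
    top_score = ranked[0][1]
    kept, used = [], 0
    for text, score in ranked:
        if score < rel_floor * top_score:  # adaptive cutoff, not a fixed k
            break
        cost = len(text.split())           # crude token proxy
        if used + cost > budget:
            continue                       # skip hits that would overflow
        kept.append(text)
        used += cost
    return kept
```

For example, with hits scored 0.9, 0.8, and 0.1, the third hit is dropped by the adaptive floor regardless of remaining budget, while the first two are packed until the budget is exhausted.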