We describe ConvMemory v2, an opt-in token-evidence reranker that runs after the lightweight ConvMemory v1 reranker and reorders only v1's protected top-10 candidate set. v2 is a fine-tuned ms-marco-MiniLM-L-6-v2 cross-encoder (22,713,601 parameters, counted from the released checkpoint) applied to the ten (query, memory) pairs that v1 has already selected. Because it does not change which ten memories are returned, Recall@10 and Hit@10 are identical to v1 by construction, not by statistical coincidence. On the LoCoMo conversational memory benchmark (5 seeds, n = 4955 test rows), v2 raises FULL MRR from v1's 0.5824 to 0.6560 (paired bootstrap difference +0.0734, 95% CI [+0.0645, +0.0827]) and H@1 from 0.4440 to 0.5474. v2 closes most, but not all, of the gap to a much more expensive full-pool cross-encoder reference, mxbai_top500 (mxbai-rerank-large-v1 over the top-500 candidates, MRR 0.6688). On FULL MRR, v2 sits 0.013 below mxbai_top500. On two raw-dense-hard slices, however, v1's protected top-10 has higher recall than mxbai's own top-10, and on those slices v2 exceeds mxbai_top500. A four-arm load-bearing ablation shows that candidate-specific memory text is the mechanism: removing, shuffling, or replacing it collapses MRR below raw dense retrieval. v2 is best understood as an instance of the standard recall-preserving cascade pattern, combined with LoCoMo-specific fine-tuning, an explicit anti-shortcut inference contract, and disciplined load-bearing analysis. Its advantage over mxbai is slice-specific rather than a claim of general dominance. This report extends the v1 technical report (arXiv:2605.28062).
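The "by construction" argument can be checked mechanically. Below is a minimal sketch in plain Python of a recall-preserving cascade. It reorders only the first k candidates of an upstream ranking, so the top-k set cannot change, and with it neither can Recall@k or Hit@k, while MRR can. The sketch also includes a percentile paired bootstrap over per-query reciprocal-rank differences, which is one common way to obtain an interval like the one reported above.

Several assumptions apply. All function names are hypothetical. The toy scorer is a stand-in for the fine-tuned cross-encoder, and the data are synthetic rather than LoCoMo. The report's exact bootstrap procedure (resample count, handling of the 5 seeds) is not reproduced here.

```python
import random
from typing import Callable, List, Sequence, Set, Tuple


def rerank_within_top_k(
    ranking: Sequence[str],
    score: Callable[[str], float],
    k: int = 10,
) -> List[str]:
    """Reorder only the first k candidates by descending score; keep the tail.

    The head is a permutation of ranking[:k], so the top-k *set* is unchanged
    and set-based metrics at cutoff k (Recall@k, Hit@k) cannot move.
    """
    head = sorted(ranking[:k], key=score, reverse=True)
    return head + list(ranking[k:])


def reciprocal_rank(ranking: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def hit_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    return 1.0 if any(doc in relevant for doc in ranking[:k]) else 0.0


def recall_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    # Assumes a non-empty relevant set.
    return len(set(ranking[:k]) & relevant) / len(relevant)


def paired_bootstrap_ci(
    a: Sequence[float],
    b: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, Tuple[float, float]]:
    """Percentile bootstrap CI for mean(b - a), resampling queries with replacement.

    Pairing: resampling the per-query differences is equivalent to using the
    same resampled query indices for both systems.
    """
    rng = random.Random(seed)
    diffs = [y - x for x, y in zip(a, b)]
    n = len(diffs)
    boot_means = sorted(
        sum(diffs[rng.randrange(n)] for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lo = boot_means[int(alpha / 2 * n_resamples)]
    hi = boot_means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(diffs) / n, (lo, hi)


# Toy usage on synthetic data (not LoCoMo): 200 queries, 20 ranked memories
# each, one relevant memory placed at a random position by the "v1" ranking.
rng = random.Random(42)
v1_rr: List[float] = []
v2_rr: List[float] = []
for q in range(200):
    docs = [f"q{q}_m{j}" for j in range(20)]
    rng.shuffle(docs)
    relevant = {docs[rng.randrange(20)]}
    # Stand-in scorer: uniform noise plus a bonus for the relevant memory.
    scores = {d: rng.random() + (0.8 if d in relevant else 0.0) for d in docs}
    v2 = rerank_within_top_k(docs, scores.__getitem__, k=10)
    # The invariance holds for every query, whatever the scorer does.
    assert hit_at_k(docs, relevant) == hit_at_k(v2, relevant)
    assert recall_at_k(docs, relevant) == recall_at_k(v2, relevant)
    v1_rr.append(reciprocal_rank(docs, relevant))
    v2_rr.append(reciprocal_rank(v2, relevant))

mean_diff, (ci_lo, ci_hi) = paired_bootstrap_ci(v1_rr, v2_rr)
print(f"MRR v1={sum(v1_rr) / len(v1_rr):.3f} v2={sum(v2_rr) / len(v2_rr):.3f}")
print(f"paired diff={mean_diff:+.3f}, 95% CI [{ci_lo:+.3f}, {ci_hi:+.3f}]")
```

The head is always a permutation of `ranking[:k]`, so the Recall@10/Hit@10 invariance holds for any scorer, including a poor one. This is why it can be stated as a property of the design rather than an empirical finding. Only rank-sensitive metrics such as MRR and H@1 depend on the scorer's quality.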