ConvMemory v2: A Recall-Preserving Top-10 Evidence Reranker for Conversational Memory Retrieval

arXiv preprint, 19 pages, 3 figures. Single-author technical report. Extends arXiv:2605.28062 (ConvMemory v1). Code and checkpoint: github.com/pth2002/ConvMemory

We describe ConvMemory v2, an opt-in token-evidence reranker that sits after the lightweight ConvMemory v1 reranker and reorders only v1's protected top-10 candidate set. v2 is a fine-tuned ms-marco-MiniLM-L-6-v2 cross-encoder (22,713,601 parameters, measured from the released checkpoint) applied to the ten (query, memory) pairs that v1 has already selected; it does not change which ten memories are returned, so Recall@10 and Hit@10 are identical to v1 by construction, not by statistical coincidence. On the LoCoMo conversational memory benchmark (5 seeds, n = 4955 test rows), v2 raises FULL MRR from v1's 0.5824 to 0.6560 (paired bootstrap +0.0734, 95% CI [+0.0645, +0.0827]) and H@1 from 0.4440 to 0.5474. v2 closes most but not all of the gap to a much more expensive full-pool cross-encoder reference (mxbai-rerank-large-v1 over the top-500, MRR 0.6688): on FULL MRR v2 sits 0.013 below mxbai_top500, but on two raw-dense-hard slices (where v1's protected top-10 has higher recall than mxbai's own top-10) v2 exceeds mxbai_top500. A four-arm load-bearing ablation shows candidate-specific memory text is the mechanism: removing, shuffling, or replacing it collapses MRR below raw dense retrieval. v2 is best understood as a standard recall-preserving cascade pattern with LoCoMo-specific fine-tuning, an explicit anti-shortcut inference contract, and disciplined load-bearing analysis; its advantage over mxbai is slice-specific rather than a general dominance claim. This report extends the v1 technical report (arXiv:2605.28062).
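The "recall-preserving" property follows from one fact: the second stage only permutes the protected set and never adds or drops a candidate. The sketch below illustrates that cascade pattern and the MRR, H@1 and Recall@k bookkeeping around it. It rests on several assumptions:

- The function names (`rerank_protected_topk`, `overlap_score`, etc.) are hypothetical.
- The toy word-overlap scorer merely stands in for the fine-tuned MiniLM cross-encoder. A real setup would score the (query, memory) pairs with a cross-encoder, for example via the sentence-transformers `CrossEncoder` class.
- A 3-item protected set replaces the paper's top-10 to keep the example short.
- The toy memories are invented and are not LoCoMo data.

```python
from typing import Callable, Dict, List, Sequence, Set


def rerank_protected_topk(
    query: str,
    protected: Sequence[str],                 # memory ids already chosen by the first stage
    texts: Dict[str, str],                    # memory id -> memory text
    score_pair: Callable[[str, str], float],  # hypothetical (query, text) -> relevance score
) -> List[str]:
    """Reorder, but never replace, the first-stage top-k set."""
    scored = [(score_pair(query, texts[mid]), i, mid) for i, mid in enumerate(protected)]
    # Sort by descending score; ties fall back to the first-stage order (index i).
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [mid for _, _, mid in scored]


def reciprocal_rank(ranking: Sequence[str], relevant: Set[str]) -> float:
    # 1/rank of the first relevant item, 0.0 if none is retrieved.
    for rank, mid in enumerate(ranking, start=1):
        if mid in relevant:
            return 1.0 / rank
    return 0.0


def hit_at(ranking: Sequence[str], relevant: Set[str], k: int) -> float:
    return float(any(mid in relevant for mid in ranking[:k]))


def recall_at(ranking: Sequence[str], relevant: Set[str], k: int) -> float:
    return len(set(ranking[:k]) & relevant) / len(relevant)


def overlap_score(query: str, text: str) -> float:
    # Toy stand-in for a cross-encoder: number of shared lowercase words.
    return float(len(set(query.lower().split()) & set(text.lower().split())))


# Invented toy data: the relevant memory sits third in the first-stage order.
texts = {
    "m1": "we talked about the weather",
    "m2": "Caroline adopted a dog named Oscar",
    "m3": "lunch plans for Friday",
}
protected = ["m1", "m3", "m2"]
relevant = {"m2"}
query = "what is caroline's dog named"

reranked = rerank_protected_topk(query, protected, texts, overlap_score)
print(reranked)  # ['m2', 'm1', 'm3']
print(set(reranked) == set(protected))  # True: same candidates, new order
print(reciprocal_rank(protected, relevant), reciprocal_rank(reranked, relevant))
```

Because the returned set equals the protected set, Recall@k and Hit@k at k equal to the protected-set size cannot change. Metrics at smaller cutoffs, such as MRR and H@1, are where a second stage can help or hurt.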

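The abstract reports the v2-minus-v1 MRR gain as a paired bootstrap with a 95% CI. The sketch below shows a standard percentile paired bootstrap over per-query reciprocal ranks. It rests on several assumptions:

- The report's exact procedure is not described here. In particular, it is unknown how the 5 seeds are pooled, how many resamples are drawn, and whether the reported +0.0734 is the point estimate or the bootstrap mean.
- This sketch treats each test row as one resampling unit.
- The function name `paired_bootstrap_diff` is hypothetical.

```python
import random
from typing import Sequence, Tuple


def paired_bootstrap_diff(
    rr_base: Sequence[float],  # per-query reciprocal ranks, baseline system (e.g. v1)
    rr_new: Sequence[float],   # per-query reciprocal ranks, new system, same query order
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean difference, CI low, CI high) using a percentile paired bootstrap."""
    if len(rr_base) != len(rr_new) or len(rr_base) == 0:
        raise ValueError("need two non-empty, aligned per-query score lists")
    rng = random.Random(seed)
    n = len(rr_base)
    # Pairing: each query contributes its own difference under both systems.
    diffs = [new - base for base, new in zip(rr_base, rr_new)]
    point = sum(diffs) / n
    stats = []
    for _ in range(n_resamples):
        # Resample queries with replacement and average their paired differences.
        stats.append(sum(diffs[rng.randrange(n)] for _ in range(n)) / n)
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return point, lo, hi


# Invented toy reciprocal ranks for four queries (not LoCoMo results).
print(paired_bootstrap_diff([0.5, 0.25, 1.0, 0.0], [1.0, 0.5, 1.0, 0.5]))
```

Resampling whole queries, rather than each system's scores independently, keeps the per-query correlation between the two systems. This correlation is what makes a paired interval tighter than comparing two unpaired confidence intervals.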