Deploying large language models in long-horizon, goal-oriented interactions remains challenging because similar entities and facts recur under different latent goals and constraints, causing memory systems to retrieve context-mismatched evidence. We propose STITCH (Structured Intent Tracking in Contextual History), an agentic memory system that indexes each trajectory step with a structured retrieval cue, called contextual intent, and retrieves history by matching it against the current step's intent. Contextual intent comprises three compact signals that disambiguate repeated mentions and reduce interference: (1) the current latent goal, which defines a thematic segment; (2) the action type; and (3) the salient entity types, which anchor which attributes matter. During inference, STITCH filters and prioritizes memory snippets by intent compatibility, suppressing history that is semantically similar but context-incompatible. For evaluation, we introduce CAME-Bench, a benchmark for context-aware retrieval in realistic, dynamic, goal-oriented trajectories. Across CAME-Bench and LongMemEval, STITCH achieves state-of-the-art performance, outperforming the strongest baseline by 35.6%, with the largest gains as trajectory length increases. Our analysis shows that intent indexing substantially reduces retrieval noise, supporting intent-aware memory for robust long-horizon reasoning.
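To make the indexing and retrieval idea concrete, the following is a minimal sketch of intent-keyed memory, not the STITCH implementation. The abstract does not specify how intents are extracted or how compatibility is computed, so everything here is an assumption made for illustration: the names `Intent`, `MemoryStep`, `priority`, and `retrieve` are hypothetical, intent labels are hand-written strings, the filter is an exact match on the goal, and the ranking uses an action-type match plus Jaccard overlap of entity types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """Hypothetical container for the three contextual-intent signals."""
    goal: str                # current latent goal (thematic segment)
    action: str              # action type
    entity_types: frozenset  # salient entity types


@dataclass(frozen=True)
class MemoryStep:
    """One trajectory step stored together with its intent index."""
    text: str
    intent: Intent


def priority(query: Intent, stored: Intent) -> float:
    """Assumed ranking score: action-type match plus entity-type Jaccard overlap."""
    score = 1.0 if query.action == stored.action else 0.0
    union = query.entity_types | stored.entity_types
    if union:
        score += len(query.entity_types & stored.entity_types) / len(union)
    return score


def retrieve(memory, query: Intent, k: int = 3):
    """Filter steps to the query's goal, then rank the survivors by priority."""
    # Assumption: a goal mismatch makes a step context-incompatible.
    candidates = [step for step in memory if step.intent.goal == query.goal]
    ranked = sorted(candidates, key=lambda s: priority(query, s.intent), reverse=True)
    return ranked[:k]


# Toy history: the same kind of fact (a hotel budget) recurs under two goals.
memory = [
    MemoryStep("Hotel budget is 120 USD per night.",
               Intent("plan Tokyo trip", "constrain", frozenset({"hotel", "price"}))),
    MemoryStep("Hotel budget is 200 USD per night.",
               Intent("plan Paris trip", "constrain", frozenset({"hotel", "price"}))),
    MemoryStep("Booked a flight to Paris on May 3.",
               Intent("plan Paris trip", "book", frozenset({"flight", "date"}))),
]

query = Intent("plan Paris trip", "constrain", frozenset({"hotel", "price"}))
for step in retrieve(memory, query, k=2):
    print(step.text)
```

In this toy history the two hotel-budget steps are nearly identical in wording, so a purely semantic retriever could return either one. Filtering on the goal drops the Tokyo step before ranking, which is the kind of context-incompatible evidence the abstract describes suppressing. A real system would presumably need softer goal matching than string equality.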