Deploying large language models in long-horizon, goal-oriented interactions remains challenging because similar entities and facts recur under different latent goals and constraints, causing memory systems to retrieve context-mismatched evidence. We propose STITCH (Structured Intent Tracking in Contextual History), an agentic memory system that indexes each trajectory step with a structured retrieval cue, which we term contextual intent, and retrieves history by matching it against the current step's intent. Contextual intent provides compact signals that disambiguate repeated mentions and reduce interference: (1) the current latent goal, which defines a thematic segment, (2) the action type, and (3) the salient entity types, which anchor the attributes that matter. During inference, STITCH filters and prioritizes memory snippets by intent compatibility, suppressing history that is semantically similar but context-incompatible. For evaluation, we introduce CAME-Bench, a benchmark for context-aware retrieval in realistic, dynamic, goal-oriented trajectories. Across CAME-Bench and LongMemEval, STITCH achieves state-of-the-art performance, outperforming the strongest baseline by 35.6%, with the largest gains as trajectory length increases. Our analysis shows that intent indexing substantially reduces retrieval noise, supporting intent-aware memory for robust long-horizon reasoning.
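To make the indexing and retrieval idea concrete, the following is a minimal sketch of intent-indexed memory, not the STITCH implementation. All names (`ContextualIntent`, `MemoryStep`, `compatibility`, `retrieve`) are hypothetical, and the scoring rule is an assumption chosen for illustration: a goal mismatch filters a snippet out, while action-type agreement and entity-type overlap raise its rank. The abstract does not specify how intent compatibility is actually computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextualIntent:
    """Hypothetical container for the three signals named in the abstract."""
    goal: str                # current latent goal (thematic segment)
    action_type: str         # kind of action performed at this step
    entity_types: frozenset  # salient entity types for this step


@dataclass(frozen=True)
class MemoryStep:
    text: str
    intent: ContextualIntent


def compatibility(query: ContextualIntent, stored: ContextualIntent) -> float:
    """Assumed scoring rule: 0.0 means 'filter out', higher means 'rank first'."""
    if query.goal != stored.goal:
        return 0.0  # assumption: goal mismatch is treated as a hard filter
    union = query.entity_types | stored.entity_types
    shared = query.entity_types & stored.entity_types
    overlap = len(shared) / len(union) if union else 0.0  # Jaccard overlap
    action_match = 1.0 if query.action_type == stored.action_type else 0.0
    return 0.5 + 0.25 * action_match + 0.25 * overlap


def retrieve(memory, query, k=3):
    """Drop incompatible steps, then return the top-k by compatibility."""
    scored = [(compatibility(query, step.intent), step) for step in memory]
    kept = [(score, step) for score, step in scored if score > 0.0]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [step for _, step in kept[:k]]


# Two hotel facts look alike on the surface but belong to different goals.
memory = [
    MemoryStep("Hotel Lumiere costs 180 EUR per night.",
               ContextualIntent("plan Paris trip", "compare", frozenset({"hotel"}))),
    MemoryStep("Hotel Sakura costs 150 USD per night.",
               ContextualIntent("plan Tokyo trip", "compare", frozenset({"hotel"}))),
    MemoryStep("Booked a flight to Tokyo on May 3.",
               ContextualIntent("plan Tokyo trip", "book", frozenset({"flight"}))),
]
query = ContextualIntent("plan Tokyo trip", "compare", frozenset({"hotel"}))
results = retrieve(memory, query)
for step in results:
    print(step.text)
```

In this toy trajectory, the Paris hotel snippet is suppressed even though it resembles the query on the surface, which is the kind of interference the abstract describes. A real system would presumably combine such a filter with semantic similarity rather than rely on exact string matches for goals.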