With the growing adoption of Large Language Model (LLM) agents in persistent, real-world roles, they naturally encounter continuous streams of tasks and inevitable failures. A key limitation, however, is their inability to systematically learn from these mistakes, which leads them to repeat identical errors in similar contexts. Unlike prior training-free methods that primarily store raw instance-level experience or focus on retrieving successful trajectories, we propose Mistake Notebook Learning (MNL), a novel memory framework that enables agents to self-curate generalizable guidance from batch-clustered failures. This mechanism allows agents to distill shared error patterns into structured ``mistake notes,'' updating an external memory only when batch performance improves, which ensures stability. To further amplify adaptability, we integrate MNL with test-time scaling, leveraging aggregated failure patterns to actively steer the search process away from known pitfalls. Experiments on mathematical reasoning, Text-to-SQL, and interactive agent benchmarks show that MNL is competitive with existing memory mechanisms and in-context methods in both effectiveness and efficiency. These findings position structured mistake abstraction as a critical lever for robust agent evolution, enabling continuous improvement without the cost of parameter updates.
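The core update loop described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the helper names (`cluster_failures`, `distill_note`, `update_notebook`) and the keyword-based clustering are assumptions for exposition; in the actual framework an LLM would cluster failed trajectories and write the mistake notes, and `evaluate` would be the batch success rate with the candidate notebook in context.

```python
from collections import defaultdict

def cluster_failures(failures):
    """Group failed cases by a shared error signature (a toy proxy for
    batch-clustering; a real system might embed and cluster trajectories)."""
    clusters = defaultdict(list)
    for case in failures:
        clusters[case["error_type"]].append(case)
    return clusters

def distill_note(error_type, cases):
    """Distill one cluster into a generalizable 'mistake note'
    (an LLM would author this text in the actual framework)."""
    return f"Avoid '{error_type}' errors (pattern seen in {len(cases)} cases)."

def update_notebook(notebook, failures, evaluate):
    """Tentatively add distilled notes, but commit the update only if the
    batch score improves -- the stability condition from the abstract."""
    candidate = list(notebook)
    for error_type, cases in cluster_failures(failures).items():
        candidate.append(distill_note(error_type, cases))
    return candidate if evaluate(candidate) > evaluate(notebook) else notebook
```

The gating step is the key design choice: the external memory is append-only in the good case and rolled back otherwise, so a badly distilled note can never degrade the agent's behavior on the batch it was learned from.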